US20130179644A1 - Parallel processing processor system - Google Patents

Parallel processing processor system

Info

Publication number
US20130179644A1
Authority
US
United States
Prior art keywords
cache
memory
shared memory
processing
parallel processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/784,738
Inventor
Hideyasu Tomi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to US13/784,738
Publication of US20130179644A1
Status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0808: Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • G06F 12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10: Providing a specific technical effect
    • G06F 2212/1016: Performance improvement
    • G06F 2212/1021: Hit rate improvement


Abstract

A parallel processing processor system includes multiple processor elements, a main memory, and a shared memory, whose latency with the processors is less than the latency between the main memory and the processors. Each of the multiple processor elements has a DSP (Digital Signal Processor) and an instruction cache. Firmware executed by the DSPs is transferred from the main memory to the shared memory and is shared by the DSPs. Updating of the instruction caches in the case where a cache miss has occurred is performed by, for example, copying, into the instruction caches, the content of the shared memory corresponding to an address accessed by a DSP.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a parallel processing processor system, provided with multiple processors, that processes data in parallel using those processors. The present invention particularly relates to a parallel processing processor system capable of reducing the instruction cache capacity required by each of the processors while maintaining performance.
  • 2. Description of the Related Art
  • In the controllers of MFPs (Multifunction Peripherals), individual hardware logic is provided for processes such as image reading, recording, printing, communication, fax, and so on, thereby realizing functions requested of the MFP. However, preparing circuits for each function makes it difficult to reduce the cost of the controller while also maintaining its functionality.
  • Reducing costs while maintaining functionality is possible by executing non-simultaneous image processes using programmable hardware. DSPs (Digital Signal Processors), reconfigurable processors, and configurable processors can be given as examples of programmable hardware. Here, reducing costs by switching firmware using multiple DSPs shall be considered as an example.
  • A configuration in which multiple DSPs, each assigned to a different image process, are connected and a series of multiple types of image processes are executed sequentially on the same image region is called a "pipeline architecture". If a pipeline architecture is employed, differences in processing times among the DSPs result in DSPs that act as bottlenecks, making sufficient throughput difficult to achieve.
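  • The throughput limit described above can be made concrete with a small worked example. The sketch below is illustrative only: the per-stage processing times are assumed numbers, not values from this specification, and the stage names merely echo the image processes mentioned later.

      # Illustrative pipeline bottleneck: throughput is set by the slowest DSP,
      # so unequal stage times leave the faster stages partly idle.
      stage_clocks = {"mtf": 100, "color": 160, "filter": 120, "gamma": 90}  # clocks per tile (assumed)

      slowest = max(stage_clocks.values())
      pipeline_rate = 1 / slowest                                     # tiles per clock
      balanced_rate = len(stage_clocks) / sum(stage_clocks.values())  # ideal, perfectly balanced pipeline
      utilisation = {name: clocks / slowest for name, clocks in stage_clocks.items()}

      print(f"pipeline throughput: {pipeline_rate:.4f} tiles/clock (limited by the 160-clock stage)")
      print(f"perfectly balanced : {balanced_rate:.4f} tiles/clock")
      print(f"stage utilisation  : {utilisation}")
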
  • In order to avoid this problem, the DSPs can be customized so that the processing times of the individual DSPs are equal.
  • However, if a DSP is customized for a certain process, it is difficult to customize that DSP in the same manner for a different piece of firmware when switching to and executing that different piece of firmware.
  • Meanwhile, although techniques for regulating loads among the DSPs exist (for example, see Japanese Patent Laid-Open No. 2006-133839), such regulation requires overhead; furthermore, improving the throughput is difficult and the control involved is complex, and thus such a technique is not necessarily desirable. Moreover, pipeline architecture has a problem in that it is difficult to implement a changeable configuration that has scalability, where costs are reduced by reducing the number of DSPs, performance is improved by increasing the number of DSPs, and so on.
  • Based on this, a data parallel processing architecture, in which the image data to be processed is divided, each piece of image data obtained through the division is assigned to a different DSP, and the multiple processes that were executed by different DSPs in the pipeline architecture are executed by those multiple DSPs, is more preferable than a pipeline architecture. In the present specification, an architecture in which multiple DSPs are used, the image data to be processed is divided, and a series of processes are performed on the pieces of image data obtained through the division in parallel by the DSPs shall be called a data parallel processing architecture.
  • When a structure is employed in which the image data to be processed is divided and data parallel processing is executed thereon by multiple DSPs, the size of the program executed by each DSP increases, and thus the cache miss rate is higher than when a pipeline architecture is employed with an instruction cache of the same capacity. When a cache miss occurs, the DSP accesses a main memory. The main memory is a DRAM (Dynamic Random Access Memory) or the like located off of the chip that implements the DSP.
  • With an off-chip DRAM, 20 to 30 clocks are necessary for a one-word read/write, and thus the latency at the time of the cache miss is extremely high, which greatly influences the processing capabilities of the DSP. Meanwhile, if an instruction cache having a capacity capable of storing all the processes assigned to each DSP is employed, the size of the instruction cache increases, thereby increasing the surface area of the circuit.
  • A method that uses a secondary cache can be employed in order to reduce the latency at the time of a cache miss. A “secondary cache” is a processor-specific storage device with a higher latency than a primary cache and a lower latency than a DRAM. Although using a secondary cache can solve the aforementioned problem, doing so also leads to the following problems:
  • because a cache requires a circuit called a "tag" in addition to a circuit for storing data, the circuit scale increases (a rough estimate of this overhead is sketched after this list); and
  • cache transfer is executed in units called “cache lines” and thus the efficiency is poor.
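  • Regarding the first of the two problems, the storage consumed by tags can be estimated roughly as follows. The cache geometry used here (a 4 KB direct-mapped cache with 32-byte lines and 32-bit addresses) is an assumption for illustration, not a figure taken from this specification.

      # Rough estimate of the extra "tag" storage a conventional cache needs,
      # which a plain shared SRAM avoids (all parameters assumed).
      CACHE_BYTES = 4 * 1024        # cache data capacity
      LINE_BYTES = 32               # cache-line size
      ADDR_BITS = 32

      lines = CACHE_BYTES // LINE_BYTES                   # 128 lines
      offset_bits = LINE_BYTES.bit_length() - 1           # 5 bits of line offset
      index_bits = lines.bit_length() - 1                 # 7 bits of index (direct-mapped)
      tag_bits = ADDR_BITS - index_bits - offset_bits     # 20 tag bits per line

      tag_storage_bits = lines * (tag_bits + 1)           # +1 valid bit per line
      print(f"{lines} lines x {tag_bits} tag bits -> "
            f"{tag_storage_bits} bits (~{tag_storage_bits // 8} bytes) of tag storage")
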
  • SUMMARY OF THE INVENTION
  • The present invention solves the above problems by employing a shared memory, such as an SRAM (Static Random Access Memory), that is shared among the DSPs instead of a secondary cache. With a shared memory, tags are unnecessary, and transfers need not be executed in cache-line units.
  • The present invention provides, in a structure where data parallel processing is executed by multiple processors, a configuration capable of reducing the capacity of an instruction cache while obtaining the desired degree of performance.
  • According to one aspect of the present invention, a parallel processing processor system that includes multiple processors and performs parallel processing on data read out from a main memory using the multiple processors is provided. The system comprises multiple processor elements, each processor element including a processor and an instruction cache that holds an instruction corresponding to at least part of a program executed by the processor. The system also comprises a shared memory, whose latency with the processors is less than the latency between the main memory and the processors, that stores the program transferred from the main memory and is shared by the multiple processor elements. The system further comprises an update unit that updates the instruction in the instruction cache with an instruction in the program stored in the shared memory in the case where a cache miss has occurred in the instruction cache.
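  • As a minimal sketch of this arrangement, the following code models a processor element, the shared memory, and the update performed on an instruction-cache miss. The class names, the refill granularity (a cache-sized window), and the dummy program data are hypothetical; the 4 KB and 16 KB capacities anticipate the figures used in the first embodiment below.

      ICACHE_SIZE = 4 * 1024      # per-PE instruction cache (4 KB in the embodiment below)
      SHARED_SIZE = 16 * 1024     # shared memory holding the firmware (16 KB)

      class SharedMemory:
          """Holds the program transferred from the main memory; read by every PE."""
          def __init__(self, program: bytes):
              assert len(program) <= SHARED_SIZE
              self.program = program

      class ProcessorElement:
          """A processor (DSP) together with its private instruction cache."""
          def __init__(self):
              self.icache = b""
              self.icache_base = None    # program offset currently mapped at icache[0]

          def fetch(self, shared: SharedMemory, addr: int) -> int:
              """Instruction fetch: hit in the private cache, or miss and refill."""
              if self.icache_base is None or not (
                  self.icache_base <= addr < self.icache_base + len(self.icache)
              ):
                  self.update_from_shared(shared, addr)   # the "update unit" acting on a miss
              return self.icache[addr - self.icache_base]

          def update_from_shared(self, shared: SharedMemory, miss_addr: int):
              """Refill the cache from the shared-memory window containing the
              missed address (the granularity shown here is illustrative)."""
              base = (miss_addr // ICACHE_SIZE) * ICACHE_SIZE
              self.icache_base = base
              self.icache = shared.program[base:base + ICACHE_SIZE]

      if __name__ == "__main__":
          shm = SharedMemory(bytes(range(256)) * 64)   # 16 KB of dummy "firmware"
          pe = ProcessorElement()
          print(pe.fetch(shm, 0x0010))                 # miss: refill from the shared memory
          print(pe.fetch(shm, 0x0011))                 # hit in the instruction cache
          print(pe.fetch(shm, 0x1800))                 # miss in a different 4 KB window
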
  • Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating the hardware configuration of an image processing apparatus according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating an outline of a controller in an image processing apparatus according to a first embodiment.
  • FIG. 3 is a flowchart illustrating an example of operations performed by the image processing apparatus according to the first embodiment.
  • FIG. 4 is a flowchart illustrating an example of operations performed by a parallel processing processor system according to the first embodiment.
  • FIG. 5 is a conceptual diagram illustrating a tile data/firmware data flow according to the first embodiment.
  • FIG. 6 is a conceptual diagram illustrating movement of the content of a shared memory/instruction cache according to the first embodiment.
  • FIG. 7 is a block diagram illustrating an outline of a controller in an image processing apparatus according to a second embodiment.
  • FIG. 8 is a flowchart illustrating an example of operations performed by the image processing apparatus according to the second embodiment.
  • FIG. 9 is a flowchart illustrating an example of operations performed by a parallel processing processor system according to the second embodiment.
  • FIG. 10 is a flowchart illustrating an example of operations performed by a parallel processing processor system according to the second embodiment.
  • FIG. 11 is a conceptual diagram illustrating movement of the content of a shared memory/instruction cache according to the second embodiment.
  • DESCRIPTION OF THE EMBODIMENTS
  • Various exemplary embodiments, features, and aspects of the present invention will be described in detail below with reference to the drawings.
  • First Embodiment
  • FIG. 1 is a block diagram illustrating the hardware configuration of an image processing apparatus including a parallel processing processor system according to the present invention. The image processing apparatus according to the present embodiment is assumed to be an MFP (Multifunction Peripheral) provided with a copy function, a printer function, a fax function, and a scanner function, and is configured so as to include a controller 101, a UI (User Interface) unit 102, a printer 103, a scanner 104, a memory 105, and a communication I/F (interface) 106.
  • Outlines of these units shall be given hereinafter.
  • The controller 101 is a unit that controls the image processing apparatus as a whole. The controller 101 is electrically connected to the various blocks, such as the printer 103, the scanner 104, and so on, and performs control so as to realize a high level of functionality. This shall be described in greater detail later.
  • The UI unit 102 provides a user interface (UI) for a user to operate the image processing apparatus. The UI unit 102 is configured of, for example, a liquid crystal touch panel, and accepts operational instructions for the image processing apparatus from a user, displays previews of images to be printed, and so on.
  • The printer 103 is a block that prints a visual image onto a recording sheet based on an electrical image signal, and is configured of, for example, a laser printer, an inkjet printer, or the like.
  • The scanner 104 is a block that optically reads a document image and converts the read image into an electrical image signal.
  • The memory 105 is an external memory configured of a memory device such as, for example, a DDR-SDRAM, an HDD, or the like. This memory 105 functions as a main memory, and not only temporarily stores image data, but also stores control programs, data, and so on used by the controller 101 to realize the functions of the image processing apparatus.
  • The communication I/F 106 is a block that exchanges data with an external device, and connects to the Internet or a LAN, connects to a public telephone line to perform fax communication, connects to a PC (Personal Computer) through a USB interface, or the like.
  • FIG. 2 is a block diagram illustrating an outline of the controller 101. The controller 101 includes a CPU (Central Processing Unit) 201, an I/O controller 202, a parallel processing processor system 203, and a data bus 204. The I/O controller 202 controls data transfer between the controller 101 and units such as the memory 105 and the communication I/F 106, and has DMA (Direct Memory Access) functionality. In the present embodiment, the configuration is such that a single parallel processing processor system 203 is included in the controller 101, but a configuration in which multiple parallel processing processor systems 203 are included is also possible. The CPU 201, the I/O controller 202, and the parallel processing processor system 203 are connected via the data bus 204.
  • The parallel processing processor system 203 includes DSPs (Digital Signal Processors) 301, instruction caches 302, image local memories 303, a shared memory 304, and a data bus 305. A single DSP 301, instruction cache 302, and image local memory 303 are collectively called a processor element (PE).
  • Although the configuration of the present embodiment is such that the parallel processing processor system 203 includes three processor elements PE1, PE2, and PE3, the number of processor elements is not limited to three. Furthermore, each DSP 301 in the present embodiment is assumed to have the same processing capabilities.
  • With the parallel processing processor system 203, each piece of image data obtained through division into predetermined units is stored in the image local memory 303 of a single PE, and that image data is processed by the DSP 301 of the same PE according to instructions within the instruction cache of that PE. Firmware, which is program instructions executed by the DSPs 301 of the multiple PEs, is stored in the shared memory 304. An advantage of storing instructions rather than data in the shared memory 304 is that write accesses from the DSPs 301 do not occur, reducing the likelihood of access concentration; access concentration in the shared memory 304 would create a bottleneck, leading to a drop in processing capability. The shared memory 304 has a lower latency with the DSPs 301 than the memory 105 and has a high operational frequency, and thus the DSPs 301 are capable of reading out the firmware at high speeds.
  • When switching firmware, the I/O controller 202 DMA-transfers firmware stored in the memory 105 to the shared memory 304. The DSPs 301 are connected to the shared memory 304 via the data bus 305.
  • With the image processing apparatus according to the present embodiment, the parallel processing processor system 203 carries out read image processing, recorded image processing, communication image processing, and so on.
  • “Read image processing” refers to executing shading correction or the like on image data received from the scanner 104, and also performing various types of image processes, such as MTF correction, color conversion processing, filter processing, gamma processing, and so on, on that image data.
  • “Recorded image processing” refers to performing binarization processing, halftone processing, and color conversion processing such as RGB to CMYK conversion on image data that has undergone the aforementioned read image processing, thereby converting that image data into a halftone image. Furthermore, this processing involves performing various types of image processes such as resolution conversion based on the recording resolution, image magnification, smoothing, darkness correction, and so on, thereby converting the image data into high-resolution image data, and outputting that data to a laser printer or the like.
  • “Communication image processing” refers to performing resolution conversion, color conversion, and so on on a read image in accordance with communication capabilities, performing resolution conversion on an image received through communication in accordance with recording capabilities, and so on. In the present embodiment, for example, the size of the firmware for read image processing/recorded image processing is assumed to be less than 16 KB, whereas the capacity of the shared memory 304 is assumed to be 16 KB. Meanwhile, the capacity of the instruction cache 302 is assumed to be 4 KB. The instruction cache 302 need only be capable of holding at least some of the instructions of the program executed by the DSP 301, and therefore the capacity of the instruction cache 302 may be significantly smaller than the overall size of the program.
  • FIG. 3 is a flowchart illustrating operations performed by the image processing apparatus according to the present embodiment. In the present embodiment, detailed descriptions shall be given regarding operations of the parallel processing processor system 203 during processing spanning from image data being obtained through the scanner 104 to the image data being outputted to the printer 103.
  • First, image data is obtained through the scanner 104 (S101), and that image data is then transferred to the memory 105 (S102).
  • Next, read image processing firmware is transferred to the parallel processing processor system 203 of the controller 101 (S103), and the read image processing is executed by the parallel processing processor system 203 (S104).
  • Furthermore, recorded image processing firmware is transferred to the parallel processing processor system 203 (S105), and the recorded image processing is executed by the parallel processing processor system 203 (S106).
  • Finally, the image data is transferred to the printer 103 via the data bus 204 (S107). Detailed descriptions regarding S104 and S106 shall be given later. In S103 and S105, the image processing firmware is transferred to the shared memory 304 by, for example, the I/O controller 202.
  • The operations of the parallel processing processor system 203 in S104 and S106 shall be described using the flowchart in FIG. 4. Although FIG. 4 illustrates the operations of a single DSP 301, all of the DSPs 301 present in the parallel processing processor system 203 execute the same processes in parallel.
  • When the processing commences, the DSP 301 reads out image data of a predetermined size to be processed from the memory 105 (called “tile data” hereinafter) and stores that data in the image local memory 303 (S201).
  • Next, the DSP 301 executes the firmware (S202), and it is determined whether or not a cache miss has occurred in the instruction cache 302 (S203). In the case where a cache miss has occurred, an instruction in the instruction cache 302 is updated with an instruction in the program stored in the shared memory 304 (S204). To be more specific, the stated update is carried out by copying, into the instruction cache 302, the content of the shared memory 304 corresponding to the address that the DSP 301 accessed.
  • As described earlier, when a cache miss occurs in the conventional configuration, the DSP 301 accesses the memory 105, which is the off-chip main memory. In the present embodiment, by contrast, the instruction cache is updated by accessing the shared memory 304, whose latency with the DSP 301 is lower than that of the memory 105. For this reason, the present embodiment is superior to the conventional technique with respect to processing speed.
  • If a cache miss has not occurred, it is determined whether or not all of the image processes have been executed on the tile data (S205).
  • If in S205 all of the image processes have not been completed, the procedure returns to S202, and the execution of the firmware by the DSP 301 is continued. However, if all of the image processes have been completed, the processed tile data is written back into the memory 105 from the image local memory 303 (S206).
  • Next, it is determined whether or not processing has been completed for all of the image data (S207). If the processing has not been completed, the procedure returns to S201, where the next piece of tile data is read out from the memory 105 and stored in the image local memory 303, whereas if the processing has been completed, the overall procedure ends.
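  • The loop of FIG. 4 can be restated compactly as follows. The step functions, the two-function cache capacity, and the pixel operations are illustrative stand-ins rather than the actual firmware, and the instruction cache is modelled at function granularity purely for brevity.

      def process_all_tiles(tiles, firmware_steps, icache_capacity=2):
          """Per-DSP loop of FIG. 4; every DSP runs this same loop on its own tiles."""
          icache = []                                   # processing functions currently cached
          results = []
          for tile in tiles:                            # S207: repeat until all tiles are done
              local_memory = list(tile)                 # S201: tile -> image local memory
              for name, step in firmware_steps:         # S205: all image processes on this tile
                  if name not in icache:                # S203: instruction cache miss?
                      if len(icache) >= icache_capacity:
                          icache.pop(0)
                      icache.append(name)               # S204: copy the function in from the shared memory
                  local_memory = [step(px) for px in local_memory]   # S202: execute the firmware
              results.append(local_memory)              # S206: write the tile back to the main memory
          return results

      if __name__ == "__main__":
          steps = [("mtf", lambda p: p + 1), ("color", lambda p: p * 2),
                   ("filter", lambda p: p - 1), ("gamma", lambda p: min(p, 255))]
          print(process_all_tiles([[10, 20], [30, 40]], steps))
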
  • FIG. 5 illustrates a conceptual diagram of the tile data/firmware data flow based on the flowchart in FIG. 4.
  • Image data 401 and firmware 402 are stored in the memory 105. The image data 401 is divided into tiles 1, 2, and 3, which are processed by DSPs 3, 2, and 1, respectively. In FIG. 5, the step numbers added to the arrows correspond to the step numbers in the flowcharts in FIG. 3 and FIG. 4.
  • The movement of the content of the shared memory/instruction cache occurring during the processing in the present embodiment shall be described using the flow occurring during the read image processing as an example. FIG. 6 is a conceptual diagram illustrating this.
  • With the parallel processing processor system 203, the tile data is first stored in the image local memory 303, after which various types of image processes performed at the pixel level are carried out on all of the pixels in the tile data. The following is carried out on all of the image data.
  • First, the read image processing firmware is transferred (S103) to the shared memory 304 (501).
  • When the DSP 301 commences execution of the firmware (S202), the read image processing firmware is not stored in the instruction cache 302, and thus a cache miss occurs (S203). At this time, part of the firmware stored in the shared memory 304 (an MTF correction processing function) is copied (S204) to the instruction cache 302 (502).
  • No cache misses occur during the period in which the MTF correction process is being carried out on all the pixels within the tile data. However, when the MTF correction process has been completed for all of the pixels in the tile data, a cache miss occurs, and a color conversion processing function, which is a different part of the firmware stored in the shared memory 304, is copied into the instruction cache 302 (503). This is repeated until the gamma correction processing is completed, and when the gamma correction processing has been completed, the processed tile data is written back into the memory 105.
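  • Under this behaviour, the number of misses per tile is roughly one per processing function rather than one per pixel. The toy count below assumes a 64 x 64-pixel tile and one instruction-stream touch per pixel per function, which is a simplification for illustration only.

      functions = ["mtf_correction", "color_conversion", "filter", "gamma_correction"]
      PIXELS_PER_TILE = 64 * 64      # assumed tile size

      misses = 0
      touches = 0
      resident = None                # which function the instruction cache currently holds
      for fn in functions:           # the order the firmware runs within one tile
          for _ in range(PIXELS_PER_TILE):
              touches += 1
              if resident != fn:     # only the first pixel of each stage misses
                  misses += 1
                  resident = fn      # that function is copied in from the shared memory
      print(f"{misses} instruction-cache misses out of {touches} touches for one tile")
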
  • If the processing has not been carried out on all of the image data, the next tile data is read out and stored in the image local memory 303, and the processing continues.
  • The latency of the shared memory 304 shall now be described.
  • In the case where a DDR-SDRAM is used as the main memory 105, the latency depends on the latency of the data bus 204, the latency of the DDR-SDRAM, and so on, and 20 to 30 clocks are necessary for a one-word read/write. However, if the shared memory 304 is disposed so as to have several clocks' worth of latency from the DSP 301, the latency at the time of a cache miss can be reduced to approximately one fifth.
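  • As a rough check of the "approximately one fifth" figure, using mid-range values from the paragraph above (25 clocks per word off-chip, 5 clocks per word for the shared memory) and an assumed refill size of 8 words:

      WORDS_PER_REFILL = 8       # assumed size of one instruction-cache refill
      DRAM_CLKS_PER_WORD = 25    # 20 to 30 clocks per one-word access (from the text)
      SRAM_CLKS_PER_WORD = 5     # "several clocks' worth of latency"

      dram_penalty = WORDS_PER_REFILL * DRAM_CLKS_PER_WORD   # 200 clocks
      sram_penalty = WORDS_PER_REFILL * SRAM_CLKS_PER_WORD   #  40 clocks
      print(f"miss penalty from the main memory  : {dram_penalty} clocks")
      print(f"miss penalty from the shared memory: {sram_penalty} clocks")
      print(f"ratio: {sram_penalty / dram_penalty:.2f} (about one fifth)")
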
  • Accordingly, through the above processing, the capacity of the instruction cache can be reduced while obtaining the desired degree of performance in a structure where the image data to be processed is processed in parallel by multiple DSPs.
  • Second Embodiment
  • Hereinafter, a second embodiment of the present invention shall be described in detail with reference to the appended drawings.
  • The hardware configuration of an image processing apparatus according to the present embodiment is the same as that shown in FIG. 1. Furthermore, the configuration of the controller 101 is basically the same as that shown in FIG. 2.
  • FIG. 7 is a block diagram illustrating an outline of a parallel processing processor system 203 according to the present embodiment. In FIG. 7, constituent elements that are the same as those shown in FIG. 2 are given the same reference numerals. As illustrated in FIG. 7, the parallel processing processor system according to the present embodiment is configured so as to further include a synchronization controller 306 that controls synchronization between the DSPs. In the present embodiment, an interrupt signal from a DSP is used as a synchronizing signal.
  • Upon receiving a synchronizing signal from a DSP 301, the synchronization controller 306 instructs, for example, the I/O controller 202 to rewrite the firmware in the shared memory 304. In the present embodiment, the size of the firmware for read image processing/recorded image processing is assumed to be no less than 8 KB and less than 16 KB, whereas the capacity of the shared memory 304 is assumed to be 8 KB. Meanwhile, the capacity of the instruction cache 302 is assumed to be 4 KB.
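  • What "rewriting the firmware in the shared memory" amounts to can be sketched as below. The function names, the dictionary of firmware halves, and the 7 KB half sizes are hypothetical; only the 8 KB shared-memory capacity is the figure given above.

      SHARED_CAPACITY = 8 * 1024     # bytes, per the present embodiment

      main_memory_firmware = {       # the two halves of the read image processing firmware
          "mtf+color": bytes(7 * 1024),
          "filter+gamma": bytes(7 * 1024),
      }

      def dma_transfer(shared_memory: bytearray, image: bytes):
          """Stand-in for the I/O controller's DMA: copy one firmware half wholesale."""
          assert len(image) <= SHARED_CAPACITY
          shared_memory[:len(image)] = image

      shared_memory = bytearray(SHARED_CAPACITY)
      dma_transfer(shared_memory, main_memory_firmware["mtf+color"])     # cf. S303 below
      # ... the DSPs run MTF correction and color conversion, then synchronize ...
      dma_transfer(shared_memory, main_memory_firmware["filter+gamma"])  # cf. S416 below
      print(f"{len(shared_memory)} bytes of shared memory, rewritten between phases")
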
  • FIG. 8 is a flowchart illustrating operations performed by the image processing apparatus according to the present embodiment. Here, detailed descriptions shall be given regarding operations of the parallel processing processor system during processing spanning from image data being obtained through the scanner 104 to the read image processing being performed.
  • First, image data is obtained through the scanner 104 (S301), and that image data is then transferred to the memory 105 (S302).
  • Next, of the read image processing firmware, the MTF correction processing function and the color conversion processing function are transferred to the parallel processing processor system 203 (S303), and the read image processing is performed by the parallel processing processor system 203 (S304).
  • Detailed descriptions shall be given later regarding S304. In S303, the image processing firmware is transferred to the shared memory 304 by the I/O controller 202.
  • FIGS. 9 and 10 are flowcharts illustrating operations performed by the parallel processing processor system 203 according to the present embodiment. FIG. 9 is a flowchart illustrating processes performed by the DSP 301, whereas FIG. 10 is a flowchart illustrating processes performed by the synchronization controller 306.
  • First, FIG. 9 shall be described. With the read image processing performed by the DSPs 301, it is assumed that MTF correction processing, color conversion processing, filter processing, and gamma correction processing are performed in that order on each piece of tile data. Because the size of the image processing firmware as a whole exceeds the capacity of the shared memory 304, it cannot be stored therein in its entirety. For this reason, the MTF correction processing function and the color conversion processing function are stored in the shared memory 304 at a certain point in time, and the DSPs 301 execute the MTF correction processing and the color conversion processing. Then, at a different point in time, the filter processing function and the gamma correction processing function are stored in the shared memory 304, and the DSPs 301 execute the filter processing and the gamma correction processing.
  • When the processing commences, the tile data is read out from the memory 105 and stored in the image local memory 303 (S401).
  • Next, the DSP 301 executes the firmware (S402), and it is determined whether or not a cache miss has occurred (S403). In the case where a cache miss has occurred, the instruction cache 302 is updated (S404). At this time, the instruction cache update is realized by copying, into the instruction cache, the content of a region in the shared memory 304 that corresponds to the address accessed by the DSP 301. If a cache miss has not occurred, it is determined whether or not the gamma correction processing has ended (S405). If the gamma correction processing has not ended, it is determined whether or not the color conversion processing has ended (S406).
  • If in S405 the gamma correction processing has ended, the series of read image processes has ended for the current tile data, and therefore the processed tile data is written back into the memory 105 from the image local memory 303 (S407). If in S406 the color conversion processing has not ended, the procedure returns to S402, where the DSP 301 continues with the execution of the current firmware, whereas if the color conversion processing has ended, it is determined whether or not an interrupt is already being outputted (S408).
  • If in S408 an interrupt is being outputted, the procedure returns to S402, where the DSP 301 continues with the execution of the firmware, whereas if an interrupt is not being outputted, an interrupt is outputted to the synchronization controller 306 (S409). After this, the control shifts to the synchronization controller 306 (A), and the DSP enters a standby state.
  • In S407, after the tile data is written back into the memory 105 from the image local memory 303, it is determined whether or not the processing has been completed for all of the image data (S410). If in S410 the processing has been completed for all of the image data, the overall process is complete, whereas if the processing has not been completed, an interrupt is outputted to the synchronization controller 306 (S411), the control shifts to the synchronization controller 306 (B), and the DSP enters a standby state.
  • Next, FIG. 10 shall be described.
  • When the processing commences, the synchronization controller 306 enters an interrupt standby state (S412). Having been notified of an interrupt through the aforementioned A or B, the synchronization controller 306 releases the interrupt of the DSP 301 (S413).
  • Next, it is determined whether or not interrupts have been received from all the DSPs (S414); if interrupts have not been received from all the DSPs, the synchronization controller 306 returns to the interrupt standby state, whereas if interrupts have been received from all the DSPs, it is determined whether the interrupt cause is A or B (S415). Once interrupts have been received from all the DSPs, the processing by the firmware currently stored in the shared memory 304 is completed for all the DSPs, and thus the processing has been synchronized. Accordingly, in order to carry out the next processing, new firmware is transferred to the shared memory 304, and the content thereof is rewritten.
  • If in S415 the interrupt cause is A, the series of processing for a single piece of tile data has progressed as far as the completion of the color conversion processing, and thus it is necessary to proceed to the filter processing and gamma correction processing for that piece of tile data. Accordingly, the synchronization controller 306 requests the I/O controller 202 to transfer the filter processing and gamma correction processing firmware to the shared memory 304. In response to this request, the I/O controller 202 transfers the filter processing and gamma correction processing firmware to the shared memory 304 (S416), starts the DSP 301 (S417), and transfers control to the DSP 301 (C).
  • On the other hand, if in S415 the interrupt cause is B, the series of processing for a single piece of tile data has been completed in its entirety, and thus it is necessary to perform the MTF correction processing and color conversion processing on a new piece of tile data. Accordingly, the MTF correction processing and color conversion processing firmware is transferred to the shared memory 304 (S418), the DSP 301 is started (S419), and control is transferred to the DSP 301 (D).
  • The DSP 301 started based on C resumes processing from the execution of the newly-transferred filter processing and gamma correction processing firmware (S402) on the same piece of tile data. Meanwhile, the DSP 301 started based on D resumes processing from the process for reading out a new piece of tile data from the memory 105 and storing that piece of tile data in the image local memory 303 (S401). The newly-transferred MTF correction processing and color conversion processing firmware is then executed on that new piece of tile data (S402).
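  • Likewise, the synchronization controller 306 side of FIG. 10 (S412 through S419) can be sketched as follows. NUM_DSPS, the helper names, and the firmware identifiers are assumptions made only for illustration and are not part of the embodiment.

enum irq_cause { CAUSE_A, CAUSE_B };
enum firmware  { FW_MTF_AND_COLOR, FW_FILTER_AND_GAMMA };

#define NUM_DSPS 4                                        /* assumed number of DSPs 301 */

extern enum irq_cause wait_for_interrupt(int *dsp_id);    /* S412 */
extern void release_interrupt(int dsp_id);                /* S413 */
extern void request_firmware_transfer(enum firmware fw);  /* S416 / S418: via the I/O controller 202 */
extern void start_dsp(int dsp_id);                        /* S417 / S419 */

void sync_controller_main(void)
{
    for (;;) {
        enum irq_cause cause = CAUSE_A;
        for (int received = 0; received < NUM_DSPS; received++) {   /* S414 */
            int dsp_id;
            cause = wait_for_interrupt(&dsp_id);          /* S412 */
            release_interrupt(dsp_id);                    /* S413 */
        }
        /* All DSPs have finished the firmware currently in the shared memory 304,
         * so its content can now be rewritten for the next processing phase. */
        if (cause == CAUSE_A)                             /* S415 */
            request_firmware_transfer(FW_FILTER_AND_GAMMA);   /* S416 */
        else
            request_firmware_transfer(FW_MTF_AND_COLOR);      /* S418 */
        for (int i = 0; i < NUM_DSPS; i++)
            start_dsp(i);                                 /* S417 / S419 -> C or D */
    }
}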
  • The movement of the content of the shared memory/instruction cache occurring during the processing in the present embodiment shall be described next. FIG. 11 is a conceptual diagram illustrating this.
  • With the parallel processing processor system 203, the tile data is first read out into the image local memory 303, after which the various pixel-level image processes are carried out on all of the pixels in the tile data. The following sequence is carried out on all of the image data.
  • First, the MTF correction processing function and color conversion processing function of the read image processing firmware are transferred (S303) to the shared memory 304 (601).
  • When the DSP 301 commences the execution of the firmware (S402), the firmware for the MTF correction processing function and the color conversion processing function is not stored in the instruction cache 302, and thus a cache miss occurs (S403). At this time, part of the firmware stored in the shared memory 304 (the MTF correction processing function) is copied (S404) to the instruction cache 302 (602).
  • No cache misses occur during the period in which the MTF correction process is being carried out on all the pixels within the tile data. When the MTF correction processing is completed for all the pixels within the tile data, a cache miss occurs, and the color conversion processing function is copied from the shared memory 304 to the instruction cache 302 (603).
  • When the color conversion processing ends, the filter processing and gamma correction processing firmware is transferred by the synchronization controller 306 to the shared memory 304 (S416), and as a result, the shared memory/cache are as indicated by 604.
  • When the DSP 301 resumes execution of the firmware (S402), the filter processing and gamma correction processing firmware is not stored in the cache, and thus a cache miss occurs (S403). At this time, part of the firmware stored in the shared memory 304 (the filter processing function) is copied (S404) to the instruction cache 302 (605).
  • This is repeated until the gamma correction processing is completed, and when the gamma correction processing has been completed, the processed tile data is written back into the memory 105.
  • If the processing has not been carried out on all of the image data, the next tile data is read out and stored in the image local memory 303, and the processing continues.
  • As described thus far, in a configuration in which the image data to be processed is divided into predetermined units and processed in parallel by multiple DSPs, providing a synchronization controller makes it possible to achieve the desired degree of performance while reducing the capacity of the instruction cache, even in the case where the entirety of the firmware is not stored in the shared memory.
  • The present invention is not limited to the aforementioned embodiments. For example, the data to be processed is not limited to image data, and the present invention can also be applied to audio data or the like.
  • As another embodiment, when there is a gap between the processing capabilities of the DSPs, the processing can be accelerated by preferentially assigning tile data to DSPs that have a higher processing speed.
  • As yet another embodiment, in the case where synchronization control is performed among the DSPs and there is also a gap between the processing capabilities of the DSPs, the processing time can be made uniform by changing the size of the tile data in accordance with the processing speeds of the DSPs.
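  • A minimal sketch of this tile-size balancing, assuming the relative processing speeds are known as integer weights; the function name and the rounding policy are illustrative assumptions, not part of the embodiment.

#include <stddef.h>

/* Split total_pixels across n DSPs in proportion to their processing speeds,
 * so that the per-tile processing times become roughly uniform.  Pixels lost
 * to integer rounding are given to the last DSP. */
static void split_tiles(size_t total_pixels, const unsigned speed[],
                        size_t tile_pixels[], size_t n)
{
    if (n == 0)
        return;

    unsigned long long speed_sum = 0;
    for (size_t i = 0; i < n; i++)
        speed_sum += speed[i];
    if (speed_sum == 0)
        return;

    size_t assigned = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        tile_pixels[i] = (size_t)((unsigned long long)total_pixels * speed[i] / speed_sum);
        assigned += tile_pixels[i];
    }
    tile_pixels[n - 1] = total_pixels - assigned;
}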
  • Finally, as yet another embodiment, a configuration in which an image local memory is not provided and the DSPs execute processing on images stored in a memory is also possible.
  • Other Embodiments
  • Aspects of the present invention can also be realized by a computer of a system or apparatus (or devices such as a CPU or MPU) that reads out and executes a program recorded on a memory device to perform the functions of the above-described embodiments, and by a method, the steps of which are performed by a computer of a system or apparatus by, for example, reading out and executing a program recorded on a memory device to perform the functions of the above-described embodiments. For this purpose, the program is provided to the computer for example via a network or from a recording medium of various types serving as the memory device (e.g., computer-readable medium).
  • While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
  • This application claims the benefit of Japanese Patent Application No. 2009-051287, filed Mar. 4, 2009, which is hereby incorporated by reference herein in its entirety.

Claims (12)

1. A parallel processing processor system that includes multiple processors and performs parallel processing on data read out from a main memory using the multiple processors, the system comprising:
multiple processor elements, each processor element including a processor and a cache that holds an instruction corresponding to at least part of a program executed by the processor;
a shared memory, whose latency with the processors is less than the latency between the main memory and the processors, that stores the program transferred from the main memory and is shared by the multiple processor elements;
an update unit that updates the instruction in the cache with an instruction in the program stored in the shared memory in the case where a cache miss has occurred in the cache;
a transfer control unit that controls transfers between the main memory and the shared memory; and
a synchronization control unit that requests the transfer control unit to rewrite the program in the shared memory in response to a synchronizing signal.
2. The parallel processing processor system according to claim 1, wherein the capacity of the cache is smaller than the capacity of the shared memory.
3. The parallel processing processor system according to claim 1, wherein in the case where the cache miss has occurred in the cache, the update unit performs the update by copying, into the cache, the content of the shared memory corresponding to an address that the processor accessed.
4. (canceled)
5. The parallel processing processor system according to claim 1, wherein each of the multiple processors outputs the synchronizing signal upon completing the execution of the program currently stored in the shared memory, and the synchronization control unit requests the program in the shared memory to be rewritten in response to the synchronizing signal being outputted by all of the multiple processors.
6. The parallel processing processor system according to claim 1, wherein each of the multiple processor elements further includes a local memory that stores data to be processed read out from the main memory.
7. The parallel processing processor system according to claim 1, wherein the shared memory operates at a higher frequency than the main memory.
8. The parallel processing processor system according to claim 1, wherein the transfer control unit transfers the program from the main memory to the shared memory by Direct Memory Access (DMA).
9. The parallel processing processor system according to claim 1, wherein at least one of the processors executes at least one of shading correction, MTF correction, color conversion processing, filter processing, and gamma processing on the data.
10. The parallel processing processor system according to claim 1, wherein at least one of the processors executes at least one of binarization processing, halftone processing, color conversion processing, resolution conversion, smoothing, and darkness correction on the data.
11. A method for a parallel processing processor system that includes multiple processors and performs parallel processing on data read out from a main memory using the multiple processors, wherein the system comprises multiple processor elements, each processor element including a processor and a cache that holds an instruction corresponding to at least part of a program executed by the processor, and a shared memory, whose latency with the processors is less than the latency between the main memory and the processors, that stores the program transferred from the main memory and is shared by the multiple processor elements,
wherein the method comprises:
updating the instruction in the cache with an instruction in the program stored in the shared memory in the case where a cache miss has occurred in the cache;
controlling transfers between the main memory and the shared memory; and
requesting to rewrite the program in the shared memory in response to a synchronizing signal.
12. A non-transitory computer-readable storage medium storing a program for causing a computer to execute a method for a parallel processing processor system that includes multiple processors and performs parallel processing on data read out from a main memory using the multiple processors, wherein the system comprises multiple processor elements, each processor element including a processor and a cache that holds an instruction corresponding to at least part of a program executed by the processor, and a shared memory, whose latency with the processors is less than the latency between the main memory and the processors, that stores the program transferred from the main memory and is shared by the multiple processor elements,
wherein the method comprises:
updating the instruction in the cache with an instruction in the program stored in the shared memory in the case where a cache miss has occurred in the cache;
controlling transfers between the main memory and the shared memory; and
requesting to rewrite the program in the shared memory in response to a synchronizing signal.
US13/784,738 2009-03-04 2013-03-04 Parallel processing processor system Abandoned US20130179644A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/784,738 US20130179644A1 (en) 2009-03-04 2013-03-04 Parallel processing processor system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2009-051287 2009-03-04
JP2009051287A JP5411530B2 (en) 2009-03-04 2009-03-04 Parallel processor system
US12/712,128 US8397033B2 (en) 2009-03-04 2010-02-24 Parallel processing processor system
US13/784,738 US20130179644A1 (en) 2009-03-04 2013-03-04 Parallel processing processor system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/712,128 Continuation US8397033B2 (en) 2009-03-04 2010-02-24 Parallel processing processor system

Publications (1)

Publication Number Publication Date
US20130179644A1 true US20130179644A1 (en) 2013-07-11

Family

ID=42679240

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/712,128 Expired - Fee Related US8397033B2 (en) 2009-03-04 2010-02-24 Parallel processing processor system
US13/784,738 Abandoned US20130179644A1 (en) 2009-03-04 2013-03-04 Parallel processing processor system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/712,128 Expired - Fee Related US8397033B2 (en) 2009-03-04 2010-02-24 Parallel processing processor system

Country Status (2)

Country Link
US (2) US8397033B2 (en)
JP (1) JP5411530B2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8427660B2 (en) * 2010-03-23 2013-04-23 Fuji Xerox Co., Ltd. Image processing apparatus, image forming apparatus, and computer readable medium storing program
KR101754998B1 (en) 2011-01-27 2017-07-06 삼성전자주식회사 Multi Core System and Method for Processing Data
JP5834672B2 (en) * 2011-09-16 2015-12-24 株式会社リコー Image processing apparatus, image processing method, image forming apparatus, program, and recording medium
JP6099418B2 (en) 2013-02-01 2017-03-22 ルネサスエレクトロニクス株式会社 Semiconductor device and data processing method thereof
JP2017220802A (en) * 2016-06-07 2017-12-14 住友電気工業株式会社 Optical transceiver
CN109766139B (en) * 2018-12-13 2023-02-14 平安普惠企业管理有限公司 Configuration method and device of configuration file

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3820645B2 (en) * 1996-09-20 2006-09-13 株式会社日立製作所 Multiprocessor system
JPH11203200A (en) * 1998-01-12 1999-07-30 Sony Corp Parallel processor and memory control method
JP2001051898A (en) * 1999-08-05 2001-02-23 Hitachi Ltd Method for referring data in hierarchical cache memory and data processor including hierarchical cache memory
JP4457577B2 (en) * 2003-05-19 2010-04-28 日本電気株式会社 Multiprocessor system
US7136967B2 (en) * 2003-12-09 2006-11-14 International Business Machinces Corporation Multi-level cache having overlapping congruence groups of associativity sets in different cache levels
JP2006133839A (en) * 2004-11-02 2006-05-25 Seiko Epson Corp Image processing device, print device and image processing method
US7809926B2 (en) * 2006-11-03 2010-10-05 Cornell Research Foundation, Inc. Systems and methods for reconfiguring on-chip multiprocessors
US7702888B2 (en) * 2007-02-28 2010-04-20 Globalfoundries Inc. Branch predictor directed prefetch

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664214A (en) * 1994-04-15 1997-09-02 David Sarnoff Research Center, Inc. Parallel processing computer containing a multiple instruction stream processing architecture
US20030088182A1 (en) * 2001-09-28 2003-05-08 Teratech Corporation Ultrasound imaging system
US20040249880A1 (en) * 2001-12-14 2004-12-09 Martin Vorbach Reconfigurable system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CPU Cache, Wikipedia, February 2009 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150007078A1 (en) * 2013-06-28 2015-01-01 Sap Ag Data Displays in a Tile-Based User Interface
US10775971B2 (en) 2013-06-28 2020-09-15 Successfactors, Inc. Pinch gestures in a tile-based user interface
US20150339062A1 (en) * 2014-05-20 2015-11-26 Fujitsu Limited Arithmetic processing device, information processing device, and control method of arithmetic processing device
US9766820B2 (en) * 2014-05-20 2017-09-19 Fujitsu Limited Arithmetic processing device, information processing device, and control method of arithmetic processing device
US9967417B2 (en) 2015-01-21 2018-05-08 Canon Kabushiki Kaisha Managing apparatus power states
US10045154B2 (en) 2015-02-13 2018-08-07 Qualcomm Incorporated Proximity based device usage
US11388312B2 (en) * 2020-07-28 2022-07-12 Seiko Epson Corporation Image processing apparatus and image processing method

Also Published As

Publication number Publication date
US20100228920A1 (en) 2010-09-09
JP2010205083A (en) 2010-09-16
JP5411530B2 (en) 2014-02-12
US8397033B2 (en) 2013-03-12

Similar Documents

Publication Publication Date Title
US8397033B2 (en) Parallel processing processor system
US9507584B2 (en) Electronic device including a memory technology device
US9292777B2 (en) Information processing apparatus, information processing method, and storage medium
US20120079141A1 (en) Information processing apparatus and inter-processor communication control method
US20130247049A1 (en) Control apparatus and method of starting control apparatus
US8526039B2 (en) Image processing apparatus, and control method thereof and program
US20180060081A1 (en) Information processing apparatus with semiconductor integrated circuits, control method therefor, and storage medium
US20160154603A1 (en) Data transfer control device, apparatus including the same, and data transfer control method
JP2008067299A (en) Image forming apparatus
US20150032885A1 (en) Image processing apparatus and control method
JP2015084507A (en) Image processing apparatus, integrated circuit, and image forming apparatus
JP7081477B2 (en) Image processing device, control method of image processing device, and program
JP4034323B2 (en) Image data processing method, image data processing apparatus, and image forming apparatus
JP2015103121A (en) Electronic equipment
JP2011068012A (en) Information processor, method and program for controlling the same
JP2018118477A (en) Image processing device, control method and program of the same
JP7194009B2 (en) Data processing apparatus and method
JP4516336B2 (en) Image processing apparatus, image forming apparatus, image processing method, computer program, and recording medium
JP2008027353A (en) Dma control method and dma controller
JP6091481B2 (en) Image processing apparatus and screen display method
JP2023170564A (en) Information processing apparatus, image formation apparatus, information processing method, and information processing program
JP2014154000A (en) Memory control device, and control method and control program thereof
JP2005242917A (en) Image forming device
JP2011239042A (en) Image processing apparatus, image processing method, and image processing program
JP2014106868A (en) Semiconductor device, image forming apparatus, and access control method

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION