WO2012004990A1 - Processor - Google Patents
- Publication number
- WO2012004990A1 (international application PCT/JP2011/003861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- memory area
- thread
- completed
- read
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/3009—Thread control instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30047—Prefetch instructions; cache control instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/3834—Maintaining memory consistency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4812—Task transfer initiation or dispatching by interrupt, e.g. masked
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
Definitions
- The present invention relates to a technique for improving computational efficiency in a processor that can execute a plurality of threads simultaneously, by efficiently exchanging data shared among those threads.
- Performance can be improved by using a high-performance multi-thread processor (see, for example, Non-Patent Document 1), which greatly increases computational efficiency by executing a plurality of programs simultaneously.
- When a multi-thread processor executes multiple threads (programs) simultaneously, one thread may depend on another. For example, after one thread executes a given write instruction, another thread executes a read instruction that reads the data in the portion written by that write instruction.
- Patent Document 1 discloses a technique for handling this situation.
- In Patent Document 1, the dependency between the two threads is realized by managing the address of a memory area. Specifically, once one thread executes a write instruction to a managed address, another thread is allowed to access, that is, to read, the area indicated by that address.
- This handles a dependency on a single write instruction. It cannot handle the case in which a read instruction must be executed only after a plurality of write instructions have been executed for the memory area indicated by a single address. Because the managed object is the address, the other thread's read instruction to the memory area at that address is enabled as soon as the first write instruction is executed. A dependency requiring the read instruction to follow a plurality of write instructions therefore cannot be maintained.
- An object of the present invention is therefore to provide a processor and a method capable of maintaining a dependency between a plurality of instructions and a single read instruction.
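As an illustration only, the following Python sketch models an address-managed gate of the kind described for Patent Document 1, simplified to one rule: the first write to a managed address makes that address readable. The class `AddressGate`, the function `first_read_opportunity`, and the address `0x100` are hypothetical. The sketch is not the mechanism actually disclosed in Patent Document 1.

```python
class AddressGate:
    """Hypothetical, simplified model of address-based read gating."""

    def __init__(self, managed_addresses):
        # Every managed address starts out closed to readers.
        self.opened = {addr: False for addr in managed_addresses}

    def on_write(self, addr):
        # Any write to a managed address opens it, so the FIRST write suffices.
        if addr in self.opened:
            self.opened[addr] = True

    def may_read(self, addr):
        # Unmanaged addresses are never blocked in this model.
        return self.opened.get(addr, True)


def first_read_opportunity(total_writes, addr=0x100):
    """Return how many of the writer's writes have completed when a reader
    of `addr` is first allowed in, or None if it never is."""
    gate = AddressGate({addr})
    for done in range(1, total_writes + 1):
        gate.on_write(addr)
        if gate.may_read(addr):
            return done
    return None


print(first_read_opportunity(3))  # 1: the reader is released after 1 of 3 writes
```

The reader is released after the first of three writes, which is exactly the lost dependency described above.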
- The present invention is a processor that executes a plurality of threads. It is concerned with the point at which one thread, which writes to a memory area used in common with other threads, completes its writing to that memory area.
- Use information indicating whether writing to the memory area is completed is set to show that the one thread has completed writing to the memory area.
- With this configuration, the writer is the one thread that writes to the memory area shared with other threads. In that thread, the processor executes an instruction located at a position that guarantees the writing to the memory area has been completed. Only then is the data in that memory area read by another thread. In other words, the processor lets another thread read the data in the memory area only after guaranteeing that the instructions preceding the instruction at that position have been executed. This allows the processor to maintain a dependency such as executing a read instruction only after a write instruction has been executed a plurality of times.
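The following Python sketch contrasts this with address gating by keying on an instruction position instead. It assumes the position is identified by a program counter value, as the validated PC of the first embodiment is, placed at or after the writer's last store. `PcGate`, `first_read_opportunity`, and the PC values are hypothetical.

```python
class PcGate:
    """Hypothetical model: a shared memory area becomes readable only when the
    writer thread executes the instruction at a designated PC."""

    def __init__(self, validated_pc):
        self.validated_pc = validated_pc
        self.valid = False  # use information: has writing been completed?

    def on_execute(self, pc):
        # Executing the designated instruction implies that every earlier
        # instruction of the writer, including every store, has executed.
        if pc == self.validated_pc:
            self.valid = True

    def may_read(self):
        return self.valid


def first_read_opportunity(writer_program, validated_pc):
    """Run the writer's (pc, op) list in order and return how many stores
    have completed when a reader is first allowed in, or None."""
    gate = PcGate(validated_pc)
    stores_done = 0
    for pc, op in writer_program:
        if op == "store":
            stores_done += 1
        gate.on_execute(pc)
        if gate.may_read():
            return stores_done
    return None


writer = [(0x40, "store"), (0x44, "store"), (0x48, "store"), (0x4C, "add")]
print(first_read_opportunity(writer, validated_pc=0x48))  # 3
```

Because the gate keys on an instruction position rather than on an address, it covers any number of stores placed before that position.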
- FIG. 1 shows a configuration of a multi-thread processor 100.
- FIG. 2A is a diagram showing an example of the data structure of the access management table T100.
- FIG. 2B is a diagram showing an example of the data structure of the Read access management table T150.
- FIG. 6 is a diagram showing the configuration of a read detection unit 116 and an instruction detection unit 117.
- FIG. 18 is a flowchart showing the operation performed by the instruction detection unit 117. A further flowchart shows the operation of another unit.
- FIG. 7 is a diagram showing the configuration of the multi-thread processor 1100.
- FIG. 2 is a diagram showing the configuration of a multi-thread processor 2100. Another figure shows an example of the configuration of the address conversion unit 2130 and the data structure of the conversion table T300.
- FIG. 16 is a diagram showing the hardware configuration of a multi-core processor system 3000. Another figure shows an outline of a case in which the multi-thread processor 100 is applied to image decoding processing.
- FIG. 1 is a block diagram showing the configuration of the multi-thread processor 100 according to the first embodiment.
- The multi-thread processor 100 simultaneously and independently executes N instruction streams (N threads), where N is an integer of 2 or more. It includes the following components:
  - an instruction memory 101, an instruction fetch control unit 102, and an instruction group determination unit 103;
  - N instruction buffers (a first instruction buffer 104, a second instruction buffer 105, ..., an N-th instruction buffer 106);
  - an issued instruction determination unit 107 and a priority determination unit 108;
  - N register files (a first register file 109, a second register file 110, ..., an N-th register file 111);
  - an arithmetic unit group 112, a write-back bus 113, an update control unit 114, and a data memory 115;
  - a read detection unit 116, an instruction detection unit 117, and a management table storage unit 118.
- Each instruction buffer is paired one-to-one with a register file, and the pairs constitute N logical processors.
- The instruction memory 101 holds the instructions to be executed in the multi-thread processor, namely N independently executed instruction streams (threads).
- The instruction fetch control unit 102 holds a program counter (PC) for each thread and reads the next instruction to be executed from the instruction memory.
- The program counters of different threads take values in mutually different ranges.
- The read detection unit 116 sends the instruction fetch control unit 102 a Read access signal indicating whether instruction fetch can continue. Depending on the signal's value, the instruction fetch control unit 102 either continues fetching or starts branch processing to the special processing vector corresponding to an exception. When the received signal is 1, it halts the instruction execution sequence up to that point and starts branch processing to that vector. When the received signal is 0, the current instruction execution sequence continues.
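A minimal Python model of this decision for a single thread follows. It assumes fixed 4-byte instructions and a hypothetical exception vector address `0x100`. It also assumes the PC at the point of the exception is saved so that a handler can retry the read later. None of these details are given in the text.

```python
from dataclasses import dataclass
from typing import Optional

EXCEPTION_VECTOR = 0x100  # hypothetical address of the special processing vector
INSTR_SIZE = 4            # assumed fixed instruction length in bytes


@dataclass
class FetchControl:
    """Per-thread fetch state (hypothetical model of unit 102)."""
    pc: int
    saved_pc: Optional[int] = None

    def on_read_access_signal(self, signal: int) -> int:
        """Return the PC to fetch next, given the Read access signal."""
        if signal == 1:
            # Halt the current sequence and branch to the exception vector,
            # remembering where execution stopped (assumption).
            self.saved_pc = self.pc
            self.pc = EXCEPTION_VECTOR
        elif signal == 0:
            # Continue the current instruction stream.
            self.pc += INSTR_SIZE
        else:
            raise ValueError("Read access signal must be 0 or 1")
        return self.pc


fc = FetchControl(pc=0x40)
print(hex(fc.on_read_access_signal(0)))  # 0x44
print(hex(fc.on_read_access_signal(1)))  # 0x100
```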
- The instruction group determination unit 103 reads the instructions belonging to each instruction stream from the instruction memory 101, decodes them, and writes each instruction into the instruction buffer to which it is assigned.
- The i-th instruction buffer (where i is an integer from 1 to N) receives and holds the instructions belonging to the i-th instruction stream.
- The issued instruction determination unit 107 determines, in each machine cycle, the instruction to be issued from the N instruction buffers.
- The priority determination unit 108 holds a priority information table that the issued instruction determination unit 107 uses when determining the instruction to be issued.
- The issued instruction determination unit 107 uses this priority information table to determine the instruction to be issued in each machine cycle.
- The i-th register file (where i is an integer from 1 to N) is a group of registers holding the data read and written when the instruction stream held in the i-th instruction buffer is executed.
- The arithmetic unit group 112 also includes a memory access 120, as shown in FIG.
- The memory access 120 is the arithmetic unit that executes instructions accessing the data memory.
- The write-back bus 113 writes the outputs of the arithmetic unit group 112 back to the first register file 109 through the N-th register file 111.
- The data memory 115 is accessed by instructions that access the data memory, and holds data used during program execution.
- As shown in FIGS. 2A and 2B, the management table storage unit 118 stores an access management table T100 and a Read access management table T150.
- As shown in FIG. 2A, the access management table T100 has an area for storing a plurality of entries. Each entry consists of entry_valid 200, dep_id 201, and validated PC 202. The validated PC 202 holds the program counter value of an instruction whose execution guarantees that writing to the corresponding memory area has been completed.
- The entry_valid 200 indicates whether the entry holds valid information. For example, a value of "0" indicates that the entry is invalid, and a value of "1" indicates that it is valid.
- The dep_id 201 associates the entry with an entry in the Read access management table T150.
- The Read access management table T150 has an area for storing a plurality of entries. Each entry consists of entry_valid 210, dep_id 211, Address 212, valid 213, th_id 214, and th_stride 215.
- Like entry_valid 200, entry_valid 210 indicates whether the entry holds valid information. For example, "0" means the entry is invalid and "1" means it is valid.
- The dep_id 211 associates the entry with an entry in the access management table T100. Entries whose dep_id 211 and dep_id 201 values are equal are assumed to correspond to each other.
- Address 212 indicates the start address of the memory area managed by the entry.
- The valid 213 indicates whether some thread has completed writing to the memory area managed by the entry.
- The th_id 214 indicates the number of the thread that completed writing to the memory area managed by the entry.
- The th_stride 215 indicates the distance, in thread numbers, between the thread that completed writing to the memory area managed by the entry and the thread that reads it.
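The two tables can be pictured as lists of records. This Python sketch uses dataclasses whose field names follow FIGS. 2A and 2B. The integer types, the default values, the sample configuration, and the helper `expected_reader` are assumptions for illustration, not a hardware layout.

```python
from dataclasses import dataclass


@dataclass
class AccessEntry:
    """One entry of the access management table T100."""
    entry_valid: int   # 1: entry valid, 0: entry invalid
    dep_id: int        # matches dep_id of the corresponding T150 entry
    validated_pc: int  # PC whose execution guarantees the writes are complete


@dataclass
class ReadAccessEntry:
    """One entry of the Read access management table T150."""
    entry_valid: int    # 1: entry valid, 0: entry invalid
    dep_id: int         # matches dep_id of the corresponding T100 entry
    address: int        # start address of the managed memory area
    valid: int = 0      # 1: writing to the area has been completed
    th_id: int = 0      # number of the thread that completed the writing
    th_stride: int = 1  # distance from the writing thread to the reading thread


def expected_reader(entry: ReadAccessEntry) -> int:
    """Thread number expected to read the area: th_id + th_stride."""
    return entry.th_id + entry.th_stride


# Hypothetical configuration: thread 2 writes area 0x1000, thread 3 reads it.
t100 = [AccessEntry(entry_valid=1, dep_id=7, validated_pc=0x48)]
t150 = [ReadAccessEntry(entry_valid=1, dep_id=7, address=0x1000, th_id=2, th_stride=1)]
print(expected_reader(t150[0]))  # 3
```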
- The update control unit 114 updates the access management table T100 and the Read access management table T150.
- The update control unit 114 records in the access management table T100 the plurality of entries to be managed for the software.
- As with the access management table T100, the update control unit 114 updates the fields of the Read access management table T150 when it receives a software update instruction from software. The fields of both tables can be read and written by software.
- Software running in N threads may issue a software update instruction before it starts processing. On receiving such an instruction, the update control unit 114 records in the Read access management table T150 the plurality of entries to be managed for that software.
- The update control unit 114 also updates the fields of the Read access management table T150 when it receives instruction detection information from the instruction detection unit 117.
- The instruction detection information consists of three items:
  - a signal indicating that an instruction managed by the access management table T100, which triggers an update, has been detected (hereinafter, the instruction detection signal);
  - the dep_id included in the entry to be updated (hereinafter, the instruction dep_id);
  - the instruction th_id.
- The instruction th_id is the number of the thread to which an executed instruction belongs. The arithmetic unit group 112 outputs it when the instruction is executed.
- When the update control unit 114 receives instruction detection information from the instruction detection unit 117, it locates the entry whose dep_id matches the instruction dep_id in that information. It then sets that entry's valid to 1 and changes its th_id to the instruction th_id in the received information.
- When a thread executes a read instruction for a memory area whose valid is "1", the update control unit 114 changes valid from "1" to "0".
- When an instruction is executed, the instruction detection unit 117 checks the instruction's program counter value. It detects whether that program counter is managed by the access management table T100 held in the management table storage unit 118, that is, whether writing to a certain memory area has been completed.
- The instruction detection unit 117 includes a table read control unit 300, a dep_id selection unit 301, and a PC comparison unit 302.
- When an instruction is executed, the arithmetic unit group 112 sends an instruction execution signal to the table read control unit 300, which then starts reading the entries of the access management table T100.
- The PC comparison unit 302 determines whether the validated PC in an entry of the access management table T100 matches the PC output from the arithmetic unit group 112 for the executed instruction. It outputs the result to the dep_id selection unit 301.
- The dep_id selection unit 301 acquires the entry_valid and dep_id values read by the table read control unit 300.
- The dep_id selection unit 301 also acquires the comparison result from the PC comparison unit 302.
- When an instruction is executed, the dep_id selection unit 301 also receives an instruction execution signal, the PC, and the instruction th_id from the arithmetic unit group 112.
- The dep_id selection unit 301 then outputs instruction detection information to the update control unit 114. This information consists of the corresponding instruction th_id, the acquired dep_id, and an instruction detection signal.
- When an instruction that accesses the data memory is executed, the read detection unit 116 checks the target address. It detects whether the accessed memory area is managed by the Read access management table T150 held in the management table storage unit 118.
- The read detection unit 116 includes a table read control unit 400, a valid selection unit 401, a Read address comparison unit 402, a th_id comparison unit 403, and an adder 404.
- When the memory access 120 executes a read instruction that reads the data memory 115, it sends a read execution signal to the table read control unit 400, which then starts reading the entries of the Read access management table T150.
- The Read address comparison unit 402 compares the Address value in an entry of the Read access management table T150 with the Read address output from the memory access 120. It outputs the comparison result to the valid selection unit 401.
- The Read address is the data memory address targeted by the read access. The memory access 120 outputs it when it executes an instruction that reads the data memory 115.
- The adder 404 calculates the sum of the th_id and th_stride values read from the table and outputs the result to the th_id comparison unit 403.
- The th_id comparison unit 403 compares the Read th_id output from the memory access 120 with the value received from the adder 404 (the sum of th_id and th_stride). It outputs the comparison result to the valid selection unit 401.
- The Read th_id is the number of the thread to which the read instruction belongs. The memory access 120 outputs it when it executes an instruction that reads the data memory 115.
- The valid selection unit 401 receives a read execution signal from the memory access 120.
- The valid selection unit 401 acquires the entry_valid, dep_id, and valid values read by the table read control unit 400.
- The valid selection unit 401 also obtains the comparison results from the Read address comparison unit 402 and the th_id comparison unit 403.
- Based on these inputs, the valid selection unit 401 outputs a Read access signal determined by the valid value of the corresponding entry.
- When an instruction is executed, the table read control unit 300 initializes a counter n to 0 (step S5). This counter controls the reading of entries from the access management table T100 and is held in the table read control unit 300.
- The PC comparison unit 302 acquires the validated PC value contained in the n-th entry of the access management table T100 (step S10). It then determines whether this value equals the PC value received from the arithmetic unit group 112 (step S15).
- If the two values are equal ("Yes" in step S15), the dep_id selection unit 301 acquires the entry_valid value included in the n-th entry (step S20).
- A "Yes" result in step S15 means that the instruction indicated by a PC managed in the access management table T100 has been executed. That is, it is detected that writing to a certain memory area has been completed.
- The dep_id selection unit 301 determines whether the acquired entry_valid value is "1" (step S25).
- If it is "1" ("Yes" in step S25), the dep_id selection unit 301 acquires the dep_id value included in the n-th entry of the access management table T100 (step S30).
- The dep_id selection unit 301 outputs three items to the update control unit 114 (step S35): the acquired dep_id value (instruction dep_id), the th_id of the thread containing the executed instruction (instruction th_id), and an instruction detection signal.
- If the value is not "1" ("No" in step S25), the table read control unit 300 increments the counter n (step S40). It then acquires the end number of the entries registered in the access management table T100 from the entry end register (step S45) and determines whether the counter n equals that end number (step S50). If they are equal ("Yes" in step S50), the process ends. If not ("No" in step S50), the process returns to step S10.
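Steps S5 to S50 amount to a linear scan of T100. The Python sketch below is a software model, not the hardware, and the function name `detect_instruction` is hypothetical. It assumes `entry_end` is the end number obtained in step S45. The text does not describe the "No" branch of step S15, so the sketch assumes a PC mismatch also advances to the next entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AccessEntry:
    entry_valid: int
    dep_id: int
    validated_pc: int


def detect_instruction(t100: List[AccessEntry], entry_end: int,
                       pc: int, th_id: int) -> Optional[Tuple[int, int]]:
    """Return (instruction dep_id, instruction th_id) if the executed
    instruction at `pc` is the validated PC of a valid entry, else None."""
    n = 0                                      # S5: initialize counter n
    while n != entry_end:                      # S45-S50: stop at end number
        entry = t100[n]                        # S10: read the n-th entry
        if entry.validated_pc == pc and entry.entry_valid == 1:  # S15, S20-S25
            return entry.dep_id, th_id         # S30-S35: output detection info
        n += 1                                 # S40: next entry
    return None
```

The sketch checks the end condition before reading each entry, whereas the flowchart checks it after incrementing in S40-S50. The two give the same result whenever at least one entry is registered.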
- When the update control unit 114 receives the instruction dep_id, the instruction th_id, and the instruction detection signal from the instruction detection unit 117, it initializes a counter m to 0 (step S100). This counter controls the reading of entries from the Read access management table T150.
- The counter m is held in the update control unit 114.
- The update control unit 114 acquires the dep_id value included in the m-th entry of the Read access management table T150 (step S105). It then determines whether this value equals the instruction dep_id received from the instruction detection unit 117 (step S110).
- If the dep_id value equals the instruction dep_id ("Yes" in step S110), the update control unit 114 sets valid in the m-th entry of the Read access management table T150 to "1". It also changes that entry's th_id to the instruction th_id received from the instruction detection unit 117 (step S115).
- If the dep_id value does not equal the instruction dep_id ("No" in step S110), the update control unit 114 increments the counter m (step S120). It then acquires the end number of the entries registered in the Read access management table T150 from the entry end register (step S125) and determines whether the counter m equals that end number (step S130). If they are equal ("Yes" in step S130), the process ends. If not ("No" in step S130), the process returns to step S105.
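The following Python model covers steps S100 to S130. It also covers the valid reset on a read, which is described earlier for the update control unit 114. The names `on_instruction_detected` and `on_read_executed` are hypothetical, as is the parameter `entry_end`, which stands for the end number obtained in step S125. Matching a read by exact start address is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadAccessEntry:
    entry_valid: int
    dep_id: int
    address: int
    valid: int = 0
    th_id: int = 0
    th_stride: int = 1


def on_instruction_detected(t150: List[ReadAccessEntry], entry_end: int,
                            instr_dep_id: int, instr_th_id: int) -> bool:
    """Steps S100-S130: mark the entry whose dep_id matches as written."""
    m = 0                                  # S100: initialize counter m
    while m != entry_end:                  # S125-S130: stop at end number
        entry = t150[m]                    # S105: read the m-th dep_id
        if entry.dep_id == instr_dep_id:   # S110
            entry.valid = 1                # S115: writing completed ...
            entry.th_id = instr_th_id      # ... by this thread
            return True
        m += 1                             # S120: next entry
    return False


def on_read_executed(t150: List[ReadAccessEntry], read_address: int) -> None:
    """Reset valid from 1 to 0 once the managed area has been read."""
    for entry in t150:
        if entry.address == read_address and entry.valid == 1:
            entry.valid = 0
```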
- The table read control unit 400 initializes a counter p to 0 (step S200). This counter controls the reading of entries from the Read access management table T150.
- The counter p is held in the table read control unit 400.
- The Read address comparison unit 402 acquires the Address value included in the p-th entry of the Read access management table T150 (step S205). It then determines whether this value matches the Read address received from the memory access 120 (step S210).
- If they match ("Yes" in step S210), the valid selection unit 401 acquires the entry_valid value included in the p-th entry of the Read access management table T150 (step S215). It then determines whether this value is 1 (step S220).
- If entry_valid is 1 ("Yes" in step S220), the adder 404 obtains the th_id and th_stride values included in the p-th entry of the Read access management table T150 and calculates their sum (step S225).
- The th_id comparison unit 403 determines whether the calculated sum (th_id plus th_stride) matches the Read th_id received from the memory access 120 (step S230).
- If they match ("Yes" in step S230), the valid selection unit 401 acquires the valid value included in the p-th entry of the Read access management table T150 (step S235).
- A match in step S230 indicates that the memory area is one written by the intended preceding thread, which is identified by the value of th_stride.
- The valid selection unit 401 determines whether the acquired valid value is 1 (step S240).
- If valid is 1 ("Yes" in step S240), the valid selection unit 401 outputs a Read access signal with a value of 0 to the instruction fetch control unit 102 (step S245).
- In two cases the table read control unit 400 increments the counter p (step S250): when the Read address comparison unit 402 determines that the addresses do not match ("No" in step S210), or when the valid selection unit 401 determines that entry_valid is not 1 ("No" in step S220). It then acquires the end number of the entries registered in the Read access management table T150 from the entry end register (step S255) and determines whether the counter p equals that end number (step S260). If they are equal ("Yes" in step S260), the process ends. If not ("No" in step S260), the process returns to step S205.
- If valid is not 1 ("No" in step S240), the valid selection unit 401 outputs a Read access signal with a value of 1 to the instruction fetch control unit 102 (step S265).
- When the instruction fetch control unit 102 receives a Read access signal whose value is 1, it stops the instruction execution sequence up to that point. It then starts branch processing to the special processing vector corresponding to an exception. This is necessary because the instruction at the PC managed in the access management table T100 has not yet been executed, that is, the intended preceding thread has not finished writing to the memory area. An exception must therefore be raised, and the thread must enter a sequence that waits for the write to complete.
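Steps S200 to S265 can be modeled the same way. All names in the sketch below are hypothetical. It returns the Read access signal value (0 or 1), or `None` when no managed entry matches, in which case we assume the read simply proceeds. The text does not describe the "No" branch of step S230, so the sketch assumes it moves to the next entry. It also assumes software initialized `th_id` to the expected writer's thread number when registering the entry. Otherwise the check in step S230 could not recognize the reader before the first write completes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadAccessEntry:
    entry_valid: int
    dep_id: int
    address: int
    valid: int = 0
    th_id: int = 0
    th_stride: int = 1


def read_access_signal(t150: List[ReadAccessEntry], entry_end: int,
                       read_address: int, read_th_id: int) -> Optional[int]:
    """Return 0 (continue), 1 (raise exception and wait), or None (unmanaged)."""
    p = 0                                                    # S200
    while p != entry_end:                                    # S255-S260
        e = t150[p]
        if e.address == read_address and e.entry_valid == 1:  # S205-S220
            if e.th_id + e.th_stride == read_th_id:          # S225-S230
                return 0 if e.valid == 1 else 1              # S240: S245 / S265
        p += 1                                               # S250
    return None
```

In the test below, the read from thread 3 yields signal 1, so the thread is held. Once thread 2's validated PC has executed and set valid, the same read yields signal 0.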
- The multi-thread processor 100 described in the present embodiment can therefore maintain dependencies such as executing a read instruction only after a write instruction has been executed a plurality of times.
- Compared with semaphore-based processing in the prior art, no synchronization code is needed and communication overhead is reduced. It is therefore possible to realize a processor in which the software processing for managing inter-thread dependencies does not become a major source of performance degradation, even when that processing grows because the number of threads increases or the dependencies become complicated.
- The access management table T100 and the Read access management table T150 are generated after the program to be executed has been divided by the parallelization tool and allocated to the threads so that it can be processed in parallel. This is because allocation to the threads makes the dependencies among the threads clear.
- FIG. 7 is a block diagram showing the configuration of the multi-thread processor 1100 according to the second embodiment.
- The multi-thread processor 1100 is a processor that simultaneously and independently executes N instruction streams (N threads), where N is an integer of 2 or more. It includes an instruction memory 1101, an instruction fetch control unit 1102, an instruction group determination unit 1103, N instruction buffers (first instruction buffer 1104, second instruction buffer 1105, ..., Nth instruction buffer 1106), an issued instruction determination unit 1107, a priority determination unit 1108, N register files (first register file 1109, second register file 1110, ..., Nth register file 1111), an arithmetic unit group 1112, a write back bus 1113, an update control unit 1114, a data memory 1115, a read detection unit 1116, an instruction detection unit 1117, a management table storage unit 1118, and an address conversion unit 1130.
- The instruction buffers and register files are in one-to-one correspondence and, as in the first embodiment, constitute N logical processors.
- The address conversion unit 1130 converts the fetch address (logical address) input from the instruction fetch control unit 1102 into another address (physical address) using the conversion table T200 and outputs it to the instruction memory 1101. This is the operation of a translation lookaside buffer (TLB), which manages pages of the virtual space on a processor equipped with an MMU (memory management unit) for handling a virtual space (see, for example, Non-Patent Document 2).
- Non-Patent Document 2 "Modern Processor Design", McGraw-Hill Series in Electrical and Computer Engineering, p. 142-145 (ISBN 0-07-057064)
- The specific functions of the address conversion unit 1130 in the present embodiment are described below.
- The address conversion unit 1130 has a conversion table T200, as shown in FIG.
- The conversion table T200 has an area for storing a plurality of sets (entries), each consisting of a PC check flag, flags, a logical address, and a physical address.
- Each entry manages a 4 KB memory area called a page.
- The PC check flag indicates whether the page covered by the entry may contain a PC that is to be notified to, and checked by, the instruction detection unit 1117. Specifically, a value of 1 indicates that the page may contain a PC to be checked, and a value of 0 indicates that it cannot.
- Flags are the flags provided in a general TLB and are not described in detail in this patent.
- The logical address is the logical address field provided in a general TLB and is not described in detail in this patent.
- The physical address is the physical address field provided in a general TLB and is not described in detail in this patent.
- The address conversion unit 1130 converts a logical address into a physical address using the conversion table T200.
- When the value of the PC check flag corresponding to the received logical address is 1, the address conversion unit 1130 outputs the physical address and a PC check request whose value is 1 to the instruction memory 1101. When the value of the PC check flag is 0, it outputs the physical address and a PC check request whose value is 0 to the instruction memory 1101.
- The PC check request indicates whether the instruction detection unit 1117 should check the PC of the instruction to be executed: a value of 1 indicates that the check should be performed, and a value of 0 indicates that it is unnecessary.
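- The lookup performed by the address conversion unit 1130 can be sketched in Python as follows, assuming 4 KB pages, a fully associative table searched linearly, and integer addresses. The names T200Entry and translate_fetch are hypothetical, the general TLB Flags field is omitted, and TLB-miss handling (modeled here as an exception) is outside the scope of this description.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                    # 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # offset bits within a page


@dataclass
class T200Entry:
    """Hypothetical model of one entry of conversion table T200 (Flags omitted)."""
    pc_check: int       # 1 if the page may contain a PC to be checked
    logical_page: int   # logical address >> PAGE_SHIFT
    physical_page: int  # physical address >> PAGE_SHIFT


def translate_fetch(table: list[T200Entry], logical_addr: int) -> tuple[int, int]:
    """Return (physical address, PC check request) for an instruction fetch."""
    page = logical_addr >> PAGE_SHIFT
    offset = logical_addr & PAGE_MASK
    for entry in table:
        if entry.logical_page == page:
            physical = (entry.physical_page << PAGE_SHIFT) | offset
            return physical, entry.pc_check  # the request mirrors the PC check flag
    raise LookupError(f"TLB miss for {logical_addr:#x}")  # miss handling not modeled
```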
- The instruction memory 1101 is a memory that holds the instructions to be executed in the multi-thread processor and holds N independently executed instruction streams (threads).
- When the instruction memory 1101 receives a PC check request whose value is 1 while fetching the instruction specified by a physical address for the instruction fetch control unit 1102, it outputs to the instruction fetch control unit 1102, together with the fetched instruction, flag information indicating that the instruction is one whose PC is to be checked.
- The instruction fetch control unit 1102 holds the program counter of each thread and reads the instruction to be executed next from the instruction memory.
- The instruction fetch control unit 1102 outputs the logical address of the instruction to be executed next to the address conversion unit 1130 and then receives the instruction from the instruction memory 1101. When it receives an instruction whose PC is to be checked, it attaches a flag state to the instruction.
- The instruction group determination unit 1103 is the same as the instruction group determination unit 103 described in the first embodiment, so its description is omitted here.
- The first instruction buffer 1104 to the Nth instruction buffer 1106 are the same as the instruction buffers shown in the first embodiment, so their description is omitted here.
- Hereinafter, i denotes an integer from 1 to N, and the instruction stream of the i-th thread is referred to as the i-th instruction stream.
- The issued instruction determination unit 1107 is the same as the issued instruction determination unit 107 described in the first embodiment, so its description is omitted here.
- The priority determination unit 1108 is the same as the priority determination unit 108 described in the first embodiment, so its description is omitted here.
- The first register file 1109 to the Nth register file 1111 are the same as the register files shown in the first embodiment, so their description is omitted here.
- As in the first embodiment, the arithmetic unit group 1112 is a processing unit including a plurality of arithmetic units such as an adder and a multiplier, and it also includes a memory access 1120.
- The memory access 1120 is an arithmetic unit that executes instructions that access the data memory.
- When the flag state is attached to the instruction to be executed, the arithmetic unit group 1112 notifies the instruction detection unit 1117 that the PC should be checked.
- The write back bus 1113 is the same as the write back bus 113 described in the first embodiment, so its description is omitted here.
- The data memory 1115 is the same as the data memory 115 shown in the first embodiment, so its description is omitted here.
- As in the first embodiment, the management table storage unit 1118 stores an access management table and a Read access management table. Where needed, the following description uses the access management table T100 and the Read access management table T150 shown in FIGS. 2A and 2B.
- The update control unit 1114 updates the access management table T100 and the Read access management table T150 in the same manner as the update control unit 114 of the first embodiment, so its description is omitted here.
- The instruction detection unit 1117 has the same components as the instruction detection unit 117 described in the first embodiment. When an instruction is executed, it detects, based on the value of the instruction's program counter, whether that program counter is managed by the access management table T100 held by the management table storage unit 1118.
- The difference from the first embodiment is that this process is started only when the arithmetic unit group 1112 notifies the instruction detection unit 1117 that the PC should be checked.
- The read detection unit 1116 has the same components as the read detection unit 116 described in the first embodiment. When an instruction that accesses the data memory is executed, it detects, based on the access target address, whether the memory area is managed by the Read access management table T150 held by the management table storage unit 1118.
- The operation at the time of instruction detection follows the same flow as the operation shown in the first embodiment (see FIG. 4), but the start timing differs.
- The instruction detection unit 1117 starts the processing when the arithmetic unit group 1112 notifies it that the PC should be checked.
- As described above, using the address conversion unit 1130 significantly reduces the number of instructions, among those executed by the arithmetic unit group 1112, that the instruction detection unit 1117 must check. This lowers the operation frequency of the instruction detection unit 1117 and thus reduces the power consumption of the circuit.
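- The effect of gating the instruction detection unit 1117 can be illustrated by counting how many detections actually run. This Python model is a hypothetical sketch: count_pc_checks and its inputs are illustrative names, and it counts detection operations rather than modeling power.

```python
def count_pc_checks(fetches, managed_pcs):
    """Model the gated instruction detection of the second embodiment.

    fetches: iterable of (pc, pc_check_request) pairs for executed instructions.
    managed_pcs: set of PCs registered in the access management table T100.
    Returns (number of detections performed, PCs found in T100).
    """
    checks = 0
    hits = []
    for pc, request in fetches:
        if request != 1:       # PC check request is 0: the detection unit stays idle
            continue
        checks += 1            # detection runs only for flagged instructions
        if pc in managed_pcs:
            hits.append(pc)    # this PC is managed by T100
    return checks, hits
```

For example, if only two of five executed instructions come from pages whose PC check flag is 1, the detection unit operates twice instead of five times.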
- The multi-thread processor 1100 described in the present embodiment can maintain dependencies such as executing a read instruction only after a write instruction has been executed a plurality of times.
- Because instruction execution is managed at the hardware level, no synchronization code is needed, unlike semaphore-based processing in the prior art, and communication overhead is reduced. It is therefore possible to realize a processor in which the software processing for managing inter-thread dependencies does not become a major source of performance degradation, even when that processing grows because the number of threads increases or the dependencies become complicated.
- The access management table T100 and the Read access management table T150 are generated after the program to be executed has been divided by the parallelization tool and allocated to the threads so that it can be processed in parallel.
- The conversion table T200 is likewise generated after the program to be executed has been divided by the parallelization tool and allocated to the threads. This is because allocation to the threads makes it possible to clearly determine the pages and other resources used by each thread.
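- This section does not specify how the PC check flags of T200 are derived. One plausible rule, assumed here, is to set the flag for every 4 KB page that contains at least one PC registered in the access management table T100. The sketch further assumes that those PCs are logical addresses, and pc_check_flags is a hypothetical helper name.

```python
PAGE_SHIFT = 12  # 4 KB pages, as in conversion table T200


def pc_check_flags(logical_pages, managed_pcs):
    """Return {logical page: PC check flag} for the pages of a thread's code.

    A page gets flag 1 if any PC registered in T100 falls inside it, else 0.
    """
    flagged = {pc >> PAGE_SHIFT for pc in managed_pcs}  # pages holding managed PCs
    return {page: int(page in flagged) for page in logical_pages}
```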
- FIG. 9 is a block diagram showing a configuration of the multi-thread processor 2100 in the third embodiment.
- The multi-thread processor 2100 is a processor that simultaneously and independently executes N instruction streams (N threads), where N is an integer of 2 or more. It includes an instruction memory 2101, an instruction fetch control unit 2102, an instruction group determination unit 2103, N instruction buffers (first instruction buffer 2104, second instruction buffer 2105, ..., Nth instruction buffer 2106), an issued instruction determination unit 2107, a priority determination unit 2108, N register files (first register file 2109, second register file 2110, ..., Nth register file 2111), an arithmetic unit group 2112, a write back bus 2113, an update control unit 2114, a data memory 2115, a read detection unit 2116, an instruction detection unit 2117, a management table storage unit 2118, and an address conversion unit 2130.
- The instruction buffers and register files are in one-to-one correspondence and, as in the first and second embodiments, constitute N logical processors.
- The instruction memory 2101 is a memory that holds the instructions to be executed in the multi-thread processor and holds N independently executed instruction streams (threads).
- The instruction fetch control unit 2102 is the same as the instruction fetch control unit 102 described in the first embodiment, so its description is omitted here.
- The instruction group determination unit 2103 is the same as the instruction group determination unit 103 described in the first embodiment, so its description is omitted here.
- The first instruction buffer 2104 to the Nth instruction buffer 2106 are the same as the instruction buffers shown in the first embodiment, so their description is omitted here.
- Hereinafter, i denotes an integer from 1 to N, and the instruction stream of the i-th thread is referred to as the i-th instruction stream.
- The issued instruction determination unit 2107 is the same as the issued instruction determination unit 107 described in the first embodiment, so its description is omitted here.
- The priority determination unit 2108 is the same as the priority determination unit 108 described in the first embodiment, so its description is omitted here.
- The first register file 2109 to the Nth register file 2111 are the same as the register files shown in the first embodiment, so their description is omitted here.
- The address conversion unit 2130 converts the access address (logical address) input from the memory access 2120 into another address (physical address) using the conversion table T300 and outputs it to the data memory 2115.
- This is the operation of a translation lookaside buffer (TLB), which manages pages of the virtual space on a processor equipped with an MMU (memory management unit) for handling a virtual space (see Non-Patent Document 1).
- The address conversion unit 2130 has a conversion table T300, as shown in FIG.
- The conversion table T300 has an area for storing a plurality of sets (entries), each consisting of a Read check flag, flags, a logical address, and a physical address.
- Each entry manages a 4 KB memory area called a page.
- The Read check flag indicates whether the page covered by the entry may contain a Read address that is to be notified to, and checked by, the read detection unit 2116. Specifically, a value of 1 indicates that the page may contain a Read address to be checked, and a value of 0 indicates that it cannot.
- Flags are the flags provided in a general TLB and are not described in detail in this patent.
- The logical address is the logical address field provided in a general TLB and is not described in detail in this patent.
- The physical address is the physical address field provided in a general TLB and is not described in detail in this patent.
- The address conversion unit 2130 converts a logical address into a physical address using the conversion table T300.
- When the value of the Read check flag corresponding to the received logical address is 1, the address conversion unit 2130 outputs a Read check request whose value is 1 to the memory access 2120. When the value of the Read check flag is 0, it outputs a Read check request whose value is 0 to the memory access 2120.
- The Read check request indicates whether the read access of the Read instruction to be executed should be checked by the read detection unit 2116: a value of 1 indicates that the check should be performed, and a value of 0 indicates that it is unnecessary.
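- The data-side path of the third embodiment can be modeled in the same spirit. In this hypothetical Python sketch, t300 maps a logical page to its Read check flag and physical page, and valid_by_addr stands in for T150, reduced to one valid bit per managed physical address. Treating T150 addresses as physical and unmanaged addresses as readable are assumptions, and the function name load is illustrative.

```python
PAGE_SHIFT = 12                    # 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def load(t300, valid_by_addr, logical_addr):
    """Model one Read instruction in the third embodiment.

    t300: {logical page: (read_check_flag, physical page)}
    valid_by_addr: {physical address: valid bit}, a stand-in for table T150
    Returns ("read", physical address) or ("exception", physical address).
    """
    read_check, phys_page = t300[logical_addr >> PAGE_SHIFT]
    phys = (phys_page << PAGE_SHIFT) | (logical_addr & PAGE_MASK)
    if read_check == 1:                       # Read check request = 1
        if valid_by_addr.get(phys, 1) == 0:   # managed area not yet written
            return "exception", phys
    return "read", phys                       # no check needed, or write complete
```

Because a page whose Read check flag is 0 is never checked, the flag must be set for every page that contains a managed memory area.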
- As in the first embodiment, the arithmetic unit group 2112 is a processing unit including a plurality of arithmetic units such as an adder and a multiplier, and it also includes a memory access 2120.
- The memory access 2120 is an arithmetic unit that executes instructions that access the data memory.
- When the memory access 2120 receives from the address conversion unit 2130 a Read check request whose value is 1 for the instruction being executed, it notifies the read detection unit 2116 that the read address should be checked.
- The write back bus 2113 is the same as the write back bus 113 described in the first embodiment, so its description is omitted here.
- The data memory 2115 is the same as the data memory 115 shown in the first embodiment, so its description is omitted here.
- As in the first embodiment, the management table storage unit 2118 stores an access management table and a Read access management table. Where needed, the following description uses the access management table T100 and the Read access management table T150 shown in FIGS. 2A and 2B.
- The update control unit 2114 updates the access management table T100 and the Read access management table T150 in the same manner as the update control unit 114 of the first embodiment, so its description is omitted here.
- The instruction detection unit 2117 has the same components as the instruction detection unit 117 described in the first embodiment. When an instruction is executed, it detects, based on the value of the instruction's program counter, whether that program counter is managed by the access management table T100 held by the management table storage unit 2118.
- The read detection unit 2116 has the same components as the read detection unit 116 described in the first embodiment. When an instruction that accesses the data memory is executed, it detects, based on the access target address, whether the memory area is managed by the Read access management table T150 held by the management table storage unit 2118.
- The difference from the first and second embodiments is that this process is started when the memory access 2120 notifies the read detection unit 2116 that the read address should be checked.
- The operation at the time of detecting a Read instruction in the present embodiment follows the same flow as the operation shown in the first embodiment (see FIG. 6), but the start timing differs.
- The read detection unit 2116 starts the process when the memory access 2120 notifies it that the read address should be checked.
- The multi-thread processor 2100 described in the present embodiment can maintain dependencies such as executing a read instruction only after a write instruction has been executed a plurality of times.
- Because instruction execution is managed at the hardware level, no synchronization code is needed, unlike semaphore-based processing in the prior art, and communication overhead is reduced. It is therefore possible to realize a processor in which the software processing for managing inter-thread dependencies does not become a major source of performance degradation, even when that processing grows because the number of threads increases or the dependencies become complicated.
- Although the multi-thread processor 2100 described in the present embodiment is configured by adding the address conversion unit 2130 to the components of the multi-thread processor 100 of the first embodiment, it may instead be configured by adding the address conversion unit 2130 to the components of the multi-thread processor 1100 of the second embodiment.
- The access management table T100 and the Read access management table T150 are generated after the program to be executed has been divided by the parallelization tool and allocated to the threads so that it can be processed in parallel.
- The conversion table T300 is likewise generated after the program to be executed has been divided by the parallelization tool and allocated to the threads. This is because allocation to the threads makes it possible to clearly determine the pages and other resources used by each thread.
- FIG. 11 is a block diagram showing the hardware configuration of the multi-core processor system 3000 according to the fourth embodiment.
- As shown in FIG., the multi-core processor system 3000 consists of the multi-thread processors 100a and 100b.
- The multi-thread processors 100a and 100b both have the same configuration as the multi-thread processor 100 described in the first embodiment, except as described below.
- They differ from the multi-thread processor 100 in that the functions of the update control unit 114a of the multi-thread processor 100a and of the update control unit 114b of the multi-thread processor 100b differ from those of the update control unit 114 of the first embodiment.
- The management table storage unit 118a of the multi-thread processor 100a and the management table storage unit 118b of the multi-thread processor 100b are the same as the management table storage unit 118 shown in the first embodiment, so their description is omitted here.
- The update control unit 114a updates each table held by the multi-thread processor 100a and also updates each table held by the multi-thread processor 100b.
- The update control unit 114b updates each table held by the multi-thread processor 100b and also updates each table held by the multi-thread processor 100a.
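- The cross-updating performed by the update control units 114a and 114b can be pictured as a broadcast of each valid-bit update to both processors. In this hypothetical Python sketch, only the valid field of T150 is modeled, peer updates take effect immediately (no interconnect latency), and the class and method names are illustrative.

```python
class MultiThreadProcessor:
    """Hypothetical model keeping only the valid bits of the Read access table."""

    def __init__(self, name):
        self.name = name
        self.valid = {}   # Address -> valid (stand-in for T150)
        self.peers = []   # other processors whose tables this one also updates

    def mark_write_complete(self, address):
        """Update control unit: set valid locally and in every peer's table."""
        for proc in [self, *self.peers]:
            proc.valid[address] = 1
```

Calling mark_write_complete on the processor 100a therefore also unblocks readers running on the processor 100b.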
- The update timing is the same as in the first embodiment, so its detailed description is omitted.
- Although the multi-thread processor 3100 described in the present embodiment is configured by modifying the update control unit 114 of the multi-thread processor 100 described in the first embodiment, the present invention is not limited to this.
- The multi-thread processor 3100 may instead be obtained by modifying the update control unit 1114 of the multi-thread processor 1100 described in the second embodiment, or by modifying the update control unit 2114 of the multi-thread processor 2100 described in the third embodiment.
- The following describes the operation when the multi-thread processor 100 shown in the first embodiment is applied to video decoding and encoding in a system LSI for digital AV equipment.
- As shown in FIG. 12, the multi-thread processor 100 decodes four macroblocks (MBn, MBn + 1, MBn + 2, MBn + 3), assigning each macroblock as a processing unit to one of threads 0 to 3, which performs that macroblock's decoding processing.
- The macroblocks MBn, MBn + 1, MBn + 2, and MBn + 3 are assumed to be consecutive in their arrangement.
- Decoding a macroblock requires variable length decoding (VLD) of the variable-length-coded signal, inverse quantization and inverse frequency conversion (IQT), motion compensation (MC), image reconstruction (Recon), and deblocking filtering (DBF).
- Data to be referenced by the next macroblock MBn + 1 (passing data) is written to a certain memory area.
- The multi-thread processor 100 then updates to 1 the value of valid corresponding to the Address of that memory area in the Read access management table T150. This allows the next macroblock MBn + 1 to start reading the passing data written to the memory area.
- Consequently, macroblock MBn + 1 can start reading the passing data written by macroblock MBn for its decoding-related processes (IQT, MC, Recon, DBF).
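- The handoff between macroblock threads can be illustrated with a deterministic round-robin model in Python. Each generator plays one hardware thread and each yield one scheduling slot; the valid list stands in for the valid bits of T150. The point at which each macroblock writes its passing data, the reverse start order (chosen so that waiting is visible), and polling in place of the exception-and-wait sequence are all assumptions of this sketch, and decode_macroblocks is a hypothetical name.

```python
def decode_macroblocks(num_threads=4):
    """Round-robin model of threads 0..num_threads-1, thread k decoding MBn+k.

    valid[k] models the T150 valid bit for the area that MBn+k writes.
    Returns the order of (thread, event) pairs.
    """
    valid = [0] * num_threads
    events = []

    def thread(k):
        events.append((k, "VLD"))           # independent of the other macroblocks
        yield
        while k > 0 and not valid[k - 1]:   # Read access signal 1: wait for MBn+k-1
            events.append((k, "wait"))
            yield
        if k > 0:
            events.append((k, "read"))      # read the passing data of MBn+k-1
        events.append((k, "write"))         # write passing data for MBn+k+1
        valid[k] = 1                        # guaranteeing instruction has executed
        yield
        events.append((k, "IQT/MC/Recon/DBF"))

    # Start the threads in reverse order so that the waiting becomes visible.
    pending = [thread(k) for k in reversed(range(num_threads))]
    while pending:
        for gen in list(pending):
            try:
                next(gen)
            except StopIteration:
                pending.remove(gen)
    return events
```

Every read of passing data appears after the predecessor's write, even though the threads were started in the worst-case order.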
- Encoding processing of a macroblock usually includes subtraction processing that calculates the prediction error with respect to the image data to be encoded, quantization processing that performs frequency conversion and quantization on the prediction error, encoding processing that variable-length encodes the quantized DCT coefficients and motion vectors, processing related to generation of a reference image, and processing related to motion compensation.
- The concept of the multi-thread processor's operation is the same as in the decoding case: when the subtraction processing for one macroblock (for example, MBn) is completed, that is, when writing of the data to be passed to the next macroblock MBn + 1 is completed, macroblock MBn + 1 starts reading the written passing data.
- By managing the dependencies between threads with the program counter, it is possible to realize a video system LSI for digital AV equipment in which the software processing for managing inter-thread dependencies does not become a major source of performance degradation, even when the number of threads increases or the dependencies become complicated.
- In the embodiments above, all address bits are used when comparing the value of the Read address with the value of Address in the Read access management table T150.
- However, the present invention is not limited to this.
- For example, the lower 7 bits may be excluded from the comparison, so that addresses are compared in units of 2^7 = 128 (128 bytes with byte addressing).
- This reduces the storage capacity required for the Read access management table T150.
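- The reduced-precision comparison can be sketched as follows, assuming byte addressing so that ignoring the lower 7 bits compares addresses in 128-byte units. Because the ignored bits never need to be stored, each Address field shrinks by 7 bits, which is where the capacity saving comes from; the 32-bit address width used below is only an example, and both function names are hypothetical.

```python
IGNORED_LOW_BITS = 7  # compare in units of 2**7 = 128 addresses


def same_managed_area(read_address: int, table_address: int,
                      ignored_bits: int = IGNORED_LOW_BITS) -> bool:
    """Compare two addresses while ignoring their lower `ignored_bits` bits."""
    return (read_address >> ignored_bits) == (table_address >> ignored_bits)


def stored_address_bits(address_width: int,
                        ignored_bits: int = IGNORED_LOW_BITS) -> int:
    """Bits needed per Address field when the ignored lower bits are not stored."""
    return address_width - ignored_bits
```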
- The access management table and the Read access management table may be stored in the management table storage unit at a timing determined by a user operation.
- Alternatively, the access management table and the Read access management table may be held in advance in a storage area separate from the management table storage unit, and each table may be copied to the management table storage unit by executing a dedicated instruction that specifies the address where it is held.
- The dedicated instruction may be placed at a specific position in the thread to be detected.
- Alternatively, the code may be processed before execution so that an OP exception occurs at that position, and the copying may be performed by the corresponding interrupt processing routine.
- In addition to handling the Read instruction in the decoding processing of the processor element, the Read instruction may be trapped, for example by rewriting the code so that an OP exception occurs before execution, and then processed.
- In the embodiments above, the entries of the access management table and the entries of the Read access management table have a one-to-one relationship, but the present invention is not limited to this.
- For example, this can be implemented by changing the destination of the process to step S250 when the read detection unit determines "No" in step S230 shown in FIG. 6.
- The area length of each managed memory area may be fixed, or it may be variable, differing for each managed memory area.
- Reading may also be controlled by designating a PC (program counter). For example, when the PC at the time of the write cannot be specified for a dependency between a write (Write) and a read (Read), the Read side can specify a PC (an instruction that guarantees that the write has already been completed), and the Write-Read dependency can thereby be maintained.
- In the embodiments above, a PC is specified as the means of guaranteeing that writing has been completed.
- However, the present invention is not limited to this.
- For example, the contents of a specific control register or of memory may be referenced to determine whether the writing has been completed.
- In the embodiments above, parallel processing, that is, thread allocation, is performed in units of consecutive macroblocks, but the present invention is not limited to this.
- Parallel processing may be performed in units of macroblock lines, in processing units such as IDCT in image processing, or in GOP (Group of Pictures) units.
- A program describing the procedure of the method described in the above embodiments may be stored in a memory, and a central processing unit (CPU) or the like may read the program from the memory and execute it.
- A program describing the procedure of the method may also be stored on a recording medium and distributed.
- A processor that executes a plurality of threads includes setting means and control means. When one thread that writes to a memory area used in common with other threads executes an instruction located at a position that guarantees that writing to the memory area has been completed, the setting means sets, in usage information indicating whether writing to the memory area has been completed, an indication that writing to the memory area by the one thread has been completed. The control means executes an instruction by another thread to read data present in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and suppresses execution of the read instruction when the usage information indicates that the writing has not been completed.
- With this configuration, the processor lets another thread read data present in the memory area once the one thread, which writes to a memory area used in common with other threads, has executed the instruction located at the position that guarantees that writing to the memory area has been completed. That is, the processor can let another thread read data in the memory area after guaranteeing that the instructions preceding the instruction at that position have been executed. This allows the processor to maintain dependencies such as executing a read instruction only after a write instruction has been executed a plurality of times.
- The setting means may have a holding area that holds in advance the value of the program counter corresponding to the instruction at that position. When it obtains a program counter value from outside, it may determine whether the program counter value of the instruction executed by the one thread matches the value held in the holding area and, if they match, set in the usage information an indication that writing to the memory area by the one thread has been completed.
- With this configuration, the processor can easily identify the instruction located at the position that guarantees that writing to the memory area by the one thread has been completed.
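- The interplay of the setting means and the control means can be sketched in Python. In this hypothetical model, the holding area maps each guaranteeing PC to the memory address whose writing it completes; because that PC is assumed to follow all of the thread's writes to the area, reads stay blocked until every one of those writes has executed. Treating addresses absent from the holding area as unrestricted is an assumption, and all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SettingMeans:
    """Hypothetical model of the setting means with its holding area.

    holding_area maps a guaranteeing PC to the memory address whose write it
    completes; usage maps each managed memory address to 1 (complete) or 0.
    """
    holding_area: dict[int, int]
    usage: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for area in self.holding_area.values():
            self.usage.setdefault(area, 0)   # nothing has been written yet

    def on_execute(self, pc: int) -> None:
        """Called with the PC of each instruction executed by the writing thread."""
        area = self.holding_area.get(pc)
        if area is not None:                 # PC matches a held value
            self.usage[area] = 1             # writing to that area is complete

    def may_read(self, address: int) -> bool:
        """Control means: allow the read only if the usage info says complete."""
        return self.usage.get(address, 1) == 1  # unmanaged addresses: unrestricted
```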
- the control unit is configured to read the read instruction from the other thread. Even if the read target address of the memory area to be read by is acquired and the read target address matches the memory address, the read instruction is executed and inhibited according to the content indicated by the corresponding usage information. Good.
- the processor since the processor holds the usage information and the memory address in association with each other, the processor can easily specify the usage status of the memory area indicated by the memory address indicating the read target.
- The processor may further include address conversion means that converts a virtual address acquired when reading data into a physical address and, if the converted physical address is related to the memory address held in advance by the permission means, issues a notification that the usage status of the memory area should be checked. On receiving the notification, the control means may determine the usage status of the memory area corresponding to the memory address based on the usage information.
- By providing the address conversion means, the processor can determine in advance whether the control means needs to check the usage status of a given memory area.
- The memory area may also be used by a thread different from the other thread. In that case, the control means may further execute a read instruction by that different thread for data in the memory area if the usage information indicates that writing to the memory area by the one thread has been completed, and suppress execution of the read instruction if the usage information does not indicate this.
- The processor can thus execute the threads while maintaining the dependency.
- The processor may further include address conversion means that converts a virtual address acquired when fetching an instruction into a physical address and, if the converted physical address is related to the program counter held in advance by the permission means, notifies the permission means of request information requesting that it perform the determination. The permission means may perform the determination on receiving the request information from the address conversion means.
- By providing the address conversion means, the processor can determine in advance whether the determination by the permission means needs to be performed.
- When the setting means sets, in the usage information, an indication that writing to the memory area has been completed, it may also set content indicating that writing to the memory area has been completed in separate usage information managed by another processor.
- The processor can then maintain the dependency between the execution of a plurality of write instructions and the subsequent execution of a read instruction, including a read executed on another processor.
- The one thread and the other threads may perform image decoding, and the processor may be provided in an image processing system that decodes images.
- The processor can then perform decoding while maintaining the dependency of executing a read instruction after a plurality of write instructions have been executed.
- The one thread and the other threads may perform image encoding, and the processor may be included in an image processing system that encodes images.
- The processor can then perform encoding while maintaining the dependency of executing a read instruction after a plurality of write instructions have been executed.
- An image processing apparatus that processes images using a plurality of threads includes: setting means that, when one thread writing to a memory area shared with other threads executes the instruction located at the position that guarantees completion of writing to the memory area, sets in usage information, which indicates whether writing to the memory area has been completed, that writing to the memory area by the one thread has been completed; and control means that executes a read instruction by another thread for data in the memory area if the usage information indicates that writing by the one thread has been completed, and suppresses execution of the read instruction otherwise.
- The image processing apparatus lets another thread read the data in the memory area only after the one thread has executed the instruction at the position that guarantees completion of its writes, that is, only after guaranteeing that every instruction preceding that instruction has been executed. As a result, the image processing apparatus can maintain, for example, the dependency that a read instruction is executed after a write instruction has been executed multiple times.
- The image processing apparatus may decode an encoded image, with macroblocks that are consecutive in one encoded image assigned to mutually different threads among the plurality of threads. The instruction at the position guaranteeing completion of writing may be an instruction indicating completion of any one of variable-length decoding, inverse quantization/inverse frequency transform, motion compensation, image reconstruction, and deblocking filtering. When the control means determines that the writing has been completed for one macroblock, it may control processing so that, for the next macroblock following it, the same process as the one determined to be completed is executed.
- The image processing apparatus can thus decode consecutive macroblocks while maintaining the dependencies between them.
- The image processing apparatus may encode an image, with macroblocks that are consecutive in one image assigned to mutually different threads among the plurality of threads. The instruction at the position guaranteeing completion of writing may be any one of subtraction processing that calculates a prediction error for the image data to be encoded, quantization processing that applies frequency transform and quantization to the prediction error, encoding processing, reference-image generation processing, and motion compensation processing. When the control means determines that the writing has been completed for one macroblock, it may control processing so that, for the next macroblock following it, the same process as the one determined to be completed is executed.
- The image processing apparatus can thus encode consecutive macroblocks while maintaining the dependencies between them.
- Because the multi-thread processor according to the present invention enables flexible, high-performance arithmetic processing, it can be applied to multi-thread processors that perform video and audio media processing, such as those in DVD recorders and digital TVs.
- Reference signs: 100 multi-thread processor; 101 instruction memory; 102 instruction fetch control unit; 103 instruction group determination unit; 104 first instruction buffer; 105 second instruction buffer; 106 N-th instruction buffer; 107 issue instruction determination unit; 108 priority determination unit; 109 first register file; 110 second register file; 111 N-th register file; 112 arithmetic unit group; 113 write-back bus; 114 update control unit; 115 data memory; 116 read detection unit; 117 instruction detection unit; 118 management table storage unit; 120 memory access; 300 table read control unit; 301 dep_id selection unit; 302 PC comparison unit; 400 table read control unit; 401 dep_id selection unit; 402 Read address comparison unit; 403 comparison unit; 404 adder
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Multi Processors (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
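A first embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a multi-thread processor 100 according to the first embodiment.
The instruction memory 101 holds the instructions executed by the multi-thread processor, namely N independently executed instruction streams (threads).
The instruction fetch control unit 102 holds the program counter (PC) of each thread and reads the next instruction to be executed from the instruction memory. The program counters of the threads are assumed to count within mutually distinct ranges of values.
The instruction group determination unit 103 reads the instructions belonging to each instruction stream from the instruction memory 101, decodes them, and writes each instruction into the instruction buffer to which it is assigned.
The i-th instruction buffer (i is an integer from 1 to N) receives and holds the instructions belonging to the i-th instruction stream.
The issue instruction determination unit 107 determines, every machine cycle, which instructions to issue from the N instruction buffers.
The priority determination unit 108 holds a priority information table that the issue instruction determination unit 107 uses when determining which instructions to issue.
The i-th register file (i is an integer from 1 to N) is a group of registers holding the data that is read and written by executing the instruction stream held in the i-th instruction buffer.
The write-back bus 113 is a bus for writing the outputs of the arithmetic unit group 112 back to the first register file 109 through the N-th register file 111.
The data memory 115 is accessed by instructions that access data memory, and holds the data used when programs are executed.
As shown in FIGS. 2(a) and 2(b), the management table storage unit 118 stores an access management table T100 and a Read access management table T150.
As shown in FIG. 2(a), the access management table T100 has an area for storing a plurality of sets, each consisting of entry_valid 200, dep_id 201, and valid化PC 202 (the validating PC).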
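The update control unit 114 updates the access management table T100 and the Read access management table T150.
When the update control unit 114 receives a software update instruction from software, it updates the fields in the access management table T100. Every field can be read and written by software.
In the same way as for the access management table T100, when the update control unit 114 receives a software update instruction from software, it updates the fields in the Read access management table T150. Every field can be read and written by software.
The instruction detection unit 117 is a processing unit that, when an instruction is executed, detects on the basis of that instruction's program counter value whether the program counter is one managed in the access management table T100 held in the management table storage unit 118. In other words, it detects whether writing to a given memory area has been completed.
When an instruction that accesses data memory is executed, the read detection unit 116 detects, on the basis of the access target address, whether the address lies in a memory area managed in the Read access management table T150 held in the management table storage unit 118.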
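This section describes the operation of the multi-thread processor 100.
First, the operation performed by the instruction detection unit 117 when an instruction is executed is described using the flowchart in FIG. 4. This processing starts when the instruction detection unit 117 receives, from the arithmetic unit group 112, the instruction execution signal, the PC, and the instruction th_id for an instruction.
Next, the update of the Read access management table T150, which is performed once execution has reached the instruction indicated by a PC managed in the access management table T100, is described using the flowchart in FIG. 5.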
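The FIG. 4 and FIG. 5 flows can be modeled in a few lines of Python. This is a minimal software sketch, not the circuit: `entry_valid`, `dep_id`, and `valid_pc` stand for entry_valid 200, dep_id 201, and valid化PC 202 of table T100, while the layout of the Read access management table T150 (a dep_id, an address range, and a `write_done` flag playing the role of the usage information) is a hypothetical assumption because its fields are not listed here. Since each thread's PC counts in a distinct range, the sketch matches on the PC alone and ignores th_id.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessEntry:
    """One set in access management table T100."""
    entry_valid: bool
    dep_id: int
    valid_pc: int  # valid化PC: PC of the instruction guaranteeing the writes are done


@dataclass
class ReadAccessEntry:
    """Hypothetical layout of one entry in Read access management table T150."""
    dep_id: int
    start: int                # first address of the shared memory area
    size: int                 # size of the area in bytes
    write_done: bool = False  # usage information: the producer's writes have finished


@dataclass
class ManagementTables:
    t100: List[AccessEntry] = field(default_factory=list)
    t150: List[ReadAccessEntry] = field(default_factory=list)

    def on_instruction_executed(self, pc: int) -> List[int]:
        """Instruction detection (FIG. 4) followed by the T150 update (FIG. 5).

        Returns the dep_ids whose writes are marked complete by this PC.
        """
        hits = [e.dep_id for e in self.t100 if e.entry_valid and e.valid_pc == pc]
        for entry in self.t150:
            if entry.dep_id in hits:
                entry.write_done = True
        return hits


# Software registers the marker PC and the shared area (cf. update control unit 114).
tables = ManagementTables(
    t100=[AccessEntry(entry_valid=True, dep_id=7, valid_pc=0x1040)],
    t150=[ReadAccessEntry(dep_id=7, start=0x8000, size=0x100)],
)
before = tables.on_instruction_executed(0x1000)  # an ordinary store: []
after = tables.on_instruction_executed(0x1040)   # the marker instruction: [7]
```

Invalid entries (entry_valid false) never match, so software can disable a dependency without deleting its row.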
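Next, the operation performed when a Read instruction is executed is described using the flowchart in FIG. 6. This processing starts when the read detection unit 116 receives, from the memory access 120, the Read execution signal, the Read address, and the Read th_id for the Read instruction.
The table read control unit 400 increments the counter p (step S250). It then obtains the end number of the entries registered in the Read access management table T150 from the entry end register (step S255), and determines whether the value of the counter p equals that end number (step S260). If they are equal ("Yes" in step S260), the processing ends; otherwise ("No" in step S260), the processing returns to step S205.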
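The FIG. 6 scan can be sketched as a loop over the T150 entries indexed by the counter p, ending when p reaches the end number from the entry end register. Only steps S250 to S260 are spelled out above; the loop body used here (an address-range hit on an entry whose usage information says the writes are unfinished suppresses the Read) is an assumption consistent with the read detection unit 116 and the claims, and the entry layout is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadAccessEntry:
    """Hypothetical entry of Read access management table T150."""
    start: int
    size: int
    write_done: bool  # usage information for this memory area


def read_permitted(t150: List[ReadAccessEntry], end_number: int, read_addr: int) -> bool:
    """Scan T150 with counter p; True = execute the Read, False = suppress it.

    end_number stands in for the value read from the entry end register (S255).
    """
    p = 0
    while p != end_number:  # S260: finish once p equals the end number
        entry = t150[p]
        # Assumed loop body: an unfinished area covering the Read address blocks it.
        if entry.start <= read_addr < entry.start + entry.size and not entry.write_done:
            return False    # producer thread has not finished writing: suppress
        p += 1              # S250: increment counter p
    return True             # no unfinished area covers the address


table = [ReadAccessEntry(start=0x8000, size=0x100, write_done=False),
         ReadAccessEntry(start=0x9000, size=0x100, write_done=True)]
blocked = read_permitted(table, len(table), 0x8010)    # False
allowed = read_permitted(table, len(table), 0x9010)    # True
unmanaged = read_permitted(table, len(table), 0xA000)  # True
```

The flowchart tests the end condition after incrementing p; the sketch checks it before each iteration instead so that an empty table needs no special case.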
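In this way, the multi-thread processor 100 of this embodiment can maintain a dependency such as executing a read instruction after a write instruction has been executed multiple times.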
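A second embodiment of the present invention is described below with reference to the drawings, focusing on the differences from the first embodiment.
FIG. 7 is a block diagram showing the configuration of a multi-thread processor 1100 according to the second embodiment.
The address conversion unit 1130 converts a fetch address (logical address) input from the instruction fetch control unit 1102 into another address (physical address) using a conversion table T200, and outputs it to the instruction memory 1101. This is the operation of a TLB (translation lookaside buffer) that manages virtual-space pages on a processor equipped with an MMU (memory management unit) for handling virtual space (see, for example, Non-Patent Literature 2).
The specific functions of the address conversion unit 1130 in this embodiment are described below.
As in the first embodiment, the instruction memory 1101 holds the instructions executed by the multi-thread processor, namely N independently executed instruction streams (threads).
The instruction fetch control unit 1102 holds the program counter of each thread and reads the next instruction to be executed from the instruction memory.
The instruction group determination unit 1103 is the same as the instruction group determination unit 103 of the first embodiment, so its description is omitted here.
The first instruction buffer 1104 through the N-th instruction buffer 1106 are the same as the instruction buffers of the first embodiment, so their description is omitted here. In the following, the instruction stream numbered i is called the i-th instruction stream (i is an integer from 1 to N).
The issue instruction determination unit 1107 is the same as the issue instruction determination unit 107 of the first embodiment, so its description is omitted here.
The priority determination unit 1108 is the same as the priority determination unit 108 of the first embodiment, so its description is omitted here.
The first register file 1109 through the N-th register file 1111 are the same as the register files of the first embodiment, so their description is omitted here.
The write-back bus 1113 is the same as the write-back bus 113 of the first embodiment, so its description is omitted here.
The data memory 1115 is the same as the data memory 115 of the first embodiment, so its description is omitted here.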
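As in the first embodiment, the management table storage unit 1118 stores an access management table and a Read access management table. Where needed, the following description refers to the access management table T100 and the Read access management table T150 shown in FIGS. 2(a) and 2(b).
Like the update control unit 114 of the first embodiment, the update control unit 1114 updates the access management table T100 and the Read access management table T150. The detailed update functions are the same as in the first embodiment, so their description is omitted here.
The instruction detection unit 1117 has the same components as the instruction detection unit 117 of the first embodiment. When an instruction is executed, it detects, on the basis of that instruction's program counter value, whether the program counter is one managed in the access management table T100 held in the management table storage unit 1118.
The read detection unit 1116 has the same components as the read detection unit 116 of the first embodiment. When an instruction that accesses data memory is executed, it detects, on the basis of the access target address, whether the address lies in a memory area managed in the Read access management table T150 held in the management table storage unit 1118.
This section describes the operation of the multi-thread processor 1100, focusing on the differences from the multi-thread processor 100 of the first embodiment.
The instruction detection operation follows the same flow as in the first embodiment (see FIG. 4), but it starts at a different time: in this embodiment, the instruction detection unit 1117 starts the processing when the arithmetic unit group 1112 notifies it that the PC should be checked.
The update of the Read access management table T150 in this embodiment is the same as in the first embodiment (see FIG. 5), so its description is omitted here.
The operation when a Read instruction is detected in this embodiment is the same as in the first embodiment (see FIG. 6), so its description is omitted here.
Thus, using the address conversion unit 1130 greatly reduces the number of instructions, among those executed by the arithmetic unit group 1112, that the instruction detection unit 1117 has to check. This lowers the operating frequency of the instruction detection unit 1117 and reduces the power consumption of the circuit.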
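One way to picture how the address conversion unit 1130 cuts the checking work is to give each translation entry a flag meaning "this page contains a PC registered in table T100"; only fetches that hit a flagged page are forwarded to the instruction detection unit 1117. This is a hedged sketch: the flag name `check_pc`, the 4 KiB page size, and the dictionary standing in for the conversion table T200 are assumptions, not details taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class TlbEntry:
    """Hypothetical entry of conversion table T200."""
    physical_page: int
    check_pc: bool  # page holds at least one PC registered in table T100


def fetch(t200: Dict[int, TlbEntry], logical_addr: int) -> Tuple[int, bool]:
    """Translate a fetch address; also report whether the PC must be checked."""
    entry = t200[logical_addr >> PAGE_SHIFT]
    physical = (entry.physical_page << PAGE_SHIFT) | (logical_addr & PAGE_MASK)
    return physical, entry.check_pc


def count_pc_checks(t200: Dict[int, TlbEntry], trace: List[int]) -> int:
    """Number of fetched instructions the detection unit actually examines."""
    return sum(1 for addr in trace if fetch(t200, addr)[1])


t200 = {0x1: TlbEntry(physical_page=0x40, check_pc=False),
        0x2: TlbEntry(physical_page=0x41, check_pc=True)}
trace = [0x1000 + 4 * i for i in range(100)] + [0x2000 + 4 * i for i in range(10)]
checked = count_pc_checks(t200, trace)  # 10 of 110 fetches are checked
```

With this hypothetical layout, only the fetches from the flagged page reach the detection unit, which is the kind of reduction in operating frequency the paragraph above attributes to the address conversion unit.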
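A third embodiment of the present invention is described below with reference to the drawings, focusing on the differences from the first and second embodiments.
FIG. 9 is a block diagram showing the configuration of a multi-thread processor 2100 according to the third embodiment.
As in the first embodiment, the instruction memory 2101 holds the instructions executed by the multi-thread processor, namely N independently executed instruction streams (threads).
The instruction fetch control unit 2102 is the same as the instruction fetch control unit 102 of the first embodiment, so its description is omitted here.
The instruction group determination unit 2103 is the same as the instruction group determination unit 103 of the first embodiment, so its description is omitted here.
The first instruction buffer 2104 through the N-th instruction buffer 2106 are the same as the instruction buffers of the first embodiment, so their description is omitted here. In the following, the instruction stream numbered i is called the i-th instruction stream (i is an integer from 1 to N).
The issue instruction determination unit 2107 is the same as the issue instruction determination unit 107 of the first embodiment, so its description is omitted here.
The priority determination unit 2108 is the same as the priority determination unit 108 of the first embodiment, so its description is omitted here.
The first register file 2109 through the N-th register file 2111 are the same as the register files of the first embodiment, so their description is omitted here.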
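The address conversion unit 2130 converts an access address (logical address) input from the memory access 2120 into another address (physical address) using a conversion table T300, and outputs it to the data memory 2115. This is the operation of a TLB (translation lookaside buffer) that manages virtual-space pages on a processor equipped with an MMU (memory management unit) for handling virtual space (see Non-Patent Literature 1).
The write-back bus 2113 is the same as the write-back bus 113 of the first embodiment, so its description is omitted here.
The data memory 2115 is the same as the data memory 115 of the first embodiment, so its description is omitted here.
As in the first embodiment, the management table storage unit 2118 stores an access management table and a Read access management table. Where needed, the following description refers to the access management table T100 and the Read access management table T150 shown in FIGS. 2(a) and 2(b).
Like the update control unit 114 of the first embodiment, the update control unit 2114 updates the access management table T100 and the Read access management table T150. The detailed update functions are the same as in the first embodiment, so their description is omitted here.
The instruction detection unit 2117 has the same components as the instruction detection unit 117 of the first embodiment. When an instruction is executed, it detects, on the basis of that instruction's program counter value, whether the program counter is one managed in the access management table T100 held in the management table storage unit 2118.
The read detection unit 2116 has the same components as the read detection unit 116 of the first embodiment. When an instruction that accesses data memory is executed, it detects, on the basis of the access target address, whether the address lies in a memory area managed in the Read access management table T150 held in the management table storage unit 2118.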
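This section describes the operation of the multi-thread processor 2100, focusing on the differences from the multi-thread processor 100 of the first embodiment and the multi-thread processor 1100 of the second embodiment.
The instruction detection operation is the same as in the first embodiment (see FIG. 4), so its description is omitted here.
The update of the Read access management table T150 in this embodiment is the same as in the first embodiment (see FIG. 5), so its description is omitted here.
The operation when a Read instruction is detected in this embodiment follows the same flow as in the first embodiment (see FIG. 6), but it starts at a different time: in this embodiment, the read detection unit 2116 starts the processing when the memory access 2120 notifies it that the Read address should be checked.
Thus, using the address conversion unit 2130 greatly reduces the number of Read instructions, among those executed by the arithmetic unit group 2112, that the read detection unit 2116 has to check. This lowers the operating frequency of the read detection unit 2116 and reduces the power consumption of the circuit.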
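The data-side counterpart can be sketched in the same spirit: a flag in each entry of the conversion table T300 (here the hypothetical `check_read`) marks pages that overlap a memory area registered in table T150, and only Reads to such pages activate the read detection unit 2116. The 4 KiB page size and the `pending_areas` list, which stands in for the lookup performed by the read detection unit, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class DataTlbEntry:
    """Hypothetical entry of conversion table T300."""
    physical_page: int
    check_read: bool  # page overlaps a memory area registered in table T150


def issue_read(t300: Dict[int, DataTlbEntry],
               pending_areas: List[Tuple[int, int]],
               logical_addr: int) -> Tuple[bool, bool]:
    """Return (executed, checked) for one Read.

    pending_areas holds physical [start, end) ranges whose writes are not yet
    complete; it stands in for the read detection unit's table lookup.
    """
    entry = t300[logical_addr >> PAGE_SHIFT]
    physical = (entry.physical_page << PAGE_SHIFT) | (logical_addr & PAGE_MASK)
    if not entry.check_read:
        return True, False  # read detection unit is not activated at all
    blocked = any(start <= physical < end for start, end in pending_areas)
    return not blocked, True


t300 = {0x8: DataTlbEntry(physical_page=0x80, check_read=True),
        0x9: DataTlbEntry(physical_page=0x90, check_read=False)}
pending = [(0x80000, 0x80100)]
r_blocked = issue_read(t300, pending, 0x8010)    # (False, True): suppressed
r_checked = issue_read(t300, pending, 0x8200)    # (True, True): checked, allowed
r_unchecked = issue_read(t300, pending, 0x9000)  # (True, False): no check needed
```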
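A fourth embodiment of the present invention is described below with reference to the drawings, focusing on the differences from the first embodiment.
Here, as a fifth embodiment of the present invention, the operation when the multi-thread processor 100 of the first embodiment is applied to video decoding and encoding in a system LSI for digital AV equipment is described.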
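The macroblock scheduling recited in claims 12 and 13 can be simulated in software as a rough sketch: consecutive macroblocks are assigned round-robin to different threads, and stage s of macroblock n+1 starts only after stage s of macroblock n has been marked complete. Python `threading.Event` objects stand in for the hardware usage information, the stage names abbreviate the five decoding processes listed in claim 12, and the two-thread round-robin split is an assumption; where a real decoder would do the stage work, the sketch only appends to a log.

```python
import threading
from typing import List, Tuple

# Abbreviations of the decoding processes listed in claim 12.
STAGES = ["vld", "iq_idct", "mc", "recon", "deblock"]


def decode_frame(num_mbs: int, num_threads: int = 2) -> List[Tuple[int, str]]:
    """Simulate macroblock-pipelined decoding; return the completion log."""
    # done[mb][s] plays the role of the usage information for (macroblock, stage).
    done = [[threading.Event() for _ in STAGES] for _ in range(num_mbs)]
    log: List[Tuple[int, str]] = []
    log_lock = threading.Lock()

    def worker(tid: int) -> None:
        # Consecutive macroblocks land on different threads (round-robin).
        for mb in range(tid, num_mbs, num_threads):
            for s, stage in enumerate(STAGES):
                if mb > 0:
                    done[mb - 1][s].wait()  # held back until predecessor finished stage s
                with log_lock:
                    log.append((mb, stage))  # stand-in for the actual stage work
                done[mb][s].set()            # "marker instruction": stage s is written

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = decode_frame(6)
pos = {item: i for i, item in enumerate(log)}
```

Because every wait points at a lower-numbered macroblock and each thread walks its macroblocks in increasing order, the lowest unfinished (macroblock, stage) pair can always proceed, so this arrangement cannot deadlock.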
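Although the present invention has been described above on the basis of the embodiments, it is not limited to them. For example, the following modifications are conceivable.
Whether the writing has finished may also be determined by referring to the contents of a specific control register or of memory.
(9) In the fifth embodiment above, parallel processing, that is, thread assignment, was performed in units of consecutively arranged macroblocks, but the invention is not limited to this.
(1) A processor according to one aspect of the present invention executes a plurality of threads and comprises: setting means that, when one thread that writes to a memory area shared with other threads executes an instruction located at a position that guarantees that writing to the memory area has been completed, sets, in usage information indicating whether writing to the memory area has been completed, an indication that writing to the memory area by the one thread has been completed; and control means that executes a read instruction by another thread for data in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and suppresses execution of the read instruction when the usage information indicates that writing to the memory area by the one thread has not been completed.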
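101 instruction memory
102 instruction fetch control unit
103 instruction group determination unit
104 first instruction buffer
105 second instruction buffer
106 N-th instruction buffer
107 issue instruction determination unit
108 priority determination unit
109 first register file
110 second register file
111 N-th register file
112 arithmetic unit group
113 write-back bus
114 update control unit
115 data memory
116 read detection unit
117 instruction detection unit
118 management table storage unit
120 memory access
300 table read control unit
301 dep_id selection unit
302 PC comparison unit
400 table read control unit
401 dep_id selection unit
402 Read address comparison unit
403 comparison unit
404 adder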
Claims (13)
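- A processor that executes a plurality of threads, comprising:
setting means for, when one thread that writes to a memory area shared with other threads executes an instruction located at a position that guarantees that writing to the memory area has been completed, setting, in usage information indicating whether writing to the memory area has been completed, an indication that writing to the memory area by the one thread has been completed; and
control means for executing a read instruction by another thread for data in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and for suppressing execution of the read instruction when the usage information indicates that writing to the memory area by the one thread has not been completed.
- The processor according to claim 1, wherein the setting means:
has a holding area that holds in advance a program counter value corresponding to the instruction located at the position;
upon acquiring the program counter value from outside, stores the value in the holding area; and
when a program counter value corresponding to an instruction executed by the one thread matches the held value, sets in the usage information an indication that writing to the memory area by the one thread has been completed.
- The processor according to claim 2, wherein:
the holding area further holds the usage information in association with a memory address indicating the memory area; and
the control means acquires, from the other thread, a read target address of the memory area to be read by the read instruction and, when the read target address matches the memory address, executes or suppresses the read instruction according to the content indicated by the corresponding usage information.
- The processor according to claim 3, further comprising:
address conversion means for converting a virtual address acquired when reading data into a physical address and, when the converted physical address is related to the memory address held in advance by the permission means, issuing a notification that the usage status of the memory area should be checked,
wherein, upon receiving the notification, the control means determines the usage status of the memory area corresponding to the memory address based on the usage information.
- The processor according to claim 2, wherein:
the memory area is also used by a thread different from the other thread; and
the control means further executes a read instruction by the different thread for data in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and suppresses execution of that read instruction when the usage information indicates that writing to the memory area by the one thread has not been completed.
- The processor according to claim 2, further comprising:
address conversion means for converting a virtual address acquired when fetching an instruction into a physical address and, when the converted physical address is related to the program counter held in advance by the permission means, notifying the permission means of request information requesting that the permission means perform the determination,
wherein the permission means performs the determination upon receiving the request information from the address conversion means.
- The processor according to claim 2, wherein, when setting in the usage information an indication that writing to the memory area has been completed, the setting means further sets content indicating that writing to the memory area has been completed in separate usage information managed by another processor.
- The processor according to claim 1, wherein:
the one thread and the other threads perform image decoding; and
the processor is provided in an image processing system that performs image decoding.
- The processor according to claim 1, wherein:
the one thread and the other threads perform image encoding; and
the processor is provided in an image processing system that performs image encoding.
- A control method used in a processor that executes a plurality of threads, comprising:
a setting step of, when one thread that writes to a memory area shared with other threads executes an instruction located at a position that guarantees that writing to the memory area has been completed, setting, in usage information indicating whether writing to the memory area has been completed, an indication that writing to the memory area by the one thread has been completed; and
a control step of executing a read instruction by another thread for data in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and suppressing execution of the read instruction when the usage information indicates that writing to the memory area by the one thread has not been completed.
- An image processing apparatus that processes images using a plurality of threads, comprising:
setting means for, when one thread that writes to a memory area shared with other threads executes an instruction located at a position that guarantees that writing to the memory area has been completed, setting, in usage information indicating whether writing to the memory area has been completed, an indication that writing to the memory area by the one thread has been completed; and
control means for executing a read instruction by another thread for data in the memory area when the usage information indicates that writing to the memory area by the one thread has been completed, and for suppressing execution of the read instruction when the usage information indicates that writing to the memory area by the one thread has not been completed.
- The image processing apparatus according to claim 11, wherein:
the image processing apparatus decodes an encoded image;
macroblocks that are consecutive in one encoded image are assigned to mutually different threads among the plurality of threads;
the instruction located at the position that guarantees that the writing has been completed is an instruction indicating that any one of variable-length decoding, inverse quantization/inverse frequency transform, motion compensation, image reconstruction, and deblocking filtering has been completed; and
when the control means determines that the writing has been completed for one macroblock, the control means controls processing so that, for the next macroblock following the one macroblock, the same process as the process determined to have completed the writing is executed.
- The image processing apparatus according to claim 11, wherein:
the image processing apparatus encodes an image;
macroblocks that are consecutive in one image are assigned to mutually different threads among the plurality of threads;
the instruction located at the position that guarantees that the writing has been completed is any one of subtraction processing that calculates a prediction error for image data to be encoded, quantization processing that applies quantization and frequency transform to the prediction error, encoding processing, reference-image generation processing, and motion compensation processing; and
when the control means determines that the writing has been completed for one macroblock, the control means controls processing so that, for the next macroblock following the one macroblock, the same process as the process determined to have completed the writing is executed.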
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/393,967 US8898671B2 (en) | 2010-07-07 | 2011-07-06 | Processor that executes a plurality of threads by promoting efficiency of transfer of data that is shared with the plurality of threads |
JP2012523764A JP5853217B2 (ja) | 2010-07-07 | 2011-07-06 | プロセッサ |
CN201180003728.4A CN102483708B (zh) | 2010-07-07 | 2011-07-06 | 处理器 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010154629 | 2010-07-07 | ||
JP2010-154629 | 2010-07-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012004990A1 true WO2012004990A1 (ja) | 2012-01-12 |
Family
ID=45440979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/003861 WO2012004990A1 (ja) | 2010-07-07 | 2011-07-06 | プロセッサ |
Country Status (4)
Country | Link |
---|---|
US (1) | US8898671B2 (ja) |
JP (1) | JP5853217B2 (ja) |
CN (1) | CN102483708B (ja) |
WO (1) | WO2012004990A1 (ja) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102013008420A1 (de) | 2013-05-17 | 2014-11-20 | Abb Technology Ag | Antriebseinheit zur Ansteuerung eines Motors |
CN108347613B (zh) * | 2017-01-25 | 2020-05-26 | 龙芯中科技术有限公司 | 图像宏块并行编码方法和装置 |
CN107038021B (zh) * | 2017-04-05 | 2019-05-24 | 华为技术有限公司 | 用于访问随机存取存储器ram的方法、装置和系统 |
US10902113B2 (en) * | 2017-10-25 | 2021-01-26 | Arm Limited | Data processing |
US20220206799A1 (en) * | 2020-12-30 | 2022-06-30 | Silicon Laboratories Inc. | Apparatus for Processor with Hardware Fence and Associated Methods |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003029987A (ja) * | 2001-07-12 | 2003-01-31 | Nec Corp | スレッド終了方法及び装置並びに並列プロセッサシステム |
JP2003323415A (ja) * | 2002-04-26 | 2003-11-14 | Internatl Business Mach Corp <Ibm> | メモリ・アクセス順序付け及びロック管理の方法、装置、プログラム及び記録媒体 |
JP2007520769A (ja) * | 2003-06-27 | 2007-07-26 | インテル コーポレイション | モニタメモリ待機を用いたキューされたロック |
JP2008165834A (ja) * | 2001-12-31 | 2008-07-17 | Intel Corp | 指定されたメモリアクセスが発生するまでスレッドの実行をサスペンドする方法及び装置 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2854754B1 (fr) * | 2003-05-06 | 2005-12-16 | Procede et dispositif de codage ou decodage d'image avec parallelisation du traitement sur une pluralite de processeurs, programme d'ordinateur et signal de synchronisation correspondants | |
US8176022B1 (en) * | 2006-08-26 | 2012-05-08 | Radames Garcia | Locking protocol using dynamic locks and dynamic shared memory |
2011
- 2011-07-06 JP JP2012523764A patent/JP5853217B2/ja active Active
- 2011-07-06 CN CN201180003728.4A patent/CN102483708B/zh active Active
- 2011-07-06 US US13/393,967 patent/US8898671B2/en active Active
- 2011-07-06 WO PCT/JP2011/003861 patent/WO2012004990A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN102483708A (zh) | 2012-05-30 |
JP5853217B2 (ja) | 2016-02-09 |
US20120167114A1 (en) | 2012-06-28 |
CN102483708B (zh) | 2016-01-20 |
JPWO2012004990A1 (ja) | 2013-09-02 |
US8898671B2 (en) | 2014-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107980118B (zh) | 使用多线程处理的多核处理器设备 | |
EP3314399B1 (en) | Decoupled instruction window and operand buffer in a block based architecture | |
JP5433676B2 (ja) | プロセッサ装置、マルチスレッドプロセッサ装置 | |
KR101996592B1 (ko) | 명확화 없는 비순차 load store 큐를 갖는 재정렬된 투기적 명령어 시퀀스들 | |
US10558460B2 (en) | General purpose register allocation in streaming processor | |
US20080046689A1 (en) | Method and apparatus for cooperative multithreading | |
US8433884B2 (en) | Multiprocessor | |
KR101996462B1 (ko) | 명확화 없는 비순차 load store 큐 | |
KR101996351B1 (ko) | 통합된 구조를 갖는 동적 디스패치 윈도우를 가지는 가상 load store 큐 | |
US9465670B2 (en) | Generational thread scheduler using reservations for fair scheduling | |
KR101804027B1 (ko) | 메모리로부터 순차적으로 판독하는 load들을 구성하는 메모리 일관성 모델에서 비순차 load들을 갖는 세마포어 방법 및 시스템 | |
Hoogerbrugge et al. | A multithreaded multicore system for embedded media processing | |
JP5853217B2 (ja) | プロセッサ | |
KR20140113434A (ko) | 바이패스 멀티플 인스턴스화 테이블을 갖는 이동 제거 시스템 및 방법 | |
KR20170102576A (ko) | 분산된 구조를 갖는 동적 디스패치 윈도우를 가지는 가상 load store 큐 | |
US20130290675A1 (en) | Mitigation of thread hogs on a threaded processor | |
GB2520731A (en) | Soft-partitioning of a register file cache | |
CN114153500A (zh) | 指令调度方法、指令调度装置、处理器及存储介质 | |
US10534614B2 (en) | Rescheduling threads using different cores in a multithreaded microprocessor having a shared register pool | |
WO2022161013A1 (zh) | 处理器装置及其指令执行方法、计算设备 | |
US10824431B2 (en) | Releasing rename registers for floating-point operations | |
US9317287B2 (en) | Multiprocessor system | |
US20130166887A1 (en) | Data processing apparatus and data processing method | |
JP5488609B2 (ja) | リングバスによって相互接続された複数の処理要素を有する単一命令多重データ(simd)プロセッサ | |
Tu et al. | A portable and efficient user dispatching mechanism for multicore systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 201180003728.4; Country of ref document: CN |
| WWE | Wipo information: entry into national phase | Ref document number: 2012523764; Country of ref document: JP |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11803327; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 13393967; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 11803327; Country of ref document: EP; Kind code of ref document: A1 |