WO2012026124A1 - Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method - Google Patents

Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method

Info

Publication number
WO2012026124A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
shared resource
threads
processor
specific instruction
Prior art date
Application number
PCT/JP2011/004727
Other languages
French (fr)
Japanese (ja)
Inventor
Naoki Ochi (越智 直紀)
Original Assignee
Panasonic Corporation (パナソニック株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation
Publication of WO2012026124A1 publication Critical patent/WO2012026124A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/507Low-level

Definitions

  • the present invention relates to a thread arbitration system, and more particularly to a thread arbitration system used for a multi-thread processor.
  • a thread refers to a unit of processing executed in a computer system or a computer program for causing a computer system to execute the processing.
  • the size of the thread is arbitrarily determined by the program designer.
  • FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of such a processor.
  • This processor includes a fetch unit, a dispatcher, a decoder, an arithmetic unit A, and an arithmetic unit B as hardware resources (hereinafter referred to as resources for short).
  • Each instruction is divided into a plurality of stages and pipelined by resources associated with each stage.
  • FIG. 15 shows the processing status of the instructions P1, Q1, R1,... Of the three threads P, Q, and R.
  • This processor is configured so that all stages of all instructions are processed in one unit time, and in each stage, instructions of different threads are processed in an orderly manner for each unit time.
  • One unit time may be, for example, one clock cycle, or a predetermined plurality of clock cycles.
  • In the following description, one unit time is referred to as one time slot.
  • In practice, a specific instruction (for example, a division instruction) whose execution stage requires a plurality of time slots may be defined, and the processor processes the execution stage of such a specific instruction over a plurality of time slots.
  • As an illustrative example, the upstream stages of a specific instruction (e.g., a division instruction) are processed in one time slot, and the execution stage of the specific instruction is processed in three consecutive time slots using a shared resource (e.g., a divider).
  • the shared resource is occupied in a time division manner by a plurality of specific instructions.
  • As a simple example, consider a scheme in which a thread permitted to start the execution stage of a specific instruction is defined for each time slot, and, when the execution stage of a preceding specific instruction ends, the execution stage of the waiting specific instruction belonging to the thread permitted in that time slot is started.
  • FIG. 16 is a diagram showing an example of the processing status of the execution stage of a specific instruction based on such a concept.
  • FIG. 16 shows the processing status of each of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the specific instruction Q1 of the thread Q, and the specific instruction R1 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) when the processing of the upstream stage is completed.
  • A specific instruction whose execution stage has become startable enters execution (solid band) only if the shared resource is free when a time slot in which its thread is permitted to start comes around; until then it waits to start (broken band).
  • The present invention has been made in view of the above circumstances, and an object thereof is to provide a thread arbitration system that is suitably used in a processor capable of executing a plurality of threads and that can guarantee the time required to process the instructions of each thread.
  • A thread arbitration system according to one aspect of the present invention performs arbitration for allocating a shared resource to a plurality of threads, each corresponding to a computer program, in a processor that executes the plurality of threads using the shared resource.
  • In the processor, the shared resource is occupied in a time-division manner by a specific instruction included in each thread; each thread becomes able to use the shared resource when the upstream stages of its specific instruction are processed in time slots allocated to the threads sequentially and exclusively, and thereafter occupies the shared resource over a plurality of time slots for the processing of the downstream stage of the specific instruction.
  • When a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the thread arbitration system allocates the shared resource to the second thread before the first thread.
  • When the first thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the specific instruction of the thread that entered that state earliest may be dispatched to the downstream stage.
  • When a second thread different from the first thread becomes able to use the shared resource while the first thread is using it, the thread arbitration system may dispatch the subsequent specific instruction of the first thread to the downstream stage after the second thread has finished using the shared resource.
  • With such a configuration, no single thread continues to use the shared resource, so the shared resource is allocated evenly to all threads.
  • As a result, the specific instruction of each thread can complete its processing within a predetermined guaranteed time.
  • The thread arbitration system may also assign a priority to each thread; when a second thread having a higher priority than the first thread becomes able to use the shared resource while the first thread is using it, the use of the shared resource by the first thread is suspended, the specific instruction of the second thread is dispatched to the downstream stage, and after the second thread has finished using the shared resource, the first thread resumes its use.
  • By guaranteeing the processing time only for the threads having the highest priority, a shorter processing time can be guaranteed for the specific instructions of the highest-priority threads, at the cost of not guaranteeing the processing time of lower-priority threads.
  • the processor according to one aspect of the present invention may include the above-described thread arbitration system.
  • a processor that can guarantee the time required for a plurality of threads can be obtained.
  • A video recording/playback apparatus according to one aspect of the present invention may include the above-described processor and may perform video recording processing in a first thread and video playback processing in a second thread among the plurality of threads.
  • the present invention can be realized not only as such a thread arbitration system, a processor, and a video recording / reproducing apparatus, but also as a thread arbitration method.
  • According to the thread arbitration system of the present invention, when a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the shared resource is allocated to the second thread before the first thread; therefore, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it.
  • FIG. 1 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 1 of the present invention.
  • FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher according to Embodiment 1 of the present invention.
  • FIG. 3 is a state transition diagram defining an example of the operation of the dispatcher according to the first embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of the processing status of the execution stage of the specific instruction according to Embodiment 1 of the present invention.
  • FIG. 5 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 2 of the present invention.
  • FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher according to the second embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of the processing status of the execution stage of the specific instruction according to the second embodiment of the present invention.
  • FIG. 8 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 3 of the present invention.
  • FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher according to the third embodiment of the present invention.
  • FIG. 10 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 3 of the present invention.
  • FIG. 11 is a block diagram showing an example of a functional configuration of a processor system according to Embodiment 4 of the present invention.
  • FIG. 12 is a diagram showing an example of the appearance of a video recording / reproducing apparatus using the processor system according to Embodiment 4 of the present invention.
  • FIG. 13 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • FIG. 14 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • FIG. 15 is a diagram schematically illustrating a typical example of the configuration and operation of a conventional processor.
  • FIG. 16 is a diagram for explaining a problem in processing of an execution stage of a specific instruction.
  • FIG. 1 is a block diagram illustrating an example of a functional configuration of a processor 10 including a dispatcher 30 as a thread arbitration system according to Embodiment 1 of the present invention.
  • FIG. 1 shows a memory 60 accessed from the processor 10 together with the processor 10.
  • the processor 10 is a processor that can process a plurality of threads in a pseudo-parallel manner, and includes a fetch unit 20, a dispatcher 30, a decoder 40, an arithmetic unit A51, an arithmetic unit B52, an arithmetic unit X53, and a signal line 58.
  • the memory 60 holds a thread P61, a thread Q62, and a thread R63.
  • the thread P61, the thread Q62, and the thread R63 are computer programs executed by the processor 10, respectively.
  • the fetch unit 20 fetches the instructions of the thread P61, the thread Q62, and the thread R63 from the memory 60, and sequentially supplies the fetched instructions to the dispatcher 30.
  • the dispatcher 30 functions as the thread arbitration system of the present invention by dispatching the instructions supplied from the fetch unit 20 in a predetermined order. Information related to thread arbitration is recorded in the control table 35. The instruction dispatched from the dispatcher 30 is delivered to the decoder 40.
  • The decoder 40 identifies the type of each instruction delivered from the dispatcher 30 by decoding it and, depending on the identified type, causes one of the arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 to process the execution stage of the instruction.
  • the computing unit A51, the computing unit B52, and the computing unit X53 process instruction execution stages (for example, arithmetic operations, logical operations, etc.).
  • the fetch unit 20, the dispatcher 30, the decoder 40, the arithmetic unit A51, and the arithmetic unit B52 of the processor 10 are configured to process the stage of the instruction that they are responsible for in one time slot. In these stages, the instructions of different threads are processed in an orderly manner for each time slot.
  • the arithmetic unit X53 processes the execution stage of the specific instruction over a plurality of time slots.
  • For example, the specific instruction may be a division instruction, and the arithmetic unit X53 may be a divider that processes the execution stage of the division instruction.
  • In the following, an instruction whose execution stage is processed over a plurality of time slots is referred to as a specific instruction.
  • the computing unit X53 is an example of the shared resource of the present invention, and is occupied by each thread in a time division manner in order to process the execution stage of a specific instruction.
  • In the processor 10 configured as described above, a situation may occur in which the specific instructions of a plurality of threads have finished the processing of their upstream stages and are waiting for the start of their execution stages.
  • In such a situation, when a subsequent specific instruction of a first thread and a specific instruction of a second thread different from the first thread are both waiting, the dispatcher 30 dispatches the specific instruction of the second thread before the subsequent specific instruction of the first thread.
  • This operation of the dispatcher 30 is equivalent to allocating the shared resource to the second thread before the first thread when, at the time the first thread finishes using the shared resource, the first thread and the second thread are each in a state where they can use it.
  • FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher 30.
  • The control table 35 is configured as a FIFO (First-In First-Out) 35a that can temporarily hold specific instructions.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 30 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 3 is a state transition diagram that defines an example of the operation of the dispatcher 30.
  • EMPTY in FIG. 3 indicates a state in which the FIFO 35a is empty, and EXIST indicates a state in which one or more specific instructions are contained in the FIFO 35a.
  • Curved arrows indicate state transitions, and the label on each arrow indicates, separated by a slash, the condition under which the transition occurs and the operation performed by the dispatcher 30 during the transition (only when there is an operation to perform).
  • the dispatcher 30 operates as follows according to the state transition diagram shown in FIG.
  • Initially, the FIFO 35a is empty (S10). At this time, when a specific instruction is supplied from the fetch unit 20, the dispatcher 30 writes the specific instruction into the FIFO 35a (S11). When a further specific instruction is supplied from the fetch unit 20, the dispatcher 30 also writes that specific instruction into the FIFO 35a (S12). If the arithmetic unit X53 is BUSY, the specific instructions in the FIFO 35a are held there without being dispatched (S13).
  • When the arithmetic unit X53 is or becomes IDLE, the dispatcher 30 immediately reads the oldest specific instruction from the FIFO 35a and dispatches it (S14, S15). As a result, the specific instruction that first entered the state in which it could use the arithmetic unit X53, which is the shared resource, is dispatched to the downstream stage. When the dispatcher 30 reads and dispatches the last specific instruction from the FIFO 35a, the FIFO 35a becomes empty (S15).
  • In other words, when a certain thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the dispatcher 30 dispatches the specific instruction of the thread that entered that state earliest.
  • FIG. 4 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 30 performs the above-described operation.
  • FIG. 4 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) by being written in the FIFO 35a when the processing of the upstream stage is completed.
  • the specific instruction that can be started is immediately dispatched and executed (solid-line band) if the arithmetic unit X53 is free, and waits for start (broken-line band) if the arithmetic unit X53 is in use.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow.
  • This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots needed to process the execution stage of a specific instruction.
  • In this example, the processing time of the specific instructions of the three threads P, Q, and R can be guaranteed, and the guaranteed time is 9 time slots, assuming that the execution stage of a specific instruction requires 3 time slots.
  • The feature of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while both the thread P and a thread Q different from the thread P are in a state where they can use the shared resource, the shared resource is allocated to the thread Q before the thread P.
  • Such an operation is realized by the dispatcher 30 allocating the shared resource, when a certain thread (thread P) finishes using it and two or more threads (threads P, Q, R) are each in a state where they can use it, to the thread (thread Q) that entered that state earliest.
  • Therefore, no single thread continues to use the shared resource, and the shared resource is allocated evenly to all threads. As a result, the time required to process the specific instruction of each thread can be guaranteed.
  • FIG. 5 is a block diagram illustrating an example of a functional configuration of the processor 11 including the dispatcher 31 as the thread arbitration system according to the second embodiment of the present invention.
  • the processor 11 differs from the processor 10 of the first embodiment in the contents of the control table 36 and the operation of the dispatcher 31.
  • The operation of the dispatcher 31 as a thread arbitration system is common to that of the dispatcher 30 of the first embodiment in that, when the first thread finishes using the shared resource while both the first thread and the second thread are in a state where they can use it, the shared resource is allocated to the second thread first.
  • It differs from the dispatcher 30 of the first embodiment in that a thread that has made another thread wait for the start of its execution stage is itself restricted (inhibited) from starting its next execution stage until the execution stage of the thread it made wait has completed.
  • The control table 36 has an instruction status column 36a, a specific instruction column 36b, and an inhibitor column 36c for each thread.
  • Each column of the control table 36 is configured using a register, for example.
  • the instruction status column 36a holds information indicating that the execution stage of the specific instruction is being executed by the computing unit X53 (EXEC), waiting for start (READY), or there is no specific instruction to be executed (NONE).
  • The specific instruction column 36b holds a specific instruction that is waiting for its execution stage to start or whose execution stage is being executed.
  • The inhibitor column 36c of a thread holds information identifying other threads that were made to wait for the start of their execution stages by that thread.
  • The start of the execution stage of the specific instruction of the thread corresponding to the inhibitor column 36c is restricted by the threads recorded in that column.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 31 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher 31 configured as described above.
  • the dispatcher 31 performs the operations defined in the state transition diagram of FIG. 6 in parallel for each of a plurality of threads.
  • NONE, READY, and EXEC in FIG. 6 indicate the contents of the instruction status column 36a of the thread to be operated.
  • Initially, the instruction status column 36a of the target thread is NONE (S20).
  • When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records the specific instruction in the specific instruction column 36b (S21).
  • At the same time, the dispatcher 31 records information identifying the target thread in the inhibitor column 36c of every other thread whose instruction status column 36a is EXEC or READY, thereby restricting the start of the next execution stage of those other threads (S22).
  • If the inhibitor column 36c of the target thread is not empty, that is, if the start is restricted by another thread, the dispatcher 31 waits without dispatching the specific instruction recorded in the specific instruction column 36b (S23).
  • If the arithmetic unit X53 is IDLE and the inhibitor column 36c of the target thread is empty, that is, if the start is not restricted by another thread, the dispatcher 31 dispatches the specific instruction recorded in the specific instruction column 36b and sets the instruction status column 36a to EXEC (S24).
  • When the execution stage of the dispatched specific instruction ends, the dispatcher 31 removes the information identifying the target thread from the inhibitor columns 36c of the other threads, thereby releasing the restriction on those threads. If the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records that specific instruction in the specific instruction column 36b (S25); if there is no next specific instruction, the dispatcher 31 sets the instruction status column 36a to NONE (S26).
  • By performing such an operation in parallel for each of the plurality of threads, the overall behavior is realized in which a thread that has made another thread wait for the start of its execution stage is itself restricted from starting its next execution stage until the execution stage of the thread it made wait has completed.
  • FIG. 7 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 31 performs the above-described operation.
  • FIG. 7 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) by being written in the specific instruction column 36b when the processing of the upstream stage is completed.
  • A specific instruction that has become startable is dispatched and executed (solid band) if the arithmetic unit X53 is free and its start is not restricted by another thread; if the arithmetic unit X53 is in use, it waits to start (broken band), and if its start is restricted by another thread, it is in the start-restricted state (hatched band).
  • The reference numeral in parentheses displayed for a start-restricted thread indicates the thread that restricts its start.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow. This guaranteed time is the same as the guaranteed time described with reference to FIG. 4.
  • The feature of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while both the thread P and a thread Q different from the thread P are in a state where they can use the shared resource, the shared resource is allocated to the thread Q before the thread P.
  • Such an operation is realized by the dispatcher 31 dispatching the subsequent specific instruction of the first thread (thread P) after the second threads (threads Q and R) have finished using the shared resource.
  • Therefore, no single thread continues to use the shared resource, and the shared resource is allocated evenly to all threads. As a result, the time required to process the specific instruction of each thread can be guaranteed.
  • FIG. 8 is a block diagram illustrating an example of a functional configuration of the processor 12 including the dispatcher 32 as the thread arbitration system according to the third embodiment of the present invention.
  • the processor 12 differs from the processor 10 of the first embodiment in the contents of the control table 37 and the operation of the dispatcher 32.
  • The dispatcher 32 differs from the dispatcher 30 of the first embodiment and the dispatcher 31 of the second embodiment in that a priority is set for each of the plurality of threads and thread arbitration is performed based on the priority.
  • Specifically, when a specific instruction of a thread having a higher priority becomes startable during the processing of the execution stage of a specific instruction of another thread, the dispatcher 32 performs interrupt control in which the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
  • Between threads having the same priority, the dispatcher 32 performs a thread arbitration operation equivalent to that of the dispatcher 30 of the first embodiment or the dispatcher 31 of the second embodiment.
  • In the following, the interrupt control performed by the dispatcher 32 will be described in detail.
  • The control table 37 has an instruction status column 37a, a specific instruction column 37b, and a priority column 37c for each thread.
  • Each column of the control table 37 is configured using a register, for example.
  • the instruction status column 37a holds information indicating that the execution stage of the specific instruction is being executed by the computing unit X53 (EXEC), waiting for start (READY), or there is no specific instruction to be executed (NONE).
  • The specific instruction column 37b holds a specific instruction that is waiting for its execution stage to start or whose execution stage is being executed.
  • the priority column 37c holds a value indicating the priority of the thread. The smaller this value, the higher the priority. The maximum number of priorities is not limited.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 32 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher 32 configured as described above.
  • the dispatcher 32 performs the operations defined in the state transition diagram of FIG. 9 in parallel for each of a plurality of threads.
  • NONE, READY, and EXEC in FIG. 9 indicate the contents of the instruction status column 37a of the thread to be operated.
  • Initially, the instruction status column 37a of the target thread is NONE (S30).
  • When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S31).
  • If the arithmetic unit X53 is in use, the dispatcher 32 compares the priority of the target thread with the priority of the other thread whose instruction status column 37a is EXEC (that is, the thread currently using the arithmetic unit X53), based on the values in the priority columns 37c. If the thread using the arithmetic unit X53 has a priority equal to or higher than that of the target thread, the dispatcher 32 waits without dispatching the specific instruction of the target thread (S32).
  • If the arithmetic unit X53 is IDLE, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S33).
  • If the thread using the arithmetic unit X53 has a lower priority than the target thread, the dispatcher 32 does not wait for the execution stage of the specific instruction currently being processed by the arithmetic unit X53 to end; instead, it dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S34).
  • In this case, the arithmetic unit X53 stops the execution stage of the specific instruction currently being processed and starts processing the execution stage of the newly dispatched specific instruction.
  • When the execution stage ends, if the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S36); if there is no next specific instruction, the dispatcher 32 sets the instruction status column 37a to NONE (S37).
  • By performing such an operation in parallel for each thread, interrupt control is realized in which, when the execution stage of a specific instruction of a higher-priority thread becomes startable during the processing of the execution stage of a specific instruction of another thread, the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
  • FIG. 10 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 32 performs the above-described operation.
  • FIG. 10 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. Here, it is assumed that the priority of the threads P and Q is higher than the priority of the thread R.
  • The execution stage of a specific instruction becomes startable (thick line) by being written into the specific instruction column 37b when the processing of its upstream stages is completed.
  • A specific instruction that has become startable is immediately dispatched and executed (solid band) if the arithmetic unit X53 is free, is dispatched by interrupting and then executed if the arithmetic unit X53 is being used by a lower-priority thread, and waits to start (dashed white line) if the arithmetic unit X53 is being used by a thread of equal or higher priority.
  • The execution stage of the interrupted lower-priority thread is aborted (dashed vertical stripes) and is later dispatched again.
  • The execution stage of a re-dispatched specific instruction may be restarted from the beginning. Alternatively, when the execution stage of a specific instruction is aborted, its progress (the state of the shared resource) may be held in a save resource (for example, a register, not shown), and when the specific instruction is dispatched again, the intermediate state held in the save resource may be returned to the shared resource so that processing continues; a sketch of this save-and-restore idea is given after this list. In that case, it suffices to provide a number of save resources that is one less than the maximum number of priority levels.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow.
  • This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots needed to process the execution stage of a specific instruction.
  • In this example, the processing time of the specific instructions of the two threads P and Q having the highest priority can be guaranteed, and the guaranteed time is 6 time slots, assuming that the execution stage of a specific instruction requires 3 time slots.
  • In other words, the guaranteed time is shortened by reducing the number of threads whose processing time is guaranteed.
  • The feature of this operation is the interrupt control in which a higher-priority thread interrupts a lower-priority thread to acquire the shared resource, as seen in time slot 17 and time slot 22. With such an operation, the processing time of the specific instructions of the lower-priority thread is not guaranteed, but the guaranteed time of the higher-priority threads is shortened.
  • The processors 10, 11, and 12 described above include the dispatchers 30, 31, and 32 as specific thread arbitration systems, respectively, and can guarantee the processing time of the specific instructions of a plurality of threads. They are therefore extremely useful for applications that perform processing requiring real-time performance.
  • In Embodiment 4 of the present invention, a processor system and a video recording/reproducing apparatus will be described as examples of such applications.
  • FIG. 11 is a block diagram showing an example of a functional configuration of the processor system 100 using the processors 10, 11, or 12 according to the fourth embodiment of the present invention.
  • the processor system 100 is a system LSI that performs various signal processing relating to a video / audio stream, and includes the processors 10, 11, or 12 described above.
  • the processor system 100 is used in, for example, a video recording / reproducing apparatus.
  • FIG. 12 is a diagram showing an example of the appearance of the video recording / reproducing apparatus 200 using the processor system 100.
  • The video recording/playback apparatus 200 acquires a video/audio stream from a broadcast wave and displays the broadcast program represented by the video/audio stream on the display device 201 while recording it.
  • As shown in FIG. 11, the processor system 100 includes the processor 10, a stream I/O block 71, an AVIO (Audio Visual Input Output) block 72, and a memory IF block 73.
  • The processor system 100 acquires a video/audio stream from a broadcast wave through the stream I/O block 71, decompresses the video/audio stream into video/audio data with the processor 10, and, in the AVIO block 72, generates a video/audio signal from the video/audio data and outputs it to the display device 201.
  • In parallel with the display, the processor system 100 records the broadcast program.
  • Specifically, the processor 10 compresses the video/audio data into a recording format, and the compressed video/audio data is recorded in the external memory 60 via the memory IF block 73.
  • Here, the time required for each of the video/audio stream decompression process and the video/audio data compression process performed by the processor 10 needs to be estimated accurately.
  • By executing the video playback processing, which includes the video/audio stream decompression, and the video recording processing, which includes the video/audio data compression, as threads on the processor 10, the processing time of their instructions is guaranteed. This makes it possible to accurately estimate the time required for the video/audio stream decompression (and, more generally, the video display processing) and for the video/audio data compression (and, more generally, the video recording processing).
  • FIG. 13 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • This processor has as many computing units as threads that can be processed.
  • each of the plurality of threads can occupy the arithmetic unit, so that the processing time of the thread can be guaranteed.
  • However, if the number of threads changes, the number of arithmetic units must be changed, and there are disadvantages in that the area and power consumption of the processor increase.
  • FIG. 14 is a block diagram illustrating an example of a functional configuration of a processor according to another comparative example.
  • In this processor, the execution stage is divided into as many stages as the number of threads that can be processed.
  • With such a configuration as well, the processing time of the threads can be guaranteed.
  • However, if the number of threads changes, the number of stages must be changed, and there are disadvantages in that the area and power consumption of the processor increase.
  • Thus, the comparative examples lack flexibility of configuration in that as many arithmetic units as threads must be provided or the execution stage must be divided, and the area and power consumption of the processor increase; they therefore do not give a sufficiently satisfactory solution for guaranteeing the processing time.
  • In contrast, a processor including the thread arbitration system according to an embodiment of the present invention may have only one arithmetic unit X53, and the number of stages into which execution is divided may be fixed.
  • Since the processing time of each thread is guaranteed by controlling the execution order of the specific instructions of the threads, there is an advantage that an increase in the area and power consumption of the processor can be suppressed compared with the processors of the comparative examples.
  • the thread arbitration system according to the present invention is useful for applications where it is necessary to guarantee the processing time of each of a plurality of threads in a multi-thread processor, a video recording / reproducing apparatus, and the like.
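As noted in the description of Embodiment 3 above, an aborted execution stage may either restart from the beginning or resume from intermediate state held in a save resource. The following Python sketch illustrates the save-and-restore option only; modelling progress as a count of completed time slots, and all class and function names, are assumptions made for illustration and are not taken from the publication.

```python
class SharedUnit:
    """Toy model of the shared arithmetic unit X53: the progress of the
    execution stage is abstracted as the number of completed time slots."""

    def __init__(self, exec_slots=3):
        self.exec_slots = exec_slots
        self.instr = None
        self.done_slots = 0

    def start(self, instr, done_slots=0):
        self.instr, self.done_slots = instr, done_slots

    def tick(self):
        self.done_slots += 1
        return self.done_slots >= self.exec_slots   # True when finished


class SaveSlot:
    """One save resource (e.g. a register); one less than the maximum number
    of priority levels is enough, since at most that many executions can be
    interrupted at the same time."""
    def __init__(self):
        self.instr = None
        self.done_slots = 0


def preempt(unit, save):
    """Abort the running instruction and checkpoint its progress."""
    save.instr, save.done_slots = unit.instr, unit.done_slots
    unit.instr, unit.done_slots = None, 0


def resume(unit, save):
    """Put the checkpointed intermediate state back into the shared unit."""
    unit.start(save.instr, save.done_slots)
    save.instr, save.done_slots = None, 0


# Example: R1 runs 2 of its 3 slots, is interrupted by P1, then resumes and
# needs only its one remaining slot instead of starting over.
unit, save = SharedUnit(), SaveSlot()
unit.start("R1"); unit.tick(); unit.tick()
preempt(unit, save)                 # R1 checkpointed with 2 slots done
unit.start("P1")
while not unit.tick():
    pass                            # P1 runs to completion
resume(unit, save)
print(unit.instr, "resumes with", unit.done_slots, "slots already done")
```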

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

In the present invention, an arithmetic unit (X) (53), which is a shared resource of a processor (10), is occupied in a time-division manner by a specific instruction contained in each thread (61-63). Each thread enters a state in which it can use the arithmetic unit (X) (53) when the upstream stages (20-40) of the specific instruction are processed in time slots that are sequentially and exclusively allocated to the threads; the thread then occupies the arithmetic unit (X) (53) in order to process the downstream stage of the specific instruction over a plurality of time slots. When a first thread has finished using the arithmetic unit (X) (53), and the first thread and a second thread different from the first thread are each in a state in which they can use the arithmetic unit (X) (53), the thread arbitration system allocates the arithmetic unit (X) (53) to the second thread before the first thread.

Description

Thread arbitration system, processor, video recording/reproducing apparatus, and thread arbitration method
The present invention relates to a thread arbitration system, and more particularly to a thread arbitration system used in a multi-thread processor.
Conventionally, multi-thread processors capable of processing a plurality of threads in a pseudo-parallel manner have been proposed (see, for example, Patent Document 1). A thread refers to a unit of processing executed in a computer system, or a computer program for causing a computer system to execute that processing. The size of a thread (the processing amount or the number of instructions) is determined arbitrarily by the program designer.
FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of such a processor. This processor includes a fetch unit, a dispatcher, a decoder, an arithmetic unit A, and an arithmetic unit B as hardware resources (hereinafter simply referred to as resources). Each instruction is divided into a plurality of stages and pipelined by the resources associated with each stage.
FIG. 15 shows the processing status of the instructions P1, Q1, R1, ... of the three threads P, Q, and R. This processor is configured so that every stage of every instruction is processed in one unit time, and in each stage, instructions of different threads are processed in an orderly manner, one per unit time.
One unit time may be, for example, one clock cycle, or a predetermined plurality of clock cycles. In the following description, one unit time is referred to as one time slot.
In a processor configured in this manner, no contention between threads over the processor's internal resources can occur, so each thread appears to occupy a processor operating at one third of the actual speed. The processing of each instruction of each thread is therefore always completed in a fixed time.
This is extremely useful when the time required to process the instructions of each thread must be guaranteed, for example when each of the plurality of threads performs processing that requires real-time performance.
[Patent Document 1] Japanese Translation of PCT International Application Publication No. 2003-523561
In practice, however, a specific instruction (for example, a division instruction) whose execution stage requires a plurality of time slots may be defined, and the processor may be configured to process the execution stage of such a specific instruction over a plurality of time slots.
As an illustrative example, consider a processor that processes the upstream stages of a specific instruction (for example, a division instruction) in one time slot and processes the execution stage of the specific instruction in three consecutive time slots using a shared resource (for example, a divider). The shared resource is occupied in a time-division manner by a plurality of specific instructions.
In such a processor, unlike the processor described above, a situation may arise in which the specific instructions of a plurality of threads have finished the processing of their upstream stages and are waiting for their execution stages to start. In such a situation, it is not necessarily obvious which thread's specific instruction should be processed next by the shared resource.
As a simple example, consider a scheme in which a thread permitted to start the execution stage of a specific instruction is defined for each time slot, and, when the execution stage of a preceding specific instruction ends, the execution stage of the waiting specific instruction belonging to the thread permitted in that time slot is started.
FIG. 16 is a diagram showing an example of the processing status of the execution stages of specific instructions under this scheme.
FIG. 16 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the specific instruction Q1 of the thread Q, and the specific instruction R1 of the thread R. The execution stage of a specific instruction becomes startable (thick line) when the processing of its upstream stages is completed. A specific instruction whose execution stage has become startable enters execution (solid band) only if the shared resource is free when a time slot in which its thread is permitted to start comes around; until then it waits to start (broken band).
In the example of FIG. 16, the three specific instructions Q1, R1, and P3 have become startable by the end of time slot 6, and in time slot 7, when the arithmetic unit becomes free, the execution stage of the specific instruction P3 of the thread P, which is permitted to start in time slot 7, is started. As a result, the specific instruction Q1 of the thread Q and the specific instruction R1 of the thread R keep waiting, with no way of knowing when they will be able to start.
In other words, a scheme that merely defines which thread is permitted to start the execution stage of a specific instruction in each time slot, as in this example, cannot guarantee the time required to process the instructions of each thread.
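For illustration only (this simulation is not part of the publication), the failure mode can be reproduced with a short Python sketch. The thread names, the three-slot execution stage, and the rotating permission follow the example above; the stream of ready instructions, the helper names, and the alignment of release points with P's permitted slots are assumptions chosen to make the unbounded waiting visible.

```python
# Hypothetical sketch of the fixed "permitted thread per time slot" scheme
# described above.  It is not the publication's method; it only illustrates
# why that scheme cannot bound the waiting time of Q1 and R1.

THREADS = ["P", "Q", "R"]          # permission rotates P, Q, R, P, Q, R, ...
EXEC_SLOTS = 3                     # execution stage of a specific instruction

def simulate(total_slots=30):
    waiting = {"Q": ["Q1"], "R": ["R1"]}            # Q1 and R1 became startable early
    waiting["P"] = [f"P{i}" for i in range(1, 11)]  # P has a steady stream of work
    busy_until = 0                 # slot at which the shared divider frees up
    started = []

    for slot in range(total_slots):
        if slot < busy_until:
            continue               # divider still occupied
        permitted = THREADS[slot % len(THREADS)]
        if waiting[permitted]:     # only the permitted thread may start
            instr = waiting[permitted].pop(0)
            started.append((slot, instr))
            busy_until = slot + EXEC_SLOTS
        # if the permitted thread has nothing ready, the slot is wasted

    return started

if __name__ == "__main__":
    # Because the divider frees up every 3 slots and the permission also
    # rotates with period 3, the same thread is permitted at every release
    # point; with P always ready, Q1 and R1 are never started.
    for slot, instr in simulate():
        print(slot, instr)
```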
However, no suitable method has previously been known for guaranteeing the time required to process the instructions of each thread.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a thread arbitration system that is suitably used in a processor capable of executing a plurality of threads and that can guarantee the time required to process the instructions of each thread.
In order to solve the above conventional problem, a thread arbitration system according to one aspect of the present invention performs arbitration for allocating a shared resource to a plurality of threads, each corresponding to a computer program, in a processor that executes the plurality of threads using the shared resource. In the processor, the shared resource is occupied in a time-division manner by a specific instruction included in each thread; each thread becomes able to use the shared resource when the upstream stages of its specific instruction are processed in time slots allocated to the threads sequentially and exclusively, and thereafter occupies the shared resource over a plurality of time slots for the processing of the downstream stage of the specific instruction. When a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the thread arbitration system allocates the shared resource to the second thread before the first thread.
The thread arbitration system may also be configured so that, when the first thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the specific instruction of the thread that entered that state earliest among the two or more threads is dispatched to the downstream stage.
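A minimal sketch of this first-come-first-served behaviour, assuming the ready specific instructions are simply queued in arrival order (as the FIFO 35a of Embodiment 1, described later, does); the class and function names are illustrative and not part of the publication.

```python
from collections import deque

class FifoArbiter:
    """Hypothetical model of the FIFO-based arbitration: whichever thread
    became able to use the shared resource first is served first, so a
    thread that just released the resource goes to the back."""

    def __init__(self):
        self.ready = deque()      # specific instructions, oldest first
        self.unit_busy = False    # state of the shared arithmetic unit X53

    def on_upstream_done(self, instr):
        """The upstream stage finished: the instruction can now use the
        shared resource, so record it in arrival order."""
        self.ready.append(instr)
        self._try_dispatch()

    def on_execution_done(self):
        """The shared resource was released; hand it to the oldest waiter."""
        self.unit_busy = False
        self._try_dispatch()

    def _try_dispatch(self):
        if not self.unit_busy and self.ready:
            instr = self.ready.popleft()   # oldest ready instruction wins
            self.unit_busy = True
            dispatch_to_downstream(instr)  # placeholder for the real dispatch

def dispatch_to_downstream(instr):
    print("dispatching", instr)

# Example: P1 executes first; Q1, R1 and P2 become ready while it runs, and
# when P1 releases the unit, Q1 (the oldest waiter) is chosen before P2.
arb = FifoArbiter()
arb.on_upstream_done("P1")
arb.on_upstream_done("Q1")
arb.on_upstream_done("R1")
arb.on_upstream_done("P2")
arb.on_execution_done()   # P1 done -> Q1 dispatched (not P2)
```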
The thread arbitration system may also be configured so that, when a second thread different from the first thread becomes able to use the shared resource while the first thread is using it, the subsequent specific instruction of the first thread is dispatched to the downstream stage after the second thread has finished using the shared resource.
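A minimal sketch of this "inhibit the thread that made others wait" reading of the rule. It registers the inhibition only against the thread currently using the resource, which is a simplification of the inhibitor columns 36c of Embodiment 2 described later; all names, and the tie-breaking among uninhibited threads, are illustrative assumptions.

```python
class InhibitArbiter:
    """Hypothetical sketch: while a first thread is using the shared
    resource, every thread that becomes ready inhibits it, so the first
    thread's next specific instruction is dispatched only after those
    threads have finished using the resource."""

    def __init__(self, threads):
        self.ready = {t: None for t in threads}     # pending specific instruction
        self.inhibited_by = {t: set() for t in threads}
        self.using = None                            # thread occupying the unit

    def on_upstream_done(self, thread, instr):
        """`thread` can now use the shared resource."""
        if self.using is not None and self.using != thread:
            # the thread currently using the resource made `thread` wait, so
            # it may not start its next execution stage before `thread` runs
            self.inhibited_by[self.using].add(thread)
        self.ready[thread] = instr
        self._try_dispatch()

    def on_execution_done(self, thread):
        """`thread` released the shared resource and no longer blocks anyone."""
        self.using = None
        for blockers in self.inhibited_by.values():
            blockers.discard(thread)
        self._try_dispatch()

    def _try_dispatch(self):
        if self.using is not None:
            return
        for t, instr in self.ready.items():
            if instr is not None and not self.inhibited_by[t]:
                self.ready[t] = None
                self.using = t
                print("dispatching", instr)          # stand-in for real dispatch
                return

arb = InhibitArbiter(["P", "Q", "R"])
arb.on_upstream_done("P", "P1")   # P1 occupies the free unit
arb.on_upstream_done("Q", "Q1")   # Q now inhibits P's next instruction
arb.on_upstream_done("R", "R1")   # R also inhibits P's next instruction
arb.on_execution_done("P")        # P1 done -> Q1 dispatched
arb.on_upstream_done("P", "P2")   # P2 must wait until Q1 and R1 have run
arb.on_execution_done("Q")        # Q1 done -> R1 dispatched (P2 still inhibited)
arb.on_execution_done("R")        # R1 done -> P2 finally dispatched
```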
With such a configuration, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it, so the shared resource is allocated evenly to all threads; as a result, the specific instruction of each thread can complete its processing within a predetermined guaranteed time.
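Under this even allocation, the worst-case time from when a specific instruction becomes able to use the shared resource until its execution stage completes can be bounded. The bound quoted in the embodiments described later is restated here as a formula (no new claim is added):

$$T_{\text{guarantee}} = N_{\text{threads}} \times S_{\text{exec}}$$

where N_threads is the number of threads whose processing time must be guaranteed and S_exec is the number of time slots the execution stage of a specific instruction occupies. With the three threads P, Q, R and a three-slot execution stage, T_guarantee = 3 × 3 = 9 time slots; when only the two highest-priority threads are guaranteed, as in the priority variant below, T_guarantee = 2 × 3 = 6 time slots.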
The thread arbitration system may also assign a priority to each thread, and when a second thread having a higher priority than the first thread becomes able to use the shared resource while the first thread is using it, the use of the shared resource by the first thread is suspended, the specific instruction of the second thread is dispatched to the downstream stage, and after the second thread has finished using the shared resource, the first thread resumes its use.
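A minimal sketch of the priority variant, assuming that a smaller number means a higher priority (as in the priority column 37c described later) and that an interrupted execution stage simply restarts from the beginning; names and the tie-breaking among waiters are illustrative assumptions.

```python
class PriorityArbiter:
    """Hypothetical sketch: when a higher-priority thread becomes ready while
    a lower-priority thread is using the shared resource, the lower-priority
    execution is stopped and re-dispatched later."""

    def __init__(self, priority):
        self.priority = priority          # e.g. {"P": 0, "Q": 0, "R": 1}
        self.waiting = []                 # (thread, instr) pairs awaiting the unit
        self.using = None                 # (thread, instr) currently on unit X53

    def on_upstream_done(self, thread, instr):
        if self.using is None:
            self._start(thread, instr)
        elif self.priority[thread] < self.priority[self.using[0]]:
            # interrupt the lower-priority thread; its instruction is aborted
            # and queued again (restarting from the beginning in this sketch)
            self.waiting.append(self.using)
            self._start(thread, instr)
        else:
            self.waiting.append((thread, instr))   # equal or lower: wait

    def on_execution_done(self):
        self.using = None
        if self.waiting:
            # among the waiters, pick the highest priority (oldest on a tie)
            best = min(range(len(self.waiting)),
                       key=lambda i: self.priority[self.waiting[i][0]])
            thread, instr = self.waiting.pop(best)
            self._start(thread, instr)

    def _start(self, thread, instr):
        self.using = (thread, instr)
        print("running", instr, "of thread", thread)

arb = PriorityArbiter({"P": 0, "Q": 0, "R": 1})
arb.on_upstream_done("R", "R1")   # R starts on the free unit
arb.on_upstream_done("P", "P1")   # P outranks R: R1 is aborted, P1 runs
arb.on_execution_done()           # P1 done; R1 is re-dispatched
```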
With such a configuration, by guaranteeing the processing time only for the threads having the highest priority, a shorter processing time can be guaranteed for the specific instructions of the highest-priority threads, at the cost of not guaranteeing the processing time of lower-priority threads.
A processor according to one aspect of the present invention may include the above-described thread arbitration system.
With such a configuration, a processor that can guarantee the time required by a plurality of threads is obtained.
A video recording/reproducing apparatus according to one aspect of the present invention may include the above-described processor and may perform video recording processing in a first thread and video playback processing in a second thread among the plurality of threads.
With such a configuration, the time required for the video recording processing and the video playback processing can be estimated accurately, which is effective in avoiding the dropped video that would result if the time required for those processes could not be estimated.
The present invention can be realized not only as such a thread arbitration system, a processor, and a video recording/reproducing apparatus, but also as a thread arbitration method.
According to the thread arbitration system of the present invention, when a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the shared resource is allocated to the second thread before the first thread; therefore, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it.
The shared resource is therefore allocated evenly to all threads, and as a result, the time required for the processing of each thread can be guaranteed.
FIG. 1 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing an example of the specific configuration of the dispatcher according to Embodiment 1 of the present invention.
FIG. 3 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 2 of the present invention.
FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 2 of the present invention.
FIG. 8 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 3 of the present invention.
FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 3 of the present invention.
FIG. 10 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 3 of the present invention.
FIG. 11 is a block diagram showing an example of the functional configuration of a processor system according to Embodiment 4 of the present invention.
FIG. 12 is a diagram showing an example of the appearance of a video recording/reproducing apparatus using the processor system according to Embodiment 4 of the present invention.
FIG. 13 is a block diagram showing an example of the functional configuration of a processor according to a comparative example.
FIG. 14 is a block diagram showing an example of the functional configuration of a processor according to another comparative example.
FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of a conventional processor.
FIG. 16 is a diagram for explaining a problem in the processing of the execution stage of a specific instruction.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings.
 (Embodiment 1)
 FIG. 1 is a block diagram illustrating an example of the functional configuration of a processor 10 including a dispatcher 30 as a thread arbitration system according to Embodiment 1 of the present invention. FIG. 1 shows the processor 10 together with a memory 60 accessed by the processor 10.
 The processor 10 is a processor that can process a plurality of threads in a pseudo-parallel manner, and includes a fetch unit 20, a dispatcher 30, a decoder 40, an arithmetic unit A51, an arithmetic unit B52, an arithmetic unit X53, and a signal line 58.
 The memory 60 holds a thread P61, a thread Q62, and a thread R63. The thread P61, the thread Q62, and the thread R63 are each a computer program executed by the processor 10.
 The fetch unit 20 fetches the instructions of the thread P61, the thread Q62, and the thread R63 from the memory 60 and sequentially supplies the fetched instructions to the dispatcher 30.
 The dispatcher 30 functions as the thread arbitration system of the present invention by dispatching the instructions supplied from the fetch unit 20 in a predetermined order. Information related to thread arbitration is recorded in the control table 35. The instructions dispatched by the dispatcher 30 are delivered to the decoder 40.
 The decoder 40 identifies the type of each instruction delivered from the dispatcher 30 by decoding it and, according to the identified type, causes one of the arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 to process the execution stage of the instruction.
 The arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 process the execution stages of instructions (for example, arithmetic operations and logical operations).
 As described in the background art, the fetch unit 20, the dispatcher 30, the decoder 40, the arithmetic unit A51, and the arithmetic unit B52 of the processor 10 are each configured to process the instruction stage for which they are responsible in one time slot, and in these stages the instructions of different threads are processed in an orderly manner, one time slot at a time.
 Therefore, an instruction whose execution stage is processed by the arithmetic unit A51 or the arithmetic unit B52 always completes its processing in a fixed time. Since this operation is outside the scope of the present invention, its description is omitted.
 On the other hand, the arithmetic unit X53 processes the execution stage of a specific instruction over a plurality of time slots. As one example, the specific instruction may be a division instruction, and the arithmetic unit X53 may be a divider that processes the execution stage of the division instruction.
 In this specification, an instruction whose execution stage is processed over a plurality of time slots is generally called a specific instruction. The arithmetic unit X53 is an example of the shared resource of the present invention, and is occupied by each thread in a time-division manner in order to process the execution stage of a specific instruction.
 In the processor 10 configured as described above, a situation can arise in which specific instructions of a plurality of threads have finished their upstream stages and are waiting for the start of their execution stages.
 In this situation, when the execution stage of a preceding specific instruction of a first thread ends while a subsequent specific instruction of the first thread and a specific instruction of a second thread different from the first thread are each waiting for the start of the execution stage, the dispatcher 30 dispatches the specific instruction of the second thread before the subsequent specific instruction of the first thread.
 Viewed as a thread arbitration system, this operation of the dispatcher 30 is equivalent to allocating the shared resource, when the first thread finishes using the shared resource while the first thread and the second thread are each ready to use the shared resource, to the second thread before the first thread.
 A more specific configuration and operation of the dispatcher 30 will now be described.
 FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher 30. In this example, the control table 35 is configured as a FIFO (First-In First-Out) 35a that can temporarily hold specific instructions.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 30 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 3 is a state transition diagram that defines an example of the operation of the dispatcher 30. In FIG. 3, EMPTY indicates a state in which the FIFO 35a is empty, and EXIST indicates a state in which one or more specific instructions are held in the FIFO 35a. The curved arrows indicate state transitions, and the label attached to each arrow shows, separated by a slash, the condition under which the transition occurs and the operation performed by the dispatcher 30 at the transition (only when there is such an operation). The dispatcher 30 operates as follows according to the state transition diagram shown in FIG. 3.
 Before any specific instruction has been supplied from the fetch unit 20, the FIFO 35a is empty (S10). When a specific instruction is supplied from the fetch unit 20, the dispatcher 30 writes the specific instruction into the FIFO 35a (S11). When a further specific instruction is supplied from the fetch unit 20, the dispatcher 30 also writes that specific instruction into the FIFO 35a (S12). While the arithmetic unit X53 is BUSY, the specific instructions in the FIFO 35a are held there without being dispatched (S13).
 As soon as the arithmetic unit X53 becomes IDLE, the dispatcher 30 reads the specific instruction at the head of the FIFO 35a and dispatches it (S14, S15). As a result, the specific instruction that became ready to use the arithmetic unit X53, which is the shared resource, earliest is dispatched to the downstream stage. When the dispatcher 30 reads and dispatches the last specific instruction from the FIFO 35a, the FIFO 35a becomes empty (S15).
 In accordance with the state transition diagram of FIG. 3, when a thread finishes using the shared resource while two or more threads are each ready to use the shared resource, the dispatcher 30 dispatches the specific instruction of the thread, among those two or more threads, that became ready to use the shared resource earliest.
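 As an aid to understanding only, this ready-order arbitration can be sketched in C as follows. The sketch is not part of the disclosed hardware: the names (SpecificInsn, PendingFifo, arbitrate_one_slot, dispatch) and the FIFO depth are invented for this illustration, and it assumes a single shared execution unit whose IDLE/BUSY state is supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 8

/* One pending specific instruction, identified by the thread it belongs to. */
typedef struct {
    int thread_id;
    int opcode;
} SpecificInsn;

/* Control table 35 modeled as a ring-buffer FIFO (35a). */
typedef struct {
    SpecificInsn slots[FIFO_DEPTH];
    size_t head;
    size_t count;
} PendingFifo;

static bool fifo_push(PendingFifo *f, SpecificInsn insn)
{
    if (f->count == FIFO_DEPTH)
        return false;                          /* no room: stall the fetch side */
    f->slots[(f->head + f->count) % FIFO_DEPTH] = insn;
    f->count++;
    return true;                               /* EMPTY -> EXIST (S11) or stay EXIST (S12) */
}

static bool fifo_pop(PendingFifo *f, SpecificInsn *out)
{
    if (f->count == 0)
        return false;                          /* EMPTY: nothing is ready */
    *out = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Called once per time slot.  If the shared execution unit is IDLE, the
 * instruction that has been ready the longest (the FIFO head) is dispatched
 * (S14, S15); while the unit is BUSY, every pending instruction simply
 * stays in the FIFO (S13). */
static void arbitrate_one_slot(PendingFifo *pending, bool unit_idle,
                               void (*dispatch)(SpecificInsn))
{
    SpecificInsn next;
    if (unit_idle && fifo_pop(pending, &next))
        dispatch(next);                        /* the unit becomes BUSY */
}
```

 Because the FIFO preserves the order in which instructions became ready, the thread that has waited longest is always served next, which is exactly the property that bounds each thread's waiting time.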
 FIG. 4 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 30 performs the operation described above.
 FIG. 4 shows the processing status of the execution stages of three specific instructions P1, P2, and P3 of the thread P, three specific instructions Q1, Q2, and Q3 of the thread Q, and two specific instructions R1 and R2 of the thread R. The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the FIFO 35a upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed immediately (solid band) if the arithmetic unit X53 is free, and waits for the start (dashed band) if the arithmetic unit X53 is in use.
 FIG. 4 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. The guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots required to process the execution stage of a specific instruction.
 Here, the processing time of the specific instructions of the three threads P, Q, and R can be guaranteed, and since the execution stage of a specific instruction requires 3 time slots, the guaranteed time is 9 time slots.
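 Purely as a worked restatement of this arithmetic (the function name is invented for the illustration), the bound can be written as:

```c
/* Upper bound, in time slots, on the time from the moment a specific
 * instruction becomes ready until its execution stage completes. */
static unsigned guaranteed_time_slots(unsigned guaranteed_threads,
                                      unsigned exec_slots_per_insn)
{
    return guaranteed_threads * exec_slots_per_insn;
}

/* Embodiment 1: 3 threads (P, Q, R) x 3 slots per execution stage = 9 slots.
 * Embodiment 3 below: 2 highest-priority threads (P, Q) x 3 slots = 6 slots. */
```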
 A characteristic of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while the thread P and a thread Q different from the thread P are both ready to use the shared resource, the shared resource is allocated to the thread Q before the thread P.
 Such an operation is realized by the dispatcher 30 allocating the shared resource, when a thread (thread P) finishes using it, to the thread (thread Q) that became ready to use the shared resource earliest among the two or more threads (threads P, Q, and R) that are ready to use it.
 As a result, when there are a plurality of threads ready to use the shared resource, no single thread keeps using the shared resource, so the shared resource is allocated evenly to all threads, and consequently the time required for processing the specific instructions of each thread can be guaranteed.
 (Embodiment 2)
 FIG. 5 is a block diagram illustrating an example of the functional configuration of a processor 11 including a dispatcher 31 as a thread arbitration system according to Embodiment 2 of the present invention. The processor 11 differs from the processor 10 of Embodiment 1 in the contents of the control table 36 and the operation of the dispatcher 31.
 Hereinafter, components identical to those described in Embodiment 1 are given the same reference numerals and their description is omitted as appropriate; the differences from Embodiment 1 are mainly described.
 The operation of the dispatcher 31 as a thread arbitration system is the same as that of the dispatcher 30 of Embodiment 1 in that, when a first thread finishes using the shared resource while the first thread and a second thread are each ready to use the shared resource, the shared resource is allocated to the second thread before the first thread.
 However, the dispatcher 31 differs from the dispatcher 30 of Embodiment 1 in that, to achieve this operation, a thread that has been made to wait for the start of its execution stage by another thread restricts (inhibits) the start of the next execution stage of the thread that made it wait, until its own execution stage is completed.
 The control table 36 has, for each thread, an instruction status column 36a, a specific instruction column 36b, and an inhibitor column 36c. Each column of the control table 36 is configured using, for example, registers.
 The instruction status column 36a holds information indicating that the execution stage of a specific instruction is being executed by the arithmetic unit X53 (EXEC), is waiting to start (READY), or that there is no specific instruction to execute (NONE). The specific instruction column 36b holds the specific instruction that is waiting for the start of its execution stage or being executed. The inhibitor column 36c holds information identifying the other threads that have been made to wait for the start of their execution stages by the thread corresponding to that inhibitor column 36c. The start of the execution stage of the specific instruction of the thread corresponding to the inhibitor column 36c is restricted by the threads recorded in that inhibitor column 36c.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 31 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher 31 configured as described above. The dispatcher 31 performs the operations defined in the state transition diagram of FIG. 6 in parallel for each of the plurality of threads. NONE, READY, and EXEC in FIG. 6 indicate the contents of the instruction status column 36a of the thread that is the target of the operation.
 Before any specific instruction has been supplied from the fetch unit 20, the instruction status column 36a is NONE (S20). When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records the specific instruction in the specific instruction column 36b (S21).
 If the arithmetic unit X53 is BUSY, the dispatcher 31 records information identifying the target thread in the inhibitor columns 36c of the other threads whose instruction status columns 36a are EXEC or READY, thereby restricting the start of the execution stages of those other threads (S22).
 Even if the arithmetic unit X53 is IDLE, if the inhibitor column 36c of the target thread is not empty, that is, if the start is restricted by another thread, the dispatcher 31 waits without dispatching the specific instruction recorded in the specific instruction column 36b (S23).
 If the arithmetic unit X53 is IDLE and the inhibitor column 36c of the target thread is empty, that is, if the start is not restricted by any other thread, the dispatcher 31 dispatches the specific instruction recorded in the specific instruction column 36b and sets the instruction status column 36a to EXEC (S24).
 Thereafter, when the arithmetic unit X53 becomes IDLE, the dispatcher 31 deletes the information identifying the target thread from the inhibitor columns 36c of the other threads, thereby releasing the restriction on the other threads. Then, if the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records that specific instruction in the specific instruction column 36b (S25); if there is no next specific instruction, the dispatcher 31 sets the instruction status column 36a to NONE (S26).
 By performing such an operation in parallel for each of the plurality of threads, the overall operation is realized in which a thread that has been made to wait for the start of its execution stage by another thread restricts the start of the next execution stage of the thread that made it wait, until its own execution stage is completed.
 In accordance with the state transition diagram of FIG. 6, when a second thread different from a first thread becomes ready to use the shared resource while the first thread is using the shared resource, the dispatcher 31 dispatches the subsequent specific instruction of the first thread after the second thread has finished using the shared resource.
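 The following C sketch, offered only as an aid to understanding, mirrors the control table 36 and the transitions of FIG. 6. The type and function names are invented for this illustration, and the surrounding dispatch loop that would call these helpers each time slot is omitted.

```c
#include <stdbool.h>

#define NUM_THREADS 3

typedef enum { NONE, READY, EXEC } InsnState;

/* One row of control table 36. */
typedef struct {
    InsnState state;                      /* instruction status column 36a   */
    int       pending_insn;               /* specific instruction column 36b */
    bool      inhibited_by[NUM_THREADS];  /* inhibitor column 36c            */
} ThreadEntry;

static bool is_inhibited(const ThreadEntry *t)
{
    for (int i = 0; i < NUM_THREADS; i++)
        if (t->inhibited_by[i])
            return true;
    return false;
}

/* S21/S22: a thread whose specific instruction arrives while the shared unit
 * is BUSY records itself as an inhibitor of every thread that is already
 * READY or EXEC, i.e. of the threads that made it wait. */
static void on_ready(ThreadEntry table[], int self, int insn, bool unit_busy)
{
    table[self].state = READY;
    table[self].pending_insn = insn;
    if (!unit_busy)
        return;
    for (int i = 0; i < NUM_THREADS; i++)
        if (i != self && table[i].state != NONE)
            table[i].inhibited_by[self] = true;
}

/* S23/S24: a READY thread may be dispatched only when the shared unit is
 * IDLE and no other thread is still inhibiting it. */
static int pick_dispatchable(const ThreadEntry table[], bool unit_idle)
{
    if (!unit_idle)
        return -1;
    for (int i = 0; i < NUM_THREADS; i++)
        if (table[i].state == READY && !is_inhibited(&table[i]))
            return i;
    return -1;
}

/* S25/S26: when a thread's execution stage finishes, it clears itself from
 * every inhibitor column, releasing the threads it had been inhibiting. */
static void on_exec_done(ThreadEntry table[], int self)
{
    for (int i = 0; i < NUM_THREADS; i++)
        table[i].inhibited_by[self] = false;
    table[self].state = NONE;   /* or READY again if a next instruction arrived */
}
```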
 FIG. 7 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 31 performs the operation described above.
 FIG. 7 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the specific instruction column 36b upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed (solid band) if the arithmetic unit X53 is free and its start is not restricted by another thread, waits for the start (dashed white band) if the arithmetic unit X53 is in use, and is under start restriction (dashed hatched band) if its start is restricted by another thread. The parenthesized reference sign shown for a thread under start restriction indicates the thread that is restricting its start.
 FIG. 7 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. This guaranteed time is the same as the guaranteed time described with reference to FIG. 4.
 A characteristic of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while the thread P and a thread Q different from the thread P are both ready to use the shared resource, the shared resource is allocated to the thread Q before the thread P.
 Such an operation is realized by the dispatcher 31 dispatching the subsequent specific instruction of the first thread (thread P), when a second thread (threads Q and R) different from the first thread becomes ready to use the shared resource while the first thread is using it, only after the second thread (threads Q and R) has finished using the shared resource.
 As a result, when there are a plurality of threads ready to use the shared resource, no single thread keeps using the shared resource, so the shared resource is allocated evenly to all threads, and consequently the time required for processing the specific instructions of each thread can be guaranteed.
 (Embodiment 3)
 FIG. 8 is a block diagram illustrating an example of the functional configuration of a processor 12 including a dispatcher 32 as a thread arbitration system according to Embodiment 3 of the present invention. The processor 12 differs from the processor 10 of Embodiment 1 in the contents of the control table 37 and the operation of the dispatcher 32.
 Hereinafter, components identical to those described in Embodiment 1 are given the same reference numerals and their description is omitted as appropriate; the differences from Embodiments 1 and 2 are mainly described.
 The operation of the dispatcher 32 as a thread arbitration system differs from that of the dispatcher 30 of Embodiment 1 and the dispatcher 31 of Embodiment 2 in that a priority is defined for each of the plurality of threads and thread arbitration is performed based on the priorities.
 When the execution stage of a specific instruction of a higher-priority thread becomes ready to start during the processing of the execution stage of a specific instruction of another thread, the dispatcher 32 performs interrupt control that stops the execution stage being processed and starts the execution stage of the specific instruction of the higher-priority thread.
 Between threads having the same priority, the dispatcher 32 performs a thread arbitration operation equivalent to that of the dispatcher 30 of Embodiment 1 or the dispatcher 31 of Embodiment 2; the interrupt control performed by the dispatcher 32 is described in detail below.
 The control table 37 has, for each thread, an instruction status column 37a, a specific instruction column 37b, and a priority column 37c. Each column of the control table 37 is configured using, for example, registers.
 The instruction status column 37a holds information indicating that the execution stage of a specific instruction is being executed by the arithmetic unit X53 (EXEC), is waiting to start (READY), or that there is no specific instruction to execute (NONE). The specific instruction column 37b holds the specific instruction that is waiting for the start of its execution stage or being executed. The priority column 37c holds a value indicating the priority of the thread; a smaller value indicates a higher priority. The maximum number of priority levels is not limited.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 32 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher 32 configured as described above. The dispatcher 32 performs the operations defined in the state transition diagram of FIG. 9 in parallel for each of the plurality of threads. NONE, READY, and EXEC in FIG. 9 indicate the contents of the instruction status column 37a of the thread that is the target of the operation.
 Before any specific instruction has been supplied from the fetch unit 20, the instruction status column 37a is NONE (S30). When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S31).
 If the arithmetic unit X53 is BUSY, the dispatcher 32 compares the priority of the target thread with the priority of the other thread whose instruction status column 37a is EXEC (that is, the thread currently using the arithmetic unit X53), based on the values in the priority columns 37c. If the thread using the arithmetic unit X53 is a peer thread having a priority equal to that of the target thread or a higher-priority thread, the dispatcher 32 waits without dispatching the specific instruction of the target thread (S32).
 If the arithmetic unit X53 is IDLE, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S33).
 If the arithmetic unit X53 is BUSY and is being used by a lower-priority thread whose priority is lower than that of the target thread, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b without waiting for the execution stage of the specific instruction currently being processed by the arithmetic unit X53 to finish, and sets the instruction status column 37a to EXEC (S34).
 When the new specific instruction is dispatched, the arithmetic unit X53 aborts the execution stage of the specific instruction currently being processed and starts processing the execution stage of the new specific instruction.
 When the instruction status column 37a is EXEC and a higher-priority thread becomes READY, the processing in the arithmetic unit X53 is aborted by the interrupt from that higher-priority thread, so the dispatcher 32 sets the instruction status column 37a back to READY (S35).
 When the instruction status column 37a is EXEC and the arithmetic unit X53 becomes IDLE, that is, when the processing in the arithmetic unit X53 is completed, the dispatcher 32 sets the instruction status column 37a to READY and records the next specific instruction of the target thread in the specific instruction column 37b if it has been supplied from the fetch unit 20 (S36); if there is no next specific instruction, the dispatcher 32 sets the instruction status column 37a to NONE (S37).
 By performing such an operation in parallel for each of the plurality of threads, interrupt control is realized in which, when the execution stage of a specific instruction of a higher-priority thread becomes ready to start during the processing of the execution stage of a specific instruction of another thread, the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
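 Again only as an illustrative sketch with invented names (the restart or resumption of an aborted stage, described below, is not modeled), the priority-based selection can be expressed as:

```c
#define NUM_THREADS 3

typedef enum { NONE, READY, EXEC } InsnState;

/* One row of control table 37. */
typedef struct {
    InsnState state;        /* instruction status column 37a          */
    int       pending_insn; /* specific instruction column 37b        */
    unsigned  priority;     /* priority column 37c: smaller = higher  */
} PrioEntry;

/* Decide which thread should own the shared unit in the next time slot.
 * current_owner is the index of the thread in EXEC, or -1 if the unit is
 * IDLE.  A READY thread preempts only a strictly lower-priority owner
 * (S32/S34); among equal-priority READY threads the arbitration of
 * Embodiment 1 or 2 applies, which this sketch simplifies to "first found". */
static int arbitrate_with_priority(const PrioEntry table[], int current_owner)
{
    int best = current_owner;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (table[i].state != READY)
            continue;
        if (best < 0 || table[i].priority < table[best].priority)
            best = i;
    }
    /* If best differs from current_owner, the owner's stage is aborted
     * (EXEC -> READY, S35) and the winner is dispatched (S34). */
    return best;
}
```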
 FIG. 10 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 32 performs the operation described above.
 FIG. 10 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. Here, it is assumed that the priorities of the threads P and Q are higher than the priority of the thread R.
 The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the specific instruction column 37b upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed immediately if the arithmetic unit X53 is free, is dispatched by interrupting and then executed (solid band) if the arithmetic unit X53 is being used by a lower-priority thread, and waits for the start (dashed white band) if the arithmetic unit X53 is being used by a peer or higher-priority thread. The execution stage of an interrupted lower-priority thread is aborted (dashed vertically striped band) and dispatched again later.
 The execution stage of a specific instruction that is dispatched again may be restarted from the beginning. Alternatively, the intermediate progress (the state of the shared resource) at the moment the execution stage is aborted may be saved to a save resource (for example, a register not shown), and when the specific instruction is dispatched again, the intermediate progress held in the save resource may be restored to the shared resource so that the processing continues from where it left off. It suffices to provide a number of save resources equal to the maximum number of priority levels minus one.
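 A minimal sketch of this optional save/restore variant, again with invented names and placeholder fields (the number of priority levels is assumed for the illustration):

```c
#include <stdbool.h>

#define MAX_PRIORITY_LEVELS 4   /* assumed value for this sketch */

/* Partial state of the shared unit at the moment a stage is aborted, e.g.
 * the intermediate remainder/quotient and cycle count of a multi-cycle
 * divide; the fields are placeholders. */
typedef struct {
    unsigned partial_result[2];
    unsigned cycles_done;
    bool     valid;
} UnitSnapshot;

/* A stage can only be aborted by a strictly higher-priority thread, so at
 * most (MAX_PRIORITY_LEVELS - 1) snapshots can be live at once: one save
 * slot per priority level except the highest is sufficient. */
static UnitSnapshot save_slots[MAX_PRIORITY_LEVELS - 1];
```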
 FIG. 10 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots required to process the execution stage of a specific instruction.
 Here, the processing time of the specific instructions of the two threads P and Q, which have the highest priority, can be guaranteed, and since the execution stage of a specific instruction requires 3 time slots, the guaranteed time is 6 time slots. Compared with the examples of Embodiments 1 and 2, the guaranteed time is shortened because the number of threads whose processing time is guaranteed is smaller.
 A characteristic of this operation is the interrupt control in which a higher-priority thread interrupts a lower-priority thread to acquire the shared resource, as seen in time slot 17 and time slot 22. Because of this operation, the processing time of the specific instructions of the lower-priority thread is not guaranteed, but in exchange the guaranteed time of the higher-priority threads is shortened.
 (Embodiment 4)
 The processors 10, 11, and 12 described above include the dispatchers 30, 31, and 32, respectively, each serving as a distinctive thread arbitration system, and can guarantee the processing time of the specific instructions of a plurality of threads. They are therefore extremely useful for applications in which each of a plurality of threads performs processing that requires real-time performance.
 In Embodiment 4 of the present invention, a processor system and a video recording/reproducing apparatus are described as an example of such an application.
 FIG. 11 is a block diagram showing an example of the functional configuration of a processor system 100 using the processor 10, 11, or 12 according to Embodiment 4 of the present invention.
 The processor system 100 is a system LSI that performs various kinds of signal processing relating to video/audio streams, and includes the processor 10, 11, or 12 described above. The processor system 100 is used in, for example, a video recording/reproducing apparatus.
 FIG. 12 is a diagram showing an example of the appearance of a video recording/reproducing apparatus 200 using the processor system 100. As one typical example, the video recording/reproducing apparatus 200 acquires a video/audio stream from a broadcast wave and displays the broadcast program represented by the video/audio stream on a display device 201 while recording the broadcast program.
 As shown in FIG. 11, the processor system 100 includes the processor 10, a stream I/O block 71, an AVIO (Audio Visual Input Output) block 72, and a memory IF block 73.
 To display the broadcast program on the display device 201, the processor system 100, for example, acquires the video/audio stream from the broadcast wave with the stream I/O block 71, decompresses the video/audio stream into video/audio data with the processor 10, and generates a video/audio signal from the video/audio data with the AVIO block 72 to output it to the display device 201.
 To record the broadcast program in parallel with the above display, the processor system 100, for example, compresses the video/audio data into a recording format with the processor 10 and records the compressed video/audio data in the external memory 60 via the memory IF block 73.
 In such processing, in order to prevent dropouts in the display and recording of the broadcast program (so-called frame dropping), the time required for each of the video/audio stream decompression processing and the video/audio data compression processing performed by the processor 10 must be estimated accurately.
 Therefore, the processing time of the instructions is guaranteed by executing, as threads on the processor 10, the video reproduction processing that includes the decompression of the video/audio stream and the video recording processing that includes the compression of the video/audio data. This makes it possible to accurately estimate the time required for each of the video/audio stream decompression processing (more broadly, the video display processing) and the video/audio data compression processing (more broadly, the video recording processing).
 (Description of effects by comparison with comparative examples)
 The superiority of the thread arbitration system according to the embodiments of the present invention is further described below using, as comparative examples, processors that guarantee the processing time of the specific instructions of a plurality of threads with configurations different from those of the embodiments of the present invention.
 FIG. 13 is a block diagram illustrating an example of the functional configuration of a processor according to a comparative example. This processor has as many arithmetic units as the number of threads it can process. In a processor configured in this way, each of the plurality of threads can occupy its own arithmetic unit, so the processing time of the threads can be guaranteed. However, there are disadvantages in that the number of arithmetic units must be changed when the number of threads changes, and the area and power consumption of the processor increase.
 FIG. 14 is a block diagram illustrating an example of the functional configuration of a processor according to another comparative example. In this processor, the execution stage is divided into as many stages as the number of threads it can process. In a processor configured in this way, each of the plurality of threads is processed while occupying one of the divided stages, so the processing time of the threads can be guaranteed. However, there are disadvantages in that the number of stages must be changed when the number of threads changes, and the area and power consumption of the processor increase.
 These processors lack flexibility of configuration in that arithmetic units equal in number to the threads must be provided or the execution stage must be divided accordingly, and the area and power consumption of the processor increase; they therefore do not provide a sufficiently satisfactory solution for guaranteeing the processing time of each thread.
 Compared with these processors, in a processor including the thread arbitration system according to the embodiments of the present invention, a single arithmetic unit X53 suffices, and the number of stages into which the execution stage is divided can remain fixed. Moreover, since the processing time of each thread is guaranteed by controlling the execution order of the specific instructions of each thread, there is an advantage that the increase in processor area and power consumption can be suppressed compared with the processors of the comparative examples.
 The thread arbitration system according to the present invention is useful for applications in which the processing time of each of a plurality of threads needs to be guaranteed, such as multi-thread processors and video recording/reproducing apparatuses.
  10, 11, 12  Processor
  20  Fetch unit
  30, 31, 32  Dispatcher
  35, 36, 37  Control table
  40  Decoder
  51  Arithmetic unit A
  52  Arithmetic unit B
  53  Arithmetic unit X
  58  Signal line
  59  Signal line
  60  Memory
  61  Thread P
  62  Thread Q
  63  Thread R
  71  Stream I/O block
  72  AVIO block
  73  Memory IF block
 100  Processor system
 200  Video recording/reproducing apparatus
 201  Display device

Claims (7)

  1.  A thread arbitration system that, in a processor which executes a plurality of threads each corresponding to a computer program by using a shared resource, performs arbitration for allocating the shared resource to the plurality of threads, wherein
     in the processor,
     the shared resource is occupied in a time-division manner by specific instructions included in each of the threads, and
     each of the threads becomes ready to use the shared resource when an upstream stage of its specific instruction is processed in a time slot sequentially and exclusively allocated to that thread, and thereafter occupies the shared resource over a plurality of time slots for the processing of a downstream stage of the specific instruction, and
     the thread arbitration system
     allocates, when a first thread among the plurality of threads finishes using the shared resource while the first thread and a second thread different from the first thread are each ready to use the shared resource, the shared resource to the second thread before the first thread.
  2.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system dispatches to a downstream stage, when the first thread among the plurality of threads finishes using the shared resource while two or more threads are each ready to use the shared resource, the specific instruction of the thread that, among the two or more threads, became ready to use the shared resource earliest.
  3.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system dispatches, when a second thread different from the first thread becomes ready to use the shared resource while the first thread among the plurality of threads is using the shared resource, a subsequent specific instruction of the first thread to a downstream stage after the second thread has finished using the shared resource.
  4.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system defines a priority for each of the threads,
     stops, when a second thread having a higher priority than the first thread becomes ready to use the shared resource while the first thread among the plurality of threads is using the shared resource, the use of the shared resource by the first thread and dispatches the specific instruction of the second thread to a downstream stage, and
     causes the first thread to resume using the shared resource after the second thread has finished using the shared resource.
  5.  A processor comprising the thread arbitration system according to any one of claims 1 to 4.
  6.  A video recording/reproducing apparatus comprising the processor according to claim 5, wherein video recording processing is performed by a first thread among the plurality of threads and video reproduction processing is performed by a second thread.
  7.  A thread arbitration method for performing, in a processor which executes a plurality of threads each corresponding to a computer program by using a shared resource, arbitration for allocating the shared resource to the plurality of threads, wherein
     in the processor,
     the shared resource is occupied in a time-division manner by specific instructions included in each of the threads, and
     each of the threads becomes ready to use the shared resource when an upstream stage of its specific instruction is processed in a time slot sequentially and exclusively allocated to that thread, and thereafter occupies the shared resource over a plurality of time slots for the processing of a downstream stage of the specific instruction, and
     the thread arbitration method comprises
     allocating, when a first thread among the plurality of threads finishes using the shared resource while the first thread and a second thread different from the first thread are each ready to use the shared resource, the shared resource to the second thread before the first thread.
PCT/JP2011/004727 2010-08-25 2011-08-25 Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method WO2012026124A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-188745 2010-08-25
JP2010188745A JP2012048399A (en) 2010-08-25 2010-08-25 Thread arbitration system, processor, video recording and reproducing device, and thread arbitration method

Publications (1)

Publication Number Publication Date
WO2012026124A1 true WO2012026124A1 (en) 2012-03-01

Family

ID=45723146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/004727 WO2012026124A1 (en) 2010-08-25 2011-08-25 Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method

Country Status (2)

Country Link
JP (1) JP2012048399A (en)
WO (1) WO2012026124A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006343872A (en) * 2005-06-07 2006-12-21 Keio Gijuku Multithreaded central operating unit and simultaneous multithreading control method
JP2007507805A (en) * 2003-10-01 2007-03-29 インテル・コーポレーション Method and apparatus for enabling thread execution in a multi-threaded computer system
JP2007533007A (en) * 2004-04-07 2007-11-15 サンドブリッジ テクノロジーズ インコーポレーテッド Multi-thread processor with multiple simultaneous pipelines per thread


Also Published As

Publication number Publication date
JP2012048399A (en) 2012-03-08

Similar Documents

Publication Publication Date Title
US10891158B2 (en) Task scheduling method and apparatus
JP5097251B2 (en) Method for reducing energy consumption in buffered applications using simultaneous multithreading processors
JP6199477B2 (en) System and method for using a hypervisor with a guest operating system and virtual processor
KR100591727B1 (en) Recording media and information processing systems recording scheduling methods and programs for executing the methods
US8407454B2 (en) Processing long-latency instructions in a pipelined processor
US8161491B2 (en) Soft real-time load balancer
EP2593862B1 (en) Out-of-order command execution in a multimedia processor
US6944850B2 (en) Hop method for stepping parallel hardware threads
US9170841B2 (en) Multiprocessor system for comparing execution order of tasks to a failure pattern
US10545892B2 (en) Multi-thread processor and its interrupt processing method
JP2008123045A (en) Processor
US9588808B2 (en) Multi-core system performing packet processing with context switching
KR20050000487A (en) Scheduling method and realtime processing system
KR20050011689A (en) Method and system for performing real-time operation
US20130347000A1 (en) Computer, virtualization mechanism, and scheduling method
KR20130066900A (en) Method to guarantee real time for soft real time operating system
US8225320B2 (en) Processing data using continuous processing task and binary routine
WO2005048009A2 (en) Method and system for multithreaded processing using errands
US20240036921A1 (en) Cascading of Graph Streaming Processors
JP2006146758A (en) Computer system
WO2019187719A1 (en) Information processing device, information processing method, and program
WO2012026124A1 (en) Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method
CN111381887B (en) Method and device for performing image motion compensation in MVP processor and processor
JP2760273B2 (en) Arithmetic device and control method thereof
US20220382587A1 (en) Data processing systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11819600

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11819600

Country of ref document: EP

Kind code of ref document: A1