WO2012026124A1 - Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method - Google Patents

Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method

Info

Publication number
WO2012026124A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
shared resource
threads
processor
specific instruction
Prior art date
Application number
PCT/JP2011/004727
Other languages
French (fr)
Japanese (ja)
Inventor
Naoki Ochi (越智 直紀)
Original Assignee
Panasonic Corporation (パナソニック株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation
Publication of WO2012026124A1 publication Critical patent/WO2012026124A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/507Low-level

Definitions

  • the present invention relates to a thread arbitration system, and more particularly to a thread arbitration system used for a multi-thread processor.
  • a thread refers to a unit of processing executed in a computer system or a computer program for causing a computer system to execute the processing.
  • the size of the thread is arbitrarily determined by the program designer.
  • FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of such a processor.
  • This processor includes a fetch unit, a dispatcher, a decoder, an arithmetic unit A, and an arithmetic unit B as hardware resources (hereinafter referred to as resources for short).
  • Each instruction is divided into a plurality of stages and pipelined by resources associated with each stage.
  • FIG. 15 shows the processing status of the instructions P1, Q1, R1,... Of the three threads P, Q, and R.
  • This processor is configured so that all stages of all instructions are processed in one unit time, and in each stage, instructions of different threads are processed in an orderly manner for each unit time.
  • One unit time may be, for example, one clock cycle, or a predetermined plurality of clock cycles.
  • In the following description, one unit time is referred to as one time slot.
  • In practice, a specific instruction (for example, a division instruction) whose execution stage requires a plurality of time slots may be defined, and the processor processes the execution stage of such a specific instruction over a plurality of time slots.
  • As an illustrative example, the upstream stages of a specific instruction (e.g., a division instruction) are processed in one time slot, and the execution stage of the specific instruction is processed in three consecutive time slots using a shared resource (e.g., a divider).
  • the shared resource is occupied in a time division manner by a plurality of specific instructions.
  • As a simple example, consider a scheme in which a thread permitted to start the execution stage of a specific instruction is defined for each time slot, and, when the execution stage of a preceding specific instruction ends, the execution stage of the waiting specific instruction belonging to the thread permitted in that time slot is started.
  • FIG. 16 is a diagram showing an example of the processing status of the execution stage of a specific instruction based on such a concept.
  • FIG. 16 shows the processing status of each of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the specific instruction Q1 of the thread Q, and the specific instruction R1 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) when the processing of the upstream stage is completed.
  • A specific instruction whose execution stage has become startable enters execution (solid band) only if the shared resource is free when a time slot in which its thread is permitted to start comes around; until then it waits to start (broken band).
  • The present invention has been made in view of the above circumstances, and an object thereof is to provide a thread arbitration system that is suitably used in a processor capable of executing a plurality of threads and that can guarantee the time required to process the instructions of each thread.
  • A thread arbitration system according to one aspect of the present invention performs arbitration for allocating a shared resource to a plurality of threads, each corresponding to a computer program, in a processor that executes the plurality of threads using the shared resource.
  • In the processor, the shared resource is occupied in a time-division manner by a specific instruction included in each thread; each thread becomes able to use the shared resource when the upstream stages of its specific instruction are processed in time slots allocated to the threads sequentially and exclusively, and thereafter occupies the shared resource over a plurality of time slots for the processing of the downstream stage of the specific instruction.
  • When a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the thread arbitration system allocates the shared resource to the second thread before the first thread.
  • When the first thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the specific instruction of the thread that entered that state earliest may be dispatched to the downstream stage.
  • When a second thread different from the first thread becomes able to use the shared resource while the first thread is using it, the thread arbitration system may dispatch the subsequent specific instruction of the first thread to the downstream stage after the second thread has finished using the shared resource.
  • With such a configuration, no single thread continues to use the shared resource, so the shared resource is allocated evenly to all threads.
  • As a result, the specific instruction of each thread can complete its processing within a predetermined guaranteed time.
  • The thread arbitration system may also assign a priority to each thread; when a second thread having a higher priority than the first thread becomes able to use the shared resource while the first thread is using it, the use of the shared resource by the first thread is suspended, the specific instruction of the second thread is dispatched to the downstream stage, and after the second thread has finished using the shared resource, the first thread resumes its use.
  • By guaranteeing the processing time only for the threads having the highest priority, a shorter processing time can be guaranteed for the specific instructions of the highest-priority threads, at the cost of not guaranteeing the processing time of lower-priority threads.
  • the processor according to one aspect of the present invention may include the above-described thread arbitration system.
  • a processor that can guarantee the time required for a plurality of threads can be obtained.
  • A video recording/playback apparatus according to one aspect of the present invention may include the above-described processor and may perform video recording processing in a first thread and video playback processing in a second thread among the plurality of threads.
  • the present invention can be realized not only as such a thread arbitration system, a processor, and a video recording / reproducing apparatus, but also as a thread arbitration method.
  • According to the thread arbitration system of the present invention, when a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the shared resource is allocated to the second thread before the first thread; therefore, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it.
  • FIG. 1 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 1 of the present invention.
  • FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher according to Embodiment 1 of the present invention.
  • FIG. 3 is a state transition diagram defining an example of the operation of the dispatcher according to the first embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of the processing status of the execution stage of the specific instruction according to Embodiment 1 of the present invention.
  • FIG. 5 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 2 of the present invention.
  • FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher according to the second embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of the processing status of the execution stage of the specific instruction according to the second embodiment of the present invention.
  • FIG. 8 is a block diagram illustrating an example of a functional configuration of a processor including a thread arbitration system according to Embodiment 3 of the present invention.
  • FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher according to the third embodiment of the present invention.
  • FIG. 10 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 3 of the present invention.
  • FIG. 11 is a block diagram showing an example of a functional configuration of a processor system according to Embodiment 4 of the present invention.
  • FIG. 12 is a diagram showing an example of the appearance of a video recording / reproducing apparatus using the processor system according to Embodiment 4 of the present invention.
  • FIG. 13 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • FIG. 14 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • FIG. 15 is a diagram schematically illustrating a typical example of the configuration and operation of a conventional processor.
  • FIG. 16 is a diagram for explaining a problem in processing of an execution stage of a specific instruction.
  • FIG. 1 is a block diagram illustrating an example of a functional configuration of a processor 10 including a dispatcher 30 as a thread arbitration system according to Embodiment 1 of the present invention.
  • FIG. 1 shows a memory 60 accessed from the processor 10 together with the processor 10.
  • the processor 10 is a processor that can process a plurality of threads in a pseudo-parallel manner, and includes a fetch unit 20, a dispatcher 30, a decoder 40, an arithmetic unit A51, an arithmetic unit B52, an arithmetic unit X53, and a signal line 58.
  • the memory 60 holds a thread P61, a thread Q62, and a thread R63.
  • the thread P61, the thread Q62, and the thread R63 are computer programs executed by the processor 10, respectively.
  • the fetch unit 20 fetches the instructions of the thread P61, the thread Q62, and the thread R63 from the memory 60, and sequentially supplies the fetched instructions to the dispatcher 30.
  • the dispatcher 30 functions as the thread arbitration system of the present invention by dispatching the instructions supplied from the fetch unit 20 in a predetermined order. Information related to thread arbitration is recorded in the control table 35. The instruction dispatched from the dispatcher 30 is delivered to the decoder 40.
  • The decoder 40 identifies the type of each instruction delivered from the dispatcher 30 by decoding it and, depending on the identified type, causes one of the arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 to process the execution stage of the instruction.
  • the computing unit A51, the computing unit B52, and the computing unit X53 process instruction execution stages (for example, arithmetic operations, logical operations, etc.).
  • the fetch unit 20, the dispatcher 30, the decoder 40, the arithmetic unit A51, and the arithmetic unit B52 of the processor 10 are configured to process the stage of the instruction that they are responsible for in one time slot. In these stages, the instructions of different threads are processed in an orderly manner for each time slot.
  • the arithmetic unit X53 processes the execution stage of the specific instruction over a plurality of time slots.
  • For example, the specific instruction may be a division instruction, and the arithmetic unit X53 may be a divider that processes the execution stage of the division instruction.
  • In the following, an instruction whose execution stage is processed over a plurality of time slots is referred to as a specific instruction.
  • the computing unit X53 is an example of the shared resource of the present invention, and is occupied by each thread in a time division manner in order to process the execution stage of a specific instruction.
  • In the processor 10 configured as described above, a situation may occur in which the specific instructions of a plurality of threads have finished the processing of their upstream stages and are waiting for the start of their execution stages.
  • In such a situation, when a subsequent specific instruction of a first thread and a specific instruction of a second thread different from the first thread are both waiting, the dispatcher 30 dispatches the specific instruction of the second thread before the subsequent specific instruction of the first thread.
  • This operation of the dispatcher 30 is equivalent to allocating the shared resource to the second thread before the first thread when, at the time the first thread finishes using the shared resource, the first thread and the second thread are each in a state where they can use it.
  • FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher 30.
  • The control table 35 is configured as a FIFO (First-In First-Out) 35a that can temporarily hold specific instructions.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 30 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 3 is a state transition diagram that defines an example of the operation of the dispatcher 30.
  • EMPTY in FIG. 3 indicates a state in which the FIFO 35a is empty, and EXIST indicates a state in which one or more specific instructions are contained in the FIFO 35a.
  • Curved arrows indicate state transitions, and the label on each arrow indicates, separated by a slash, the condition under which the transition occurs and the operation performed by the dispatcher 30 during the transition (only when there is an operation to perform).
  • the dispatcher 30 operates as follows according to the state transition diagram shown in FIG.
  • Initially, the FIFO 35a is empty (S10). At this time, when a specific instruction is supplied from the fetch unit 20, the dispatcher 30 writes the specific instruction into the FIFO 35a (S11). When a further specific instruction is supplied from the fetch unit 20, the dispatcher 30 also writes that specific instruction into the FIFO 35a (S12). If the arithmetic unit X53 is BUSY, the specific instructions in the FIFO 35a are held there without being dispatched (S13).
  • When the arithmetic unit X53 is or becomes IDLE, the dispatcher 30 immediately reads the oldest specific instruction from the FIFO 35a and dispatches it (S14, S15). As a result, the specific instruction that first entered the state in which it could use the arithmetic unit X53, which is the shared resource, is dispatched to the downstream stage. When the dispatcher 30 reads and dispatches the last specific instruction from the FIFO 35a, the FIFO 35a becomes empty (S15).
  • In other words, when a certain thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the dispatcher 30 dispatches the specific instruction of the thread that entered that state earliest.
  • FIG. 4 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 30 performs the above-described operation.
  • FIG. 4 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) by being written in the FIFO 35a when the processing of the upstream stage is completed.
  • the specific instruction that can be started is immediately dispatched and executed (solid-line band) if the arithmetic unit X53 is free, and waits for start (broken-line band) if the arithmetic unit X53 is in use.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow.
  • This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots needed to process the execution stage of a specific instruction.
  • In this example, the processing time of the specific instructions of the three threads P, Q, and R can be guaranteed, and the guaranteed time is 9 time slots, assuming that the execution stage of a specific instruction requires 3 time slots.
  • The feature of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while both the thread P and a thread Q different from the thread P are in a state where they can use the shared resource, the shared resource is allocated to the thread Q before the thread P.
  • Such an operation is realized by the dispatcher 30 allocating the shared resource, when a certain thread (thread P) finishes using it and two or more threads (threads P, Q, R) are each in a state where they can use it, to the thread (thread Q) that entered that state earliest.
  • Therefore, no single thread continues to use the shared resource, and the shared resource is allocated evenly to all threads. As a result, the time required to process the specific instruction of each thread can be guaranteed.
  • FIG. 5 is a block diagram illustrating an example of a functional configuration of the processor 11 including the dispatcher 31 as the thread arbitration system according to the second embodiment of the present invention.
  • the processor 11 differs from the processor 10 of the first embodiment in the contents of the control table 36 and the operation of the dispatcher 31.
  • The operation of the dispatcher 31 as a thread arbitration system is common to that of the dispatcher 30 of the first embodiment in that, when the first thread finishes using the shared resource while both the first thread and the second thread are in a state where they can use it, the shared resource is allocated to the second thread first.
  • It differs from the dispatcher 30 of the first embodiment in that a thread that has made another thread wait for the start of its execution stage is itself restricted (inhibited) from starting its next execution stage until the execution stage of the thread it made wait has completed.
  • The control table 36 has an instruction status column 36a, a specific instruction column 36b, and an inhibitor column 36c for each thread.
  • Each column of the control table 36 is configured using a register, for example.
  • the instruction status column 36a holds information indicating that the execution stage of the specific instruction is being executed by the computing unit X53 (EXEC), waiting for start (READY), or there is no specific instruction to be executed (NONE).
  • The specific instruction column 36b holds a specific instruction that is waiting for its execution stage to start or whose execution stage is being executed.
  • The inhibitor column 36c of a thread holds information identifying other threads that were made to wait for the start of their execution stages by that thread.
  • The start of the execution stage of the specific instruction of the thread corresponding to the inhibitor column 36c is restricted by the threads recorded in that column.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 31 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher 31 configured as described above.
  • the dispatcher 31 performs the operations defined in the state transition diagram of FIG. 6 in parallel for each of a plurality of threads.
  • NONE, READY, and EXEC in FIG. 6 indicate the contents of the instruction status column 36a of the thread to be operated.
  • Initially, the instruction status column 36a of the target thread is NONE (S20).
  • When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records the specific instruction in the specific instruction column 36b (S21).
  • At the same time, the dispatcher 31 records information identifying the target thread in the inhibitor column 36c of every other thread whose instruction status column 36a is EXEC or READY, thereby restricting the start of the next execution stage of those other threads (S22).
  • If the inhibitor column 36c of the target thread is not empty, that is, if the start is restricted by another thread, the dispatcher 31 waits without dispatching the specific instruction recorded in the specific instruction column 36b (S23).
  • If the arithmetic unit X53 is IDLE and the inhibitor column 36c of the target thread is empty, that is, if the start is not restricted by another thread, the dispatcher 31 dispatches the specific instruction recorded in the specific instruction column 36b and sets the instruction status column 36a to EXEC (S24).
  • When the execution stage of the dispatched specific instruction ends, the dispatcher 31 removes the information identifying the target thread from the inhibitor columns 36c of the other threads, thereby releasing the restriction on those threads. If the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records that specific instruction in the specific instruction column 36b (S25); if there is no next specific instruction, the dispatcher 31 sets the instruction status column 36a to NONE (S26).
  • By performing such an operation in parallel for each of the plurality of threads, the overall behavior is realized in which a thread that has made another thread wait for the start of its execution stage is itself restricted from starting its next execution stage until the execution stage of the thread it made wait has completed.
  • FIG. 7 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 31 performs the above-described operation.
  • FIG. 7 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R.
  • the execution stage of the specific instruction can be started (thick line) by being written in the specific instruction column 36b when the processing of the upstream stage is completed.
  • A specific instruction that has become startable is dispatched and executed (solid band) if the arithmetic unit X53 is free and its start is not restricted by another thread; if the arithmetic unit X53 is in use, it waits to start (broken band), and if its start is restricted by another thread, it is in the start-restricted state (hatched band).
  • The reference numeral in parentheses displayed for a start-restricted thread indicates the thread that restricts its start.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow. This guaranteed time is the same as the guaranteed time described with reference to FIG. 4.
  • The feature of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while both the thread P and a thread Q different from the thread P are in a state where they can use the shared resource, the shared resource is allocated to the thread Q before the thread P.
  • Such an operation is realized by the dispatcher 31 dispatching the subsequent specific instruction of the first thread (thread P) after the second threads (threads Q and R) have finished using the shared resource.
  • Therefore, no single thread continues to use the shared resource, and the shared resource is allocated evenly to all threads. As a result, the time required to process the specific instruction of each thread can be guaranteed.
  • FIG. 8 is a block diagram illustrating an example of a functional configuration of the processor 12 including the dispatcher 32 as the thread arbitration system according to the third embodiment of the present invention.
  • the processor 12 differs from the processor 10 of the first embodiment in the contents of the control table 37 and the operation of the dispatcher 32.
  • The dispatcher 32 differs from the dispatcher 30 of the first embodiment and the dispatcher 31 of the second embodiment in that a priority is set for each of the plurality of threads and thread arbitration is performed based on the priority.
  • Specifically, when a specific instruction of a thread having a higher priority becomes startable during the processing of the execution stage of a specific instruction of another thread, the dispatcher 32 performs interrupt control in which the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
  • Between threads having the same priority, the dispatcher 32 performs a thread arbitration operation equivalent to that of the dispatcher 30 of the first embodiment or the dispatcher 31 of the second embodiment.
  • In the following, the interrupt control performed by the dispatcher 32 will be described in detail.
  • The control table 37 has an instruction status column 37a, a specific instruction column 37b, and a priority column 37c for each thread.
  • Each column of the control table 37 is configured using a register, for example.
  • the instruction status column 37a holds information indicating that the execution stage of the specific instruction is being executed by the computing unit X53 (EXEC), waiting for start (READY), or there is no specific instruction to be executed (NONE).
  • The specific instruction column 37b holds a specific instruction that is waiting for its execution stage to start or whose execution stage is being executed.
  • the priority column 37c holds a value indicating the priority of the thread. The smaller this value, the higher the priority. The maximum number of priorities is not limited.
  • the computing unit status signal notified from the computing unit X53 to the dispatcher 32 indicates whether the computing unit X53 is free (IDLE) or in use (BUSY).
  • FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher 32 configured as described above.
  • the dispatcher 32 performs the operations defined in the state transition diagram of FIG. 9 in parallel for each of a plurality of threads.
  • NONE, READY, and EXEC in FIG. 9 indicate the contents of the instruction status column 37a of the thread to be operated.
  • Initially, the instruction status column 37a of the target thread is NONE (S30).
  • When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S31).
  • If the arithmetic unit X53 is in use, the dispatcher 32 compares the priority of the target thread with the priority of the other thread whose instruction status column 37a is EXEC (that is, the thread currently using the arithmetic unit X53), based on the values in the priority columns 37c. If the thread using the arithmetic unit X53 has a priority equal to or higher than that of the target thread, the dispatcher 32 waits without dispatching the specific instruction of the target thread (S32).
  • If the arithmetic unit X53 is IDLE, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S33).
  • If the thread using the arithmetic unit X53 has a lower priority than the target thread, the dispatcher 32 does not wait for the execution stage of the specific instruction currently being processed by the arithmetic unit X53 to end; instead, it dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S34).
  • In this case, the arithmetic unit X53 stops the execution stage of the specific instruction currently being processed and starts processing the execution stage of the newly dispatched specific instruction.
  • When the execution stage ends, if the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S36); if there is no next specific instruction, the dispatcher 32 sets the instruction status column 37a to NONE (S37).
  • By performing such an operation in parallel for each thread, interrupt control is realized in which, when the execution stage of a specific instruction of a higher-priority thread becomes startable during the processing of the execution stage of a specific instruction of another thread, the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
  • FIG. 10 is a diagram showing an example of the processing status of the execution stage of the specific instruction in the computing unit X53 when the dispatcher 32 performs the above-described operation.
  • FIG. 10 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. Here, it is assumed that the priority of the threads P and Q is higher than the priority of the thread R.
  • The execution stage of a specific instruction becomes startable (thick line) by being written into the specific instruction column 37b when the processing of its upstream stages is completed.
  • A specific instruction that has become startable is immediately dispatched and executed (solid band) if the arithmetic unit X53 is free, is dispatched by interrupting and then executed if the arithmetic unit X53 is being used by a lower-priority thread, and waits to start (dashed white line) if the arithmetic unit X53 is being used by a thread of equal or higher priority.
  • The execution stage of the interrupted lower-priority thread is aborted (dashed vertical stripes) and is later dispatched again.
  • The execution stage of a re-dispatched specific instruction may be restarted from the beginning. Alternatively, when the execution stage of a specific instruction is aborted, its progress (the state of the shared resource) may be held in a save resource (for example, a register, not shown), and when the specific instruction is dispatched again, the intermediate state held in the save resource may be returned to the shared resource so that processing continues; a sketch of this save-and-restore idea is given after this list. In that case, it suffices to provide a number of save resources that is one less than the maximum number of priority levels.
  • The guaranteed time, which is the upper limit of the time required from when each specific instruction becomes startable until its execution completes, is indicated by an arrow.
  • This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots needed to process the execution stage of a specific instruction.
  • In this example, the processing time of the specific instructions of the two threads P and Q having the highest priority can be guaranteed, and the guaranteed time is 6 time slots, assuming that the execution stage of a specific instruction requires 3 time slots.
  • In other words, the guaranteed time is shortened by reducing the number of threads whose processing time is guaranteed.
  • The feature of this operation is the interrupt control in which a higher-priority thread interrupts a lower-priority thread to acquire the shared resource, as seen in time slot 17 and time slot 22. With such an operation, the processing time of the specific instructions of the lower-priority thread is not guaranteed, but the guaranteed time of the higher-priority threads is shortened.
  • The processors 10, 11, and 12 described above include the dispatchers 30, 31, and 32 as specific thread arbitration systems, respectively, and can guarantee the processing time of the specific instructions of a plurality of threads. They are therefore extremely useful for applications that perform processing requiring real-time performance.
  • In Embodiment 4 of the present invention, a processor system and a video recording/reproducing apparatus will be described as examples of such applications.
  • FIG. 11 is a block diagram showing an example of a functional configuration of the processor system 100 using the processors 10, 11, or 12 according to the fourth embodiment of the present invention.
  • the processor system 100 is a system LSI that performs various signal processing relating to a video / audio stream, and includes the processors 10, 11, or 12 described above.
  • the processor system 100 is used in, for example, a video recording / reproducing apparatus.
  • FIG. 12 is a diagram showing an example of the appearance of the video recording / reproducing apparatus 200 using the processor system 100.
  • The video recording/playback apparatus 200 acquires a video/audio stream from a broadcast wave and displays the broadcast program represented by the video/audio stream on the display device 201 while recording it.
  • As shown in FIG. 11, the processor system 100 includes the processor 10, a stream I/O block 71, an AVIO (Audio Visual Input Output) block 72, and a memory IF block 73.
  • The processor system 100 acquires a video/audio stream from a broadcast wave through the stream I/O block 71, decompresses the video/audio stream into video/audio data with the processor 10, and, in the AVIO block 72, generates a video/audio signal from the video/audio data and outputs it to the display device 201.
  • In parallel with the display, the processor system 100 records the broadcast program.
  • Specifically, the processor 10 compresses the video/audio data into a recording format, and the compressed video/audio data is recorded in the external memory 60 via the memory IF block 73.
  • Here, the time required for each of the video/audio stream decompression process and the video/audio data compression process performed by the processor 10 needs to be estimated accurately.
  • By executing the video playback processing, which includes the video/audio stream decompression, and the video recording processing, which includes the video/audio data compression, as threads on the processor 10, the processing time of their instructions is guaranteed. This makes it possible to accurately estimate the time required for the video/audio stream decompression (and, more generally, the video display processing) and for the video/audio data compression (and, more generally, the video recording processing).
  • FIG. 13 is a block diagram illustrating an example of a functional configuration of a processor according to a comparative example.
  • This processor has as many computing units as threads that can be processed.
  • each of the plurality of threads can occupy the arithmetic unit, so that the processing time of the thread can be guaranteed.
  • However, if the number of threads changes, the number of arithmetic units must be changed, and there are disadvantages in that the area and power consumption of the processor increase.
  • FIG. 14 is a block diagram illustrating an example of a functional configuration of a processor according to another comparative example.
  • In this processor, the execution stage is divided into as many stages as the number of threads that can be processed.
  • With such a configuration as well, the processing time of the threads can be guaranteed.
  • However, if the number of threads changes, the number of stages must be changed, and there are disadvantages in that the area and power consumption of the processor increase.
  • Thus, the comparative examples lack flexibility of configuration in that as many arithmetic units as threads must be provided or the execution stage must be divided, and the area and power consumption of the processor increase; they therefore do not give a sufficiently satisfactory solution for guaranteeing the processing time.
  • In contrast, a processor including the thread arbitration system according to an embodiment of the present invention may have only one arithmetic unit X53, and the number of stages into which execution is divided may be fixed.
  • Since the processing time of each thread is guaranteed by controlling the execution order of the specific instructions of the threads, there is an advantage that an increase in the area and power consumption of the processor can be suppressed compared with the processors of the comparative examples.
  • the thread arbitration system according to the present invention is useful for applications where it is necessary to guarantee the processing time of each of a plurality of threads in a multi-thread processor, a video recording / reproducing apparatus, and the like.
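As noted in the description of Embodiment 3 above, an aborted execution stage may either restart from the beginning or resume from intermediate state held in a save resource. The following Python sketch illustrates the save-and-restore option only; modelling progress as a count of completed time slots, and all class and function names, are assumptions made for illustration and are not taken from the publication.

```python
class SharedUnit:
    """Toy model of the shared arithmetic unit X53: the progress of the
    execution stage is abstracted as the number of completed time slots."""

    def __init__(self, exec_slots=3):
        self.exec_slots = exec_slots
        self.instr = None
        self.done_slots = 0

    def start(self, instr, done_slots=0):
        self.instr, self.done_slots = instr, done_slots

    def tick(self):
        self.done_slots += 1
        return self.done_slots >= self.exec_slots   # True when finished


class SaveSlot:
    """One save resource (e.g. a register); one less than the maximum number
    of priority levels is enough, since at most that many executions can be
    interrupted at the same time."""
    def __init__(self):
        self.instr = None
        self.done_slots = 0


def preempt(unit, save):
    """Abort the running instruction and checkpoint its progress."""
    save.instr, save.done_slots = unit.instr, unit.done_slots
    unit.instr, unit.done_slots = None, 0


def resume(unit, save):
    """Put the checkpointed intermediate state back into the shared unit."""
    unit.start(save.instr, save.done_slots)
    save.instr, save.done_slots = None, 0


# Example: R1 runs 2 of its 3 slots, is interrupted by P1, then resumes and
# needs only its one remaining slot instead of starting over.
unit, save = SharedUnit(), SaveSlot()
unit.start("R1"); unit.tick(); unit.tick()
preempt(unit, save)                 # R1 checkpointed with 2 slots done
unit.start("P1")
while not unit.tick():
    pass                            # P1 runs to completion
resume(unit, save)
print(unit.instr, "resumes with", unit.done_slots, "slots already done")
```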

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

In the present invention, an arithmetic unit (X) (53), which is a shared resource of a processor (10), is occupied in a time-division manner by a specific instruction contained in each thread (61-63). Each thread enters a state in which it can use the arithmetic unit (X) (53) when the upstream stages (20-40) of the specific instruction are processed in time slots that are sequentially and exclusively allocated to the threads; the thread then occupies the arithmetic unit (X) (53) in order to process the downstream stage of the specific instruction over a plurality of time slots. When a first thread has finished using the arithmetic unit (X) (53), and the first thread and a second thread different from the first thread are each in a state in which they can use the arithmetic unit (X) (53), the thread arbitration system allocates the arithmetic unit (X) (53) to the second thread before the first thread.

Description

Thread arbitration system, processor, video recording/reproducing apparatus, and thread arbitration method
The present invention relates to a thread arbitration system, and more particularly to a thread arbitration system used in a multi-thread processor.
Conventionally, multi-thread processors capable of processing a plurality of threads in a pseudo-parallel manner have been proposed (see, for example, Patent Document 1). A thread refers to a unit of processing executed in a computer system, or a computer program for causing a computer system to execute that processing. The size of a thread (the processing amount or the number of instructions) is determined arbitrarily by the program designer.
FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of such a processor. This processor includes a fetch unit, a dispatcher, a decoder, an arithmetic unit A, and an arithmetic unit B as hardware resources (hereinafter simply referred to as resources). Each instruction is divided into a plurality of stages and pipelined by the resources associated with each stage.
FIG. 15 shows the processing status of the instructions P1, Q1, R1, ... of the three threads P, Q, and R. This processor is configured so that every stage of every instruction is processed in one unit time, and in each stage, instructions of different threads are processed in an orderly manner, one per unit time.
One unit time may be, for example, one clock cycle, or a predetermined plurality of clock cycles. In the following description, one unit time is referred to as one time slot.
In a processor configured in this manner, no contention between threads over the processor's internal resources can occur, so each thread appears to occupy a processor operating at one third of the actual speed. The processing of each instruction of each thread is therefore always completed in a fixed time.
This is extremely useful when the time required to process the instructions of each thread must be guaranteed, for example when each of the plurality of threads performs processing that requires real-time performance.
[Patent Document 1] Japanese Translation of PCT International Application Publication No. 2003-523561
In practice, however, a specific instruction (for example, a division instruction) whose execution stage requires a plurality of time slots may be defined, and the processor may be configured to process the execution stage of such a specific instruction over a plurality of time slots.
As an illustrative example, consider a processor that processes the upstream stages of a specific instruction (for example, a division instruction) in one time slot and processes the execution stage of the specific instruction in three consecutive time slots using a shared resource (for example, a divider). The shared resource is occupied in a time-division manner by a plurality of specific instructions.
In such a processor, unlike the processor described above, a situation may arise in which the specific instructions of a plurality of threads have finished the processing of their upstream stages and are waiting for their execution stages to start. In such a situation, it is not necessarily obvious which thread's specific instruction should be processed next by the shared resource.
As a simple example, consider a scheme in which a thread permitted to start the execution stage of a specific instruction is defined for each time slot, and, when the execution stage of a preceding specific instruction ends, the execution stage of the waiting specific instruction belonging to the thread permitted in that time slot is started.
FIG. 16 is a diagram showing an example of the processing status of the execution stages of specific instructions under this scheme.
FIG. 16 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the specific instruction Q1 of the thread Q, and the specific instruction R1 of the thread R. The execution stage of a specific instruction becomes startable (thick line) when the processing of its upstream stages is completed. A specific instruction whose execution stage has become startable enters execution (solid band) only if the shared resource is free when a time slot in which its thread is permitted to start comes around; until then it waits to start (broken band).
In the example of FIG. 16, the three specific instructions Q1, R1, and P3 have become startable by the end of time slot 6, and in time slot 7, when the arithmetic unit becomes free, the execution stage of the specific instruction P3 of the thread P, which is permitted to start in time slot 7, is started. As a result, the specific instruction Q1 of the thread Q and the specific instruction R1 of the thread R keep waiting, with no way of knowing when they will be able to start.
In other words, a scheme that merely defines which thread is permitted to start the execution stage of a specific instruction in each time slot, as in this example, cannot guarantee the time required to process the instructions of each thread.
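For illustration only (this simulation is not part of the publication), the failure mode can be reproduced with a short Python sketch. The thread names, the three-slot execution stage, and the rotating permission follow the example above; the stream of ready instructions, the helper names, and the alignment of release points with P's permitted slots are assumptions chosen to make the unbounded waiting visible.

```python
# Hypothetical sketch of the fixed "permitted thread per time slot" scheme
# described above.  It is not the publication's method; it only illustrates
# why that scheme cannot bound the waiting time of Q1 and R1.

THREADS = ["P", "Q", "R"]          # permission rotates P, Q, R, P, Q, R, ...
EXEC_SLOTS = 3                     # execution stage of a specific instruction

def simulate(total_slots=30):
    waiting = {"Q": ["Q1"], "R": ["R1"]}            # Q1 and R1 became startable early
    waiting["P"] = [f"P{i}" for i in range(1, 11)]  # P has a steady stream of work
    busy_until = 0                 # slot at which the shared divider frees up
    started = []

    for slot in range(total_slots):
        if slot < busy_until:
            continue               # divider still occupied
        permitted = THREADS[slot % len(THREADS)]
        if waiting[permitted]:     # only the permitted thread may start
            instr = waiting[permitted].pop(0)
            started.append((slot, instr))
            busy_until = slot + EXEC_SLOTS
        # if the permitted thread has nothing ready, the slot is wasted

    return started

if __name__ == "__main__":
    # Because the divider frees up every 3 slots and the permission also
    # rotates with period 3, the same thread is permitted at every release
    # point; with P always ready, Q1 and R1 are never started.
    for slot, instr in simulate():
        print(slot, instr)
```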
However, no suitable method has previously been known for guaranteeing the time required to process the instructions of each thread.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a thread arbitration system that is suitably used in a processor capable of executing a plurality of threads and that can guarantee the time required to process the instructions of each thread.
In order to solve the above conventional problem, a thread arbitration system according to one aspect of the present invention performs arbitration for allocating a shared resource to a plurality of threads, each corresponding to a computer program, in a processor that executes the plurality of threads using the shared resource. In the processor, the shared resource is occupied in a time-division manner by a specific instruction included in each thread; each thread becomes able to use the shared resource when the upstream stages of its specific instruction are processed in time slots allocated to the threads sequentially and exclusively, and thereafter occupies the shared resource over a plurality of time slots for the processing of the downstream stage of the specific instruction. When a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the thread arbitration system allocates the shared resource to the second thread before the first thread.
The thread arbitration system may also be configured so that, when the first thread finishes using the shared resource and two or more threads are each in a state where they can use the shared resource, the specific instruction of the thread that entered that state earliest among the two or more threads is dispatched to the downstream stage.
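A minimal sketch of this first-come-first-served behaviour, assuming the ready specific instructions are simply queued in arrival order (as the FIFO 35a of Embodiment 1, described later, does); the class and function names are illustrative and not part of the publication.

```python
from collections import deque

class FifoArbiter:
    """Hypothetical model of the FIFO-based arbitration: whichever thread
    became able to use the shared resource first is served first, so a
    thread that just released the resource goes to the back."""

    def __init__(self):
        self.ready = deque()      # specific instructions, oldest first
        self.unit_busy = False    # state of the shared arithmetic unit X53

    def on_upstream_done(self, instr):
        """The upstream stage finished: the instruction can now use the
        shared resource, so record it in arrival order."""
        self.ready.append(instr)
        self._try_dispatch()

    def on_execution_done(self):
        """The shared resource was released; hand it to the oldest waiter."""
        self.unit_busy = False
        self._try_dispatch()

    def _try_dispatch(self):
        if not self.unit_busy and self.ready:
            instr = self.ready.popleft()   # oldest ready instruction wins
            self.unit_busy = True
            dispatch_to_downstream(instr)  # placeholder for the real dispatch

def dispatch_to_downstream(instr):
    print("dispatching", instr)

# Example: P1 executes first; Q1, R1 and P2 become ready while it runs, and
# when P1 releases the unit, Q1 (the oldest waiter) is chosen before P2.
arb = FifoArbiter()
arb.on_upstream_done("P1")
arb.on_upstream_done("Q1")
arb.on_upstream_done("R1")
arb.on_upstream_done("P2")
arb.on_execution_done()   # P1 done -> Q1 dispatched (not P2)
```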
The thread arbitration system may also be configured so that, when a second thread different from the first thread becomes able to use the shared resource while the first thread is using it, the subsequent specific instruction of the first thread is dispatched to the downstream stage after the second thread has finished using the shared resource.
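A minimal sketch of this "inhibit the thread that made others wait" reading of the rule. It registers the inhibition only against the thread currently using the resource, which is a simplification of the inhibitor columns 36c of Embodiment 2 described later; all names, and the tie-breaking among uninhibited threads, are illustrative assumptions.

```python
class InhibitArbiter:
    """Hypothetical sketch: while a first thread is using the shared
    resource, every thread that becomes ready inhibits it, so the first
    thread's next specific instruction is dispatched only after those
    threads have finished using the resource."""

    def __init__(self, threads):
        self.ready = {t: None for t in threads}     # pending specific instruction
        self.inhibited_by = {t: set() for t in threads}
        self.using = None                            # thread occupying the unit

    def on_upstream_done(self, thread, instr):
        """`thread` can now use the shared resource."""
        if self.using is not None and self.using != thread:
            # the thread currently using the resource made `thread` wait, so
            # it may not start its next execution stage before `thread` runs
            self.inhibited_by[self.using].add(thread)
        self.ready[thread] = instr
        self._try_dispatch()

    def on_execution_done(self, thread):
        """`thread` released the shared resource and no longer blocks anyone."""
        self.using = None
        for blockers in self.inhibited_by.values():
            blockers.discard(thread)
        self._try_dispatch()

    def _try_dispatch(self):
        if self.using is not None:
            return
        for t, instr in self.ready.items():
            if instr is not None and not self.inhibited_by[t]:
                self.ready[t] = None
                self.using = t
                print("dispatching", instr)          # stand-in for real dispatch
                return

arb = InhibitArbiter(["P", "Q", "R"])
arb.on_upstream_done("P", "P1")   # P1 occupies the free unit
arb.on_upstream_done("Q", "Q1")   # Q now inhibits P's next instruction
arb.on_upstream_done("R", "R1")   # R also inhibits P's next instruction
arb.on_execution_done("P")        # P1 done -> Q1 dispatched
arb.on_upstream_done("P", "P2")   # P2 must wait until Q1 and R1 have run
arb.on_execution_done("Q")        # Q1 done -> R1 dispatched (P2 still inhibited)
arb.on_execution_done("R")        # R1 done -> P2 finally dispatched
```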
With such a configuration, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it, so the shared resource is allocated evenly to all threads; as a result, the specific instruction of each thread can complete its processing within a predetermined guaranteed time.
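Under this even allocation, the worst-case time from when a specific instruction becomes able to use the shared resource until its execution stage completes can be bounded. The bound quoted in the embodiments described later is restated here as a formula (no new claim is added):

$$T_{\text{guarantee}} = N_{\text{threads}} \times S_{\text{exec}}$$

where N_threads is the number of threads whose processing time must be guaranteed and S_exec is the number of time slots the execution stage of a specific instruction occupies. With the three threads P, Q, R and a three-slot execution stage, T_guarantee = 3 × 3 = 9 time slots; when only the two highest-priority threads are guaranteed, as in the priority variant below, T_guarantee = 2 × 3 = 6 time slots.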
The thread arbitration system may also assign a priority to each thread, and when a second thread having a higher priority than the first thread becomes able to use the shared resource while the first thread is using it, the use of the shared resource by the first thread is suspended, the specific instruction of the second thread is dispatched to the downstream stage, and after the second thread has finished using the shared resource, the first thread resumes its use.
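A minimal sketch of the priority variant, assuming that a smaller number means a higher priority (as in the priority column 37c described later) and that an interrupted execution stage simply restarts from the beginning; names and the tie-breaking among waiters are illustrative assumptions.

```python
class PriorityArbiter:
    """Hypothetical sketch: when a higher-priority thread becomes ready while
    a lower-priority thread is using the shared resource, the lower-priority
    execution is stopped and re-dispatched later."""

    def __init__(self, priority):
        self.priority = priority          # e.g. {"P": 0, "Q": 0, "R": 1}
        self.waiting = []                 # (thread, instr) pairs awaiting the unit
        self.using = None                 # (thread, instr) currently on unit X53

    def on_upstream_done(self, thread, instr):
        if self.using is None:
            self._start(thread, instr)
        elif self.priority[thread] < self.priority[self.using[0]]:
            # interrupt the lower-priority thread; its instruction is aborted
            # and queued again (restarting from the beginning in this sketch)
            self.waiting.append(self.using)
            self._start(thread, instr)
        else:
            self.waiting.append((thread, instr))   # equal or lower: wait

    def on_execution_done(self):
        self.using = None
        if self.waiting:
            # among the waiters, pick the highest priority (oldest on a tie)
            best = min(range(len(self.waiting)),
                       key=lambda i: self.priority[self.waiting[i][0]])
            thread, instr = self.waiting.pop(best)
            self._start(thread, instr)

    def _start(self, thread, instr):
        self.using = (thread, instr)
        print("running", instr, "of thread", thread)

arb = PriorityArbiter({"P": 0, "Q": 0, "R": 1})
arb.on_upstream_done("R", "R1")   # R starts on the free unit
arb.on_upstream_done("P", "P1")   # P outranks R: R1 is aborted, P1 runs
arb.on_execution_done()           # P1 done; R1 is re-dispatched
```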
With such a configuration, by guaranteeing the processing time only for the threads having the highest priority, a shorter processing time can be guaranteed for the specific instructions of the highest-priority threads, at the cost of not guaranteeing the processing time of lower-priority threads.
A processor according to one aspect of the present invention may include the above-described thread arbitration system.
With such a configuration, a processor that can guarantee the time required by a plurality of threads is obtained.
A video recording/reproducing apparatus according to one aspect of the present invention may include the above-described processor and may perform video recording processing in a first thread and video playback processing in a second thread among the plurality of threads.
With such a configuration, the time required for the video recording processing and the video playback processing can be estimated accurately, which is effective in avoiding the dropped video that would result if the time required for those processes could not be estimated.
The present invention can be realized not only as such a thread arbitration system, a processor, and a video recording/reproducing apparatus, but also as a thread arbitration method.
According to the thread arbitration system of the present invention, when a first thread among the plurality of threads finishes using the shared resource, and the first thread and a second thread different from the first thread are each in a state where they can use the shared resource, the shared resource is allocated to the second thread before the first thread; therefore, when there are a plurality of threads in a state where they can use the shared resource, no single thread keeps using it.
The shared resource is therefore allocated evenly to all threads, and as a result, the time required for the processing of each thread can be guaranteed.
FIG. 1 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing an example of the specific configuration of the dispatcher according to Embodiment 1 of the present invention.
FIG. 3 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 2 of the present invention.
FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 2 of the present invention.
FIG. 8 is a block diagram showing an example of the functional configuration of a processor including a thread arbitration system according to Embodiment 3 of the present invention.
FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher according to Embodiment 3 of the present invention.
FIG. 10 is a diagram showing an example of the processing status of the execution stage of a specific instruction according to Embodiment 3 of the present invention.
FIG. 11 is a block diagram showing an example of the functional configuration of a processor system according to Embodiment 4 of the present invention.
FIG. 12 is a diagram showing an example of the appearance of a video recording/reproducing apparatus using the processor system according to Embodiment 4 of the present invention.
FIG. 13 is a block diagram showing an example of the functional configuration of a processor according to a comparative example.
FIG. 14 is a block diagram showing an example of the functional configuration of a processor according to another comparative example.
FIG. 15 is a diagram schematically showing a typical example of the configuration and operation of a conventional processor.
FIG. 16 is a diagram for explaining a problem in the processing of the execution stage of a specific instruction.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings.
 (Embodiment 1)
 FIG. 1 is a block diagram illustrating an example of the functional configuration of a processor 10 including a dispatcher 30 as a thread arbitration system according to Embodiment 1 of the present invention. FIG. 1 shows the processor 10 together with a memory 60 accessed by the processor 10.
 The processor 10 is a processor that can process a plurality of threads in a pseudo-parallel manner, and includes a fetch unit 20, a dispatcher 30, a decoder 40, an arithmetic unit A51, an arithmetic unit B52, an arithmetic unit X53, and a signal line 58.
 The memory 60 holds a thread P61, a thread Q62, and a thread R63. The thread P61, the thread Q62, and the thread R63 are each a computer program executed by the processor 10.
 The fetch unit 20 fetches the instructions of the thread P61, the thread Q62, and the thread R63 from the memory 60 and sequentially supplies the fetched instructions to the dispatcher 30.
 The dispatcher 30 functions as the thread arbitration system of the present invention by dispatching the instructions supplied from the fetch unit 20 in a predetermined order. Information related to thread arbitration is recorded in the control table 35. The instructions dispatched by the dispatcher 30 are delivered to the decoder 40.
 The decoder 40 identifies the type of each instruction delivered from the dispatcher 30 by decoding it and, according to the identified type, causes one of the arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 to process the execution stage of the instruction.
 The arithmetic unit A51, the arithmetic unit B52, and the arithmetic unit X53 process the execution stages of instructions (for example, arithmetic operations and logical operations).
 As described in the background art, the fetch unit 20, the dispatcher 30, the decoder 40, the arithmetic unit A51, and the arithmetic unit B52 of the processor 10 are each configured to process the instruction stage for which they are responsible in one time slot, and in these stages the instructions of different threads are processed in an orderly manner, one time slot at a time.
 Therefore, an instruction whose execution stage is processed by the arithmetic unit A51 or the arithmetic unit B52 always completes its processing in a fixed time. Since this operation is outside the scope of the present invention, its description is omitted.
 On the other hand, the arithmetic unit X53 processes the execution stage of a specific instruction over a plurality of time slots. As one example, the specific instruction may be a division instruction, and the arithmetic unit X53 may be a divider that processes the execution stage of the division instruction.
 In this specification, an instruction whose execution stage is processed over a plurality of time slots is generally called a specific instruction. The arithmetic unit X53 is an example of the shared resource of the present invention, and is occupied by each thread in a time-division manner in order to process the execution stage of a specific instruction.
 In the processor 10 configured as described above, a situation can arise in which specific instructions of a plurality of threads have finished their upstream stages and are waiting for the start of their execution stages.
 In this situation, when the execution stage of a preceding specific instruction of a first thread ends while a subsequent specific instruction of the first thread and a specific instruction of a second thread different from the first thread are each waiting for the start of the execution stage, the dispatcher 30 dispatches the specific instruction of the second thread before the subsequent specific instruction of the first thread.
 Viewed as a thread arbitration system, this operation of the dispatcher 30 is equivalent to allocating the shared resource, when the first thread finishes using the shared resource while the first thread and the second thread are each ready to use the shared resource, to the second thread before the first thread.
 A more specific configuration and operation of the dispatcher 30 will now be described.
 FIG. 2 is a block diagram showing an example of a specific configuration of the dispatcher 30. In this example, the control table 35 is configured as a FIFO (First-In First-Out) 35a that can temporarily hold specific instructions.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 30 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 3 is a state transition diagram that defines an example of the operation of the dispatcher 30. In FIG. 3, EMPTY indicates a state in which the FIFO 35a is empty, and EXIST indicates a state in which one or more specific instructions are held in the FIFO 35a. The curved arrows indicate state transitions, and the label attached to each arrow shows, separated by a slash, the condition under which the transition occurs and the operation performed by the dispatcher 30 at the transition (only when there is such an operation). The dispatcher 30 operates as follows according to the state transition diagram shown in FIG. 3.
 Before any specific instruction has been supplied from the fetch unit 20, the FIFO 35a is empty (S10). When a specific instruction is supplied from the fetch unit 20, the dispatcher 30 writes the specific instruction into the FIFO 35a (S11). When a further specific instruction is supplied from the fetch unit 20, the dispatcher 30 also writes that specific instruction into the FIFO 35a (S12). While the arithmetic unit X53 is BUSY, the specific instructions in the FIFO 35a are held there without being dispatched (S13).
 As soon as the arithmetic unit X53 becomes IDLE, the dispatcher 30 reads the specific instruction at the head of the FIFO 35a and dispatches it (S14, S15). As a result, the specific instruction that became ready to use the arithmetic unit X53, which is the shared resource, earliest is dispatched to the downstream stage. When the dispatcher 30 reads and dispatches the last specific instruction from the FIFO 35a, the FIFO 35a becomes empty (S15).
 In accordance with the state transition diagram of FIG. 3, when a thread finishes using the shared resource while two or more threads are each ready to use the shared resource, the dispatcher 30 dispatches the specific instruction of the thread, among those two or more threads, that became ready to use the shared resource earliest.
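 As an aid to understanding only, this ready-order arbitration can be sketched in C as follows. The sketch is not part of the disclosed hardware: the names (SpecificInsn, PendingFifo, arbitrate_one_slot, dispatch) and the FIFO depth are invented for this illustration, and it assumes a single shared execution unit whose IDLE/BUSY state is supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 8

/* One pending specific instruction, identified by the thread it belongs to. */
typedef struct {
    int thread_id;
    int opcode;
} SpecificInsn;

/* Control table 35 modeled as a ring-buffer FIFO (35a). */
typedef struct {
    SpecificInsn slots[FIFO_DEPTH];
    size_t head;
    size_t count;
} PendingFifo;

static bool fifo_push(PendingFifo *f, SpecificInsn insn)
{
    if (f->count == FIFO_DEPTH)
        return false;                          /* no room: stall the fetch side */
    f->slots[(f->head + f->count) % FIFO_DEPTH] = insn;
    f->count++;
    return true;                               /* EMPTY -> EXIST (S11) or stay EXIST (S12) */
}

static bool fifo_pop(PendingFifo *f, SpecificInsn *out)
{
    if (f->count == 0)
        return false;                          /* EMPTY: nothing is ready */
    *out = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Called once per time slot.  If the shared execution unit is IDLE, the
 * instruction that has been ready the longest (the FIFO head) is dispatched
 * (S14, S15); while the unit is BUSY, every pending instruction simply
 * stays in the FIFO (S13). */
static void arbitrate_one_slot(PendingFifo *pending, bool unit_idle,
                               void (*dispatch)(SpecificInsn))
{
    SpecificInsn next;
    if (unit_idle && fifo_pop(pending, &next))
        dispatch(next);                        /* the unit becomes BUSY */
}
```

 Because the FIFO preserves the order in which instructions became ready, the thread that has waited longest is always served next, which is exactly the property that bounds each thread's waiting time.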
 FIG. 4 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 30 performs the operation described above.
 FIG. 4 shows the processing status of the execution stages of three specific instructions P1, P2, and P3 of the thread P, three specific instructions Q1, Q2, and Q3 of the thread Q, and two specific instructions R1 and R2 of the thread R. The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the FIFO 35a upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed immediately (solid band) if the arithmetic unit X53 is free, and waits for the start (dashed band) if the arithmetic unit X53 is in use.
 FIG. 4 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. The guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots required to process the execution stage of a specific instruction.
 Here, the processing time of the specific instructions of the three threads P, Q, and R can be guaranteed, and since the execution stage of a specific instruction requires 3 time slots, the guaranteed time is 9 time slots.
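 Purely as a worked restatement of this arithmetic (the function name is invented for the illustration), the bound can be written as:

```c
/* Upper bound, in time slots, on the time from the moment a specific
 * instruction becomes ready until its execution stage completes. */
static unsigned guaranteed_time_slots(unsigned guaranteed_threads,
                                      unsigned exec_slots_per_insn)
{
    return guaranteed_threads * exec_slots_per_insn;
}

/* Embodiment 1: 3 threads (P, Q, R) x 3 slots per execution stage = 9 slots.
 * Embodiment 3 below: 2 highest-priority threads (P, Q) x 3 slots = 6 slots. */
```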
 A characteristic of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while the thread P and a thread Q different from the thread P are both ready to use the shared resource, the shared resource is allocated to the thread Q before the thread P.
 Such an operation is realized by the dispatcher 30 allocating the shared resource, when a thread (thread P) finishes using it, to the thread (thread Q) that became ready to use the shared resource earliest among the two or more threads (threads P, Q, and R) that are ready to use it.
 As a result, when there are a plurality of threads ready to use the shared resource, no single thread keeps using the shared resource, so the shared resource is allocated evenly to all threads, and consequently the time required for processing the specific instructions of each thread can be guaranteed.
 (Embodiment 2)
 FIG. 5 is a block diagram illustrating an example of the functional configuration of a processor 11 including a dispatcher 31 as a thread arbitration system according to Embodiment 2 of the present invention. The processor 11 differs from the processor 10 of Embodiment 1 in the contents of the control table 36 and the operation of the dispatcher 31.
 Hereinafter, components identical to those described in Embodiment 1 are given the same reference numerals and their description is omitted as appropriate; the differences from Embodiment 1 are mainly described.
 The operation of the dispatcher 31 as a thread arbitration system is the same as that of the dispatcher 30 of Embodiment 1 in that, when a first thread finishes using the shared resource while the first thread and a second thread are each ready to use the shared resource, the shared resource is allocated to the second thread before the first thread.
 However, the dispatcher 31 differs from the dispatcher 30 of Embodiment 1 in that, to achieve this operation, a thread that has been made to wait for the start of its execution stage by another thread restricts (inhibits) the start of the next execution stage of the thread that made it wait, until its own execution stage is completed.
 The control table 36 has, for each thread, an instruction status column 36a, a specific instruction column 36b, and an inhibitor column 36c. Each column of the control table 36 is configured using, for example, registers.
 The instruction status column 36a holds information indicating that the execution stage of a specific instruction is being executed by the arithmetic unit X53 (EXEC), is waiting to start (READY), or that there is no specific instruction to execute (NONE). The specific instruction column 36b holds the specific instruction that is waiting for the start of its execution stage or being executed. The inhibitor column 36c holds information identifying the other threads that have been made to wait for the start of their execution stages by the thread corresponding to that inhibitor column 36c. The start of the execution stage of the specific instruction of the thread corresponding to the inhibitor column 36c is restricted by the threads recorded in that inhibitor column 36c.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 31 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 6 is a state transition diagram defining an example of the operation of the dispatcher 31 configured as described above. The dispatcher 31 performs the operations defined in the state transition diagram of FIG. 6 in parallel for each of the plurality of threads. NONE, READY, and EXEC in FIG. 6 indicate the contents of the instruction status column 36a of the thread that is the target of the operation.
 Before any specific instruction has been supplied from the fetch unit 20, the instruction status column 36a is NONE (S20). When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records the specific instruction in the specific instruction column 36b (S21).
 If the arithmetic unit X53 is BUSY, the dispatcher 31 records information identifying the target thread in the inhibitor columns 36c of the other threads whose instruction status columns 36a are EXEC or READY, thereby restricting the start of the execution stages of those other threads (S22).
 Even if the arithmetic unit X53 is IDLE, if the inhibitor column 36c of the target thread is not empty, that is, if the start is restricted by another thread, the dispatcher 31 waits without dispatching the specific instruction recorded in the specific instruction column 36b (S23).
 If the arithmetic unit X53 is IDLE and the inhibitor column 36c of the target thread is empty, that is, if the start is not restricted by any other thread, the dispatcher 31 dispatches the specific instruction recorded in the specific instruction column 36b and sets the instruction status column 36a to EXEC (S24).
 Thereafter, when the arithmetic unit X53 becomes IDLE, the dispatcher 31 deletes the information identifying the target thread from the inhibitor columns 36c of the other threads, thereby releasing the restriction on the other threads. Then, if the next specific instruction of the target thread has been supplied from the fetch unit 20, the dispatcher 31 sets the instruction status column 36a to READY and records that specific instruction in the specific instruction column 36b (S25); if there is no next specific instruction, the dispatcher 31 sets the instruction status column 36a to NONE (S26).
 By performing such an operation in parallel for each of the plurality of threads, the overall operation is realized in which a thread that has been made to wait for the start of its execution stage by another thread restricts the start of the next execution stage of the thread that made it wait, until its own execution stage is completed.
 In accordance with the state transition diagram of FIG. 6, when a second thread different from a first thread becomes ready to use the shared resource while the first thread is using the shared resource, the dispatcher 31 dispatches the subsequent specific instruction of the first thread after the second thread has finished using the shared resource.
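 The following C sketch, offered only as an aid to understanding, mirrors the control table 36 and the transitions of FIG. 6. The type and function names are invented for this illustration, and the surrounding dispatch loop that would call these helpers each time slot is omitted.

```c
#include <stdbool.h>

#define NUM_THREADS 3

typedef enum { NONE, READY, EXEC } InsnState;

/* One row of control table 36. */
typedef struct {
    InsnState state;                      /* instruction status column 36a   */
    int       pending_insn;               /* specific instruction column 36b */
    bool      inhibited_by[NUM_THREADS];  /* inhibitor column 36c            */
} ThreadEntry;

static bool is_inhibited(const ThreadEntry *t)
{
    for (int i = 0; i < NUM_THREADS; i++)
        if (t->inhibited_by[i])
            return true;
    return false;
}

/* S21/S22: a thread whose specific instruction arrives while the shared unit
 * is BUSY records itself as an inhibitor of every thread that is already
 * READY or EXEC, i.e. of the threads that made it wait. */
static void on_ready(ThreadEntry table[], int self, int insn, bool unit_busy)
{
    table[self].state = READY;
    table[self].pending_insn = insn;
    if (!unit_busy)
        return;
    for (int i = 0; i < NUM_THREADS; i++)
        if (i != self && table[i].state != NONE)
            table[i].inhibited_by[self] = true;
}

/* S23/S24: a READY thread may be dispatched only when the shared unit is
 * IDLE and no other thread is still inhibiting it. */
static int pick_dispatchable(const ThreadEntry table[], bool unit_idle)
{
    if (!unit_idle)
        return -1;
    for (int i = 0; i < NUM_THREADS; i++)
        if (table[i].state == READY && !is_inhibited(&table[i]))
            return i;
    return -1;
}

/* S25/S26: when a thread's execution stage finishes, it clears itself from
 * every inhibitor column, releasing the threads it had been inhibiting. */
static void on_exec_done(ThreadEntry table[], int self)
{
    for (int i = 0; i < NUM_THREADS; i++)
        table[i].inhibited_by[self] = false;
    table[self].state = NONE;   /* or READY again if a next instruction arrived */
}
```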
 FIG. 7 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 31 performs the operation described above.
 FIG. 7 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the specific instruction column 36b upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed (solid band) if the arithmetic unit X53 is free and its start is not restricted by another thread, waits for the start (dashed white band) if the arithmetic unit X53 is in use, and is under start restriction (dashed hatched band) if its start is restricted by another thread. The parenthesized reference sign shown for a thread under start restriction indicates the thread that is restricting its start.
 FIG. 7 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. This guaranteed time is the same as the guaranteed time described with reference to FIG. 4.
 A characteristic of this operation is that, as seen in time slot 4, when the thread P finishes using the shared resource while the thread P and a thread Q different from the thread P are both ready to use the shared resource, the shared resource is allocated to the thread Q before the thread P.
 Such an operation is realized by the dispatcher 31 dispatching the subsequent specific instruction of the first thread (thread P), when a second thread (threads Q and R) different from the first thread becomes ready to use the shared resource while the first thread is using it, only after the second thread (threads Q and R) has finished using the shared resource.
 As a result, when there are a plurality of threads ready to use the shared resource, no single thread keeps using the shared resource, so the shared resource is allocated evenly to all threads, and consequently the time required for processing the specific instructions of each thread can be guaranteed.
 (Embodiment 3)
 FIG. 8 is a block diagram illustrating an example of the functional configuration of a processor 12 including a dispatcher 32 as a thread arbitration system according to Embodiment 3 of the present invention. The processor 12 differs from the processor 10 of Embodiment 1 in the contents of the control table 37 and the operation of the dispatcher 32.
 Hereinafter, components identical to those described in Embodiment 1 are given the same reference numerals and their description is omitted as appropriate; the differences from Embodiments 1 and 2 are mainly described.
 The operation of the dispatcher 32 as a thread arbitration system differs from that of the dispatcher 30 of Embodiment 1 and the dispatcher 31 of Embodiment 2 in that a priority is defined for each of the plurality of threads and thread arbitration is performed based on the priorities.
 When the execution stage of a specific instruction of a higher-priority thread becomes ready to start during the processing of the execution stage of a specific instruction of another thread, the dispatcher 32 performs interrupt control that stops the execution stage being processed and starts the execution stage of the specific instruction of the higher-priority thread.
 Between threads having the same priority, the dispatcher 32 performs a thread arbitration operation equivalent to that of the dispatcher 30 of Embodiment 1 or the dispatcher 31 of Embodiment 2; the interrupt control performed by the dispatcher 32 is described in detail below.
 The control table 37 has, for each thread, an instruction status column 37a, a specific instruction column 37b, and a priority column 37c. Each column of the control table 37 is configured using, for example, registers.
 The instruction status column 37a holds information indicating that the execution stage of a specific instruction is being executed by the arithmetic unit X53 (EXEC), is waiting to start (READY), or that there is no specific instruction to execute (NONE). The specific instruction column 37b holds the specific instruction that is waiting for the start of its execution stage or being executed. The priority column 37c holds a value indicating the priority of the thread; a smaller value indicates a higher priority. The maximum number of priority levels is not limited.
 The arithmetic unit status signal notified from the arithmetic unit X53 to the dispatcher 32 indicates whether the arithmetic unit X53 is free (IDLE) or in use (BUSY).
 FIG. 9 is a state transition diagram defining an example of the operation of the dispatcher 32 configured as described above. The dispatcher 32 performs the operations defined in the state transition diagram of FIG. 9 in parallel for each of the plurality of threads. NONE, READY, and EXEC in FIG. 9 indicate the contents of the instruction status column 37a of the thread that is the target of the operation.
 Before any specific instruction has been supplied from the fetch unit 20, the instruction status column 37a is NONE (S30). When a specific instruction of the target thread is supplied from the fetch unit 20, the dispatcher 32 sets the instruction status column 37a to READY and records the specific instruction in the specific instruction column 37b (S31).
 If the arithmetic unit X53 is BUSY, the dispatcher 32 compares the priority of the target thread with the priority of the other thread whose instruction status column 37a is EXEC (that is, the thread currently using the arithmetic unit X53), based on the values in the priority columns 37c. If the thread using the arithmetic unit X53 is a peer thread having a priority equal to that of the target thread or a higher-priority thread, the dispatcher 32 waits without dispatching the specific instruction of the target thread (S32).
 If the arithmetic unit X53 is IDLE, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b and sets the instruction status column 37a to EXEC (S33).
 If the arithmetic unit X53 is BUSY and is being used by a lower-priority thread whose priority is lower than that of the target thread, the dispatcher 32 dispatches the specific instruction recorded in the specific instruction column 37b without waiting for the execution stage of the specific instruction currently being processed by the arithmetic unit X53 to finish, and sets the instruction status column 37a to EXEC (S34).
 When the new specific instruction is dispatched, the arithmetic unit X53 aborts the execution stage of the specific instruction currently being processed and starts processing the execution stage of the new specific instruction.
 When the instruction status column 37a is EXEC and a higher-priority thread becomes READY, the processing in the arithmetic unit X53 is aborted by the interrupt from that higher-priority thread, so the dispatcher 32 sets the instruction status column 37a back to READY (S35).
 When the instruction status column 37a is EXEC and the arithmetic unit X53 becomes IDLE, that is, when the processing in the arithmetic unit X53 is completed, the dispatcher 32 sets the instruction status column 37a to READY and records the next specific instruction of the target thread in the specific instruction column 37b if it has been supplied from the fetch unit 20 (S36); if there is no next specific instruction, the dispatcher 32 sets the instruction status column 37a to NONE (S37).
 By performing such an operation in parallel for each of the plurality of threads, interrupt control is realized in which, when the execution stage of a specific instruction of a higher-priority thread becomes ready to start during the processing of the execution stage of a specific instruction of another thread, the execution stage being processed is stopped and the execution stage of the specific instruction of the higher-priority thread is started.
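 Again only as an illustrative sketch with invented names (the restart or resumption of an aborted stage, described below, is not modeled), the priority-based selection can be expressed as:

```c
#define NUM_THREADS 3

typedef enum { NONE, READY, EXEC } InsnState;

/* One row of control table 37. */
typedef struct {
    InsnState state;        /* instruction status column 37a          */
    int       pending_insn; /* specific instruction column 37b        */
    unsigned  priority;     /* priority column 37c: smaller = higher  */
} PrioEntry;

/* Decide which thread should own the shared unit in the next time slot.
 * current_owner is the index of the thread in EXEC, or -1 if the unit is
 * IDLE.  A READY thread preempts only a strictly lower-priority owner
 * (S32/S34); among equal-priority READY threads the arbitration of
 * Embodiment 1 or 2 applies, which this sketch simplifies to "first found". */
static int arbitrate_with_priority(const PrioEntry table[], int current_owner)
{
    int best = current_owner;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (table[i].state != READY)
            continue;
        if (best < 0 || table[i].priority < table[best].priority)
            best = i;
    }
    /* If best differs from current_owner, the owner's stage is aborted
     * (EXEC -> READY, S35) and the winner is dispatched (S34). */
    return best;
}
```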
 FIG. 10 is a diagram showing an example of the processing status of the execution stages of specific instructions in the arithmetic unit X53 when the dispatcher 32 performs the operation described above.
 FIG. 10 shows the processing status of the execution stages of the three specific instructions P1, P2, and P3 of the thread P, the three specific instructions Q1, Q2, and Q3 of the thread Q, and the two specific instructions R1 and R2 of the thread R. Here, it is assumed that the priorities of the threads P and Q are higher than the priority of the thread R.
 The execution stage of a specific instruction becomes ready to start (thick line) when the instruction is written into the specific instruction column 37b upon completion of its upstream stages. A specific instruction that has become ready to start is dispatched and executed immediately if the arithmetic unit X53 is free, is dispatched by interrupting and then executed (solid band) if the arithmetic unit X53 is being used by a lower-priority thread, and waits for the start (dashed white band) if the arithmetic unit X53 is being used by a peer or higher-priority thread. The execution stage of an interrupted lower-priority thread is aborted (dashed vertically striped band) and dispatched again later.
 The execution stage of a specific instruction that is dispatched again may be restarted from the beginning. Alternatively, the intermediate progress (the state of the shared resource) at the moment the execution stage is aborted may be saved to a save resource (for example, a register not shown), and when the specific instruction is dispatched again, the intermediate progress held in the save resource may be restored to the shared resource so that the processing continues from where it left off. It suffices to provide a number of save resources equal to the maximum number of priority levels minus one.
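 A minimal sketch of this optional save/restore variant, again with invented names and placeholder fields (the number of priority levels is assumed for the illustration):

```c
#include <stdbool.h>

#define MAX_PRIORITY_LEVELS 4   /* assumed value for this sketch */

/* Partial state of the shared unit at the moment a stage is aborted, e.g.
 * the intermediate remainder/quotient and cycle count of a multi-cycle
 * divide; the fields are placeholders. */
typedef struct {
    unsigned partial_result[2];
    unsigned cycles_done;
    bool     valid;
} UnitSnapshot;

/* A stage can only be aborted by a strictly higher-priority thread, so at
 * most (MAX_PRIORITY_LEVELS - 1) snapshots can be live at once: one save
 * slot per priority level except the highest is sufficient. */
static UnitSnapshot save_slots[MAX_PRIORITY_LEVELS - 1];
```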
 FIG. 10 also shows, by arrows, the guaranteed time, which is the upper limit of the time required from when each specific instruction becomes ready to start until its execution is completed. This guaranteed time is expressed as a number of time slots obtained by multiplying the number of threads whose specific-instruction processing time must be guaranteed by the number of time slots required to process the execution stage of a specific instruction.
 Here, the processing time of the specific instructions of the two threads P and Q, which have the highest priority, can be guaranteed, and since the execution stage of a specific instruction requires 3 time slots, the guaranteed time is 6 time slots. Compared with the examples of Embodiments 1 and 2, the guaranteed time is shortened because the number of threads whose processing time is guaranteed is smaller.
 A characteristic of this operation is the interrupt control in which a higher-priority thread interrupts a lower-priority thread to acquire the shared resource, as seen in time slot 17 and time slot 22. Because of this operation, the processing time of the specific instructions of the lower-priority thread is not guaranteed, but in exchange the guaranteed time of the higher-priority threads is shortened.
 (Embodiment 4)
 The processors 10, 11, and 12 described above include the dispatchers 30, 31, and 32, respectively, each serving as a distinctive thread arbitration system, and can guarantee the processing time of the specific instructions of a plurality of threads. They are therefore extremely useful for applications in which each of a plurality of threads performs processing that requires real-time performance.
 In Embodiment 4 of the present invention, a processor system and a video recording/reproducing apparatus are described as an example of such an application.
 FIG. 11 is a block diagram showing an example of the functional configuration of a processor system 100 using the processor 10, 11, or 12 according to Embodiment 4 of the present invention.
 The processor system 100 is a system LSI that performs various kinds of signal processing relating to video/audio streams, and includes the processor 10, 11, or 12 described above. The processor system 100 is used in, for example, a video recording/reproducing apparatus.
 FIG. 12 is a diagram showing an example of the appearance of a video recording/reproducing apparatus 200 using the processor system 100. As one typical example, the video recording/reproducing apparatus 200 acquires a video/audio stream from a broadcast wave and displays the broadcast program represented by the video/audio stream on a display device 201 while recording the broadcast program.
 As shown in FIG. 11, the processor system 100 includes the processor 10, a stream I/O block 71, an AVIO (Audio Visual Input Output) block 72, and a memory IF block 73.
 To display the broadcast program on the display device 201, the processor system 100, for example, acquires the video/audio stream from the broadcast wave with the stream I/O block 71, decompresses the video/audio stream into video/audio data with the processor 10, and generates a video/audio signal from the video/audio data with the AVIO block 72 to output it to the display device 201.
 To record the broadcast program in parallel with the above display, the processor system 100, for example, compresses the video/audio data into a recording format with the processor 10 and records the compressed video/audio data in the external memory 60 via the memory IF block 73.
 In such processing, in order to prevent dropouts in the display and recording of the broadcast program (so-called frame dropping), the time required for each of the video/audio stream decompression processing and the video/audio data compression processing performed by the processor 10 must be estimated accurately.
 Therefore, the processing time of the instructions is guaranteed by executing, as threads on the processor 10, the video reproduction processing that includes the decompression of the video/audio stream and the video recording processing that includes the compression of the video/audio data. This makes it possible to accurately estimate the time required for each of the video/audio stream decompression processing (more broadly, the video display processing) and the video/audio data compression processing (more broadly, the video recording processing).
 (Description of effects by comparison with comparative examples)
 The superiority of the thread arbitration system according to the embodiments of the present invention is further described below using, as comparative examples, processors that guarantee the processing time of the specific instructions of a plurality of threads with configurations different from those of the embodiments of the present invention.
 FIG. 13 is a block diagram illustrating an example of the functional configuration of a processor according to a comparative example. This processor has as many arithmetic units as the number of threads it can process. In a processor configured in this way, each of the plurality of threads can occupy its own arithmetic unit, so the processing time of the threads can be guaranteed. However, there are disadvantages in that the number of arithmetic units must be changed when the number of threads changes, and the area and power consumption of the processor increase.
 FIG. 14 is a block diagram illustrating an example of the functional configuration of a processor according to another comparative example. In this processor, the execution stage is divided into as many stages as the number of threads it can process. In a processor configured in this way, each of the plurality of threads is processed while occupying one of the divided stages, so the processing time of the threads can be guaranteed. However, there are disadvantages in that the number of stages must be changed when the number of threads changes, and the area and power consumption of the processor increase.
 These processors lack flexibility of configuration in that arithmetic units equal in number to the threads must be provided or the execution stage must be divided accordingly, and the area and power consumption of the processor increase; they therefore do not provide a sufficiently satisfactory solution for guaranteeing the processing time of each thread.
 Compared with these processors, in a processor including the thread arbitration system according to the embodiments of the present invention, a single arithmetic unit X53 suffices, and the number of stages into which the execution stage is divided can remain fixed. Moreover, since the processing time of each thread is guaranteed by controlling the execution order of the specific instructions of each thread, there is an advantage that the increase in processor area and power consumption can be suppressed compared with the processors of the comparative examples.
 The thread arbitration system according to the present invention is useful for applications in which the processing time of each of a plurality of threads needs to be guaranteed, such as multi-thread processors and video recording/reproducing apparatuses.
  10, 11, 12  Processor
  20  Fetch unit
  30, 31, 32  Dispatcher
  35, 36, 37  Control table
  40  Decoder
  51  Arithmetic unit A
  52  Arithmetic unit B
  53  Arithmetic unit X
  58  Signal line
  59  Signal line
  60  Memory
  61  Thread P
  62  Thread Q
  63  Thread R
  71  Stream I/O block
  72  AVIO block
  73  Memory IF block
 100  Processor system
 200  Video recording/reproducing apparatus
 201  Display device

Claims (7)

  1.  A thread arbitration system that, in a processor which executes a plurality of threads each corresponding to a computer program by using a shared resource, performs arbitration for allocating the shared resource to the plurality of threads, wherein
     in the processor,
     the shared resource is occupied in a time-division manner by specific instructions included in each of the threads, and
     each of the threads becomes ready to use the shared resource when an upstream stage of its specific instruction is processed in a time slot sequentially and exclusively allocated to that thread, and thereafter occupies the shared resource over a plurality of time slots for the processing of a downstream stage of the specific instruction, and
     the thread arbitration system
     allocates, when a first thread among the plurality of threads finishes using the shared resource while the first thread and a second thread different from the first thread are each ready to use the shared resource, the shared resource to the second thread before the first thread.
  2.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system dispatches to a downstream stage, when the first thread among the plurality of threads finishes using the shared resource while two or more threads are each ready to use the shared resource, the specific instruction of the thread that, among the two or more threads, became ready to use the shared resource earliest.
  3.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system dispatches, when a second thread different from the first thread becomes ready to use the shared resource while the first thread among the plurality of threads is using the shared resource, a subsequent specific instruction of the first thread to a downstream stage after the second thread has finished using the shared resource.
  4.  The thread arbitration system according to claim 1, wherein
     the thread arbitration system defines a priority for each of the threads,
     stops, when a second thread having a higher priority than the first thread becomes ready to use the shared resource while the first thread among the plurality of threads is using the shared resource, the use of the shared resource by the first thread and dispatches the specific instruction of the second thread to a downstream stage, and
     causes the first thread to resume using the shared resource after the second thread has finished using the shared resource.
  5.  A processor comprising the thread arbitration system according to any one of claims 1 to 4.
  6.  A video recording/reproducing apparatus comprising the processor according to claim 5, wherein video recording processing is performed by a first thread among the plurality of threads and video reproduction processing is performed by a second thread.
  7.  A thread arbitration method for performing, in a processor which executes a plurality of threads each corresponding to a computer program by using a shared resource, arbitration for allocating the shared resource to the plurality of threads, wherein
     in the processor,
     the shared resource is occupied in a time-division manner by specific instructions included in each of the threads, and
     each of the threads becomes ready to use the shared resource when an upstream stage of its specific instruction is processed in a time slot sequentially and exclusively allocated to that thread, and thereafter occupies the shared resource over a plurality of time slots for the processing of a downstream stage of the specific instruction, and
     the thread arbitration method comprises
     allocating, when a first thread among the plurality of threads finishes using the shared resource while the first thread and a second thread different from the first thread are each ready to use the shared resource, the shared resource to the second thread before the first thread.
PCT/JP2011/004727 2010-08-25 2011-08-25 Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method WO2012026124A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-188745 2010-08-25
JP2010188745A JP2012048399A (en) 2010-08-25 2010-08-25 Thread arbitration system, processor, video recording and reproducing device, and thread arbitration method

Publications (1)

Publication Number Publication Date
WO2012026124A1 true WO2012026124A1 (en) 2012-03-01

Family

ID=45723146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/004727 WO2012026124A1 (en) 2010-08-25 2011-08-25 Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method

Country Status (2)

Country Link
JP (1) JP2012048399A (en)
WO (1) WO2012026124A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006343872A (en) * 2005-06-07 2006-12-21 Keio Gijuku Multithreaded central operating unit and simultaneous multithreading control method
JP2007507805A (en) * 2003-10-01 2007-03-29 インテル・コーポレーション Method and apparatus for enabling thread execution in a multi-threaded computer system
JP2007533007A (en) * 2004-04-07 2007-11-15 サンドブリッジ テクノロジーズ インコーポレーテッド Multi-thread processor with multiple simultaneous pipelines per thread


Also Published As

Publication number Publication date
JP2012048399A (en) 2012-03-08

Similar Documents

Publication Publication Date Title
US10891158B2 (en) Task scheduling method and apparatus
JP5097251B2 (en) Method for reducing energy consumption in buffered applications using simultaneous multithreading processors
JP6199477B2 (en) System and method for using a hypervisor with a guest operating system and virtual processor
KR100591727B1 (en) Recording media and information processing systems recording scheduling methods and programs for executing the methods
US8407454B2 (en) Processing long-latency instructions in a pipelined processor
US8161491B2 (en) Soft real-time load balancer
EP2593862B1 (en) Out-of-order command execution in a multimedia processor
US6944850B2 (en) Hop method for stepping parallel hardware threads
US9170841B2 (en) Multiprocessor system for comparing execution order of tasks to a failure pattern
US10545892B2 (en) Multi-thread processor and its interrupt processing method
JP2008123045A (en) Processor
US9588808B2 (en) Multi-core system performing packet processing with context switching
KR20050000487A (en) Scheduling method and realtime processing system
KR20050011689A (en) Method and system for performing real-time operation
US20130347000A1 (en) Computer, virtualization mechanism, and scheduling method
KR20130066900A (en) Method to guarantee real time for soft real time operating system
US8225320B2 (en) Processing data using continuous processing task and binary routine
WO2005048009A2 (en) Method and system for multithreaded processing using errands
US20240036921A1 (en) Cascading of Graph Streaming Processors
JP2006146758A (en) Computer system
WO2019187719A1 (en) Information processing device, information processing method, and program
WO2012026124A1 (en) Thread arbitration system, processor, video recording/reproduction device, and thread arbitration method
CN111381887B (en) Method and device for performing image motion compensation in MVP processor and processor
JP2760273B2 (en) Arithmetic device and control method thereof
US20220382587A1 (en) Data processing systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11819600

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11819600

Country of ref document: EP

Kind code of ref document: A1