WO2002069150A1

WO2002069150A1 - Microprocessor and instruction execution order scheduling method

Info

Publication number: WO2002069150A1
Application number: PCT/JP2002/001272
Authority: WO
Inventors: Makoto Ueda
Original assignee: International Business Machines Corporation
Priority date: 2001-02-27
Filing date: 2002-02-14
Publication date: 2002-09-06
Also published as: KR20030088031A; JPWO2002069150A1; TW556082B

Abstract

Degradation of the utilization rate of an MPU caused by cache memory misses is suppressed. A microprocessor (10) has an execution unit (26) for executing instructions, including an added test instruction that checks whether a desired routine containing instructions or a desired data structure containing data is present in cache memories (20, 30). Immediately before reading a routine or data structure that can be processed in parallel, the microprocessor (10) confirms whether the routine or data structure to be read is present in the cache memories (20, 30). Among the routines and data structures that can be processed in parallel, those present in the cache memories (20, 30) are processed preferentially by the microprocessor (10).

Description

Microprocessor and instruction execution order scheduling method

The present invention relates to a microprocessor and an instruction execution order scheduling method, and more particularly to a microprocessor that executes instructions in an order specified by a program and to a method for scheduling the instruction execution order of such a microprocessor.

Background art

FIG. 4 shows an example of the configuration of an MPU (microprocessor unit) 10'. The MPU 10' includes: a cache memory 14, which has a smaller capacity and faster access than the external memory 40 and stores part of the instructions and part of the data read from the external memory 40; a fetch unit 22 for reading instructions or data from the external memory 40; an execution unit 26 for executing the read instructions; a general-purpose register 32 for storing data used by the instruction being executed; and a bus interface unit 12' to which an external device (the external memory 40) is connected.

The cache memory 14 includes an instruction cache 20, in which instructions are stored, and a data cache 30, in which data is stored. The MPU 10' is connected to an external memory (semiconductor storage device) 40 via the bus interface unit 12', and instructions and data are read and written between the external memory 40 and the MPU 10'. However, since the access speed of the cache memories 20 and 30 is 60 to 100 times faster than that of the external memory 40, the cache memories 20 and 30 are used in preference to the external memory 40. If the instructions or data required by the MPU 10' do not exist in the cache memories 20 and 30, the instructions or data are read from the external memory 40. The reading of instructions or data from the external memory 40 when they do not exist in the cache memories 20 and 30 is controlled by hardware. For example, a control unit (not shown) that controls the entire MPU 10' performs this control.

The external memory 40 is also connected to a hard disk (fixed magnetic storage device) 42, and instructions and data are read and written between the external memory 40 and the hard disk 42. If the instructions or data required by the MPU 10' are not present in the external memory 40, they are read from the hard disk 42. The reading of instructions or data from the hard disk 42 when they do not exist in the external memory 40 is controlled by software. Normally, the OS (operating system) performs this control.

To cause MPU 10 to execute an instruction, fetch unit 22 reads the instruction from instruction cache 20 or external memory 40. If the target instruction exists in the instruction cache 20, the instruction is read from the instruction cache 20; otherwise, the instruction is read from the external memory 40. When an instruction is read from the external memory 40, the read instruction is also sent to and stored in the instruction cache 20.

The instruction read by the fetch unit 22 is sent to the execution unit 26, where it is executed. Data necessary for executing the instruction is read from the data cache 30 or the external memory 40 into the general-purpose register 32. If the target data exists in the data cache 30, the data is read from the data cache 30; otherwise, the data is read from the external memory 40. Data read from the external memory 40 is also sent to and stored in the data cache 30.

FIG. 5(a) shows a flowchart of an example of a program that causes the MPU 10' to execute two routines (FuncA, FuncB) using two data structures (DATAsA, DATAsB). FIG. 5(a) mainly shows the reading and processing of the data (DATAsA, DATAsB). The MPU 10' reads and processes the data in the execution order specified by the program shown in FIG. 5(a).

DATAsA and DATAsB each contain several pieces of data (DATA-A0, DATA-A1, DATA-A2, ... and DATA-B0, DATA-B1, DATA-B2, ...). DATAsA and DATAsB are independent of each other.

FuncA and FuncB are each a series of instructions with a certain function that constitutes part of the program. FuncA and FuncB each contain several instructions (Inst-A0, Inst-A1, Inst-A2, ... and Inst-B0, Inst-B1, Inst-B2, ...). FuncA and FuncB are independent of each other. For example, unless there is a branch instruction, FuncA executes its instructions in the order Inst-A0, Inst-A1, Inst-A2, ..., and FuncB executes its instructions in the order Inst-B0, Inst-B1, Inst-B2, ....

As shown in FIG. 5(a), the MPU 10' reads DATAsA (S172) and executes FuncA using DATAsA (S174). Then DATAsB is read (S176), and FuncA using DATAsB is executed (S178). Subsequently, DATAsA is read (S172'), and FuncB using DATAsA is executed (S182). Next, DATAsB is read (S176'), and FuncB using DATAsB is executed (S186).

When data is read in the order shown in FIG. 5(a), the waiting time of the MPU 10' increases if the data to be read does not exist in the data cache 30. For example, when DATAsA is read and FuncA is executed (S174), DATAsA is read from the external memory 40 if it is not in the data cache 30. Since access to the external memory 40 is 60 to 100 times slower than access to the data cache 30, the waiting time of the MPU 10' for reading data from the external memory 40 is 60 to 100 times longer.

Moreover, even if DATAsB exists in the data cache 30 and could be read in a far shorter time (1/100 to 1/60) than DATAsA, the reading of DATAsB (S176) and the execution of FuncA using DATAsB (S178) cannot start until DATAsA has been read from the external memory 40 (S172) and the execution of FuncA using the read DATAsA (S174) has been completed.

Since DATAsA and DATAsB are independent data, and FuncA and FuncB are independent routines, the program execution order can also be changed from the flowchart shown in FIG. 5(a) to the flowchart shown in FIG. 5(b). The MPU 10' reads DATAsA (S172) and executes FuncA and FuncB (S174, S182), and then reads DATAsB (S176) and executes FuncA and FuncB (S178, S186). However, in this case too, as in FIG. 5(a), the waiting time of the MPU 10' increases if the data to be read is not in the data cache 30.

The above has described the reading of DATAsA and DATAsB as an example, but the same applies to the reading of FuncA and FuncB when FuncA and FuncB are executed. FIG. 6 is a flowchart showing an example of a program that causes the MPU 10' to execute two routines (FuncA and FuncB). The MPU 10' reads FuncA (S190) and executes it (S192), and then reads FuncB (S194) and executes it (S196).

In the execution of FuncA and FuncB, as in the case of the data described above (FIGS. 5(a) and 5(b)), if the target routine is not in the instruction cache 20, the waiting time of the MPU 10' becomes longer and the utilization rate of the MPU 10' decreases.

As a method of reducing the increase in the waiting time of the MPU 10' due to such a miss in the cache memory 20 or 30, there is prefetching, in which instructions or data expected to become necessary in the near future are read into the MPU 10' as early as possible, in parallel with the processing being executed. For prefetching, a touch instruction is used, for example. The touch instruction is an instruction that directs the fetch unit 22 to read an instruction or data. When the touch instruction is executed, the instruction or data requested by the touch instruction is read from the external memory 40 into the cache memories 20 and 30. Even while the touch instruction is being executed, the execution unit 26 can execute other instructions in parallel. By using a touch instruction, the program can notify the MPU 10' of instructions or data expected to be accessed in the near future. The MPU 10' improves the hit rate of the cache memories 20 and 30 by reading the instructions or data notified by the touch instruction into the cache memories 20 and 30 in advance.
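The non-blocking behavior of a touch instruction can be illustrated with a minimal sketch in Python. The class ToyCache, its methods touch and tick, and the block names are hypothetical and only model the behavior described above; they are not part of the disclosed hardware.

```python
class ToyCache:
    """Toy model (assumption): resident blocks plus a queue of blocks being fetched."""

    def __init__(self, resident=()):
        self.resident = set(resident)
        self.in_flight = []

    def touch(self, block):
        """Model of a touch instruction: request the block and return immediately."""
        if block not in self.resident and block not in self.in_flight:
            self.in_flight.append(block)

    def tick(self):
        """Model of the fetch unit completing one outstanding transfer."""
        if self.in_flight:
            self.resident.add(self.in_flight.pop(0))


cache = ToyCache(resident={"DATAsB"})
cache.touch("DATAsA")            # prefetch request, does not block
work_done = ["FuncA(DATAsB)"]    # the execution unit keeps working meanwhile
cache.tick()                     # the transfer finishes in the background
```

In this model the request and the completion of the transfer are separate events, which is what allows other instructions to run between them.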

However, prediction of the instruction or data to be prefetched is usually performed in the state of the source program before executing the program. Prefetched instructions or data are not necessarily needed because the instructions or data that are expected to be needed before the execution of the program are read. The effectiveness of prefetch depends on the accuracy of the prediction before the program is executed, and does not always have an effect.

There is also a method called multithreading, in which the OS changes the order of instruction execution during the execution of a program. When a running thread enters the waiting state, the scheduler switches another executable thread to the running state. A thread is a unit in which the execution order of a program can be changed, and each thread has information, called a context, relating to the execution state of the program. When the OS changes the execution order in units of threads, the contexts are saved to and restored from registers, which is called a context switch. Executing a context switch involves an interrupt indicating that the running thread has entered the wait state, starting the scheduler, accessing registers, and switching the thread being executed. If the context switch is executed while waiting for access to the hard disk, the execution time of the context switch is sufficiently short in comparison, and multithreading works effectively. However, if the context switch is executed during the latency of a cache miss, the execution time of the context switch is not short in comparison, and multithreading does not work effectively.

There is also a method called out-of-order execution, which changes the order of instruction execution inside the MPU during program execution. Out-of-order execution is performed by a superscalar MPU that performs parallel processing using multiple execution units. When an execution unit enters a wait state, the instructions that can be executed are executed first, without being restricted to the instruction execution order specified by the program. However, since the order of instruction execution is changed arbitrarily on the MPU side, the instructions executed ahead of order are often all wasted.

Disclosure of the invention

An object of the present invention is to suppress the decrease in the utilization rate of an MPU caused by cache memory misses.

In the microprocessor of the present invention, the instructions executed by the execution unit include a test instruction for confirming whether a required routine or data structure exists in the cache memory. By executing the test instruction immediately before reading a routine or data structure, such a microprocessor can know beforehand whether the routine or data structure to be read exists in the cache memory.

The instruction execution order scheduling method of the present invention includes a confirmation step of checking, immediately before reading a routine or data structure that can be processed in parallel, whether the routine or data structure exists in the cache memory, and a priority execution step of causing the microprocessor to preferentially process, among the routines or data structures that can be processed in parallel, those confirmed to exist in the cache memory.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram showing one configuration example of the MPU according to the present invention.

FIG. 2 is a flowchart showing one embodiment of the scheduling according to the present invention.

FIG. 3 is a flowchart showing another embodiment of the scheduling according to the present invention.

FIG. 4 is a block diagram showing a configuration example of a conventional MPU.

FIG. 5(a) is a flowchart showing an example of data processing performed by the MPU. FIG. 5(b) is a flowchart in which the routines of FIG. 5(a) that use the same data structure are executed together.

FIG. 6 is a flowchart showing an example of the routine execution of the MPU.

BEST MODE FOR CARRYING OUT THE INVENTION

Next, an embodiment of a microprocessor and an instruction execution order scheduling method according to the present invention will be described in detail with reference to the drawings. As shown in FIG. 1, a test instruction for checking whether a specified routine or data structure exists in the cache memory 14 has been added to the execution unit 26 of the MPU 10 according to the present invention. When the test instruction is sent to the MPU 10, the MPU 10 checks whether the routine or data structure specified by the test instruction exists in the instruction cache 20 or the data cache 30, and returns a result indicating that it exists ("1") or does not exist ("0"). This result is stored in the general-purpose register 32. The test instruction is executed in the execution unit 26 of the MPU 10 like other instructions.
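The result convention of the test instruction can be sketched in Python as follows. This is only a software model under stated assumptions: the function name probe_cache, the use of sets to represent cache contents, and the register names r1 to r3 are hypothetical and do not describe the actual instruction encoding.

```python
def probe_cache(cache_contents, target):
    """Model of the test instruction: 1 if the target is cached, 0 otherwise."""
    return 1 if target in cache_contents else 0


# Hypothetical cache contents for illustration.
instruction_cache = {"FuncB"}
data_cache = {"DATAsB"}

# The result is stored in a general-purpose register (modeled as a dict).
registers = {}
registers["r1"] = probe_cache(data_cache, "DATAsA")
registers["r2"] = probe_cache(data_cache, "DATAsB")
registers["r3"] = probe_cache(instruction_cache, "FuncB")
```

The program can then branch on the register value in the ordinary way, which is what allows the reordering to be expressed within the program itself.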

From the hardware viewpoint, the processing order specified by the program cannot be changed. From the software viewpoint, however, there are cases where processing can be performed without problems even if the processing order is changed. Examples are cases where the same processing is repeated for a plurality of data structures that have no dependency on each other, or where a plurality of independent processes are performed. In the present embodiment, scheduling of the execution order of FuncA and FuncB using DATAsA and DATAsB in the conventional flow (FIG. 5(b)) will be described as an example. DATAsA and DATAsB are independent data structures, and FuncA and FuncB are independent routines. The order of reading DATAsA and DATAsB, and of reading and executing FuncA and FuncB, can be interchanged. DATAsA and DATAsB are units of data, and FuncA and FuncB are units of instructions, as seen from the program.

FIG. 2 shows a flowchart in which the scheduling part of the present invention (S110, S112, S114, S116, S118, S122, S124, S126, S128, S130) is added to the conventional flowchart (FIG. 5(b)). The parts that read and process DATAsA and DATAsB (S172, S174, S182, S176, S178, S186) are the same as in the conventional flow (FIG. 5(b)). In the scheduling method of the present invention, immediately before DATAsA is read (S172), it is checked whether DATAsA exists in the data cache 30 (S114). This check is performed using the test instruction. When the test instruction is sent to the MPU 10, the MPU 10 checks whether DATAsA exists in the data cache 30 ("1") or not ("0") and stores the result ("1" or "0") in the general-purpose register 32.

If DATAsA exists in the data cache 30, DATAsA is read from the data cache 30 (S172), and FuncA and FuncB using DATAsA are executed by the MPU 10 (S174, S182). If it does not exist, the conventional touch instruction is sent to the MPU 10 to prefetch DATAsA (S118). This prefetch can be performed in parallel with the execution of other instructions by the execution unit 26.

As for DATAsB, similarly to DATAsA, immediately before DATAsB is read (S176), it is confirmed by a test instruction whether DATAsB exists in the data cache 30 (S124). As in the case of DATAsA, when DATAsB exists in the data cache 30, DATAsB is read (S176), and the MPU 10 executes FuncA and FuncB (S178, S186). If it does not exist, DATAsB is prefetched (S128).

In the present invention, a parameter DoneDA indicating whether FuncA and FuncB using DATAsA have been completed and a parameter DoneDB indicating whether FuncA and FuncB using DATAsB have been completed are used. If DoneDA and DoneDB are "1", FuncA and FuncB using DATAsA and DATAsB, respectively, have been completed. If DoneDA and DoneDB are "0", FuncA and FuncB using DATAsA and DATAsB, respectively, have not been completed. DoneDA and DoneDB are stored in the data cache 30 or the external memory 40.

The initial values of DoneDA and DoneDB are "0" (S110). When FuncA and FuncB using DATAsA are completed, DoneDA is updated to "1" (S116). When FuncA and FuncB using DATAsB are completed, DoneDB is likewise updated to "1" (S126). By referring to DoneDA and DoneDB, it is possible to confirm whether FuncA and FuncB using DATAsA and DATAsB have been completed (S112, S122, S130).

If DoneDA is referenced (S112) and FuncA and FuncB using DATAsA have not been completed, it is checked whether DATAsA exists in the data cache 30 (S114). Similarly, if DoneDB is referenced (S122) and FuncA and FuncB using DATAsB have not been completed, it is checked whether DATAsB exists in the data cache 30 (S124).

While data or instructions are being prefetched, the execution unit 26 can execute other instructions. For example, if DoneDA is "0" and DATAsA is not in the data cache 30, but DoneDB is "0" and DATAsB is in the data cache 30, FuncA and FuncB using DATAsB can be executed during the prefetch of DATAsA.

Next, the operation of the scheduling of the execution order of FuncA and FuncB using DATAsA and DATAsB will be described.

First, DoneDA and DoneDB are initialized (S110). Next, DoneDA is referenced to check whether FuncA and FuncB using DATAsA have been completed (S112). If DoneDA is "0", FuncA and FuncB using DATAsA have not yet been executed, so it is confirmed by a test instruction whether DATAsA exists in the data cache 30 (S114).

Whether DATAsA exists in the data cache 30 can be determined based on whether all the data included in DATAsA exists in the data cache 30. Alternatively, it can be determined by whether the first data DATA-A0 exists in the data cache 30. If DATA-A0 exists in the data cache 30, the other parts (DATA-A1, DATA-A2, ...) are considered to exist in the data cache 30 as well, so cache hits and misses can be determined easily and quickly.
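The difference between the two ways of determining presence can be sketched in Python. The function names present_full and present_head and the cache contents are hypothetical; the example deliberately uses a partially cached structure to show that the head-only check is an approximation traded for speed.

```python
def present_full(cache, structure):
    """1 only if every element of the structure is in the cache."""
    return 1 if all(item in cache for item in structure) else 0


def present_head(cache, structure):
    """1 if the first element is in the cache; the rest is assumed to follow."""
    return 1 if structure[0] in cache else 0


DATAsA = ["DATA-A0", "DATA-A1", "DATA-A2"]
cache = {"DATA-A0", "DATA-A1"}    # hypothetical: DATA-A2 is not cached

full_result = present_full(cache, DATAsA)
head_result = present_head(cache, DATAsA)
```

Here the head-only check reports a hit while the full check reports a miss. The head-only check examines one element instead of all of them, at the cost of occasionally treating a partially cached structure as present.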

If DATAsA exists in the data cache 30, DATAsA is read from the data cache 30 (S172), and FuncA and FuncB are executed (S174, S182). When FuncA and FuncB using DATAsA are completed, DoneDA is updated to "1" (S116). If DoneDA is "1", the portions related to DATAsA (S114, S172, S174, S182, S116, S118) are not executed (S112).

If DATAsA does not exist in the data cache 30, a touch instruction used in the conventional prefetch is sent to the MPU 10 (S118) to prefetch DATAsA. During prefetching of DATAsA, FuncA and FuncB using DATAsB can be executed in parallel.

The subsequent portion related to DATAsB (S122, S124, S176, S178, S186, S126, S128) is the same as the portion related to DATAsA described above (S112, S114, S172, S174, S182, S116, S118). If FuncA and FuncB using DATAsB have not been completed (S122), it is confirmed by a test instruction whether DATAsB exists in the data cache 30 (S124). If it exists in the data cache 30, DATAsB is read from the data cache 30 (S176), and FuncA and FuncB are executed (S178, S186). If it does not exist, DATAsB is prefetched (S128).

If both DoneDA and DoneDB are "1", FuncA and FuncB using DATAsA and DATAsB have all been completed (S130). Unlike the conventional case (FIG. 5(b)), when, for example, DATAsA is not in the data cache 30 and DATAsB is, FuncA and FuncB using DATAsB can be executed (S178, S186) in parallel with the prefetch of DATAsA (S118). When FuncA and FuncB using DATAsB are completed (S126), FuncA and FuncB using DATAsA, which was prefetched into the data cache 30 during that processing, are executed (S174, S182). DATAsB, which is present in the data cache 30, is processed before DATAsA, which is not, and DATAsA can be prefetched in parallel with the processing of DATAsB, so the waiting time of the MPU 10 on a miss in the data cache 30 can be shortened. Since the prefetch is performed after the hit or miss in the data cache 30 has been confirmed, no useless prefetch is executed, unlike conventional prefetching based on prediction before program execution.
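The flow of FIG. 2 can be approximated in software by the following minimal Python sketch. It is a simulation under stated assumptions, not the disclosed implementation: the function name schedule_fig2, the prefetch_latency parameter, and the rule that one pass through the loop equals one unit of prefetch progress are all hypothetical. The step numbers in the comments refer to the steps named in the description.

```python
def schedule_fig2(cache, prefetch_latency=1):
    """Simulate the FIG. 2 scheduling loop; returns the order of operations."""
    done = {"DATAsA": 0, "DATAsB": 0}            # S110: DoneDA, DoneDB = 0
    in_flight = {}                               # prefetches not yet finished
    trace = []
    while not all(done.values()):                # S130
        for name in ("DATAsA", "DATAsB"):
            if done[name]:                       # S112 / S122
                continue
            if name in cache:                    # S114 / S124: test instruction
                trace.append(f"FuncA({name})")   # S174 / S178
                trace.append(f"FuncB({name})")   # S182 / S186
                done[name] = 1                   # S116 / S126
            elif name not in in_flight:
                in_flight[name] = prefetch_latency
                trace.append(f"touch({name})")   # S118 / S128: prefetch
        # Assumption: each pass advances every outstanding prefetch by one unit.
        for name in list(in_flight):
            in_flight[name] -= 1
            if in_flight[name] <= 0:
                cache.add(name)
                del in_flight[name]
    return trace


trace = schedule_fig2({"DATAsB"})
```

With only DATAsB cached at the start, the trace shows the prefetch of DATAsA being issued first, the processing of DATAsB proceeding while it is outstanding, and the processing of DATAsA following once it has arrived.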

In the above, two data structures (DATAsA, DATAsB) have been described as examples, but the number of data structures that can be processed in parallel is arbitrary. For example, if the number of data structures that can be processed in parallel is five, five parameters indicating whether the routines using each data structure have been completed are provided (for example, DoneDA, DoneDB, DoneDC, DoneDD, DoneDE), a test instruction is executed immediately before each data structure is read, as in FIG. 2, and processing can be executed starting from the data structures confirmed to exist in the data cache 30. There may be a plurality of groups of data structures that can be processed in parallel. A data structure may also contain only a single piece of data.
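The bookkeeping for five data structures can be sketched in Python as follows. The structure names DATAsC to DATAsE and the cache contents are hypothetical, chosen to match the parameter names DoneDA to DoneDE given above; the sketch shows only how the completion flags and the test results partition the structures, not a complete scheduler.

```python
names = ["DATAsA", "DATAsB", "DATAsC", "DATAsD", "DATAsE"]

# One completion flag per structure, all initially 0.
done = {"DoneD" + name[-1]: 0 for name in names}

data_cache = {"DATAsB", "DATAsD"}    # hypothetical cache contents

# Structures whose test result is 1 are processed first;
# the others are prefetched in the meantime.
ready = [n for n in names if not done["DoneD" + n[-1]] and n in data_cache]
to_prefetch = [n for n in names if not done["DoneD" + n[-1]] and n not in data_cache]
```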

Although data structures (DATAsA, DATAsB) have been described above as an example, the scheduling method of the present invention can also be used for reading FuncA and FuncB when the routines (FuncA, FuncB) are executed. FIG. 3 shows a flowchart in which the scheduling part of the present invention (S140, S142, S144, S146, S148, S152, S154, S156, S158, S160) is added to the conventional flowchart (FIG. 6). The parts that read and execute FuncA and FuncB (S190, S192, S194, S196) are the same as in the conventional flow (FIG. 6).

In FIG. 3, DoneFA is a parameter indicating whether FuncA has been executed, and DoneFB is a parameter indicating whether FuncB has been executed. If DoneFA and DoneFB are "1", it indicates that FuncA and FuncB have been executed, respectively, and if DoneFA and DoneFB are "0", it indicates that FuncA and FuncB have not been executed.

The scheduling of the routine (FuncA, FuncB) is the same as the scheduling of the data structure (DATAsA, DATAsB) described above. Before reading out FuncA, FuncB (S190, S194), it is checked whether or not FuncA, FuncB exists in the instruction cache 20 by a test instruction (S144, S154). If it exists, the instruction is read from the instruction cache 20 (S190, S194) and executed (S192, S196). If not, the instruction is prefetched (S148, S158).

Whether FuncA exists in the instruction cache 20 can be determined based on whether all the instructions included in FuncA exist in the instruction cache 20, but for simplicity, the determination can also be made based on whether the first instruction of FuncA, Inst-A0, exists in the instruction cache 20. If Inst-A0 exists in the instruction cache 20, the other parts (Inst-A1, Inst-A2, ...) are considered to be present in the instruction cache 20 as well, so cache hits and misses can be determined easily and quickly.

Unlike the conventional case (FIG. 6), if FuncA is not in the instruction cache 20 and FuncB is, FuncB can be executed (S196) in parallel with the prefetch of FuncA (S148). When the execution of FuncB is completed (S156), FuncA, which was prefetched into the instruction cache 20 during the execution of FuncB, is executed (S192).

Since FuncB, which is present in the instruction cache 20, is executed before FuncA, which is not, and FuncA can be prefetched in parallel with the execution of FuncB, the waiting time of the MPU 10 on a miss in the instruction cache 20 can be shortened. Since prefetching is performed after the hit or miss in the instruction cache 20 has been checked, no useless prefetching is performed, unlike conventional prefetching based on prediction before program execution.

In the above, two routines (FuncA, FuncB) have been described as examples, but the number of routines that can be executed in parallel is arbitrary. For example, if the number of routines that can be executed in parallel is five, the number of parameters indicating whether execution of each routine has been completed is increased to five (for example, DoneFA, DoneFB, DoneFC, DoneFD, DoneFE). As in FIG. 3, a test instruction can be executed before each routine is read, and execution can be started from the routines confirmed to exist in the instruction cache 20. There may be more than one group of routines that can be processed in parallel. A routine may also consist of only a single instruction. The scheduling performed when reading data structures and the scheduling performed when reading routines can be combined arbitrarily. For example, the scheduling shown in FIG. 3 can be used for reading out FuncA and FuncB shown in FIG. Since the instruction cache 20 and the data cache 30 are independent of each other, reading and writing of the instruction cache 20 and reading and writing of the data cache 30 can be executed independently.

The scheduling part of the present invention, which is added to the conventional flowcharts shown in FIGS. 5(b) and 6, does not branch at all into the part related to the reading and processing of the conventional routine or data structure. The scheduling part added by the present invention does not affect the other parts of the flowchart. In the present invention, the reordering of data structures and routines is instructed within the program. Unlike multithreading, in which switching is controlled by the OS scheduler, no context switch is invoked, so the load on the MPU and the OS is small and high-speed processing is possible.

Test instructions can be added to the part immediately before the reading of a routine or data structure that can be processed in parallel either automatically at compile time or manually in the source program. In general, for algorithms such as matrix operations, the compiler can detect concurrency, and in many cases test instructions can be added automatically at compile time. Since the compiler cannot detect concurrency in parts related to I/O (input/output), test instructions are often added manually there.

The present invention has been described above with reference to specific embodiments, but the present invention is not limited thereto. For example, a test instruction can also be applied to the TLB (Translation Lookaside Buffer) used for address translation. The TLB is a cache memory in which part of the address translation table existing in the external memory 40 is stored. Replacement of the address translation table entries stored in the TLB is performed by an automatic search of the PTEs (Page Table Entries). When the TLB is used, a cache hit means that both the instruction cache (or data cache) and the TLB have hit.

The automatic PTE search replaces the data stored in the TLB while accessing the external memory 40 multiple times. By using the MPU and the instruction execution order scheduling method of the present invention, the waiting time of the MPU on a TLB miss can be shortened in the same way as on a miss in the data cache or the instruction cache. The test instruction may examine hits in the instruction cache, the data cache, and the TLB all at once, or may examine each of them individually. In addition, the present invention can be embodied with various improvements, modifications, and variations based on the knowledge of those skilled in the art without departing from its spirit.
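The combined hit condition for a cache and the TLB can be sketched in Python as follows. The function names probe and probe_combined, the page labels, and the contents of the two structures are hypothetical; the sketch only models the rule that a hit requires both the cache and the TLB to hit.

```python
def probe(contents, key):
    """Individual check: 1 if the key is present, 0 otherwise."""
    return 1 if key in contents else 0


def probe_combined(data_cache, tlb, block, page):
    """Combined check: a hit requires both the data cache and the TLB to hit."""
    return probe(data_cache, block) & probe(tlb, page)


data_cache = {"DATAsA", "DATAsB"}
tlb = {"page_of_DATAsB"}    # hypothetical: only this translation is cached

hit_a = probe_combined(data_cache, tlb, "DATAsA", "page_of_DATAsA")
hit_b = probe_combined(data_cache, tlb, "DATAsB", "page_of_DATAsB")
```

In this model DATAsA is in the data cache but its translation is not in the TLB, so the combined result is a miss, while DATAsB hits in both.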

In the microprocessor of the present invention, an instruction (test instruction) for checking whether a required routine or data structure exists in the cache memory (instruction cache, data cache) is added. The test instruction can determine whether the routine or data structure to be read exists in the cache memory immediately before reading the routine or data structure.

According to the scheduling method of the present invention, for routines or data structures that can be processed in parallel, the test instruction described above is used to determine whether the routine or data structure to be read is present in the cache memory, and based on the result, the routines or data structures existing in the cache memory are read preferentially. Routines or data structures that exist in the cache memory are read and processed before those that do not, and the routines or data structures that do not exist in the cache memory are prefetched in parallel with that processing, which reduces the waiting time of the microprocessor caused by cache memory misses.

Claims


1. A microprocessor for processing instructions in an order specified by a program, comprising: a cache memory that stores part of the instructions and part of the data read from an external memory; and an execution unit that executes an instruction read from the cache memory or the external memory, or an instruction using the read data, wherein the instructions executed by the execution unit include a test instruction for confirming whether a required routine including a plurality of instructions or a required data structure including a plurality of data exists in the cache memory.

2. The microprocessor according to claim 1, wherein the test instruction includes an instruction for confirming whether or not a head address portion of the required routine or data structure exists in the cache memory.

3. The microprocessor according to claim 1 or claim 2, wherein the cache memory comprises: an instruction cache in which the routine is stored; and a data cache in which the data structure is stored.

4. The microprocessor according to claim 3, wherein the test instruction comprises: an instruction for checking whether the required routine exists in the instruction cache; and an instruction for checking whether the required data structure exists in the data cache.

5. The microprocessor according to any one of claims 1 to 4, further comprising a unit for reading a required routine or data structure in parallel with the execution of instructions by the execution unit.

6. An instruction execution order scheduling method for routines or data structures that can be processed in parallel, used when a microprocessor executes, in an order specified by a program, routines each including a plurality of instructions read from an external memory or a cache memory, or routines using data structures each including a plurality of data, the method comprising: a confirmation step of confirming, immediately before reading a routine or data structure that can be processed in parallel, whether the routine or data structure exists in the cache memory; and a priority execution step of causing the microprocessor to preferentially process, among the routines or data structures that can be processed in parallel, a routine or data structure confirmed to exist in the cache memory.

7. The instruction execution order scheduling method according to claim 6, wherein the priority execution step comprises: an execution step of, if the routine or data structure to be read exists in the cache memory, reading the routine or data structure from the cache memory and causing the microprocessor to process it; and an instruction step of, if the routine or data structure to be read does not exist in the cache memory, instructing the microprocessor to read the routine or data structure from the external memory.

8. The instruction execution order scheduling method according to claim 7, wherein the reading of the routine or the data structure of the instruction step from the external memory is performed in parallel with the execution step.

9. The instruction execution order scheduling method according to claim 6, wherein the confirmation step comprises: a step of checking, after the execution step or the instruction step, whether there is any routine or data structure whose processing has not been completed; and a step of, if there is a routine or data structure whose processing has not been completed, checking whether that routine or data structure exists in the cache memory.

10. The instruction execution order scheduling method according to claim 9, wherein the execution step includes a step of updating, for a routine or data structure whose processing has been completed, execution completion information indicating whether the routine or data structure has been processed, and the step of checking whether there is a routine or data structure whose processing has not been completed includes performing the check on the basis of the execution completion information.

11. The instruction execution order scheduling method according to any one of claims 6 to 9, wherein the confirmation step confirms the presence in the cache memory based on a start address portion of the routine or the data structure.