WO2016113869A1

WO2016113869A1 - Arranging apparatus for functions in cache memory

Info

Publication number: WO2016113869A1
Application number: PCT/JP2015/050845
Authority: WO
Inventors: 孝祐水野
Original assignee: 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date: 2015-01-14
Filing date: 2015-01-14
Publication date: 2016-07-21
Also published as: JP6138384B2; JPWO2016113869A1

Abstract

The present invention comprises: an acquisition unit (21) that acquires a calling order (31) for functions; a virtual cache memory generation unit that generates a virtual cache memory having storage areas corresponding to the number of ways of the cache memory; a simulator unit that performs a simulation calling the instruction codes included in each of a plurality of functions, in the calling order (31), as call instruction codes on the virtual cache memory and that, when a conflict arises, acquires information concerning the conflict as conflict information (70); and an arrangement position determination unit (80) that determines the arrangement positions of the functions in the cache memory on the basis of the conflict information (70).

Description

[Name of invention determined by ISA based on Rule 37.2] Function allocation device in cache memory

The present invention relates to a program placement apparatus, a program placement method, and a program placement program.

In a computer system, a cache memory is used to improve performance: copying information from the main memory into the cache memory enables high-speed reading. However, because the cache memory has a small capacity, instruction fetches can evict information already held in it, causing a conflict cache miss (also called a contention cache miss). Conflict cache misses slow down program execution, so they should be suppressed as much as possible.

Therefore, a method has been studied in which instructions that are likely to cause contention cache misses are not allocated to the same cache line.
Focusing on the fact that programs are structured in units of functions, a method has been proposed in which function strengths representing the call relationships between functions are defined, and functions with high function strengths are not assigned to the same cache line.

In Patent Literature 1, a dynamic function flow expressing the function calling order as a time series is generated while the program is executed, and function strength information between a given function and every other function is obtained from this dynamic function flow. Patent Literature 1 discloses a technique for reducing contention cache misses by arranging functions in the memory space based on this function strength information.

In addition, a method has been proposed that focuses on copying data from the main memory to the cache memory in units of cache lines.
In Patent Literature 2, each function is divided into cache-line-sized instruction code blocks (ICBs), the program is simulated, and flow information is extracted per ICB. For each ICB, Patent Literature 2 obtains neighborhood weight information for other ICBs, which reflects how often ICBs belonging to other functions appear near that ICB. It discloses a technique for reducing contention cache misses by determining function placement based on this neighborhood weight information.

Patent Literature 1: JP 2009-032198 A
Patent Literature 2: JP 2010-218218 A

In the prior art, the correlation between functions is obtained from the time-series relationship between functions, or between cache lines, during program execution, and functions are placed in the memory space so that highly correlated functions are not assigned to the same cache line.
However, depending on the configuration of the cache memory, there may be a case where a conflict cache miss does not occur even between highly correlated functions. For example, when a 4-way configuration is adopted as the cache memory, a conflict cache miss does not occur even if four highly correlated functions are called continuously and repeatedly.
As described above, whether a conflict cache miss actually occurs depends on the configuration of the cache memory and on the function call pattern, and the prior art does not take this into account. For this reason, an optimal arrangement may not be obtained for some cache memory configurations and function call patterns.
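The 4-way example above can be checked with a small simulation of a single cache set. This is a minimal sketch, assuming LRU replacement, that each function occupies exactly one cache line, and that all of the functions map to the same set; `count_misses` is a hypothetical helper, not part of the described apparatus.

```python
from collections import OrderedDict


def count_misses(sequence, ways):
    """Count misses in one LRU cache set that holds `ways` lines.

    Assumes every name in `sequence` maps to this same set.
    """
    lines = OrderedDict()  # least recently used first, most recently used last
    misses = 0
    for name in sequence:
        if name in lines:
            lines.move_to_end(name)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[name] = True
    return misses


# Four functions called cyclically ten times: only the four cold misses.
print(count_misses(list("ABCD") * 10, ways=4))   # 4
# Adding a fifth function to the loop makes every call miss under LRU.
print(count_misses(list("ABCDE") * 10, ways=4))  # 50
```

The four highly correlated functions never conflict, while one more function in the same set turns every call into a miss.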

The object of the present invention is to output optimum function arrangement information regardless of the cache memory configuration and function call pattern.

A program placement apparatus according to the present invention places a program including a plurality of functions in a cache memory that uses at least one way, and includes:
an acquisition unit that acquires the calling order of each function of the plurality of functions by executing the program;
a virtual cache memory generation unit that generates a virtual cache memory having storage areas corresponding to the number of ways of the cache memory;
a simulator unit that executes a simulation calling each function of the plurality of functions, as a call instruction code, on the virtual cache memory in the calling order and that, when a call instruction code that has already been called is called again and a conflict occurs in which a function other than the call instruction code is stored in the storage area, acquires information on the conflict as conflict information; and
an arrangement position determining unit that determines an arrangement position of each function of the plurality of functions in the cache memory based on the conflict information.

In the program placement apparatus according to the present invention, the acquisition unit acquires the function calling order, and the virtual cache memory generation unit generates a virtual cache memory having storage areas corresponding to the number of ways of the cache memory. The simulator unit executes a simulation that calls each function of the plurality of functions, as a call instruction code, on the virtual cache memory in the calling order and, when a conflict occurs, acquires information on that conflict as the conflict information. The arrangement position determining unit then determines the function arrangement positions in the cache memory based on the conflict information. Therefore, an optimal function arrangement that takes into account both the cache memory configuration and the function call order can be obtained.

FIG. 1 is a block configuration diagram showing a program placement apparatus according to the first embodiment.
FIG. 2 is a block configuration diagram of a virtual cache simulator unit according to the first embodiment.
FIG. 3 is a diagram showing an ICB execution sequence according to the first embodiment.
FIG. 4 is a configuration diagram of conflict information according to the first embodiment.
FIG. 5 is a hardware configuration diagram of the program placement apparatus according to the first embodiment.
FIG. 6 is a flowchart showing the operation of the program placement method of the program placement apparatus according to the first embodiment.
FIG. 7 is a flowchart showing the operation of the simulation processing according to the first embodiment.
FIG. 8 is a diagram for explaining virtual cache memory generation processing according to the first embodiment.
FIG. 9 is a diagram for explaining virtual cache memory generation processing according to the first embodiment.
FIG. 10 is a diagram showing the placement, in a cache memory with 8 sets, of function A and function B when they are arranged in a continuous memory area.
FIG. 11 is a diagram showing the placement, in a cache memory with 8 sets, of function A and function B when they are arranged in a continuous memory area.
FIG. 12 is a diagram explaining how information is registered in the conflict information when a cache miss occurs.
FIG. 13 is a diagram explaining how information is registered in the conflict information when a cache miss occurs.
FIG. 14 is an example of conflict information according to the first embodiment.
FIG. 15 is a flowchart showing the operation of the conflict information registration processing according to the first embodiment.
FIG. 16 is a detailed process flowchart of step S2050 according to the first embodiment.
FIG. 17 is a detailed process flowchart of step S2051 according to the first embodiment.
FIG. 18 is a diagram showing virtual cache memory update processing according to the first embodiment.
FIG. 19 is a diagram showing virtual cache memory update processing according to the first embodiment.
FIG. 20 is a flowchart showing the operation of the arrangement position determination processing according to the first embodiment.
FIG. 21 is a configuration diagram of a rule 131 according to the first embodiment.
FIG. 22 is a flowchart showing the operation of calculating the number of conflict misses with already placed functions in step S303 according to the first embodiment.
FIG. 23 is a diagram showing the calculated number of cache misses when function B is placed in set 0.
FIG. 24 is a diagram showing the calculated number of cache misses when function B is placed in set 1.
FIG. 25 is a diagram showing the calculated number of cache misses when function A is placed in set 0.
FIG. 26 is a diagram showing the calculated number of cache misses when function A is placed in set 1.
FIG. 27 is a diagram showing an example of the function placement information 90.
FIG. 28 is a block configuration diagram showing a program placement apparatus according to the second embodiment.
FIG. 29 is a flowchart showing the operation of the program placement method of the program placement apparatus according to the second embodiment.
FIG. 30 is a flowchart showing the operation of the function arrangement adjustment processing according to the second embodiment.
FIG. 31 is a diagram showing an example of the function placement information 90.

Embodiment 1
*** Explanation of configuration ***
FIG. 1 is a block configuration diagram showing a program placement apparatus 500 according to the first embodiment.
The program placement apparatus 500 includes a program 10, an acquisition unit 21, a calling order 31, cache configuration information 40, program information 50, a virtual cache simulator unit 60, conflict information 70, a placement position determination unit 80, function placement information 90, a function placement unit 100, an optimized program 110, and a priority table 130.

The program placement apparatus 500 places a program including a plurality of functions in a cache memory that uses at least one way. The program placement apparatus 500 is a program optimization apparatus that optimizes the placement position of the program 10 in a cache memory provided in the computer.

The program 10 is the optimization target program, whose contention cache misses are to be reduced. The program 10 consists of one or more files among source code files, object files, and executable files.

The acquisition unit 21 executes the program 10 to acquire the calling order 31 of the instruction codes included in each function of the plurality of functions. The acquisition unit 21 is a program execution unit 20 that executes the program 10 and acquires the instruction trace 30, which serves as the calling order 31. The program execution unit 20 may be either a real machine including the target processor or a simulator of the target processor, as long as it has a mechanism for acquiring the instruction trace 30.

The instruction trace 30 is data in which instruction addresses of execution instructions when the program 10 is executed are arranged in time series.
The cache configuration information 40 is information on the cache memory 401 mounted on the target processor that finally operates the program 10, and includes information on the cache line size, the number of ways, the number of sets, and the type of replacement algorithm.
The program information 50 holds, for every function included in the program 10, a set consisting of the function label, placement address, and function size.

The virtual cache simulator unit 60 receives the instruction trace 30, the cache configuration information 40, and the program information 50 as inputs and outputs the conflict information 70. Based on the instruction trace 30, the virtual cache simulator unit 60 generates, for each ICB included in each function, conflict information that cannot be obtained from the real cache memory 401, and outputs it as the conflict information 70.
The conflict information 70 is a contention miss database that stores conflict information for each cache-line-sized ICB included in each function.

The arrangement position determination unit 80 determines the arrangement position of each function of the plurality of functions in the cache memory 401 based on the conflict information 70. The arrangement position determination unit 80 receives the cache configuration information 40, the program information 50, and the conflict information 70 as inputs, and outputs the function arrangement information 90. The arrangement position determination unit 80 arranges the functions in the memory space one at a time, in order of the determined priorities. When placing a function, the arrangement position determination unit 80 refers to the conflict information 70 and obtains the number of conflict cache misses that would occur with the functions already placed. The arrangement position determination unit 80 then chooses the arrangement position of that function so that this number of conflict cache misses is minimized.

The function placement information 90 is the list of function placements, calculated by the placement position determination unit 80, that minimizes the number of contention cache misses; it consists of the label and placement address of every function.
The function placement unit 100 receives the program 10 and the function placement information 90 as input, executes the function rearrangement, and outputs the optimized program 110.
The optimized program 110 behaves identically to the program 10 but differs in the arrangement of functions within the program, which minimizes contention cache misses.

The priority table 130 sets a rule 131 for determining the priority of each function of a plurality of functions. The arrangement position determination unit 80 determines the priority of each function of a plurality of functions arranged in the cache memory 401 based on the priority table 130. The arrangement position determination unit 80 determines the arrangement position of each function of the plurality of functions in the order of the determined priority.
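The priority-ordered, miss-minimizing placement described above can be sketched as a greedy search over candidate sets. This is an illustrative sketch only: the helper names are hypothetical, the priority order is passed in directly (the contents of rule 131 are not reproduced here), each function's ICBs are assumed to occupy consecutive sets, and a recorded conflict is assumed to recur when the missed ICB shares its set with at least as many other ICBs of the same combination as there are ways, which is the eviction condition under LRU replacement.

```python
def icb_sets(func, start, n_icbs, num_sets):
    """Map the ICBs of `func` (named func0, func1, ...) to sets.

    Assumes a function's ICBs occupy consecutive sets starting at `start`.
    """
    return {f"{func}{i}": (start + i) % num_sets for i in range(n_icbs)}


def conflict_misses(placed, conflicts, ways):
    """Count recorded conflict misses that would recur under `placed`.

    placed:    ICB name -> set index, for the functions placed so far.
    conflicts: list of (combination of ICBs, missed ICB, miss count).
    """
    total = 0
    for combo, victim, count in conflicts:
        if not all(icb in placed for icb in combo):
            continue  # involves a function that has not been placed yet
        same_set = sum(1 for icb in combo if placed[icb] == placed[victim])
        if same_set > ways:  # victim plus at least `ways` rivals share one set
            total += count
    return total


def place_functions(order, sizes, conflicts, num_sets, ways):
    """Greedily pick a start set for each function, in priority order."""
    placed, starts = {}, {}
    for func in order:
        best = None
        for start in range(num_sets):
            trial = {**placed, **icb_sets(func, start, sizes[func], num_sets)}
            cost = conflict_misses(trial, conflicts, ways)
            if best is None or cost < best[0]:  # ties keep the lowest set
                best = (cost, start, trial)
        _, starts[func], placed = best
    return starts


conflicts = [
    (frozenset({"A0", "B0"}), "A0", 5),
    (frozenset({"A0", "B0"}), "B0", 5),
    (frozenset({"A0", "C0"}), "A0", 1),
]
sizes = {"A": 1, "B": 1, "C": 1}
print(place_functions(["A", "B", "C"], sizes, conflicts, num_sets=2, ways=1))
# {'A': 0, 'B': 1, 'C': 1}
```

With a direct-mapped cache, B is pushed away from A because they conflict heavily; with two ways the same records cause no misses and every function can stay in set 0.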

Each functional block may be implemented as one or a plurality of programs, or a plurality of functional blocks may be implemented as one program. Further, the instruction trace 30, the cache configuration information 40, the program information 50, the conflict information 70, and the function arrangement information 90 may exist as files, or may be data arranged only on the memory.

FIG. 2 is a block configuration diagram of the virtual cache simulator unit 60 according to the present embodiment.
The virtual cache simulator unit 60 includes a virtual cache memory generation unit 601 and a simulator unit 605. The simulator unit 605 includes a virtual cache data holding unit 602, an instruction trace reading unit 603, and a conflict information generating unit 604.

The virtual cache memory generation unit 601 generates a virtual cache memory 403 having as many storage areas 4031 as the number of ways of the cache memory 401. Specifically, the virtual cache memory generation unit 601 generates, as the virtual cache memory 403, a plurality of areas 4039 with consecutive addresses. The plurality of areas 4039 includes the storage areas 4031, one per way, and a plurality of temporary storage areas 4032 that follow those storage areas 4031. Hereinafter, a storage area 4031 may be referred to as a real way and a temporary storage area 4032 as a virtual way.
As described above, the virtual cache memory generation unit 601 builds the virtual cache memory 403 that is the data structure of the virtual cache data holding unit 602 based on the cache configuration information 40.

The simulator unit 605 executes a simulation that calls the instruction codes 301 included in each function of the plurality of functions, in the calling order 31, as call instruction codes 4033 on the virtual cache memory 403. When a call instruction code 4033 that has already been called is called again and a conflict has occurred, that is, an instruction code 301 other than the call instruction code 4033 is stored in the storage area 4031, the simulator unit 605 acquires information on that conflict as the conflict information 70.

Specifically, when the simulator unit 605 calls the call instruction code 4033, a conflict has occurred if the storage areas 4031 are occupied by other codes and the call instruction code 4033 is found in a temporary storage area 4032. In that case, the simulator unit 605 acquires a conflicting instruction code set 701 as follows: it takes, as the conflicting instruction code set 701, the set consisting of the call instruction code and all instruction codes stored in the areas preceding the position where the call instruction code 4033 is already stored in the virtual cache memory 403.

After acquiring the conflicting instruction code set 701, the simulator unit 605 moves each instruction code stored in the areas preceding the already stored call instruction code back by one area. The simulator unit 605 then stores that call instruction code 4033 in the first storage area 4031 of the virtual cache memory 403 and calls the instruction code 301 that follows it in the calling order 31 as the next call instruction code 4033.
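The call procedure above behaves like an LRU stack of unlimited depth. The following is a minimal sketch of one set of the virtual cache memory 403, assuming LRU replacement and an unbounded number of temporary storage areas. The class name, the string names used for instruction codes (such as "A0"), and the treatment of a first-time call as a non-conflict ("cold") miss are assumptions for illustration.

```python
class VirtualCacheSet:
    """One set of a virtual cache: real ways followed by virtual (temporary) ways.

    `areas[0]` holds the most recently called code; positions >= `ways` model
    the temporary storage areas that lie beyond the real ways.
    """

    def __init__(self, ways):
        self.ways = ways
        self.areas = []

    def call(self, code):
        """Call `code` and return (status, conflicting code set)."""
        if code not in self.areas:
            status, conflict = "cold", frozenset()  # first call: not a conflict
        else:
            pos = self.areas.index(code)
            if pos < self.ways:
                status, conflict = "hit", frozenset()
            else:
                # Found in a virtual way: the codes ahead of it pushed it out of
                # the real ways, so they and the code itself form the conflict set.
                status, conflict = "conflict", frozenset(self.areas[: pos + 1])
            del self.areas[pos]
        # Shift the preceding codes back by one area and store the call first.
        self.areas.insert(0, code)
        return status, conflict


s = VirtualCacheSet(ways=1)
for code in ["A0", "C0", "A0"]:
    status, rivals = s.call(code)
    print(code, status, sorted(rivals))
# A0 cold []
# C0 cold []
# A0 conflict ['A0', 'C0']
```

Because nothing is ever discarded, a code found beyond the real ways yields exactly the set of codes called since its previous call, which is the information a real cache loses on eviction.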

Here, the instruction trace 30, which serves as the calling order 31, gives the execution sequence 303 of the instruction codes 301 obtained by dividing each function into cache-line-sized pieces, that is, the ICBs of each function. Therefore, each instruction code 301 that the simulator unit 605 calls as a call instruction code 4033 is an ICB obtained by dividing a function into blocks of the cache line size of the cache memory 401. Hereinafter, the instruction code 301 is referred to as an instruction code block, that is, an ICB. The conflicting instruction code set 701 is the conflicting ICB 702, that is, the combination of ICBs involved in the conflict.
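Mapping a traced instruction address to its ICB needs only the program information 50 and the cache line size. A minimal sketch, assuming each function starts on a cache-line boundary and that ICBs are named by function label plus block index (A0, A1, ...); `FunctionInfo` and `icb_of` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionInfo:
    """One entry of the program information: label, placement address, size."""
    label: str
    address: int
    size: int  # bytes


def icb_of(addr, functions, line_size):
    """Return the name of the ICB containing `addr`, e.g. "A1", or None.

    Assumes each function starts on a cache-line boundary, so ICB k of a
    function covers bytes [k * line_size, (k + 1) * line_size) of it.
    """
    for f in functions:
        if f.address <= addr < f.address + f.size:
            return f"{f.label}{(addr - f.address) // line_size}"
    return None


functions = [FunctionInfo("A", 0x1000, 0x90), FunctionInfo("B", 0x10C0, 0x40)]
for addr in (0x1000, 0x1044, 0x108C, 0x10C8, 0x2000):
    print(hex(addr), icb_of(addr, functions, line_size=64))
# 0x1000 A0
# 0x1044 A1
# 0x108c A2
# 0x10c8 B0
# 0x2000 None
```

A function of 0x90 bytes spans three 64-byte ICBs, the last one only partly filled.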

The virtual cache data holding unit 602 simulates the operation of the cache on the data structure of the virtual cache memory 403 constructed by the virtual cache memory generation unit 601, using the contents of the instruction trace 30.

The instruction trace reading unit 603 sequentially extracts the instruction code 301, that is, the ICB, based on the instruction trace 30 and passes the instruction code 301 to the virtual cache data holding unit 602.
When a cache miss occurs in the virtual cache data holding unit 602, the conflict information generation unit 604 registers information about the combination of ICBs that caused the cache miss in the conflict information 70.

As described above, the simulator unit 605 acquires, as a conflicting instruction code set 701, each combination of instruction codes 301 that caused a conflict among the instruction codes included in the functions. The simulator unit 605 counts, for each conflicting instruction code set 701, the number of conflicts it caused, and acquires these counts as the conflict information 70.

The instruction trace 30 will be described with reference to FIG. 3, and the configuration of the conflict information 70 with reference to FIG. 4.
Assume that the ICB execution sequence 303 of FIG. 3 is obtained from the instruction trace 30. The conflict information generation unit 604 generates the conflict information 70 shown in FIG. 4 based on the ICB execution sequence 303 shown in FIG. 3.

As shown in FIG. 4, the contention information 70 is composed of contention miss data records 71 for all ICBs. The contention miss data record 71 has an ICB name 72, a total number of misses 73 as its own ICB, a total number of misses 74 as another ICB, and one or more contention miss entries 75.
The contention miss entry 75 includes a miss ID 76, a number of competing ICBs 77, and a number of misses 78 for each competing ICB. The self ICB is the ICB being focused on; A0 is the self ICB in the contention miss data record 71 of A0 in FIG. 4. The other ICBs are the ICBs with which the focused ICB competes; in the contention miss data record 71 of A0 in FIG. 4, B0 and C0 are other ICBs. The competing ICBs are all the ICBs that compete when a given cache miss occurs.

The ICB name 72 is a name for identifying each ICB.
The total number of misses 73 as the own ICB represents the number of contention cache misses that occurred when referring to itself.
The total miss count 74 as another ICB represents the number of contention cache misses that occurred when referring to another ICB that competes with itself.
The contention miss entry 75 is data indicating the cache miss occurrence status for a particular combination of ICBs; each contention miss data record 71 contains one or more contention miss entries 75.

The miss ID 76 is an ID for uniquely identifying a conflicting cache miss in each ICB.
The number of competing ICBs 77 represents the number of ICBs involved when a contention cache miss occurs, including the self ICB. Because the number of occurrences is held for each combination of competing ICBs, two entries in the same ICB's record are recorded as separate cache misses when their combinations differ, even if they have the same number of competing ICBs.

The number of misses 78 for each competing ICB represents, for each ICB involved in a contention cache miss, the number of times that miss occurred. Its meaning is explained using the contention miss data record 71 of A0 in FIG. 4 as an example. Among the contention miss entries 75 of A0, consider the row whose miss ID 76 is 3: the number of competing ICBs 77 is 2, and the number of misses 78 for each competing ICB is 1 for A0, hatched for B0, and 0 for C0. This row records the cache misses whose competing ICBs are the two ICBs A0 and C0. An ICB with a number is involved in the cache miss, and a hatched ICB is not; hence, in the row with miss ID 3, numbers are set only for A0 and C0. Each number is the number of cache misses that occurred on that ICB with this combination of competing ICBs. The row with miss ID 3 therefore indicates that, for the combination of A0 and C0, one cache miss occurred when A0 was referenced and none occurred when C0 was referenced.
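The record layout of FIG. 4 can be modeled with a small data structure. In this sketch (hypothetical names, not the apparatus's actual format), each ICB's record keeps its totals 73 and 74 and one entry per combination of competing ICBs; the number of competing ICBs 77 is the size of the combination, and the miss ID 76 is the entry's position in insertion order. Registering one miss of A0 against the combination {A0, C0} reproduces the per-ICB counts of the miss ID 3 row.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MissRecord:
    """Contention miss data record for one (self) ICB."""
    own_misses: int = 0    # misses when this ICB itself was referenced
    other_misses: int = 0  # misses of other ICBs competing with this ICB
    # combination of competing ICBs -> {ICB: misses}; insertion order gives miss IDs
    entries: dict = field(default_factory=dict)


def register(info, combination, victim):
    """Record one conflict miss of `victim` caused by `combination`."""
    for icb in combination:
        record = info[icb]
        counts = record.entries.setdefault(
            combination, {member: 0 for member in combination})
        counts[victim] += 1
        if icb == victim:
            record.own_misses += 1
        else:
            record.other_misses += 1


info = defaultdict(MissRecord)
register(info, frozenset({"A0", "C0"}), "A0")
a0 = info["A0"]
print(a0.own_misses, a0.other_misses)                       # 1 0
print(sorted(a0.entries[frozenset({"A0", "C0"})].items()))  # [('A0', 1), ('C0', 0)]
print(info["C0"].own_misses, info["C0"].other_misses)       # 0 1
```

The same miss is written into the records of every competing ICB, so C0's record counts it under the misses as another ICB.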

A hardware configuration example of the program placement apparatus 500 according to the present embodiment will be described with reference to FIG. 5.
The program placement apparatus 500 is a computer.
The program placement apparatus 500 includes hardware such as a processor 901, an auxiliary storage device 902, a memory 903, a communication device 904, an input interface 905, and a display interface 906.
The processor 901 is connected to other hardware via the signal line 910, and controls these other hardware.
The input interface 905 is connected to the input device 907.
The display interface 906 is connected to the display 908.

The processor 901 is an IC (Integrated Circuit) that performs processing.
The processor 901 is, for example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or a GPU (Graphics Processing Unit).
The auxiliary storage device 902 is, for example, a ROM (Read Only Memory), a flash memory, or an HDD (Hard Disk Drive).
The memory 903 is, for example, a RAM (Random Access Memory).
The communication device 904 includes a receiver 9401 that receives data and a transmitter 9402 that transmits data.
The communication device 904 is, for example, a communication chip or a NIC (Network Interface Card).
The input interface 905 is a port to which the cable 911 of the input device 907 is connected.
The input interface 905 is, for example, a USB (Universal Serial Bus) terminal.
The display interface 906 is a port to which the cable 912 of the display 908 is connected.
The display interface 906 is, for example, a USB terminal or an HDMI (registered trademark) (High Definition Multimedia Interface) terminal.
The input device 907 is, for example, a mouse, a keyboard, or a touch panel.
The display 908 is, for example, an LCD (Liquid Crystal Display).

The auxiliary storage device 902 stores a program that realizes the functions of the acquisition unit 21, the virtual cache simulator unit 60, the arrangement position determination unit 80, and the function arrangement unit 100 (hereinafter collectively referred to as the “units”). The program that realizes the functions of the “units” of the program placement apparatus 500 is referred to as the program placement program. The program placement program may be a single program or may consist of a plurality of programs.
The program placement program is loaded into the memory 903, read into the processor 901, and executed by the processor 901.
Further, the auxiliary storage device 902 also stores an OS (Operating System).
Then, at least a part of the OS is loaded into the memory 903, and the processor 901 executes a program that realizes the function of “unit” while executing the OS.
Although one processor 901 is illustrated in FIG. 5, the program placement apparatus 500 may include a plurality of processors 901.
A plurality of processors 901 may execute a program for realizing the function of “unit” in cooperation with each other.
In addition, information, data, signal values, and variable values indicating the results of the processing of “unit” are stored as files in the memory 903, the auxiliary storage device 902, or a register or cache memory in the processor 901.

A “unit” may be provided as “circuitry”.
“Unit” may also be read as “circuit”, “process”, “procedure”, or “processing”.
“Circuit” and “circuitry” are concepts that include not only the processor 901 but also other types of processing circuits, such as a logic IC, a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).

A so-called program product is a computer-readable storage medium, storage device, or the like, in any form, on which the program placement program realizing the functions described as “units” is recorded.

*** Explanation of operation ***
The program placement method and program placement processing S10 of the program placement apparatus 500 according to the present embodiment will be described with reference to FIG. 6.
As described above, the program placement program is executed by the program placement apparatus 500, a computer that places the program 10 including a plurality of functions in the cache memory 401 using at least one way.

In the acquisition process S1, the acquisition unit 21 executes the program 10 and acquires the calling order 31 of the instruction codes 301 included in each function of the plurality of functions. That is, the program execution unit 20 executes the program 10 and extracts the instruction trace 30.

In the virtual cache memory generation process S1a, the virtual cache memory generation unit 601 generates a virtual cache memory 403 having as many storage areas 4031 as the number of ways of the cache memory 401.

In the simulation process S2, the simulator unit 605 calls the instruction codes 301, in the calling order 31, as call instruction codes 4033 on the virtual cache memory 403. When a call instruction code 4033 that has already been called is called again and a conflict has occurred, in which an instruction code other than the call instruction code is stored in the storage area 4031, the simulator unit 605 acquires information on that conflict as the conflict information 70. That is, in the simulation process S2, the virtual cache simulator unit 60 generates the conflict information 70 based on the instruction trace 30, the cache configuration information 40, and the program information 50.

In the arrangement position determination process S3, the arrangement position determination unit 80 determines the arrangement position 801 of each function of the plurality of functions in the cache memory 401 based on the conflict information 70. That is, the arrangement position determination unit 80 receives the cache configuration information 40, the program information 50, and the conflict information 70 as inputs and generates the function arrangement information 90 describing the function arrangement positions 801.

Finally, in the function placement process S4, the function placement unit 100 receives the program 10 and the function placement information 90 as inputs, places the functions in the cache memory 401 accordingly, and outputs the optimized program 110.

FIG. 7 is a flowchart showing the operation of the virtual cache simulation process S200 according to the present embodiment.
The virtual cache simulation process includes a virtual cache memory generation process S1a and a simulation process S2. The virtual cache simulator unit 60 executes a virtual cache simulation process S200 for generating contention information 70 based on the information of the instruction trace 30, the cache configuration information 40, and the program information 50.

<Virtual cache memory generation processing S1a>
In step S200, the virtual cache memory generation unit 601 executes a virtual cache memory generation process S1a for generating the virtual cache memory 403 based on the cache configuration information 40. Details of the virtual cache memory generation processing S1a will be described later. The virtual cache data holding unit 602 holds data in the virtual cache memory 403 configured by the virtual cache memory generation unit 601.

<Simulation process S2>
In step S201, the instruction trace reading unit 603 determines whether the instruction trace 30 has been read to the end. If not read, the process proceeds to step S202. If read, the process ends.

In step S202, the instruction trace reading unit 603 extracts one instruction address from the instruction trace 30, and the process proceeds to step S203.
In step S203, the instruction trace reading unit 603 executes the ICB acquisition process S203, which obtains the ICB containing the extracted instruction address, and then proceeds to step S204.

In step S204, the conflict information generation unit 604 determines whether the reference to the ICB causes a cache miss. If it does, the unit proceeds to step S205; otherwise, it proceeds to step S206.
In step S205, the conflict information generation unit 604 registers the combination of ICBs involved in the cache miss in the conflict information 70 as conflict ICBs 702, and proceeds to step S206. Details of step S205 are described later.
In step S206, the contents of the virtual cache are updated according to the block that was read, and the process returns to step S201. The virtual cache simulator unit 60 repeats this processing until the determination in step S201 is YES.
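The loop of steps S201 to S206 can be illustrated with the following minimal Python sketch. It assumes that the trace has already been converted to ICB labels (step S203) and that the replacement algorithm is LRU; the function name `simulate` and the parameters `real_ways` and `n_ways` (the user parameter N) are hypothetical and not part of the source. The virtual cache is modelled as a list ordered from way 0 on the left to way N-1 on the right.

```python
from typing import Iterable, List, Tuple


def simulate(icb_trace: Iterable[str], real_ways: int,
             n_ways: int) -> List[Tuple[str, List[str]]]:
    """Steps S201-S206 over a sequence of ICB labels (LRU replacement assumed).

    Returns one (missed ICB, conflicting ICBs) pair per contention cache miss.
    """
    vcache: List[str] = []  # index 0 is real way 0 (most recently used)
    conflicts: List[Tuple[str, List[str]]] = []
    for icb in icb_trace:                          # S201-S203: next ICB
        pos = vcache.index(icb) if icb in vcache else -1
        if pos >= real_ways:                       # S204: not in a real way, but in a virtual way
            conflicts.append((icb, vcache[:pos]))  # S205: every ICB to its left conflicts
        # pos == -1 is an initial reference miss and is not recorded.
        if pos >= 0:                               # S206: LRU update, move the ICB to way 0
            vcache.pop(pos)
        vcache.insert(0, icb)
        del vcache[n_ways:]                        # the rightmost ICB drops out when full
    return conflicts
```

With one real way, the trace B0, C0, B0 yields a single contention cache miss of B0 caused by C0, consistent with the case of FIG. 13; with two real ways the same trace produces no miss, because B0 is still held in a real way.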

<Detailed Description of Virtual Cache Memory Generation Processing S1a>
The virtual cache memory generation process S1a is described with reference to FIGS. 8 and 9.
The virtual cache memory generation unit 601 configures the virtual cache memory 403 based on the cache configuration information 40. It generates a plurality of areas 4039 with consecutive addresses. The areas 4039 consist of storage areas 4031, which are the real ways and correspond in number to the ways of the cache memory, and temporary storage areas 4032, which are a plurality of virtual ways following the storage areas 4031.

FIG. 8 shows how the virtual cache memory 403 is generated when the cache memory 401 is direct-mapped (one way) with two sets, 64-byte cache lines, and LRU (Least Recently Used) replacement.
FIG. 9 shows how the virtual cache memory 403 is generated when the cache memory 401 is 2-way set associative with four sets, 64-byte cache lines, and LRU replacement.
The virtual cache memory 403 is configured as an N-way set-associative cache with a single set, regardless of the number of ways and sets of the cache memory 401. Here, N is a parameter that the user can set freely; the larger N is, the more ICBs whose correlation can be observed.

As described above, the virtual cache memory 403 has N ways, but the point at which a cache miss is judged during simulation depends on the configuration of the cache memory 401. That is, in process S1a, as many ways as the cache memory 401 has are treated as storage areas 4031 (real ways), and the remaining ways are treated as temporary storage areas 4032 (virtual ways).

In the example of FIG. 8, there is 1 real way and there are N-1 virtual ways; in the example of FIG. 9, there are 2 real ways and N-2 virtual ways. If the data to be accessed is not in a real way, the conflict information generation unit 604 treats the access as a cache miss even when the data is in a virtual way. Configuring the virtual cache memory 403 in this way makes it possible to detect the correlation between ICBs at the moment a cache miss occurs, which cannot be observed in the cache memory 401 itself. The cache line size and replacement algorithm of the virtual cache memory 403 are the same as those of the cache memory 401.
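A minimal sketch of process S1a and of the miss rule above follows. The field names of `CacheConfig`, the class `VirtualCacheLayout`, and the functions `generate_virtual_cache` and `classify` are hypothetical; they only mirror the quantities named in the text (number of ways, number of sets, line size, replacement algorithm, and the user parameter N).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheConfig:          # cache configuration information 40 (assumed fields)
    ways: int
    sets: int
    line_size: int          # bytes
    replacement: str        # e.g. "LRU"


@dataclass(frozen=True)
class VirtualCacheLayout:   # virtual cache memory 403
    sets: int
    real_ways: int          # storage areas 4031
    virtual_ways: int       # temporary storage areas 4032
    line_size: int
    replacement: str


def generate_virtual_cache(cfg: CacheConfig, n: int) -> VirtualCacheLayout:
    """S1a: one set of N ways; the first cfg.ways ways are real, the rest virtual."""
    if n < cfg.ways:
        raise ValueError("N must be at least the number of real ways")
    return VirtualCacheLayout(sets=1, real_ways=cfg.ways, virtual_ways=n - cfg.ways,
                              line_size=cfg.line_size, replacement=cfg.replacement)


def classify(vcache: List[str], icb: str, real_ways: int) -> str:
    """vcache is ordered from way 0 (left) to way N-1 (right)."""
    if icb in vcache[:real_ways]:
        return "hit"
    if icb in vcache:
        return "conflict miss"           # present only in a virtual way
    return "initial reference miss"      # not present at all; not recorded
```

For FIG. 8 with, say, N = 8, this yields 1 real way and 7 virtual ways; for FIG. 9 it yields 2 real ways and 6 virtual ways.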

<ICB acquisition process S203>
This section supplements the description of the ICB acquisition process S203, which obtains the ICB containing the instruction address in step S203.
To obtain the ICB containing an instruction address, it is necessary to decide how a function is divided into ICBs. Two methods are available: dividing the function into ICB-sized pieces starting from its first instruction, as shown in FIG. 10, and dividing it starting from the beginning of the cache line that contains the function, as shown in FIG. 11.

FIGS. 10 and 11 show the arrangement on a cache memory with 8 sets when function A and function B are placed in a contiguous memory area. Function A occupies sets 0 to 3, and function B occupies sets 3 to 7. An ICB of a function is denoted by appending the serial number of the ICB within the function to the function label; here, the serial numbers 0, 1, 2, and 3 are appended to the function labels A and B.

In the method shown in FIG. 10, the function is divided into ICBs starting from its beginning: function A is divided into A0 to A3, and function B into B0 to B3. The method of FIG. 10 thus ignores the actual cache arrangement and divides relative to the start of the function.

In contrast, the method shown in FIG. 11 divides a function into ICBs in units of cache lines, taking the actual memory arrangement into account. The instruction code of function A in sets 0 to 2 becomes A0, A1, and A2, respectively. Where a cache line is shared with another function, as set 3 of function A is shared with function B, only the instruction code belonging to the function itself is included in its ICB. That is, the first half of A3 is treated as the instruction code of function A, and the remaining half is treated as empty.
Likewise, function B begins not at the start of the set 3 cache line but partway through it. Even so, the division into ICBs starts from the beginning of the set 3 cache line rather than from the beginning of function B. That is, the first half of B0 is treated as empty, and the second half contains the instruction code of function B.
Dividing into ICBs with the memory arrangement taken into account in this way makes it possible to record contention cache misses exactly as they occur during actual program execution. The present embodiment therefore adopts the method shown in FIG. 11.
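The cache-line-aligned division of FIG. 11 reduces to integer arithmetic on addresses. The sketch below assumes that a function is given by a start address and a size in bytes, and that the set index of a line is its line number modulo the number of sets (the usual mapping for a set-associative cache); the name `split_into_icbs` and the example sizes are hypothetical.

```python
from typing import List, Tuple


def split_into_icbs(label: str, start: int, size: int,
                    line_size: int, num_sets: int) -> List[Tuple[str, int, int, int]]:
    """Divide a function into cache-line-aligned ICBs (method of FIG. 11).

    Returns (ICB name, set index, first used byte, last used byte + 1) per ICB.
    Bytes of a line outside that range hold no code of this function.
    """
    icbs = []
    end = start + size
    line = start // line_size          # first cache line touched by the function
    serial = 0
    while line * line_size < end:
        line_start = line * line_size
        used_start = max(start, line_start)
        used_end = min(end, line_start + line_size)
        icbs.append((f"{label}{serial}", line % num_sets, used_start, used_end))
        line += 1
        serial += 1
    return icbs
```

With 64-byte lines and 8 sets, a hypothetical 224-byte function A at address 0 yields A0 to A3 in sets 0 to 3 with only the first half of A3 used, and a function B starting right after it at address 224 yields B0 in set 3 with only the second half used, as in FIG. 11. For step S203, the ICB containing an instruction address `addr` is then the one with serial number `addr // line_size - start // line_size`.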

<Conflict information registration process S205>
The conflict information registration process S205, which registers ICB combination information in the conflict information 70, is described with reference to FIGS. 12 to 14.
Consider the case where the ICB execution sequence 303 shown in FIG. 3 is obtained.
FIG. 12 shows the contents of the virtual cache memory 403, with one real way, at the moment C0, the first call instruction code 4033, is referenced. When C0 is referenced for the first time, it is not in the real way, so a cache miss occurs. Because C0 is not in a virtual way either, this first reference to C0 is an initial reference miss. Initial reference misses are not recorded in the conflict information 70.

FIG. 13 shows the contents of the virtual cache memory with one real way at the moment B0, the second call instruction code 4033, is referenced. When B0 is referenced for the second time, it is not in the real way, so a cache miss occurs. However, because B0 is in a virtual way, this second reference to B0 is treated as a contention cache miss; that is, a conflict has occurred. B0, the call instruction code 4033 found in the virtual way, and all ICBs to its left are registered in the conflict information 70 as conflict ICBs 702. In other words, the simulator unit 605 registers B0 together with all ICBs stored in the areas before B0 as conflict ICBs 702 in the conflict information 70.

The conflict information generation unit 604 first records, in the conflict miss data record 71 of B0, a contention cache miss of B0 as its own ICB for the combination of B0 and C0. It then records, in the conflict miss data record 71 of the conflicting ICB C0, a contention cache miss caused to another ICB for the same combination of B0 and C0. If there are other conflicting ICBs, the same processing as for C0 is executed for each of them. The conflict information 70 after the second reference to B0 is as shown in FIG. 14. Thus, the conflict information 70 records not only the contention cache misses that occur when an ICB itself is referenced, but also the number of contention cache misses that the ICB causes in other ICBs.

The operation of the conflict information registration process S205 is described with reference to FIGS. 15 to 17.
The overall processing flow of process S205 is described with reference to FIG. 15.
In step S2050, the conflict information generation unit 604 registers the own ICB, that is, the ICB that missed, in the conflict information 70.
In step S2051, the conflict information generation unit 604 registers each of the conflicting ICBs other than the own ICB in the conflict information 70.

<Detailed Description of Step S2050 Own ICB Registration Process>
The detailed processing flow of step S2050 in FIG. 15 is described with reference to FIG. 16.
In step S205000, the conflict information generation unit 604 determines whether a record of the own ICB exists in the conflict information 70. If it exists, the unit proceeds to step S205001; if not, it proceeds to step S205002.
In step S205001, the conflict information generation unit 604 retrieves the conflict miss data record 71 of the own ICB and proceeds to step S205005.

In step S205002, the conflict information generation unit 604 creates a conflict miss data record 71 for the own ICB, and the process proceeds to step S205003.
In step S205003, the name of the own ICB is set in the ICB name 72, and the process proceeds to step S205004.
In step S205004, the total number of misses as own ICB 73 is initialized to 1 and the total number of misses as another ICB 74 is initialized to 0, and the process proceeds to step S205007.

In step S205005, the conflict information generation unit 604 checks whether a record for the combination of conflicting ICBs to be registered, that is, a conflict miss entry 75, already exists. If it exists, the unit proceeds to step S205006; if not, it proceeds to step S205007.
In step S205006, the conflict information generation unit 604 increments the total number of misses as own ICB 73 by 1 and proceeds to step S205011.

In step S205007, the conflict information generation unit 604 creates a conflict miss entry 75, adds it to the conflict miss data record 71, and proceeds to step S205008.
In step S205008, the conflict information generation unit 604 assigns a new miss ID 76 to the cache miss being processed, and the process proceeds to step S205009.
In step S205009, the conflict information generation unit 604 records the conflicting ICB number of the cache miss being processed in the conflicting ICB number 77, and the process proceeds to step S205010.
In step S205010, the conflict information generation unit 604 initializes to 0 the count of every conflicting ICB in the number of misses 78 for each conflicting ICB, and proceeds to step S205011.

In step S205011, the conflict information generation unit 604 increments by 1 the count of the own ICB in the number of misses 78 for each conflicting ICB, and ends the processing.

<Detailed Description of Step S2051 Other ICB Registration Process>
The detailed processing flow of step S2051 in FIG. 15 is described with reference to FIG. 17.
In step S205100, the conflict information generation unit 604 determines whether any unprocessed conflicting ICB remains. If one remains, the unit proceeds to step S205101; otherwise, the processing ends.

In step S205101, the conflict information generation unit 604 selects one of the unprocessed conflicting ICBs and proceeds to step S205102.
In step S205102, the conflict information generation unit 604 determines whether a record of the conflicting ICB being processed exists in the conflict information 70. If it exists, the unit proceeds to step S205103; if not, it proceeds to step S205104.

In step S205103, the conflict information generation unit 604 retrieves the conflict miss data record 71 of the conflicting ICB being processed and proceeds to step S205107.

In step S205104, the conflict information generation unit 604 creates a conflict miss data record 71 for the conflicting ICB being processed, and the process proceeds to step S205105.
In step S205105, the conflict information generation unit 604 sets the name of the ICB being processed in the ICB name 72, and the process proceeds to step S205106.
In step S205106, the conflict information generation unit 604 initializes the total number of misses as own ICB 73 to 0 and the total number of misses as another ICB 74 to 1, and proceeds to step S205109.

In step S205107, the conflict information generation unit 604 checks whether a record for the combination of conflicting ICBs to be registered, that is, a conflict miss entry 75, already exists. If it exists, the unit proceeds to step S205108; if not, it proceeds to step S205109.
In step S205108, the conflict information generation unit 604 increments the total number of misses as another ICB 74 by 1, and the process proceeds to step S205112.

In step S205109, the conflict information generation unit 604 creates a conflict miss entry 75, adds it to the conflict miss data record 71, and proceeds to step S205110.
In step S205110, the conflict information generation unit 604 assigns a new miss ID 76 to the cache miss being processed, and the process proceeds to step S205111.
In step S205111, the conflict information generation unit 604 records the conflicting ICB number of the cache miss being processed in the conflicting ICB number 77, and the process proceeds to step S205112.
In step S205112, the conflict information generation unit 604 initializes to 0 the count of every conflicting ICB in the number of misses 78 for each conflicting ICB, and proceeds to step S205113.

In step S205113, the conflict information generation unit 604 increments by 1 the count of the own ICB in the number of misses 78 for each conflicting ICB, and returns to step S205100.
The conflict information generation unit 604 repeats the above processing until the determination in step S205100 is NO.
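The bookkeeping of steps S2050 and S2051 can be sketched as follows. This is a simplified sketch under several assumptions: the totals 73 and 74 are incremented on every registered miss, the conflicting ICB number 77 is represented by the ICB combination itself, and in both steps the count in field 78 is charged to the ICB that actually missed. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, FrozenSet, List


@dataclass
class ConflictMissEntry:                 # conflict miss entry 75
    miss_id: int                         # miss ID 76
    members: FrozenSet[str]              # ICB combination (stands in for field 77)
    misses: Dict[str, int]               # number of misses 78, per ICB of the combination


@dataclass
class ConflictMissRecord:                # conflict miss data record 71
    name: str                            # ICB name 72
    own_total: int = 0                   # total misses as own ICB 73
    other_total: int = 0                 # total misses as another ICB 74
    entries: Dict[FrozenSet[str], ConflictMissEntry] = field(default_factory=dict)


class ConflictInfo:                      # conflict information 70
    def __init__(self) -> None:
        self.records: Dict[str, ConflictMissRecord] = {}
        self._next_id = count()

    def _entry(self, rec: ConflictMissRecord, members: FrozenSet[str]) -> ConflictMissEntry:
        # Reuse the entry for this combination, or create one with all counts at 0.
        if members not in rec.entries:
            rec.entries[members] = ConflictMissEntry(
                next(self._next_id), members, {m: 0 for m in members})
        return rec.entries[members]

    def register(self, own: str, conflicting: List[str]) -> None:
        members = frozenset([own, *conflicting])
        # S2050: the record of the ICB that missed.
        rec = self.records.setdefault(own, ConflictMissRecord(own))
        rec.own_total += 1
        self._entry(rec, members).misses[own] += 1
        # S2051: the record of each conflicting ICB; the count goes to the ICB that missed.
        for other in conflicting:
            rec = self.records.setdefault(other, ConflictMissRecord(other))
            rec.other_total += 1
            self._entry(rec, members).misses[own] += 1
```

Registering the second reference to B0 (own ICB B0, conflicting ICB C0) gives B0 one miss as own ICB and C0 one miss as another ICB, and both records then hold an entry for the combination {B0, C0}.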

<Detailed Description of Step S206 Cache Content Update Processing>
Next, the cache content update processing in step S206 of FIG. 7 is described with reference to FIGS. 18 and 19.
In step S206, the virtual cache simulator unit 60 replaces ICBs in the virtual cache memory 403 according to the replacement algorithm specified in the cache configuration information 40. In the virtual cache memory 403, the ICB that will be replaced next is placed at the rightmost position. Under LRU, an ICB in a lower-numbered way has been accessed more recently.

FIGS. 18 and 19 show how the cache contents are replaced under the LRU method.
In FIG. 18, C0 is not in the virtual cache memory 403 when it is referenced, so the virtual cache simulator unit 60 shifts every ICB in the virtual cache memory 403 one way to the right and stores C0 in real way 0. In other words, the virtual cache simulator unit 60 shifts all ICBs in the virtual cache memory 403 back by one area and stores C0 in the first real way of the virtual cache memory 403.

In FIG. 19, B0 is in the virtual cache memory 403 when it is referenced, so the virtual cache simulator unit 60 moves B0 from virtual way 1 to real way 0 and shifts all ICBs to the left of B0 one way to the right. In FIG. 19, the only ICB to the left of B0 is C0, so the virtual cache simulator unit 60 shifts C0 one way to the right. In other words, after acquiring the conflict ICBs 702, the virtual cache simulator unit 60 moves every ICB stored in the areas before B0, the call instruction code 4033, to the next area, and then stores B0 in the first real way of the virtual cache memory 403.

When replacement occurs in the cache memory 401, no data moves between ways; only the bits indicating the replacement target are updated. In the virtual cache memory 403, however, the ICBs referenced between two successive references to a given ICB must be detected, so the shifting update described above is performed.

<Arrangement position determination process S3>
FIG. 20 is a flowchart showing the operation of the arrangement position determination process S3 according to the present embodiment.
The arrangement position determination unit 80 outputs function arrangement information 90 based on the conflict information 70.
First, in step S300, the arrangement position determination unit 80 determines, based on the rule 131, the priority order in which the functions are arranged.

The rule 131 is described with reference to FIG. 21. FIG. 21 shows one example of the rule 131; other rules may be used.
The rule 131 defines rules 1 to 4. The arrangement position determination unit 80 applies the rules in ascending order of number, and applies the next rule only when several functions tie under the current one.

First, under rule 1, the arrangement position determination unit 80 assigns priority to the functions in descending order of the sum, over the ICBs belonging to each function, of the total number of misses as another ICB 74. Whether a contention miss occurs is determined by the relationship with the functions already arranged, so the unit arranges first the functions most likely to affect other ICBs.
Next, among functions that tie under rule 1, the arrangement position determination unit 80 applies rule 2 and assigns priority according to the number of distinct conflicting ICBs of the ICBs belonging to each function. Rule 2 gives precedence, among functions that cause the same number of contention misses in other ICBs, to the function that affects more ICBs.
If neither rule 1 nor rule 2 decides the order, the arrangement position determination unit 80 applies rule 3, which compares the total number of misses as own ICB 73 of the ICBs belonging to each function. Finally, the unit applies rule 4, which gives priority based on function size. If rule 4 still does not decide the order, the unit chooses the next function to arrange arbitrarily.
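Rules 1 to 4 map naturally onto a lexicographic sort key. In the sketch below, rule 2 is interpreted as the number of distinct ICBs that conflict with any ICB of the function, rule 4 is assumed to prefer larger functions, and the remaining arbitrary choice is resolved by input order; these interpretations, the `IcbStats` tuple, and the function `priority_order` are assumptions rather than details taken from the source.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class IcbStats(NamedTuple):
    other_total: int               # total misses as another ICB (74)
    own_total: int                 # total misses as own ICB (73)
    conflicting: FrozenSet[str]    # distinct ICBs it has conflicted with


def priority_order(functions: Dict[str, List[IcbStats]],
                   sizes: Dict[str, int]) -> List[str]:
    """S300: order functions by rules 1-4 (larger values first)."""
    def key(name: str):
        icbs = functions[name]
        rule1 = sum(s.other_total for s in icbs)
        rule2 = len(frozenset().union(*(s.conflicting for s in icbs)))
        rule3 = sum(s.own_total for s in icbs)
        rule4 = sizes[name]
        return (-rule1, -rule2, -rule3, -rule4)
    # sorted() is stable, so functions still tied after rule 4 keep input order.
    return sorted(functions, key=key)
```

For example, if B and C tie under rule 1 and C conflicts with more distinct ICBs than B, rule 2 places C before B.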

In step S301, the arrangement position determination unit 80 determines whether all functions have been arranged. If not, the process proceeds to step S302; if so, it proceeds to step S305.
In step S302, the arrangement position determination unit 80 takes the unarranged function with the highest priority.
In step S303, the target function is tried at every position from set 0 to the last set; for each candidate placement, the number of contention cache misses caused with the already arranged functions is calculated, and the position giving the minimum number of misses is found. Details of step S303 are described later.
In step S304, the arrangement position determination unit 80 fixes the function at the position with the minimum number of contention cache misses and returns to step S301. This processing continues until the determination in step S301 is YES.
In step S305, the arrangement position determination unit 80 compiles the above arrangement results and outputs function arrangement information 90.

<Step S303: Calculating the number of contention misses with already arranged functions>
The calculation in step S303 of the number of misses with the already arranged functions is described with reference to FIG. 22.
In step S30300, the arrangement position determination unit 80 places the arrangement target function, that is, the function to be arranged, at a position that has not yet been tried, and the process proceeds to step S30301.
In step S30301, the arrangement position determination unit 80 initializes a temporary variable holding the number of misses to 0 and proceeds to step S30302.
In step S30302, the arrangement position determination unit 80 takes an ICB of the arrangement target function whose number of cache misses has not yet been calculated, and proceeds to step S30303.

In step S30303, the arrangement position determination unit 80 checks whether the extracted ICB exists in the conflict information 70. If it exists, the process proceeds to step S30304; if not, it proceeds to step S30310.
In step S30304, the arrangement position determination unit 80 retrieves the conflict miss data record 71 of the extracted ICB and proceeds to step S30305.
In step S30305, the arrangement position determination unit 80 takes one conflict miss entry 75 that has not yet been examined and proceeds to step S30306.
In step S30306, the arrangement position determination unit 80 counts how many of the conflicting ICBs in the conflict miss entry 75 have already been arranged in the same set as the ICB being processed, and proceeds to step S30307.

In step S30307, if the number of already arranged conflicting ICBs plus 1 exceeds the number of ways given in the cache configuration information 40, that is, if a cache miss would occur, the arrangement position determination unit 80 proceeds to step S30308. If that number is less than or equal to the number of ways, that is, if no cache miss occurs, the unit proceeds to step S30309.

In step S30308, the arrangement position determination unit 80 sums the number of misses 78 for each conflicting ICB in the conflict miss entry 75, adds the sum to the temporary miss-count variable, and proceeds to step S30309.
In step S30309, the arrangement position determination unit 80 determines whether all conflict miss entries 75 of the ICB being processed have been checked. If they have, the unit proceeds to step S30310; otherwise, it returns to step S30305.

In step S30310, the arrangement position determination unit 80 determines whether all ICBs of the arrangement target function have been examined. If they have, the unit proceeds to step S30311; otherwise, it returns to step S30302.
In step S30311, if the temporary miss count is smaller than the current minimum number of misses, the arrangement position determination unit 80 proceeds to step S30312; otherwise, it proceeds to step S30313.
In step S30312, the arrangement position determination unit 80 updates the minimum number of misses and the corresponding arrangement, and the process proceeds to step S30313.
In step S30313, the arrangement position determination unit 80 determines whether every arrangement of the target function has been tried. If so, the processing ends; if not, the unit returns to step S30300.

The arrangement position determination unit 80 thus repeats the above processing until the determination in step S30313 is YES, which means that every arrangement of the target function has been tried.

Next, the arrangement position determination process S3 will be described in detail using a specific example.
Suppose the ICB execution sequence 303 shown in FIG. 3 is obtained from the instruction trace 30, and consider arrangement in a direct-mapped cache with two sets. There are three functions, A, B, and C, each of which is assumed to fit within one ICB. The conflict information 70 generated from the execution sequence of FIG. 3 is as shown in FIG.

The specific processing by the arrangement position determination unit 80 is described below following the detailed processing flow of step S3 in FIG. 20.
In step S300, the priority order of the functions is calculated from the rule 131 and the conflict information 70, and it is determined that the functions are arranged in the order C, B, A.

In step S302, function C is taken first. Since function C is the first function to be arranged, it is placed in set 0. Then, in step S302, function B is taken.
In step S303, function B is tried in each set from set 0 to the last set, and the position with the fewest contention misses against the already arranged functions is found.
FIG. 23 shows the number of cache misses calculated when function B is placed in set 0, namely the cache misses that occur with C0. The result is CM(B0, C0) = 4. Here, CM(X, Y) denotes the number of cache misses when the ICBs in parentheses are placed in the same set; CM(B0, C0) is thus the number of cache misses when B0 and C0 share a set.
FIG. 24 shows the result when function B is placed in set 1: CM(B0) = 0.
Accordingly, function B is placed in set 1 in step S304.

Finally, function A is taken, and its number of cache misses is calculated in the same way.
In step S303, function A is tried in each set from set 0 to the last set, and the position with the fewest contention misses against the already arranged functions is found.
FIG. 25 shows the result when function A is placed in set 0, namely the cache misses that occur with C0: CM(A0, C0) = 3 + 2 = 5, the number of cache misses when A0 and C0 share a set.
FIG. 26 shows the result when function A is placed in set 1: CM(A0, B0) = 2 + 1 = 3.
Since 3 is smaller than 5, function A is placed in set 1 in step S304.

Once function A has been arranged, all functions have been arranged, so the determination in step S301 is YES and the process proceeds to step S305.
In step S305, function arrangement information 90 is output, specifying that function C is placed at an address mapped to set 0, and functions B and A are placed at addresses mapped to set 1.
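Steps S301 to S305, together with the miss count of step S303, can be sketched as a greedy search. The sketch assumes that each conflict miss entry 75 is reduced to its ICB combination and the sum of its per-ICB miss counts 78, that "already arranged" in step S30306 means already arranged in the same set, and that a function's ICBs occupy consecutive sets starting from the candidate set; the names `misses_at`, `best_set`, and `arrange` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Per ICB: list of (ICB combination, sum of its per-ICB miss counts 78).
Entries = Dict[str, List[Tuple[FrozenSet[str], int]]]


def misses_at(icb_sets: Dict[str, int], entries: Entries,
              placed: Dict[str, int], ways: int) -> int:
    """S30301-S30310: contention misses of one candidate placement."""
    total = 0
    for icb, s in icb_sets.items():                        # S30302
        for members, miss_sum in entries.get(icb, []):     # S30303-S30305
            already = sum(1 for m in members
                          if m != icb and placed.get(m) == s)  # S30306 (same set)
            if already + 1 > ways:                         # S30307
                total += miss_sum                          # S30308
    return total


def best_set(icbs: List[str], entries: Entries, placed: Dict[str, int],
             ways: int, num_sets: int) -> Tuple[int, int]:
    """S303: try every start set; keep the first one with the fewest misses."""
    best = (0, -1)
    for start in range(num_sets):                          # S30300, S30313
        icb_sets = {icb: (start + i) % num_sets for i, icb in enumerate(icbs)}
        m = misses_at(icb_sets, entries, placed, ways)
        if best[1] < 0 or m < best[1]:                     # S30311, S30312
            best = (start, m)
    return best


def arrange(order: List[str], function_icbs: Dict[str, List[str]],
            entries: Entries, ways: int, num_sets: int) -> Dict[str, int]:
    """S301-S305: place functions one at a time in priority order."""
    placed: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for fn in order:                                       # S301, S302
        start, _ = best_set(function_icbs[fn], entries, placed, ways, num_sets)
        result[fn] = start                                 # S304
        for i, icb in enumerate(function_icbs[fn]):
            placed[icb] = (start + i) % num_sets
    return result                                          # S305
```

Feeding it only the counts quoted in the example (CM(B0, C0) = 4, CM(A0, C0) = 5, CM(A0, B0) = 3), a direct-mapped cache (`ways=1`) with two sets, and the order C, B, A reproduces the result above: C in set 0, and B and A in set 1. Because only a strictly smaller count replaces the current minimum, ties go to the lowest set, which is why C lands in set 0.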

*** Explanation of effects ***
The program placement apparatus 500 according to the present embodiment includes a virtual cache simulator unit that takes an instruction trace as input and extracts conflict information between functions. It also includes a conflict miss database holding the conflict information of each ICB, and an arrangement position determination unit that, based on this database, calculates a function arrangement that suppresses contention misses. The program placement apparatus 500 divides each function into cache-line-sized ICBs, extracts the combination of ICBs involved whenever a contention miss occurs, and arranges the functions based on that information. Therefore, the program placement apparatus 500 can determine an arrangement that suppresses contention misses.

In the virtual cache simulator unit, the program placement apparatus 500 according to the present embodiment sets the number of sets to 1 regardless of the configuration of the real cache memory, and runs the simulation on the assumption that all ICBs use the same cache area. Therefore, the program placement apparatus 500 can extract combinations of correlated ICBs when a contention miss occurs.

The program placement apparatus 500 according to the present embodiment records in the conflict miss database the number of contention misses for each combination of ICBs that were correlated when the misses occurred. Therefore, the program placement apparatus 500 can calculate the number of contention misses accurately in the optimal placement search, that is, in the arrangement position determination process.

In the program placement apparatus 500 according to the present embodiment, the optimal placement search unit, that is, the arrangement position determination unit, calculates from the conflict miss database how many contention misses each function causes in functions other than itself. The arrangement position determination unit then arranges the functions in the memory area in descending order of that count. Therefore, the program placement apparatus 500 can search for the optimal arrangement accurately.

As described above, in the program placement apparatus 500 according to the present embodiment, the virtual cache simulator unit stores the correlation between the function blocks in the instruction trace as conflict information, and the arrangement position determination unit determines the function arrangement that minimizes the number of contention cache misses based on that information. The apparatus can therefore output more accurate function arrangement information.

Embodiment 2.
In the present embodiment, mainly the differences from Embodiment 1 are described.
Components identical to those described in Embodiment 1 are given the same reference numerals, and their description may be omitted.

In Embodiment 1, the arrangement position determination unit 80 arranges the functions in the memory space in descending order of priority and outputs function arrangement information 90. However, depending on the order and sizes of the functions to be arranged, memory use efficiency may deteriorate.

Consider the case where the function arrangement information 90 shown in FIG. 27 is output.
In FIG. 27, only about 69% of the region from the start of the first function to the end of the last function is filled with valid function instructions. Because a larger memory capacity increases cost, improving memory use efficiency is important.
The present embodiment therefore describes a method for improving memory use efficiency.
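The efficiency figure quoted for FIG. 27 is the ratio of the bytes occupied by functions to the span from the start of the first function to the end of the last one. A minimal sketch of that calculation follows, assuming non-overlapping functions given as (start address, size) pairs; the function name `memory_use_efficiency` and the example layout are hypothetical.

```python
from typing import List, Tuple


def memory_use_efficiency(layout: List[Tuple[int, int]]) -> float:
    """Share of [start of first function, end of last function) holding function code.

    layout lists (start address, size in bytes) per function; functions are
    assumed not to overlap.
    """
    if not layout:
        raise ValueError("layout is empty")
    first = min(start for start, _ in layout)
    last = max(start + size for start, size in layout)
    used = sum(size for _, size in layout)
    return used / (last - first)
```

For instance, a 64-byte function at address 0 and a 128-byte function at address 128 leave a 64-byte gap, so 192 of 256 bytes are used (75%).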

*** Explanation of configuration ***
FIG. 28 is a block configuration diagram showing a program placement apparatus 500a according to the second embodiment.
The program placement apparatus 500a according to the present embodiment includes a function placement adjustment unit 120 in addition to the configuration shown in FIG.
Based on the placement position 801 of each function of the plurality of functions determined by the placement position determination unit 80, the function placement adjustment unit 120 moves each function to a rearrangement position 802 different from its placement position 801. The free area 4015 of the cache memory 401 when the functions are placed at their rearrangement positions 802 is smaller than the free area 4016 of the cache memory 401 when they are placed at their placement positions 801.

The function placement adjustment unit 120 receives the program information 50 and the function placement information 90 as input and outputs function placement information 90a with improved memory usage efficiency. The function placement information 90 describes the functions placed at their placement positions 801; the function placement information 90a describes the functions placed at their rearrangement positions 802.

Each functional block may be implemented as one or a plurality of programs, or a plurality of functional blocks may be implemented as one program.

*** Explanation of operation ***
The program placement method and program placement processing S10a of the program placement apparatus 500a according to the present embodiment will be described with reference to FIG.

Steps S1, S1a, S2, and S3 in FIG. 29 are the same as those in the first embodiment.
In step S5, the function placement adjustment unit 120 generates function placement information 90a with improved memory usage efficiency based on the program information 50 and the function placement information 90.

The function arrangement adjustment process S5 according to the present embodiment will be described with reference to FIG.
First, in step S500, the function placement adjustment unit 120 initializes the memory address to the top of the memory space in which the functions are to be placed, and the process proceeds to step S501.
In step S501, the function placement adjustment unit 120 checks whether all functions have been placed. If any function remains unplaced, the process proceeds to step S502; if all functions have been placed, the process proceeds to step S506.
In step S502, the function placement adjustment unit 120 checks whether any unplaced function starts, in the function placement information 90, at the block address and block offset indicated by the current memory address. If such a function exists, the process proceeds to step S503; otherwise, the process proceeds to step S505.
In step S503, the function placement adjustment unit 120 takes from the function placement information 90 one unplaced function whose start matches the block address and block offset indicated by the current memory address, and places that function at the current memory address. The process then proceeds to step S504.
In step S504, the function placement adjustment unit 120 advances the memory address by the size of the function just placed.
In step S505, the function placement adjustment unit 120 advances the memory address by one block offset.
In step S506, the function placement adjustment unit 120 outputs the current placement as the function placement information 90a and ends the process.
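Steps S500 to S506 can be sketched in Python as follows. This is an illustration under stated assumptions, not the embodiment's implementation: addresses and sizes are counted in instructions, `block_size` is the number of instructions per block, and the block address is assumed to be the block number taken modulo a hypothetical `num_sets` (the number of cache lines the addresses map onto), so that a moved function keeps the cache position it was given in the function placement information 90. When several unplaced functions match, the sketch simply takes the first in input order. The function names and input values are hypothetical.

```python
def block_position(address: int, block_size: int, num_sets: int) -> tuple[int, int]:
    """Return (block address, block offset) for an address.

    Assumption: the block address wraps modulo num_sets.
    """
    return (address // block_size) % num_sets, address % block_size


def adjust_placement(
    placement: dict[str, tuple[int, int]],  # function -> (address, size) in info 90
    block_size: int,
    num_sets: int,
    top: int = 0,
) -> dict[str, tuple[int, int]]:
    """Compact a placement while keeping each function's block address and offset."""
    # Block address and offset at which each function starts in info 90.
    unplaced = {
        name: block_position(addr, block_size, num_sets)
        for name, (addr, _) in placement.items()
    }
    result: dict[str, tuple[int, int]] = {}
    address = top  # S500: start at the top of the memory space
    while unplaced:  # S501: repeat until every function is placed
        here = block_position(address, block_size, num_sets)
        # S502: look for an unplaced function starting at this block address/offset
        match = next((name for name, pos in unplaced.items() if pos == here), None)
        if match is not None:
            size = placement[match][1]
            result[match] = (address, size)  # S503: place it here
            del unplaced[match]
            address += size  # S504: skip past the placed function
        else:
            address += 1  # S505: advance by one block offset (one instruction)
    return result  # S506: rearranged placement (information 90a)


info_90 = {"A": (0, 5), "E": (21, 6), "D": (28, 3)}  # hypothetical input
print(adjust_placement(info_90, block_size=2, num_sets=8))
# {'A': (0, 5), 'E': (5, 6), 'D': (12, 3)}
```

Because the block address wraps, every (block address, block offset) pair recurs within `num_sets * block_size` consecutive addresses, so the loop always terminates. With these hypothetical inputs, functions A, E, and D land at addresses 0, 5, and 12, matching the block addresses and offsets of the walkthrough that follows.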

Next, the function placement adjustment process S5 is described in detail using a specific example.
In this example, the function placement adjustment process S5 is executed on the function placement information 90 shown in FIG. 27. The resulting function placement information 90a is shown in FIG.
In this example, the top of the memory space is address 0, and each block holds two instructions.
First, the function placement adjustment unit 120 initializes the memory address, block address, and block offset to the beginning of the memory space in which the functions are placed: the memory address is 0, the block address is 0, and the block offset is 0.

Next, from among the unplaced functions, the function placement adjustment unit 120 places function A, whose start matches block address 0 and block offset 0, at memory address 0. It then adds the size of function A, 5, to the memory address, giving memory address 5, block address 2, and block offset 1.

Next, the function placement adjustment unit 120 takes function E, whose start matches block address 2 and block offset 1, and places it at memory address 5. It then adds the size of function E, 6, to the memory address, giving memory address 11, block address 5, and block offset 1.

No unplaced function matches block address 5 and block offset 1, so the function placement adjustment unit 120 advances the memory address by one block offset, giving memory address 12, block address 6, and block offset 0. Continuing in the same manner, the functions are placed in the order function D, function C, function G, function F, and function B.
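The block addresses and block offsets in this walkthrough are the quotient and remainder of the memory address divided by the block size of two instructions. A minimal Python check (the name `split` is illustrative); it treats the block address as the plain quotient and does not model any wrap-around:

```python
BLOCK_SIZE = 2  # instructions per block in this example


def split(address: int) -> tuple[int, int]:
    """Return (block address, block offset) as the quotient and remainder."""
    return divmod(address, BLOCK_SIZE)


for address in (0, 5, 11, 12):
    print(address, split(address))
# 0 (0, 0)
# 5 (2, 1)
# 11 (5, 1)
# 12 (6, 0)
```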
As a result of the function placement adjustment process S5, valid function instructions fill 98% of the area from the beginning of the first function to the end of the last function.

*** Explanation of effects ***
The program placement apparatus 500a according to the present embodiment includes a function placement adjustment unit 120 that rearranges the functions based on the function placement information 90 output from the placement position determination unit. The program placement apparatus 500a performs this rearrangement so that the free space in memory is smaller than when the functions are placed according to the function placement information 90.
Thus, the program placement apparatus 500a according to the present embodiment can improve memory usage efficiency.

In the above embodiments, the “acquisition unit”, “arrangement position determination unit”, “function arrangement unit”, “instruction trace reading unit”, “virtual cache memory generation unit”, “virtual cache data holding unit”, “contention information generation unit”, and “function arrangement adjustment unit” were independent functional blocks. However, the program placement apparatus need not be configured this way, and its configuration is arbitrary. For example, the “acquisition unit”, “arrangement position determination unit”, “function arrangement unit”, and “function arrangement adjustment unit” may be realized as one functional block, and the “instruction trace reading unit”, “virtual cache memory generation unit”, “virtual cache data holding unit”, and “contention information generation unit” may be realized as one functional block.

Further, the program placement apparatus may be a program placement system composed of a plurality of devices rather than a single device. The functional blocks of the program placement apparatus are arbitrary as long as they realize the functions described in the embodiments, and any other combination of functional blocks may constitute the program placement apparatus.

Although Embodiments 1 and 2 have been described above, either embodiment may be implemented in part, or the two embodiments may be partially combined. These embodiments may be implemented in any combination, in whole or in part.
The embodiments described above are essentially preferred examples and are not intended to limit the scope of the present invention, its applications, or its uses; various modifications may be made as needed.

10 program, 20 program execution unit, 21 acquisition unit, 30 instruction trace, 31 calling order, 40 cache configuration information, 50 program information, 60 virtual cache simulator unit, 70 conflict information, 71 conflict miss data record, 72 ICB name, 73 total number of misses as own ICB, 74 total number of misses as other ICB, 75 conflict miss entry, 76 miss ID, 77 number of conflicting ICBs, 78 number of misses per conflicting ICB, 80 placement position determination unit, 90, 90a function placement information, 100 function placement unit, 110 optimized program, 120 function placement adjustment unit, 130 priority table, 131 rule, 301 instruction code, 303 ICB execution sequence, 401 cache memory, 403 virtual cache memory, 01 virtual cache memory generation unit, 602 virtual cache data holding unit, 603 instruction trace reading unit, 604 contention information generation unit, 605 simulator unit, 701 conflict instruction code set, 702 conflicting ICB, 703 number of conflicts, 801 placement position, 802 rearrangement position, 500, 500a program placement apparatus, 901 processor, 902 auxiliary storage device, 903 memory, 904 communication device, 905 input interface, 906 display interface, 907 input device, 908 display, 910 signal line, 911, 912 cable, 9401 receiver, 9402 transmitter, 4015, 4016 free area, 4031 storage area, 4032 temporary storage area, 4033 calling instruction code, 4 39 plurality of areas, S1 acquisition process, S1a virtual cache generation process, S2 simulation process, S3 position determination process, S4 function placement process, S5 function placement adjustment process, S10, S10a program placement process.

Claims

1. A program placement apparatus that places a program including a plurality of functions in a cache memory that uses at least one way, the program placement apparatus comprising:
an acquisition unit that acquires, by executing the program, a calling order of the instruction codes included in each function of the plurality of functions;
a virtual cache memory generation unit that generates a virtual cache memory having storage areas corresponding to the number of ways of the cache memory;
a simulator unit that executes a simulation in which the instruction codes are called to the virtual cache memory, in the calling order, as calling instruction codes, and that, when a conflict occurs in which a calling instruction code that has already been called is called again and an instruction code other than that calling instruction code is stored in the storage area, acquires information on the conflict as conflict information; and
a placement position determination unit that determines a placement position of each function of the plurality of functions in the cache memory based on the conflict information.
2. The program placement apparatus according to claim 1, wherein
the simulator unit acquires, as a conflict instruction code set, a combination of instruction codes that caused the conflict among the instruction codes included in each function of the plurality of functions, counts the number of conflicts caused by the conflict instruction code set as a conflict count, and acquires the counted conflict count as the conflict information, and
the placement position determination unit determines the placement position of each function of the plurality of functions based on the conflict count.
3. The program placement apparatus according to claim 2, wherein
the virtual cache memory generation unit generates, as the virtual cache memory, a plurality of areas having consecutive addresses, consisting of the storage areas for the number of ways followed by a plurality of temporary storage areas.
4. The program placement apparatus according to claim 3, wherein
when the conflict occurs because the calling instruction code, when called, is held in a temporary storage area rather than in the storage area, the simulator unit acquires, as the conflict instruction code set, a set consisting of the calling instruction code and all instruction codes stored in areas ahead of the calling instruction code already stored in the virtual cache memory.
5. The program placement apparatus according to claim 1, wherein
after acquiring the conflict instruction code set, the simulator unit moves each instruction code stored in an area ahead of the calling instruction code already stored in the virtual cache memory to the following area, stores the calling instruction code in the storage area at the head of the virtual cache memory, and calls, as the next calling instruction code, the instruction code that follows the calling instruction code in the calling order.
6. The program placement apparatus according to any one of claims 1 to 5, further comprising
a priority table that sets a rule for determining the priority of each function of the plurality of functions, wherein
the placement position determination unit determines, based on the priority table, the priority of each function of the plurality of functions to be placed in the cache memory, and determines the placement position of each function of the plurality of functions in the determined order of priority.
7. The program placement apparatus according to any one of claims 1 to 6, further comprising
a function placement adjustment unit that, based on the placement position of each function of the plurality of functions determined by the placement position determination unit, rearranges each function of the plurality of functions to a rearrangement position different from the placement position, wherein
the free area of the cache memory when each function of the plurality of functions is placed at the rearrangement position is smaller than the free area of the cache memory when each function of the plurality of functions is placed at the placement position.
8. A program placement method of a program placement apparatus that places a program including a plurality of functions in a cache memory that uses at least one way, the method comprising:
acquiring, by an acquisition unit, a calling order of the instruction codes included in each function of the plurality of functions by executing the program;
generating, by a virtual cache memory generation unit, a virtual cache memory having storage areas corresponding to the number of ways of the cache memory;
executing, by a simulator unit, a simulation in which the instruction codes are called to the virtual cache memory, in the calling order, as calling instruction codes, and, when a conflict occurs in which a calling instruction code that has already been called is called again and an instruction code other than that calling instruction code is stored in the storage area, acquiring information on the conflict as conflict information; and
determining, by a placement position determination unit, a placement position of each function of the plurality of functions in the cache memory based on the conflict information.
9. A program placement program of a program placement apparatus that places a program including a plurality of functions in a cache memory that uses at least one way, the program causing a computer to execute:
an acquisition process of acquiring a calling order of the instruction codes included in each function of the plurality of functions by executing the program;
a virtual cache memory generation process of generating a virtual cache memory having storage areas corresponding to the number of ways of the cache memory;
a simulation process of executing a simulation in which the instruction codes are called to the virtual cache memory, in the calling order, as calling instruction codes, and, when a conflict occurs in which a calling instruction code that has already been called is called again and an instruction code other than that calling instruction code is stored in the storage area, acquiring information on the conflict as conflict information; and
a placement position determination process of determining a placement position of each function of the plurality of functions in the cache memory based on the conflict information.