CN100428161C - Automatic layout method for storage sub-system internal storage of embedded microprocessor - Google Patents

Automatic layout method for storage sub-system internal storage of embedded microprocessor

Info

Publication number
CN100428161C
Authority
CN
China
Prior art keywords
node
sheet
object program
binary object
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2007100223705A
Other languages
Chinese (zh)
Other versions
CN101051276A (en)
Inventor
王学香
凌明
杨军
刘新宁
陆生礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CNB2007100223705A priority Critical patent/CN100428161C/en
Publication of CN101051276A publication Critical patent/CN101051276A/en
Application granted granted Critical
Publication of CN100428161C publication Critical patent/CN100428161C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

A method for laying out the internal storage of the storage subsystem of an embedded microprocessor. The binary object program generated by an external ARMCC tool chain is first run entirely from off-chip synchronous dynamic RAM to obtain the microprocessor's memory-access record. According to the linker information and this access record, the object program is divided into a series of data nodes and instruction nodes together with a relation matrix. Nodes to be run from on-chip static RAM are then selected, yielding a chosen-node list that is used to generate a new binary object program; the corresponding parts of the new program are run from the on-chip static RAM.

Description

Automatic layout method for storage sub-system internal storage of embedded microprocessor
Technical field
The present invention is an automatic layout method for the storage-subsystem internal storage of an embedded microprocessor used in System-on-Chip (SoC) design, and belongs to the field of embedded microprocessor design.
Background technology
The storage subsystem of a modern system chip generally consists of three major parts: the embedded microprocessor, on-chip memory, and off-chip memory. Because of their different physical locations and organization, these parts affect chip performance differently. On-chip memory is generally fast on-chip static RAM; its address space is separate from that of off-chip memory, although the two share the address and data buses. Off-chip memory is generally an off-chip synchronous DRAM chip; compared with on-chip static RAM, its access speed is slow and unstable, and its access power consumption is higher. As SoC design levels keep improving, the mismatch between the low read speed of off-chip memory and the high clock frequency of the processor directly limits the processor's overall performance and causes performance loss. In addition, the high power consumption of off-chip memory accesses increases the chip's power consumption.
Summary of the invention
Technical problem: the objective of the present invention is to solve the above problems of the prior art by providing an automatic layout method for the storage-subsystem internal storage of an embedded microprocessor that optimizes the memory layout of the storage subsystem, improves its performance, and reduces its power consumption.
Technical solution: to solve the technical problems of the prior art, the automatic layout method of the present invention targets a storage subsystem comprising an ARM7TDMI embedded microprocessor from ARM, on-chip static RAM, and off-chip synchronous DRAM. The steps are as follows: a) run the original binary object program generated by the external ARMCC tool chain entirely from the off-chip synchronous DRAM, and obtain the access record of the ARM7TDMI embedded microprocessor to the off-chip synchronous DRAM during execution; b) according to the linker information generated by the ARMCC tool chain and the access record from the previous step, divide the original binary object program into a series of data nodes and instruction nodes, and generate a relation matrix expressing the priority relationships between nodes; c) select the nodes to be run from the on-chip static RAM: rank all nodes by priority and select the highest-priority node; then, taking the relation matrix into account, repeatedly re-rank and select among the remaining nodes until no further node fits in the on-chip static RAM, obtaining the chosen-node list; d) modify the original binary object program according to the chosen-node list to obtain a new binary object program; e) place the parts of the new binary object program corresponding to the nodes in the chosen-node list into the on-chip static RAM and run them there.
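The five steps can be sketched end to end as a toy pipeline (a minimal illustration only; every function, field name, and the access-record format here are hypothetical stand-ins, not the patent's actual tools):

```python
# Toy sketch of the five-step flow; every name here is a hypothetical stand-in.

def profile_off_chip(binary):
    # Step a: run entirely from off-chip SDRAM and log every access.
    return [("fetch", addr) for addr in binary["code"]]

def partition(binary, access_log):
    # Step b: split into nodes and build a (here empty) relation matrix.
    nodes = [{"id": i, "size": 4} for i, _ in enumerate(binary["code"])]
    relations = {}
    return nodes, relations

def select_nodes(nodes, relations, sram_capacity):
    # Step c: pick nodes in priority order until on-chip SRAM is full.
    chosen, used = [], 0
    for n in nodes:  # stand-in priority order
        if used + n["size"] <= sram_capacity:
            chosen.append(n)
            used += n["size"]
    return chosen

def relink(binary, chosen):
    # Steps d/e: emit a new binary whose chosen nodes run from on-chip SRAM.
    return {**binary, "sram_resident": [n["id"] for n in chosen]}

binary = {"code": [0x8000, 0x8004, 0x8008, 0x800C]}
log = profile_off_chip(binary)
nodes, relations = partition(binary, log)
chosen = select_nodes(nodes, relations, sram_capacity=8)  # fits two nodes
new_binary = relink(binary, chosen)
```

Each stage corresponds to one lettered step above; the real method replaces the stand-in priority order and empty relation matrix with the profiling, partitioning, and knapsack machinery described below.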
The original binary object program is divided into data nodes and instruction nodes as follows: a) symbol table reconstruction: convert the text symbol table generated by the ARMCC tool chain (the tool chain provided by ARM for compiling the ARM instruction set) into a new symbol table that the partitioning process can recognize; b) DCD data analysis (DCD is a pseudo-instruction defined in ARM assembly language for initializing 32-bit data storage units; during compilation it is also used to store constants and data, collectively called DCD data): determine which parts of the original binary object program are DCD data nodes and which are DCD instruction nodes; c) function division: divide each function into instruction nodes according to the execution counts of its parts and its jump instructions; that is, using the jump instructions within each function as boundaries, split the function into several instruction nodes, so that most functions of the original binary object program are divided into a series of instruction nodes; d) function call analysis: analyze every function call instruction in all instruction nodes and construct a preliminary extended control flow graph of the whole original binary object program; e) global stack analysis: encapsulate the global stack of the original binary object program into a new global data node whose size is 1.5 times the stack size measured during program execution; f) data access analysis: determine the relations between each instruction node and the global data nodes, completing the extended control flow graph of the original binary object program. The data nodes are of two kinds: the original global data nodes of the binary object program, and the new global data node obtained by encapsulating the program's global stack. The instruction nodes are the function basic blocks formed by cutting, at their internal jump instructions, those functions of the binary object program that satisfy the capacity limit of the on-chip static RAM. When the data nodes and instruction nodes are generated, a relation matrix is also generated to express how moving one or more nodes into the on-chip static RAM changes the priorities of the other nodes, owing to the mutual relations between nodes.
The chosen-node list is obtained with a knapsack algorithm: first, rank all nodes by priority and place the highest-priority node on the chosen-node list; then, taking the relation matrix into account, repeatedly re-rank and select among the remaining nodes until no further node fits in the on-chip static RAM, at which point the chosen-node list is complete. The criterion for ranking and selecting nodes can be either the ratio of the program execution time saved by placing a node in the on-chip static RAM to the node's size, or the ratio of the power consumption saved to the node's size; the larger the ratio, the higher the priority.
Beneficial effects: the automatic layout method of the present invention adopts an on-chip SRAM allocation algorithm that considers the relations between nodes, accurately analyzes the performance-optimization effect, and changes the program's memory layout. The program and data of compute-intensive applications are suitably distributed between the off-chip synchronous DRAM and the on-chip static RAM, ensuring that the most frequently called and longest-running parts of the program reside in the on-chip static RAM. This makes full use of the on-chip SRAM space, improves the performance of the storage subsystem, reduces the large power consumption caused by external memory reads, and thereby optimizes the memory layout of the embedded microprocessor's storage subsystem.
The automatic layout method of the present invention can improve the performance of applications such as an MP3 decoder by 37%-52%, as shown by comparative tests of related applications.
Description of drawings
Fig. 1 is a block diagram of the storage subsystem of the ARM7TDMI embedded microprocessor.
Fig. 2 is the workflow diagram of the present invention.
Fig. 3 is the extended control flow graph of an application program.
The figures show: ARM7TDMI embedded microprocessor 1, on-chip static RAM 2, off-chip synchronous DRAM 3, storage-subsystem performance simulation module 4, program partitioner 5, SPM allocator (on-chip SRAM allocator) 6, and linker 7.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a block diagram of the storage subsystem of the ARM7TDMI embedded microprocessor. The storage subsystem comprises the ARM7TDMI embedded microprocessor 1, the on-chip static RAM 2, and the off-chip synchronous DRAM 3. The microprocessor 1 and the on-chip static RAM 2 are connected by the AMBA bus. The off-chip synchronous DRAM 3 is connected to the microprocessor 1 indirectly through an external memory controller interface: the microprocessor 1 and the external memory controller interface are connected by the AMBA bus, and the data and control interfaces of the off-chip SDRAM 3 are connected to the corresponding data and control interfaces of the external memory controller by the data bus and the control bus, respectively.
The workflow of the present invention, referring to Fig. 2, is as follows:
The first step is all put into outer synchronous DRAM 3 operations of sheet with the binary object program that outside ARMCC instrument chain generates, and obtains the Visitor Logs of the outer synchronous DRAM 3 of 1 pair of sheet of ARM7TDMI embedded microprocessor in the operational process.
In the second step, the program partitioner 5 divides the binary object program into a series of data nodes and instruction nodes according to the linker information generated by the ARMCC tool chain and the access record, gathers the access statistics of each node, and generates the relation matrix describing the relations between the nodes.
In the third step, the SPM allocator (on-chip SRAM allocator) 6, using each node's access statistics and the relation matrix, applies a knapsack algorithm to select the nodes to be placed in the on-chip static RAM 2 and obtain the chosen-node list: all nodes are ranked by priority and the highest-priority node is selected; then, taking the relation matrix into account, the remaining nodes are repeatedly re-ranked and selected until no further node fits in the on-chip static RAM, yielding the chosen-node list.
In the fourth step, the linker 7 modifies the original binary object program according to the chosen-node list and obtains the new binary object program.
In the fifth step, the parts of the new binary object program generated by the linker 7 that correspond to the nodes in the chosen-node list are placed into the on-chip static RAM 2 during the initialization phase of the new program and executed there, completing the automatic layout optimization of the storage-subsystem internal storage.
The original binary object program generated by the external ARMCC tool chain is divided into data nodes and instruction nodes as follows: a) symbol table reconstruction: convert the text symbol table generated by the ARMCC tool chain into a new symbol table that the partitioning process can recognize; b) DCD data analysis: determine which parts of the original binary object program are DCD data nodes and which are DCD instruction nodes; c) function division: divide each function into instruction nodes according to the execution counts of its parts and its jump instructions, using the jump instructions within each function as boundaries, so that most functions of the original binary object program are divided into a series of instruction nodes. Because jump instructions make the execution counts of the individual instructions in a program differ, placing the most frequently executed instructions into on-chip memory saves the most execution time. Here, a jump instruction means either a backward jump, whose destination address lies before the instruction within the same function (for example, node v3 jumping to node v2 in Fig. 3), or a forward jump, whose destination address lies some instructions after the instruction within the same function (for example, node v2 jumping to node v5 in Fig. 3); jumps whose destination lies outside the function are excluded. Jump instructions include both conditional and unconditional jumps; d) function call analysis: analyze every function call instruction in all instruction nodes and construct a preliminary extended control flow graph of the whole original binary object program; e) global stack analysis: encapsulate the global stack of the original binary object program into a new global data node whose size is 1.5 times the stack size measured during program execution; f) data access analysis: determine the relations between each instruction node and the global data nodes, completing the extended control flow graph of the original binary object program. The data nodes are of two kinds: the original global data nodes of the original binary object program, and the new global data node obtained by encapsulating the program's global stack. The instruction nodes are the function basic blocks formed by cutting, at their internal jump instructions, those functions of the original binary object program that satisfy the capacity limit of the on-chip static RAM 2.
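The function-division rule above, cutting a function into basic blocks at its internal jump instructions, can be sketched as follows; the mnemonic test and instruction format are illustrative assumptions, and real partitioning would also check that jump targets lie inside the function:

```python
def is_branch(instr):
    # Illustrative: treat mnemonics starting with "B" (B, BL, BNE, BX, ...)
    # as branches; a real splitter would decode the actual ARM opcode.
    return instr.split()[0].upper().startswith("B")

def split_function(instrs):
    # Cut the instruction list after every branch, yielding basic blocks,
    # mirroring the rule that jumps inside a function are cut points.
    blocks, current = [], []
    for ins in instrs:
        current.append(ins)
        if is_branch(ins):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

func = ["MOV r0, #0", "ADD r0, r0, #1", "BNE loop", "LDR r1, [r0]", "BX lr"]
blocks = split_function(func)
```

With this toy input the function splits into two blocks, one ending at the conditional branch and one ending at the return.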
In one embodiment of the present invention, the criterion for ranking and selecting node priorities is the ratio of the program execution time saved after a node is placed in the on-chip static RAM 2 to the node's size; the larger the ratio, the higher the priority.
In another embodiment of the present invention, the criterion for ranking and selecting node priorities is the ratio of the power consumption saved after a node is placed in the on-chip static RAM 2 to the node's size; the larger the ratio, the higher the priority.
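Both embodiments rank nodes by a benefit-per-byte ratio; a minimal sketch (all field names and numbers are hypothetical):

```python
def time_priority(node):
    # Ratio of execution time saved to node size (first embodiment).
    return node["time_saved"] / node["size"]

def energy_priority(node):
    # Ratio of power consumption saved to node size (second embodiment).
    return node["energy_saved"] / node["size"]

nodes = [
    {"name": "decode_loop", "size": 256, "time_saved": 5120, "energy_saved": 640},
    {"name": "init_tables", "size": 512, "time_saved": 1024, "energy_saved": 2048},
]
by_time = sorted(nodes, key=time_priority, reverse=True)
by_energy = sorted(nodes, key=energy_priority, reverse=True)
```

Note that the two criteria can disagree: in this toy data the hot loop wins per byte on time saved, while the larger table wins per byte on energy saved.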
The four key modules involved in the automatic layout method of the present invention are introduced below.
One, storage subsystem performance simulation module 4:
A performance simulation module of the embedded microprocessor's storage subsystem, written using the simulation design facilities provided by ARM's ARMCC tool chain. When an application program runs on this module, its accesses to the storage subsystem are recorded for use in the subsequent steps.
Two, the procedure division device 5:
To make full use of the compile-time optimization ability of ARM's ARMCC tool chain, the on-chip SRAM optimization technique of the present invention directly analyzes the binary object program file output after ARMCC linking, rather than the program's C-language source code. By reconstructing the program's symbol table, all contents of the program are analyzed in detail, including the functions, the global data variables, and the global stack. Every function in the program that satisfies the capacity limit of the on-chip static RAM is cut into several basic blocks at its internal jump instructions, and the program's global stack is encapsulated into a single global data variable. Because the stack size changes when the program processes different inputs, the size of this data variable is set to 1.5 times the actual stack size measured during program execution to prevent stack overflow; the original global data variables of the binary object program remain unchanged. Finally, the program contents sent to the SPM allocator (on-chip SRAM allocator) 6 for selection comprise the global data variables and the function basic blocks.
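The stack-encapsulation rule, sizing the new global data node at 1.5 times the largest stack observed during profiling to guard against overflow on unseen inputs, can be sketched as follows (the word-alignment step is an added assumption, not stated in the patent):

```python
import math

def stack_node_size(stack_samples, margin=1.5, align=4):
    # New global data node replacing the program stack: 1.5x the profiled
    # maximum depth, rounded up to word alignment (alignment is an assumption).
    return align * math.ceil(margin * max(stack_samples) / align)

# Profiled stack depths (bytes) over a run; 1.5 * 1000 = 1500 bytes.
size = stack_node_size([240, 512, 1000])
```

The margin trades on-chip space for safety: a larger factor wastes SRAM, a smaller one risks overflow when an input drives the stack deeper than anything seen during profiling.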
The program partitioner 5 divides the original binary object program output by the ARMCC tool chain in six successive steps: symbol table reconstruction, DCD data analysis, function division, function call analysis, global stack analysis, and data access analysis. After these six steps, the original binary object program is divided into a series of instruction nodes and data nodes. The data nodes are of two kinds: the original global data nodes of the original binary object program, and the new global data node obtained by encapsulating the program's global stack. The instruction nodes are the function basic blocks formed by cutting, at their internal jump instructions, those functions of the original binary object program that satisfy the capacity limit of the on-chip static RAM 2. To ease the design and implementation of the subsequent SPM allocator (on-chip SRAM allocator) 6, the present invention establishes an extended control flow graph to describe these program contents and the relations between them. In Fig. 3, v6 is a data node and all other nodes are instruction nodes; apart from the unconditional-transfer relation, every other kind of relation between nodes is illustrated in the figure: the conditional-transfer relation, the data-access relation, the sequential-execution relation, and the call relation. Some ARM instructions, such as jump instructions and data-load instructions, impose strict limits on the address range. Because the address jump range would be too large, a single jump instruction generally cannot jump between off-chip and on-chip memory; after a jump instruction in off-chip memory is moved into on-chip memory it must be replaced by two instructions, which enlarges the moved node and thus affects program performance and power consumption. The five kinds of relations between nodes are therefore analyzed separately, considering the effect on size, program performance, and power consumption after the relation-initiating node or the relation-receiving node is moved into the on-chip static RAM. The present invention expresses these effects with relation matrices: the performance matrix TRM_ij, the power consumption matrix ERM_ij, and the size matrix SRM_ij express, respectively, the effect of all relations between nodes v_i and v_j on program execution time, on saved power consumption, and on node size if and only if node v_i is moved into the SPM, as given by the following formulas:
TRM_{ij} = \sum_{type=1}^{5} ( \Delta T_{start}(R_{type}(i,j)) + \Delta T_{end}(R_{type}(j,i)) )
ERM_{ij} = \sum_{type=1}^{5} ( \Delta E_{start}(R_{type}(i,j)) + \Delta E_{end}(R_{type}(j,i)) )
SRM_{ij} = \sum_{type=1}^{5} \Delta S(R_{type}(i,j))
In the formulas, R_type(i,j) denotes the five kinds of relations between nodes; ΔT_start denotes the change in program execution time after the relation-initiating node moves into the on-chip static RAM; ΔT_end denotes the change in program execution time after the relation-receiving node moves into the on-chip static RAM; ΔE_start denotes the change in saved power consumption after the relation-initiating node moves into the on-chip static RAM; ΔE_end denotes the change in saved power consumption after the relation-receiving node moves into the on-chip static RAM; and ΔS denotes the change in node size after the node moves into the on-chip static RAM.
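A sketch of accumulating the three matrices from a per-relation delta table, following the formulas above (the relation encoding and all delta values are illustrative assumptions):

```python
def build_matrices(n, relations):
    # relations[(a, b)]: list of relations initiated by node a toward node b,
    # each a dict of deltas. Per the formulas, row i of TRM sums dT_start over
    # R(i, j) (i initiates) plus dT_end over R(j, i) (i receives); SRM sums
    # only dS over R(i, j).
    TRM = [[0.0] * n for _ in range(n)]
    ERM = [[0.0] * n for _ in range(n)]
    SRM = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for r in relations.get((i, j), []):
                TRM[i][j] += r["dT_start"]
                ERM[i][j] += r["dE_start"]
                SRM[i][j] += r["dS"]
            for r in relations.get((j, i), []):
                TRM[i][j] += r["dT_end"]
                ERM[i][j] += r["dE_end"]
    return TRM, ERM, SRM

# One illustrative relation: node 0 initiates toward node 1.
rels = {(0, 1): [{"dT_start": 10.0, "dT_end": 4.0,
                  "dE_start": 2.0, "dE_end": 1.0, "dS": 8.0}]}
TRM, ERM, SRM = build_matrices(2, rels)
```

A single relation thus contributes to two matrix entries: its start-deltas to the initiator's row and its end-deltas to the receiver's row.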
Three, SPM divider (static RAM divider on the sheet) 6:
This allocator adopts an improved knapsack algorithm that considers the relations between nodes to rank and select all nodes and obtain the list of nodes to be placed in the on-chip static RAM 2. The steps are as follows: a) rank all nodes by priority and place the highest-priority node on the chosen-node list; b) taking the relation matrix into account, repeatedly re-rank and select among the remaining nodes until no further node fits in the on-chip static RAM 2, obtaining the chosen-node list. The criterion for selecting nodes for the on-chip static RAM 2 can be either the performance benefit obtainable after placing a node in the on-chip static RAM 2 or the power consumption saved after placing it there; the greater the performance benefit, or the more power saved, the higher the priority.
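The improved knapsack loop can be sketched as a greedy selection that re-ranks the remaining nodes after each pick using the relation matrix; the additive benefit update below is a simplified stand-in for the patent's full TRM/ERM/SRM accounting:

```python
def allocate(nodes, TRM, capacity):
    # Greedy SPM allocation: repeatedly place the node with the highest
    # benefit-per-byte, then adjust the remaining nodes' benefits using the
    # relation-matrix row of the node just placed.
    benefit = {i: n["benefit"] for i, n in enumerate(nodes)}
    size = {i: n["size"] for i, n in enumerate(nodes)}
    chosen, free, candidates = [], capacity, set(benefit)
    while True:
        fitting = [i for i in candidates if size[i] <= free]
        if not fitting:
            break
        best = max(fitting, key=lambda i: benefit[i] / size[i])
        chosen.append(best)
        free -= size[best]
        candidates.remove(best)
        for j in candidates:  # placing `best` changes the others' benefit
            benefit[j] += TRM[best][j]
    return chosen

nodes = [{"benefit": 100.0, "size": 4},
         {"benefit": 10.0, "size": 4},
         {"benefit": 55.0, "size": 4}]
TRM = [[0.0, 50.0, 0.0],   # placing node 0 makes node 1 much more attractive
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
chosen = allocate(nodes, TRM, capacity=8)
```

Without the relation update the second pick would be node 2; the matrix row of node 0 raises node 1's benefit above node 2's, which is exactly the effect the "improved" knapsack is meant to capture.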
Four, linker 7:
After the list of nodes selected to move into the on-chip static RAM under the capacity limit of the on-chip static RAM 2 is obtained, the linker 7, written in the C language, relinks all the nodes and automatically generates a new binary file. This file comprises three parts: an initialization section, an off-chip synchronous DRAM section, and an on-chip static RAM section. The off-chip SDRAM section is the original binary object program with its jump instructions revised according to the node moves; the initialization section and the on-chip SRAM section are newly added by the linker, and the on-chip SRAM section likewise has its jump instructions revised according to the node moves. The initialization section copies the on-chip SRAM section into the storage space of the real on-chip static RAM 2 and then jumps to the first address of the original program's main function; if the first node of the main function has been selected to move into the on-chip static RAM 2, the initialization section jumps instead to the corresponding position in the on-chip static RAM 2. The new binary object program begins execution from the initialization section.

Claims (3)

1. An automatic layout method for the storage-subsystem internal storage of an embedded microprocessor, the storage subsystem comprising an ARM7TDMI embedded microprocessor (1), an on-chip static RAM (2), and an off-chip synchronous DRAM (3), characterized in that the automatic layout method comprises the following steps:
1a) run the original binary object program generated by the ARMCC tool chain entirely from the off-chip synchronous DRAM (3), and obtain the access record of the ARM7TDMI embedded microprocessor (1) to the off-chip synchronous DRAM (3) during execution;
1b) according to the linker information generated by the ARMCC tool chain and the access record of step 1a), divide the original binary object program into a series of data nodes and instruction nodes, and generate a relation matrix expressing the priority relationships between nodes;
1c) select the nodes to be run from the on-chip static RAM (2):
1c1) rank all the nodes by priority and select the highest-priority node;
1c2) taking into account the relation matrix of step 1b), repeat the ranking and selection of step 1c1) for the remaining nodes until no further node fits in the on-chip static RAM (2), obtaining the chosen-node list;
1d) modify the original binary object program according to the chosen-node list to obtain a new binary object program;
1e) place the parts of the new binary object program corresponding to the nodes in the chosen-node list into the on-chip static RAM (2);
wherein the original binary object program is divided into data nodes and instruction nodes as follows:
2a) symbol table reconstruction: convert the text symbol table generated by the ARMCC tool chain into a new symbol table that the partitioning process can recognize;
2b) DCD data analysis: determine which parts of the original binary object program are DCD data nodes and which are DCD instruction nodes;
2c) function division: divide each function into instruction nodes according to the execution counts of its parts and its jump instructions; that is, using the jump instructions within each function as boundaries, split the function into several instruction nodes, so that most functions of the original binary object program are divided into a series of instruction nodes;
2d) function call analysis: analyze every function call instruction in all instruction nodes and construct a preliminary extended control flow graph of the whole original binary object program;
2e) global stack analysis: encapsulate the global stack of the original binary object program into a new global data node whose size is 1.5 times the stack size measured during program execution;
2f) data access analysis: determine the relations between each instruction node and the global data nodes, completing the extended control flow graph of the original binary object program.
2. The automatic layout method for the storage-subsystem internal storage of an embedded microprocessor according to claim 1, characterized in that the priority is the ratio of the program execution time saved after a node is placed in the on-chip static RAM (2) to the node's size; the larger the ratio, the higher the priority.
3. The automatic layout method for the storage-subsystem internal storage of an embedded microprocessor according to claim 1, characterized in that the priority is the ratio of the power consumption saved after a node is placed in the on-chip static RAM (2) to the node's size; the larger the ratio, the higher the priority.
CNB2007100223705A 2007-05-15 2007-05-15 Automatic layout method for storage sub-system internal storage of embedded microprocessor Active CN100428161C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007100223705A CN100428161C (en) 2007-05-15 2007-05-15 Automatic layout method for storage sub-system internal storage of embedded microprocessor


Publications (2)

Publication Number Publication Date
CN101051276A CN101051276A (en) 2007-10-10
CN100428161C true CN100428161C (en) 2008-10-22

Family

ID=38782701

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100223705A Active CN100428161C (en) 2007-05-15 2007-05-15 Automatic layout method for storage sub-system internal storage of embedded microprocessor

Country Status (1)

Country Link
CN (1) CN100428161C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291480B (en) * 2017-08-15 2020-12-15 中国农业银行股份有限公司 Function calling method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
US6952821B2 (en) * 2002-08-19 2005-10-04 Hewlett-Packard Development Company, L.P. Method and system for memory management optimization
US20060080372A1 (en) * 2004-09-21 2006-04-13 Barua Rajeey K Compiler-driven dynamic memory allocation methodology for scratch-pad based embedded systems


Non-Patent Citations (8)

Title
A Compiler-Based Approach for Dynamically Managing Scratch-Pad Memories in Embedded Systems. Mahmut Kandemir et al. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 23, No. 2, 2004 *
Extended Control Flow Graph Based Performance Optimization Using Scratch-Pad Memory. Pu Hanlai, Ling Ming, Jin Jing. DATE'05, IEEE Computer Society, 2005 *
On-chip memory allocation strategy based on nested-loop instruction analysis. Pu Hanlai, Ling Ming, Jin Jing, Zhou Fan. Journal of Circuits and Systems, Vol. 11, No. 1, 2006 (in Chinese) *
Power-consumption-optimization-oriented on-chip memory allocation strategy. Jin Jing, Pu Hanlai, Ling Ming. Journal of Applied Sciences, Vol. 24, No. 2, 2006 (in Chinese) *

Also Published As

Publication number Publication date
CN101051276A (en) 2007-10-10


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant