CN103500107B

CN103500107B - Hardware optimization method for CPU

Info

Publication number: CN103500107B
Application number: CN201310450768.4A
Authority: CN
Inventors: 朱钟琦; 曾田; 阮航; 王炜
Original assignee: 709th Research Institute of CSIC
Current assignee: Wuhan Lingjiu Microelectronics Co ltd; 709th Research Institute of CSSC
Priority date: 2013-09-29
Filing date: 2013-09-29
Publication date: 2017-05-17
Anticipated expiration: 2033-09-29
Also published as: CN103500107A

Abstract

A hardware optimization method for a CPU comprises the following steps: (1) designing the system hardware, where the system comprises a master controller, a storage controller, an external bus interface and other modules; (2) setting burst regions, where the burstable and non-burstable regions of memory can be set simply through a burst address editor; (3) using burst mode, which can be enabled when the data length is greater than 32 bits. The method has the following advantages: it is fully compatible with the x86 instruction set, and the burst read/write regions are fully configurable, so that data reads and writes can be optimized segment by segment; in terms of hardware design, no peripheral circuit is added and the number of CPU logic gates increases only slightly, so system cost is not affected; the execution time of instructions is greatly shortened and the number of CPU bus accesses is reduced; and the method can be extended to CPU designs for other CISC or RISC instruction sets, giving it a wide range of application.

Description

A hardware optimization method for a CPU

Technical field

The present invention relates to the field of processor hardware design, and more particularly to the hardware optimization of data transfer in a CPU (central processing unit).

Background art

The data transfer unit has always been an important part of a CPU, and optimizing it has always been one of the focal points of performance optimization in processor design. In domestic processor designs, the data transfer unit is optimized mainly by improving cache execution efficiency, resolving read/write dependencies, and adding components such as a DMAC (direct memory access controller); optimization of the execution flow of the data transfer instructions themselves is rarely addressed. Taking the data transfer instructions of the 32-bit x86 instruction set as an example, the execution flow follows the traditional one-at-a-time approach entirely: data is passed from the source address to the destination address one byte, word or doubleword at a time, in sequence. Instructions of this kind occupy the bus heavily, causing CPU pipeline stalls and increasing the bus bandwidth load.

Taking 32-bit x86 instructions as an example, Fig. 1 and Fig. 2 are the flow charts of REP MOVS (string move), a typical memory-to-memory data transfer instruction, and POPA (pop all), a typical memory-to-register transfer instruction. For REP MOVS, assuming the value of ECX is 100, the instruction must repeat steps 1 to 5 one hundred times to complete. Instructions of this kind therefore execute inefficiently and place a high load on bus bandwidth.

Through a detailed analysis of the data transfer instructions of the x86 instruction set, the present invention proposes a method of hardware optimization for the continuous data transfer instructions among them. The method reduces the execution cycles of data transfer instructions and the number of CPU bus accesses (write accesses in particular), effectively increasing the instruction execution speed of the CPU.

Summary of the invention

The present invention provides a hardware optimization method for data transfer instructions, intended to improve the execution efficiency of data transfer instructions and thereby raise processor performance. Its address parameters are simple to set: they can be set once at system initialization (for example in the BIOS setup), or changed while the system is running according to customer needs. The performance gain is significant: optimized data transfer instructions generally show an efficiency improvement of more than 100%.

The hardware optimization method for a CPU according to the present invention comprises:

(1) Designing the system hardware. The system consists of modules such as a master controller, a storage controller and an external bus interface. The master controller is the main part of the system: a CPU that can perform storage operations in different ways for different memory access addresses. Specifically, the burstable address regions of this CPU can be set through a BIOS program or by setting internal register values; memory accesses to a burstable region are executed with the optimized instruction scheme, and accesses to a non-burstable region are executed with the original instruction scheme. The storage controller receives data and control information from the master controller and reads and writes the memory; it supports both burst and non-burst reads and writes. The external bus interface handles data communication between the inside and the outside of the CPU. The information after hardware optimization comprises: the source address information of the data transfer, destination address information, data length information, control information, and data information.

(2) Setting burst regions. Through the burst address editor, the burstable and non-burstable regions of memory can be set simply. Both are configurable regions determined by the user's memory allocation scheme. They need not be fully contiguous: the system's memory addresses can be divided into multiple burstable regions and multiple non-burstable regions. Whether burst mode is used can be configured in two ways, by hardware and by software. Externally, burst mode can be switched on or off by a single switch; internally, it can be enabled or disabled by setting a particular register value. Besides using the burst editor, the user can also set burstable regions through a specific program. Normally the regions are set once by the BIOS at system initialization, but they can also be reset as needed while a program is running. When the source address and destination address of an executing data transfer instruction lie in a burst region, the optimized instruction scheme is enabled.
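The region check in step (2) can be pictured with a small software model. This is only a sketch under stated assumptions: the names `BurstRegionTable`, `add_region`, `in_burst_region` and `use_optimized` are hypothetical, region bounds are taken as inclusive, and an access is treated as burstable only if it lies wholly inside one configured region. The patent does not specify these details.

```python
class BurstRegionTable:
    """Illustrative model of the burstable-region configuration (hypothetical names)."""

    def __init__(self):
        self.burst_enabled = True   # models the external switch / internal enable register
        self.regions = []           # (lower, upper) address pairs, inclusive (assumption)

    def add_region(self, lower, upper):
        """Register one burstable region; regions need not be contiguous."""
        if lower > upper:
            raise ValueError("lower address must not exceed upper address")
        self.regions.append((lower, upper))

    def in_burst_region(self, addr, length=1):
        """True if the access [addr, addr + length) lies wholly inside one region."""
        last = addr + length - 1
        return any(lo <= addr and last <= hi for lo, hi in self.regions)

    def use_optimized(self, src, dst, length):
        """Select the optimized scheme only if burst mode is on and both addresses qualify."""
        return (self.burst_enabled
                and self.in_burst_region(src, length)
                and self.in_burst_region(dst, length))


# Two non-contiguous burstable regions; everything else is non-burstable.
table = BurstRegionTable()
table.add_region(0x1000, 0x1FFF)
table.add_region(0x8000, 0x8FFF)
```

In this model, a transfer from 0x1000 to 0x8000 would take the optimized path, while a transfer whose destination is 0x3000 would fall back to the original instruction scheme.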

(3) Using burst mode. Burst mode can be enabled when the data length is greater than 32 bits. Burst mode does not depend on whether the cache function is enabled, and supports memory reads and writes of at most one cache line of data. This limit of one cache line is not fixed: the present invention chooses the size of one cache line as the maximum only so that, when the cache is enabled, data can conveniently be exchanged with the cache module; there is no hard requirement on the maximum number of bytes a burst can actually take. When data is transferred between the inside and the outside of the CPU, a memory read can bring one cache line of data into the burst register of the data transfer unit in a single operation, after which operations such as register assignment are carried out; for a memory write, the data to be written is first staged in the burst register of the data transfer unit and then written to memory in a single operation. Enabling the cache function is of great help to burst read/write performance and can effectively improve CPU execution efficiency.

Burst reads and writes reduce the number of CPU bus accesses and shorten the traditional instruction execution flow. For specific instructions (such as REP MOVSB), the optimized instruction flow can be reduced to 1/16 of the traditional flow (depending on the number of bytes in a cache line).
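The 1/16 figure can be checked with a rough count of bus accesses. The sketch below is illustrative only: it assumes a 16-byte cache line (consistent with the 1/16 figure for byte-sized moves), counts one read and one write per element in the traditional flow and one read burst plus one write burst per cache line in the optimized flow, and uses hypothetical function names.

```python
def traditional_accesses(count):
    """Traditional flow: one bus read and one bus write per element (assumed cost model)."""
    return 2 * count


def burst_accesses(count, elem_bytes=1, line_bytes=16):
    """Optimized flow: one read burst and one write burst per cache line of data."""
    total_bytes = count * elem_bytes
    bursts = -(-total_bytes // line_bytes)   # ceiling division
    return 2 * bursts


# REP MOVSB moving 1600 bytes with a 16-byte cache line (assumed size).
ratio = burst_accesses(1600) / traditional_accesses(1600)
```

Under these assumptions the ratio is exactly 1/16 when the byte count is a multiple of the line size; a short trailing burst makes it slightly larger, for example 14 accesses instead of 200 when moving 100 bytes.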

This optimization method is not limited to the x86 instruction set; it also applies to other CISC or RISC instruction sets. Taking RISC as an example, the widely used ARM instruction set has an instruction, LDMIA, that transfers multiple register values, up to the values of 16 general registers. ARM's approach is to perform the load/store operations one at a time, incrementing the memory address by the word length after each. With the burst read/write and assignment scheme of this patent, all the values can be read in a single operation and then assigned to the 16 general registers in a single operation, giving a clear performance gain.
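A minimal sketch of how the burst scheme might apply to LDMIA is given below. It is an illustration, not part of the patent: the function name `ldmia_burst` is hypothetical, memory is modeled as a little-endian byte array, and the whole register block is assumed to fit in one burst, which relies on the statement above that the one-cache-line limit is not a hard requirement.

```python
def ldmia_burst(memory, base, reg_list):
    """Load the listed registers from consecutive words with a single modeled burst read.

    As in LDMIA, the lowest-numbered register takes the word at the lowest address.
    Returns the loaded values and the base address after write-back.
    """
    regs = sorted(set(reg_list))
    nbytes = 4 * len(regs)                          # one 32-bit word per register
    burst_reg = bytes(memory[base:base + nbytes])   # one read burst into the burst register
    values = {
        r: int.from_bytes(burst_reg[4 * i:4 * i + 4], "little")
        for i, r in enumerate(regs)
    }
    return values, base + nbytes


# Sixteen consecutive words, loaded into r0..r15 in one step.
mem = bytearray()
for word in range(16):
    mem += (word * 3).to_bytes(4, "little")
values, new_base = ldmia_burst(mem, 0, range(16))
```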

The advantages of the hardware optimization method for a CPU according to the present invention are as follows. It is fully compatible with the x86 instruction set; the operating system and application software need no changes at all.

The burst read/write regions of the present invention are fully configurable: users can divide the burst read/write regions according to their own needs and optimize data reads and writes segment by segment.

In terms of hardware design, no peripheral circuit is added and the number of CPU logic gates increases only slightly, so system cost is not affected.

Further, through in-depth analysis of the data transfer instructions, the present invention optimizes the execution flow of some of the x86 data transfer instructions, substantially reducing instruction execution time and the number of CPU bus accesses. For specific instructions (such as REP MOVSB), the number of bus accesses can be reduced to 1/16 (depending on the number of bytes in a cache line). Because bus accesses are more concentrated, the bandwidth load on the bus also drops substantially.

The method can be extended to CPU designs for other CISC or RISC instruction sets, giving it a wide range of application.

Brief description of the drawings:

Fig. 1 is the traditional execution flow chart of REP MOVS, a typical memory-to-memory data transfer instruction.

Fig. 2 is the traditional execution flow chart of POPA, a typical memory-to-register transfer instruction.

Fig. 3 compares the general execution flow of a data transfer instruction before and after optimization.

Fig. 4 is the execution flow chart of REP MOVS after optimization.

Fig. 5 is the execution flow chart of POPA after optimization.

Detailed description of the embodiments:

As shown in Figs. 1 to 5, a hardware optimization method for a CPU comprises the following steps:

1. Configuring burst regions through the burst address editor

Two general registers in the CPU are assigned values corresponding to the lower and upper addresses of a burst region. A reserved bit of EFLAGS is then set to 1, which assigns the two addresses to the CPU's burst configuration register group; the reserved bit is then cleared to 0, ready for the next burst region configuration. This step is repeated to configure multiple burst regions.
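The configuration handshake above can be sketched as a small behavioral model. The names `BurstConfigModel`, `reg_lower`, `reg_upper`, `cfg_flag` and `configure_region` are hypothetical, and the model assumes that the address pair is latched on the 0-to-1 transition of the reserved bit; the patent names neither the general registers nor the bit used.

```python
class BurstConfigModel:
    """Behavioral sketch of the burst configuration sequence (hypothetical names)."""

    def __init__(self):
        self.reg_lower = 0       # general register holding the lower address
        self.reg_upper = 0       # general register holding the upper address
        self.cfg_flag = 0        # models the reserved EFLAGS bit
        self.burst_config = []   # models the burst configuration register group

    def write_cfg_flag(self, value):
        """Setting the flag latches the address pair; clearing it re-arms the mechanism."""
        if value == 1 and self.cfg_flag == 0:
            self.burst_config.append((self.reg_lower, self.reg_upper))
        self.cfg_flag = value


def configure_region(cpu, lower, upper):
    """One pass of the sequence: load both registers, set the flag, clear the flag."""
    cpu.reg_lower = lower
    cpu.reg_upper = upper
    cpu.write_cfg_flag(1)
    cpu.write_cfg_flag(0)


# Repeating the sequence configures several burst regions.
cpu = BurstConfigModel()
configure_region(cpu, 0x1000, 0x1FFF)
configure_region(cpu, 0x8000, 0x8FFF)
```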

2. Accelerating memory-to-memory transfer instructions through hardware optimization

Taking the REP MOVS instruction as an example, hardware first determines whether the addresses DS:ESI and ES:EDI lie in a burst region. If they do, the optimized logic is executed; otherwise the original logic is executed. In the optimized logic, ECX is compared with burst_num (depending on the operand size M, burst_num can be 4, 8, 16, and so on). If ECX is greater, N bytes of data (N being the number of bytes in a cache line) are fetched from DS:ESI by one read burst and staged in the internal burst register, then written to ES:EDI by one write burst; N is subtracted from or added to ESI/EDI (the direction depends on DF), and burst_num is subtracted from ECX. If ECX is less than burst_num, (ECX*M) bytes of data are fetched from DS:ESI by one read burst and staged in the internal burst register, then written to ES:EDI by one write burst; ECX*M is subtracted from or added to ESI/EDI, and ECX is set to 0.
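The optimized REP MOVS logic can be sketched in software as follows. This is a simplified model rather than the hardware itself: the function name `rep_movs_burst` is hypothetical, memory is a flat byte array without segmentation, the burst-region check is assumed to have passed already, the case where ECX equals burst_num is handled by the final-burst branch, and overlapping source and destination ranges are not modeled. For DF=1 the sketch assumes that ESI/EDI name the highest element of each block.

```python
def rep_movs_burst(memory, esi, edi, ecx, m=1, n=16, df=0):
    """Model of optimized REP MOVS.

    m is the operand size in bytes, n the cache line size in bytes (assumed 16).
    Returns the final (esi, edi, ecx) and the number of modeled bus accesses.
    """
    burst_num = n // m            # elements that fit in one burst
    bus_accesses = 0
    while ecx > 0:
        if ecx > burst_num:
            nbytes, ecx = n, ecx - burst_num   # full cache-line burst
        else:
            nbytes, ecx = ecx * m, 0           # final, possibly shorter burst
        # With DF=1 the block extends downward from the current element (assumption).
        src_lo = esi - (nbytes - m) if df else esi
        dst_lo = edi - (nbytes - m) if df else edi
        burst_reg = bytes(memory[src_lo:src_lo + nbytes])   # one read burst
        memory[dst_lo:dst_lo + nbytes] = burst_reg          # one write burst
        bus_accesses += 2
        step = -nbytes if df else nbytes
        esi += step
        edi += step
    return esi, edi, ecx, bus_accesses


# Copy 100 bytes: six full 16-byte bursts plus one 4-byte burst.
mem = bytearray(range(100)) + bytearray(100)
result = rep_movs_burst(mem, esi=0, edi=100, ecx=100)
```

With these parameters the copy takes 14 modeled bus accesses, compared with 200 if each byte were read and written individually.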

3. Accelerating register-to-memory transfer instructions through hardware optimization

Taking the POPA/PUSHA instructions as an example, hardware first determines whether SS:ESP lies in a burst region. If it does, the optimized logic is executed; otherwise the original logic is executed. In the optimized logic, for POPA, N bytes of stack data are read and staged in the internal burst register, then assigned in turn to the CPU's general registers such as DI, SI and BP. For PUSHA, the values of the CPU's general registers such as DI, SI and BP are combined and assigned to the internal burst register, then written out in a single write burst.
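The optimized stack transfers can be illustrated with the 32-bit forms of the instructions. The sketch below is an assumption-laden model: the function names `pushad_burst` and `popad_burst` are hypothetical, the stack is a flat little-endian byte array, the burst-region check is assumed to have passed, and the whole 32-byte register frame is assumed to fit in one burst. The frame layout follows the standard x86 order, with EDI at the lowest address and the saved ESP slot discarded on pop.

```python
import struct

# Frame layout from the lowest stack address upward; None marks the skipped ESP slot.
FRAME_ORDER = ("edi", "esi", "ebp", None, "ebx", "edx", "ecx", "eax")


def pushad_burst(memory, regs):
    """Combine the register values in the burst register, then write them in one burst."""
    esp = regs["esp"]
    values = [esp if name is None else regs[name] for name in FRAME_ORDER]
    burst_reg = struct.pack("<8I", *values)   # stage the frame in the burst register
    memory[esp - 32:esp] = burst_reg          # one write burst
    regs["esp"] = esp - 32


def popad_burst(memory, regs):
    """Read the frame in one burst, then assign the registers in turn."""
    esp = regs["esp"]
    burst_reg = bytes(memory[esp:esp + 32])   # one read burst
    for name, value in zip(FRAME_ORDER, struct.unpack("<8I", burst_reg)):
        if name is not None:                  # the saved ESP value is not restored
            regs[name] = value
    regs["esp"] = esp + 32


stack = bytearray(64)
regs = {"eax": 1, "ecx": 2, "edx": 3, "ebx": 4,
        "esp": 64, "ebp": 5, "esi": 6, "edi": 7}
pushad_burst(stack, regs)
```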

In addition to the embodiments above, the method of the present invention can be applied to other types of data transfer to yield many further embodiments, which are not described one by one here.

Claims

1. A hardware optimization method for a CPU, characterized in that it comprises the following steps:

(1) designing the system hardware: the system comprises a master controller, a storage controller and an external bus interface module;

(2) setting burst regions: through a burst address editor, the burstable and non-burstable regions of memory are set simply; the burstable and non-burstable regions are configurable regions determined by the user's memory allocation scheme and need not be fully contiguous, the memory addresses of the system being divided into multiple burstable regions and multiple non-burstable regions; whether burst mode is used can be configured in two ways, by hardware and by software; externally, burst mode is switched on or off by a single switch, and internally, burst mode is enabled or disabled by setting a particular register value; besides using the burst editor, the user can also set burstable regions by setting internal register values; normally the regions are set once by the BIOS at system initialization, or reset as needed while a program is running; when the source address and destination address of an executing data transfer instruction lie in a burst region, the optimized instruction scheme is enabled;

(3) using burst mode: burst mode can be enabled when the data length is greater than 32 bits; burst mode does not depend on whether the cache function is enabled, and supports memory reads and writes of at most one cache line of data.

2. The hardware optimization method for a CPU according to claim 1, characterized in that the system comprises a master controller, a storage controller and an external bus interface module; the master controller is the main part of the system and is a CPU that can perform storage operations in different ways for different memory access addresses; specifically, the burstable address regions of this CPU can be set through a BIOS program or by setting internal register values, memory accesses to a burstable region being executed with the optimized instruction scheme and accesses to a non-burstable region being executed with the original instruction scheme; the storage controller receives data and control information from the master controller and reads and writes the memory, supporting both burst and non-burst reads and writes; the external bus interface handles data communication between the inside and the outside of the CPU;

the information after hardware optimization comprises: the source address information of the data transfer, destination address information, data length information, control information, and data information.

3. The hardware optimization method for a CPU according to claim 1, characterized in that using burst mode means that burst mode can be enabled when the data length is greater than 32 bits; burst mode does not depend on whether the cache function is enabled, and supports memory reads and writes of at most one cache line of data; when data is transferred between the inside and the outside of the CPU, a memory read brings one cache line of data into the burst register of the data transfer unit in a single operation, after which register assignment is carried out; for a memory write, the data to be written is first staged in the burst register of the data transfer unit and then written to memory in a single operation.

4. A hardware optimization method for a CPU, characterized in that it comprises the following steps:

(3) using burst mode: burst mode can be enabled when the data length is greater than 32 bits; burst mode does not depend on whether the cache function is enabled, and there is no hard requirement on the maximum number of bytes a burst can actually take.

5. The hardware optimization method for a CPU according to claim 4, characterized in that the system comprises a master controller, a storage controller and an external bus interface module; the master controller is the main part of the system and is a CPU that can perform storage operations in different ways for different memory access addresses; specifically, the burstable address regions of this CPU can be set through a BIOS program or by setting internal register values, memory accesses to a burstable region being executed with the optimized instruction scheme and accesses to a non-burstable region being executed with the original instruction scheme; the storage controller receives data and control information from the master controller and reads and writes the memory, supporting both burst and non-burst reads and writes; the external bus interface handles data communication between the inside and the outside of the CPU;