CN117472797A - Processing method and device of unaligned address access instruction and electronic equipment - Google Patents


Info

Publication number
CN117472797A
CN117472797A (application number CN202311828923.1A)
Authority
CN
China
Prior art keywords
memory
access
processed
instruction
memory access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311828923.1A
Other languages
Chinese (zh)
Inventor
李祖松
郇丹丹
商家玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co., Ltd.
Priority claimed from application CN202311828923.1A
Publication of CN117472797A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0871 — Allocation or management of cache space
    • G06F 12/0893 — Caches characterised by their organisation or structure
    • G06F 12/0895 — Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a processing method and apparatus for non-aligned address memory access instructions, and an electronic device, relating to the field of computer technology. The method includes: determining whether a to-be-processed memory access instruction is a non-aligned address memory access instruction; in the case that it is, determining at least one memory access operation corresponding to the instruction based on its memory access address, where each memory access operation is one that does not cross a cache line; executing the at least one memory access operation to obtain the corresponding data; and writing back the to-be-processed memory access instruction based on that data. This scheme improves the processing efficiency of non-aligned address memory access instructions.

Description

Processing method and device of unaligned address access instruction and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a non-aligned address access instruction, and an electronic device.
Background
A non-aligned address memory access instruction is a memory access instruction whose memory access address is not aligned on a boundary of the access size.
Currently, non-aligned address memory access instructions are handled in software: the processor raises a non-aligned exception, and software splits the instruction into multiple aligned address memory access instructions, which the processor hardware then executes.
However, this approach often takes tens to hundreds of processor cycles to complete, so non-aligned address memory access instructions are processed inefficiently.
Disclosure of Invention
The application provides a processing method and device for a non-aligned address access instruction and electronic equipment, and aims to solve the problem of low processing efficiency of the non-aligned address access instruction.
In a first aspect, the present application provides a method for processing a non-aligned address access instruction, including:
determining whether the to-be-processed access instruction is a non-aligned address access instruction;
determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line;
executing the at least one access operation to obtain data corresponding to the at least one access operation;
and writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
In a possible implementation manner, the determining, based on the memory address of the pending memory access instruction, at least one memory access operation corresponding to the pending memory access instruction includes:
determining whether the to-be-processed access instruction is an access instruction crossing a cache line based on the access address of the to-be-processed access instruction;
and determining at least one access operation corresponding to the to-be-processed access instruction based on whether the to-be-processed access instruction is an access instruction crossing a cache line.
In one possible implementation manner, the determining, based on whether the pending access instruction is an access instruction crossing a cache line, at least one access operation corresponding to the pending access instruction includes:
determining a target memory access operation corresponding to the to-be-processed memory access instruction based on a memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a memory access instruction which does not cross a cache line;
splitting the memory access address of the to-be-processed memory access instruction at the boundary of the cache line, in the case that the to-be-processed memory access instruction is a memory access instruction crossing a cache line, to obtain a first memory access address located before the boundary and a second memory access address located after the boundary; and determining a first memory access operation corresponding to the first memory access address and a second memory access operation corresponding to the second memory access address.
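The splitting step described above can be sketched in software. This is an illustrative model only, not the patented hardware: `LINE_SIZE` is an assumed cache-line size (64 bytes is typical but not specified by this application), and the function names are hypothetical.

```python
# Hypothetical sketch: cut a cross-line access of `size` bytes at `address`
# at the cache-line boundary into a first part (before the boundary) and a
# second part (after it); each part then stays within a single cache line.
LINE_SIZE = 64  # assumed cache-line size in bytes

def split_at_line_boundary(address: int, size: int):
    # First cache-line boundary strictly above `address`.
    boundary = (address // LINE_SIZE + 1) * LINE_SIZE
    first = (address, boundary - address)            # first access: before the boundary
    second = (boundary, address + size - boundary)   # second access: after the boundary
    return first, second
```

For example, an 8-byte access at address 60 with 64-byte lines splits into a 4-byte access at 60 and a 4-byte access at 64.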
In a possible implementation manner, the executing the at least one access operation to obtain data corresponding to the at least one access operation includes:
determining the target memory access operation as a memory access operation to be processed and executing a first process under the condition that the at least one memory access operation comprises the target memory access operation;
determining the second memory access operation as a to-be-processed memory access operation and executing the first process in the case that the at least one memory access operation includes the first memory access operation and the second memory access operation; and, after the first memory access operation is rolled back to the issue queue, determining the first memory access operation as a new to-be-processed memory access operation and executing the first process;
wherein the first process comprises:
determining a physical address corresponding to a memory operation to be processed based on the memory address corresponding to the memory operation to be processed;
under the condition that the physical address corresponding to the to-be-processed access operation is determined, determining whether the to-be-processed access operation hits in a data cache;
and determining data corresponding to the pending access operation based on whether the pending access operation hits in a data cache.
In a possible implementation manner, the determining, based on the address corresponding to the pending access operation, the physical address corresponding to the pending access operation includes:
determining whether the address corresponding to the pending memory operation hits in a translation look-aside buffer;
and determining a physical address corresponding to the pending access operation based on whether the access address corresponding to the pending access operation hits in the translation look-aside buffer.
In a possible implementation manner, the determining, based on whether the address corresponding to the pending access operation hits in the translation look-aside buffer, the physical address corresponding to the pending access operation includes:
determining whether the access address corresponding to the access operation to be processed hits in a primary translation look-aside buffer;
determining a physical address corresponding to the to-be-processed memory operation from the primary translation look-aside buffer under the condition that the memory address corresponding to the to-be-processed memory operation hits the primary translation look-aside buffer;
and, in the case that the memory access address corresponding to the to-be-processed memory access operation misses in the primary translation look-aside buffer, determining the physical address corresponding to the to-be-processed memory access operation from a secondary translation look-aside buffer.
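The two-level TLB lookup described above can be illustrated with a small software model. This is a hedged sketch, not the patented hardware: dictionaries stand in for the TLB structures, `PAGE_SHIFT` assumes 4 KiB pages, and the refill-on-L2-hit behavior is an assumption typical of hierarchical TLBs rather than something this application states.

```python
# Hypothetical model of a primary/secondary TLB lookup: try the first-level
# TLB; on a miss, consult the second-level TLB and refill the first level.
PAGE_SHIFT = 12  # assumed 4 KiB pages

def translate(vaddr: int, l1_tlb: dict, l2_tlb: dict):
    vpn = vaddr >> PAGE_SHIFT                       # virtual page number
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)        # offset within the page
    if vpn in l1_tlb:                               # primary TLB hit
        return (l1_tlb[vpn] << PAGE_SHIFT) | offset
    if vpn in l2_tlb:                               # primary miss, secondary hit
        l1_tlb[vpn] = l2_tlb[vpn]                   # refill the primary TLB
        return (l2_tlb[vpn] << PAGE_SHIFT) | offset
    return None                                     # both miss: page walk needed (not modeled)
```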
In one possible implementation manner, the determining, based on whether the pending access operation hits in a data cache, data corresponding to the pending access operation includes:
determining data corresponding to the to-be-processed access operation from the data cache under the condition that the to-be-processed access operation hits the data cache;
and, in the case that the pending access operation misses in the data cache, determining the data corresponding to the pending access operation from a lower-level storage system via a memory access miss queue.
In a possible implementation manner, the writing back the pending memory access instruction based on the data corresponding to the at least one memory access operation includes:
writing data corresponding to the target memory access operation into a queue entry corresponding to the memory access instruction to be processed under the condition that the at least one memory access operation comprises the target memory access operation;
when the at least one access operation comprises the first access operation and the second access operation, combining data corresponding to the first access operation and data corresponding to the second access operation to obtain combined data; and writing the merged data into a queue entry corresponding to the to-be-processed access instruction.
In one possible implementation manner, the determining whether the pending access instruction is a non-aligned address access instruction includes:
determining the instruction type of the to-be-processed memory access instruction, wherein the instruction type includes a double-word memory access instruction, a word memory access instruction, a half-word memory access instruction, or a byte memory access instruction;
and determining whether the to-be-processed access instruction is a non-aligned address access instruction or not based on the instruction type of the to-be-processed access instruction and the access address of the to-be-processed access instruction.
In a second aspect, the present application provides a processing apparatus for a non-aligned address access instruction, including:
the determining module is used for determining whether the to-be-processed access instruction is a non-aligned address access instruction;
the first processing module is used for determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line;
the execution module is used for executing the at least one access operation to obtain data corresponding to the at least one access operation;
and the second processing module is used for writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
In one possible implementation manner, the first processing module is specifically configured to:
determining whether the to-be-processed access instruction is an access instruction crossing a cache line based on the access address of the to-be-processed access instruction;
and determining at least one access operation corresponding to the to-be-processed access instruction based on whether the to-be-processed access instruction is an access instruction crossing a cache line.
In one possible implementation manner, the first processing module is specifically configured to:
determining a target memory access operation corresponding to the to-be-processed memory access instruction based on a memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a memory access instruction which does not cross a cache line;
splitting the memory access address of the to-be-processed memory access instruction at the boundary of the cache line, in the case that the to-be-processed memory access instruction is a memory access instruction crossing a cache line, to obtain a first memory access address located before the boundary and a second memory access address located after the boundary; and determining a first memory access operation corresponding to the first memory access address and a second memory access operation corresponding to the second memory access address.
In one possible implementation manner, the execution module is specifically configured to:
determining the target memory access operation as a memory access operation to be processed and executing a first process under the condition that the at least one memory access operation comprises the target memory access operation;
determining the second memory access operation as a to-be-processed memory access operation and executing the first process in the case that the at least one memory access operation includes the first memory access operation and the second memory access operation; and, after the first memory access operation is rolled back to the issue queue, determining the first memory access operation as a new to-be-processed memory access operation and executing the first process;
wherein the first process comprises:
determining a physical address corresponding to a memory operation to be processed based on the memory address corresponding to the memory operation to be processed;
under the condition that the physical address corresponding to the to-be-processed access operation is determined, determining whether the to-be-processed access operation hits in a data cache;
and determining data corresponding to the pending access operation based on whether the pending access operation hits in a data cache.
In one possible implementation manner, the execution module is specifically configured to:
determining whether the address corresponding to the pending memory operation hits in a translation look-aside buffer;
and determining a physical address corresponding to the pending access operation based on whether the access address corresponding to the pending access operation hits in the translation look-aside buffer.
In one possible implementation manner, the execution module is specifically configured to:
determining whether the access address corresponding to the access operation to be processed hits in a primary translation look-aside buffer;
determining a physical address corresponding to the to-be-processed memory operation from the primary translation look-aside buffer under the condition that the memory address corresponding to the to-be-processed memory operation hits the primary translation look-aside buffer;
and, in the case that the memory access address corresponding to the to-be-processed memory access operation misses in the primary translation look-aside buffer, determining the physical address corresponding to the to-be-processed memory access operation from a secondary translation look-aside buffer.
In one possible implementation manner, the execution module is specifically configured to:
determining data corresponding to the to-be-processed access operation from the data cache under the condition that the to-be-processed access operation hits the data cache;
and, in the case that the pending access operation misses in the data cache, determining the data corresponding to the pending access operation from a lower-level storage system via a memory access miss queue.
In a possible implementation manner, the second processing module is specifically configured to:
writing data corresponding to the target memory access operation into a queue entry corresponding to the memory access instruction to be processed under the condition that the at least one memory access operation comprises the target memory access operation;
when the at least one access operation comprises the first access operation and the second access operation, combining data corresponding to the first access operation and data corresponding to the second access operation to obtain combined data; and writing the merged data into a queue entry corresponding to the to-be-processed access instruction.
In one possible implementation manner, the determining module is specifically configured to:
determining the instruction type of the to-be-processed memory access instruction, wherein the instruction type includes a double-word memory access instruction, a word memory access instruction, a half-word memory access instruction, or a byte memory access instruction;
and determining whether the to-be-processed access instruction is a non-aligned address access instruction or not based on the instruction type of the to-be-processed access instruction and the access address of the to-be-processed access instruction.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method for processing a non-aligned address access instruction according to any one of the first aspects when the program is executed.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of processing a non-aligned address access instruction according to any of the first aspects.
According to the processing method and apparatus for non-aligned address memory access instructions and the electronic device provided herein, it is first determined whether a to-be-processed memory access instruction is a non-aligned address memory access instruction. If it is, at least one memory access operation corresponding to the instruction is determined based on its memory access address, each such operation being one that does not cross a cache line. The at least one memory access operation is then executed to obtain the corresponding data, and the instruction is written back based on that data, completing its processing. In this scheme, when the to-be-processed memory access instruction is a non-aligned address memory access instruction, the only requirement is that each corresponding memory access operation not cross a cache line. Since a non-aligned address memory access instruction spans at most two cache lines, the instruction need not be split into many aligned address memory access instructions; the amount of splitting is reduced, fewer memory access queue entry resources are needed during processing, and the processing efficiency of non-aligned address memory access instructions is improved.
Drawings
For a clearer description of the present application or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for processing a non-aligned address access instruction according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a hardware implementation of a non-aligned address access instruction according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data cache before and after filling according to an embodiment of the present application;
FIG. 4 is a flowchart of determining at least one memory access operation corresponding to a memory access instruction to be processed according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a processing procedure of a pending memory access instruction according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of address partitioning of a pending memory instruction according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a processing of a pending access instruction according to an embodiment of the present disclosure;
FIG. 8 is a second schematic diagram illustrating a processing of a pending access instruction according to an embodiment of the present disclosure;
Fig. 9 is a third schematic diagram of processing a pending access instruction according to an embodiment of the present application;
FIG. 10 is a fourth schematic diagram illustrating a processing of a pending access instruction according to an embodiment of the present disclosure;
FIG. 11 is a fifth schematic diagram illustrating a processing of a pending memory access instruction according to an embodiment of the present disclosure;
FIG. 12 is a sixth schematic diagram illustrating a processing of a pending memory access instruction according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a processing device for a non-aligned address access instruction according to an embodiment of the present disclosure;
fig. 14 is a schematic physical structure of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
A non-aligned address memory access instruction is one whose memory access address is not aligned on a boundary of the access size. For example, for a memory access instruction that accesses a double word (8 bytes), the memory access address should be double-word aligned, i.e., its low three bits should all be 0. For an instruction accessing a word (4 bytes), the address should be word aligned, i.e., its low two bits should all be 0. For an instruction accessing a half word (2 bytes), the address should be half-word aligned, i.e., its lowest bit should be 0. A memory access that does not meet these criteria is a non-aligned address memory access instruction. Only byte memory access instructions place no requirement on the memory access address, so no alignment question arises for them.
The address-aligned access instruction can simplify the cache access logic and the judgment logic related to the processor access instruction address, so that the general high-performance processor hardware is the access instruction supporting the address alignment, and most programs are compiled into the address-aligned access instruction. However, there are still some programs in the library function that are address-unaligned, and thus processing of address-unaligned memory instructions is still involved. If program migration is performed each time, the time is long. In addition, in order to reduce the cost of memory space, there is also a need for a high performance processor to allocate memory space continuously, and access to memory instructions with unaligned addresses may also occur.
Currently, if the processor hardware does not support non-aligned address memory access instructions, a processor that encounters one generally raises a non-aligned exception, which is handled by software. The software splits the non-aligned address memory access instruction into multiple aligned address memory access instructions, which the processor hardware then executes.
However, this enter-an-exception-then-split-in-software approach often takes tens to hundreds of processor cycles to complete, so the processing of non-aligned address memory access instructions becomes a performance bottleneck for the processor.
The splitting process also over-occupies resources and degrades execution efficiency. Splitting one instruction into multiple aligned address memory access instructions requires multiple cache access ports, multiple entries in resources such as the memory access queue, and multiple write-backs, occupying register file ports several times. This affects the processing efficiency of non-aligned address memory access instructions and additionally consumes precious processor resources.
Based on this, the embodiment of the application provides a processing method of a non-aligned address access instruction, so as to solve the above technical problem.
Fig. 1 is a flowchart of a method for processing a non-aligned address access instruction according to an embodiment of the present application, where, as shown in fig. 1, the method includes:
s11, determining whether the to-be-processed access instruction is a non-aligned address access instruction.
For a to-be-processed memory access instruction, it is first determined whether it is a non-aligned address memory access instruction. After the instruction enters the memory access queue, a queue entry is allocated for it and its related information is recorded there. The memory access address of the instruction can be determined from this information, and from that address it can be determined whether the instruction is a non-aligned address memory access instruction.
Specifically, firstly, determining an instruction type of a to-be-processed memory access instruction, wherein the instruction type of the to-be-processed memory access instruction comprises a double-word memory access instruction, a half-word memory access instruction or a byte memory access instruction. After determining the instruction type of the to-be-processed memory access instruction, determining whether the to-be-processed memory access instruction is a non-aligned address memory access instruction or not based on the instruction type of the to-be-processed memory access instruction and the memory access address of the to-be-processed memory access instruction.
For example, if the to-be-processed memory access instruction is a double-word memory access instruction, the memory access address should be double-word aligned, i.e., its low three bits should all be 0; if they are, the instruction is an aligned address memory access instruction, otherwise it is a non-aligned address memory access instruction. If the instruction is a word memory access instruction, it is determined whether the low two bits of its memory access address are all 0: if so, it is an aligned address memory access instruction; if not, a non-aligned one. If the instruction is a half-word memory access instruction, it is determined whether the lowest bit of its memory access address is 0: if so, it is aligned; if not, non-aligned. If the instruction is a byte memory access instruction, it is always an aligned address memory access instruction, and so on.
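The per-type alignment check above reduces to testing the low bits of the address. The following is an illustrative software sketch of that rule, not the patented hardware logic; the names and the table are hypothetical.

```python
# Illustrative sketch: an access is address-aligned when the address is a
# multiple of the access size, i.e. the low log2(size) bits are all zero.
ACCESS_SIZE_BYTES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}

def is_aligned(address: int, access_type: str) -> bool:
    size = ACCESS_SIZE_BYTES[access_type]
    # size is a power of two, so (size - 1) masks exactly the low bits
    # that must be zero: 3 bits for double word, 2 for word, 1 for half word.
    return address & (size - 1) == 0
```

For instance, a word access at address 0x1003 fails the check (its low two bits are not zero), while a byte access is aligned at any address.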
S12, under the condition that the to-be-processed access instruction is a non-aligned address access instruction, determining at least one access operation corresponding to the to-be-processed access instruction based on the access address of the to-be-processed access instruction, wherein each access operation is an access operation which does not cross a cache line.
In the case that the pending access instruction is a non-aligned address access instruction, the embodiment of the present application may determine, based on the access address of the pending access instruction, at least one corresponding access operation.
The data cache contains a plurality of cache lines, and different cache lines correspond to different virtual addresses. For any memory access operation, if all the data to be accessed lies in one cache line, the operation does not cross a cache line; if the data to be accessed lies in two cache lines, the operation crosses a cache line. In this embodiment of the present application, every one of the at least one memory access operation corresponding to the to-be-processed memory access instruction is an operation that does not cross a cache line.
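The cache-line-crossing condition above can be sketched as a small Python predicate (illustrative only; the 64-byte, i.e. 512-bit, line size is taken from the example given later in the text):

```python
# Illustrative predicate: an access crosses a cache line when its first and
# last bytes fall in different lines. 64-byte line size per the later example.
LINE_BYTES = 64

def crosses_cache_line(addr, n_bytes, line=LINE_BYTES):
    """True when the accessed byte range [addr, addr + n_bytes) spans lines."""
    return addr // line != (addr + n_bytes - 1) // line
```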
S13, executing at least one access operation to obtain data corresponding to the at least one access operation.
After determining at least one memory access operation corresponding to the memory access instruction to be processed, for each memory access operation, data memory access can be performed based on the memory access operation, so as to obtain data corresponding to each memory access operation.
S14, writing back a memory access instruction to be processed based on data corresponding to at least one memory access operation.
If the to-be-processed memory access instruction corresponds to a single memory access operation, the data corresponding to that operation is written back to the queue entry of the to-be-processed memory access instruction. If the to-be-processed memory access instruction corresponds to more than one memory access operation, the data corresponding to each of the memory access operations must be written into the queue entry of the to-be-processed memory access instruction.
On the basis of any one of the above embodiments, the following describes the scheme of the embodiment of the application in detail with reference to the accompanying drawings.
Fig. 2 is a schematic diagram of a hardware implementation of a non-aligned address access instruction according to an embodiment of the present application, where, as shown in fig. 2, the hardware implementation includes an issue queue, an address calculation module, a non-aligned address instruction processing module, a data cache and translation look-aside buffer access module, an access instruction address correlation determination and data transfer module, and an access queue.
Transmit queue: transmits memory access instructions while allocating a queue entry for each memory access instruction in the memory access queue.
Address calculation module: calculates the memory access address of a memory access instruction from the instruction information of that memory access instruction.
Non-aligned address instruction processing module: determines, from the instruction type and the memory access address of a memory access instruction, whether the instruction is a non-aligned address memory access instruction. For a non-aligned address memory access instruction, it determines whether the instruction crosses a cache line, and if so, splits it into two internal operations at the cache line boundary. The second internal operation is the access to the low-address cache line of the two and continues down the memory access pipeline; the first internal operation is the access to the high-address cache line of the two and rolls back to the transmit queue for re-execution.
Data cache and translation look-aside buffer access module: reads the data cache tag bank and the data cache data bank while accessing the translation look-aside buffer. If the translation look-aside buffer misses, translation look-aside buffer miss handling is performed and the operation rolls back to the transmit queue for re-execution. If the translation look-aside buffer hits, tag comparison is performed to determine whether the data cache hits. If the data cache hits, the data of the hit way needed by the memory access instruction is placed in the low-order bytes of the access result register; memory access instructions that do not cross a cache line are written back directly. The operation then enters the memory access queue and its data is written into the data item of the queue; for an instruction that crosses a cache line, the two operations write their data, in address order, into the queue entry allocated when the original memory access instruction was dispatched.
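The tag comparison this module performs can be sketched as follows; this is purely illustrative, and the bank layout, index width, and way count are assumptions rather than the patented structure.

```python
# Illustrative hit check: select a set by the index field, then compare each
# way's stored tag with the tag of the physical address. Structure sizes
# (6-bit index, 2 ways) are assumed for this example only.
def cache_lookup(tag_bank, data_bank, paddr, offset_bits=6, index_bits=6):
    index = (paddr >> offset_bits) & ((1 << index_bits) - 1)
    tag = paddr >> (offset_bits + index_bits)
    for way, way_tag in enumerate(tag_bank[index]):
        if way_tag == tag:
            return data_bank[index][way]   # hit: data of the hit way
    return None                            # miss
```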
Memory access instruction address-dependence determination and data forwarding module: maintains the execution order among out-of-order transmitted memory access instructions. Memory dependences are handled with a data forwarding mechanism: a load instruction must obtain the corresponding data from all store instructions that precede it in execution order and on which it depends, and a store instruction determines whether any dependent load instruction that follows it in execution order is already in the memory access queue, invalidating that load if so. Address-dependence determination is exact, at byte granularity. A load instruction cannot write back while the address of a preceding store instruction has not yet been computed; in that case the load rolls back to the transmit queue for re-execution. Since the data of load and store instructions is always placed in the low-order bytes of the register, during data forwarding the position of the forwarded data in the register is derived from the addresses of the instructions, and the corresponding bytes are selected for forwarding.
Memory access queue: responsible for ordering memory access instructions, for the data access and write-back of memory access instructions that miss in the cache, and, for a split non-aligned address memory access instruction, for splicing the data of the two internal operations. The memory access queue receives memory access instructions from the transmit queue and allocates queue entries; the two internal operations of a split non-aligned address memory access instruction occupy the same queue entry.
The memory access queue receives the results of memory access instructions from the memory access pipeline and updates its internal state. A memory access operation that hits enters the queue and its data is written into the data item; for an instruction that crosses a cache line, the two operations write their data into the original instruction's entry in address order. Accesses that miss in the cache enter the memory access queue and wait to be sent to the memory access miss queue. For a memory access instruction that crosses a cache line, depending on whether each of the two split internal operations hits in the cache, a miss request is sent to the memory access miss queue for each internal operation that misses, using that operation's cache line address. The memory access queue receives the result of the data cache miss fill and updates the state of all entries waiting for that fill.
A data cache fill passes data to all memory access queue entries waiting for that cache line; every instruction in the memory access queue waiting for the data of that line obtains it through address matching, and the data status of those queue entries is marked valid. If a store instruction has previously forwarded data to the load, the memory access queue merges the forwarded data with the backfilled data. The memory access queue writes back each memory access instruction for which all the required data has been obtained.
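The fill-time address matching can be sketched as follows; this is an illustrative model only, and the entry fields and dictionary representation are assumptions.

```python
# Illustrative model of a data cache fill: every waiting queue entry whose
# address falls inside the filled line captures its bytes by address matching
# and its data state becomes valid. Entry fields are assumed for this sketch.
def fill_broadcast(queue_entries, fill_addr, line_data, line=64):
    for entry in queue_entries:
        same_line = entry["addr"] // line == fill_addr // line
        if same_line and not entry["valid"]:
            off = entry["addr"] % line
            entry["data"] = line_data[off:off + entry["n"]]
            entry["valid"] = True  # data state of this entry is now valid
```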
Each queue entry in the memory access queue contains the following information: operation information of the memory access instruction (the memory access operation and access width corresponding to the instruction, whether the instruction is a non-aligned address memory access instruction split into two internal operations, and so on), a physical address item (the physical address of the memory access), a data item (the data accessed by the memory access instruction), and status bits (the status of the queue entry).
Table 1 illustrates the possible values of the status bits of a queue entry.
TABLE 1
Memory access miss queue: handles cache-missed access requests from the memory access queue and accesses the lower-level storage system. Once a missed memory access instruction is successfully allocated a miss queue entry, the result of the data cache fill is subsequently observed through the memory access queue.
Fig. 3 is a schematic diagram of comparison before and after data cache filling, and as shown in fig. 3, illustrates changes of each item in the memory queue before and after data cache filling.
Referring to the memory access queue entry before the data cache fill in fig. 3, the data item of the entry is empty and the corresponding status bit is data-invalid. Referring to the entry after the fill, the entry holds the backfilled data and the corresponding status bit is data-valid.
In the foregoing embodiments, the hardware implementation of the unaligned address access instruction provided in the embodiments of the present application is described, and based on the foregoing hardware implementation, the scheme of the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Fig. 4 is a flowchart of determining at least one memory access operation corresponding to a pending memory access instruction according to an embodiment of the present application, where, as shown in fig. 4, the method includes:
S41, based on the memory access address of the to-be-processed memory access instruction, determining whether the to-be-processed memory access instruction is a memory access instruction that crosses a cache line.
A memory access instruction that crosses a cache line is one whose data to be accessed lies in two cache lines; correspondingly, a memory access instruction that does not cross a cache line is one whose data to be accessed lies in a single cache line. After the memory access address of the to-be-processed memory access instruction is determined, whether the instruction crosses a cache line can be determined from that address and the access width.
S42, determining at least one access operation corresponding to the to-be-processed access instruction based on whether the to-be-processed access instruction is an access instruction crossing a cache line.
And under the condition that the to-be-processed access instruction is an access instruction which does not cross the cache line, determining a target access operation corresponding to the to-be-processed access instruction based on the access address of the to-be-processed access instruction. That is, under the condition that the to-be-processed access instruction does not cross the cache line, the to-be-processed access instruction does not need to be split, only one access operation corresponding to the to-be-processed access instruction, namely the target access operation, is needed, and the corresponding access address is the access address of the to-be-processed access instruction.
In the case that the to-be-processed memory access instruction is a memory access instruction that crosses a cache line, the address range it accesses is split at the cache line boundary, yielding a second memory access address located before the boundary and a first memory access address located after the boundary; a first memory access operation corresponding to the first memory access address and a second memory access operation corresponding to the second memory access address are then determined. That is, when the to-be-processed memory access instruction crosses a cache line, it must be split into two internal operations: the first memory access operation on the high address and the second memory access operation on the low address. Neither the split first nor the split second memory access operation crosses a cache line.
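The split at the cache line boundary can be sketched as follows (illustrative Python, not the patented hardware; operations are represented here as simple (address, byte-count) pairs):

```python
# Illustrative split: the second internal operation keeps the original
# address (low-address line, continues down the pipeline); the first starts
# at the next line boundary (high-address line, rolls back for re-execution).
def split_at_line_boundary(addr, n_bytes, line=64):
    boundary = (addr // line + 1) * line
    m = boundary - addr                     # bytes left in the first line
    second_op = (addr, m)                   # low-address internal operation
    first_op = (boundary, n_bytes - m)      # high-address internal operation
    return first_op, second_op
```

Neither returned operation crosses a line: the second ends exactly at the boundary and the first begins there.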
In the above embodiment, it is described how to determine at least one memory access operation corresponding to a pending memory access instruction, and after determining at least one memory access operation, the at least one memory access operation needs to be executed to obtain data corresponding to the at least one memory access operation.
Specifically, under the condition that at least one access operation comprises a target access operation, determining the target access operation as a to-be-processed access operation, and executing a first process;
Under the condition that the at least one access operation comprises a first access operation and a second access operation, determining the second access operation as a to-be-processed access operation, and executing a first process; after the first memory access operation is rolled back to the transmission queue, determining the first memory access operation as a new memory access operation to be processed, and executing a first process;
the first process comprises the following steps 1 to 3:
step 1, determining a physical address corresponding to the memory access operation to be processed based on the memory access address corresponding to the memory access operation to be processed.
Specifically, it is first determined whether the memory access address corresponding to the to-be-processed memory access operation hits in the translation look-aside buffer, and the physical address corresponding to the operation is then determined based on whether it hits. Optionally, the translation look-aside buffer in embodiments of the present application may be a multi-level translation look-aside buffer, such as a two-level or three-level translation look-aside buffer, which is not limited by the embodiments of the present application.
In one possible implementation, taking a two-level translation look-aside buffer as an example, it is first determined whether the memory access address corresponding to the to-be-processed memory access operation hits in the first-level translation look-aside buffer. If it hits there, the translation look-aside buffer as a whole hits, and the physical address corresponding to the operation can be determined from the first-level translation look-aside buffer. If it misses in the first-level translation look-aside buffer, the translation look-aside buffer misses, and the physical address corresponding to the operation must be determined from the second-level translation look-aside buffer.
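The two-level lookup order can be sketched as follows; this is a hedged model in which plain dictionaries stand in for the translation look-aside buffers, and the 4 KiB page size is an assumption not fixed by the text.

```python
# Illustrative two-level translation look-aside buffer lookup.
# Dicts map virtual page number -> physical page number; 4 KiB pages assumed.
def translate(vaddr, l1_tlb, l2_tlb):
    page, off = vaddr >> 12, vaddr & 0xFFF
    if page in l1_tlb:                 # first-level hit: translate directly
        return (l1_tlb[page] << 12) | off
    if page in l2_tlb:                 # first-level miss, second-level hit
        l1_tlb[page] = l2_tlb[page]    # refill the first level
        return (l2_tlb[page] << 12) | off
    return None                        # overall miss: handled elsewhere
```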
And step 2, determining whether the pending access operation hits in the data cache or not under the condition that the physical address corresponding to the pending access operation is determined.
And step 3, determining data corresponding to the to-be-processed access operation based on whether the to-be-processed access operation hits in the data cache.
Specifically, in the case that the pending access operation hits the data cache, data corresponding to the pending access operation is determined from the data cache. And under the condition that the data cache is not hit by the to-be-processed access operation, determining data corresponding to the to-be-processed access operation from the lower storage system based on the access failure queue.
And then, writing back the memory access instruction to be processed based on the data corresponding to the at least one memory access operation. Writing data corresponding to the target memory access operation into a queue entry corresponding to the memory access instruction to be processed under the condition that at least one memory access operation comprises the target memory access operation; under the condition that at least one access operation comprises a first access operation and a second access operation, merging data corresponding to the first access operation and data corresponding to the second access operation to obtain merged data; and writing the combined data into a queue entry corresponding to the to-be-processed access instruction.
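The write-back step above can be sketched as follows (illustrative only; the byte-level splicing order follows the register layout described in the figures, with the second, low-address operation's bytes placed first, and is an assumption of this sketch):

```python
# Illustrative write-back: a single target operation writes its data
# directly; two split operations are merged, low-address bytes first.
def write_back(second_data, first_data=None):
    if first_data is None:             # single target memory access operation
        return second_data
    return second_data + first_data    # merged n-byte value for the entry
```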
Fig. 5 is a process flow diagram of a pending access instruction provided in an embodiment of the present application, as shown in fig. 5, including:
s501, a transmission queue transmits a memory access instruction to be processed.
The transmission queue transmits the to-be-processed access instruction, and simultaneously, a queue entry is allocated for the to-be-processed access instruction in the access queue.
S502, calculating the memory access address of the memory access instruction to be processed.
The memory access address of the to-be-processed memory access instruction is calculated from its instruction information, which includes the base address and the offset; the memory access address is obtained by adding the base address and the offset.
S503, judging whether the to-be-processed access instruction is a non-aligned address access instruction according to the instruction type and the access address of the to-be-processed access instruction, if so, executing S504, and if not, executing S506.
S504, judging whether the to-be-processed access instruction spans the cache line, if so, executing S505, and if not, executing S506.
S505, splitting into a first access operation and a second access operation according to the boundary of the cache line, executing S501 for the first access operation, and executing S506 for the second access operation.
And judging whether the to-be-processed memory access instruction is a non-aligned address memory access instruction or not according to the instruction type and the memory access address of the to-be-processed memory access instruction. For a non-aligned address access instruction, it is determined whether it crosses a cache line. The specific implementation manner of determining whether the pending access instruction is a non-aligned address access instruction and whether to cross a cache line may be referred to the related description in the above embodiment, which is not repeated herein.
If the to-be-processed memory access instruction crosses a cache line, it is split into two internal operations at the cache line boundary: a first memory access operation and a second memory access operation. The second memory access operation is the access to the low-address cache line of the two and continues down the memory access pipeline; the first memory access operation is the access to the high-address cache line of the two and rolls back to the transmit queue for re-execution.
S506, reading the data cache tag memory bank, reading the data cache data memory bank, and accessing the translation look-aside buffer. Whether the translation look-aside buffer is hit or not is determined, if yes, S507 is executed, and if no, S501 is executed.
S506 to S511 are the specific implementation flow of the first process: for the to-be-processed memory access operation, it is first determined whether the translation look-aside buffer hits. If it hits, the subsequent flow is executed; if it misses, the operation rolls back to the transmit queue for re-execution.
Because the physical address corresponding to the to-be-processed memory access operation is looked up in the second-level translation look-aside buffer after a miss, the translation look-aside buffer can hit when the operation is re-executed after rolling back to the transmit queue, and the subsequent flow then continues.
S507, comparing the data cache label memory bank, judging whether the data cache is hit, if yes, executing S508, otherwise, executing S510.
By tag comparison, it can be determined whether the data cache hits.
S508, the data of the hit way needed by the to-be-processed memory access instruction is placed in the low-order bytes of the access result register, memory access instruction address-dependence determination and data forwarding are performed, and a to-be-processed memory access instruction that does not cross a cache line is written back directly.
If the data cache hits, the data of the hit way needed by the to-be-processed memory access instruction is placed in the low-order bytes of the access result register, and address-dependence determination and data forwarding of load and store instructions are carried out. Instructions that do not cross a cache line are written back directly.
S509, the operation enters the memory access queue and data is written into the data item of the queue; for a to-be-processed memory access instruction that crosses a cache line, the two operations write their data into the original instruction's entry in address order.
S510, the operation enters the memory access queue and waits to access the memory access miss queue.
S511, the memory access miss queue accesses the lower-level storage system, returns fill data to the memory access queue and the data cache, and writes the data required by the to-be-processed memory access instruction into its memory access queue entry.
S512, writing back the pending access instruction, and exiting after submitting.
The following illustrates aspects of embodiments of the present application in a few specific examples.
Fig. 6 is a schematic diagram of the division of the memory access address of a to-be-processed memory access instruction according to an embodiment of the present application. As shown in fig. 6, the memory access address consists of a tag, an index, and an offset within the cache line, and whether the data cache hits can be determined from the tag and the index. The processor cache line size is 512 bits (64 bytes), so the offset within a 512-bit cache line corresponds to the lowest 6 bits of the address.
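The address division of fig. 6 can be sketched as follows; the 6-bit in-line offset follows the 64-byte line size in the text, while the 6-bit index width is an assumed example, not fixed by the text.

```python
# Illustrative decomposition of a memory access address into tag / index /
# in-line offset for 64-byte (512-bit) cache lines. Index width is assumed.
def split_address(addr, offset_bits=6, index_bits=6):
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```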
Fig. 7 is a schematic diagram illustrating a processing of a pending access instruction according to an embodiment of the present application, as shown in fig. 7, where the pending access instruction is a non-aligned address access instruction and does not cross a cache line.
The non-aligned address memory access instruction does not cross a cache line: the n bytes it accesses all lie in the same cache line. It therefore does not need to be split into two internal operations; the n bytes are simply accessed starting from the memory access address of the original to-be-processed memory access instruction and placed in byte positions 0 to n-1 of the register. In fig. 7, n is 8.
Fig. 8 is a second processing schematic diagram of a to-be-processed memory access instruction according to an embodiment of the present application. As shown in fig. 8, the to-be-processed memory access instruction is a double-word memory access instruction, for example a double-word load from address 0x407_1002 that fetches 64-bit (8-byte) long integer data; the lowest 6 bits of the address are 0x02 in hexadecimal (binary: 000010, decimal: 2), and the access width is a double word. Since the lowest 3 bits of the double-word access address are not all 0, the to-be-processed memory access instruction is determined to be a non-aligned address memory access instruction. It does not cross a cache line: the 8 bytes accessed starting from 0x407_1002 all lie within a single cache line, as shown in fig. 8.
Fig. 9 is a third processing schematic diagram of a to-be-processed memory access instruction according to an embodiment of the present application. As shown in fig. 9, the to-be-processed memory access instruction is a non-aligned address memory access instruction and crosses a cache line. The non-aligned address access spans n bytes: m bytes at the end of the first cache line and n-m bytes at the beginning of the second. The second memory access operation accesses m bytes starting from the memory access address of the original to-be-processed memory access instruction and places them in byte positions 0 to m-1 of the register. The first memory access operation accesses n-m bytes starting from the lowest address of the next cache line and places them in byte positions m to n-1. The results of the two internal operations are spliced into one n-byte datum in the register; in fig. 9, m is 2 and n is 8.
Fig. 10 is a fourth processing schematic diagram of a to-be-processed memory access instruction according to an embodiment of the present application. As shown in fig. 10, the to-be-processed memory access instruction is a double-word access, for example a double-word load from address 0x407_103e that fetches 64 bits (8 bytes) of long integer data; the lowest 6 bits of the address are 0x3e in hexadecimal (binary: 111110, decimal: 62), and the access width is a double word. Because the lowest 3 bits of the double-word access address are not all 0, the to-be-processed memory access instruction is determined to be a non-aligned address access: 2 bytes of its content occupy the last 2 bytes of the cache line starting at 0x407_1000, and the remaining 6 bytes occupy the first 6 bytes of the cache line starting at address 0x407_1040, as shown in fig. 10.
Since the content requested by the non-aligned address memory access instruction crosses the boundary between the 0x407_1000 and 0x407_1040 cache lines, it must be split into two internal operations that access data from the two cache lines, at 0x407_1000 and 0x407_1040 respectively: the second memory access operation fetches 2 bytes of data starting at address 0x407_103e, and the first memory access operation fetches 6 bytes of data starting at address 0x407_1040; the data fetched by the two operations is then spliced into the same register. The two internal operations of the memory access instruction occupy the same queue entry in the memory access queue, and the spliced data is written back to the register in a single write-back.
Fig. 11 is a fifth processing schematic diagram of a pending access instruction provided in the embodiment of the present application, where, as shown in fig. 11, the pending access instruction is an aligned address access instruction, and there is no problem of crossing cache lines. The pending access instruction accesses n bytes, all in the same cache line, aligned to the address. The memory access instruction to be processed is not required to be split into two internal operations, n bytes are accessed from the memory access address of the original memory access instruction to be processed, n byte positions from 0 to n-1 are put into a register, and n is 8 as shown in fig. 11.
Fig. 12 is a sixth processing schematic diagram of a to-be-processed memory access instruction according to an embodiment of the present application. As shown in fig. 12, the to-be-processed memory access instruction is a double-word access, for example a double-word load from address 0x407_1008 that fetches 64 bits (8 bytes) of long integer data; the lowest 6 bits of the address are 0x08 in hexadecimal (binary: 001000, decimal: 8), and the access width is a double word. Because the lowest 3 bits of the double-word access address are all 0, the to-be-processed memory access instruction is determined to be an aligned address access, as shown in fig. 12.
When a non-aligned address memory access instruction crosses a page boundary, it necessarily also crosses a cache line; in that case the two split internal operations each perform their own translation look-aside buffer access and translation look-aside buffer miss handling.
According to the processing method of the non-aligned address memory access instruction provided in this embodiment, the data of a memory access instruction is uniformly placed in the low-order bytes of the register, and a non-aligned address memory access instruction that crosses a cache line is split into two internal operations to access the data cache and the translation look-aside buffer. Memory access instructions that do not cross a cache line are not split; an instruction that crosses a cache line is split into at most two internal operations, and only at the cache line boundary, rather than into multiple aligned-address memory access instructions. The internal operation that is rolled back is executed through a rollback-and-re-execute mechanism. This speeds up the processor's handling of non-aligned address memory access instructions and solves the problem of their low processing efficiency, thereby improving processor performance; moreover, the rolled-back operation reuses the existing memory access instruction pipeline, so no additional hardware overhead is required. The scheme avoids trapping software into an exception handler, and likewise avoids executing a non-aligned address memory access instruction by splitting it into multiple aligned-address memory access instructions.
The following describes a processing device of a non-aligned address access instruction provided by the present application, and the processing device of the non-aligned address access instruction described below and the processing method of the non-aligned address access instruction described above may be referred to correspondingly.
Fig. 13 is a schematic structural diagram of a processing device for a non-aligned address access instruction according to an embodiment of the present application, where, as shown in fig. 13, the device includes:
a determining module 131, configured to determine whether the pending access instruction is a non-aligned address access instruction;
the first processing module 132 is configured to determine, based on the memory address of the to-be-processed memory instruction, at least one memory operation corresponding to the to-be-processed memory instruction, where each memory operation is a memory operation that does not span a cache line, if the to-be-processed memory instruction is a non-aligned address memory instruction;
an execution module 133, configured to execute the at least one access operation to obtain data corresponding to the at least one access operation;
the second processing module 134 is configured to write back the pending memory access instruction based on data corresponding to the at least one memory access operation.
In one possible implementation, the first processing module 132 is specifically configured to:
determining whether the to-be-processed access instruction is an access instruction crossing a cache line based on the access address of the to-be-processed access instruction;
and determining at least one access operation corresponding to the to-be-processed access instruction based on whether the to-be-processed access instruction is an access instruction crossing a cache line.
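Determining whether an access crosses a cache line amounts to checking whether the first and last byte of the access fall in the same cache line. The following C sketch illustrates this check; the 64-byte line size and the helper name are illustrative assumptions, not details fixed by the patent:

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_LINE_BYTES 64  /* assumed line size for illustration */

/* True when the access [addr, addr + size) touches two cache lines. */
static bool crosses_cache_line(uint64_t addr, unsigned size)
{
    uint64_t first_line = addr / CACHE_LINE_BYTES;
    uint64_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

Note that an unaligned access that stays inside one line (for example a 4-byte load at address 60 with 64-byte lines) does not cross, which is exactly why such instructions need no split.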
In one possible implementation, the first processing module 132 is specifically configured to:
determining a target memory access operation corresponding to the to-be-processed memory access instruction based on a memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a memory access instruction which does not cross a cache line;
splitting the memory access address of the memory access instruction to be processed based on the boundary of the cache line to obtain a first memory access address positioned before the boundary and a second memory access address positioned after the boundary under the condition that the memory access instruction to be processed is a memory access instruction crossing the cache line; and determining a first memory access operation corresponding to the first memory access address and a second memory access operation corresponding to the second memory access address.
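The boundary split can be sketched as follows. The `mem_op` structure, the function name, and the 64-byte line size are assumptions made for illustration; the first operation covers the bytes up to the cache-line boundary and the second operation the bytes after it:

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64  /* assumed line size */

typedef struct {
    uint64_t addr;   /* start address of the internal operation */
    unsigned size;   /* number of bytes accessed by this operation */
} mem_op;

/* Split a line-crossing access at the cache-line boundary into a
 * first operation (before the boundary) and a second (after it). */
static void split_at_line_boundary(uint64_t addr, unsigned size,
                                   mem_op *first, mem_op *second)
{
    uint64_t boundary = (addr / CACHE_LINE_BYTES + 1) * CACHE_LINE_BYTES;
    first->addr  = addr;
    first->size  = (unsigned)(boundary - addr);
    second->addr = boundary;
    second->size = size - first->size;
}
```

For a 4-byte access at address 62 this yields a 2-byte operation at 62 and a 2-byte operation at 64 — exactly two internal operations, split only at the line boundary.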
In one possible implementation, the execution module 133 is specifically configured to:
determining the target memory access operation as a memory access operation to be processed and executing a first process under the condition that the at least one memory access operation comprises the target memory access operation;
determining the second memory access operation as a memory access operation to be processed and executing the first process if the at least one memory access operation includes the first memory access operation and the second memory access operation; after the first memory access operation is rolled back to an issue queue, determining the first memory access operation as a new memory access operation to be processed, and executing the first process;
wherein the first process comprises:
determining a physical address corresponding to a memory operation to be processed based on the memory address corresponding to the memory operation to be processed;
under the condition that the physical address corresponding to the to-be-processed access operation is determined, determining whether the to-be-processed access operation hits in a data cache;
and determining data corresponding to the pending access operation based on whether the pending access operation hits in a data cache.
In one possible implementation, the execution module 133 is specifically configured to:
determining whether the memory access address corresponding to the pending memory access operation hits in a translation look-aside buffer;
and determining a physical address corresponding to the pending access operation based on whether the access address corresponding to the pending access operation hits in the translation look-aside buffer.
In one possible implementation, the execution module 133 is specifically configured to:
determining whether the access address corresponding to the access operation to be processed hits in a primary translation look-aside buffer;
determining a physical address corresponding to the to-be-processed memory operation from the primary translation look-aside buffer under the condition that the memory address corresponding to the to-be-processed memory operation hits the primary translation look-aside buffer;
and under the condition that the memory access address corresponding to the memory access operation to be processed does not hit in the primary translation look-aside buffer, determining the physical address corresponding to the memory access operation to be processed from a secondary translation look-aside buffer.
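The two-level lookup can be illustrated with a toy direct-mapped TLB pair. The table sizes, refill policy, and function names are assumptions made for this sketch, and a real design would also start a hardware page-table walk when both levels miss:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 4  /* tiny tables, for illustration only */

typedef struct { bool valid; uint64_t vpn, ppn; } tlb_entry;

static tlb_entry l1_tlb[TLB_ENTRIES];  /* primary TLB */
static tlb_entry l2_tlb[TLB_ENTRIES];  /* secondary TLB */

/* Direct-mapped lookup in one TLB level. */
static bool tlb_lookup(tlb_entry *tlb, uint64_t vpn, uint64_t *ppn)
{
    tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) { *ppn = e->ppn; return true; }
    return false;
}

/* Try the primary TLB first; on a miss, consult the secondary TLB and
 * refill the primary entry.  A miss in both levels would trigger a
 * page-table walk in real hardware (omitted here). */
static bool translate(uint64_t vpn, uint64_t *ppn)
{
    if (tlb_lookup(l1_tlb, vpn, ppn))
        return true;
    if (tlb_lookup(l2_tlb, vpn, ppn)) {
        l1_tlb[vpn % TLB_ENTRIES] = (tlb_entry){ true, vpn, *ppn };
        return true;
    }
    return false;
}
```

Note that the two internal operations of a line-crossing access each perform this translation independently, since the two halves may fall on different pages.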
In one possible implementation, the execution module 133 is specifically configured to:
determining data corresponding to the to-be-processed access operation from the data cache under the condition that the to-be-processed access operation hits the data cache;
and under the condition that the pending access operation misses the data cache, determining data corresponding to the pending access operation from a lower storage system based on an access failure queue.
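The hit and miss paths can be sketched with a toy direct-mapped data cache; the cache geometry, the fill counter, and the `lower_memory_read` stub are assumptions made for illustration. On a miss, a real design allocates an entry in the access-failure (miss) queue, which fetches the line from the lower memory hierarchy:

```c
#include <stdint.h>
#include <stdbool.h>

#define DCACHE_LINES     4
#define LINE_BYTES       64

typedef struct { bool valid; uint64_t tag; uint64_t data; } cache_line;

static cache_line dcache[DCACHE_LINES];
static int miss_queue_fills;  /* counts fills performed via the miss queue */

/* Stub for the lower memory hierarchy (illustrative only). */
static uint64_t lower_memory_read(uint64_t addr) { return addr * 3; }

/* Read through the data cache: serve hits directly, and on a miss
 * fill the line from lower memory, modeling the miss-queue path. */
static uint64_t dcache_read(uint64_t addr)
{
    uint64_t tag = addr / LINE_BYTES;
    cache_line *l = &dcache[tag % DCACHE_LINES];
    if (!(l->valid && l->tag == tag)) {     /* miss: go via the miss queue */
        miss_queue_fills++;
        l->valid = true;
        l->tag   = tag;
        l->data  = lower_memory_read(addr);
    }
    return l->data;                          /* hit path */
}
```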
In one possible implementation, the second processing module 134 is specifically configured to:
writing data corresponding to the target memory access operation into a queue entry corresponding to the memory access instruction to be processed under the condition that the at least one memory access operation comprises the target memory access operation;
when the at least one access operation comprises the first access operation and the second access operation, combining data corresponding to the first access operation and data corresponding to the second access operation to obtain combined data; and writing the merged data into a queue entry corresponding to the to-be-processed access instruction.
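Because the data of an access instruction is uniformly placed in the low-order bits of the register, merging the two partial results is a shift-and-mask combination. A minimal sketch, assuming a little-endian machine (so the first operation supplies the low-order bytes) and a hypothetical function name:

```c
#include <stdint.h>

/* Merge the bytes returned by the two internal operations of a
 * line-crossing load: first_data holds the first_size low-order
 * bytes, second_data the remaining high-order bytes. */
static uint64_t merge_split_load(uint64_t first_data, uint64_t second_data,
                                 unsigned first_size)
{
    if (first_size >= 8)            /* first part already fills the register */
        return first_data;
    uint64_t mask = (1ULL << (8 * first_size)) - 1;
    return (first_data & mask) | (second_data << (8 * first_size));
}
```

For example, a 4-byte load split 2 + 2 at the line boundary merges bytes `0x2211` (first operation) and `0x4433` (second operation) into `0x44332211`, which is then written to the queue entry of the original instruction.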
In one possible implementation, the determining module 131 is specifically configured to:
determining the instruction type of the to-be-processed memory access instruction, wherein the instruction type comprises a double-word memory access instruction, a half-word memory access instruction or a byte memory access instruction;
and determining whether the to-be-processed access instruction is a non-aligned address access instruction or not based on the instruction type of the to-be-processed access instruction and the access address of the to-be-processed access instruction.
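An access is address-aligned when its address is a multiple of its natural size, so the check reduces to a mask test on the low address bits. A sketch under that assumption; the enum and function name are illustrative, and a word (4-byte) type is included for completeness even though the text above lists only double-word, half-word, and byte types:

```c
#include <stdint.h>
#include <stdbool.h>

/* Natural access size in bytes for each instruction type. */
typedef enum {
    OP_BYTE   = 1,
    OP_HALF   = 2,
    OP_WORD   = 4,   /* assumed for completeness */
    OP_DOUBLE = 8
} access_size;

/* Non-aligned when the address is not a multiple of the access size. */
static bool is_unaligned(uint64_t addr, access_size size)
{
    return (addr & (uint64_t)(size - 1)) != 0;
}
```

A byte access is therefore never unaligned, which matches the fact that only larger access types can ever require the split described above.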
Fig. 14 illustrates a physical structure diagram of an electronic device, as shown in fig. 14, which may include: processor 1410, communication interface (Communications Interface) 1420, memory 1430 and communication bus 1440, wherein processor 1410, communication interface 1420 and memory 1430 communicate with each other via communication bus 1440. Processor 1410 may invoke logic instructions in memory 1430 to perform a method of processing non-aligned address memory access instructions, the method comprising: determining whether the to-be-processed access instruction is a non-aligned address access instruction; determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line; executing the at least one access operation to obtain data corresponding to the at least one access operation; and writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
In addition, the logic instructions in the memory 1430 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present application further provides a computer program product, where the computer program product includes a computer program, where the computer program may be stored on a non-transitory computer readable storage medium, where the computer program when executed by a processor is capable of executing a method for processing a non-aligned address memory instruction provided by the above methods, where the method includes: determining whether the to-be-processed access instruction is a non-aligned address access instruction; determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line; executing the at least one access operation to obtain data corresponding to the at least one access operation; and writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
In yet another aspect, the present application further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, is implemented to perform a method for processing a non-aligned address access instruction provided by the above methods, the method comprising: determining whether the to-be-processed access instruction is a non-aligned address access instruction; determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line; executing the at least one access operation to obtain data corresponding to the at least one access operation; and writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical schemes described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (12)

1. A method for processing a non-aligned address access instruction, comprising:
determining whether the to-be-processed access instruction is a non-aligned address access instruction;
determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line;
executing the at least one access operation to obtain data corresponding to the at least one access operation;
and writing back the to-be-processed memory access instruction based on the data corresponding to the at least one memory access operation.
2. The method of claim 1, wherein the determining at least one memory operation corresponding to the pending memory instruction based on the memory address of the pending memory instruction comprises:
determining whether the to-be-processed access instruction is an access instruction crossing a cache line based on the access address of the to-be-processed access instruction;
and determining at least one access operation corresponding to the to-be-processed access instruction based on whether the to-be-processed access instruction is an access instruction crossing a cache line.
3. The method of claim 2, wherein the determining at least one memory operation corresponding to the pending memory instruction based on whether the pending memory instruction is a memory instruction that spans a cache line comprises:
determining a target memory access operation corresponding to the to-be-processed memory access instruction based on a memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a memory access instruction which does not cross a cache line;
splitting the memory access address of the memory access instruction to be processed based on the boundary of the cache line to obtain a first memory access address positioned before the boundary and a second memory access address positioned after the boundary under the condition that the memory access instruction to be processed is a memory access instruction crossing the cache line; and determining a first memory access operation corresponding to the first memory access address and a second memory access operation corresponding to the second memory access address.
4. The method of claim 3, wherein the performing the at least one memory access operation to obtain data corresponding to the at least one memory access operation comprises:
determining the target memory access operation as a memory access operation to be processed and executing a first process under the condition that the at least one memory access operation comprises the target memory access operation;
determining the second memory access operation as a memory access operation to be processed and executing the first process if the at least one memory access operation includes the first memory access operation and the second memory access operation; after the first memory access operation is rolled back to an issue queue, determining the first memory access operation as a new memory access operation to be processed, and executing the first process;
wherein the first process comprises:
determining a physical address corresponding to a memory operation to be processed based on the memory address corresponding to the memory operation to be processed;
under the condition that the physical address corresponding to the to-be-processed access operation is determined, determining whether the to-be-processed access operation hits in a data cache;
and determining data corresponding to the pending access operation based on whether the pending access operation hits in a data cache.
5. The method of claim 4, wherein the determining the physical address corresponding to the pending memory operation based on the memory address corresponding to the pending memory operation comprises:
determining whether the memory access address corresponding to the pending memory access operation hits in a translation look-aside buffer;
and determining a physical address corresponding to the pending access operation based on whether the access address corresponding to the pending access operation hits in the translation look-aside buffer.
6. The method of claim 5, wherein the determining the physical address corresponding to the pending memory operation based on whether the memory address corresponding to the pending memory operation hits in the translation look-aside buffer comprises:
determining whether the access address corresponding to the access operation to be processed hits in a primary translation look-aside buffer;
determining a physical address corresponding to the to-be-processed memory operation from the primary translation look-aside buffer under the condition that the memory address corresponding to the to-be-processed memory operation hits the primary translation look-aside buffer;
and under the condition that the memory access address corresponding to the memory access operation to be processed does not hit in the primary translation look-aside buffer, determining the physical address corresponding to the memory access operation to be processed from a secondary translation look-aside buffer.
7. The method of claim 4, wherein the determining data corresponding to the pending memory access operation based on whether the pending memory access operation hits in a data cache comprises:
determining data corresponding to the to-be-processed access operation from the data cache under the condition that the to-be-processed access operation hits the data cache;
and under the condition that the pending memory access operation misses the data cache, determining data corresponding to the pending memory access operation from a lower storage system based on an access failure queue.
8. The method of claim 3, wherein writing back the pending memory instruction based on the data corresponding to the at least one memory access operation comprises:
writing data corresponding to the target memory access operation into a queue entry corresponding to the memory access instruction to be processed under the condition that the at least one memory access operation comprises the target memory access operation;
when the at least one access operation comprises the first access operation and the second access operation, combining data corresponding to the first access operation and data corresponding to the second access operation to obtain combined data; and writing the merged data into a queue entry corresponding to the to-be-processed access instruction.
9. The method according to any one of claims 1 to 3, wherein said determining whether the pending memory access instruction is a non-aligned address memory access instruction comprises:
determining the instruction type of the to-be-processed memory access instruction, wherein the instruction type comprises a double-word memory access instruction, a half-word memory access instruction or a byte memory access instruction;
and determining whether the to-be-processed access instruction is a non-aligned address access instruction based on the instruction type of the to-be-processed access instruction and the access address of the to-be-processed access instruction.
10. A processing apparatus for a non-aligned address access instruction, comprising:
the determining module is used for determining whether the to-be-processed access instruction is a non-aligned address access instruction;
the first processing module is used for determining at least one memory access operation corresponding to the to-be-processed memory access instruction based on the memory access address of the to-be-processed memory access instruction under the condition that the to-be-processed memory access instruction is a non-aligned address memory access instruction, wherein each memory access operation is a memory access operation which does not cross a cache line;
the execution module is used for executing the at least one access operation to obtain data corresponding to the at least one access operation;
and the second processing module is used for writing back the memory access instruction to be processed based on the data corresponding to the at least one memory access operation.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of processing a non-aligned address access instruction as claimed in any one of claims 1 to 9.
12. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a method of processing a non-aligned address access instruction as claimed in any of claims 1 to 9.
CN202311828923.1A 2023-12-28 2023-12-28 Processing method and device of unaligned address access instruction and electronic equipment Pending CN117472797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311828923.1A CN117472797A (en) 2023-12-28 2023-12-28 Processing method and device of unaligned address access instruction and electronic equipment

Publications (1)

Publication Number Publication Date
CN117472797A true CN117472797A (en) 2024-01-30

Family

ID=89640125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311828923.1A Pending CN117472797A (en) 2023-12-28 2023-12-28 Processing method and device of unaligned address access instruction and electronic equipment

Country Status (1)

Country Link
CN (1) CN117472797A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1955947A (en) * 2005-10-28 2007-05-02 中国科学院计算技术研究所 Memory data processing method of cache failure processor
US20130013862A1 (en) * 2011-07-06 2013-01-10 Kannan Hari S Efficient handling of misaligned loads and stores
CN104461925A (en) * 2014-11-14 2015-03-25 浪潮(北京)电子信息产业有限公司 Automatic correcting method and device for aligning storage device addresses
CN105446777A (en) * 2015-11-18 2016-03-30 上海兆芯集成电路有限公司 Speculation concurrent execution method for non-aligned loading instructions of cache rows
CN114968373A (en) * 2022-07-12 2022-08-30 飞腾信息技术有限公司 Instruction dispatching method and device, electronic equipment and computer readable storage medium
CN115061948A (en) * 2022-06-14 2022-09-16 广东赛昉科技有限公司 Method and system for verifying non-aligned access in multi-core system
CN116383102A (en) * 2023-05-30 2023-07-04 北京微核芯科技有限公司 Translation look-aside buffer access method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU CHENGLONG: "Design and Implementation of a SIMD Non-Aligned Memory Access Structure", China Master's Theses Full-text Database, Information Science and Technology, no. 4, 15 April 2018 (2018-04-15), pages 137 - 79 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination