WO2021217293A1

WO2021217293A1 - Addressing method for processor, processor, movable platform, and electronic device

Info

Publication number: WO2021217293A1
Application number: PCT/CN2020/086985
Authority: WO
Inventors: Han Zhi (韩志); Wu Qiongzhe (吴穹蔗); Liu Shizhuang (刘石壮)
Original assignee: SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority date: 2020-04-26
Filing date: 2020-04-26
Publication date: 2021-11-04
Also published as: CN112639747A

Abstract

An addressing method for a processor, a processor, a movable platform, and an electronic device. The processor comprises: a processor core, an addressing module, and a memory. The addressing method comprises: the addressing module obtains a base address and an offset address of data in the memory; the addressing module obtains a storage address of the data in the memory according to the base address and the offset address; and the processor core accesses the data at the storage address by means of the addressing module.

Description

Addressing method of processor, processor, movable platform and electronic equipment

Technical field

The present disclosure relates to the field of data processing, and in particular to an addressing method of a processor, a processor, a movable platform, and an electronic device.

Background

When a processor processes data, it must address the memory in order to read data from or write data to the memory. In some image processing and digital signal processing algorithms, the storage addresses of the data in the memory are often irregular, or follow rules too complex and variable to compute directly, so the processor usually accesses the memory by look-up table addressing.

Summary of the invention

The present disclosure provides an addressing method for a processor, the processor includes: a processor core, an addressing module, and a memory; the addressing method includes:

The addressing module obtains the base address and the offset address of the data in the memory;

The addressing module obtains the storage address of the data in the memory according to the base address and the offset address; and

The processor core accesses the data at the storage address through the addressing module.

The present disclosure also provides a processor, which includes: a processor core, an addressing module, and a memory;

The addressing module is configured to obtain the base address and the offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;

The processor core may access the data at the storage address in the memory through the addressing module.

The present disclosure also provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the above-mentioned addressing method.

The present disclosure also provides a movable platform. The movable platform includes a fuselage; the fuselage includes at least one circuit; and the circuit includes at least one processor as described above.

The present disclosure also provides an electronic device, the electronic device includes: a housing; the housing is provided with: at least one circuit; the circuit includes: at least one processor as described above.

The present disclosure also provides a computer program product including instructions, which, when the instructions run on a computer, cause the computer to execute the addressing method described above.

Description of the drawings

FIG. 1 is a flowchart of an addressing method of a processor according to an embodiment of the disclosure.

FIG. 2 is a schematic structural diagram of a processor according to an embodiment of the disclosure.

FIG. 3 is a schematic structural diagram of an addressing module according to an embodiment of the disclosure.

FIG. 4 shows the data flow of the read operation of the first addressing unit and the second addressing unit of the embodiment of the present disclosure.

FIG. 5 is a schematic diagram of the structure of the first address calculation in an embodiment of the disclosure.

FIG. 6 is a flowchart of a processor core accessing data of a storage address through a first addressing unit during a read operation in an embodiment of the disclosure.

FIG. 7 is a schematic structural diagram of a first conflict processing unit according to an embodiment of the disclosure.

FIG. 8 shows the data processing process of the conflict resolution mechanism in the read operation of the embodiment of the present disclosure.

FIG. 9 shows the data processing process in the embodiment of the present disclosure in which there is no storage block conflict.

FIG. 10 shows the data processing process of data splicing in the embodiment of the present disclosure.

FIG. 11 shows the data processing process of the base address update mode of the embodiment of the present disclosure.

FIG. 12 is a schematic structural diagram of a base address update unit according to an embodiment of the disclosure.

FIG. 13 shows a signal timing diagram of the handshake protocol of a read operation according to an embodiment of the present disclosure.

FIG. 14 is a schematic diagram of another structure of the first addressing unit according to an embodiment of the disclosure.

FIG. 15 shows the data flow of the write operation of the first addressing unit and the second addressing unit of the embodiment of the present disclosure.

FIG. 16 is a flowchart of a processor core accessing data of a storage address through a first addressing unit in a write operation in an embodiment of the disclosure.

FIG. 17 shows the data processing process of data splitting in an embodiment of the present disclosure.

FIG. 18 is a schematic diagram of another structure of the first conflict processing unit according to an embodiment of the disclosure.

FIG. 19 shows the data processing process of the conflict resolution mechanism in the write operation of the embodiment of the present disclosure.

FIG. 20 shows a signal timing diagram of the handshake protocol of a write operation in an embodiment of the present disclosure.

FIG. 21 is a schematic diagram of another structure of the first addressing unit according to an embodiment of the disclosure.

FIG. 22 is a schematic diagram of another structure of an addressing module according to an embodiment of the disclosure.

FIG. 23 is a schematic structural diagram of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.

FIG. 24 is a schematic diagram of another structure of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.

FIG. 25 is a schematic structural diagram of a movable platform according to an embodiment of the disclosure.

FIG. 26 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.

Detailed description

In some existing addressing schemes, the processor core reads the base address and the offset address from registers and calculates the storage address itself. Because the entire addressing process is performed by the processor core, it occupies the core's computing resources and makes table look-up inefficient. While the processor core is handling a storage block (bank) conflict, it cannot perform other operations and must wait until the conflict is resolved, which reduces the efficiency of the processor. In addition, such schemes offer only a single addressing mode and lack the flexibility to provide multiple addressing modes, so addressing is inefficient when large amounts of data are read or written.

The addressing method for the processor, the processor, the computer-readable storage medium, the movable platform, and the electronic device provided in the present disclosure use the addressing module to give the processor core access to the memory; that is, the processor core reads data from the memory and writes data to the memory through the addressing module.

It should be noted that the processor in this embodiment can be any type of device with data processing capability, such as, but not limited to, a central processing unit (CPU), digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), graphics processing unit (GPU), microprocessor, microcontroller, network processor (NP), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.

The processor may be a single-core processor or a multi-core processor, including one or more processor cores. A processor core may include an arithmetic logic unit (ALU) and/or control logic. The ALU performs arithmetic and logical operations, and the control logic controls a series of operations of the ALU. For example, in a DSP, the ALU may include multiply-accumulate (MAC) units and a shifter. Each MAC unit includes a multiplier and an adder for performing multiplication and addition, and the shifter performs logical shift operations on data.

The memory in this embodiment may be any of various random access memories (RAM), for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

The technical solutions of the present disclosure will be clearly and completely described below in conjunction with the embodiments and the drawings in the embodiments.

An embodiment of the present disclosure provides an addressing method for a processor. As shown in FIG. 1, the addressing method includes:

S101: The addressing module obtains the base address and the offset address of the data in the memory;

S102: The addressing module obtains the storage address of the data in the memory according to the base address and the offset address;

S103: The processor core accesses the data at the storage address through the addressing module.

In this embodiment, as shown in FIG. 2, the processor includes a processor core, an addressing module, and a memory, and the addressing module can be integrated inside the processor. In the addressing method of this embodiment, the addressing module performs table look-up addressing of the memory on behalf of the processor core; that is, the processor core reads data from the memory and writes data into the memory in a table look-up manner through the addressing module.

The addressing module may include one or more groups of addressing units. In this embodiment, one group of addressing units is taken as an example to describe how the addressing method is executed.

As shown in FIG. 3, the group of addressing units includes two identical addressing units, each of which communicates with the processor core through the system bus. An internal bus is provided in the addressing module, and the two addressing units communicate with each other through the internal bus. When the addressing method of this embodiment is implemented, one of the two addressing units reads and writes the data itself and may be called the table addressing unit; the other reads and writes the offset addresses stored in the memory and may be called the offset addressing unit. For convenience of description, the two addressing units are referred to as the first addressing unit and the second addressing unit, and the addressing method of this embodiment is described taking the first addressing unit as the table addressing unit and the second addressing unit as the offset addressing unit. However, those skilled in the art should understand that the roles can also be interchanged, that is, the first addressing unit can serve as the offset addressing unit and the second addressing unit as the table addressing unit.

In the addressing method of this embodiment, an addressing module is provided in the processor, and the addressing module, rather than the processor core, obtains the storage address of the data in the memory from the base address and the offset address. All addressing operations are completed by the addressing module, and the processor core does not take part in calculating the storage address, which improves table look-up addressing efficiency compared with ordinary processors.

The addressing method of this embodiment will be described below through the processes of read operation and write operation respectively.

Read operation

When the processor core needs to read data from the memory, the first addressing unit first obtains, in S101, the base address and the offset address of the data in the memory.

In this embodiment, as shown in FIG. 4, the first addressing unit includes: a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit. The solid lines in the figure represent address and data signals, and the dashed lines represent control signals.

The first control unit communicates with the processor core through the system bus and controls the operations of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. When the processor core needs to read data from the memory, it sends a read request to the first control unit through the system bus. In response to the read request, the first control unit sends an offset address request to the second addressing unit through the internal bus. In response to the offset address request, the second addressing unit reads the offset address of the data from the memory, sends the offset address to the first address calculation unit via the internal bus, and sends an offset address valid signal to the first control unit. In response to the offset address valid signal, the first control unit starts the other units of the first addressing unit to perform the read operation.
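The request/response order described above can be modeled as a short event trace. This is a minimal Python sketch, not the disclosed circuit: the function name `read_handshake`, the `table_index` key, and the dictionary standing in for the offsets held in memory are all hypothetical.

```python
def read_handshake(base, table_index, offset_memory):
    """Model one read request as the ordered signals exchanged between units.

    offset_memory is a hypothetical stand-in for the offset addresses the
    second addressing unit reads from memory, keyed by a hypothetical table_index.
    Returns (base, offsets, events).
    """
    events = ["processor core -> first control unit: read request"]
    events.append("first control unit -> second addressing unit: offset address request")
    offsets = list(offset_memory[table_index])  # second unit reads offsets from memory
    events.append("second addressing unit -> first address calculation unit: offsets")
    events.append("second addressing unit -> first control unit: offset address valid")
    events.append("first control unit: start remaining units for read operation")
    return base, offsets, events


if __name__ == "__main__":
    base, offsets, events = read_handshake(0, 0, {0: [0, 1, 2, 3]})
    for event in events:
        print(event)
```

The point of the trace is ordering: the remaining units of the first addressing unit start only after the offset address valid signal arrives.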

The first address calculation unit receives the base address of the data sent by the processor core and the offset address sent by the second addressing unit, and obtains the vector address of the data from them. As shown in FIG. 5, the first address calculation unit includes a base address selector, an offset address selector, and an adder.

The base address selector selects the base address sent by the processor core. The offset address selector selects the offset address sent by the second addressing unit through the internal bus. For look-up table addressing, the number of offset addresses corresponds to the number of banks (Bank) of the memory. When the memory includes N banks, the number of offset addresses is N.

After obtaining the base address and the offset address, in S102, the first address calculation unit obtains the storage address of the data in the memory according to the base address and the offset address.

As shown in FIG. 5, the adder of the first address calculation unit adds the base address to each of the N offset addresses to obtain the storage address of the data. The storage address is therefore a vector address of N addresses, that is, 16 addresses when the memory has 16 banks.
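The adder stage amounts to an element-wise sum of one base address with N offsets. A minimal Python sketch, assuming N = 16 and plain integer addresses; the function name `vector_address` is ours, and address-width overflow is not modeled.

```python
def vector_address(base, offsets, n_banks=16):
    """Add the base address to each of the N offset addresses (one adder lane each)."""
    if len(offsets) != n_banks:
        raise ValueError(f"expected {n_banks} offsets, got {len(offsets)}")
    return [base + off for off in offsets]


# Offsets from the FIG. 8 / FIG. 9 examples.
OFFSETS = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
print(vector_address(0, OFFSETS))         # identical to OFFSETS when the base is 0
print(vector_address(256, OFFSETS)[:4])   # [256, 257, 258, 259]
```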

After the vector address is obtained, in S103, the processor core reads the data stored at the vector address from the memory through the first addressing unit. Even when the vector address has a bank conflict, the processor core still reads the data through the first addressing unit, which resolves the conflict without the processor core's involvement.

When at least two addresses in the vector address correspond to the same bank of the memory, a bank conflict is considered to exist. The first conflict processing unit determines whether there is a bank conflict. When there is a bank conflict, as shown in FIG. 6, the processor core accesses the data at the storage address through the first addressing unit as follows:

S601: The first conflict processing unit reads the data from the vector address by using a conflict resolution mechanism, and sends the data to the first data processing unit;

S602: The first data processing unit processes the data, and sends the processed data to the first data transceiving unit;

S603: The first data transceiving unit sends the processed data to the processor core.
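Detecting a bank conflict only requires knowing which bank each address falls in. The disclosure does not state the address-to-bank mapping; the sketch below assumes byte addresses interleaved across N banks in m-byte cells (bank = (address // m) mod N), which is consistent with the FIG. 9 walkthrough where [68, 69, 70, 71] lands in the second cell of B1. The function names are hypothetical.

```python
from collections import Counter


def bank_of(addr, n_banks=16, bank_width=4):
    """Assumed mapping: consecutive bank_width-byte cells are interleaved across banks."""
    return (addr // bank_width) % n_banks


def has_bank_conflict(vec, n_banks=16, bank_width=4):
    """True if at least two addresses of the vector address fall in the same bank."""
    hits = Counter(bank_of(a, n_banks, bank_width) for a in vec)
    return any(count >= 2 for count in hits.values())


FIG9 = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
print(has_bank_conflict(FIG9))                        # True: B0-B3 are each hit 4 times
print(has_bank_conflict([4 * i for i in range(16)]))  # False: one address per bank
```

Under this strict definition the FIG. 9 vector counts as a conflict; the address-splicing exemption described later in this section is a separate check.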

As shown in FIG. 7, the first conflict processing unit includes a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit.

The following describes the conflict resolution mechanism in S601.

The vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer caches the vector address at the same time.

The address strobe selects the vector address sent directly by the first address calculation unit and outputs it to the conflict judgment unit.

The conflict judgment unit checks the vector address for bank conflicts:

When the vector address has a bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe, so that the address strobe selects the vector address output by the vector address buffer. This keeps the vector address input to the conflict judgment unit unchanged throughout conflict processing.

The address mapping unit maps the vector address to the physical address of the memory.

The read data reorganization unit reads the data at the physical address, reorganizes the data, and sends the reorganized data to the first data processing unit.

After that, the conflict judgment unit generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next data sent by the first address calculation unit, while the vector address buffer caches the vector address of the next data. The conflict judgment unit then processes any bank conflict of the next data.
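The vector address buffer exists to hold a conflicting vector stable while it is being resolved. The cycle-level sketch below records which source the address strobe selects on each clock cycle. The names are hypothetical, and the number of cycles each vector needs is supplied as input rather than computed.

```python
def strobe_trace(vectors, cycles_needed):
    """Return (source, vector) seen by the conflict judgment unit on each cycle.

    vectors: vector-address labels in the order the address calculation unit sends them.
    cycles_needed: label -> cycles its conflict processing takes (1 means no conflict).
    """
    trace = []
    for vec in vectors:
        buffered = vec                          # buffer caches the vector on arrival
        trace.append(("direct", vec))           # flag invalid: strobe passes the new vector
        for _ in range(cycles_needed[vec] - 1):
            trace.append(("buffer", buffered))  # flag valid: strobe holds the cached copy
    return trace


for step in strobe_trace(["A", "B"], {"A": 4, "B": 1}):
    print(step)
```

Vector "A" needs four cycles, so the strobe feeds the buffered copy for three cycles before the conflict-free vector "B" is passed straight through.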

The address mapping unit maps the vector address to the physical address of the memory in the following way:

For each bank, the first cell (memory cell) corresponding to the vector address is placed in a first group, the second cell corresponding to the vector address is placed in a second group, and so on, up to the n-th cell corresponding to the vector address, giving n groups of cells in total. The n groups of cells in the memory are then strobed in sequence.
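The grouping rule can be read as "group j collects the j-th address that falls in each bank". A minimal Python sketch, assuming the same byte-address interleaving as above (bank = (address // m) mod N) and taking the addresses within each bank in ascending order; the ordering and all names are our assumptions.

```python
from collections import defaultdict


def group_for_conflict_resolution(vec, n_banks=16, bank_width=4):
    """Split a vector address into n groups, each touching every bank at most once."""
    per_bank = defaultdict(list)
    for addr in vec:
        per_bank[(addr // bank_width) % n_banks].append(addr)
    if not per_bank:
        return []
    n = max(len(addrs) for addrs in per_bank.values())  # n = busiest bank's hit count
    groups = [[] for _ in range(n)]
    for bank in sorted(per_bank):
        for j, addr in enumerate(sorted(per_bank[bank])):
            groups[j].append(addr)  # j-th hit of this bank goes to group j
    return groups


FIG9 = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
for group in group_for_conflict_resolution(FIG9):
    print(group)
```

Each group hits every bank at most once, so it can be strobed in one clock cycle, and the number of groups n equals the cycle count. For the FIG. 9 vector this yields four groups, matching the four clock cycles the conflict resolution mechanism would need for it.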

The read data reorganization unit reads the data of the vector address in the following way, and reorganizes the data:

According to the strobe sequence of the n groups of cells, the data stored in the n groups is read in sequence and rearranged in ascending address order to obtain the reorganized data.
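Reading the groups one per cycle and then sorting by address restores memory order no matter which strobe order was used. A sketch under stated assumptions: `memory` is a hypothetical dict from address to stored value, and the groups are given as lists of addresses.

```python
def read_and_reorganize(groups, memory):
    """Read one group per clock cycle, then return (cycles, data in ascending address order)."""
    fetched = []                              # (address, value) pairs in strobe order
    for group in groups:                      # one group of cells per clock cycle
        fetched.extend((addr, memory[addr]) for addr in group)
    fetched.sort(key=lambda pair: pair[0])    # reorganize: ascending address order
    return len(groups), [value for _, value in fetched]


memory = {0: "a", 1: "b", 68: "c", 69: "d"}
groups = [[0, 68], [1, 69]]
print(read_and_reorganize(groups, memory))        # (2, ['a', 'b', 'c', 'd'])
print(read_and_reorganize(groups[::-1], memory))  # same result in reverse strobe order
```

Because the final sort depends only on the addresses, forward, reverse, or random strobe orders all yield the same reorganized data.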

The conflict resolution mechanism above is illustrated below with FIG. 8. In this example, the memory has N = 16 banks, each bank includes 5 cells, and each cell stores 4 bytes (32 bits). With table look-up addressing, the vector address of each group of data includes 16 addresses, and the first addressing unit can read a group of 16 data items from the memory at a time. Suppose the processor core needs to read a group of data labeled "1", "2", "3", and "4" in the memory, whose base address is 0 and whose offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address is input directly to the address strobe of the first conflict processing unit while the vector address buffer caches it. The address strobe selects the vector address, and the conflict judgment unit checks it.

As shown in FIG. 8, several addresses of this group of data correspond to the same bank of the memory: for example, the vector address corresponds to the first and second cells of bank 1 (B1), the first, second, and third cells of B2, and so on, so this vector address has a bank conflict. The conflict judgment unit therefore generates a conflict flag valid signal. Under the control of this signal, the address strobe selects the vector address output by the vector address buffer, so that the vector address is kept unchanged until conflict resolution ends.

The address mapping unit then places the first cell corresponding to the vector address in each of banks B0-Bf into one group. In the memory of FIG. 8, the first cells of B0-B3 correspond to [0, 1, 2, 3] in the vector address, the second cell of B4 corresponds to [71], the third cell of B5 corresponds to [139], and the fourth cell of B6 corresponds to [207]. The first cells corresponding to the vector address in B0-Bf are therefore the first cells of B0-B3, the second cell of B4, the third cell of B5, and the fourth cell of B6; that is, the first group consists of the 7 cells labeled "1".

In the same way, the second cells corresponding to the vector address in B0-Bf are placed into one group. In the memory of FIG. 8, the second cells of B1-B3 correspond to [68, 69, 70] in the vector address, the third cell of B4 corresponds to [138], and the fourth cell of B5 corresponds to [206]. The second group therefore consists of the second cells of B1-B3, the third cell of B4, and the fourth cell of B5, that is, the 5 cells labeled "2".

By analogy, the third and fourth groups are obtained from the third and fourth cells corresponding to the vector address in B0-Bf: they are the 3 cells labeled "3" and the 1 cell labeled "4", respectively. The address mapping unit strobes the four groups of cells in the memory in sequence.

In the example of FIG. 8, n = 4, that is, the vector address is mapped to four groups of cells in the memory. In other examples, n takes other values determined by the vector address itself; by construction, n equals the largest number of cells that the vector address hits in any single bank.

The read data reorganization unit reads the data stored in the four groups of cells from the memory in the strobe sequence of the groups; the data read is shown in FIG. 8. With this conflict resolution mechanism, the data stored in one group of cells is read in one clock cycle, so all the data is read in four clock cycles.

It should be noted that various strobe sequences can be used in this embodiment. For example, the first to fourth groups of cells can be strobed in order (as shown in FIG. 8), the fourth to first groups can be strobed in reverse order, or the four groups can be strobed in a random order.

As shown in FIG. 8, the data read from the memory by the read data reorganization unit is not arranged by address: its order does not match its actual storage locations in the memory, that is, it is not in the order the processor core requires, so the processor core cannot use it yet. The read data reorganization unit therefore reorganizes the data stored in the four groups of cells, rearranging it in ascending address order. After reorganization, the data is arranged according to its actual storage locations in the memory, and the read data reorganization unit sends the reorganized data to the first data processing unit for use by the processor core.

At this point, conflict processing for this group of data ends, and the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer caches that vector address at the same time, and the conflict judgment unit goes on to handle any bank conflict in the next group of data.

In the conflict resolution mechanism, when the conflict judgment unit finds no bank conflict in the vector address, it generates a conflict flag invalid signal. The address mapping unit maps the vector address to the physical address of the memory, and the read data reorganization unit reads the data at the physical address and sends it directly to the first data processing unit without reorganization. Because there is no bank conflict, the data is read in one clock cycle. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data.

Besides the case in which every address of the vector address corresponds to a different bank of the memory, "no bank conflict" in this embodiment also covers the following case:

When the bit width of a bank is m bytes, the vector address is divided equally into m groups or 2×m groups. If each group of addresses corresponds to a single cell of a bank, the vector address is considered to have no bank conflict, and the first conflict processing unit performs address splicing on the vector address.
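The address-splicing test can be sketched as follows, again assuming byte addresses with bank = (address // m) mod N, so that addresses sharing a cell share the same value of address // m. The text does not say what happens when two groups fall in different cells of the same bank; the sketch assumes that case is still a real conflict. The function name is hypothetical.

```python
def can_splice(vec, n_banks=16, bank_width=4):
    """True if the vector address splits evenly into m or 2*m groups, each inside one cell."""
    m = bank_width
    for n_groups in (m, 2 * m):
        if not vec or len(vec) % n_groups:
            continue
        size = len(vec) // n_groups
        cell_in_bank = {}                     # bank -> the one cell (word index) used there
        ok = True
        for g in range(n_groups):
            cells = {a // m for a in vec[g * size:(g + 1) * size]}
            if len(cells) != 1:               # group spans more than one cell
                ok = False
                break
            cell = cells.pop()
            if cell_in_bank.setdefault(cell % n_banks, cell) != cell:
                ok = False                    # assumed: two cells in one bank still conflict
                break
        if ok:
            return True
    return False


FIG9 = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
print(can_splice(FIG9))                                                  # True
print(can_splice([0, 1, 2, 3, 64, 65, 66, 67] + list(range(128, 136))))  # False
```

When the check passes, the whole vector is read in a single clock cycle; otherwise the grouping-based conflict resolution mechanism applies.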

The following uses FIG. 9 as an example to illustrate the above-mentioned situation where there is no Bank conflict.

As shown in FIG. 9, suppose the processor core needs to read a group of data labeled "1" to "16" in the memory, whose base address is 0 and whose offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]; the vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. Under the definition that a bank conflict exists whenever at least two addresses of the vector address correspond to the same bank of the memory, this case is a bank conflict, and the first conflict processing unit would have to read this group of data with the conflict resolution mechanism, which takes four clock cycles.

For this situation, when at least two addresses in the vector address correspond to the same bank of the memory, the conflict judgment unit of this embodiment examines the vector address further. In FIG. 9, the bit width of a bank is 4 bytes, and the vector address is divided equally into 4 groups: [0, 1, 2, 3], [68, 69, 70, 71], [136, 137, 138, 139], and [204, 205, 206, 207]. Each group of addresses corresponds to a single cell of a bank: [0, 1, 2, 3] to the first cell of B0, [68, 69, 70, 71] to the second cell of B1, [136, 137, 138, 139] to the third cell of B2, and [204, 205, 206, 207] to the fourth cell of B3. The vector address [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207] is therefore considered to have no bank conflict, and the first conflict processing unit can read all the data "1" to "16" in one clock cycle.

The above is an example of dividing the vector addresses into 4 groups equally. In this embodiment, the vector addresses can also be equally divided into 8 groups. If each group of addresses corresponds to a cell of a bank, it is also considered that there is no bank conflict in the vector address.

In general, when the bit width of a bank is m bytes, the vector address is divided equally into m or 2×m groups, and each group of addresses corresponds to a single cell of a bank, reading the data with the conflict resolution mechanism would take n clock cycles, where n depends on the vector address itself and can be as large as 16. With address splicing in this embodiment, all the data is read in one clock cycle, saving up to 15 clock cycles compared with the conflict resolution mechanism and greatly improving read efficiency.

After receiving the data sent by the first conflict processing unit, the first data processing unit decides, according to the data width required by the processor core, whether to process the data further. When the processor core needs only some of the bytes read from each cell of each bank rather than all of them, the first data processing unit splices those bytes out of the data sent by the first conflict processing unit to produce the data required by the processor core, and sends the spliced data to the first data transceiving unit.

Specifically, the first data processing unit splicing partial bytes of the data sent by the first conflict processing unit includes:

For the data sent by the first conflict processing unit, when the bit width of a bank is m bytes and the processor core needs k bytes out of every m bytes of the data, those k bytes are selected from every m bytes, yielding N×k bytes, where k ≤ log₂(m).

Every m bytes of the N×k selected bytes are combined, yielding N×k/m blocks of data, each m bytes wide (m×k blocks in the example below, where N = m²).
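The two splicing steps can be sketched in Python. This is an illustration under assumptions: the `picks` argument (one list of in-chunk byte positions per m-byte chunk) is a hypothetical input format, and every chunk is assumed to contribute the same k. In the usage lines, the first four picks reproduce bytes 2, 7, 9, and 16 from the FIG. 10 example. The remaining picks are placeholders because the text elides them; index 3 does give byte 64 for the last chunk.

```python
import math


def splice_bytes(data, picks, m=4):
    """Select k bytes from each m-byte chunk and pack them into m-byte blocks.

    data: the N*m bytes read from the N banks.
    picks: one list of k in-chunk byte positions per chunk (hypothetical format).
    """
    if not data or len(data) % m or len(picks) != len(data) // m:
        raise ValueError("need exactly one pick list per m-byte chunk")
    k = len(picks[0])
    if any(len(p) != k for p in picks) or not 1 <= k <= math.log2(m):
        raise ValueError("each chunk must contribute the same k, with 1 <= k <= log2(m)")
    selected = bytearray()
    for i, positions in enumerate(picks):
        chunk = data[i * m:(i + 1) * m]
        selected.extend(chunk[p] for p in positions)   # N*k bytes in total
    return [bytes(selected[j:j + m]) for j in range(0, len(selected), m)]


data = bytes(range(1, 65))                 # byte values 1..64 stand for the byte numbers
picks = [[1], [2], [0], [3]] + [[3]] * 12  # k = 1
blocks = splice_bytes(data, picks)
print(len(blocks), list(blocks[0]))        # 4 [2, 7, 9, 16]
```

With k = 2 (for example `[[0, 1]] * 16`), the same call returns 8 blocks of 4 bytes, matching the k = 2 case in the text.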

The following takes FIG. 10 as an example to describe the process of data splicing.

For the memory shown in FIG. 8, N = 16 and m = 4: there are 16 banks, and each cell of each bank stores 4 bytes. As shown in FIG. 10, the data sent by the first conflict processing unit to the first data processing unit therefore includes 16 blocks of 4 bytes each, 64 bytes (512 bits) in total, numbered 1 to 64. If k = 1, the processor core needs 1 byte out of every 4 bytes, and the first data processing unit selects that byte from every 4 bytes, obtaining 16×1 = 16 bytes; in FIG. 10, the processor core needs the 16 bytes numbered 2, 7, 9, 16, ..., 64. The first data processing unit then combines every 4 of the selected 16 bytes, obtaining 4 blocks of data, each 4 bytes wide, which is the data required by the processor core, and sends the spliced data to the first data transceiving unit.

When the processor core needs all the bytes read from each cell of each bank, for example all 64 bytes numbered 1 to 64 in FIG. 10, the first data processing unit does not splice the data sent by the first conflict processing unit but sends it directly to the first data transceiving unit.

The process of data splicing is described above by taking k=1 as an example. According to k≤log ₂ ^m , when m=4, the value of k can also be 2, that is, what the processor core needs is 2 bytes in every 4 bytes. In this case, the first data processing unit The operation of is similar to when k=1. The first data processing unit selects 2 bytes required by the processor core from every 4 bytes to obtain 16×2=32 bytes. Then, the first data processing unit combines every 4 bytes of the selected 32 bytes to obtain 8 blocks of data with a width of 4 bytes each, which is the data required by the processor core , And send the spliced data to the first data transceiver unit.

The first data transceiver unit includes a receive buffer and a send buffer. The send buffer buffers the data sent by the first data processing unit and sends the buffered data to the processor core through the system bus. The depths of the receive buffer and the send buffer can be set as needed; in one example, the minimum depth of the receive buffer is 2 and the minimum depth of the send buffer is 0.

This completes the operation in which the processor core reads data from the memory through the first addressing unit.

In some processors, addressing is performed by the processor core itself: the core obtains the base address and the offset address and calculates the vector address. If the vector address contains a Bank conflict, the processor core must detect and resolve it; while the conflict is being resolved the core cannot perform other operations, and it can resume them only after the conflict is resolved. In the addressing method of this embodiment, the first addressing unit is provided in the processor, and it obtains the base address and the offset address and calculates the vector address. When the vector address contains a Bank conflict, the first addressing unit resolves it using the conflict resolution mechanism, so the processor core does not need to handle the conflict and can continue performing other operations while it is being resolved. The addressing method of this embodiment can therefore significantly improve the efficiency and computing speed of the processor.

The process of reading one set of data with the addressing method of this embodiment has been described above; each set of data required by the processor core is read in this manner. When the processor core reads multiple sets of data, the first addressing unit can obtain the base addresses and offset addresses of those sets in several different modes.

The first mode is called the offset address update mode. In this mode, the base address is the same for all groups of data, and the offset address of each group comes from the second addressing unit.

The first address calculation unit obtains the base address sent by the processor core; the second addressing unit reads the offset address of each group of data from the memory in turn; and the first address calculation unit obtains the offset addresses read by the second addressing unit.

As shown in FIG. 5, when multiple sets of data are read in the offset address update mode, the base address selector selects the base address sent by the processor core. For each group of data, the offset address selector selects that group's offset address, sent by the second addressing unit over the internal bus, and the adder sums the base address and the offset address to obtain the group's vector address. The adder sends the vector address to the first conflict processing unit, and the data is then delivered to the processor core through the first conflict processing unit, the first data processing unit and the first data transceiver unit, completing the read of that group. Repeating this for every group reads all of the groups.
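The offset address update mode can be summarised in a few lines of Python. This is a behavioural sketch only: the iterable `offset_groups` is a hypothetical stand-in for the offset addresses delivered by the second addressing unit over the internal bus, and the numbers in the example are made up.

```python
from typing import Iterable, Iterator, List

def offset_update_mode(core_base: int,
                       offset_groups: Iterable[List[int]]) -> Iterator[List[int]]:
    """Yield one vector address per group of data.

    core_base     -- base address from the processor core, fixed for all groups
    offset_groups -- per-group offset addresses (hypothetical stand-in for the
                     second addressing unit's output)
    """
    for offsets in offset_groups:
        # Adder of the first address calculation unit: base + each lane's offset
        yield [core_base + off for off in offsets]

print(list(offset_update_mode(100, [[0, 4, 8], [1, 5, 9]])))
# [[100, 104, 108], [101, 105, 109]]
```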

The other mode is called the base address update mode. In this mode, the offset address comes from the second addressing unit and is the same for every group of data, and the base address of each group is obtained by updating an initial base address value.

As shown in FIG. 11, three sets of data labeled "1", "2" and "3" are to be read. The vector address of the first set, labeled "1", is [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]; that of the second set, labeled "2", is [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236]; and that of the third set, labeled "3", is [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256]. With the general look-up table addressing method, each time a group of data is read, the processor core must write that group's offset address into the memory through the second addressing unit. The second addressing unit then reads the offset address back from the memory and sends it to the first addressing unit, which computes the vector address from the base address [0] sent by the processor core and the offset address sent by the second addressing unit, and reads the group of data at that vector address. Reading the three sets of data above therefore requires three offset address write operations.

Looking at these three sets of data, if the first address of each set's vector address is used as that set's base address, every set has the same offset addresses relative to its first address. Taking address [0] as the base address of the first group, its offset address is [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]. Likewise, taking address [20] as the base address of the second group, its offset address is also [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]; and taking address [40] as the base address of the third group, its offset address is again [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216].
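The observation above can be checked mechanically: subtracting each vector address's first element from all of its elements gives the same offset list for all three groups. The helper name `split_base_offset` is hypothetical and introduced only for this illustration; the three vector addresses are the ones listed for FIG. 11.

```python
from typing import List, Tuple

def split_base_offset(vector_address: List[int]) -> Tuple[int, List[int]]:
    """Use the first address as the base and express every lane relative to it."""
    base = vector_address[0]
    return base, [addr - base for addr in vector_address]

group1 = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]
group2 = [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236]
group3 = [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256]

bases, offsets = zip(*(split_base_offset(g) for g in (group1, group2, group3)))
print(bases)                                   # (0, 20, 40)
print(offsets[0] == offsets[1] == offsets[2])  # True
```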

In this embodiment, as shown in FIG. 12, the base address update unit includes an adder and a D flip-flop. In the base address update mode, to read these three sets of data the processor core writes the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] into the memory through the second addressing unit and sets the second addressing unit to the cyclic read mode. When the first group of data, labeled "1", is read, the second addressing unit reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. As shown in FIG. 5 and FIG. 12, the base address selector selects the output of the base address update unit. The processor core sends the initial base address value [0] to the base address update unit; the value passes through the adder into the D flip-flop, and when the clock pulse CP of the D flip-flop is active, the D flip-flop sends [0] to the base address selector, which passes it to the adder of the first address calculation unit. That adder adds the initial base address value [0] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216], and the data labeled "1" is read from that vector address.

When the second group of data, labeled "2", is read, the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. The base address selector still selects the output of the base address update unit. The processor core sends the base address update value [20] to the base address update unit, whose adder adds it to the base address of the data labeled "1" (the initial value [0]) to obtain the base address [20] of the data labeled "2", which enters the D flip-flop. When the clock pulse CP of the D flip-flop is active, the D flip-flop sends [20] to the base address selector, which passes it to the adder of the first address calculation unit. That adder adds the base address [20] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236], and the data labeled "2" is read from that vector address.

When the third group of data, labeled "3", is read, the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. The base address selector still selects the output of the base address update unit, and the adder of the base address update unit adds the base address update value [20] to the base address [20] of the data labeled "2", giving the base address [40] of the data labeled "3", which enters the D flip-flop. When the clock pulse CP of the D flip-flop is active, the D flip-flop sends [40] to the base address selector, which passes it to the adder of the first address calculation unit. That adder adds the base address [40] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256], and the data labeled "3" is read from that vector address.
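The walkthrough above can be modelled in Python: the base address update unit behaves like an accumulator (an adder whose output is captured by a D flip-flop), and one stored offset table is reused for every group. This is a sketch, not the patented circuit; it assumes the D flip-flop resets to 0, so that the first clock pulse simply loads the initial value, and the class and function names are hypothetical.

```python
from typing import List

class BaseAddressUpdateUnit:
    """Adder feeding a D flip-flop; the flip-flop is assumed to reset to 0."""

    def __init__(self) -> None:
        self.q = 0  # D flip-flop output (current base address)

    def clock(self, addend: int) -> int:
        # The adder output (q + addend) is captured on the active clock pulse CP
        self.q = self.q + addend
        return self.q


def base_update_mode(initial_base: int, update_value: int,
                     offsets: List[int], groups: int) -> List[List[int]]:
    """Vector addresses for `groups` groups that share one offset table."""
    unit = BaseAddressUpdateUnit()
    vectors = []
    for g in range(groups):
        # First group loads the initial value; later groups add the update value
        base = unit.clock(initial_base if g == 0 else update_value)
        # Adder of the first address calculation unit; the same offsets are
        # re-read every time (cyclic read mode of the second addressing unit)
        vectors.append([base + off for off in offsets])
    return vectors

OFFSETS = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]
for v in base_update_mode(0, 20, OFFSETS, 3):
    print(v[0], v[-1])   # 0 216, then 20 236, then 40 256
```

The offset table is written once and only the scalar base changes per group, which is the saving the text attributes to this mode.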

For the three groups of data labeled "1", "2" and "3", the general look-up table addressing method requires the processor core to write the offset address to the memory three times, whereas with the base address update mode of this embodiment the processor core writes it only once. The addressing method of this embodiment thus reduces the number of offset address writes and the time spent on them. The advantage is especially pronounced when the processor performs large-scale look-up table addressing on the memory, where the addressing time can be greatly reduced. In addition, during addressing the processor core only sets the second addressing unit to the cyclic read mode and provides the base address update value; the rest of the addressing process needs little involvement from the core, which significantly improves the efficiency and computing speed of the processor, especially for large-scale look-up table addressing.

In addition to the base address update mode and the offset address update mode, the addressing method of this embodiment also provides a fixed offset address mode. As shown in FIG. 5, in the fixed offset address mode the first address calculation unit obtains the base address sent by the processor core: the base address selector selects that base address and sends it to the adder. The processor core also sends a fixed offset address to the first addressing unit, and the offset address selector of the first address calculation unit selects it and sends it to the adder. The adder of the first address calculation unit adds the base address and the offset address to obtain the vector address. The fixed offset address mode can be used in various addressing scenarios, such as linear addressing and stride addressing.

The addressing method of this embodiment thus provides an offset address update mode, a base address update mode and a fixed offset address mode, which can be selected according to the actual situation, making look-up table addressing more flexible.
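The three modes differ only in what the two selectors of the first address calculation unit pick, so they can be sketched as one function. This is a hypothetical behavioural model, not an interface defined by the embodiment: `Mode`, `vector_address` and its parameter names are illustrative, and the stride of 8 in the example is an arbitrary choice.

```python
from enum import Enum, auto
from typing import List, Optional

class Mode(Enum):
    OFFSET_UPDATE = auto()  # base from core, offsets from second addressing unit
    BASE_UPDATE = auto()    # base from update unit, offsets from second addressing unit
    FIXED_OFFSET = auto()   # base and a fixed offset both from the core

def vector_address(mode: Mode, core_base: int, updated_base: int,
                   table_offsets: Optional[List[int]],
                   fixed_offsets: Optional[List[int]]) -> List[int]:
    # Base address selector
    base = updated_base if mode is Mode.BASE_UPDATE else core_base
    # Offset address selector
    offsets = fixed_offsets if mode is Mode.FIXED_OFFSET else table_offsets
    if offsets is None:
        raise ValueError(f"{mode.name} needs an offset source")
    # Adder
    return [base + off for off in offsets]

# Stride addressing in the fixed offset mode: 16 lanes, hypothetical stride 8
stride8 = [8 * lane for lane in range(16)]
print(vector_address(Mode.FIXED_OFFSET, 100, 0, None, stride8)[:4])  # [100, 108, 116, 124]
```

Linear addressing is the same call with a stride equal to the element size.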

When the user's program code runs on the processor, it is compiled into instructions that the processor can execute. When executing a piece of code requires reading data from the memory, the processor core performs several operations in sequence: fetching the instruction, decoding it, reading the data and executing the instruction. In some processors, if a Bank conflict occurs while the processor core is reading data, the processor core handles the conflict itself and must issue multiple instructions to read the data. The addressing in such processors is therefore instruction-driven.

The addressing in this embodiment is task-driven. When the processor core needs to read data, it generates a set of read instructions that amounts to a single task instruction and sends it to the first addressing unit through the system bus. The first addressing unit then completes the entire addressing process and sends the data read from the memory to the processor core over the system bus, after which the core performs its subsequent operations. With this task-driven addressing, the processor core only needs to send a task instruction to the first addressing unit when it needs data from the memory, without being concerned with the details of the addressing process; even a Bank conflict is handled by the first addressing unit. Compared with a general processor, this simplifies the operation of the processor core and improves its efficiency.

In this embodiment, the first control unit of the first addressing unit can communicate with the processor core through a handshake protocol.

As shown in FIG. 13, the processor core communicates with the first addressing unit through the system bus. The system bus includes a clock signal line and the read request valid, read request ready, read request, read data valid, read data ready and read data lines; the first addressing unit is driven by the clock signal. When the processor core needs to read data from the memory, it sends task instructions to the first addressing unit and receives data from it using the handshake protocol. A high read request valid signal indicates that the read request signal is valid; when the read request valid and read request ready signals are both high, the first control unit accepts the read request from the processor core. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to read the data from the memory. A high read data valid signal indicates that the read data is valid; when the read data valid and read data ready signals are both high, the first control unit controls the first data transceiver unit to send the data to the processor core.
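The rule "a transfer happens only in a cycle where valid and ready are both high" can be illustrated with a small cycle-by-cycle simulation. This is a generic valid/ready sketch, not the exact bus timing of FIG. 13; it assumes the sender keeps valid high while it holds an item and holds that item until it is accepted, and the function name and example traces are hypothetical.

```python
from typing import List, Sequence, Tuple

def valid_ready_transfer(items: Sequence[str],
                         ready_trace: Sequence[bool]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Simulate one valid/ready channel cycle by cycle.

    The sender raises valid while it has an item and holds that item until it
    is accepted; ready_trace gives the receiver's ready level in each cycle.
    Returns (cycle, item) for every accepted transfer and the items left over.
    """
    pending = list(items)
    accepted = []
    for cycle, ready in enumerate(ready_trace):
        valid = bool(pending)
        if valid and ready:          # transfer only when both signals are high
            accepted.append((cycle, pending.pop(0)))
    return accepted, pending

log, left = valid_ready_transfer(["req0", "req1"], [False, True, False, True])
print(log)   # [(1, 'req0'), (3, 'req1')]
```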

In this embodiment, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.

As shown in FIG. 14, the first control unit includes a read/write request buffer, a synchronization register, a selector, and first-stage to fifth-stage pipeline controllers.

The processor core sends a read request through the system bus, and the read/write request buffer receives and buffers it. If the read request is a table look-up request, the selector selects the synchronization register. After receiving the read request, the read/write request buffer sends an offset address request to the second addressing unit over the internal bus and passes the read request to the synchronization register. In response to the offset address request, the second addressing unit reads the offset address of the data from the memory, sends it to the first address calculation unit over the internal bus, and at the same time sends an offset address valid signal over the internal bus. On receiving the offset address valid signal, the synchronization register forwards the read request to the pipeline controllers of all stages, starting the pipeline. The first-stage to fifth-stage controllers send control signals to the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit. During addressing by the first addressing unit, the first address calculation unit occupies the first pipeline stage, the first conflict processing unit the second stage, the first data processing unit the third and fourth stages, and the first data transceiver unit the fifth stage. If the read request is not a table look-up request, the selector selects the read/write request buffer, which sends the read request directly to the pipeline controllers of all stages and starts the pipeline.
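The routing decision above can be sketched as follows: a table look-up request triggers an offset address request and waits in the synchronization register until the offset address valid signal arrives, while any other request starts the pipeline immediately. This is a hypothetical software model; the class and method names are illustrative, and modelling the synchronization register as a queue is an assumption (the text does not say how many requests it can hold).

```python
from collections import deque
from typing import Deque, List

class RequestDispatcher:
    """Selector + synchronization register in front of the pipeline controllers."""

    def __init__(self) -> None:
        self.waiting: Deque[str] = deque()    # table look-up requests awaiting offsets
        self.offset_requests: List[str] = []  # requests sent to the second addressing unit
        self.started: List[str] = []          # requests released to the pipeline

    def submit(self, request: str, table_lookup: bool) -> None:
        if table_lookup:
            # Ask the second addressing unit for offsets and park the request
            self.offset_requests.append(request)
            self.waiting.append(request)
        else:
            # Not a table look-up: start the pipeline immediately
            self.started.append(request)

    def offset_valid(self) -> None:
        # Offset address valid signal from the second addressing unit
        if self.waiting:
            self.started.append(self.waiting.popleft())

d = RequestDispatcher()
d.submit("lin0", table_lookup=False)
d.submit("lut0", table_lookup=True)
print(d.started)   # ['lin0']
d.offset_valid()
print(d.started)   # ['lin0', 'lut0']
```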

The first addressing unit of this embodiment also provides a pipeline pause mechanism. When the processor core cannot accept the data sent by the first data transceiver unit over the system bus, it sends a bus pause signal over the system bus to the read request buffer, the synchronization register and the first-stage to fifth-stage pipeline controllers, which then suspend the pipeline. When the first conflict processing unit finds a Bank conflict in the vector address, it sends a conflict pause signal to the read request buffer, the synchronization register and the first-stage to fifth-stage pipeline controllers, which suspend the pipeline on receiving it. After the Bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal to the read request buffer, the synchronization register and the first-stage to fifth-stage pipeline controllers, and on receiving it the pipeline restarts.
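The stage assignment and the pause behaviour can be illustrated with a toy five-stage pipeline. This is a simplified sketch under assumptions: every stage takes exactly one cycle, one request enters per cycle, and a pause (bus pause or conflict pause, not distinguished here) freezes all stages for that cycle; real timing may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Stage assignment described in the text (stage 1 .. stage 5)
STAGES = ("address calculation", "conflict processing",
          "data processing (a)", "data processing (b)", "data transceiver")

def run_pipeline(requests: Sequence[str],
                 pause: Sequence[bool]) -> List[Tuple[int, str]]:
    """Return (cycle, request) for each request that leaves stage 5.

    pause[c] models a bus pause or conflict pause asserted in cycle c;
    while it is asserted every stage holds its contents.
    """
    stages: List[Optional[str]] = [None] * len(STAGES)
    pending = list(requests)
    done = []
    for cycle, paused in enumerate(pause):
        if paused:
            continue
        if stages[-1] is not None:       # the request in stage 5 completes
            done.append((cycle, stages[-1]))
        # Every request advances one stage; a new request enters stage 1
        stages = [pending.pop(0) if pending else None] + stages[:-1]
    return done

print(run_pipeline(["r0", "r1", "r2"], [False] * 8))
# [(5, 'r0'), (6, 'r1'), (7, 'r2')]
```

Once the pipeline is full, one request completes per cycle; a single paused cycle shifts every later completion by one cycle without losing any request.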

By adopting a pipeline in this embodiment, the modules of the first addressing unit can operate in parallel, which greatly improves the working efficiency of the first addressing unit, reduces the addressing time and improves the addressing efficiency.

In this embodiment, the offset address that the first address calculation unit obtains from the second addressing unit is read from the memory by the second addressing unit. Before that, the processor core sends the offset address to the second addressing unit through the system bus, and the second addressing unit writes it into the memory. As mentioned earlier, the second addressing unit has the same structure as the first addressing unit. When the second addressing unit reads the offset address from the memory, the offset address plays the role of the data to be read, and the second addressing unit reads it using essentially the same operations that the first addressing unit uses to read data from the memory. As shown in FIG. 4, the second addressing unit includes a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit and a second data transceiver unit.

When the second addressing unit reads the offset address from the memory, the second conflict processing unit reads the offset address and sends it to the second data processing unit; the second data processing unit processes the offset address and sends the processed offset address to the second data transceiver unit; and the second data transceiver unit sends the processed offset address to the first addressing unit.

The second addressing unit differs from the first addressing unit in that the second data transceiver unit sends the processed offset address to the first addressing unit over the internal bus, whereas the first data transceiver unit sends data to the processor core over the system bus. Otherwise, the second control unit, second address calculation unit, second conflict processing unit, second data processing unit and second data transceiver unit operate like the corresponding units of the first addressing unit.

Write operation

When the processor core writes data into the memory, some of the operations of the first addressing unit are the same as in a read operation. For brevity, the following focuses on the differences between write operations and read operations.

In the write operation, in S103, the processor core writes the data to the vector address of the memory through the first addressing unit. As shown in FIG. 15, the data flows through the first data transceiver unit, the first data processing unit and the first conflict processing unit in the opposite direction to the read operation.

As shown in FIG. 16, when there is a Bank conflict, the processor core accessing the data at the storage address through the first addressing unit includes:

S1601: The first data transceiver unit receives data sent by the processor core, and sends the data to the first data processing unit;

S1602: The first data processing unit processes the data, and sends the processed data to the first conflict processing unit;

S1603: The first conflict processing unit uses the conflict resolution mechanism to write data into the vector address.

In S1601, the receive buffer receives and buffers the data sent by the processor core over the system bus and sends the data to the first data processing unit.

In S1602, after receiving the data sent by the first data transceiver unit, the first data processing unit decides, according to the width of the data written by the processor core, whether the data needs further processing. When the processor core writes data into only some of the bytes of each cell of each Bank rather than all of them, the first data processing unit splits the data sent by the first data transceiver unit to generate the data to be written into the memory, and sends the split data to the first conflict processing unit.

Specifically, the splitting of the data sent by the first data transceiving unit by the first data processing unit includes:

When the data sent by the first data transceiver unit consists of m×k blocks, each m bytes wide, the m bytes of each block are split to obtain N×k bytes, such that each group of k bytes corresponds to k addresses of one Bank, where N is the number of Banks in the memory, the bit width of each Bank is m bytes, and k ≤ log₂ m.

The data splitting process is described below using FIG. 17 as an example.

For the memory shown in FIG. 8, N=16 and m=4: the memory has 16 Banks, each 4 bytes wide, so each cell of each Bank stores 4 bytes of data. As shown in FIG. 17, the data written by the processor core consists of 4 blocks of 4 bytes each, and these 16 bytes correspond to one byte storage location in a cell of each of the Banks B0-Bf. The 4 bytes of each block are therefore split to obtain 16 bytes, so that each byte corresponds to one byte storage location in one cell of one Bank. The split data is in the format in which it is written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.

When the processor core writes data into all the bytes of each cell of each Bank, that is, when it writes 64 bytes to the memory, the first data processing unit does not need to split the data sent by the first data transceiver unit and sends it directly to the first conflict processing unit.

The data splitting process is described above for k=1. Since k ≤ log₂ m, k can also be 2 when m=4, meaning the processor core writes data into two bytes of each cell of each Bank. In this case the first data processing unit operates as in the k=1 case: it splits the 8 blocks of 4 bytes to obtain 32 bytes, so that each pair of bytes corresponds to the two byte storage locations in one cell of one Bank. The split data is in the format in which it is written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.
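Splitting is the inverse of splicing: the packed bytes from the processor core are scattered to their byte positions in a full N×m-byte line. The sketch below is a behavioural model under stated assumptions: `positions` (the byte positions to write in each Bank's cell) is a hypothetical input, and the per-byte write-enable mask it returns is an assumption of this sketch, since the text does not name such a signal.

```python
from typing import List, Sequence, Tuple

def split(data: bytes, positions: Sequence[Sequence[int]],
          m: int = 4) -> Tuple[bytes, List[bool]]:
    """Scatter packed write data into a full N*m-byte memory line.

    positions[b] -- byte positions (0..m-1) to write in Bank b's cell
                    (hypothetical stand-in for the write request)
    Returns the line and a per-byte write-enable mask (an assumed signal).
    """
    n_banks = len(positions)
    if len(data) != sum(len(p) for p in positions):
        raise ValueError("one data byte is needed per target byte position")
    line = bytearray(n_banks * m)
    enable = [False] * (n_banks * m)
    source = iter(data)
    for bank, bank_positions in enumerate(positions):
        for pos in bank_positions:
            line[bank * m + pos] = next(source)   # next packed byte from the core
            enable[bank * m + pos] = True
    return bytes(line), enable

# k=1: 4 blocks x 4 bytes -> one byte in each of the 16 Banks (position 1, hypothetical)
line, enable = split(bytes(range(1, 17)), [[1]] * 16)
print(line[1], line[5], line[61], sum(enable))   # 1 2 16 16
```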

The following describes the conflict resolution mechanism in S1603 with reference to FIG. 18. As shown in FIG. 18, the first conflict processing unit further includes: a write data buffer, a write data strobe, and a write data reorganization unit.

The address strobe selects the vector address sent directly by the first address calculation unit and outputs it to the conflict judgment unit.

The data sent by the first data processing unit is sent to the write data strobe, and the write data buffer simultaneously buffers the data sent by the first data processing unit.

The conflict judgment unit judges the vector address:

When the vector address contains a Bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe and the write data strobe, so that the address strobe selects the vector address output by the vector address buffer and the write data strobe selects the data output by the write data buffer. This keeps the vector address entering the conflict judgment unit and the data entering the write data reorganization unit unchanged while the conflict is being processed;

The write data reorganization unit reorganizes the data;

The address mapping unit maps the vector address to the physical address of the memory, and the write data reorganization unit writes the reorganized data into the physical address of the memory.

After that, the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe selects the vector address of the next set of data sent by the first address calculation unit while the vector address buffer caches it; the write data strobe selects the next set of data sent by the first data processing unit while the write data buffer buffers it; and the conflict judgment unit goes on to handle Bank conflicts in the next set of data.
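The judgment made by the conflict judgment unit can be sketched as a check on whether any Bank must serve more than one cell for the same vector address. Two assumptions are made here: the address-to-Bank mapping `interleaved` is a hypothetical low-order interleaving (the text does not specify the mapping), and several lanes hitting the same cell of one Bank are treated as non-conflicting.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Mapping = Callable[[int], Tuple[int, int]]  # address -> (bank, cell)

def interleaved(addr: int, n_banks: int = 16) -> Tuple[int, int]:
    """Hypothetical low-order interleaved mapping: bank = addr mod N."""
    return addr % n_banks, addr // n_banks

def bank_conflict(vector_address: List[int], mapping: Mapping = interleaved) -> bool:
    """Conflict flag: True when some Bank must deliver more than one cell."""
    cells: Dict[int, Set[int]] = defaultdict(set)
    for addr in vector_address:
        bank, cell = mapping(addr)
        cells[bank].add(cell)
    return any(len(c) > 1 for c in cells.values())

print(bank_conflict(list(range(16))))               # False: one cell in each of 16 Banks
print(bank_conflict([0, 16] + list(range(1, 15))))  # True: 0 and 16 both hit Bank 0
```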

The write data reorganization unit reorganizes the data as follows:

Determine the first cell corresponding to the vector address in each Bank, and compile the data corresponding to these first cells into a row in ascending address order;

Determine the second cell corresponding to the vector address in each Bank, and compile the data corresponding to these second cells into a row in ascending address order;

And so on, until the n-th cell corresponding to the vector address in each Bank has been determined and the data corresponding to these n-th cells compiled into a row in ascending address order, giving n rows of data in total.

The address mapping unit maps the vector address to the physical address of the memory, including:

The address mapping unit sequentially selects n groups of cells corresponding to n rows of data;

The write data reorganization unit writes the reorganized data into the physical address of the memory, including:

According to the strobe sequence of the n groups of cells, n rows of data are written into the n groups of cells in sequence.

The conflict resolution mechanism above is described below using FIG. 19 as an example. In this example, as shown in FIG. 19, the memory has N=16 Banks, each Bank includes 5 cells, and each cell stores 4 bytes (32 bits). In table look-up addressing, the vector address of each group of data contains 16 addresses, so the first addressing unit can write a group of 16 data items to the memory at a time. Suppose the processor core needs to write a group of data whose items are labeled "1", "2", "3" and "4", with base address 0 and offset address [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]; the vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address is input directly to the address strobe of the first conflict processing unit while the vector address buffer caches it; the address strobe selects this vector address, and the conflict judgment unit examines it.

As shown in FIG. 19, several addresses in this group's vector address fall in the same Bank of the memory: for example, [1] and [68] correspond to the first and second cells of B1, and [2], [69] and [136] correspond to the first, second and third cells of B2. The vector address therefore contains a Bank conflict, and the conflict judgment unit generates a conflict flag valid signal. Under the control of this signal, the address strobe selects the vector address output by the vector address buffer and the write data strobe selects the data output by the write data buffer, so the vector address is held unchanged until conflict processing ends.

The write data reorganization unit then reorganizes the data, first compiling the data for the first accessed cell of each of B0-Bf into one row. In FIG. 19, the data for vector addresses [0, 1, 2, 3] corresponds to the first cells of B0-B3, the data for [71] to the second cell of B4, the data for [139] to the third cell of B5, and the data for [207] to the fourth cell of B6. The first row therefore consists of the data for vector addresses [0, 1, 2, 3, 71, 139, 207], that is, the 7 data items labeled "1".

In the same way, the data for the second accessed cell of each of B0-Bf is compiled into a row. In FIG. 19, the data for vector addresses [68, 69, 70], [138] and [206] corresponds to the second cells of B1-B3, the third cell of B4 and the fourth cell of B5, respectively. The second row therefore consists of the data for vector addresses [68, 69, 70, 138, 206], that is, the 5 data items labeled "2".

By analogy, two more rows of data are obtained for the third and fourth cells that the vector address accesses in banks B0-Bf. The third row includes the 3 data labeled "3", and the fourth row includes the 1 data labeled "4". The address mapping unit sequentially strobes the n groups of cells corresponding to the n rows of data.

In the example in Fig. 8, n=4, that is, the data is divided into four groups. In other examples, n can be other values, which depend on the vector address itself.

The write data reorganization unit writes the 4 rows of data into the memory in the strobe sequence of the 4 groups of cells; the written data is shown in FIG. 19. With this conflict resolution mechanism, the data for one group of cells is written in one clock cycle, so the whole write completes in four clock cycles.

It should be noted that multiple strobe sequences can be used in this embodiment. For example, the first to fourth groups of cells can be strobed in order (as shown in FIG. 19), the groups can be strobed in reverse order from the fourth to the first, or the four groups can be strobed in a random order.
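The grouping of cells into rows and the choice of strobe order can be sketched as follows. This is a hedged illustration, not the embodiment's hardware: the caller supplies a hypothetical `placement(addr) -> (bank, cell)` mapping, and "r-th cell" is taken to mean the r-th lowest cell that the vector address touches in a bank. The number of groups n returned is the largest number of distinct cells requested from any single bank, which is also the number of clock cycles the access needs.

```python
import random
from collections import defaultdict


def group_by_cell_rank(addresses, placement, order="forward", seed=None):
    """Split a vector address into n groups with at most one cell per bank.

    Group r collects, for every bank, the r-th lowest cell that the
    vector address touches in that bank, so each group can be served in
    one clock cycle. `order` selects forward, reverse, or random strobing.
    """
    if not addresses:
        return []
    members = defaultdict(list)   # (bank, cell) -> addresses in that cell
    for addr in addresses:
        members[placement(addr)].append(addr)
    cells = defaultdict(list)     # bank -> touched cells, ascending
    for bank, cell in sorted(members):
        cells[bank].append(cell)
    n = max(len(c) for c in cells.values())
    groups = []
    for rank in range(n):
        group = []
        for bank in sorted(cells):
            if rank < len(cells[bank]):
                group.extend(members[(bank, cells[bank][rank])])
        groups.append(group)
    if order == "reverse":
        groups.reverse()
    elif order == "random":
        random.Random(seed).shuffle(groups)
    return groups


# Hypothetical interleaved mapping: bank = a mod 16, cell = a div 16.
addrs = [0, 16, 32, 1, 17, 2]
print(group_by_cell_rank(addrs, lambda a: (a % 16, a // 16)))
# [[0, 1, 2], [16, 17], [32]]
```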

At this point, conflict processing for this group of data ends, and the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe gates the vector address of the next group of data sent by the first address calculation unit while the vector address buffer caches that vector address; the write data strobe gates the next group of data sent by the first data processing unit while the write data buffer caches that data; and the conflict judgment unit goes on to handle bank conflicts in the next group of data.

In the conflict resolution mechanism, when the conflict judgment unit finds no bank conflict in the vector address, it generates a conflict flag invalid signal; the address mapping unit maps the vector address to the physical address of the memory; the write data reorganization unit writes the data to that physical address; and, in response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data.

So far, the operation of writing data into the memory by the processor core through the first addressing unit is completed.

Similar to the read operation, in the addressing method of this embodiment, the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit. When there is a bank conflict in the vector address, the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict. During the processing of the Bank conflict, the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.

In the write operation, the first control unit of the first addressing unit also communicates with the processor core through the handshake protocol.

As shown in Figure 20, the processor core communicates with the first addressing unit through the system bus, which includes a clock signal line, a write request valid line, a write request ready line, a write request line, a write data line, and a write busy line. The first addressing unit is driven by the clock signal. When the processor core needs to write data to the memory, it sends task instructions and data to the first addressing unit through a handshake protocol. A high write request valid signal indicates that the write request signal and the write data are valid. When the write request valid signal and the write request ready signal are both high, the first control unit reads the write request from the processor core and pulls the write busy signal high. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to write the data to the memory, after which the write busy signal is pulled low.
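A cycle-level sketch of this valid/ready handshake is shown below. The timing is an assumption of this sketch, not taken from the embodiment: the first addressing unit holds write request ready high only while idle, accepts a request in the first cycle in which valid and ready are both high, and then keeps write busy high for a hypothetical `write_cycles` cycles while it writes.

```python
def simulate_write_handshake(valid_per_cycle, write_cycles=3):
    """Return (cycle, ready, busy, accepted) for each clock cycle."""
    trace = []
    busy_left = 0                      # cycles of writing still to do
    for cycle, valid in enumerate(valid_per_cycle):
        ready = busy_left == 0         # ready only while idle
        busy = busy_left > 0
        accepted = valid and ready     # transfer happens on valid AND ready
        trace.append((cycle, ready, busy, accepted))
        if busy_left:
            busy_left -= 1
        if accepted:
            busy_left = write_cycles   # busy goes high from the next cycle
    return trace


trace = simulate_write_handshake([False, True, True, True, True, False], write_cycles=2)
print([c for c, _, _, acc in trace if acc])    # [1, 4]
print([c for c, _, busy, _ in trace if busy])  # [2, 3, 5]
```

In this model the core simply keeps valid high until the unit is ready again, so a second request is accepted as soon as busy drops.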

In the write operation, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.

As shown in Figure 21, the processor core sends a write request through the system bus. If the write request is a table lookup request, the selector strobes the synchronization register, and the read/write request cache receives and caches the write request. The read/write request cache then sends an offset address request to the second addressing unit through the internal bus and passes the write request to the synchronization register. In response to the offset address request, the second addressing unit reads the offset address of the data from the memory, sends it to the first address calculation unit through the internal bus, and sends an offset address valid signal to the synchronization register through the internal bus. On receiving the offset address valid signal, the synchronization register sends the write request to the controllers of every pipeline stage to start the pipeline. The first-stage and second-stage controllers send control signals to the first address calculation unit and the first data processing unit, and to the first conflict processing unit, respectively. In the addressing process of the first addressing unit, the first data transceiving unit is in the same stage as the read/write request cache, the first address calculation unit and the first data processing unit are in the first stage of the pipeline, and the first conflict processing unit is in the second stage. If the write request is not a table lookup request, the selector strobes the read/write request cache, which sends the write request directly to the controllers of every pipeline stage to start the pipeline.

Similarly, a pipeline pause mechanism is also provided for the write operation. When the processor core cannot send data to the first data transceiving unit through the system bus, it sends a bus pause signal through the system bus to the read/write request cache, the synchronization register, and the first-stage and second-stage pipeline controllers, and on receiving this signal the pipeline pauses. When the first conflict processing unit finds a bank conflict in the vector address, it sends a conflict pause signal to the read/write request cache, the synchronization register, and the first-stage and second-stage pipeline controllers, and on receiving this signal the pipeline pauses. After the bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal, and on receiving it the read/write request cache, the synchronization register, and the first-stage and second-stage pipeline controllers restart the pipeline.
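The effect of the pause signals on a pipeline can be sketched with a simple shift-register model. This is an illustrative assumption, not the embodiment's controller logic: each request advances one stage per cycle, and in any cycle where a bus pause or conflict pause is asserted (listed in the hypothetical `paused_cycles` set) every stage holds its contents. Without pauses, G requests through an n-stage pipeline finish by cycle G + n - 1; each paused cycle delays everything behind it by one cycle.

```python
def run_pipeline(requests, n_stages, paused_cycles=frozenset()):
    """Return {request: cycle in which it leaves the last stage}."""
    stages = [None] * n_stages
    pending = list(requests)
    done = {}
    cycle = 0
    while pending or any(s is not None for s in stages):
        if cycle not in paused_cycles:       # pause: all stages hold
            leaving = stages[-1]
            if leaving is not None:
                done[leaving] = cycle
            # Shift every request forward one stage; admit a new one.
            stages = [pending.pop(0) if pending else None] + stages[:-1]
        cycle += 1
    return done


print(run_pipeline(["a", "b"], n_stages=2))                     # {'a': 2, 'b': 3}
print(run_pipeline(["a", "b"], n_stages=2, paused_cycles={1}))  # {'a': 3, 'b': 4}
```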

Before the second addressing unit reads the offset address from the memory, the processor core sends the offset address to the second addressing unit through the system bus, and the second addressing unit writes the offset address into the memory. The operation of the second addressing unit to write the offset address to the memory is similar to the operation of the above-mentioned first addressing unit to write data to the memory. As mentioned earlier, the structure of the second addressing unit is the same as that of the first addressing unit. When the second addressing unit writes an offset address to the memory, the offset address is equivalent to the written data. The second addressing unit can write the offset address into the memory by using the same operation as the operation of the first addressing unit to write data to the memory.

Another embodiment of the present disclosure provides an addressing method for a processor. In this embodiment, as shown in FIG. 22, the addressing module includes multiple groups of addressing units, and each group of addressing units may be a group of addressing units of the previous embodiment.

Each group of addressing units includes two identical addressing units, each of which communicates with the processor core through the system bus. An internal bus is provided in the addressing module, through which the two addressing units communicate. In the addressing method of this embodiment, one of the two addressing units reads and writes the data and may be called a table addressing unit; the other reads and writes the offset addresses of the data in the memory and may be called an offset addressing unit.

The addressing method of this embodiment can be executed in parallel by multiple groups of addressing units. Each group can communicate with the processor core through the system bus and read and write the memory, so when the processor core needs to read and write multiple groups of data at the same time, each group of addressing units completes its own addressing task independently. The number of groups is not limited in this embodiment and can be chosen according to actual requirements. Compared with a single group of addressing units, this embodiment can multiply the addressing throughput of the processor, greatly improving its addressing capability.

Yet another embodiment of the present disclosure provides an addressing method for a processor. In this embodiment, a group of addressing units obtains the base address or the offset address through the ping-pong addressing mode.

The following describes the acquisition of the offset address through the ping-pong addressing mode in conjunction with FIG. 23. As shown in FIG. 23, a group of addressing units includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit. Ping-pong addressing modes include:

The processor core alternately writes the offset address into the memory through the fourth addressing unit and the fifth addressing unit;

The third addressing unit obtains the base address sent by the processor core, and alternately obtains the offset address stored in the memory through the fourth addressing unit and the fifth addressing unit.

The third addressing unit serves as the table addressing unit, and the fourth and fifth addressing units serve as offset addressing units. When the processor core reads and writes multiple groups of data, it writes the next group of offset addresses into the memory through the fifth addressing unit while the third addressing unit sends an offset address request to the fourth addressing unit; on receiving the request, the fourth addressing unit reads the previous group of offset addresses from the memory and sends them to the third addressing unit. The roles of the fourth and fifth addressing units are then exchanged: the processor core writes the following group of offset addresses into the memory through the fourth addressing unit while the third addressing unit sends an offset address request to the fifth addressing unit, which reads the group of offset addresses it has just stored from the memory and sends them to the third addressing unit. By switching repeatedly in this way, the third addressing unit obtains offset addresses alternately from the fourth and fifth addressing units, realizing ping-pong addressing of the offset address.
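The ping-pong exchange of the fourth and fifth addressing units can be sketched as two alternating one-slot buffers. This is a sequential software model of an operation the hardware performs concurrently; the names `unit4` and `unit5` are hypothetical labels for the two offset addressing units. In each phase the core fills the idle unit with the next group of offsets while the third addressing unit reads the current group from the other unit.

```python
def ping_pong_offsets(offset_sets):
    """Return (phase, written_to, read_from, offsets_read) per phase."""
    units = ("unit4", "unit5")
    buffers = {}
    log = []
    if offset_sets:
        buffers[units[0]] = offset_sets[0]          # prime the first unit
    for i in range(len(offset_sets)):
        read_from = units[i % 2]
        write_to = None
        if i + 1 < len(offset_sets):
            write_to = units[(i + 1) % 2]
            buffers[write_to] = offset_sets[i + 1]  # core fills the idle unit
        log.append((i, write_to, read_from, buffers[read_from]))
    return log


for entry in ping_pong_offsets([[0, 1], [2, 3], [4, 5]]):
    print(entry)
# (0, 'unit5', 'unit4', [0, 1])
# (1, 'unit4', 'unit5', [2, 3])
# (2, None, 'unit4', [4, 5])
```

Because the unit being written is never the unit being read in the same phase, the third addressing unit always reads a complete group of offsets.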

The following describes how to obtain the base address through ping-pong addressing in conjunction with Figure 24. As shown in FIG. 24, a group of addressing units includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit. Ping-pong addressing modes include:

The processor core writes the offset address into the memory through the eighth addressing unit;

The sixth addressing unit and the seventh addressing unit alternately obtain the base address sent by the processor core, and obtain the offset address stored in the memory through the eighth addressing unit.

The sixth and seventh addressing units serve as table addressing units, and the eighth addressing unit serves as the offset addressing unit. When the processor core reads and writes multiple groups of data, it sends the next base address to the seventh addressing unit while the sixth addressing unit sends an offset address request to the eighth addressing unit; on receiving the request, the eighth addressing unit reads the offset address from the memory and sends it to the sixth addressing unit. The roles of the sixth and seventh addressing units are then exchanged: the processor core sends the next base address to the sixth addressing unit while the seventh addressing unit sends an offset address request to the eighth addressing unit, which reads the offset address from the memory and sends it to the seventh addressing unit. By switching repeatedly in this way, the sixth and seventh addressing units alternately obtain offset addresses from the eighth addressing unit, realizing ping-pong addressing of the base address.

It can be seen that, with the ping-pong addressing mode of this embodiment, three or more addressing units can perform the write and read operations of the base address and the offset address in parallel. This improves the addressing capability of the processor, and in large-scale look-up table addressing in particular it can greatly improve addressing efficiency.
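The alternation of the sixth and seventh addressing units can be sketched as a simple schedule. This is a hedged software model: `unit6` and `unit7` are hypothetical labels, the same offset vector (supplied through the eighth addressing unit) is assumed for every phase, and address arithmetic is plain integer addition. In phase i the active table unit forms base i + offsets while the core preloads base i+1 into the idle table unit.

```python
def ping_pong_bases(bases, offsets):
    """Return (active_unit, vector_address) for each base address."""
    units = ("unit6", "unit7")
    loaded = {}
    if bases:
        loaded[units[0]] = bases[0]              # first base goes to unit 6
    schedule = []
    for i in range(len(bases)):
        active = units[i % 2]
        idle = units[(i + 1) % 2]
        if i + 1 < len(bases):
            loaded[idle] = bases[i + 1]          # core preloads the next base
        # Offsets are assumed to arrive via the eighth addressing unit.
        vector = [loaded[active] + off for off in offsets]
        schedule.append((active, vector))
    return schedule


print(ping_pong_bases([0, 100], [0, 4]))
# [('unit6', [0, 4]), ('unit7', [100, 104])]
```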

Another embodiment of the present disclosure provides a processor. As shown in FIG. 2, the processor includes: a processor core, an addressing module, and a memory. The addressing module can be integrated inside the processor.

The addressing module is used to obtain the base address and offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;

The processor core can access the data at the storage address of the memory through the addressing module.

The addressing module may include one or more groups of addressing units. As shown in Figure 3, each group includes two identical addressing units, each of which communicates with the processor core through the system bus. An internal bus is provided in the addressing module, through which the two addressing units communicate. One of the two addressing units reads and writes the data and may be called a table addressing unit; the other reads and writes the offset addresses of the data in the memory and may be called an offset addressing unit. For convenience of description, the two addressing units are referred to as the first addressing unit and the second addressing unit, and the processor is described taking the first addressing unit as the table addressing unit and the second addressing unit as the offset addressing unit as an example. However, those skilled in the art should understand that the roles can be interchanged, that is, the first addressing unit may serve as the offset addressing unit and the second addressing unit as the table addressing unit.

In the processor of this embodiment, an addressing module is provided in the processor, and the storage address of the data in the memory is calculated by the addressing module instead of the processor core. Because the addressing operation is completed by the addressing module and the processor core does not participate in calculating the storage address, the efficiency of table look-up addressing is improved compared with ordinary processors.

When the processor core needs to read data from the memory, the first addressing unit is used to obtain the base address and offset address of the data in the memory.

Referring to FIG. 4, the first addressing unit includes: a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit.

The first control unit communicates with the processor core through the system bus and controls the operations of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. When the processor core needs to read data from the memory, it can send a read request to the first control unit via the system bus; in response, the first control unit can send an offset address request to the second addressing unit via the internal bus. In response to the offset address request, the second addressing unit can read the offset address of the data from the memory, send it to the first address calculation unit via the internal bus, and send an offset address valid signal to the first control unit via the internal bus. In response to the offset address valid signal, the first control unit can start the other units of the first addressing unit to perform the read operation.

The first address calculation unit is configured to receive the base address of the data sent by the processor core and the offset address sent by the second addressing unit, and to obtain the vector address of the data from the base address and the offset address. As shown in FIG. 5, the first address calculation unit includes: a base address selector, an offset address selector, and an adder.

The base address selector is used to select the base address sent by the processor core. The offset address selector is used to select the offset address sent by the second addressing unit through the internal bus. For look-up table addressing, the number of offset addresses corresponds to the number of banks in the memory. When the memory includes N banks, the number of offset addresses is N.

The first address calculation unit is also used to obtain the storage address of the data in the memory according to the base address and the offset address.

The adder of the first address calculation unit adds the base address to each of the N offset addresses to obtain the storage address of the data; this storage address is a vector address containing N addresses (16 addresses when N=16).

After the vector address is obtained, the processor core reads the data stored at the vector address from the memory through the first addressing unit. Even when the vector address has a bank conflict, the processor core can still read the data at the vector address through the first addressing unit.

The first conflict processing unit determines whether there is a bank conflict. When there is, it uses a conflict resolution mechanism to read the data at the vector address and send it to the first data processing unit; the first data processing unit processes the data and sends the processed data to the first data transceiving unit; and the first data transceiving unit sends the processed data to the processor core.
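The adder's role can be sketched in a few lines. This is a minimal model, assuming unbounded integer addresses (no wrap-around at the memory's address width) and N=16 banks; the offsets reused here are the ones from the FIG. 19 example.

```python
def vector_address(base, offsets, n_banks=16):
    """Add the base address to each of the N offsets (one per bank)."""
    if len(offsets) != n_banks:
        raise ValueError(f"expected {n_banks} offsets, got {len(offsets)}")
    return [base + off for off in offsets]


FIG19_OFFSETS = [0, 1, 2, 3, 68, 69, 70, 71,
                 136, 137, 138, 139, 204, 205, 206, 207]
print(vector_address(0, FIG19_OFFSETS) == FIG19_OFFSETS)  # True
print(vector_address(1000, FIG19_OFFSETS)[:4])            # [1000, 1001, 1002, 1003]
```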

In the conflict resolution mechanism, the vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer is used to buffer the vector address.

The address strobe is used to strobe the vector address directly sent by the first address calculation unit, so that the vector address is output to the conflict judgment unit.

The conflict judgment unit is used to judge the vector address:

When there is a bank conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe, so that the address strobe gates the vector address output by the vector address buffer. In this way, the vector address input to the conflict judgment unit remains unchanged during conflict processing.

The address mapping unit is used to map the vector address to the physical address of the memory.

The read data reorganization unit is used to read the data of the physical address, reorganize the data, and send the reorganized data to the first data processing unit.

Afterwards, the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe gates the vector address of the next group of data sent by the first address calculation unit, the vector address buffer caches that vector address at the same time, and the conflict judgment unit goes on to handle bank conflicts in the next group of data.

The address mapping unit groups together the first cell that the vector address accesses in each bank, groups together the second cell that the vector address accesses in each bank, and so on up to the nth cell, obtaining n groups of cells in total; it then strobes the n groups of cells in the memory in sequence.

The read data reorganization unit can read the vector address data and reorganize the data in the following ways:

According to the strobe sequence of the n groups of cells, the data stored in the n groups of cells is read in sequence and rearranged in ascending order of address to obtain the reorganized data. After reorganization, the data is arranged according to its actual storage location in the memory, and the read data reorganization unit sends the reorganized data to the first data processing unit for use by the processor core.

At this point, conflict processing for this group of data ends, and the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer caches that vector address at the same time, and the conflict judgment unit goes on to handle bank conflicts in the next group of data.

In the conflict resolution mechanism, when the conflict judgment unit finds no bank conflict in the vector address, it generates a conflict flag invalid signal. The address mapping unit maps the vector address to the physical address of the memory, and the read data reorganization unit reads the data at that physical address and sends it directly to the first data processing unit without reorganizing it. Since there is no bank conflict, the data can be read in one clock cycle. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data.

When the bit width of each bank is m bytes, the vector address is divided equally into m groups or 2×m groups. If each group of addresses corresponds to a single cell of a bank, the vector address is considered free of bank conflicts, and the first conflict processing unit performs address splicing on the vector address.
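The read-side reorganization can be sketched as follows. This is a hedged model: `groups` is the list of conflict-free address groups in strobe order (one per clock cycle), and a plain dict `memory` mapping address to value is a hypothetical stand-in for the banks. Because the result is sorted by address, it does not depend on whether the groups were strobed forward, in reverse, or at random.

```python
def read_and_reorganize(groups, memory):
    """Read groups in strobe order, then sort the results by address."""
    collected = []
    for group in groups:                 # one group per clock cycle
        for addr in group:
            collected.append((addr, memory[addr]))
    collected.sort(key=lambda pair: pair[0])  # small to large address
    return collected


memory = {a: 10 * a for a in (0, 1, 2, 16, 17, 32)}
print(read_and_reorganize([[32], [16, 17], [0, 1, 2]], memory))
# [(0, 0), (1, 10), (2, 20), (16, 160), (17, 170), (32, 320)]
```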

After the first data processing unit receives the data sent by the first conflict processing unit, it decides whether to process the data further according to the data width required by the processor core. When the processor core needs not all of the bytes read from each cell of each bank but only some of them, the first data processing unit splices the required bytes of the data sent by the first conflict processing unit to generate the data required by the processor core, and sends the spliced data to the first data transceiving unit.

Specifically, when the bit width of each bank is m bytes and the processor core needs k bytes out of every m bytes of the data sent by the first conflict processing unit, the first data processing unit selects those k bytes from every m bytes to obtain N×k bytes, where k ≤ log₂(m); it then packs the N×k bytes together m bytes at a time, obtaining (N×k)/m blocks, each m bytes wide (m×k blocks when N=16 and m=4).
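The byte selection and splicing can be sketched as follows. This is an illustrative model, assuming N cells of m bytes each (one cell per bank) and, by default, that the k required bytes are the lowest k bytes of each cell; the `select` parameter is a hypothetical way to name other byte positions. With N=16, m=4, and k=2, the 32 selected bytes form 8 blocks of 4 bytes.

```python
def splice_bytes(cells, k, m=4, select=None):
    """Keep k of every m bytes from each cell and regroup into m-byte blocks."""
    select = list(range(k)) if select is None else list(select)
    if len(select) != k:
        raise ValueError("select must name exactly k byte positions")
    kept = bytearray()
    for cell in cells:
        if len(cell) != m:
            raise ValueError("every cell must be m bytes wide")
        kept.extend(cell[i] for i in select)
    # N*k bytes packed into (N*k)/m blocks of m bytes each.
    return [bytes(kept[i:i + m]) for i in range(0, len(kept), m)]


cells = [bytes([4 * b, 4 * b + 1, 4 * b + 2, 4 * b + 3]) for b in range(16)]
blocks = splice_bytes(cells, k=2)
print(len(blocks))  # 8
print(blocks[0])    # b'\x00\x01\x04\x05'
```

If N×k is not a multiple of m, the last block in this sketch is simply shorter than m bytes.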

When the data required by the processor core is all the bytes read from each cell of each Bank, the first data processing unit does not need to splice the data sent by the first conflict processing unit, but directly sends the data to the first data Transceiver unit.

The first data transceiving unit includes a receiving buffer and a sending buffer. The sending buffer caches the data sent by the first data processing unit and sends it to the processor core through the system bus. The depths of the receiving buffer and the sending buffer can be set according to actual needs; the minimum depth of the receiving buffer is 2 and the minimum depth of the sending buffer is 0.

In the processor of this embodiment, the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit. When there is a bank conflict in the vector address, the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict. During the processing of the Bank conflict, the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.

The first addressing unit can obtain base addresses and offset addresses of multiple sets of data based on multiple different modes.

The first address calculation unit is used to obtain the base address sent by the processor core; the second addressing unit is used to sequentially read the offset address of each group of data in the memory; the first address calculation unit is used to obtain the reading of the second addressing unit The offset address taken.

As shown in Figure 5, when reading multiple sets of data, in the offset address update mode, the base address selector is used to select the base address sent by the processor core. Whenever a group of data is read, the offset address selector is used to select the offset address of the group of data sent by the second addressing unit through the internal bus. The adder is used to sum the base address and the offset address of the group of data to obtain the vector address of the group of data. The adder is used to send the vector address of the group of data to the first conflict resolution unit, and send the data to the processor core through the first conflict resolution unit, the first data processing unit, and the first data transceiver unit to complete the set of data Read. For each group of data, the offset address selector selects the offset address of the group of data sent by the second addressing unit through the internal bus, so as to realize the reading of multiple groups of data.

In addition to the base address update mode and the offset address update mode, the processor of this embodiment also provides a fixed offset address mode. As shown in Figure 5, in the fixed offset address mode, the first address calculation unit is used to obtain the base address sent by the processor core, and the base address selector is used to gate the base address sent by the processor core and change the base address Send to the adder. The processor core is also used to send a fixed offset address to the first addressing unit, and the offset address selector of the first address calculation unit is used to select the fixed offset address and send the fixed offset address to the adder . The adder of the first address calculation unit is used to add the base address and the offset address to obtain the vector address. The fixed offset address mode can be used in multiple addressing scenarios such as linear addressing and step addressing.
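Fixed offset patterns for the linear and stride addressing scenarios mentioned above can be sketched as follows. The specific offset vectors are assumptions of this sketch, not values given in the embodiment: linear addressing is modeled as offsets 0, 1, ..., N-1 and stride addressing as 0, s, 2s, ..., (N-1)s, with N=16 and no address wrap-around.

```python
def fixed_offsets(n_banks=16, stride=1):
    """Hypothetical fixed offset vector: linear (stride=1) or strided."""
    return [i * stride for i in range(n_banks)]


def fixed_offset_addresses(base, stride=1, n_banks=16):
    """Fixed offset address mode: base from the core, offsets held fixed."""
    return [base + off for off in fixed_offsets(n_banks, stride)]


print(fixed_offset_addresses(256)[:4])           # [256, 257, 258, 259]
print(fixed_offset_addresses(0, stride=4)[-2:])  # [56, 60]
```

Only the base address changes between accesses in this mode, so the core does not need the second addressing unit to fetch new offsets for each group.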

It can be seen that the processor of this embodiment provides an offset address update mode, a base address update mode, and a fixed offset address mode, which can be flexibly selected according to actual conditions, which improves the flexibility of table look-up addressing.

The processor of this embodiment has task-driven addressing capabilities. When the processor core executes the operation of reading data, the processor core generates a set of instructions to read data, which is equivalent to a task instruction, and sends the task instruction to the first addressing unit through the system bus, and the entire addressing The process is completed by the first addressing unit. The data read from the memory by the first addressing unit is sent to the processor core via the system bus. The processor core then performs subsequent operations after receiving the data. It can be seen that in this task-driven addressing of this embodiment, when the processor core needs to read data from the memory, the task instruction can be sent to the first addressing unit, and the processor core does not need to care about the specific addressing process. Even if the bank conflict occurs, it is handled by the first addressing unit. Compared with the general processor, the operation of the processor core is simplified and the efficiency is improved.

The processor core sends a read request through the system bus. If the read request is a table lookup request, the selector is used to gate the synchronization register. The read and write request cache is used to receive the read request and cache the read request. After receiving the read request, the read and write request buffer is used to send an offset address request to the second addressing unit through the internal bus, and send the read request to the synchronization register. In response to the offset address request, the second addressing unit is used to read the offset address of the data in the memory from the memory, and send the offset address to the first address calculation unit through the internal bus, and through the internal bus Send an offset address valid signal to the synchronization. After receiving the offset address valid signal, the synchronization register is used to send the read request signal to the pipeline controllers at all levels to start the pipeline operation. The first-level, second-level, third-level, fourth-level, and fifth-level controllers can respectively send control signals to the first address calculation, the first conflict processing unit, the first data processing unit, and the first data transceiving unit, In the addressing process of the first addressing unit, the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage of the pipeline, and the first data processing unit is located in the third and fourth stages of the pipeline. Stage, the first data transceiver unit is located at the fifth stage of the pipeline. If the read request is not a table lookup request, the selector is used to strobe the read and write request cache, send the read request directly to the pipeline controllers at all levels, and start the pipeline operation.

The first addressing unit of this embodiment also provides a pipeline pause mechanism. When the processor core cannot receive the data sent by the first data transceiving unit through the system bus, the processor core sends a bus pause signal through the system bus to the read request cache, the synchronization register, and the first- to fifth-stage controllers of the pipeline; after these receive the bus pause signal, the pipeline pauses. When the first conflict processing unit finds a bank conflict in the vector address, it sends a conflict pause signal to the read request cache, the synchronization register, and the first- to fifth-stage controllers of the pipeline; after these receive the conflict pause signal, the pipeline pauses. After the bank conflict has been processed, the first conflict processing unit sends a conflict recovery signal to the read request cache, the synchronization register, and the first- to fifth-stage controllers of the pipeline, which restart the pipeline after receiving the conflict recovery signal.
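A cycle-level toy model of the five-stage read pipeline and its pause mechanism might look like this. It assumes a single stall condition that freezes every stage at once (standing in for the bus pause and conflict pause signals) and that a request offered during a paused cycle must be offered again; `ReadPipeline`, `clock`, and `run` are hypothetical names.

```python
from typing import List, Optional, Tuple

STAGES = ["addr_calc", "conflict", "data_proc_1", "data_proc_2", "transceiver"]


class ReadPipeline:
    """Toy 5-stage read pipeline: stage 1 address calculation, stage 2 conflict
    processing, stages 3-4 data processing, stage 5 data transceiving."""

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * len(STAGES)

    def clock(self, new_req: Optional[str], bus_pause: bool, conflict_pause: bool) -> Optional[str]:
        """Advance one cycle and return the request leaving stage 5, if any."""
        if bus_pause or conflict_pause:
            # Every stage controller holds its contents; new_req is not accepted.
            return None
        out = self.slots[-1]
        # Shift each request one stage forward and admit new_req into stage 1.
        self.slots = [new_req] + self.slots[:-1]
        return out


def run(schedule: List[Tuple[Optional[str], bool]]) -> List[Tuple[int, str]]:
    """Drive the pipeline; each schedule entry is (request, conflict_pause)."""
    pipe = ReadPipeline()
    done = []
    for cycle, (req, pause) in enumerate(schedule):
        out = pipe.clock(req, bus_pause=False, conflict_pause=pause)
        if out is not None:
            done.append((cycle, out))
    return done


schedule = [("A", False), ("B", False), (None, True), ("C", False)] + [(None, False)] * 5
print(run(schedule))  # [(6, 'A'), (7, 'B'), (8, 'C')]
```

Here the single paused cycle delays every in-flight request by exactly one cycle; without it, request A would leave stage 5 in cycle 5.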

4, the second addressing unit includes: a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit, and a second data transceiving unit.

When the second addressing unit reads the offset address from the memory, the second conflict processing unit reads the offset address from the memory and sends it to the second data processing unit; the second data processing unit processes the offset address and sends the processed offset address to the second data transceiving unit; and the second data transceiving unit sends the processed offset address to the first addressing unit.

The second addressing unit differs from the first addressing unit in that its second data transceiving unit sends the processed offset address to the first addressing unit through the internal bus, rather than sending data to the processor core through the system bus as the first data transceiving unit does. Otherwise, the second control unit, the second address calculation unit, the second conflict processing unit, the second data processing unit, and the second data transceiving unit operate similarly to the corresponding units of the first addressing unit.

When the processor core writes data into the memory, part of the operation of the first addressing unit is similar to the read operation: the processor core writes the data to the vector address of the memory through the first addressing unit.

When there is a bank conflict, the first data transceiving unit receives the data sent by the processor core and sends it to the first data processing unit; the first data processing unit processes the data and sends the processed data to the first conflict processing unit; and the first conflict processing unit writes the data to the vector address using the conflict resolution mechanism.

The receiving buffer is used to receive and buffer the data sent by the processor core through the system bus, and send the data to the first data processing unit.

After receiving the data sent by the first data transceiving unit, the first data processing unit determines, according to the data width written by the processor core, whether the data needs further processing. When the processor core writes data into only some of the bytes of each cell of each bank, rather than into all of them, the first data processing unit splits the data sent by the first data transceiving unit to generate the data to be written into the memory, and sends the split data to the first conflict processing unit.

When the data sent by the first data transceiving unit comprises m×k blocks, each m bytes wide, the first data processing unit splits the m bytes of each block to obtain N×k bytes, so that each group of k bytes corresponds to k addresses of one bank, where N is the number of banks in the memory, the bit width of each bank is m bytes, and $k \le \log_2 m$.
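The byte bookkeeping of this split, and of the matching read-side splice, can be checked with a short sketch. Two assumptions are made that the text does not state: the k selected bytes are the low-order bytes of each m-byte bank cell, and N = m², which is what makes N×k bytes fill exactly m×k blocks of m bytes. The function names are hypothetical.

```python
from typing import List


def splice_from_banks(cells: List[bytes], m: int, k: int) -> List[bytes]:
    """Read side: keep k of the m bytes in each bank cell, then pack into m-byte blocks."""
    assert all(len(c) == m for c in cells), "each bank cell is m bytes wide"
    # Assumption: the k selected bytes are the low-order bytes of each cell.
    stream = b"".join(c[:k] for c in cells)  # N*k bytes
    return [stream[i:i + m] for i in range(0, len(stream), m)]


def split_for_banks(blocks: List[bytes], n_banks: int, m: int, k: int) -> List[bytes]:
    """Write side: unpack m-byte blocks into n_banks groups of k bytes, one per bank."""
    assert all(len(b) == m for b in blocks), "each block is m bytes wide"
    stream = b"".join(blocks)
    assert len(stream) == n_banks * k, "total payload must be N*k bytes"
    return [stream[i * k:(i + 1) * k] for i in range(n_banks)]


m, k = 4, 2      # k <= log2(m) holds: 2 <= 2
n_banks = m * m  # assumed N = m^2, so N*k/m = m*k blocks
cells = [bytes([4 * b, 4 * b + 1, 4 * b + 2, 4 * b + 3]) for b in range(n_banks)]
blocks = splice_from_banks(cells, m, k)
print(len(blocks))                                                       # 8 == m*k
print(split_for_banks(blocks, n_banks, m, k) == [c[:k] for c in cells])  # True
```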

As shown in FIG. 18, the first conflict processing unit further includes: a write data buffer, a write data strobe, and a write data reorganization unit.

The vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer is used to buffer the vector address.

The conflict judgment unit checks the vector address:

When there is a bank conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe and the write data strobe, so that the address strobe selects the vector address output by the vector address buffer and the write data strobe selects the data output by the write data buffer. In this way, the vector address input to the conflict judgment unit and the data input to the write data reorganization unit remain unchanged during conflict processing;

The write data reorganization unit is used to reorganize data;

The address mapping unit is used to map the vector address to the physical address of the memory, and the write data reorganization unit is used to write the reorganized data into the physical address of the memory.

Afterwards, the conflict judgment unit generates a conflict flag invalidation signal. In response to the conflict flag invalidation signal, the address selector selects the vector address of the next group of data sent by the first address calculation unit, and the vector address buffer simultaneously caches that vector address. The write data strobe selects the next group of data sent by the first data processing unit and caches it in the write data buffer. The conflict judgment unit then processes the next group of data.

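The write data reorganization unit reorganizes and writes the data as follows:

For each j from 1 to n, the data corresponding to the j-th cell addressed in each bank are compiled into one row in ascending address order, giving n rows of data;

The address mapping unit sequentially selects the n groups of cells corresponding to the n rows of data;

The write data reorganization unit sequentially writes the n rows of data into the n groups of cells, following the selection order of the n groups of cells.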

At this point, conflict processing for the group of data is complete, and the conflict judgment unit generates a conflict flag failure signal. In response to the conflict flag failure signal, the address selector selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer also caches that vector address, the write data strobe selects the next group of data sent by the first data processing unit and caches it in the write data buffer, and the conflict judgment unit continues to process any bank conflict in the next group of data.

In the conflict resolution mechanism, when the conflict judgment unit finds no bank conflict in the vector address, it generates a conflict flag failure signal; the address mapping unit maps the vector address to the physical address of the memory; the write data reorganization unit writes the data to the physical address; and, in response to the conflict flag failure signal, the address selector selects the vector address of the next group of data.
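The conflict judgment and the n-row reorganization for a write can be sketched in Python as below. This is a simplified software model, not the hardware: it assumes low-order interleaving (address `a` lives in bank `a % N`, row `a // N`), treats two different rows requested from the same bank as a bank conflict, and writes one row of data per memory pass so that no bank is touched twice in a pass. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bank_and_row(addr: int, n_banks: int) -> Tuple[int, int]:
    """Assumed low-order interleaving: consecutive addresses fall in consecutive banks."""
    return addr % n_banks, addr // n_banks


def has_bank_conflict(vector_addr: List[int], n_banks: int) -> bool:
    """A conflict exists when one bank must serve two or more different rows."""
    rows: Dict[int, set] = defaultdict(set)
    for a in vector_addr:
        bank, row = bank_and_row(a, n_banks)
        rows[bank].add(row)
    return any(len(r) > 1 for r in rows.values())


def plan_rows(vector_addr: List[int], data: List[int], n_banks: int) -> List[List[Tuple[int, int]]]:
    """Row j collects the j-th addressed cell of every bank, in ascending address order."""
    per_bank: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, v in zip(vector_addr, data):
        per_bank[bank_and_row(a, n_banks)[0]].append((a, v))
    for cells in per_bank.values():
        cells.sort()  # ascending address order inside each bank
    n = max(len(cells) for cells in per_bank.values())
    return [sorted(cells[j] for cells in per_bank.values() if len(cells) > j) for j in range(n)]


def write_vector(memory: List[int], vector_addr: List[int], data: List[int], n_banks: int) -> int:
    """Write data to vector_addr and return the number of memory passes used."""
    if has_bank_conflict(vector_addr, n_banks):
        rows = plan_rows(vector_addr, data, n_banks)  # conflict: n passes
    else:
        rows = [list(zip(vector_addr, data))]         # no conflict: one pass
    for row in rows:
        for a, v in row:  # each pass touches at most one cell per bank
            memory[a] = v
    return len(rows)


mem = [0] * 16
passes = write_vector(mem, [0, 4, 1, 2, 8, 3], [10, 11, 12, 13, 14, 15], n_banks=4)
print(passes, mem[:9])  # 3 [10, 12, 13, 15, 11, 0, 0, 0, 14]
```

With N = 4, bank 0 is asked for rows 0, 1, and 2, so this write needs three passes; a conflict-free vector such as [0, 1, 2, 3] needs one.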

Similar to the read operation, the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit. When there is a bank conflict in the vector address, the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict. During the processing of the Bank conflict, the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.

As shown in Figure 20, the processor core communicates with the first addressing unit through the system bus. The system bus includes a clock signal line and write request valid, write request ready, write request, write data, and write busy lines. The first addressing unit is driven by the clock signal. When the processor core needs to write data to the memory, it sends the task instruction and the data to the first addressing unit through a handshake protocol. A high write request valid signal indicates that the write request and the write data are valid; when the write request valid signal and the write request ready signal are both high, the first control unit reads the write request from the processor core and pulls the write busy signal high. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to write the data to the memory, after which the write busy signal is pulled low.
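The valid/ready handshake and the write busy line can be illustrated with a small cycle-based model. The fixed write latency, the rule that write request ready is high whenever the unit is idle, and the names `WriteSlave` and `clock` are assumptions for illustration; the description does not specify them.

```python
from typing import Dict, Optional, Tuple


class WriteSlave:
    """Toy model of the first control unit's side of the write handshake."""

    def __init__(self, write_latency: int = 2) -> None:
        self.write_latency = write_latency  # assumed cycles to complete one write
        self.busy = False                   # the write busy line
        self.remaining = 0
        self.pending: Optional[Tuple[int, int]] = None
        self.memory: Dict[int, int] = {}

    @property
    def ready(self) -> bool:
        # Assumption: write request ready is high whenever the unit is idle.
        return not self.busy

    def clock(self, valid: bool, addr: int, data: int) -> bool:
        """One rising clock edge; returns True if the write request is accepted."""
        if self.busy:
            self.remaining -= 1
            if self.remaining == 0:
                a, d = self.pending
                self.memory[a] = d  # data reaches memory
                self.busy = False   # write busy pulled low
            return False
        if valid and self.ready:    # valid and ready both high: handshake
            self.pending = (addr, data)
            self.remaining = self.write_latency
            self.busy = True        # write busy pulled high
            return True
        return False


slave = WriteSlave()
queue = [(0x10, 111), (0x14, 222)]
accepted_at = []
cycle = 0
while queue or slave.busy:
    addr, data = queue[0] if queue else (0, 0)
    # The core holds valid high until its request is accepted.
    if slave.clock(valid=bool(queue), addr=addr, data=data):
        accepted_at.append(cycle)
        queue.pop(0)
    cycle += 1
print(accepted_at, slave.memory)  # [0, 3] {16: 111, 20: 222}
```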

In the write operation, the first control unit is used to control the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.

As shown in Figure 21, the processor core sends a write request through the system bus. If the write request is a table lookup request, the selector gates the synchronization register. The read and write request cache receives and caches the write request, then sends an offset address request to the second addressing unit through the internal bus and forwards the write request to the synchronization register. In response to the offset address request, the second addressing unit reads the offset address of the data from the memory, sends it to the first address calculation unit through the internal bus, and sends an offset address valid signal to the synchronization register through the internal bus. After receiving the offset address valid signal, the synchronization register sends the write request to the pipeline controllers at all stages to start the pipeline operation. The first- and second-stage controllers send control signals to the first address calculation unit, the first data processing unit, and the first conflict processing unit. In the addressing process of the first addressing unit, the first data transceiving unit is located in the same stage as the read and write request cache, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage. If the write request is not a table lookup request, the selector gates the read and write request cache, which sends the write request directly to the pipeline controllers at all stages and starts the pipeline operation.

Similarly, a pipeline pause mechanism is also provided for the write operation. When the processor core cannot send data to the first data transceiving unit through the system bus, it sends a bus pause signal through the system bus to the read and write request cache, the synchronization register, and the first- and second-stage controllers of the pipeline; after these receive the bus pause signal, the pipeline pauses. When the first conflict processing unit finds a bank conflict in the vector address, it sends a conflict pause signal to the same components, and after they receive the conflict pause signal, the pipeline pauses. After the bank conflict has been processed, the first conflict processing unit sends a conflict recovery signal, and the read and write request cache, the synchronization register, and the first- and second-stage controllers of the pipeline restart the pipeline after receiving it.

Before the second addressing unit reads the offset address from the memory, the processor core may send the offset address to the second addressing unit through the system bus, and the second addressing unit is used to write the offset address into the memory. The operation of the second addressing unit to write the offset address to the memory is similar to the operation of the above-mentioned first addressing unit to write data to the memory.

The addressing module of this embodiment may include multiple groups of addressing units, which allow the processor to perform data read and write operations in parallel. Each group of addressing units communicates with the processor core through the system bus and reads and writes the memory. When the processor core needs to read and write multiple groups of data at the same time, each group of addressing units completes its own addressing task independently. The number of groups of addressing units is not limited in this embodiment and can be determined according to actual requirements. Compared with a single group of addressing units, this multiplies the addressing efficiency of the processor and greatly improves its addressing capability.

A group of addressing units in this embodiment can obtain the base address or the offset address in a ping-pong addressing mode. As shown in FIG. 23, a group of addressing units includes a third addressing unit, a fourth addressing unit, and a fifth addressing unit, and the ping-pong addressing mode includes:

The processor core can alternately write the offset address into the memory through the fourth addressing unit and the fifth addressing unit;

The third addressing unit is used to obtain the base address sent by the processor core, and alternately obtain the offset address stored in the memory through the fourth addressing unit and the fifth addressing unit.

The third addressing unit serves as a table addressing unit, and the fourth and fifth addressing units serve as offset addressing units. When the processor core reads and writes multiple sets of data, while one set of offset addresses is being written into the memory through the fifth addressing unit, the third addressing unit sends an offset address request to the fourth addressing unit, which, after receiving the request, reads the previous set of offset addresses from the memory and sends them to the third addressing unit. The roles of the fourth and fifth addressing units are then exchanged: the processor core writes the next set of offset addresses into the memory through the fourth addressing unit while the third addressing unit sends an offset address request to the fifth addressing unit, which, after receiving the request, reads the set of offset addresses written earlier through it from the memory and sends them to the third addressing unit. By repeatedly switching in this way, the third addressing unit alternately obtains offset addresses from the fourth and fifth addressing units, realizing ping-pong addressing of the offset address.
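The alternation between the fourth and fifth addressing units is a form of double buffering. The sketch below serializes what the hardware does concurrently: in each step the core writes the next offset set through one unit while the third addressing unit reads the previous set through the other, and then the roles swap. Loading the first set before the loop starts, and the function name `ping_pong_offsets`, are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def ping_pong_offsets(base: int, offset_sets: List[List[int]]) -> List[Tuple[str, List[int]]]:
    """Offset ping-pong: the core fills one buffer while the third unit drains the other."""
    buffers: Dict[str, Optional[List[int]]] = {"fourth": None, "fifth": None}
    writer, reader = "fifth", "fourth"
    buffers[reader] = offset_sets[0]  # assumption: the first set is loaded up front
    log = []
    for next_set in offset_sets[1:] + [None]:
        if next_set is not None:
            buffers[writer] = next_set   # core writes the next set via 'writer'
        offsets = buffers[reader]        # third unit reads the previous set via 'reader'
        log.append((reader, [base + o for o in offsets]))
        writer, reader = reader, writer  # the two offset units swap roles
    return log


for unit_name, vector in ping_pong_offsets(100, [[0, 1], [4, 5], [8, 9]]):
    print(unit_name, vector)
# fourth [100, 101]
# fifth [104, 105]
# fourth [108, 109]
```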

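In another example, a group of addressing units includes a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit, and the ping-pong addressing mode includes:

The processor core can write the offset address into the memory through the eighth addressing unit;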

The sixth addressing unit and the seventh addressing unit can alternately obtain the base address sent by the processor core, and can obtain the offset address stored in the memory through the eighth addressing unit.

The sixth and seventh addressing units serve as table addressing units, and the eighth addressing unit serves as an offset addressing unit. When the processor core reads and writes multiple sets of data, while a base address is being sent to the seventh addressing unit, the sixth addressing unit sends an offset address request to the eighth addressing unit, which, after receiving the request, reads the offset address from the memory and sends it to the sixth addressing unit. The roles of the sixth and seventh addressing units are then exchanged: the processor core sends the next base address to the sixth addressing unit while the seventh addressing unit sends an offset address request to the eighth addressing unit, which reads the offset address from the memory and sends it to the seventh addressing unit. By repeatedly switching in this way, the sixth and seventh addressing units alternately obtain the offset address from the eighth addressing unit, realizing ping-pong addressing of the base address.
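In the base-address variant the alternating side is different: the two table addressing units take turns receiving a base address, while the single offset unit serves offsets to whichever unit is active. The following is a minimal sketch under the same serialized-timing assumption as a plain double-buffer model, additionally assuming that the eighth unit returns the same offsets for every base; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def ping_pong_bases(bases: List[int], offsets: List[int]) -> List[Tuple[str, List[int]]]:
    """Base ping-pong: one table unit receives the next base while the other is served offsets."""
    held: Dict[str, Optional[int]] = {"sixth": None, "seventh": None}
    loader, active = "seventh", "sixth"
    held[active] = bases[0]  # assumption: the first base is delivered up front
    log = []
    for next_base in bases[1:] + [None]:
        if next_base is not None:
            held[loader] = next_base  # core sends the next base to the idle unit
        # The eighth unit returns the offsets to the active unit.
        log.append((active, [held[active] + o for o in offsets]))
        loader, active = active, loader  # the two table units swap roles
    return log


print(ping_pong_bases([1000, 2000, 3000], [0, 8]))
# [('sixth', [1000, 1008]), ('seventh', [2000, 2008]), ('sixth', [3000, 3008])]
```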

Another embodiment of the present disclosure also provides a movable platform. The movable platform includes a fuselage; the fuselage includes at least one circuit; and the circuit includes at least one processor of the above-mentioned embodiments.

The movable platform can be any movable vehicle or carrier, such as but not limited to: robots, drones, unmanned vehicles, unmanned ships, etc. Taking a drone as an example, referring to Figure 25, the body of the drone may have a shell. The housing may be formed of a single integral piece, two integral pieces, or multiple parts. The housing may include a single cavity or multiple cavities. For each cavity, one or more components can be placed in the cavity. The component may be, for example, at least one circuit board, one or more sensors, one or more communication units, or any other type of component. Each circuit board may include one or more processors of the foregoing embodiments, and the processors are used to perform functions such as flight control, navigation, and image processing.

Another embodiment of the present disclosure also provides an electronic device. The electronic device includes: a housing; the housing is provided with: at least one circuit; the circuit includes: at least one processor as described in the foregoing embodiment.

The electronic device of this embodiment, as shown in FIG. 26, may be a remote control, especially a remote control of a movable platform. The electronic device can also be any portable or non-portable device, such as, but not limited to: smartphones/mobile phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (for example, watches, glasses, gloves, headwear), gesture recognition devices, microphones, and equipment capable of providing or rendering image data.

Another embodiment of the present disclosure also provides a computer-readable storage medium storing executable instructions which, when executed by one or more processors, cause the one or more processors to perform the addressing method of the foregoing embodiments.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is only an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; where no conflict arises, the features in the embodiments of the present disclosure can be combined arbitrarily; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims

1. An addressing method for a processor, wherein the processor includes: a processor core, an addressing module, and a memory; the addressing method includes:

The addressing module obtains the base address and the offset address of the data in the memory;

The addressing module obtains the storage address of the data in the memory according to the base address and the offset address; and

The processor core accesses the data of the storage address through the addressing module.
2. The addressing method of the processor according to claim 1, wherein the addressing module comprises at least one group of addressing units, and the addressing method is executed by the at least one group of addressing units.
3. The addressing method of a processor according to claim 2, wherein the group of addressing units comprises at least a first addressing unit, and the storage address is a vector address;

The access by the processor core to the data of the storage address through the addressing module includes:

The processor core accesses the data of the vector address through the first addressing unit; when the vector address has a memory block conflict, the first addressing unit uses a conflict resolution mechanism to access the data.
4. The processor addressing method of claim 3, wherein the addressing method further comprises, before the addressing module obtains the base address and the offset address of the data in the memory:

In response to the code for accessing the data, the processor core generates a task instruction for accessing the data, and sends the task instruction to the first addressing unit.
5. The addressing method for a processor according to claim 4, wherein the first addressing unit comprises: a first address calculation unit, a first conflict processing unit, a first data processing unit, and a first data transceiving unit.
6. The addressing method of the processor according to claim 5, wherein:

The access by the processor core to the data of the storage address through the first addressing unit includes:

The first conflict processing unit reads the data from the vector address by using the conflict resolution mechanism, and sends the data to the first data processing unit;

The first data processing unit processes the data, and sends the processed data to the first data transceiving unit;

The first data transceiving unit sends the processed data to the processor core.
7. The addressing method of a processor according to claim 6, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit;

The conflict resolution mechanism includes:

The address strobe strobes the vector address so that the vector address is output to the conflict judgment unit;

When there is a memory block conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal;

In response to the conflict flag valid signal, the address strobe keeps selecting the vector address; the address mapping unit maps the vector address to the physical address of the memory; and the read data reorganization unit reads and reorganizes the data at the physical address and sends the reorganized data to the first data processing unit;

The conflict judgment unit generates a conflict flag failure signal;

In response to the conflict flag failure signal, the address selector selects the vector address of the next set of data.
8. The addressing method of the processor according to claim 7, wherein:

The mapping of the vector address to the physical address of the memory by the address mapping unit includes:

The first storage unit of each storage block corresponding to the vector address is grouped into one group, the second storage unit of each storage block corresponding to the vector address into another group, and so on until the nth storage unit of each storage block corresponding to the vector address is grouped, giving n groups of storage units in total; the n groups of storage units are selected in sequence;

The read data reorganization unit reads the data of the vector address and reorganizes the data, including:

According to the selection sequence of the n groups of storage units, the data stored in the n groups of storage units are read in sequence and rearranged in ascending address order to obtain the reorganized data.
9. The addressing method of the processor according to claim 7, wherein:

When there is no storage block conflict in the vector address, the conflict judgment unit generates a conflict flag failure signal;

The address mapping unit maps the vector address to the physical address of the memory;

The read data reorganization unit reads the data of the physical address, and sends the read data to the first data processing unit;

In response to the conflict flag failure signal, the address selector selects the vector address of the next set of data.
10. The addressing method of the processor according to claim 9, wherein the absence of a storage block conflict in the vector address comprises:

Dividing the vector addresses into m groups or 2×m groups evenly, and each group address corresponds to one storage unit of one storage block of the memory;

Wherein, the bit width of the storage block is m bytes.
11. The addressing method of the processor according to claim 5, wherein:

The processor core accessing the data of the storage address through the first addressing unit includes:

The first data transceiver unit receives the data sent by the processor core, and sends the data to the first data processing unit;

The first data processing unit processes the data, and sends the processed data to the first conflict processing unit;

The first conflict processing unit uses the conflict resolution mechanism to write the data into the vector address.
12. The addressing method of a processor according to claim 11, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, a data strobe, and a write data reorganization unit;

The conflict resolution mechanism includes:

The address strobe strobes the vector address so that the vector address is output to the conflict judgment unit;

The data strobe strobes the data so that the data is output to the write data recombination unit;

When there is a memory block conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal;

In response to the conflict flag valid signal, the address strobe keeps gating the vector address, and the data strobe keeps gating the data;

The write data reorganization unit reorganizes the data, the address mapping unit maps the vector address to a physical address of the memory, and the write data reorganization unit writes the reorganized data into the memory;

The conflict judgment unit generates a conflict flag failure signal;

In response to the conflict flag failure signal, the data strobe strobes the next group of data, and the address selector strobes the vector address of the next group of data.
13. The addressing method of the processor according to claim 12, wherein:

The reorganization of the data by the write data reorganization unit includes:

Determine the first storage unit of each storage block corresponding to the vector address, and compile the data corresponding to the first storage units into one row in ascending address order;

Determine the second storage unit of each storage block corresponding to the vector address, and compile the data corresponding to the second storage units into one row in ascending address order;

Continue until the nth storage unit of each storage block corresponding to the vector address is determined and the data corresponding to the nth storage units are compiled into one row in ascending address order, giving n rows of data in total;

The address mapping unit mapping the vector address to the physical address of the memory includes:

The address mapping unit sequentially selects n groups of storage units corresponding to the n rows of data;

The writing data reorganization unit to write the reorganized data into the memory includes:

According to the strobe sequence of the n groups of memory cells, sequentially write the n rows of data into the n groups of memory cells.
14. The addressing method of the processor according to claim 12, wherein:

When there is no storage block conflict in the vector address, the conflict judgment unit generates a conflict flag failure signal;

The address mapping unit maps the vector address to the physical address of the memory;

The write data reorganization unit writes the data into the physical address;

In response to the conflict flag failure signal, the address selector selects the vector address of the next set of data.
15. The processor addressing method of claim 6, wherein the processing of the data by the first data processing unit comprises:

For the data sent by the first conflict processing unit, the first data processing unit splices some bytes therein.
16. The addressing method of the processor according to claim 15, wherein the splicing of some bytes therein by the first data processing unit includes:

For the data sent by the first conflict processing unit, when k bytes are to be read from every m bytes, the k bytes are selected from every m bytes to obtain N×k bytes, where $k \le \log_2 m$;

Combine each m bytes of the N×k bytes together to obtain m×k blocks of data with a width of m bytes each;

Wherein, N is the number of storage blocks of the memory; the bit width of the storage block is m bytes.
17. The addressing method of the processor according to claim 11, wherein the processing of the data by the first data processing unit comprises:

The first data processing unit splits the data.
18. The addressing method for a processor according to claim 17, wherein the splitting of the data by the first data processing unit comprises:

When the data includes m×k blocks and the width of each block is m bytes, the m bytes of each block are split to obtain N×k bytes, so that each k bytes correspond to k addresses of one storage block;

Wherein, N is the number of storage blocks of the memory; the bit width of the storage block is m bytes; k≤log 2 m .
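The splitting step is the inverse of splicing. The Python sketch below is illustrative only and assumes that piece j goes to storage block j and is placed at a hypothetical byte `offset` inside that block's m-byte word; the `fill` value for untouched bytes is also an assumption (real hardware would more likely use byte enables).

```python
import math


def split(blocks: list[bytes], m: int, k: int) -> list[bytes]:
    """Split m-byte blocks into k-byte pieces, one piece per storage block."""
    if k > math.log2(m):
        raise ValueError("the claim requires k <= log2(m)")
    if any(len(b) != m for b in blocks):
        raise ValueError("every block must be m bytes wide")
    stream = b"".join(blocks)
    return [stream[i:i + k] for i in range(0, len(stream), k)]


def scatter(pieces: list[bytes], m: int, offset: int = 0, fill: int = 0) -> list[bytes]:
    """Place each k-byte piece at k byte addresses of its own m-byte word."""
    words = []
    for piece in pieces:
        word = bytearray([fill]) * m
        word[offset:offset + len(piece)] = piece
        words.append(bytes(word))
    return words


pieces = split([bytes([0, 1, 4, 5]), bytes([8, 9, 12, 13])], m=4, k=2)
words = scatter(pieces, m=4)
```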
The addressing method of the processor according to claim 3, wherein:

The acquisition of the base address and offset address of data in the memory by the addressing module includes:

The first addressing unit obtains the base address and the offset address of multiple sets of the data based on multiple modes.
The addressing method of the processor according to claim 19, wherein the multiple modes include at least: an offset address update mode and a base address update mode.
The addressing method for a processor according to claim 20, wherein the group of addressing units at least further comprises: a second addressing unit;

The offset address update mode includes:

The first address calculation unit obtains the base address sent by the processor core;

The second addressing unit sequentially reads the offset address of each group of the data in the memory;

The first address calculation unit obtains the offset address read by the second addressing unit.
The addressing method of the processor according to claim 21, wherein:

The addressing module obtains the storage address of the data in the memory according to the base address and the offset address, including:

The first address calculation unit sequentially adds the base address sent by the processor core and the offset address of each group of data to obtain the vector address of each group of data.
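The offset address update mode amounts to a table lookup: one base address is held while a new group of offsets arrives for each group of data. In this Python sketch, `offset_table` is a hypothetical stand-in for the offsets that the second addressing unit reads from memory, assumed to hold one list of per-lane offsets per group.

```python
def offset_update_mode(base: int, offset_table: list[list[int]]) -> list[list[int]]:
    """Offset address update mode: fixed base, a new offset group each time."""
    vector_addresses = []
    for offsets in offset_table:  # second addressing unit: next group of offsets
        # First address calculation unit: base + offset for every lane.
        vector_addresses.append([base + off for off in offsets])
    return vector_addresses


addrs = offset_update_mode(0x1000, [[0, 7, 3, 12], [5, 5, 9, 1]])
```

The irregular offsets (7, 3, 12, ...) are the point of the mode: access patterns that no simple stride can describe are expressed as table entries instead of as address arithmetic in the processor core.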
The addressing method for a processor according to claim 20, wherein the group of addressing units at least further comprises: a second addressing unit;

The base address update mode includes:

The first address calculation unit sequentially obtains the base address update value of each group of the data sent by the processor core;

The second addressing unit cyclically reads the same offset address of each group of the data in the memory;

The first address calculation unit sequentially adds the base address update value of each group of data to the base address of the previous group of data to obtain the base address of each group of data.
The addressing method of the processor according to claim 23, wherein:

The addressing module obtains the storage address of the data in the memory according to the base address and the offset address, including:

The first address calculation unit sequentially adds the base address of each group of data to the same offset address to obtain the vector address of each group of data.
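The base address update mode reuses one offset group while the base moves. The Python sketch below assumes a hypothetical `initial_base` as the value before the first update, since the claim does not state how the first base is set.

```python
def base_update_mode(initial_base: int, base_updates: list[int],
                     shared_offsets: list[int]) -> list[list[int]]:
    """Base address update mode: accumulate the base, reuse the same offsets."""
    base = initial_base
    vector_addresses = []
    for delta in base_updates:
        base += delta  # add this group's update value to the previous base
        vector_addresses.append([base + off for off in shared_offsets])
    return vector_addresses


# The same 2x2 pattern of offsets applied at three successive base addresses.
addrs = base_update_mode(0x2000, [0, 0x40, 0x40], [0, 1, 8, 9])
```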
The addressing method of the processor according to claim 21 or 23, wherein the addressing method further comprises:

The second addressing unit obtains the offset address sent by the processor core, and writes the offset address into the memory.
The addressing method for a processor according to claim 25, wherein the second addressing unit comprises: a second conflict processing unit, a second data processing unit, and a second data transceiving unit;

The reading of the offset address of the data in the memory by the second addressing unit includes:

The second conflict processing unit reads the offset address from the memory by using the conflict resolution mechanism, and sends the offset address to the second data processing unit;

The second data processing unit processes the offset address, and sends the processed offset address to the second data transceiving unit;

The second data transceiving unit sends the processed offset address to the first addressing unit.
The addressing method of the processor according to claim 3, wherein:

The acquisition of the base address and offset address of data in the memory by the addressing module includes:

The first address calculation unit obtains the base address sent by the processor core;

The first address calculation unit obtains the offset address sent by the processor core;

The addressing module obtains the storage address of the data in the memory according to the base address and the offset address, including:

The first address calculation unit adds the base address and the offset address to obtain the vector address.
The addressing method of the processor according to claim 2, wherein the addressing module comprises: multiple groups of addressing units; and the addressing method is executed in parallel by the multiple groups of addressing units.
The addressing method of the processor according to claim 2, wherein the group of addressing units obtains the base address or the offset address through a ping-pong addressing mode.
The addressing method for a processor according to claim 29, wherein the group of addressing units at least includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit;

The group of addressing units obtains the offset address in a ping-pong addressing manner, including:

The processor core alternately writes the offset address into the memory through the fourth addressing unit and the fifth addressing unit;

The third addressing unit obtains the base address sent by the processor core, and alternately obtains, through the fourth addressing unit and the fifth addressing unit, the offset address stored in the memory.
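The ping-pong offset scheme can be modeled as two buffers used alternately. In this Python sketch the class and method names are hypothetical, buffer 0 is assumed to stand for the fourth addressing unit and buffer 1 for the fifth, and the model does not stop the core from overwriting a buffer that has not yet been read.

```python
class PingPongOffsets:
    """Two offset buffers filled and consumed alternately (illustrative model)."""

    def __init__(self) -> None:
        self.buffers = [None, None]  # 0 ~ fourth unit, 1 ~ fifth unit (assumed)
        self.write_sel = 0  # buffer the processor core fills next
        self.read_sel = 0   # buffer the third addressing unit reads next

    def core_write(self, offsets: list[int]) -> None:
        self.buffers[self.write_sel] = list(offsets)
        self.write_sel ^= 1

    def third_unit_read(self, base: int) -> list[int]:
        offsets = self.buffers[self.read_sel]
        if offsets is None:
            raise RuntimeError("buffer not yet filled by the processor core")
        self.read_sel ^= 1
        return [base + off for off in offsets]


pp = PingPongOffsets()
pp.core_write([0, 1])
pp.core_write([10, 11])
first = pp.third_unit_read(100)
pp.core_write([20, 21])  # refills buffer 0 while buffer 1 is still pending
second = pp.third_unit_read(100)
third = pp.third_unit_read(100)
```

The benefit of the alternation is that the core can prepare the next offset table while the addressing unit is still consuming the current one.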
The addressing method for a processor according to claim 29, wherein the group of addressing units at least includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit;

The group of addressing units acquiring the base address in a ping-pong addressing manner includes:

The processor core writes the offset address into the memory through the eighth addressing unit;

The sixth addressing unit and the seventh addressing unit alternately obtain the base address sent by the processor core, and obtain, through the eighth addressing unit, the offset address stored in the memory.
The addressing method for a processor according to claim 5, wherein the first addressing unit further comprises: a first control unit;

The addressing method further includes:

The first control unit communicates with the processor core through a handshake protocol.
The addressing method of the processor according to claim 32, wherein the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipelined manner.
The addressing method for a processor according to claim 33, wherein the processor core accessing the data of the vector address through the first addressing unit comprises:

The processor core reads the data from the vector address through the first addressing unit, wherein the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage of the pipeline, the first data processing unit is located in the third and fourth stages of the pipeline, and the first data transceiving unit is located in the fifth stage of the pipeline.
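The five-stage read pipeline can be visualized with a small occupancy table. This Python sketch assumes one read request enters per cycle and that no request stalls; a request that hits a storage block conflict would hold the second stage for several cycles, which the sketch does not model.

```python
READ_STAGES = [
    "address calculation",  # stage 1
    "conflict processing",  # stage 2
    "data processing",      # stage 3
    "data processing",      # stage 4
    "data transceiving",    # stage 5
]


def read_pipeline_schedule(num_requests: int) -> list[list]:
    """For each cycle, which request index occupies each stage (None if empty)."""
    depth = len(READ_STAGES)
    schedule = []
    for cycle in range(num_requests + depth - 1):
        row = []
        for stage in range(depth):
            req = cycle - stage  # request issued `stage` cycles ago
            row.append(req if 0 <= req < num_requests else None)
        schedule.append(row)
    return schedule


for cycle, row in enumerate(read_pipeline_schedule(3)):
    print(cycle, row)
```

Once the pipeline is full, one result leaves the fifth stage per cycle even though each request takes five cycles end to end.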
The addressing method of the processor according to claim 34, wherein the first control unit comprises: a read and write request cache;

The access by the processor core to the data of the vector address through the first addressing unit includes:

The processor core writes the data into the vector address through the first addressing unit, wherein the first data transceiving unit and the read-write request cache are at the same level, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline.
The addressing method for a processor according to claim 32, wherein the handshake protocol comprises:

When the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core;

When the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiver unit to send the data to the processor core.
The addressing method for a processor according to claim 32, wherein the handshake protocol comprises:

When the write request valid signal and the write request ready signal are both high, the first control unit reads the write request and data from the processor core, and the write busy signal is pulled high;

The first control unit writes the data into the memory, and the write busy signal is pulled low.
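The read and write handshakes follow the familiar valid/ready rule: a transfer happens only in a cycle where both signals are high. The Python sketch below models that rule and the write busy signal; the class and method names are hypothetical, and it assumes an accepted write reaches memory exactly one cycle after the handshake.

```python
def handshake_fires(valid, ready):
    """Cycles in which a valid/ready handshake completes (both signals high)."""
    return [t for t, (v, r) in enumerate(zip(valid, ready)) if v and r]


class WriteChannel:
    """Illustrative cycle model of the write handshake and write busy signal."""

    def __init__(self):
        self.memory = {}
        self.write_busy = False
        self._pending = None

    def step(self, wr_valid, wr_ready, addr=None, data=None):
        # Finish the write accepted in the previous cycle, then pull busy low.
        if self._pending is not None:
            a, d = self._pending
            self.memory[a] = d
            self._pending = None
            self.write_busy = False
        # Accept a new write request and its data when valid and ready are high.
        if wr_valid and wr_ready:
            self._pending = (addr, data)
            self.write_busy = True


fired = handshake_fires([1, 1, 0, 1], [0, 1, 1, 1])
ch = WriteChannel()
ch.step(True, True, addr=0x10, data=7)
busy_after_accept = ch.write_busy
ch.step(False, True)
```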
A processor, characterized in that the processor includes: a processor core, an addressing module, and a memory;

The addressing module is configured to obtain the base address and the offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;

The processor core may access the data of the memory at the storage address through the addressing module.
The processor of claim 38, wherein the addressing module comprises: at least one set of addressing units; the at least one set of addressing units is used to perform operations of the addressing module.
The processor according to claim 39, wherein the set of addressing units at least comprises: a first addressing unit; and the storage address is a vector address;

The processor core may access the data of the vector address through the first addressing unit; when there is a storage block conflict in the vector address, the first addressing unit may use a conflict resolution mechanism to access the data.
The processor of claim 40, wherein:

The processor core may also generate a task instruction for accessing the data in response to the code for accessing the data, and send the task instruction to the first addressing unit.
The processor of claim 41, wherein the first addressing unit comprises: a first address calculation unit, a first conflict processing unit, a first data processing unit, and a first data transceiving unit.
The processor of claim 42, wherein:

The first conflict processing unit may use the conflict resolution mechanism to read the data from the vector address, and send the data to the first data processing unit;

The first data processing unit is configured to process the data, and send the processed data to the first data transceiving unit;

The first data transceiver unit is configured to send the processed data to the processor core.
The processor of claim 43, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, and a read data recombination unit;

In the conflict resolution mechanism:

The address strobe is used to gate the vector address so that the vector address is output to the conflict judgment unit;

The conflict judgment unit is configured to generate a conflict flag valid signal when there is a storage block conflict in the vector address;

In response to the conflict flag valid signal, the address strobe is also used to keep strobing the vector address, the address mapping unit is used to map the vector address to the physical address of the memory, and the read data reorganization unit is configured to read the data at the physical address, reorganize the data, and send the reorganized data to the first data processing unit;

The conflict judgment unit is also used to generate a conflict flag failure signal;

In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
The processor of claim 44, wherein:

The address mapping unit is also used to group the first storage units of the storage blocks corresponding to the vector address into one group, the second storage units corresponding to the vector address into another group, and so on until the n-th storage units corresponding to the vector address are grouped into one group, obtaining n groups of storage units in total, and to strobe the n groups of storage units sequentially;

The read data reorganization unit is also used to read the data stored in the n groups of storage units in sequence according to the strobe sequence of the n groups of storage units, and to rearrange the data stored in the n groups of storage units in ascending order of address to obtain the reorganized data.
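The read-side conflict resolution can be sketched as follows. This Python model assumes a word-interleaved layout (word `a` lives in storage block `a % num_banks`, at storage unit `a // num_banks`), which the claims do not specify, and it reads each distinct address once even if several lanes request it. Group i collects the i-th storage unit touched in every storage block; the n groups are strobed one per step and the results are sorted by ascending address.

```python
from collections import defaultdict


def resolve_read(vector_addr, memory, num_banks):
    """Serve a vector read that may hit storage block conflicts.

    Returns (n, data): n strobes used, data in ascending address order.
    """
    per_bank = defaultdict(list)
    for addr in sorted(set(vector_addr)):  # distinct addresses, ascending
        per_bank[addr % num_banks].append(addr)
    # n = the largest number of distinct storage units needed in one block.
    n = max((len(addrs) for addrs in per_bank.values()), default=0)
    collected = []
    for i in range(n):  # strobe group i: the i-th storage unit of every block
        group = [addrs[i] for addrs in per_bank.values() if i < len(addrs)]
        collected.extend((addr, memory[addr]) for addr in group)
    collected.sort()  # reorganize in ascending address order
    return n, [value for _, value in collected]


memory = [100 + a for a in range(16)]
n, data = resolve_read([8, 0, 4, 1], memory, num_banks=4)
```

Addresses 0, 4 and 8 all fall in storage block 0, so this access needs three strobes; a conflict-free access such as `[0, 1, 2, 3]` needs only one.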
The processor of claim 44, wherein:

The conflict judgment unit is further configured to generate a conflict flag failure signal when there is no storage block conflict in the vector address;

The address mapping unit is further configured to map the vector address to the physical address of the memory;

The read data recombination unit is further configured to read the data of the physical address, and send the read data to the first data processing unit;

In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
The processor according to claim 46, wherein the absence of a storage block conflict in the vector address comprises:

The vector addresses are divided evenly into m groups or 2×m groups, and the addresses of each group correspond to one storage unit of one storage block of the memory;

Wherein, the bit width of the storage block is m bytes.
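A conflict check of the kind the conflict judgment unit performs can be sketched as below. Note that this does not model the claim's own criterion (even division of the vector addresses into m or 2×m groups); it swaps in a conventional bank-conflict test, under which a conflict exists when two lanes need different storage units of the same storage block. The word-interleaved mapping is likewise an assumption.

```python
def has_bank_conflict(vector_addr, num_banks):
    """True if two lanes need different storage units of one storage block."""
    rows_per_bank = {}
    for addr in vector_addr:
        bank, row = addr % num_banks, addr // num_banks  # assumed mapping
        rows_per_bank.setdefault(bank, set()).add(row)
    return any(len(rows) > 1 for rows in rows_per_bank.values())


flag_valid = has_bank_conflict([0, 4], num_banks=4)        # same block, two rows
flag_invalid = has_bank_conflict([0, 1, 2, 3], num_banks=4)
```

Lanes that request the very same address (e.g. `[5, 5]`) are not treated as a conflict here, since one read of that storage unit can serve both.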
The processor of claim 42, wherein:

The first data transceiving unit is configured to receive the data sent by the processor core, and send the data to the first data processing unit;

The first data processing unit is configured to process the data, and send the processed data to the first conflict processing unit;

The first conflict processing unit may use the conflict resolution mechanism to write the data into the vector address.
The processor of claim 48, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, a data strobe, and a write data recombination unit;

In the conflict resolution mechanism:

The address strobe is used to gate the vector address so that the vector address is output to the conflict judgment unit;

The data strobe is also used to gate the data so that the data is output to the write data recombination unit;

The conflict judgment unit is configured to generate a conflict flag valid signal when there is a storage block conflict in the vector address;

In response to the conflict flag valid signal, the address strobe is also used to keep gating the vector address, and the data strobe is also used to keep gating the data;

The write data reorganization unit is used to reorganize the data, the address mapping unit is used to map the vector address to the physical address of the memory, and the write data reorganization unit is further used to write the reorganized data into the memory;

The conflict judgment unit is also used to generate a conflict flag failure signal;

In response to the conflict flag failure signal, the data strobe is also used to strobe the next set of data, and the address selector is also used to strobe the vector address of the next set of data.
The processor of claim 49, wherein:

The write data reorganization unit is also used for:

Determine the first storage unit of each storage block corresponding to the vector address, and arrange the data corresponding to the first storage units into one row in ascending order of address;

Determine the second storage unit of each storage block corresponding to the vector address, and arrange the data corresponding to the second storage units into one row in ascending order of address;

And so on, until the n-th storage unit of each storage block corresponding to the vector address is determined and the data corresponding to the n-th storage units is arranged into one row in ascending order of address, obtaining n rows of data in total;

The address mapping unit is further configured to sequentially select n groups of storage units corresponding to the n rows of data;

The write data reorganization unit is further configured to sequentially write the n rows of data into the n groups of storage units according to the gating sequence of the n groups of storage units.
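The write-side reorganization mirrors the read side. This Python sketch assumes the same hypothetical word-interleaved layout as a plain list-based memory, and assumes the write addresses within one vector are distinct. Row i holds the i-th storage unit touched in every storage block, in ascending address order, and each row is written in one strobe.

```python
from collections import defaultdict


def reorganize_write(writes, num_banks):
    """Arrange (address, value) pairs into n rows, one row per strobe."""
    per_bank = defaultdict(list)
    for addr, value in sorted(writes):  # ascending address order
        per_bank[addr % num_banks].append((addr, value))
    n = max((len(entries) for entries in per_bank.values()), default=0)
    rows = []
    for i in range(n):  # row i: i-th storage unit of every storage block
        row = [entries[i] for entries in per_bank.values() if i < len(entries)]
        rows.append(sorted(row))
    return rows


def write_rows(rows, memory):
    """Write the rows in strobe order; each row touches each block at most once."""
    for row in rows:
        for addr, value in row:
            memory[addr] = value


rows = reorganize_write([(8, "c"), (0, "a"), (4, "b"), (1, "d")], num_banks=4)
memory = [None] * 16
write_rows(rows, memory)
```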
The processor of claim 49, wherein:

The conflict judgment unit is further configured to generate a conflict flag failure signal when there is no storage block conflict in the vector address;

The address mapping unit is further configured to map the vector address to the physical address of the memory;

The write data reorganization unit is also used to write the data into the physical address;

In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
The processor of claim 43, wherein for the data sent by the first conflict processing unit, the first data processing unit is further configured to splice some of the bytes therein.
The processor of claim 52, wherein the first data processing unit is further configured to:

For the data sent by the first conflict processing unit, when k bytes need to be selected from every m bytes for reading, select the k bytes from every m bytes to obtain N×k bytes, where k ≤ log₂ m;

Concatenate every m bytes of the N×k bytes to obtain m×k data blocks, each m bytes wide;

Wherein, N is the number of storage blocks of the memory; the bit width of the storage block is m bytes.
The processor of claim 48, wherein the first data processing unit is further configured to split the data.
The processor of claim 54, wherein the first data processing unit is further configured to:

When the data includes m×k blocks, each m bytes wide, the m bytes of each block are split to obtain N×k bytes, so that each group of k bytes corresponds to k addresses of one storage block;

Wherein, N is the number of storage blocks of the memory; the bit width of each storage block is m bytes; and k ≤ log₂ m.
The processor of claim 40, wherein the first addressing unit comprises: a base address selector and an offset address selector;

The base address selector and the offset address selector may obtain the base addresses and offset addresses of multiple groups of the data based on multiple modes.
The processor of claim 56, wherein the multiple modes include at least: an offset address update mode and a base address update mode.
The processor of claim 57, wherein the set of addressing units at least further comprises: a second addressing unit;

In the offset address update mode:

The base address selector is used to select and obtain the base address sent by the processor core;

The second addressing unit is used to sequentially read the offset address of each group of the data in the memory;

The offset address selector is used to select the offset address read by the second addressing unit.
The processor of claim 58, wherein:

The first address calculation unit further includes: an adder;

The adder is configured to sequentially add the base address sent by the processor core and the offset address of each group of data to obtain the vector address.
The processor of claim 57, wherein the set of addressing units at least further comprises: a second addressing unit;

In the base address update mode:

The first address calculation unit is configured to sequentially obtain the base address update value of each group of the data sent by the processor core;

The second addressing unit is used to read the same offset address of each group of data in the memory in a loop;

The offset address selector is used to select the same offset address read by the second addressing unit;

The first address calculation unit is further configured to sequentially add the base address update value of each group of data to the base address of the previous group of data to obtain the base address of each group of data.
The processor of claim 60, wherein the first address calculation unit further comprises: an adder;

The adder is used to sequentially add the base address of each group of data to the same offset address to obtain the vector address of each group of data.
The processor of claim 58 or 60, wherein:

The second addressing unit is further configured to obtain the offset address sent by the processor core, and write the offset address into the memory.
The processor of claim 62, wherein the second addressing unit comprises: a second conflict processing unit, a second data processing unit, and a second data transceiving unit;

The second conflict processing unit may use the conflict resolution mechanism to read the offset address from the memory, and send the offset address to the second data processing unit;

The second data processing unit is configured to process the offset address, and send the processed offset address to the second data transceiving unit;

The second data transceiving unit is configured to send the processed offset address to the first addressing unit.
The processor of claim 40, wherein the first addressing unit comprises: a base address selector, an offset address selector, and an adder;

The base address selector is used to select the base address sent by the processor core;

The offset address selector is used to select the offset address sent by the processor core;

The adder is used to add the base address and the offset address to obtain the vector address.
The processor of claim 39, wherein the addressing module comprises: multiple groups of addressing units; and the multiple groups of addressing units are used to execute operations of the addressing module in parallel.
The processor of claim 39, wherein the group of addressing units can obtain the base address or the offset address in a ping-pong addressing manner.
The processor of claim 66, wherein the set of addressing units at least includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit;

The processor core is further configured to alternately write the offset address into the memory through the fourth addressing unit and the fifth addressing unit;

The third addressing unit is used to obtain the base address sent by the processor core, and to alternately obtain, through the fourth addressing unit and the fifth addressing unit, the offset address stored in the memory.
The processor of claim 66, wherein the set of addressing units at least includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit;

The processor core is further configured to write the offset address into the memory through the eighth addressing unit;

The sixth addressing unit and the seventh addressing unit are used to alternately obtain the base address sent by the processor core, and to obtain, through the eighth addressing unit, the offset address stored in the memory.
The processor of claim 42, wherein the first addressing unit further comprises: a first control unit;

The first control unit may communicate with the processor core through a handshake protocol.
The processor of claim 69, wherein the first control unit is configured to make the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit work in a pipelined manner.
The processor of claim 70, wherein:

The processor core may also read the data from the vector address through the first addressing unit, wherein the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage of the pipeline, the first data processing unit is located in the third and fourth stages of the pipeline, and the first data transceiving unit is located in the fifth stage of the pipeline.
The processor of claim 71, wherein the first control unit comprises: a read and write request cache;

The processor core may also write the data to the vector address through the first addressing unit, wherein the first data transceiving unit and the read-write request cache are at the same level, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline.
The processor of claim 69, wherein, in the handshake protocol:

When the read request valid signal and the read request ready signal are both high, the first control unit is further configured to read the read request from the processor core;

When the read data valid signal and the read data ready signal are both high, the first control unit is further configured to control the first data transceiver unit to send the data to the processor core.
The processor of claim 69, wherein, in the handshake protocol:

When the write request valid signal and the write request ready signal are both high, the first control unit is also used to read the write request and data from the processor core, and the write busy signal is pulled high;

The first control unit is also used to write the data into the memory, and the write busy signal is pulled low.
A computer-readable storage medium, characterized by comprising instructions, which when run on a computer, causes the computer to execute the addressing method according to any one of claims 1 to 37.
A movable platform, characterized in that the movable platform comprises: a fuselage; the fuselage includes: at least one circuit; and the circuit includes: at least one processor according to any one of claims 38 to 74.
An electronic device, characterized in that the electronic device comprises: a housing; the housing is provided with: at least one circuit; the circuit comprises: at least one processor according to any one of claims 38 to 74.
A computer program product comprising instructions, characterized in that, when the instructions are run on a computer, the computer executes the addressing method according to any one of claims 1 to 37.