WO2021217293A1 - Addressing method for processor, processor, movable platform, and electronic device - Google Patents

Addressing method for processor, processor, movable platform, and electronic device

Info

Publication number
WO2021217293A1
WO2021217293A1 PCT/CN2020/086985 CN2020086985W WO2021217293A1 WO 2021217293 A1 WO2021217293 A1 WO 2021217293A1 CN 2020086985 W CN2020086985 W CN 2020086985W WO 2021217293 A1 WO2021217293 A1 WO 2021217293A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
address
unit
addressing
conflict
Prior art date
Application number
PCT/CN2020/086985
Other languages
French (fr)
Chinese (zh)
Inventor
韩志
吴穹蔗
刘石壮
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080004613.6A (published as CN112639747A)
Priority to PCT/CN2020/086985 (published as WO2021217293A1)
Publication of WO2021217293A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1072 Decentralised address translation, e.g. in distributed shared memory systems

Definitions

  • the present disclosure relates to the field of data processing, and in particular to an addressing method of a processor, a processor, a movable platform, and an electronic device.
  • the memory needs to be addressed to read data from the memory or write data to the memory.
  • the storage address of the data in the memory is often irregular, or the rules are too complex and changeable, so the processor usually accesses the memory by means of look-up table addressing.
  • the present disclosure provides an addressing method for a processor, the processor includes: a processor core, an addressing module, and a memory; the addressing method includes:
  • the addressing module obtains the base address and the offset address of the data in the memory
  • the addressing module obtains the storage address of the data in the memory according to the base address and the offset address;
  • the processor core accesses the data of the storage address through the addressing module.
  • the present disclosure also provides a processor, which includes: a processor core, an addressing module, and a memory;
  • the addressing module is configured to obtain the base address and the offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;
  • the processor core may access the data of the memory at the storage address through the addressing module.
  • the present disclosure also provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the above-mentioned addressing method.
  • the present disclosure also provides a movable platform.
  • the movable platform includes a fuselage; the fuselage includes at least one circuit; and the circuit includes at least one processor as described above.
  • the present disclosure also provides an electronic device, the electronic device includes: a housing; the housing is provided with: at least one circuit; the circuit includes: at least one processor as described above.
  • the present disclosure also provides a computer program product including instructions, which, when the instructions run on a computer, cause the computer to execute the addressing method described above.
  • FIG. 1 is a flowchart of an addressing method of a processor according to an embodiment of the disclosure.
  • FIG. 2 is a schematic structural diagram of a processor according to an embodiment of the disclosure.
  • FIG. 3 is a schematic structural diagram of an addressing module according to an embodiment of the disclosure.
  • FIG. 4 shows the data flow of the read operation of the first addressing unit and the second addressing unit of the embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the structure of the first address calculation in an embodiment of the disclosure.
  • FIG. 6 is a flowchart of a processor core accessing data of a storage address through a first addressing unit during a read operation in an embodiment of the disclosure.
  • FIG. 7 is a schematic structural diagram of a first conflict processing unit according to an embodiment of the disclosure.
  • FIG. 8 shows the data processing process of the conflict resolution mechanism in the read operation of the embodiment of the present disclosure.
  • FIG. 9 shows the data processing process in the embodiment of the present disclosure in which there is no storage block conflict.
  • FIG. 10 shows the data processing process of data splicing in the embodiment of the present disclosure.
  • FIG. 11 shows the data processing process of the base address update mode of the embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of a base address update unit according to an embodiment of the disclosure.
  • FIG. 13 shows a signal timing diagram of the handshake protocol of a read operation according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram of another structure of the first addressing unit according to an embodiment of the disclosure.
  • FIG. 15 shows the data flow of the write operation of the first addressing unit and the second addressing unit of the embodiment of the present disclosure.
  • FIG. 16 is a flowchart of a processor core accessing data of a storage address through a first addressing unit in a write operation in an embodiment of the disclosure.
  • FIG. 17 shows the data processing process of data splitting in an embodiment of the present disclosure.
  • FIG. 18 is a schematic diagram of another structure of the first conflict processing unit according to an embodiment of the disclosure.
  • FIG. 19 shows the data processing process of the conflict resolution mechanism in the write operation of the embodiment of the present disclosure.
  • FIG. 20 shows a signal timing diagram of the handshake protocol of a write operation in an embodiment of the present disclosure.
  • FIG. 21 is a schematic diagram of another structure of the first addressing unit according to an embodiment of the disclosure.
  • FIG. 22 is a schematic diagram of another structure of an addressing module according to an embodiment of the disclosure.
  • FIG. 23 is a schematic structural diagram of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.
  • FIG. 24 is a schematic diagram of another structure of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.
  • FIG. 25 is a schematic structural diagram of a movable platform according to an embodiment of the disclosure.
  • FIG. 26 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
  • In existing table look-up addressing, the processor core reads the base address and the offset address from a register and calculates the storage address.
  • the entire addressing process is completed by the processor core, which occupies the computing resources of the processor core and makes table look-up inefficient.
  • only a single addressing mode is available, so flexibility is insufficient: multiple flexible addressing modes cannot be provided, and addressing efficiency is low when reading and writing large amounts of data.
  • the addressing method for a processor, the processor, the computer-readable storage medium, the movable platform, and the electronic device provided in the present disclosure use the addressing module to realize the access of the processor core to the memory; that is, through the addressing module, the processor core can read data from the memory and write data to the memory.
  • the processor in this embodiment can be any type of device with data processing capabilities, such as but not limited to a central processing unit (CPU), digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), graphics processing unit (GPU), microprocessor, microcontroller, network processor (NP), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the processor may be a single-core processor or a multi-core processor, including one or more processor cores.
  • the processor core may include an arithmetic logic unit (ALU) and/or control logic.
  • ALU can perform arithmetic and logical operations.
  • the control logic is used to control a series of operations of the ALU.
  • the ALU may include a multiply-accumulate unit (MAC, Multiply and Accumulate) and a shifter.
  • Each MAC includes a multiplier and an adder, which are used to perform arithmetic operations of multiplication and addition.
  • the shifter is used to perform logic operations for shifting data.
  • the memory in this embodiment may be any of various random access memories (Random Access Memory, RAM), for example, static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (SynchLink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • An embodiment of the present disclosure provides an addressing method for a processor. As shown in FIG. 1, the addressing method includes:
  • S101: The addressing module obtains the base address and the offset address of the data in the memory;
  • S102: The addressing module obtains the storage address of the data in the memory according to the base address and the offset address;
  • S103: The processor core accesses the data of the storage address through the addressing module.
  • the processor includes: a processor core, an addressing module, and a memory, and the addressing module can be integrated inside the processor.
  • the addressing module can be used to realize the table look-up addressing of the memory by the processor core, that is, the processor core can read data from the memory and write data into the memory in a table look-up manner through the addressing module.
  • the addressing module may include one or more groups of addressing units.
  • a group of addressing units is taken as an example to describe the case where the group of addressing units executes the addressing method.
  • the group of addressing units includes two identical addressing units.
  • the two addressing units communicate with the processor core through the system bus respectively.
  • An internal bus is set in the addressing module, and the two addressing units communicate through the internal bus.
  • one of the two addressing units is used to read and write data; this addressing unit can be called a table addressing unit.
  • the other addressing unit is used to read and write the offset addresses of the data; it can be called an offset addressing unit.
  • In the following, the two addressing units are referred to as the first addressing unit and the second addressing unit respectively, and the addressing method of this embodiment is described by taking the first addressing unit as the table addressing unit and the second addressing unit as the offset addressing unit as an example.
  • the roles of the first addressing unit and the second addressing unit can also be interchanged, that is, the first addressing unit is used as the offset addressing unit, and the second addressing unit is used as the table addressing unit.
  • an addressing module is set in the processor, and the storage address of the data in the memory is obtained by the addressing module instead of the processor core according to the base address and the offset address.
  • the addressing operations are all completed by the addressing module.
  • the storage address calculation process does not require the participation of the processor core, but the addressing module calculates the storage address, which improves the efficiency of table look-up addressing compared with ordinary processors.
  • the first addressing unit obtains the base address and offset address of the data in the memory.
  • the first addressing unit includes: a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit.
  • the solid lines in the figure represent address and data signals, and the dashed lines represent control signals.
  • the first control unit can communicate with the processor core through the system bus, and control the operations of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit.
  • When the processor core needs to read data from the memory, the processor core sends a read request to the first control unit through the system bus; in response to the read request, the first control unit sends an offset address request to the second addressing unit through the internal bus.
  • the second addressing unit reads the offset address of the data from the memory, sends the offset address to the first address calculation unit via the internal bus, and sends an offset address valid signal to the first control unit. In response to the offset address valid signal, the first control unit starts the other units of the first addressing unit to perform the read operation.
  • the first address calculation unit receives the base address of the data sent by the processor core and the offset address sent by the second addressing unit, and obtains the vector address of the data from the base address and the offset address.
  • the first address calculation unit includes: a base address selector, an offset address selector, and an adder.
  • the base address selector selects the base address sent by the processor core.
  • the offset address selector selects the offset address sent by the second addressing unit through the internal bus.
  • the number of offset addresses, N, corresponds to the number of banks (Bank) of the memory.
  • After obtaining the base address and the offset address, in S102, the first address calculation unit obtains the storage address of the data in the memory according to the base address and the offset address.
  • the adder of the first address calculation unit sums the base address with each of the N offset addresses to obtain the storage address of the data; the storage address is a vector address including N addresses (16 addresses in the examples of this embodiment).
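The address calculation described above can be sketched in a few lines. This is only an illustration of "base address plus each offset address"; the function name `vector_address` is hypothetical and not taken from the disclosure.

```python
from typing import List


def vector_address(base: int, offsets: List[int]) -> List[int]:
    """Add the base address to each offset address (hypothetical helper)."""
    return [base + off for off in offsets]


# Offset addresses as used in the examples of this embodiment (16 entries).
offsets = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]

# With a base address of 0 the vector address equals the offset addresses.
addresses = vector_address(0, offsets)
```

In hardware the N additions are performed in parallel by the adder; the list comprehension only models the result.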
  • After obtaining the vector address, in S103, the processor core reads the data stored at the vector address from the memory through the first addressing unit. When there is a bank conflict in the vector address, the processor core reads the data of the vector address through the first addressing unit as follows.
  • the first conflict processing unit determines whether there is a Bank conflict.
  • the processor core accesses the data of the storage address through the first addressing unit including:
  • the first conflict processing unit reads the data from the vector address by using a conflict resolution mechanism, and sends the data to the first data processing unit;
  • the first data processing unit processes the data, and sends the processed data to the first data transceiving unit;
  • the first data transceiving unit sends the processed data to the processor core.
  • the first conflict processing unit includes: a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit.
  • the vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer caches the vector address at the same time.
  • the address strobe strobes the vector address directly sent by the first address calculation unit, so that the vector address is output to the conflict judgment unit.
  • the conflict judgment unit judges the vector address:
  • When there is a bank conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe, so that the address strobe strobes the vector address output by the vector address buffer. During conflict processing, the vector address input to the conflict judgment unit is thus kept unchanged.
  • the address mapping unit maps the vector address to the physical address of the memory.
  • the read data reorganization unit reads the data of the physical address, reorganizes the data, and sends the reorganized data to the first data processing unit.
  • After that, the conflict judgment unit generates a conflict flag invalid signal.
  • the address strobe selects the vector address of the next data sent by the first address calculation unit, and the vector address buffer caches the vector address of the next data at the same time.
  • the conflict judgment unit continues to process the bank conflict of the next data.
  • the address mapping unit maps the vector address to the physical address of the memory in the following way:
  • In each bank, the first memory cell (cell) corresponding to the vector address is placed in the first group, the second cell corresponding to the vector address is placed in the second group, and so on, until the n-th cell corresponding to the vector address is placed in the n-th group. This yields a total of n groups of cells, and the n groups of cells in the memory are strobed in sequence.
  • the read data reorganization unit reads the data of the vector address in the following way, and reorganizes the data:
  • According to the gating sequence of the n groups of cells, the data stored in the n groups of cells is read in sequence, and the data is rearranged in ascending order of address to obtain the reorganized data.
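The grouping and reorganization steps can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the circuit of the disclosure: the mapping `locate` (word address modulo 16 gives the bank, integer division gives the cell) is hypothetical, the memory is modeled as a dictionary, and the example vector address is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_BANKS = 16  # assumption: 16 banks, as in the example of this embodiment


def locate(addr: int) -> Tuple[int, int]:
    """Hypothetical mapping from a word address to (bank, cell index)."""
    return addr % NUM_BANKS, addr // NUM_BANKS


def group_cells(vector: List[int]) -> List[List[int]]:
    """Group k holds the k-th addressed cell of every bank that has one."""
    per_bank: Dict[int, List[int]] = defaultdict(list)
    for addr in sorted(set(vector)):
        per_bank[locate(addr)[0]].append(addr)
    n = max(len(cells) for cells in per_bank.values())
    return [[cells[k] for cells in per_bank.values() if len(cells) > k]
            for k in range(n)]


def read_with_conflict_resolution(memory: Dict[int, int],
                                  vector: List[int]) -> Tuple[List[int], int]:
    """Read one group per clock cycle, then reorder by ascending address."""
    groups = group_cells(vector)
    fetched: Dict[int, int] = {}
    for group in groups:          # each group models one clock cycle
        for addr in group:        # cells of a group lie in different banks
            fetched[addr] = memory[addr]
    data = [fetched[addr] for addr in sorted(fetched)]
    return data, len(groups)


# Made-up example: bank 0 is addressed four times, bank 1 three times.
memory = {a: a * 10 for a in range(80)}
vector = [0, 1, 2, 3, 16, 17, 18, 32, 33, 48, 4, 5, 6, 7, 8, 9]
data, cycles = read_with_conflict_resolution(memory, vector)
```

In this model the number of cycles equals the largest number of distinct cells addressed in any single bank, which matches the n groups described above.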
  • the memory includes 16 banks, each bank includes 5 cells, and each cell can store 4 bytes (32 bits). The vector address of each group of data includes 16 addresses, so the first addressing unit can read a group of 16 data items from the memory each time.
  • Suppose the processor core needs to read a set of data labeled "1", "2", "3", and "4" in the memory. The base address of the data is 0 and the offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207], so the vector address output by the first address calculation unit is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207].
  • the vector address is directly input to the address strobe of the first conflict processing unit, and at the same time, the vector address buffer caches the vector address.
  • the address strobe selects the vector address, and the conflict judgment unit judges the vector address.
  • Several addresses of the vector address of the group of data correspond to the same bank of the memory: for example, the vector address corresponds to the first and second cells of Bank1 (B1), to the first, second, and third cells of B2, and so on. There is therefore a bank conflict in this vector address.
  • the conflict judgment unit generates a conflict flag valid signal.
  • the address strobe gates the vector address output by the vector address buffer, so that the vector address is kept unchanged during conflict processing, until the conflict resolution ends.
  • the address mapping unit places the first cell corresponding to the vector address in each of Bank0-Bankf (B0-Bf) into one group.
  • the first cells of B0-B3 correspond to [0, 1, 2, 3] in the vector address, the second cell of B4 corresponds to [71], the third cell of B5 corresponds to [139], and the fourth cell of B6 corresponds to [207]. Therefore, the first cells corresponding to the vector address in B0-Bf are: the first cells of B0-B3, the second cell of B4, the third cell of B5, and the fourth cell of B6. That is, the first group of cells includes the 7 cells labeled "1".
  • the second cell corresponding to the vector address in each of B0-Bf is placed into another group.
  • the second cells of B1-B3 correspond to [68, 69, 70] in the vector address, the third cell of B4 corresponds to [138], and the fourth cell of B5 corresponds to [206]. Therefore, the second cells corresponding to the vector address in B0-Bf are: the second cells of B1-B3, the third cell of B4, and the fourth cell of B5. That is, the second group of cells includes the 5 cells labeled "2".
  • the address mapping unit sequentially selects the four groups of cells in the memory.
  • n can be other values, which depend on the vector address itself.
  • the read data reorganization unit reads the data stored in the four groups of cells from the memory in sequence according to the strobe sequence of the four groups of cells, and the read data is shown in FIG. 8.
  • the data stored in a group of cells can be read in one clock cycle, and the data can be read in four clock cycles.
  • the first to fourth groups of cells can be selected in sequence (as shown in FIG. 8), the fourth to first groups can be selected in reverse order, or the four groups of cells can be selected in a random order.
  • the data read from the memory by the read data reorganization unit is not arranged by address, and its order does not match its actual storage location in the memory; that is, the read data is not arranged in the order required by the processor core and cannot yet be used by the processor core. Therefore, the read data reorganization unit needs to reorganize the data stored in the 4 groups of cells, rearranging it in ascending order of address to obtain the reorganized data.
  • After reorganization, the data is arranged according to its actual storage location in the memory, and the read data reorganization unit sends the reorganized data to the first data processing unit for use by the processor core.
  • the conflict processing of the group of data then ends, and the conflict judgment unit generates a conflict flag invalid signal.
  • the address strobe selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer caches the vector address of the next group of data at the same time, and the conflict judgment unit continues to process the bank conflict of the next group of data.
  • the conflict judgment unit judges the vector address. When there is no bank conflict in the vector address, the conflict judgment unit generates a conflict flag invalid signal.
  • the address mapping unit maps the vector address to the physical address of the memory.
  • the read data reorganization unit reads the data of the physical address without reorganizing the read data, and directly sends the read data to the first data processing unit. Since there is no bank conflict, the data can be read in one clock cycle.
  • the address strobe then strobes the vector address of the next group of data.
  • In addition to the case where each address of the vector address corresponds to a different bank of the memory, the absence of bank conflict described in this embodiment also includes the following situation:
  • the vector address is equally divided into m groups or 2×m groups. If each group of addresses corresponds to one cell of one bank, it is considered that there is no bank conflict in the vector address, and the first conflict processing unit performs address splicing on the vector address.
  • The following takes FIG. 9 as an example to illustrate the above-mentioned situation where there is no bank conflict.
  • the processor core needs to read a set of data labeled "1" to "16" in the memory. The base address of the data is 0 and the offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207], so the vector address output by the first address calculation unit is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207].
  • Under the ordinary judgment, the first conflict processing unit would use the conflict resolution mechanism to read the group of data, and the reading would take four clock cycles to complete.
  • the conflict determination unit of this embodiment continues to determine the vector address.
  • the bit width of the Bank is 4 bytes, and the vector address is equally divided into 4 groups.
  • the 4 groups of addresses are [0, 1, 2, 3], [68, 69, 70, 71], [136, 137, 138, 139], and [204, 205, 206, 207].
  • each group of addresses corresponds to one cell of one bank: [0, 1, 2, 3] corresponds to the first cell of B0, [68, 69, 70, 71] corresponds to the second cell of B1, [136, 137, 138, 139] corresponds to the third cell of B2, and [204, 205, 206, 207] corresponds to the fourth cell of B3. Therefore, the vector address [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207] is considered to have no bank conflict, and the first conflict processing unit can read all the data "1" to "16" in one clock cycle.
  • the vector addresses can also be equally divided into 8 groups. If each group of addresses corresponds to a cell of a bank, it is also considered that there is no bank conflict in the vector address.
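The group-based check can be sketched as follows. This is an illustration under assumptions: addresses are treated as byte addresses, `cell_of` maps a byte address to a 4-byte cell and then to one of 16 banks (a mapping chosen because it reproduces the FIG. 9 correspondence stated above), and two groups are allowed to share a bank only when they fall in the same cell. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

NUM_BANKS = 16   # assumption: 16 banks
CELL_BYTES = 4   # bank bit width of 4 bytes, as stated for FIG. 9


def cell_of(byte_addr: int) -> Tuple[int, int]:
    """Hypothetical mapping from a byte address to (bank, cell index)."""
    word = byte_addr // CELL_BYTES
    return word % NUM_BANKS, word // NUM_BANKS


def conflict_free_by_groups(vector: List[int], num_groups: int) -> bool:
    """True if every equal-sized group fits in one cell and no bank is
    asked for two different cells (assumed reading of the rule)."""
    if not vector or num_groups <= 0 or len(vector) % num_groups != 0:
        return False
    size = len(vector) // num_groups
    cell_in_bank: Dict[int, int] = {}
    for start in range(0, len(vector), size):
        cells = {cell_of(a) for a in vector[start:start + size]}
        if len(cells) != 1:       # the group straddles several cells
            return False
        bank, cell = next(iter(cells))
        if cell_in_bank.setdefault(bank, cell) != cell:
            return False          # same bank, different cell: conflict
    return True


fig9 = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
ok_4_groups = conflict_free_by_groups(fig9, 4)
ok_8_groups = conflict_free_by_groups(fig9, 8)
```

With this mapping, [68, 69, 70, 71] lands in the second cell of B1 and [204, 205, 206, 207] in the fourth cell of B3, so the whole vector address can be served in one access.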
  • After the first data processing unit receives the data sent by the first conflict processing unit, it decides whether to process the data further according to the data width required by the processor core. When the data required by the processor core is not all the bytes read from each cell of each bank, but only some bytes of each cell, the first data processing unit splices those bytes of the data sent by the first conflict processing unit to generate the data required by the processor core, and sends the spliced data to the first data transceiving unit.
  • the first data processing unit splicing partial bytes of the data sent by the first conflict processing unit includes:
  • The following takes FIG. 10 as an example to describe the process of data splicing.
  • the bit width of the Bank is 4 bytes, that is, each cell of each bank stores 4 bytes of data. As shown in FIG. 10, the data sent by the first conflict processing unit to the first data processing unit therefore includes 16 blocks of 4 bytes each, 64 bytes (512 bits) in total.
  • the processor core needs 1 byte in every 4 bytes, and the first data processing unit selects the byte required by the processor core from every 4 bytes.
  • the processor core needs the 16 bytes numbered 2, 7, 9, 16, ..., 64.
  • the first data processing unit combines every 4 bytes of the selected 16 bytes to obtain 4 blocks of data with a width of 4 bytes each, which is the data required by the processor core, and sends the spliced data to the first data transceiving unit.
  • When the processor core needs all the bytes of each cell, the first data processing unit does not need to splice the data sent by the first conflict processing unit, but directly sends the data to the first data transceiving unit.
  • When the processor core needs 2 bytes in every 4 bytes, the first data processing unit combines every 4 bytes of the selected 32 bytes to obtain 8 blocks of data with a width of 4 bytes each, which is the data required by the processor core, and sends the spliced data to the first data transceiving unit.
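Data splicing amounts to selecting the required bytes and repacking them into 4-byte blocks. The sketch below assumes 1-based byte positions within the 64-byte data; the function name `splice` and the selected positions are hypothetical, because the full list of positions in FIG. 10 is not reproduced in the text.

```python
from typing import List


def splice(data: bytes, wanted: List[int], block: int = 4) -> List[bytes]:
    """Pick the wanted bytes (1-based positions) and repack them into
    blocks of `block` bytes (hypothetical helper)."""
    picked = bytes(data[p - 1] for p in wanted)
    return [picked[i:i + block] for i in range(0, len(picked), block)]


# 16 cells of 4 bytes each, 64 bytes in total; byte k holds the value k.
data = bytes(range(64))

# Illustrative selection: the second byte of every 4-byte cell.
one_of_four = [4 * i + 2 for i in range(16)]
blocks_16 = splice(data, one_of_four)     # 16 bytes packed as 4 blocks

# Illustrative selection: the first two bytes of every 4-byte cell.
two_of_four = [p for i in range(16) for p in (4 * i + 1, 4 * i + 2)]
blocks_32 = splice(data, two_of_four)     # 32 bytes packed as 8 blocks
```

The two calls reproduce the block counts given above: 16 selected bytes form 4 blocks and 32 selected bytes form 8 blocks.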
  • the first data transceiver unit includes: a receiving buffer and a sending buffer.
  • the sending buffer buffers the data sent by the first data processing unit, and sends the buffered data to the processor core through the system bus.
  • the depth of the receiving buffer and the sending buffer can be set according to actual needs. In an example, the minimum depth of the receive buffer is 2 and the minimum depth of the transmit buffer is 0.
  • In an ordinary processor, the addressing operation is performed by the processor core, that is, the processor core obtains the base address and the offset address and calculates the vector address. If there is a bank conflict in the vector address, the processor core needs to detect the bank conflict and deal with it. While the bank conflict is being processed, the processor core cannot perform other operations and must wait until the bank conflict is resolved before it can continue.
  • the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit.
  • the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict.
  • the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.
  • the process of reading data through the addressing method of this embodiment is described above. Each set of data required by the processor core is read in the above manner.
  • the first addressing unit can obtain the base address and offset address of the multiple sets of data based on multiple different modes.
  • One mode can be called the offset address update mode.
  • In the offset address update mode, the base address of multiple groups of data is unchanged, and the offset address of each group of data comes from the second addressing unit.
  • the first address calculation unit obtains the base address sent by the processor core; the second addressing unit sequentially reads the offset address of each group of data in the memory; the first address calculation unit obtains the offset address read by the second addressing unit.
  • the base address selector selects the base address sent by the processor core.
  • the offset address selector selects the offset address of the group of data sent by the second addressing unit through the internal bus.
  • the adder sums the base address and the offset address of the group of data to obtain the vector address of the group of data.
  • the adder sends the vector address of the group of data to the first conflict processing unit, and the data is sent to the processor core through the first conflict processing unit, the first data processing unit, and the first data transceiver unit, completing the reading of the group of data.
  • for each subsequent group of data, the offset address selector selects the offset address of that group sent by the second addressing unit through the internal bus, so that multiple groups of data are read.
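  • The offset address update mode can be sketched in Python as follows. This is a minimal software illustration rather than the hardware itself: the function name `offset_update_mode` and the list-of-lists stand-in for the offset addresses held in the memory are assumptions made for the example.

```python
def offset_update_mode(base, offset_table):
    """Yield one vector address per group: fixed base + that group's offsets.

    `offset_table` stands in for the offset addresses that the second
    addressing unit reads from the memory, one entry per group (assumed layout).
    """
    for offsets in offset_table:
        # The adder sums the unchanged base address with each offset of the group.
        yield [base + off for off in offsets]


# Two hypothetical groups of four offsets each, with base address 0x100.
groups = list(offset_update_mode(0x100, [[0, 4, 8, 12], [68, 72, 76, 80]]))
```

  • Each group needs its own entry in `offset_table`, which mirrors the fact that in this mode a separate offset address is supplied for every group of data.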
  • the other mode can be called the base address update mode.
  • in the base address update mode, the offset address of the multiple groups of data comes from the second addressing unit, and every group of data uses the same offset address. The base address of each group of data is obtained by updating the initial value of the base address.
  • for the first group of data labeled "1", the vector address is [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]; for the second group of data labeled "2", the vector address is [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236]; for the third group of data labeled "3", the vector address is [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256].
  • for each group of data, the processor core needs to write the offset address of the group of data into the memory through the second addressing unit, and the second addressing unit then reads the offset address of the data from the memory and sends it to the first addressing unit.
  • the first addressing unit obtains the vector address from the base address [0] sent by the processor core and the offset address sent by the second addressing unit, and reads the group of data from its vector address. In this way, three offset address write operations are required for the above three groups of data.
  • the other addresses of the vector address of the set of data have the same offset address relative to the first address.
  • the offset address of the first group of data can be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216];
  • the offset address of the second group of data can also be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216];
  • the offset address of the third group of data can also be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136
  • the base address update unit includes an adder and a D flip-flop.
  • the processor core writes the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to the memory through the second addressing unit, and sets the second addressing unit to the cyclic read mode.
  • when reading the first group of data labeled "1", the second addressing unit reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit.
  • the base address selector selects the output of the base address update unit, the processor core sends the base address initial value [0] to the base address update unit, and the base address initial value [0] passes through the adder and enters the D flip-flop.
  • when the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address initial value [0] to the base address selector, and the base address selector sends the base address initial value [0] to the adder of the first address calculation unit.
  • the adder adds the base address initial value [0] and the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216], and the data labeled "1" is read from the vector address.
  • when reading the second group of data labeled "2", the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit.
  • the base address selector still selects the output of the base address update unit, the processor core sends the base address update value [20] to the base address update unit, and the adder adds the base address update value [20] to the base address of the data labeled "1" (that is, the base address initial value [0]) to obtain the base address [20] of the data labeled "2", which enters the D flip-flop.
  • when the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address [20] to the base address selector, and the base address selector sends the base address [20] to the adder of the first address calculation unit.
  • the adder adds the base address [20] and the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236], and the data labeled "2" is read from the vector address.
  • when reading the third group of data labeled "3", the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit.
  • the base address selector still selects the output of the base address update unit, and the adder of the base address update unit adds the base address update value [20] to the base address [20] of the data labeled "2" to obtain the base address [40] of the data labeled "3", which enters the D flip-flop.
  • when the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address [40] to the base address selector, and the base address selector sends the base address [40] to the adder of the first address calculation unit.
  • the adder adds the base address [40] and the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256], and the data labeled "3" is read from the vector address.
  • the general table look-up addressing mode requires the processor core to write the offset address to the memory three times, whereas with the base address update mode of this embodiment the processor core writes the offset address to the memory only once. It can be seen that the addressing mode of this embodiment reduces the number of offset address writes and saves the time spent writing the offset address.
  • when the processor performs large-scale table look-up addressing on the memory, this can greatly reduce the addressing time and improve the addressing efficiency.
  • the processor core only sets the second addressing unit to the cyclic read mode and provides the base address update value. The addressing process requires little further participation from the processor core, which can significantly improve the efficiency of the processor and increase its computing speed, especially in large-scale table look-up addressing.
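  • The base address update walk-through above can be reproduced with a short Python sketch. The class name `BaseAddressUpdateUnit` is hypothetical, and the model assumes that the D flip-flop starts cleared to 0 so that the initial value simply passes through the adder; the text does not state the reset behavior.

```python
class BaseAddressUpdateUnit:
    """Software stand-in for the adder + D flip-flop pair (name assumed)."""

    def __init__(self):
        self.register = 0  # value held by the D flip-flop (assumed cleared at start)

    def clock(self, value):
        """One valid clock pulse: latch (previous base + incoming value)."""
        self.register += value
        return self.register


# The single offset address written to the memory once and read cyclically.
OFFSETS = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]

unit = BaseAddressUpdateUnit()
vectors = []
for value in (0, 20, 20):  # initial value 0, then update value 20 for groups "2" and "3"
    base = unit.clock(value)
    vectors.append([base + off for off in OFFSETS])
```

  • The three entries of `vectors` correspond to the groups labeled "1", "2", and "3"; only `OFFSETS` is ever stored, which is the saving this mode provides over writing a new offset address per group.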
  • the addressing method of this embodiment also provides a fixed offset address mode.
  • the first address calculation unit obtains the base address sent by the processor core, and the base address selector selects the base address sent by the processor core and sends the base address to the adder.
  • the processor core also sends a fixed offset address to the first addressing unit, and the offset address selector of the first address calculation unit selects the fixed offset address and sends the fixed offset address to the adder.
  • the adder of the first address calculation unit adds the base address and the offset address to obtain the vector address.
  • the fixed offset address mode can be used in multiple addressing scenarios such as linear addressing and step addressing.
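  • As a rough illustration of how the fixed offset address mode could serve linear and step addressing, the Python sketch below builds the fixed offset address from a lane count and a stride. The function name `fixed_offset_mode` and the `lanes` and `stride` parameters are assumptions for the example; the text does not specify how the fixed offset address is encoded.

```python
def fixed_offset_mode(base, lanes=16, stride=4):
    """Return base + fixed offsets, where the offsets are multiples of `stride`.

    A stride equal to the element size gives linear addressing; a larger
    stride gives step addressing (both interpretations are assumptions).
    """
    fixed_offsets = [lane * stride for lane in range(lanes)]
    return [base + off for off in fixed_offsets]


linear = fixed_offset_mode(64, lanes=4, stride=4)   # contiguous 4-byte elements
stepped = fixed_offset_mode(64, lanes=4, stride=8)  # every other 4-byte element
```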
  • the addressing mode of this embodiment provides an offset address update mode, a base address update mode, and a fixed offset address mode, which can be flexibly selected according to actual conditions, which improves the flexibility of table look-up addressing.
  • when the user's program code runs on the processor, the processor compiles the program code into instructions that the processor can execute. When the execution of a piece of code needs to read data from the memory, the processor core sequentially executes operations such as reading instructions, decoding, reading data, and executing instructions. In some processors, if a bank conflict occurs when the processor core executes the operation of reading data, the processor core handles the bank conflict and needs to generate multiple instructions to read the data. The addressing mode of such processors is therefore instruction-driven.
  • the addressing method in this embodiment is task-driven.
  • when the processor core executes the operation of reading data, the processor core generates a set of instructions to read data, which is equivalent to one task instruction, and sends the task instruction to the first addressing unit through the system bus; the entire addressing process is completed by the first addressing unit.
  • the data read from the memory by the first addressing unit is sent to the processor core via the system bus.
  • the processor core then performs subsequent operations after receiving the data. It can be seen that in this task-driven addressing of this embodiment, when the processor core needs to read data from the memory, the task instruction can be sent to the first addressing unit, and the processor core does not need to care about the specific addressing process. Even if the bank conflict occurs, it is handled by the first addressing unit. Compared with the general processor, the operation of the processor core is simplified and the efficiency is improved.
  • the first control unit of the first addressing unit can communicate with the processor core through a handshake protocol.
  • the processor core communicates with the first addressing unit through the system bus.
  • the system bus includes: clock signal line, read request valid, read request ready, read request, read data valid, read data ready and read
  • the first addressing unit works under the drive of the clock signal line.
  • when the processor core needs to read data from the memory, the processor core sends task instructions to the first addressing unit and receives data from the first addressing unit through the handshake protocol.
  • when the read request valid signal is high, it indicates that the read request signal is valid; when the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core.
  • the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to read data from the memory.
  • when the read data valid signal is high, it indicates that the read data is valid; when the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiver unit to send data to the processor core.
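  • The valid/ready handshake described above follows a common pattern: a transfer happens only in a clock cycle where both signals are high. The Python sketch below models that rule cycle by cycle; the function name `handshake_transfers` and the per-cycle lists are illustrative assumptions, not signals defined by the disclosure.

```python
def handshake_transfers(valid, ready, payload):
    """Return the payload items accepted, one per cycle where valid and ready are both high."""
    transferred = []
    for v, r, item in zip(valid, ready, payload):
        if v and r:  # both sides agree in this cycle, so the item is taken
            transferred.append(item)
    return transferred


# Four clock cycles: the sender holds "req0" until the receiver is ready.
valid = [1, 1, 0, 1]
ready = [0, 1, 1, 1]
payload = ["req0", "req0", None, "req1"]
accepted = handshake_transfers(valid, ready, payload)
```

  • In cycle 0 the request is valid but not yet accepted, so the sender keeps it on the bus; the same rule applies to the read data channel in the opposite direction.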
  • the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.
  • the first control unit includes: read and write request buffers, synchronization registers, selectors, and the first, second, third, fourth, and fifth stages of the pipeline.
  • the processor core sends a read request through the system bus. If the read request is a table lookup request, the selector strobes the synchronization register.
  • the read and write request cache receives the read request and caches the read request. After receiving the read request, the read and write request buffer sends an offset address request to the second addressing unit through the internal bus, and sends the read request to the synchronization register.
  • the second addressing unit reads the offset address of the data from the memory, sends the offset address to the first address calculation unit via the internal bus, and sends an offset address valid signal to the synchronization register via the internal bus.
  • after receiving the offset address valid signal, the synchronization register sends the read request to the pipeline controllers at all levels to start the pipeline operation.
  • the first-level, second-level, third-level, fourth-level, and fifth-level controllers send control signals to the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiver unit.
  • the first address calculation unit is located at the first stage of the pipeline, the first conflict processing unit at the second stage, the first data processing unit at the third and fourth stages, and the first data transceiver unit at the fifth stage. If the read request is not a table lookup request, the selector strobes the read and write request cache, sends the read request directly to the pipeline controllers at all levels, and starts the pipeline operation.
  • the first addressing unit of this embodiment also provides a pipeline pause mechanism.
  • when the processor core cannot receive the data sent by the first data transceiver unit through the system bus, the processor core sends a bus pause signal through the system bus to the read request cache, the synchronization register, and the first-stage, second-stage, third-stage, fourth-stage, and fifth-stage controllers of the pipeline. After they receive the bus pause signal, the pipeline suspends work.
  • when the first conflict processing unit finds that there is a bank conflict in the vector address, the first conflict processing unit sends a conflict pause signal to the read request cache, the synchronization register, and the first-stage, second-stage, third-stage, fourth-stage, and fifth-stage controllers of the pipeline. After they receive the conflict pause signal, the pipeline suspends work.
  • after the bank conflict is processed, the first conflict processing unit sends a conflict recovery signal to the read request cache, the synchronization register, and the first-stage, second-stage, third-stage, fourth-stage, and fifth-stage controllers of the pipeline. After they receive the conflict recovery signal, the pipeline is restarted.
  • each module of the first addressing unit can be executed in parallel according to the pipeline, which can greatly improve the working efficiency of the first addressing unit, reduce the addressing time, and improve the addressing efficiency.
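  • The effect of the pause mechanism on a five-stage pipeline can be illustrated with a small Python simulation. It is a simplified sketch: `run_pipeline` and the `stall_cycles` set are hypothetical, and a single set of paused cycles stands in for both the bus pause signal and the conflict pause signal.

```python
def run_pipeline(groups, stall_cycles, stages=5):
    """Return (cycle, group) pairs for the cycle in which each group leaves the last stage.

    On a paused cycle every stage holds its contents; otherwise each group
    advances by one stage and a new group enters stage 1 if one is pending.
    """
    regs = [None] * stages
    pending = list(groups)
    done = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        if cycle not in stall_cycles:  # the pipeline only moves when not paused
            if regs[-1] is not None:
                done.append((cycle, regs[-1]))
            regs = [pending.pop(0) if pending else None] + regs[:-1]
        cycle += 1
    return done


no_stall = run_pipeline(["g0", "g1"], set())
with_stall = run_pipeline(["g0", "g1"], {2, 3})
```

  • With no pause, consecutive groups complete one cycle apart because the stages work in parallel; pausing for two cycles delays every group in flight by exactly two cycles without losing any of them.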
  • the offset address is read from the memory by the second addressing unit.
  • the processor core sends the offset address to the second addressing unit through the system bus, and the second addressing unit writes the offset address into the memory.
  • the operation of the second addressing unit to read the offset address from the memory is similar to the above-mentioned operation of the first addressing unit to read data from the memory.
  • the structure of the second addressing unit is the same as that of the first addressing unit.
  • the second addressing unit can read the offset address from the memory using essentially the same operation that the first addressing unit uses to read data from the memory.
  • the second addressing unit includes: a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit, and a second data transceiving unit.
  • the second conflict processing unit reads the offset address from the memory, and sends the offset address to the second data processing unit; the second data processing unit processes the offset address and sends the processed offset address to the second data transceiving unit; the second data transceiving unit sends the processed offset address to the first addressing unit.
  • the difference between the second addressing unit and the first addressing unit is that the second data transceiving unit sends the processed offset address to the first addressing unit through the internal bus, whereas the first data transceiver unit sends data to the processor core through the system bus.
  • the operations of the second control unit, the second address calculation unit, the second conflict processing unit, the second data processing unit, and the second data transceiving unit are similar to the units corresponding to the first addressing unit.
  • in a write operation, part of the operation of the first addressing unit is similar to that in the read operation. For the sake of brevity, the following will focus on the differences between write operations and read operations.
  • the processor core writes the data into the vector address of the memory through the first addressing unit.
  • the data flow of the first data transceiving unit, the first data processing unit, and the first conflict processing unit is opposite to the read operation.
  • the processor core accesses the data of the storage address through the first addressing unit including:
  • the first data transceiver unit receives data sent by the processor core, and sends the data to the first data processing unit;
  • the first data processing unit processes the data, and sends the processed data to the first conflict processing unit;
  • the first conflict processing unit uses the conflict resolution mechanism to write data into the vector address.
  • the receiving buffer receives and buffers the data sent by the processor core through the system bus, and sends the data to the first data processing unit.
  • the first data processing unit decides whether to perform further processing on the data according to the data width written by the processor core.
  • when the processor core does not write data into all the bytes of each cell of each Bank, but only into some of the bytes of each cell of each Bank, the first data processing unit splits the data sent by the first data transceiver unit to generate the data that needs to be written into the memory, and sends the split data to the first conflict processing unit.
  • the splitting of the data sent by the first data transceiving unit by the first data processing unit includes:
  • the m bytes of each block are split to obtain N × k bytes, so that every k bytes correspond to the k addresses of one Bank, where N is the number of Banks in the memory, the bit width of each Bank is m bytes, and k ≤ log₂ m.
  • the data written by the processor core includes 4 blocks, and each block includes 4 bytes.
  • the 16 bytes correspond to the storage location of one byte in each cell of the memory B0-Bf.
  • the 4 bytes of each block are split to obtain 16 bytes, so that each byte corresponds to the storage location of one byte of one cell of one Bank.
  • the split data format is the data format written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.
  • when the processor core writes data into all the bytes of each cell of each Bank, that is, the processor core needs to write 64 bytes to the memory, the first data processing unit does not need to split the data sent by the first data transceiver unit, and sends the data directly to the first conflict processing unit.
  • the first data processing unit splits the 2 bytes of each block to obtain 32 bytes, so that every two bytes correspond to the storage location of two bytes of one cell of one Bank.
  • the split data format is the data format written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.
  • the first conflict processing unit further includes: a write data buffer, a write data strobe, and a write data reorganization unit.
  • the vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer caches the vector address at the same time.
  • the address strobe gates the vector address directly sent by the first address calculation unit, so that the vector address is output to the conflict judgment unit.
  • the data sent by the first data processing unit is sent to the write data strobe, and the write data buffer simultaneously buffers the data sent by the first data processing unit.
  • the conflict judgment unit judges the vector address:
  • when there is a bank conflict in the vector address, the conflict judgment unit generates a conflict flag valid signal and feeds the conflict flag valid signal back to the address strobe and the write data strobe, so that the address strobe gates the vector address output by the vector address buffer and the write data strobe gates the data output by the write data buffer.
  • the vector address of the input conflict judgment unit and the data of the input write data reorganization unit can be kept unchanged during the conflict processing;
  • the address mapping unit maps the vector address to the physical address of the memory, and the write data reorganization unit writes the reorganized data into the physical address of the memory.
  • after that, the conflict judgment unit generates a conflict flag invalidation signal.
  • the address selector selects the vector address of the next set of data sent by the first address calculation unit, and the vector address buffer caches the vector address of the next set of data at the same time.
  • the write data strobe strobes the next set of data sent by the first data processing unit, and buffers the next set of data to the write data buffer, and the conflict judgment unit continues to process the bank conflict of the next set of data.
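  • The judgment made by the conflict judgment unit can be sketched in Python as a check for any Bank that is addressed at more than one cell. The mapping from address to Bank is an assumption here (word-interleaved Banks, with 16 Banks and 4-byte cells as in the document's example configuration); the function name `has_bank_conflict` is likewise hypothetical.

```python
NUM_BANKS = 16   # number of Banks in the example configuration
CELL_BYTES = 4   # each cell stores 4 bytes


def has_bank_conflict(vector_address):
    """Return True if any Bank is addressed at more than one cell (row)."""
    rows_per_bank = {}
    for addr in vector_address:
        word = addr // CELL_BYTES                 # index of the 4-byte cell
        bank = word % NUM_BANKS                   # assumed interleaved mapping
        rows_per_bank.setdefault(bank, set()).add(word // NUM_BANKS)
    return any(len(rows) > 1 for rows in rows_per_bank.values())


table_lookup = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]
contiguous = list(range(0, 64, 4))  # 16 consecutive cells, one per Bank
conflict_flag = has_bank_conflict(table_lookup)
```

  • A result of `True` plays the role of the conflict flag valid signal, on which the buffered vector address and data are held; `False` corresponds to the case where the next set of data can be selected immediately.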
  • the data reorganization unit reorganizes the data including:
  • the data corresponding to the n-th cell is compiled into one row according to the order of address from small to large, and a total of n rows of data are obtained.
  • the address mapping unit maps the vector address to the physical address of the memory, including:
  • the address mapping unit sequentially selects n groups of cells corresponding to n rows of data
  • the write data reorganization unit writes the reorganized data into the physical address of the memory, including:
  • n rows of data are written into the n groups of cells in sequence.
  • the memory includes 16 banks
  • each bank includes 5 cells
  • each cell can store 4 bytes (32 bits).
  • the vector address of each group of data includes 16 addresses
  • the first addressing unit can write a group of 16 data to the memory each time.
  • the processor core needs to write a group of data labeled "1", "2", "3", and "4". The base address of the data labeled "1", "2", "3", "4" is 0 and the offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207], so the vector address output by the first address calculation unit is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207].
  • the vector address is directly input to the address strobe of the first conflict processing unit, and at the same time, the vector address buffer caches the vector address.
  • the address strobe selects the vector address, and the conflict judgment unit judges the vector address.
  • multiple addresses in the vector address of this group of data correspond to the same Bank of the memory; for example, the vector addresses [1] and [68] correspond to the first and second cells of B1, and [2], [69], and [136] correspond to the first, second, and third cells of B2. Therefore, there is a Bank conflict in this vector address.
  • the conflict judgment unit generates a conflict flag valid signal.
  • the address strobe gates the vector address output by the vector address buffer, and the write data strobe gates the data output from the write data buffer. In this way, during the conflict processing period, the vector address can be kept unchanged until the conflict processing ends.
  • the write data reorganization unit reorganizes the data and compiles the bytes of the first cell corresponding to the vector address corresponding to B0-Bf into one row.
  • the data corresponding to the vector address [0, 1, 2, 3] corresponds to the first cell of B0-B3, the data corresponding to the vector address [71] corresponds to the second cell of B4, the data corresponding to the vector address [139] corresponds to the third cell of B5, and the data corresponding to the vector address [207] corresponds to the fourth cell of B6. Therefore, the first row of data is the data corresponding to the vector address [0, 1, 2, 3, 71, 139, 207], that is, the 7 data labeled "1".
  • the bytes of the second cell corresponding to the vector address corresponding to B0-Bf are compiled into a row.
  • the data corresponding to the vector addresses [68, 69, 70], [138], and [206] correspond to the second cell of B1-B3, the third cell of B4, and the fourth cell of B5, respectively. Therefore, the second row of data is the data corresponding to the vector address [68, 69, 70, 138, 206], that is, the 5 data labeled "2".
  • the address mapping unit sequentially selects the n groups of cells corresponding to the n rows of data.
  • n can be other values, which depend on the vector address itself.
  • the write data reorganization unit sequentially writes the 4 rows of data into the memory according to the strobe sequence of the 4 groups of cells, and the written data is shown in FIG. 19.
  • data stored in a group of cells can be written in one clock cycle, and data writing can be completed in four clock cycles.
  • the first to fourth groups of cells can be strobed in sequence (as shown in FIG. 19), the fourth to first groups of cells can be strobed in reverse order, or the four groups of cells can be strobed in a random order.
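  • The reorganization into rows can be sketched in Python as follows: within each Bank the addresses are ordered from small to large, and row i collects the i-th address of every Bank that has one, so each row touches a Bank at most once. The sketch uses the 4-byte-spaced vector address from the read example together with an assumed word-interleaved mapping (16 Banks, 4-byte cells); it does not reproduce the byte-granular addresses of the write example, and the name `reorganize` is hypothetical.

```python
NUM_BANKS = 16   # number of Banks in the example configuration
CELL_BYTES = 4   # each cell stores 4 bytes


def reorganize(vector_address):
    """Split a non-empty vector address into conflict-free rows, one row per clock cycle."""
    per_bank = {}
    for addr in sorted(vector_address):  # small-to-large order within each Bank
        bank = (addr // CELL_BYTES) % NUM_BANKS
        per_bank.setdefault(bank, []).append(addr)
    depth = max(len(addrs) for addrs in per_bank.values())
    return [
        [addrs[i] for _, addrs in sorted(per_bank.items()) if i < len(addrs)]
        for i in range(depth)
    ]


vector = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]
rows = reorganize(vector)
```

  • For this vector the result has four rows of 7, 5, 3, and 1 addresses, matching the four clock cycles needed in the example; the number of rows is set by the most heavily addressed Bank and therefore depends on the vector address itself.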
  • the conflict processing of the group of data ends, and the conflict judgment unit generates a conflict flag failure signal.
  • the address selector strobes the vector address of the next set of data sent by the first address calculation unit, the vector address buffer buffers the vector address of the next set of data at the same time, the write data strobe strobes the next set of data sent by the first data processing unit and buffers the next set of data to the write data buffer, and the conflict judgment unit continues to process the bank conflict of the next set of data.
  • the conflict judgment unit judges the vector address. When there is no bank conflict in the vector address, the conflict judgment unit generates a conflict flag failure signal; the address mapping unit maps the vector address to the physical address of the memory; the write data reorganization unit writes data to the physical address; in response to the conflict flag failure signal, the address selector selects the vector address of the next set of data.
  • the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit.
  • the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict.
  • the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.
  • the first control unit of the first addressing unit also communicates with the processor core through the handshake protocol.
  • the processor core communicates with the first addressing unit through the system bus.
  • the system bus includes: clock signal line, write request valid, write request ready, write request, write data line, and write busy.
  • the first addressing unit works under the drive of the clock signal line.
  • when the processor core needs to write data to the memory, the processor core sends task instructions and data to the first addressing unit through a handshake protocol.
  • when the write request valid signal is high, it means that the write request signal and write data are valid; when the write request valid signal and the write request ready signal are both high, the first control unit reads the write request from the processor core and the write busy signal is pulled high. After that, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiver unit to write data to the memory, and the write busy signal is then pulled low.
  • the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.
  • the processor core sends a write request through the system bus. If the write request is a table lookup request, the selector strobes the synchronization register.
  • the read and write request cache receives the write request and caches the write request. After receiving the write request, the read and write request buffer sends an offset address request to the second addressing unit through the internal bus, and sends the write request to the synchronization register.
  • the second addressing unit reads the offset address of the data from the memory, sends the offset address to the first address calculation unit via the internal bus, and sends an offset address valid signal to the synchronization register via the internal bus.
  • after receiving the offset address valid signal, the synchronization register sends the write request to the pipeline controllers at all levels to start the pipeline operation.
  • the first-level and second-level controllers respectively send control signals to the first address calculation unit, the first data processing unit, and the first conflict processing unit.
  • the first data transceiver unit and the read and write request cache are located in the same stage, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline. If the write request is not a table lookup request, the selector strobes the read and write request cache, sends the write request directly to the pipeline controllers at all levels, and starts the pipeline operation.
  • a pipeline suspend mechanism is also provided in the write operation.
  • when the processor core cannot send data to the first data transceiver unit through the system bus, the processor core sends a bus pause signal through the system bus to the read request cache, the synchronization register, and the first-stage and second-stage controllers of the pipeline. After they receive the bus pause signal, the pipeline suspends work.
  • when the first conflict processing unit finds that there is a bank conflict in the vector address, the first conflict processing unit sends a conflict pause signal to the read request cache, the synchronization register, and the first-stage and second-stage controllers of the pipeline. After they receive the conflict pause signal, the pipeline suspends work. After the bank conflict is processed, the first conflict processing unit sends a conflict recovery signal, and the read request cache, the synchronization register, and the first-stage and second-stage controllers of the pipeline restart the pipeline after receiving the conflict recovery signal.
  • each module of the first addressing unit can be executed in parallel according to the pipeline, which can greatly improve the working efficiency of the first addressing unit, reduce the addressing time, and improve the addressing efficiency.
  • the processor core sends the offset address to the second addressing unit through the system bus, and the second addressing unit writes the offset address into the memory.
  • the operation of the second addressing unit to write the offset address to the memory is similar to the operation of the above-mentioned first addressing unit to write data to the memory.
  • the structure of the second addressing unit is the same as that of the first addressing unit.
  • the offset address is equivalent to the written data.
  • the second addressing unit can write the offset address into the memory by using the same operation as the operation of the first addressing unit to write data to the memory.
  • the addressing module includes multiple groups of addressing units, and each group of addressing units may be a group of addressing units of the previous embodiment.
  • Each group of addressing units includes: the same two addressing units.
  • the two addressing units communicate with the processor core through the system bus respectively.
  • An internal bus is set in the addressing module, and the two addressing units communicate through the internal bus.
  • one of the two addressing units is used to read and write data; this addressing unit can be called a table addressing unit.
  • the other addressing unit is used to read and write the offset address of the data in the memory; it can be called an offset addressing unit.
  • the addressing method of this embodiment can be executed in parallel by multiple groups of addressing units.
  • Each group of addressing units can communicate with the processor core through the system bus, and read and write the memory.
  • each group of addressing units can complete its addressing tasks independently. The number of groups of addressing units is not limited in this embodiment and can be determined according to actual requirements. Compared with a single group of addressing units, this embodiment can multiply the addressing efficiency of the processor, which greatly improves the addressing capability of the processor.
  • Yet another embodiment of the present disclosure provides an addressing method for a processor.
  • a group of addressing units obtains the base address or the offset address through the ping-pong addressing mode.
  • a group of addressing units includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit.
  • Ping-pong addressing modes include:
  • the processor core alternately writes the offset address into the memory through the fourth addressing unit and the fifth addressing unit;
  • the third addressing unit obtains the base address sent by the processor core, and alternately obtains the offset address stored in the memory through the fourth addressing unit and the fifth addressing unit.
  • the third addressing unit is used as a table addressing unit, and the fourth and fifth addressing units are used as an offset addressing unit.
  • When the processor core reads and writes multiple sets of data, while the next set of offset addresses is being written into the memory through the fifth addressing unit, the third addressing unit sends an offset address request to the fourth addressing unit. After receiving the offset address request, the fourth addressing unit reads the previous set of offset addresses from the memory and sends it to the third addressing unit. After that, the roles of the fourth addressing unit and the fifth addressing unit are exchanged.
  • the processor core writes the next set of offset addresses into the memory through the fourth addressing unit. At the same time, the third addressing unit sends an offset address request to the fifth addressing unit; after receiving the offset address request, the fifth addressing unit reads the set of offset addresses previously written through it from the memory and sends it to the third addressing unit.
  • the third addressing unit alternately obtains the offset address from the fourth addressing unit and the fifth addressing unit, so as to realize the ping-pong addressing of the offset address.
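The role exchange in this ping-pong mode can be sketched in a few lines of Python. This is only an illustrative software model, not the disclosed hardware: the class `OffsetUnit`, the function `ping_pong`, and the priming of the first set are assumptions, and the write and read that the hardware performs at the same time are modeled here as two sequential steps within one round.

```python
class OffsetUnit:
    """Hypothetical stand-in for an offset addressing unit and its memory region."""

    def __init__(self, name):
        self.name = name
        self.stored = None            # offset set currently held in memory

    def write(self, offsets):         # processor core -> memory
        self.stored = list(offsets)

    def read(self):                   # memory -> table addressing unit
        return list(self.stored)


def ping_pong(offset_sets):
    """Return (unit name, offsets) in the order the table unit consumes them."""
    write_unit, read_unit = OffsetUnit("fourth"), OffsetUnit("fifth")
    write_unit.write(offset_sets[0])  # assumed priming step for the first set
    consumed = []
    for nxt in list(offset_sets[1:]) + [None]:
        # Exchange roles: the unit just written becomes the one that is read.
        write_unit, read_unit = read_unit, write_unit
        if nxt is not None:
            write_unit.write(nxt)     # core writes the next set ...
        # ... while the table unit fetches the previous set from the other unit.
        consumed.append((read_unit.name, read_unit.read()))
    return consumed


order = ping_pong([[0, 1], [2, 3], [4, 5]])
```

In this model the table addressing unit is served by the fourth and fifth units in alternation, so a write of the next set never has to wait for the read of the previous one.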
  • a group of addressing units includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit.
  • Ping-pong addressing modes include:
  • the processor core writes the offset address into the memory through the eighth addressing unit
  • the sixth addressing unit and the seventh addressing unit alternately obtain the base address sent by the processor core, and obtain the offset address stored in the memory through the eighth addressing unit.
  • the sixth addressing unit and the seventh addressing unit are used as a table addressing unit, and the eighth addressing unit is used as an offset addressing unit.
  • When the processor core reads and writes multiple sets of data, while the processor core sends the next base address to the seventh addressing unit, the sixth addressing unit sends an offset address request to the eighth addressing unit. After receiving the offset address request, the eighth addressing unit reads the offset address from the memory and sends it to the sixth addressing unit. After that, the roles of the sixth addressing unit and the seventh addressing unit are exchanged.
  • the processor core sends the next base address to the sixth addressing unit. At the same time, the seventh addressing unit sends an offset address request to the eighth addressing unit; the eighth addressing unit reads the offset address from the memory and sends it to the seventh addressing unit.
  • the sixth addressing unit and the seventh addressing unit alternately obtain the offset address from the eighth addressing unit to realize the ping-pong addressing of the base address.
  • three or more addressing units can execute the write and read operations of the base address and the offset address in parallel, which improves the addressing ability of the processor; especially in large-scale look-up table addressing, addressing efficiency can be greatly improved.
  • the processor includes: a processor core, an addressing module, and a memory.
  • the addressing module can be integrated inside the processor.
  • the addressing module is used to obtain the base address and offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;
  • the processor core can access the data at the storage address of the memory through the addressing module.
  • the addressing module may include one or more groups of addressing units. For each group of addressing units, as shown in Figure 3, the same two addressing units are included.
  • the two addressing units communicate with the processor core through the system bus respectively.
  • An internal bus is set in the addressing module, and the two addressing units communicate through the internal bus.
  • One of the two addressing units is used to read and write data; this addressing unit can be called a table addressing unit.
  • The other addressing unit is used to read and write the offset address of the data in the memory; it can be called an offset addressing unit.
  • the two addressing units are referred to as the first addressing unit and the second addressing unit respectively, and the first addressing unit is used as the table addressing unit, and the second addressing unit is used as the offset addressing unit.
  • an addressing module is set in the processor, and the storage address of the data in the memory is calculated through the addressing module instead of the processor core.
  • the addressing operation is completed by the addressing module, and the storage address calculation does not require the participation of the processor core. Because the addressing module, rather than the processor core, calculates the storage address, the efficiency of table look-up addressing is improved compared with ordinary processors.
  • the first addressing unit is used to obtain the base address and offset address of the data in the memory.
  • the first addressing unit includes: a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit.
  • the first control unit can communicate with the processor core through the system bus, and is used to control the operations of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit.
  • the processor core can send a read request to the first control unit via the system bus; in response to the read request, the first control unit can send an offset address request to the second addressing unit via the internal bus.
  • In response to the offset address request, the second addressing unit can read the offset address of the data in the memory from the memory, send the offset address to the first address calculation unit via the internal bus, and send an offset address valid signal to the first control unit via the internal bus.
  • the first control unit can start each other unit of the first addressing unit to perform a read operation.
  • the first address calculation unit is configured to receive the base address of the data sent by the processor core and the offset address sent by the second addressing unit, and obtain the vector address of the data from the base address and the offset address.
  • the first address calculation unit includes: a base address selector, an offset address selector, and an adder.
  • the base address selector is used to select the base address sent by the processor core.
  • the offset address selector is used to select the offset address sent by the second addressing unit through the internal bus.
  • the number of offset addresses corresponds to the number of banks in the memory. When the memory includes N banks, the number of offset addresses is N.
  • the first address calculation unit is also used to obtain the storage address of the data in the memory according to the base address and the offset address.
  • the adder of the first address calculation unit is used to sum the base address with each of the N offset addresses to obtain the storage address of the data; the storage address is a vector address including N addresses (16 addresses when the memory has 16 banks).
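The adder step amounts to one addition per bank. A minimal sketch follows, assuming 16 banks as in the example of this embodiment; the function name `vector_address`, the base value `0x1000`, and the offset values are illustrative and not taken from the disclosure.

```python
def vector_address(base, offsets):
    """Add the single base address to each of the N offset addresses."""
    return [base + off for off in offsets]


N_BANKS = 16                                  # example bank count
offsets = [4 * i for i in range(N_BANKS)]     # illustrative offsets, one per bank
vec = vector_address(0x1000, offsets)         # vector address with N entries
```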
  • After the vector address is obtained, the processor core reads the data stored at the vector address from the memory through the first addressing unit. Even when the vector address has a bank conflict, the processor core can read the data of the vector address through the first addressing unit.
  • the first conflict processing unit is used to determine whether there is a Bank conflict. When there is a Bank conflict, the first conflict processing unit can use a conflict resolution mechanism to read the data from the vector address and send the data to the first data processing unit; the first data processing unit is used to process the data and send the processed data to the first data transceiving unit; the first data transceiving unit is used to send the processed data to the processor core.
  • the first conflict processing unit includes: a conflict judgment unit, an address mapping unit, an address strobe, and a read data recombination unit.
  • the vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer is used to buffer the vector address.
  • the address strobe is used to strobe the vector address directly sent by the first address calculation unit, so that the vector address is output to the conflict judgment unit.
  • the conflict judgment unit is used to judge the vector address:
  • When there is a bank conflict in the vector address, the conflict judgment unit is used to generate a conflict flag valid signal and feed it back to the address strobe, so that the address strobe strobes the vector address output by the vector address buffer. In this way, the vector address input to the conflict judgment unit can be kept unchanged during conflict processing.
  • the address mapping unit is used to map the vector address to the physical address of the memory.
  • the read data reorganization unit is used to read the data of the physical address, reorganize the data, and send the reorganized data to the first data processing unit.
  • the conflict judgment unit is used to generate a conflict flag invalidation signal.
  • the address selector is used to gate the vector address of the next data sent by the first address calculation unit, and the vector address buffer is used to buffer the next data at the same time.
  • the conflict judgment unit is used to continue processing the bank conflict of the next data.
  • the address mapping unit is used to group the first cell corresponding to the vector address in each bank into one group, the second cell corresponding to the vector address in each bank into another group, and so on, until the nth cell corresponding to the vector address in each bank is grouped, to obtain a total of n groups of cells; the n groups of cells in the memory are then strobed sequentially.
  • the read data reorganization unit can read the vector address data and reorganize the data in the following way: following the gating sequence of the n groups of cells, it sequentially reads the data stored in the n groups of cells and rearranges that data in order of address from small to large to obtain the reorganized data.
  • the data is arranged according to its actual storage location in the memory, and the data reorganization unit sends the reorganized data to the first data processing unit for use by the processor core.
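The grouping and reorganization described above can be sketched as follows. This is a simplified model under stated assumptions: the interleaved address-to-bank mapping in `bank_and_cell`, the small values of `N_BANKS` and `BANK_WIDTH`, the dictionary used as memory, and all function names are hypothetical and chosen only to keep the example short.

```python
from collections import defaultdict

N_BANKS = 4       # assumed small bank count for readability
BANK_WIDTH = 4    # assumed bank width m, in bytes


def bank_and_cell(addr):
    """Assumed interleaving: consecutive cells rotate across the banks."""
    cell_index = addr // BANK_WIDTH
    return cell_index % N_BANKS, cell_index // N_BANKS   # (bank, cell row)


def plan_groups(vector_addr):
    """Put the i-th distinct cell of every bank into the i-th read group."""
    cells_per_bank = defaultdict(list)
    for addr in vector_addr:
        bank, row = bank_and_cell(addr)
        if row not in cells_per_bank[bank]:
            cells_per_bank[bank].append(row)
    n = max(len(rows) for rows in cells_per_bank.values())
    return [
        {bank: rows[i] for bank, rows in cells_per_bank.items() if i < len(rows)}
        for i in range(n)
    ]   # more than one group means the vector address has a bank conflict


def read_with_conflicts(memory, vector_addr):
    """Strobe the groups one after another, then reorder by ascending address."""
    fetched = {}
    for group in plan_groups(vector_addr):       # one strobe per group
        for bank, row in group.items():
            cell_addr = (row * N_BANKS + bank) * BANK_WIDTH
            fetched[cell_addr] = memory[cell_addr]
    return [fetched[a] for a in sorted(fetched)]


memory = {a: a * 10 for a in range(0, 64, 4)}    # cell address -> cell content
data = read_with_conflicts(memory, [0, 16, 4, 8])
```

Here addresses 0 and 16 fall into different cells of the same bank, so two groups are strobed; a vector address such as `[0, 4, 8, 12]` yields a single group and needs no reorganization.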
  • When the conflict processing of the group of data ends, the conflict judgment unit is used to generate a conflict flag invalidation signal.
  • the address selector is used to select the vector address of the next set of data sent by the first address calculation unit, the vector address buffer is used to buffer the vector address of the next set of data at the same time, and the conflict judgment unit is used to continue processing the bank conflicts of the next set of data.
  • the conflict judgment unit is used to judge the vector address. When there is no bank conflict in the vector address, the conflict judgment unit is used to generate a conflict flag invalidation signal.
  • the address mapping unit is used to map the vector address to the physical address of the memory.
  • the read data reorganization unit is used to read the data of the physical address, without reorganizing the read data, and directly sends the read data to the first data processing unit. Since there is no bank conflict, the data can be read in one clock cycle.
  • the address selector is used to select the vector address of the next set of data.
  • Besides the case in which each address of the vector address corresponds to a different bank of the memory, the absence of a Bank conflict described in this embodiment also includes the following situation:
  • the vector addresses are equally divided into m groups or 2 ⁇ m groups. If each group of addresses corresponds to a cell of a bank, it is considered that there is no bank conflict in the vector address, and the first conflict processing unit is used to perform address splicing on the vector address.
  • After the first data processing unit receives the data sent by the first conflict processing unit, it decides whether to perform further processing on the data according to the data width required by the processor core. When the data required by the processor core is not all the bytes read from each cell of each Bank but only part of the bytes of each cell, the first data processing unit is used to splice those bytes of the data sent by the first conflict processing unit to generate the data required by the processor core, and send the spliced data to the first data transceiver unit.
  • when the bit width of the Bank is m bytes and the processor core needs k bytes out of every m bytes of the data, the first data processing unit is used to select the k bytes from every m bytes to obtain N×k bytes (k ⁇ log 2 m), and to combine every m bytes of the N×k bytes together to obtain m×k blocks of data, each block being m bytes wide.
  • When the data required by the processor core is all the bytes read from each cell of each Bank, the first data processing unit does not need to splice the data sent by the first conflict processing unit, but directly sends the data to the first data transceiver unit.
  • For example, the first data processing unit combines every 4 bytes of the selected 32 bytes to obtain 8 blocks of data, each 4 bytes wide, which is the data required by the processor core, and sends the spliced data to the first data transceiver unit.
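The select-and-splice step can be illustrated with the numbers of the example above (16 banks, 4-byte bank width, 2 bytes needed per cell). The sketch below is hypothetical: the function name `select_and_splice` is invented, and it assumes that the wanted k bytes are the first k bytes of each cell, which the text does not specify.

```python
def select_and_splice(cells, k, m):
    """Keep k bytes of every m-byte cell, then pack them into m-byte blocks."""
    assert all(len(c) == m for c in cells)
    # Assumption: the wanted bytes are the first k bytes of each cell.
    selected = b"".join(c[:k] for c in cells)
    return [selected[i:i + m] for i in range(0, len(selected), m)]


N, m, k = 16, 4, 2
cells = [bytes([4 * i + j for j in range(m)]) for i in range(N)]   # 64 bytes read
blocks = select_and_splice(cells, k, m)    # 32 selected bytes packed into blocks
```

With these values the 32 selected bytes are packed into 8 blocks of 4 bytes each.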
  • the first data transceiver unit includes: a receiving buffer and a sending buffer.
  • the sending buffer buffers the data sent by the first data processing unit, and sends the buffered data to the processor core through the system bus.
  • the depth of the receiving buffer and the transmitting buffer can be set according to actual needs, where the minimum depth of the receiving buffer is 2 and the minimum depth of the transmitting buffer is 0.
  • the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit.
  • the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict.
  • the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.
  • the first addressing unit can obtain base addresses and offset addresses of multiple sets of data based on multiple different modes.
  • One mode can be called the offset address update mode.
  • In the offset address update mode, the base address of the multiple groups of data is unchanged, and the offset address of each group of data comes from the second addressing unit.
  • the first address calculation unit is used to obtain the base address sent by the processor core; the second addressing unit is used to sequentially read the offset address of each group of data from the memory; the first address calculation unit is used to obtain the offset address read by the second addressing unit.
  • the base address selector when reading multiple sets of data, in the offset address update mode, is used to select the base address sent by the processor core.
  • Whenever a group of data is read, the offset address selector is used to select the offset address of the group of data sent by the second addressing unit through the internal bus.
  • the adder is used to sum the base address and the offset address of the group of data to obtain the vector address of the group of data.
  • the adder is used to send the vector address of the group of data to the first conflict processing unit, and the data is sent to the processor core through the first conflict processing unit, the first data processing unit, and the first data transceiver unit to complete the reading of the group of data.
  • the offset address selector selects the offset address of the group of data sent by the second addressing unit through the internal bus, so as to realize the reading of multiple groups of data.
  • the other mode can be called the base address update mode.
  • In the base address update mode, the offset address of the multiple groups of data comes from the second addressing unit, and every group of data uses the same offset address. The base address of each group of data is obtained by updating the initial value of the base address.
  • the processor of this embodiment also provides a fixed offset address mode.
  • the first address calculation unit is used to obtain the base address sent by the processor core, and the base address selector is used to gate the base address sent by the processor core and send the base address to the adder.
  • the processor core is also used to send a fixed offset address to the first addressing unit, and the offset address selector of the first address calculation unit is used to select the fixed offset address and send the fixed offset address to the adder .
  • the adder of the first address calculation unit is used to add the base address and the offset address to obtain the vector address.
  • the fixed offset address mode can be used in multiple addressing scenarios such as linear addressing and step addressing.
  • the processor of this embodiment provides an offset address update mode, a base address update mode, and a fixed offset address mode, which can be flexibly selected according to actual conditions, which improves the flexibility of table look-up addressing.
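The three modes differ only in where the base address and the offset addresses come from for each group of data. The sketch below contrasts them; it is a hedged illustration in which all function names are invented, and the constant `step` used to update the base address and the constant `stride` used for the fixed offsets are assumptions, since the text does not state how these values are chosen.

```python
def offset_update_mode(base, offset_sets):
    """Fixed base; each group uses a new offset set from the offset unit."""
    return [[base + o for o in offsets] for offsets in offset_sets]


def base_update_mode(initial_base, step, offsets, groups):
    """Fixed offset set; the base is updated per group (constant step assumed)."""
    return [[initial_base + g * step + o for o in offsets] for g in range(groups)]


def fixed_offset_mode(base, stride, lanes):
    """Offsets come from the core; a constant stride gives linear or step addressing."""
    return [base + i * stride for i in range(lanes)]


a = offset_update_mode(100, [[0, 1], [5, 7]])
b = base_update_mode(0, 64, [0, 4], groups=2)
c = fixed_offset_mode(8, 4, lanes=4)
```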
  • the processor of this embodiment has task-driven addressing capabilities.
  • When the processor core executes the operation of reading data, the processor core generates a set of instructions to read data, which is equivalent to a task instruction, and sends the task instruction to the first addressing unit through the system bus; the entire addressing process is then completed by the first addressing unit.
  • the data read from the memory by the first addressing unit is sent to the processor core via the system bus.
  • the processor core then performs subsequent operations after receiving the data. It can be seen that in this task-driven addressing of this embodiment, when the processor core needs to read data from the memory, the task instruction can be sent to the first addressing unit, and the processor core does not need to care about the specific addressing process. Even if the bank conflict occurs, it is handled by the first addressing unit. Compared with the general processor, the operation of the processor core is simplified and the efficiency is improved.
  • the first control unit of the first addressing unit can communicate with the processor core through a handshake protocol.
  • the processor core communicates with the first addressing unit through the system bus.
  • the system bus includes: a clock signal line and read request valid, read request ready, read request, read data valid, read data ready, and read data signal lines.
  • the first addressing unit works under the drive of the clock signal line.
  • When the processor core needs to read data from the memory, the processor core sends task instructions to the first addressing unit and receives data from the first addressing unit through the handshake protocol.
  • When the read request valid signal is high, it indicates that the read request signal is valid; when the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core.
  • the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to read data from the memory.
  • When the read data valid signal is high, it indicates that the read data is valid; when the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiver unit to send data to the processor core.
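The valid/ready rule used for both the read request and the read data can be modeled cycle by cycle. The following is a minimal sketch, not the disclosed bus: `transfer` and `simulate` are hypothetical names, and the sender is assumed to hold its valid signal high for as long as it has something pending.

```python
def transfer(valid, ready):
    """A transfer takes place in a cycle only when valid and ready are both high."""
    return bool(valid and ready)


def simulate(requests, ready_pattern):
    """Return (cycle, request) pairs for each accepted request.

    requests: items the sender wants to deliver, in order.
    ready_pattern: per-cycle ready level driven by the receiver.
    """
    accepted, pending = [], list(requests)
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)          # assumed: valid stays high while data is pending
        if transfer(valid, ready):
            accepted.append((cycle, pending.pop(0)))
    return accepted


log = simulate(["r0", "r1"], [0, 1, 0, 0, 1])
```

In this run the first request is accepted in cycle 1 and the second in cycle 4, the only cycles in which both signals are high.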
  • the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.
  • the first control unit includes: a read and write request cache, a synchronization register, a selector, and the first-stage, second-stage, third-stage, fourth-stage, and fifth-stage controllers of the pipeline.
  • the processor core sends a read request through the system bus. If the read request is a table lookup request, the selector is used to gate the synchronization register.
  • the read and write request cache is used to receive the read request and cache the read request. After receiving the read request, the read and write request buffer is used to send an offset address request to the second addressing unit through the internal bus, and send the read request to the synchronization register.
  • the second addressing unit is used to read the offset address of the data in the memory from the memory, send the offset address to the first address calculation unit through the internal bus, and send an offset address valid signal to the synchronization register through the internal bus.
  • After receiving the offset address valid signal, the synchronization register is used to send the read request signal to the pipeline controllers at all levels to start the pipeline operation.
  • the first-stage, second-stage, third-stage, fourth-stage, and fifth-stage controllers can respectively send control signals to the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. In the addressing process of the first addressing unit, the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage, the first data processing unit is located in the third and fourth stages, and the first data transceiver unit is located in the fifth stage. If the read request is not a table lookup request, the selector is used to strobe the read and write request cache, send the read request directly to the pipeline controllers at all levels, and start the pipeline operation.
  • the first addressing unit of this embodiment also provides a streamline pause mechanism.
  • When the processor core cannot receive the data sent by the first data transceiver unit through the system bus, the processor core can send a bus pause signal through the system bus to the read and write request cache, the synchronization register, and the first-stage to fifth-stage controllers of the pipeline. After they receive the bus pause signal, the pipeline suspends work.
  • When the first conflict processing unit finds that there is a bank conflict in the vector address, the first conflict processing unit is used to send a conflict pause signal to the read and write request cache, the synchronization register, and the first-stage to fifth-stage controllers of the pipeline. After they receive the conflict pause signal, the pipeline suspends work.
  • After the bank conflict is processed, the first conflict processing unit sends a conflict recovery signal to the read and write request cache, the synchronization register, and the first-stage to fifth-stage controllers of the pipeline. After the conflict recovery signal is received, the pipeline is restarted.
  • each module of the first addressing unit can be executed in parallel according to the pipeline, which can greatly improve the working efficiency of the first addressing unit, reduce the addressing time, and improve the addressing efficiency.
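The effect of the pause mechanism on a five-stage read pipeline can be sketched with a small cycle-level model. This is an assumption-laden illustration: the stage labels, the function `run_pipeline`, and the idea of expressing bus pause and conflict pause signals as a set of frozen cycles are all hypothetical, and requests are assumed to be unique, hashable labels.

```python
STAGES = ["address calc", "conflict", "data proc 1", "data proc 2", "transceive"]


def run_pipeline(requests, stall_cycles=frozenset()):
    """Return {request: cycle in which it leaves the last stage}.

    stall_cycles: cycles in which a pause signal freezes every stage.
    """
    pipe = [None] * len(STAGES)
    pending, done, cycle = list(requests), {}, 0
    while pending or any(slot is not None for slot in pipe):
        if cycle not in stall_cycles:             # paused cycles change nothing
            if pipe[-1] is not None:
                done[pipe[-1]] = cycle            # request leaves the fifth stage
            entering = pending.pop(0) if pending else None
            pipe = [entering] + pipe[:-1]         # every stage advances by one
        cycle += 1
    return done


no_stall = run_pipeline(["a", "b"])
one_stall = run_pipeline(["a", "b"], stall_cycles={2})
```

Because the stages overlap, the second request finishes only one cycle after the first, and a single paused cycle delays both by exactly one cycle.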
  • the second addressing unit includes: a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit, and a second data transceiving unit.
  • the second conflict processing unit is used for reading the offset address from the memory and sending the offset address to the second data processing unit; the second data processing unit is used to process the offset address and send the processed offset address to the second data transceiving unit; the second data transceiving unit is used to send the processed offset address to the first addressing unit.
  • the difference between the second addressing unit and the first addressing unit is that the second data transceiving unit is used to send the processed offset address to the first addressing unit through the internal bus, whereas in the first addressing unit the first data transceiver unit is used to send data to the processor core through the system bus.
  • the operations of the second control unit, the second address calculation unit, the second conflict processing unit, the second data processing unit, and the second data transceiving unit are similar to those of the units corresponding to the first addressing unit.
  • In the write operation, part of the operation of the first addressing unit is similar to the read operation.
  • the processor core can also write the data into the vector address of the memory through the first addressing unit.
  • the first data transceiver unit is used to receive the data sent by the processor core and send the data to the first data processing unit; the first data processing unit is used to process the data and send the processed data to the first conflict processing unit; the first conflict processing unit writes the data into the vector address using the conflict resolution mechanism.
  • the receiving buffer is used to receive and buffer the data sent by the processor core through the system bus, and send the data to the first data processing unit.
  • the first data processing unit is configured to, after receiving the data sent by the first data transceiving unit, determine whether to perform further processing on the data according to the data width written by the processor core.
  • the processor core does not write data into all the bytes of each cell of each Bank, but writes data into the partial bytes of each cell of each Bank
  • the first data processing unit is used to send to the first data transceiver unit Split the data to generate data that needs to be written into the memory, and send the split data to the first conflict processing unit.
  • the first data processing unit is used to split the m bytes of each block to obtain N ⁇ k words Section, so that each k bytes correspond to the k addresses of a Bank; among them, N is the number of banks in the memory; the bit width of the bank is m bytes; k ⁇ log 2 m .
  • the processor core wants to write data into all the bytes of each cell of each Bank, that is, the processor core needs to write 64 bytes to the memory
  • the first data processing unit does not need to split the data sent by the first data transceiver unit, but sends the data directly to the first conflict processing unit.
  • the first conflict processing unit further includes: a write data buffer, a write data strobe, and a write data reorganization unit.
  • the vector address sent by the first address calculation unit is directly input to the address strobe, and the vector address buffer is used to buffer the vector address.
  • the address strobe is used to strobe the vector address directly sent by the first address calculation unit, so that the vector address is output to the conflict judgment unit.
  • the data sent by the first data processing unit is sent to the write data strobe, and the write data buffer simultaneously buffers the data sent by the first data processing unit.
  • the conflict judgment unit is used to judge the vector address:
  • when there is a bank conflict in the vector address, the conflict judgment unit is used to generate a conflict flag valid signal and feed it back to the address strobe and the write data strobe, so that the address strobe strobes the vector address output by the vector address buffer and the write data strobe strobes the data output from the write data buffer. In this way, the vector address input to the conflict judgment unit and the data input to the write data reorganization unit can be kept unchanged during the conflict processing;
  • the write data reorganization unit is used to reorganize data
  • the address mapping unit is used to map the vector address to the physical address of the memory
  • the write data reorganization unit is used to write the reorganized data into the physical address of the memory.
  • the conflict judgment unit is used to generate a conflict flag invalidation signal.
  • the address selector is used to select the vector address of the next group of data sent by the first address calculation unit, and the vector address buffer caches that vector address at the same time; the write data strobe is used to strobe the next group of data sent by the first data processing unit, and the next group of data is buffered in the write data buffer.
  • the conflict judgment unit continues to process bank conflicts for the next group of data.
  • the data reorganization unit is used to reorganize data:
  • the data corresponding to the n-th cell is compiled into one row according to the order of address from small to large, and a total of n rows of data are obtained.
  • the address mapping unit is used to sequentially select n groups of cells corresponding to n rows of data
  • the write data reorganization unit is used to sequentially write n rows of data into the n groups of cells according to the gating sequence of the n groups of cells.
  • the conflict processing of the group of data ends, and the conflict judgment unit is used to generate a conflict flag failure signal.
  • the address selector is used to strobe the vector address of the next set of data sent by the first address calculation unit, and the vector address buffer also buffers that vector address; the write data strobe is used to strobe the next set of data sent by the first data processing unit, and the next set of data is buffered in the write data buffer; the conflict judgment unit then continues to process the bank conflict of the next set of data.
  • the conflict judgment unit is used to judge the vector address.
  • the conflict judgment unit is used to generate a conflict flag failure signal;
  • the address mapping unit is used to map the vector address to the physical address of the memory;
  • the write data recombination unit is used to write data into the physical address; in response to the conflict flag failure signal, the address selector is used to select the vector address of the next set of data.
  • the first addressing unit is set in the processor, and the acquisition of the base address and the offset address and the calculation of the vector address are all completed by the first addressing unit.
  • the first addressing unit uses the conflict resolution mechanism to resolve the bank conflict, and the processor core does not need to deal with the bank conflict.
  • the processor core can still perform other operations without waiting for the resolution of the Bank conflict. Therefore, the addressing method of this embodiment can significantly improve the efficiency of the processor and increase the computing speed of the processor.
  • the processor core communicates with the first addressing unit through the system bus.
  • the system bus includes: clock signal line, write request valid, write request ready, write request, write data line, and write busy.
  • the address unit works under the drive of the clock signal line.
  • the processor core may send task instructions and data to the first addressing unit through a handshake protocol.
  • when the write request valid signal is high, the write request signal and the write data are valid; when the write request valid signal and the write request ready signal are both high, the first control unit can read the write request from the processor core, and the write busy signal is pulled high. After that, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to write the data to the memory, and the write busy signal is pulled low.
  • the first control unit is used to control the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipeline manner.
  • the processor core can send a write request through the system bus. If the write request is a table lookup request, the selector is used to gate the synchronization register.
  • the read and write request cache receives the write request and caches the write request. After receiving the write request, the read and write request buffer can send an offset address request to the second addressing unit through the internal bus, and send the write request to the synchronization register.
  • the second addressing unit is used to read the offset address of the data from the memory, send the offset address to the first address calculation unit through the internal bus, and send an offset address valid signal to the synchronization register through the internal bus.
  • after receiving the offset address valid signal, the synchronization register is used to send the write request to the pipeline controllers at all levels to start the pipeline operation.
  • the first-level and second-level controllers are used to send control signals to the first address calculation unit, the first data processing unit, and the first conflict processing unit, respectively.
  • the first data transceiver unit and the read and write request cache are located in the same stage, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline. If the write request is not a table lookup request, the selector strobes the read and write request cache, sends the write request directly to the pipeline controllers at all levels, and starts the pipeline operation.
  • a pipeline suspend mechanism is also provided in the write operation.
  • when the processor core cannot send data to the first data transceiver unit through the system bus, the processor core can send a bus suspend signal to the read request cache, the synchronization register, and the first-level and second-level controllers of the pipeline through the system bus. After these units receive the bus suspend signal, the pipeline suspends work.
  • when the first conflict processing unit finds a bank conflict in the vector address, it sends a conflict pause signal to the read request cache, the synchronization register, and the first-level and second-level controllers of the pipeline; after these units receive the conflict pause signal, the pipeline suspends work.
  • the first conflict processing unit is used to send a conflict recovery signal, and the read request buffer, synchronization register, and the first and second stage controllers of the pipeline restart the pipeline after receiving the conflict recovery signal.
  • each module of the first addressing unit can be executed in parallel according to the pipeline, which can greatly improve the working efficiency of the first addressing unit, reduce the addressing time, and improve the addressing efficiency.
  • the processor core may send the offset address to the second addressing unit through the system bus, and the second addressing unit is used to write the offset address into the memory.
  • the operation of the second addressing unit to write the offset address to the memory is similar to the operation of the above-mentioned first addressing unit to write data to the memory.
  • the addressing module of this embodiment may include multiple groups of addressing units.
  • the processor of this embodiment can perform data read and write operations in parallel by multiple groups of addressing units.
  • Each group of addressing units can communicate with the processor core through the system bus, and read and write the memory.
  • each group of addressing units can complete their respective addressing tasks independently. How many groups of addressing units are specifically included is not limited in this embodiment, and can be determined according to actual requirements. Compared with a single group of addressing units, this embodiment can double the addressing efficiency of the processor, which greatly improves the addressing ability of the processor.
  • a group of addressing units in this embodiment can obtain a base address or an offset address through a ping-pong addressing mode.
  • a group of addressing units includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit.
  • Ping-pong addressing modes include:
  • the processor core can alternately write the offset address into the memory through the fourth addressing unit and the fifth addressing unit;
  • the third addressing unit is used to obtain the base address sent by the processor core, and alternately obtain the offset address stored in the memory through the fourth addressing unit and the fifth addressing unit.
  • the third addressing unit is used as a table addressing unit, and the fourth and fifth addressing units are used as an offset addressing unit.
  • the processor core reads and writes multiple sets of data
  • the third addressing unit can send an offset address request to the fourth addressing unit
  • after receiving the offset address request, the fourth addressing unit can read the previous set of offset addresses from the memory and send it to the third addressing unit. After that, the roles of the fourth addressing unit and the fifth addressing unit are exchanged.
  • the processor core can write the next set of offset addresses into the memory through the fourth addressing unit.
  • the third addressing unit can send an offset address request to the fifth addressing unit; after receiving the offset address request, the fifth addressing unit can read that set of offset addresses from the memory and send it to the third addressing unit.
  • the third addressing unit alternately obtains the offset address from the fourth addressing unit and the fifth addressing unit, so as to realize the ping-pong addressing of the offset address.
  • a group of addressing units includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit.
  • Ping-pong addressing modes include:
  • the processor core can write the offset address into the memory through the eighth addressing unit;
  • the sixth addressing unit and the seventh addressing unit can alternately obtain the base address sent by the processor core, and can obtain the offset address stored in the memory through the eighth addressing unit.
  • the sixth addressing unit and the seventh addressing unit are used as a table addressing unit, and the eighth addressing unit is used as an offset addressing unit.
  • when the processor core reads and writes multiple sets of data, while a base address is sent to the seventh addressing unit, the sixth addressing unit can send an offset address request to the eighth addressing unit; after receiving the offset address request, the eighth addressing unit can read the offset address from the memory and send it to the sixth addressing unit. After that, the roles of the sixth addressing unit and the seventh addressing unit are exchanged.
  • the processor core can send the next base address to the sixth addressing unit.
  • the seventh addressing unit can send an offset address request to the eighth addressing unit.
  • the offset address can be read from the memory and sent to the seventh addressing unit.
  • the sixth addressing unit and the seventh addressing unit alternately obtain the offset address from the eighth addressing unit to realize the ping-pong addressing of the base address.
  • three or more addressing units can execute the write and read operations of the base address and the offset address in parallel, which improves the addressing ability of the processor; in large-scale look-up table addressing in particular, the addressing efficiency can be greatly improved.
  • the movable platform includes a fuselage; the fuselage includes at least one circuit; and the circuit includes at least one processor of the above-mentioned embodiments.
  • the movable platform can be any movable vehicle or carrier, such as but not limited to: robots, drones, unmanned vehicles, unmanned ships, etc.
  • the body of the drone may have a shell.
  • the housing may be formed of a single integral piece, two integral pieces, or multiple parts.
  • the housing may include a single cavity or multiple cavities. For each cavity, one or more components can be placed in the cavity.
  • the component may be, for example, at least one circuit board, one or more sensors, one or more communication units, or any other type of component.
  • Each circuit board may include one or more processors of the foregoing embodiments, and the processors are used to perform functions such as flight control, navigation, and image processing.
  • the electronic device includes: a housing; the housing is provided with: at least one circuit; the circuit includes: at least one processor as described in the foregoing embodiment.
  • the electronic device of this embodiment may be a remote control, especially a remote control of a movable platform.
  • the electronic device can also be any portable or non-portable device, such as but not limited to: smart phone/mobile phone, tablet computer, personal digital assistant (PDA), laptop computer, desktop computer, media content player, video game station/system, Virtual reality systems, augmented reality systems, wearable devices (for example, watches, glasses, gloves, headwear), gesture recognition devices, microphones, equipment capable of providing or rendering image data, etc.
  • Another embodiment of the present disclosure also provides a computer-readable storage medium that stores executable instructions.
  • when the executable instructions are executed by one or more processors, the one or more processors can execute the addressing method of the foregoing embodiments.
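
The write-side conflict handling described in the list above (conflict judgment, reorganization of the data into n rows in ascending address order, then sequential writes of the n rows) can be illustrated with a minimal Python sketch. It is not the disclosed hardware: the interleaved mapping `bank = address % NUM_BANKS` and `cell = address // NUM_BANKS`, the value `NUM_BANKS = 4`, and all function names are assumptions introduced only for illustration.

```python
from collections import defaultdict

NUM_BANKS = 4  # hypothetical number of banks (N in the text)


def bank_of(addr):
    # Assumed interleaved mapping: consecutive addresses fall in consecutive banks.
    return addr % NUM_BANKS


def cell_of(addr):
    # Assumed cell index of the address inside its bank.
    return addr // NUM_BANKS


def has_bank_conflict(addrs):
    """Conflict judgment: True if two lanes of the vector address hit the same bank."""
    banks = [bank_of(a) for a in addrs]
    return len(set(banks)) != len(banks)


def reorganize(addrs, data):
    """Data reorganization: row i holds the i-th smallest address of every bank."""
    per_bank = defaultdict(list)
    for addr, value in zip(addrs, data):
        per_bank[bank_of(addr)].append((addr, value))
    for lanes in per_bank.values():
        lanes.sort(key=lambda lane: lane[0])  # ascending address order
    depth = max((len(lanes) for lanes in per_bank.values()), default=0)
    return [
        {bank: lanes[i] for bank, lanes in per_bank.items() if i < len(lanes)}
        for i in range(depth)
    ]


def write_vector(memory, addrs, data):
    """Write one group of data row by row; returns the number of rows (passes) used."""
    rows = reorganize(addrs, data)
    for row in rows:
        for bank, (addr, value) in row.items():
            memory[bank][cell_of(addr)] = value
    return len(rows)


memory = [[None] * 4 for _ in range(NUM_BANKS)]  # memory[bank][cell]
passes = write_vector(memory, [0, 4, 1, 6], [10, 11, 12, 13])
print(has_bank_conflict([0, 4, 1, 6]), passes)
```

In this toy run, addresses 0 and 4 fall in the same bank under the assumed mapping, so the group needs two rows and therefore two write passes; a conflict-free group such as `[0, 1, 2, 3]` needs only one.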

Abstract

An addressing method for a processor, a processor, a movable platform, and an electronic device. The processor comprises: a processor core, an addressing module, and a memory. The addressing method comprises: the addressing module obtains a base address and an offset address of data in the memory; the addressing module obtains a storage address of the data in the memory according to the base address and the offset address; and the processor core accesses the data at the storage address by means of the addressing module.

Description

Addressing method for processor, processor, movable platform, and electronic device
Technical field
The present disclosure relates to the field of data processing, and in particular to an addressing method of a processor, a processor, a movable platform, and an electronic device.
Background
When a processor performs data processing, it needs to address the memory in order to read data from the memory or write data to the memory. For some image processing and digital signal processing algorithms, the storage addresses of the data in the memory often follow no regular pattern, or the pattern is too complex and variable, so the processor usually accesses the memory by means of look-up table addressing.
Summary of the invention
The present disclosure provides an addressing method for a processor. The processor includes a processor core, an addressing module, and a memory. The addressing method includes:
The addressing module obtains the base address and the offset address of data in the memory;
The addressing module obtains the storage address of the data in the memory according to the base address and the offset address; and
The processor core accesses the data at the storage address through the addressing module.
The present disclosure also provides a processor. The processor includes a processor core, an addressing module, and a memory;
The addressing module is configured to obtain the base address and the offset address of data in the memory, and to obtain the storage address of the data in the memory according to the base address and the offset address;
The processor core can access the data at the storage address of the memory through the addressing module.
The present disclosure also provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the above addressing method.
The present disclosure also provides a movable platform. The movable platform includes a fuselage; the fuselage includes at least one circuit; and the circuit includes at least one processor as described above.
The present disclosure also provides an electronic device. The electronic device includes a housing; at least one circuit is provided in the housing; and the circuit includes at least one processor as described above.
The present disclosure also provides a computer program product including instructions which, when run on a computer, cause the computer to execute the above addressing method.
Description of the drawings
FIG. 1 is a flowchart of an addressing method of a processor according to an embodiment of the disclosure.
FIG. 2 is a schematic structural diagram of a processor according to an embodiment of the disclosure.
FIG. 3 is a schematic structural diagram of an addressing module according to an embodiment of the disclosure.
FIG. 4 shows the read operation data flow of the first addressing unit and the second addressing unit according to an embodiment of the disclosure.
FIG. 5 is a schematic structural diagram of the first address calculation unit according to an embodiment of the disclosure.
FIG. 6 is a flowchart of the processor core accessing the data at a storage address through the first addressing unit in a read operation according to an embodiment of the disclosure.
FIG. 7 is a schematic structural diagram of the first conflict processing unit according to an embodiment of the disclosure.
FIG. 8 shows the data processing of the conflict resolution mechanism in a read operation according to an embodiment of the disclosure.
FIG. 9 shows the data processing when there is no memory bank conflict according to an embodiment of the disclosure.
FIG. 10 shows the data processing of data splicing according to an embodiment of the disclosure.
FIG. 11 shows the data processing of the base address update mode according to an embodiment of the disclosure.
FIG. 12 is a schematic structural diagram of a base address update unit according to an embodiment of the disclosure.
FIG. 13 shows a signal timing diagram of the handshake protocol of a read operation according to an embodiment of the disclosure.
FIG. 14 is a schematic diagram of another structure of the first addressing unit according to an embodiment of the disclosure.
FIG. 15 shows the write operation data flow of the first addressing unit and the second addressing unit according to an embodiment of the disclosure.
FIG. 16 is a flowchart of the processor core accessing the data at a storage address through the first addressing unit in a write operation according to an embodiment of the disclosure.
FIG. 17 shows the data processing of data splitting according to an embodiment of the disclosure.
FIG. 18 is a schematic diagram of another structure of the first conflict processing unit according to an embodiment of the disclosure.
FIG. 19 shows the data processing of the conflict resolution mechanism in a write operation according to an embodiment of the disclosure.
FIG. 20 shows a signal timing diagram of the handshake protocol of a write operation according to an embodiment of the disclosure.
FIG. 21 is a schematic diagram of yet another structure of the first addressing unit according to an embodiment of the disclosure.
FIG. 22 is a schematic diagram of another structure of the addressing module according to an embodiment of the disclosure.
FIG. 23 is a schematic structural diagram of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.
FIG. 24 is a schematic diagram of another structure of an addressing module in a ping-pong addressing mode according to an embodiment of the disclosure.
FIG. 25 is a schematic structural diagram of a movable platform according to an embodiment of the disclosure.
FIG. 26 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed description
In the addressing process of some technologies, the processor core reads the base address and the offset address from registers and calculates the storage address. The entire addressing process is completed by the processor core, which occupies the computing resources of the processor core and makes table look-up inefficient. While the processor core is handling a memory bank conflict, it cannot perform other operations and must wait for the conflict to be resolved, which reduces the efficiency of the processor. In addition, the addressing mode is single and inflexible: multiple flexible addressing modes cannot be provided, and addressing efficiency is low when reading and writing large amounts of data.
The addressing method of a processor, the processor, the computer-readable storage medium, the movable platform, and the electronic device provided by the present disclosure can use an addressing module to realize the processor core's access to the memory; that is, the processor core can read data from the memory and write data into the memory through the addressing module.
It should be noted that the processor in this embodiment may be any type of device with data processing capability, such as, but not limited to, a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a graphics processing unit (GPU), a microprocessor, a microcontroller, a network processor (NP) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The processor may be a single-core processor or a multi-core processor, including one or more processor cores. A processor core may include an arithmetic logic unit (ALU) and/or control logic. The ALU can perform arithmetic and logical operations. The control logic is used to control a series of operations of the ALU. For example, for a DSP, the ALU may include multiply-accumulators (MAC, Multiply and Accumulate) and a shifter. Each MAC includes a multiplier and an adder and is used to perform multiply-add arithmetic operations. The shifter is used to perform the logical operation of shifting data.
The memory in this embodiment may be any of various random access memories (Random Access Memory, RAM), for example, static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (SynchLink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
The technical solutions of the present disclosure will be described clearly and completely below in conjunction with the embodiments and the accompanying drawings in the embodiments.
An embodiment of the present disclosure provides an addressing method for a processor. As shown in FIG. 1, the addressing method includes:
S101: The addressing module obtains the base address and the offset address of data in the memory;
S102: The addressing module obtains the storage address of the data in the memory according to the base address and the offset address;
S103: The processor core accesses the data at the storage address through the addressing module.
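
Steps S101 to S103 can be illustrated with a small Python sketch. The text only states that the storage address is obtained "according to" the base address and the offset address; the sketch assumes the simplest case, storage address = base address + offset address, and models the memory as a flat list. The function names and the example values are hypothetical.

```python
def storage_addresses(base, offsets):
    """S102: derive one storage address per offset (assumed here to be base + offset)."""
    return [base + off for off in offsets]


def read_data(memory, base, offsets):
    """S103 (read): fetch the data found at each computed storage address."""
    return [memory[addr] for addr in storage_addresses(base, offsets)]


def write_data(memory, base, offsets, values):
    """S103 (write): store one value at each computed storage address."""
    for addr, value in zip(storage_addresses(base, offsets), values):
        memory[addr] = value


memory = list(range(100, 132))   # toy memory in which memory[i] == 100 + i
table_base = 16                  # hypothetical base address of a look-up table
offsets = [3, 0, 7]              # hypothetical offset addresses (table indices)

print(storage_addresses(table_base, offsets))
print(read_data(memory, table_base, offsets))
write_data(memory, table_base, offsets, [-1, -2, -3])
```

The offsets need not be regular or sorted, which is the point of look-up table addressing: the access pattern is carried by the offset list rather than computed by the processor core.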
在本实施例中,如图2所示,处理器包括:处理器核、寻址模块和存储器,寻址模块可集成在处理器内部。本实施例的寻址方法,可利用寻址模块实现处理器核对存 储器的查表寻址,即处理器核可通过寻址模块以查表方式从存储器读取数据以及将数据写入存储器。In this embodiment, as shown in FIG. 2, the processor includes: a processor core, an addressing module, and a memory, and the addressing module can be integrated inside the processor. In the addressing method of this embodiment, the addressing module can be used to realize the table look-up addressing of the memory by the processor core, that is, the processor core can read data from the memory and write data into the memory in a table look-up manner through the addressing module.
寻址模块可包括一组或多组寻址单元。本实施例以一组寻址单元为例,对该组寻址单元执行所述寻址方法的情况进行说明。The addressing module may include one or more groups of addressing units. In this embodiment, a group of addressing units is taken as an example to describe the case where the group of addressing units executes the addressing method.
如图3所示,该组寻址单元包括:相同的两个寻址单元。两个寻址单元分别通过系统总线与处理器核进行通信。在寻址模块内设置有内部总线,两个寻址单元之间通过内部总线进行通信。当执行本实施例的寻址方式时,两个寻址单元中的其中一个寻址单元用于对数据进行读写,该寻址单元可称为表寻址单元,而另一个寻址单元用于对该数据在存储器的偏移地址进行读写,该另一寻址单元可称为偏移地址寻址单元。为描述方便,以下将两个寻址单元分别称为第一寻址单元和第二寻址单元,并以第一寻址单元作为表寻址单元,第二寻址单元作为偏移地址寻址单元为例,对本实施例的寻址方法进行说明。但本领域技术人员应当明白,第一寻址单元和第二寻址单元的角色也可以互换,即第一寻址单元作为偏移地址寻址单元,第二寻址单元作为表寻址单元。As shown in Figure 3, the group of addressing units includes two identical addressing units. The two addressing units communicate with the processor core through the system bus respectively. An internal bus is set in the addressing module, and the two addressing units communicate through the internal bus. When the addressing mode of this embodiment is implemented, one of the two addressing units is used to read and write data. This addressing unit can be called a table addressing unit, and the other addressing unit is used for reading and writing data. To read and write the data at the offset address of the memory, the other addressing unit may be called an offset addressing unit. For the convenience of description, the two addressing units are referred to as the first addressing unit and the second addressing unit respectively, and the first addressing unit is used as the table addressing unit, and the second addressing unit is used as the offset addressing unit. The unit is taken as an example to describe the addressing method of this embodiment. However, those skilled in the art should understand that the roles of the first addressing unit and the second addressing unit can also be interchanged, that is, the first addressing unit is used as an offset addressing unit, and the second addressing unit is used as a table addressing unit. .
In the addressing method of this embodiment, an addressing module is provided in the processor. The addressing module, rather than the processor core, derives the storage address of the data in the memory from the base address and the offset address. Because all addressing operations are completed by the addressing module and the processor core takes no part in calculating the storage address, table look-up addressing is more efficient than in a general processor.
The addressing method of this embodiment is described below for the read operation and the write operation in turn.
Read operation
When the processor core needs to read data from the memory, first, in S101, the first addressing unit obtains the base address and the offset address of the data in the memory.
In this embodiment, as shown in Figure 4, the first addressing unit includes a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit. In the figure, solid lines represent address and data signals, and dashed lines represent control signals.
The first control unit communicates with the processor core through the system bus and controls the operation of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. When the processor core needs to read data from the memory, it sends a read request to the first control unit through the system bus. In response to the read request, the first control unit sends an offset address request to the second addressing unit through the internal bus. In response to the offset address request, the second addressing unit reads the offset address of the data from the memory, sends the offset address to the first address calculation unit through the internal bus, and sends an offset address valid signal to the first control unit through the internal bus. In response to the offset address valid signal, the first control unit starts the other units of the first addressing unit to perform the read operation.
The first address calculation unit receives the base address of the data from the processor core and the offset address from the second addressing unit, and obtains the vector address of the data from the base address and the offset address. As shown in Figure 5, the first address calculation unit includes a base address selector, an offset address selector, and an adder.
The base address selector selects the base address sent by the processor core. The offset address selector selects the offset address sent by the second addressing unit through the internal bus. For table look-up addressing, the number of offset addresses corresponds to the number of storage blocks (Banks) of the memory: when the memory includes N Banks, there are N offset addresses.
After the base address and the offset address are obtained, in S102 the first address calculation unit obtains the storage address of the data in the memory from the base address and the offset address.
As shown in Figure 5, the adder of the first address calculation unit adds the base address to each of the N offset addresses to obtain the storage address of the data. In this embodiment the storage address is a vector address that includes 16 addresses.
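The address calculation in S102 can be illustrated with a minimal sketch. It only models the adder of Figure 5 in software; the function name `vector_address` is hypothetical, and the offset values are the ones used in the Figure 8 example of this description.

```python
def vector_address(base, offsets):
    """Add the base address to each offset, as the adder in Figure 5 does.

    Hypothetical helper for illustration; the hardware performs the N
    additions in parallel rather than in a loop.
    """
    return [base + off for off in offsets]


# Offsets from the Figure 8 example (N = 16 Banks, so 16 offsets).
offsets = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
addresses = vector_address(0, offsets)
print(addresses)
```

With a base address of 0 the vector address equals the offset list, which matches the values stated for Figure 8.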
After the vector address is obtained, in S103 the processor core reads the data stored at the vector address from the memory through the first addressing unit. When the vector address has a Bank conflict, the processor core likewise reads the data at the vector address through the first addressing unit.
A Bank conflict is considered to exist when at least two addresses in the vector address correspond to the same Bank of the memory. The first conflict processing unit determines whether a Bank conflict exists. When one exists, as shown in Figure 6, the processor core accesses the data at the storage address through the first addressing unit as follows:
S601: the first conflict processing unit reads the data from the vector address using a conflict resolution mechanism and sends the data to the first data processing unit;
S602: the first data processing unit processes the data and sends the processed data to the first data transceiving unit;
S603: the first data transceiving unit sends the processed data to the processor core.
As shown in Figure 7, the first conflict processing unit includes a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit.
The conflict resolution mechanism in S601 is described below.
The vector address sent by the first address calculation unit is input directly to the address strobe, and at the same time the vector address buffer buffers the vector address.
The address strobe selects the vector address sent directly by the first address calculation unit and outputs it to the conflict judgment unit.
The conflict judgment unit examines the vector address:
When the vector address has a Bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe, so that the address strobe selects the vector address output by the vector address buffer. This keeps the vector address input to the conflict judgment unit unchanged while the conflict is being processed.
The address mapping unit maps the vector address to physical addresses of the memory.
The read data reorganization unit reads the data at the physical addresses, reorganizes the data, and sends the reorganized data to the first data processing unit.
The conflict judgment unit then generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next data sent by the first address calculation unit, the vector address buffer buffers that vector address at the same time, and the conflict judgment unit goes on to process any Bank conflict of the next data.
The address mapping unit maps the vector address to physical addresses of the memory as follows:
The first storage cell (cell) of each Bank that corresponds to the vector address is placed in one group, the second such cell of each Bank is placed in another group, and so on, until the n-th cell corresponding to the vector address has been grouped. This yields n groups of cells, which are selected in the memory one group after another.
The read data reorganization unit reads the data at the vector address and reorganizes it as follows:
The data stored in the n groups of cells is read in the order in which the groups are selected, and the data is then rearranged in ascending order of address to obtain the reorganized data.
The conflict resolution mechanism is described below using Figure 8 as an example. In one example, as shown in Figure 8, the number of Banks is N = 16: the memory includes 16 Banks, each Bank includes 5 cells, and each cell stores 4 bytes (32 bits). With table look-up addressing, the vector address of each group of data includes 16 addresses, and the first addressing unit can read a group of 16 data items from the memory at a time. Suppose the processor core needs to read a group of data labeled "1", "2", "3", and "4" in the memory, whose base address is 0 and whose offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. This vector address is input directly to the address strobe of the first conflict processing unit, and at the same time the vector address buffer buffers it. The address strobe selects the vector address, and the conflict judgment unit examines it.
As shown in Figure 8, several addresses of this vector address correspond to the same Bank of the memory. For example, the vector address corresponds to the first and second cells of Bank1 (B1), to the first, second, and third cells of B2, and so on. The vector address therefore has a Bank conflict. The conflict judgment unit generates a conflict flag valid signal, under whose control the address strobe selects the vector address output by the vector address buffer. This keeps the vector address unchanged throughout conflict processing, until the conflict processing ends.
The address mapping unit then places the first cell of each of Bank0 to Bankf (B0-Bf) that corresponds to the vector address in one group. In the memory of Figure 8, the first cells of B0-B3 correspond to [0, 1, 2, 3] in the vector address, the second cell of B4 corresponds to [71], the third cell of B5 corresponds to [139], and the fourth cell of B6 corresponds to [207]. The first cells of B0-Bf that correspond to the vector address are therefore the first cells of B0-B3, the second cell of B4, the third cell of B5, and the fourth cell of B6; that is, the first group includes the 7 cells labeled "1".
In the same way, the second cell of each of B0-Bf that corresponds to the vector address is placed in one group. In the memory of Figure 8, the second cells of B1-B3 correspond to [68, 69, 70] in the vector address, the third cell of B4 corresponds to [138], and the fourth cell of B5 corresponds to [206]. The second cells of B0-Bf that correspond to the vector address are therefore the second cells of B1-B3, the third cell of B4, and the fourth cell of B5; that is, the second group includes the 5 cells labeled "2".
By analogy, the third and fourth cells of B0-Bf that correspond to the vector address form two further groups: the third group includes the 3 cells labeled "3", and the fourth group includes the 1 cell labeled "4". The address mapping unit selects these 4 groups of cells in the memory in sequence.
In the example of Figure 8, n = 4, that is, the vector address is mapped to four groups of cells in the memory. In other examples n can take other values, depending on the vector address itself.
The read data reorganization unit reads the data stored in these 4 groups of cells from the memory in the order in which the groups are selected; the data read is shown in Figure 8. With this conflict resolution mechanism, the data stored in one group of cells is read per clock cycle, so the read completes in four clock cycles.
Note that this embodiment can use various selection orders. For example, the first to fourth groups of cells can be selected in ascending order (as shown in Figure 8), the fourth to first groups can be selected in descending order, or the four groups can be selected in random order.
As shown in Figure 8, the data that the read data reorganization unit reads from the memory is not arranged by address, and its order does not match its actual storage locations in the memory. In other words, the data is not in the order the processor core requires and cannot yet be used by the processor core. The read data reorganization unit therefore reorganizes the data stored in these 4 groups of cells, rearranging it in ascending order of address to obtain the reorganized data. After reorganization the data is arranged according to its actual storage locations in the memory, and the read data reorganization unit sends the reorganized data to the first data processing unit for use by the processor core.
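The grouping and reorganization steps of the Figure 8 walkthrough can be sketched as follows. This is an illustrative software model, not the hardware of the embodiment. The names `PLACEMENT`, `group_cells`, and `read_and_reorder` are hypothetical. The (address, Bank, cell) placements for 0-3, 68-71, 138, 139, 206, and 207 are taken from the description; the placements of 136, 137, 204, and 205 are inferred from the stated group sizes (7, 5, 3, 1) and are an assumption.

```python
from collections import defaultdict

# (address, bank, cell) triples; cell numbers are 1-based as in the text.
PLACEMENT = [
    (0, 0, 1), (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (68, 1, 2), (69, 2, 2), (70, 3, 2), (71, 4, 2),
    (136, 2, 3), (137, 3, 3), (138, 4, 3), (139, 5, 3),   # 136, 137 inferred
    (204, 3, 4), (205, 4, 4), (206, 5, 4), (207, 6, 4),   # 204, 205 inferred
]


def group_cells(placement):
    """Group i collects the i-th requested cell of every Bank.

    Each group touches a Bank at most once, so one group can be read
    per clock cycle without a Bank conflict.
    """
    per_bank = defaultdict(list)
    for addr, bank, cell in placement:
        per_bank[bank].append((cell, addr))
    groups = []
    for bank in sorted(per_bank):
        for i, (cell, addr) in enumerate(sorted(per_bank[bank])):
            if i == len(groups):
                groups.append([])
            groups[i].append((addr, bank, cell))
    return groups


def read_and_reorder(groups):
    """Read the groups in selection order, then sort by ascending address."""
    read_order = [addr for group in groups for addr, _, _ in group]
    return sorted(read_order)


groups = group_cells(PLACEMENT)
sizes = [len(g) for g in groups]      # cells read in each clock cycle
reordered = read_and_reorder(groups)
print(sizes, reordered)
```

The number of groups equals the number of clock cycles needed, and sorting by address restores the order of the original vector address.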
At this point the conflict processing for this group of data ends, and the conflict judgment unit generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer buffers that vector address at the same time, and the conflict judgment unit goes on to process any Bank conflict of the next group of data.
In the conflict resolution mechanism, the conflict judgment unit examines the vector address, and when the vector address has no Bank conflict it generates a conflict flag invalid signal. The address mapping unit maps the vector address to physical addresses of the memory. The read data reorganization unit reads the data at the physical addresses and, without reorganizing it, sends it directly to the first data processing unit. Because there is no Bank conflict, the read completes in one clock cycle. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data.
Besides the case in which every address of the vector address corresponds to a different Bank of the memory, this embodiment also treats the following case as having no Bank conflict:
When the width of a Bank is m bytes, the vector address is divided equally into m groups or 2×m groups. If each group of addresses corresponds to one cell of one Bank, the vector address is considered to have no Bank conflict, and the first conflict processing unit performs address splicing on the vector address.
This no-conflict case is described below using Figure 9 as an example.
As shown in Figure 9, suppose the processor core needs to read a group of data labeled "1" to "16" in the memory, whose base address is 0 and whose offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. Under the definition that a Bank conflict exists when at least two addresses in the vector address correspond to the same Bank, this case is a Bank conflict: the first conflict processing unit would have to read the group of data with the conflict resolution mechanism, which takes four clock cycles.
To handle this case, when at least two addresses in the vector address correspond to the same Bank of the memory, the conflict judgment unit of this embodiment examines the vector address further. In Figure 9 the width of a Bank is 4 bytes, and the vector address is divided equally into 4 groups: [0, 1, 2, 3], [68, 69, 70, 71], [136, 137, 138, 139], and [204, 205, 206, 207]. Each group corresponds to one cell of one Bank: [0, 1, 2, 3] to the first cell of B0, [68, 69, 70, 71] to the second cell of B1, [136, 137, 138, 139] to the third cell of B2, and [204, 205, 206, 207] to the fourth cell of B3. The vector address [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207] is therefore considered to have no Bank conflict, and the first conflict processing unit can read all of the data "1" to "16" in one clock cycle.
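The additional check can be sketched in a few lines. This is a sketch under stated assumptions, not the circuit of the embodiment: the byte-address layout in `cell_of` (consecutive 4-byte words interleaved across 16 Banks) is an assumption chosen because it reproduces the Bank and cell assignments given for Figure 9, and both function names are hypothetical. The check implements only the condition stated in the text, namely that every group falls within one cell of one Bank.

```python
def cell_of(addr, num_banks=16, bank_width=4):
    """Map a byte address to a zero-based (bank, cell) pair.

    Assumed layout: consecutive bank_width-byte words are interleaved
    across the Banks.
    """
    word = addr // bank_width
    return word % num_banks, word // num_banks


def can_splice(addresses, group_count, num_banks=16, bank_width=4):
    """True if each equal-sized group of addresses hits a single cell."""
    if len(addresses) % group_count != 0:
        raise ValueError("vector address cannot be divided equally")
    size = len(addresses) // group_count
    groups = [addresses[i:i + size] for i in range(0, len(addresses), size)]
    return all(
        len({cell_of(a, num_banks, bank_width) for a in group}) == 1
        for group in groups
    )


fig9 = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]
print([cell_of(group[0]) for group in (fig9[0:4], fig9[4:8], fig9[8:12], fig9[12:16])])
print(can_splice(fig9, 4))
```

For the Figure 9 vector address, the four groups map to B0 to B3 in their first to fourth cells (zero-based pairs (0, 0) through (3, 3)), so the data can be read in one clock cycle instead of four.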
The description above divides the vector address equally into 4 groups. This embodiment can also divide the vector address equally into 8 groups. If each group of addresses corresponds to one cell of one Bank, the vector address is likewise considered to have no Bank conflict.
It follows that when the width of a Bank is m bytes, the vector address is divided equally into m or 2×m groups, and each group of addresses corresponds to one cell of one Bank, reading the data with the conflict resolution mechanism would take n clock cycles, where n depends on the vector address itself and can be as large as 16. With address splicing, this embodiment reads all of the data in one clock cycle. Compared with the conflict resolution mechanism, this saves up to 15 clock cycles and greatly improves read efficiency.
After receiving the data sent by the first conflict processing unit, the first data processing unit decides, according to the data width required by the processor core, whether to process the data further. When the processor core needs only some of the bytes of each cell of each Bank rather than all of the bytes read from each cell, the first data processing unit splices the required bytes of the data sent by the first conflict processing unit to produce the data the processor core needs, and sends the spliced data to the first data transceiving unit.
Specifically, the first data processing unit splices part of the bytes of the data sent by the first conflict processing unit as follows:
For the data sent by the first conflict processing unit, when the width of a Bank is m bytes and the processor core needs k bytes out of every m bytes of the data, the k bytes are selected from every m bytes, giving N×k bytes, where $k \le \log_2 m$.
Every m bytes of the N×k bytes are then combined, giving m×k blocks of data, each m bytes wide.
The data splicing process is described below using Figure 10 as an example.
For the memory shown in Figure 8, N = 16 and m = 4: the memory includes 16 Banks, and the width of a Bank is 4 bytes, that is, each cell of each Bank stores 4 bytes of data. As shown in Figure 10, the data that the first conflict processing unit sends to the first data processing unit therefore includes 16 blocks of 4 bytes each, 64 bytes (512 bits) in total. Among these 64 bytes, numbered 1 to 64, if k = 1, that is, the processor core needs 1 byte out of every 4 bytes, the first data processing unit selects the 1 byte required by the processor core from every 4 bytes, giving 16×1 = 16 bytes. In Figure 10 the processor core needs the 16 bytes numbered 2, 7, 9, 16, ..., 64. The first data processing unit then combines every 4 of the selected 16 bytes, giving 4 blocks of data, each 4 bytes wide. This is the data required by the processor core, and the spliced data is sent to the first data transceiving unit.
When the processor core needs all of the bytes read from each cell of each Bank, for example all 64 bytes numbered 1 to 64 in Figure 10, the first data processing unit does not splice the data sent by the first conflict processing unit but sends it directly to the first data transceiving unit.
The data splicing process has been described above with k = 1. Because $k \le \log_2 m$, when m = 4 the value of k can also be 2, that is, the processor core needs 2 bytes out of every 4 bytes. In this case the first data processing unit operates much as it does for k = 1. It selects the 2 bytes required by the processor core from every 4 bytes, giving 16×2 = 32 bytes. It then combines every 4 of the selected 32 bytes, giving 8 blocks of data, each 4 bytes wide. This is the data required by the processor core, and the spliced data is sent to the first data transceiving unit.
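The byte selection and regrouping can be illustrated with a short sketch. The function `splice` and its `selected` argument are hypothetical, and the byte positions chosen below are illustrative only; they are not the selection shown in Figure 10. The block count in the sketch is N×k/m, which equals the m×k stated above when N = 16 and m = 4.

```python
def splice(data, m, selected):
    """Select bytes from each m-byte cell and regroup them into m-byte blocks.

    data:     bytes read from N Banks, m bytes per Bank (N * m bytes).
    selected: for each Bank, the k zero-based byte positions (0..m-1)
              that the processor core wants from that Bank's cell.
    """
    picked = bytearray()
    for bank, positions in enumerate(selected):
        cell = data[bank * m:(bank + 1) * m]
        picked.extend(cell[p] for p in positions)
    return [bytes(picked[i:i + m]) for i in range(0, len(picked), m)]


data = bytes(range(1, 65))                    # bytes numbered 1..64, N = 16, m = 4

blocks_k1 = splice(data, 4, [(1,)] * 16)      # k = 1: second byte of every cell
blocks_k2 = splice(data, 4, [(0, 1)] * 16)    # k = 2: first two bytes of every cell
print(len(blocks_k1), len(blocks_k2))
```

With k = 1 the 16 selected bytes form 4 blocks, and with k = 2 the 32 selected bytes form 8 blocks, matching the counts given in the text.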
The first data transceiving unit includes a receive buffer and a send buffer. The send buffer buffers the data sent by the first data processing unit and sends the buffered data to the processor core through the system bus. The depths of the receive buffer and the send buffer can be set as needed. In one example, the minimum depth of the receive buffer is 2 and the minimum depth of the send buffer is 0.
This completes the operation in which the processor core reads data from the memory through the first addressing unit.
In some processors the addressing operation is performed by the processor core, that is, the processor core obtains the base address and the offset address and calculates the vector address. If the vector address has a Bank conflict, the processor core has to handle it. While the Bank conflict is being handled, the processor core cannot perform other operations and must wait until the conflict is resolved. In the addressing method of this embodiment, by contrast, a first addressing unit is provided in the processor, and the first addressing unit obtains the base address and the offset address and calculates the vector address. When the vector address has a Bank conflict, the first addressing unit resolves it with the conflict resolution mechanism, so the processor core does not need to handle the conflict. While the Bank conflict is being handled, the processor core can still perform other operations without waiting for the conflict to be resolved. The addressing method of this embodiment can therefore significantly improve the efficiency of the processor and increase its computing speed.
The process of reading data with the addressing method of this embodiment has been described above. Every group of data required by the processor core is read in this way. When the processor core reads multiple groups of data with the addressing method of this embodiment, the first addressing unit can obtain the base addresses and offset addresses of the groups in several different modes.
One mode may be called the offset address update mode. In this mode the base address of the groups of data stays the same, and the offset address of each group comes from the second addressing unit.
The first address calculation unit obtains the base address sent by the processor core; the second addressing unit reads the offset address of each group of data from the memory in turn; and the first address calculation unit obtains the offset address read by the second addressing unit.
As shown in Figure 5, when multiple groups of data are read in the offset address update mode, the base address selector selects the base address sent by the processor core. Each time a group of data is read, the offset address selector selects the offset address of that group sent by the second addressing unit through the internal bus. The adder adds the base address to the offset address of the group to obtain the vector address of the group. The adder sends the vector address to the first conflict processing unit, and the data is sent to the processor core through the first conflict processing unit, the first data processing unit, and the first data transceiving unit, completing the read of that group. For every group of data, the offset address selector selects the offset address of that group sent by the second addressing unit through the internal bus, so that multiple groups of data are read.
Another mode may be called the base address update mode. In this mode the offset address of the groups of data comes from the second addressing unit and is the same for every group, and the base address of each group is obtained by updating an initial base address value.
As shown in Figure 11, consider reading 3 groups of data labeled "1", "2", and "3". For the first group, labeled "1", the vector address is [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]. For the second group, labeled "2", the vector address is [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236]. For the third group, labeled "3", the vector address is [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256]. With ordinary table look-up addressing, each time a group of data is read, the processor core first has to write the offset address of that group into the memory through the second addressing unit; the second addressing unit then reads the offset address back from the memory and sends it to the first addressing unit; and the first addressing unit obtains the vector address from the base address [0] sent by the processor core and the offset address sent by the second addressing unit, and reads the group of data from its vector address. For the 3 groups of data above, this requires 3 offset address write operations.
Note, however, that if the first address of each group's vector address is taken as that group's base address, the offsets of the remaining addresses relative to the first address are the same for all 3 groups. With address [0] of the first group as the base address, the offset address of the first group can be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]. Likewise, with address [20] of the second group as the base address, the offset address of the second group can also be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]. With address [40] of the third group as the base address, the offset address of the third group can again be expressed as [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216].
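The observation above can be checked numerically. The sketch below is only an arithmetic illustration using the base addresses 0, 20, and 40 and the shared offset list from the Figure 11 example; the names `OFFSETS` and `vector_addresses` are hypothetical.

```python
# Shared offset address for all three groups in the Figure 11 example.
OFFSETS = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]


def vector_addresses(bases, offsets):
    """Return one vector address per base address, reusing the same offsets."""
    return [[base + off for off in offsets] for base in bases]


groups = vector_addresses([0, 20, 40], OFFSETS)
for label, addresses in enumerate(groups, start=1):
    print(label, addresses)
```

Because the offsets are identical for every group, they need to be written to the memory only once, and only the base address changes from group to group.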
In this embodiment, as shown in FIG. 12, the base address update unit includes an adder and a D flip-flop. In the base address update mode of this embodiment, to read these three groups of data, the processor core only needs to write the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] into the memory through the second addressing unit and set the second addressing unit to the cyclic read mode. When the first group of data, labeled "1", is read, the second addressing unit reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. Referring to FIG. 5 and FIG. 12, the base address selector selects the output of the base address update unit, and the processor core sends the base address initial value [0] to the base address update unit. The initial value [0] passes through the adder and enters the D flip-flop. When the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address initial value [0] to the base address selector, which forwards it to the adder of the first address calculation unit. That adder adds the base address initial value [0] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216], and the data labeled "1" is read from that vector address.
When the second group of data, labeled "2", is read, the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. The base address selector still selects the output of the base address update unit. The processor core sends the base address update value [20] to the base address update unit, and the adder adds the update value [20] to the base address of the data labeled "1" (that is, the base address initial value [0]) to obtain the base address [20] of the data labeled "2", which enters the D flip-flop. When the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address [20] to the base address selector, which forwards it to the adder of the first address calculation unit. That adder adds the base address [20] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [20, 24, 28, 32, 88, 92, 96, 100, 156, 160, 164, 168, 224, 228, 232, 236], and the data labeled "2" is read from that vector address.
When the third group of data, labeled "3", is read, the second addressing unit again reads the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] from the memory and sends it to the first addressing unit. The base address selector still selects the output of the base address update unit. The adder of the base address update unit adds the base address update value [20] to the base address [20] of the data labeled "2" to obtain the base address [40] of the data labeled "3", which enters the D flip-flop. When the clock pulse CP of the D flip-flop is valid, the D flip-flop sends the base address [40] to the base address selector, which forwards it to the adder of the first address calculation unit. That adder adds the base address [40] to the offset address [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216] to obtain the vector address [40, 44, 48, 52, 108, 112, 116, 120, 176, 180, 184, 188, 244, 248, 252, 256], and the data labeled "3" is read from that vector address.
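The three reads described above can be modeled in software as an accumulator. This is a behavioral sketch only, not the circuit itself: the class name `BaseAddressUpdateUnit`, the method `clock`, and the assumption that the D flip-flop resets to 0 are introduced here for illustration.

```python
# Offset table written to memory once, copied from the example in the text.
OFFSETS = [0, 4, 8, 12, 68, 72, 76, 80, 136, 140, 144, 148, 204, 208, 212, 216]


class BaseAddressUpdateUnit:
    """Behavioral model: an adder whose output is latched by a D flip-flop."""

    def __init__(self):
        # Contents of the D flip-flop; a reset value of 0 is an assumption.
        self.register = 0

    def clock(self, value_from_core):
        # The adder sums the value sent by the core and the stored base address;
        # the result is latched when the clock pulse CP is valid.
        self.register = self.register + value_from_core
        return self.register


def vector_address(base, offsets):
    """Adder of the first address calculation unit: base plus each offset."""
    return [base + offset for offset in offsets]


unit = BaseAddressUpdateUnit()
# The core sends the initial value 0, then the update value 20 twice.
values_from_core = [0, 20, 20]
addresses = [vector_address(unit.clock(v), OFFSETS) for v in values_from_core]
```

Under these assumptions the three entries of `addresses` start at 0, 20 and 40 respectively, while `OFFSETS` is supplied only once.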
For the three groups of data labeled "1", "2" and "3", a general table look-up addressing method requires the processor core to write the offset address into the memory three times, whereas with the base address update mode of this embodiment the processor core writes the offset address into the memory only once. The addressing method of this embodiment thus reduces the number of offset address writes and saves the time they consume. When the processor core performs large-scale table look-up addressing on the memory, the addressing time is reduced substantially and the addressing efficiency improves accordingly. In addition, during addressing the processor core only needs to set the second addressing unit to the cyclic read mode and provide the base address update value. The addressing process requires little involvement from the processor core, which improves the efficiency and computing speed of the processor, especially for large-scale table look-up addressing.
In addition to the base address update mode and the offset address update mode, the addressing method of this embodiment also provides a fixed offset address mode. As shown in FIG. 5, in the fixed offset address mode the first address calculation unit obtains the base address sent by the processor core, and the base address selector selects that base address and feeds it to the adder. The processor core also sends a fixed offset address to the first addressing unit; the offset address selector of the first address calculation unit selects the fixed offset address and feeds it to the adder. The adder of the first address calculation unit adds the base address and the offset address to obtain the vector address. The fixed offset address mode can be used in various addressing scenarios such as linear addressing and stride addressing.
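As a minimal sketch of how a fixed offset address could express linear and stride addressing, the snippet below builds the offset vector from a stride. The function names, the 16-lane width, and the idea that the fixed offset is a multiple-of-stride sequence are assumptions for illustration; the text only states that the core supplies a base address and a fixed offset address that are added together.

```python
LANES = 16  # assumed number of addresses per vector address


def fixed_offsets(stride, lanes=LANES):
    """Hypothetical fixed offset address: lane i is offset by i * stride."""
    return [i * stride for i in range(lanes)]


def fixed_offset_vector_address(base, offsets):
    """Adder of the first address calculation unit: base plus fixed offsets."""
    return [base + offset for offset in offsets]


# Linear addressing: consecutive 4-byte words starting at address 64.
linear = fixed_offset_vector_address(64, fixed_offsets(4))

# Stride addressing: every other 4-byte word starting at address 0.
strided = fixed_offset_vector_address(0, fixed_offsets(8))
```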
The addressing method of this embodiment thus provides an offset address update mode, a base address update mode, and a fixed offset address mode, which can be selected flexibly according to the actual situation, improving the flexibility of table look-up addressing.
When a user's program code runs on a processor, the program code is compiled into instructions that the processor can execute. When the execution of a piece of code requires reading data from the memory, the processor core performs several operations in sequence, such as instruction fetch, decoding, data read, and instruction execution. In some processors, if a Bank conflict occurs while the processor core is reading data, the processor core itself handles the Bank conflict and must generate multiple instructions to read the data. The addressing method of such processors is therefore instruction-driven addressing.
The addressing method of this embodiment, by contrast, is task-driven addressing. When the processor core performs a data read, it generates a set of instructions for reading the data, which together amount to a single task instruction, and sends the task instruction to the first addressing unit through the system bus. The entire addressing process is then carried out by the first addressing unit. The data read from the memory by the first addressing unit is sent back to the processor core through the system bus, and the processor core continues with subsequent operations after receiving the data. With this task-driven addressing, when the processor core needs to read data from the memory it only has to send the task instruction to the first addressing unit and need not be concerned with the details of the addressing process. Even a Bank conflict is handled by the first addressing unit. Compared with a general processor, the operation of the processor core is simplified and its efficiency is improved.
In this embodiment, the first control unit of the first addressing unit can communicate with the processor core through a handshake protocol.
As shown in FIG. 13, the processor core communicates with the first addressing unit through the system bus. The system bus includes a clock signal line, a read request valid line, a read request ready line, a read request line, a read data valid line, a read data ready line, and a read data line, and the first addressing unit operates under the drive of the clock signal line. When the processor core needs to read data from the memory, it sends the task instruction to the first addressing unit and receives data from the first addressing unit through the handshake protocol. When the read request valid signal is high, the read request signal is valid; when the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to read the data from the memory. When the read data valid signal is high, the read data is valid; when the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiving unit to send the data to the processor core.
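The valid/ready rule described above can be illustrated with a cycle-by-cycle sketch: an item moves across the bus only on cycles where both signals are high. The function name `transfers` and the signal traces are hypothetical and chosen only to show the rule.

```python
def transfers(valid, ready, payload):
    """Return the payload items accepted on cycles where valid and ready are both high."""
    return [item for v, r, item in zip(valid, ready, payload) if v and r]


# Hypothetical per-cycle traces for the read request channel.
# Each request is held on the bus until a cycle where it is accepted.
req_valid = [1, 1, 0, 1, 1]
req_ready = [0, 1, 1, 1, 0]
req_bus = ["req A", "req A", None, "req B", "req C"]

accepted_requests = transfers(req_valid, req_ready, req_bus)
```

In this trace "req A" is accepted in the second cycle and "req B" in the fourth, while "req C" is still waiting because the ready signal is low. The read data channel follows the same rule with the read data valid and read data ready signals.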
In this embodiment, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to operate as a pipeline.
As shown in FIG. 14, the first control unit includes a read/write request buffer, a synchronization register, a selector, and first-, second-, third-, fourth- and fifth-stage pipeline controllers.
The processor core sends a read request through the system bus. If the read request is a table look-up request, the selector selects the synchronization register. The read/write request buffer receives and buffers the read request. After receiving the read request, the read/write request buffer sends an offset address request to the second addressing unit through the internal bus and forwards the read request to the synchronization register. In response to the offset address request, the second addressing unit reads from the memory the offset address of the data, sends the offset address to the first address calculation unit through the internal bus, and sends an offset address valid signal to the synchronization register through the internal bus. After receiving the offset address valid signal, the synchronization register sends the read request signal to the pipeline controllers of all stages to start the pipeline. The first- to fifth-stage controllers send control signals to the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. During addressing by the first addressing unit, the first address calculation unit occupies the first stage of the pipeline, the first conflict processing unit the second stage, the first data processing unit the third and fourth stages, and the first data transceiving unit the fifth stage. If the read request is not a table look-up request, the selector selects the read/write request buffer, and the read request is sent directly to the pipeline controllers of all stages to start the pipeline.
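The stage assignment above can be visualized with a small occupancy table. This sketch assumes, purely for illustration, that one request enters the pipeline per cycle and that no stall occurs; the function name `occupancy` is hypothetical.

```python
# Pipeline stages of the first addressing unit, in order, as described in the text.
STAGES = [
    "first address calculation unit",
    "first conflict processing unit",
    "first data processing unit (stage 3)",
    "first data processing unit (stage 4)",
    "first data transceiving unit",
]


def occupancy(num_requests, num_cycles):
    """For each cycle, list which request index (or None) sits in each stage."""
    table = []
    for cycle in range(num_cycles):
        row = []
        for stage in range(len(STAGES)):
            request = cycle - stage  # request r enters stage s at cycle r + s
            row.append(request if 0 <= request < num_requests else None)
        table.append(row)
    return table


table = occupancy(num_requests=3, num_cycles=7)
```

In cycle 2 of this idealized schedule, request 2 is in address calculation while request 1 is in conflict processing and request 0 is in data processing, which is the overlap that the pipeline is meant to provide.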
The first addressing unit of this embodiment also provides a pipeline stall mechanism. When the processor core cannot receive, over the system bus, the data sent by the first data transceiving unit, it sends a bus pause signal over the system bus to the read/write request buffer, the synchronization register, and the first- to fifth-stage pipeline controllers. On receiving the bus pause signal, these components suspend the pipeline. When the first conflict processing unit finds a Bank conflict in the vector address, it sends a conflict pause signal to the same components, and on receiving the conflict pause signal they likewise suspend the pipeline. After the Bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal, and on receiving it these components restart the pipeline.
By adopting a pipeline, this embodiment allows the modules of the first addressing unit to work in parallel on successive requests, which improves the working efficiency of the first addressing unit, reduces the addressing time, and improves the addressing efficiency.
In this embodiment, when the first address calculation unit obtains the offset address of the data from the second addressing unit, that offset address has been read from the memory by the second addressing unit. Before the second addressing unit reads the offset address from the memory, the processor core sends the offset address to the second addressing unit through the system bus, and the second addressing unit writes it into the memory. The operation by which the second addressing unit reads the offset address from the memory is similar to the operation, described above, by which the first addressing unit reads data from the memory. As stated earlier, the second addressing unit has the same structure as the first addressing unit. When the second addressing unit reads the offset address from the memory, the offset address plays the role of the data to be read, and the second addressing unit reads it out using essentially the same operations as the first addressing unit uses to read data. Referring to FIG. 4, the second addressing unit includes a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit, and a second data transceiving unit.
When the second addressing unit reads the offset address from the memory, the second conflict processing unit reads the offset address from the memory and sends it to the second data processing unit; the second data processing unit processes the offset address and sends the processed offset address to the second data transceiving unit; and the second data transceiving unit sends the processed offset address to the first addressing unit.
The second addressing unit differs from the first addressing unit in that the second data transceiving unit sends the processed offset address to the first addressing unit through the internal bus, whereas the first data transceiving unit sends data to the processor core through the system bus. Apart from this, the operations of the second control unit, the second address calculation unit, the second conflict processing unit, the second data processing unit, and the second data transceiving unit are similar to those of the corresponding units of the first addressing unit.
Write operation
When the processor core writes data into the memory, some of the operations of the first addressing unit are similar to those of the read operation. For brevity, the following description focuses on the differences between the write operation and the read operation.
In the write operation, in S103, the processor core writes the data to the vector address of the memory through the first addressing unit. As shown in FIG. 15, the data flows through the first data transceiving unit, the first data processing unit, and the first conflict processing unit in the direction opposite to that of the read operation.
When a Bank conflict exists, as shown in FIG. 16, the accessing, by the processor core through the first addressing unit, of the data at the storage address includes:
S1601: the first data transceiving unit receives the data sent by the processor core and sends the data to the first data processing unit;
S1602: the first data processing unit processes the data and sends the processed data to the first conflict processing unit;
S1603: the first conflict processing unit writes the data to the vector address using a conflict resolution mechanism.
In S1601, the receive buffer receives and buffers the data sent by the processor core through the system bus, and sends the data to the first data processing unit.
In S1602, after receiving the data sent by the first data transceiving unit, the first data processing unit decides, according to the width of the data written by the processor core, whether to process the data further. When the processor core writes data to only some of the bytes of each cell of each Bank, rather than to all of the bytes, the first data processing unit splits the data sent by the first data transceiving unit to generate the data to be written into the memory, and sends the split data to the first conflict processing unit.
Specifically, the splitting by the first data processing unit of the data sent by the first data transceiving unit includes:
When the data sent by the first data transceiving unit includes m×k blocks, each m bytes wide, the m bytes of each block are split to obtain N×k bytes in total, such that every k bytes correspond to k addresses of one Bank; where N is the number of Banks in the memory, the bit width of each Bank is m bytes, and k ≤ log₂ m.
The data splitting process is described below taking FIG. 17 as an example.
For the memory shown in FIG. 8, N=16 and m=4: the memory includes 16 Banks, each 4 bytes wide, that is, each cell of each Bank stores 4 bytes of data. As shown in FIG. 17, the data written by the processor core includes 4 blocks of 4 bytes each, and the 16 bytes correspond respectively to one byte position in one cell of each of the Banks B0-Bf. The 4 bytes of each block are split to obtain 16 bytes, such that each byte corresponds to one byte position in one cell of one Bank. The format of the split data is the format in which the data is written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.
When the processor core writes data to all of the bytes of each cell of each Bank, that is, when the processor core writes 64 bytes to the memory, the first data processing unit does not need to split the data sent by the first data transceiving unit and sends the data directly to the first conflict processing unit.
The data splitting process has been described above taking k=1 as an example. Since k ≤ log₂ m, when m=4 the value of k can also be 2, that is, the processor core writes data to two bytes of each cell of each Bank. In this case the first data processing unit operates similarly to the case k=1: it splits the 2 bytes of each block to obtain 32 bytes, such that each pair of bytes corresponds to the two-byte storage position in one cell of one Bank. The format of the split data is the format in which the data is written into the memory, and the first data processing unit sends the split data to the first conflict processing unit.
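The splitting step can be sketched as follows, reading the general rule as "m×k blocks of m bytes are flattened into N×k bytes, and each run of k bytes goes to one Bank". The function name `split_blocks` and the in-order assignment of byte runs to Banks B0, B1, ... are assumptions for illustration; the actual byte-to-Bank layout is defined by FIG. 17, which is not reproduced here.

```python
def split_blocks(blocks, n_banks, k):
    """Flatten blocks of bytes and regroup them as k bytes per Bank.

    Assumes (hypothetically) that consecutive runs of k bytes map to
    Banks B0, B1, ... in order.
    """
    data = [byte for block in blocks for byte in block]
    if len(data) != n_banks * k:
        raise ValueError("expected N*k bytes in total")
    return [data[i * k:(i + 1) * k] for i in range(n_banks)]


N, M = 16, 4  # 16 Banks, each 4 bytes wide, as in the example

# k = 1: m*k = 4 blocks of 4 bytes give 16 bytes, one per Bank.
blocks_k1 = [list(range(b * M, (b + 1) * M)) for b in range(M * 1)]
per_bank_k1 = split_blocks(blocks_k1, N, k=1)

# k = 2: m*k = 8 blocks of 4 bytes give 32 bytes, two per Bank.
blocks_k2 = [list(range(b * M, (b + 1) * M)) for b in range(M * 2)]
per_bank_k2 = split_blocks(blocks_k2, N, k=2)
```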
The conflict resolution mechanism of S1603 is described below with reference to FIG. 18. As shown in FIG. 18, the first conflict processing unit further includes a write data buffer, a write data strobe, and a write data reorganization unit.
The vector address sent by the first address calculation unit is input directly to the address strobe, and the vector address buffer buffers the vector address at the same time.
The address strobe selects the vector address sent directly by the first address calculation unit, so that this vector address is output to the conflict judgment unit.
The data sent by the first data processing unit is sent to the write data strobe, and the write data buffer buffers that data at the same time.
The conflict judgment unit examines the vector address:
When the vector address has a Bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe and the write data strobe, so that the address strobe selects the vector address output by the vector address buffer and the write data strobe selects the data output by the write data buffer. In this way, the vector address input to the conflict judgment unit and the data input to the write data reorganization unit are kept unchanged during conflict processing;
The write data reorganization unit reorganizes the data;
The address mapping unit maps the vector address to physical addresses of the memory, and the write data reorganization unit writes the reorganized data to the physical addresses of the memory.
The conflict judgment unit then generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data sent by the first address calculation unit, while the vector address buffer buffers the vector address of the next group of data; the write data strobe selects the next group of data sent by the first data processing unit, and the next group of data is buffered in the write data buffer. The conflict judgment unit then goes on to handle any Bank conflict of the next group of data.
The reorganizing of the data by the write data reorganization unit includes:
determining, for each Bank, the first cell corresponding to the vector address, and arranging the data corresponding to these first cells into one row in ascending order of address;
determining, for each Bank, the second cell corresponding to the vector address, and arranging the data corresponding to these second cells into one row in ascending order of address;
and so on, until the n-th cell corresponding to the vector address is determined for each Bank and the data corresponding to these n-th cells is arranged into one row in ascending order of address, giving n rows of data in total.
The mapping, by the address mapping unit, of the vector address to the physical addresses of the memory includes:
the address mapping unit selects, in sequence, the n groups of cells corresponding to the n rows of data;
The writing, by the write data reorganization unit, of the reorganized data to the physical addresses of the memory includes:
writing the n rows of data into the n groups of cells in sequence, in the order in which the n groups of cells are selected.
The conflict resolution mechanism is described below taking FIG. 19 as an example. In one example, as shown in FIG. 19, the number of Banks in the memory is N=16; the memory includes 16 Banks, each Bank includes 5 cells, and each cell can store 4 bytes (32 bits). With table look-up addressing, the vector address of each group of data includes 16 addresses, and the first addressing unit can write one group of 16 data items into the memory at a time. Suppose the processor core needs to write a group of data labeled "1", "2", "3" and "4", whose base address is 0 and whose offset address is [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. The vector address output by the first address calculation unit is then [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]. This vector address is input directly to the address strobe of the first conflict processing unit, and the vector address buffer buffers it at the same time. The address strobe selects the vector address, and the conflict judgment unit examines it.
As shown in FIG. 19, several addresses in the vector address of this group of data fall in the same Bank of the memory. For example, the vector addresses [1] and [68] correspond to the first and second cells of B1, and [2], [69] and [136] correspond to the first, second and third cells of B2, and so on. The vector address therefore has a Bank conflict. The conflict judgment unit generates a conflict flag valid signal, under whose control the address strobe selects the vector address output by the vector address buffer and the write data strobe selects the data output by the write data buffer. The vector address is thus kept unchanged throughout conflict processing, until conflict processing ends.
The write data reorganization unit then reorganizes the data, placing into one row the bytes destined for the first cell, in each of B0-Bf, that corresponds to the vector address. In the data of FIG. 19, the data at vector addresses [0, 1, 2, 3] corresponds to the first cell of B0-B3, the data at address [71] to the second cell of B4, the data at address [139] to the third cell of B5, and the data at address [207] to the fourth cell of B6. The first row of data is therefore the data at vector addresses [0, 1, 2, 3, 71, 139, 207], that is, the 7 data items labeled "1".
Similarly, the bytes destined for the second cell, in each of B0-Bf, that corresponds to the vector address are placed into one row. In the data of FIG. 19, the data at vector addresses [68, 69, 70], [138], and [206] corresponds to the second cell of B1-B3, the third cell of B4, and the fourth cell of B5, respectively. The second row of data is therefore the data at vector addresses [68, 69, 70, 138, 206], that is, the 5 data items labeled "2".
By analogy, two more rows are obtained for the third and fourth cells, in B0-Bf, that correspond to the vector address: the third row contains the 3 data items labeled "3", and the fourth row contains the 1 data item labeled "4". The address mapping unit selects in turn the n groups of cells corresponding to these n rows of data.
In the example of FIG. 8, n = 4; that is, the data is divided into four groups. In other examples n may take other values, depending on the vector address itself.
The write data reorganization unit writes the 4 rows of data into the memory one after another, following the order in which the 4 groups of cells are selected; the written data is shown in FIG. 19. With this conflict resolution mechanism, the data for one group of cells is written per clock cycle, so the whole write completes in four clock cycles.
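The row-splitting step described above can be illustrated with a minimal Python sketch. It is not the circuit of the disclosure: the function name `split_into_conflict_free_rows`, the 4-bank toy memory, and the modulo bank mapping are all assumptions made for illustration. The k-th address that hits a given bank is placed in row k, so no row touches a bank twice and each row can be written in one clock cycle.

```python
from collections import defaultdict

def split_into_conflict_free_rows(vector_address, bank_of):
    """Group addresses so that each row touches every bank at most once."""
    hits_per_bank = defaultdict(int)   # addresses seen so far for each bank
    rows = []
    for addr in vector_address:
        bank = bank_of(addr)
        k = hits_per_bank[bank]        # this is the (k+1)-th access to that bank
        if k == len(rows):             # first time any bank needs row k
            rows.append([])
        rows[k].append(addr)
        hits_per_bank[bank] += 1
    return rows

# Hypothetical 4-bank memory with word-interleaved banks (an assumption,
# not the bank mapping of FIG. 19).
rows = split_into_conflict_free_rows([0, 1, 2, 3, 4, 5, 8, 12], lambda a: a % 4)
print(rows)        # one inner list per write cycle
print(len(rows))   # number of clock cycles needed for this group
```

Under these assumptions the number of rows equals the largest number of addresses that fall in any single bank, which is why n depends on the vector address itself.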
Note that this embodiment may use various selection orders. For example, the first to fourth groups of cells may be selected in sequence (as shown in FIG. 19), the fourth to first groups may be selected in reverse order, or the four groups may be selected in random order.
At this point, conflict processing for this group of data ends, and the conflict judgment unit generates a conflict flag invalid signal. In response, the address strobe selects the vector address of the next group of data sent by the first address calculation unit, while the vector address buffer caches that vector address; the write data strobe selects the next group of data sent by the first data processing unit, which is also cached in the write data buffer; and the conflict judgment unit goes on to handle any bank conflict of the next group of data.
In the conflict resolution mechanism, the conflict judgment unit checks the vector address. When the vector address has no bank conflict, the conflict judgment unit generates a conflict flag invalid signal; the address mapping unit maps the vector address to physical addresses of the memory; the write data reorganization unit writes the data to those physical addresses; and, in response to the conflict flag invalid signal, the address strobe selects the vector address of the next group of data.
This completes the operation in which the processor core writes data to the memory through the first addressing unit.
As with the read operation, in the addressing method of this embodiment a first addressing unit is provided in the processor, and the first addressing unit both obtains the base address and the offset address and calculates the vector address. When the vector address has a bank conflict, the first addressing unit resolves it with the conflict resolution mechanism, so the processor core does not have to handle the bank conflict. While the conflict is being handled, the processor core can still perform other operations without waiting for it to be resolved. The addressing method of this embodiment can therefore significantly improve processor efficiency and computing speed.
In the write operation, the first control unit of the first addressing unit likewise communicates with the processor core through a handshake protocol.
As shown in FIG. 20, the processor core and the first addressing unit communicate over the system bus, which includes a clock signal line and write request valid, write request ready, write request, write data, and write busy lines; the first addressing unit is driven by the clock signal line. When the processor core needs to write data to the memory, it sends the task instruction and the data to the first addressing unit through the handshake protocol. A high write request valid signal indicates that the write request signal and the write data are valid. When the write request valid signal and the write request ready signal are both high, the first control unit reads the write request from the processor core and the write busy signal is pulled high. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to write the data to the memory, after which the write busy signal is pulled low.
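The acceptance condition of the handshake can be sketched in a few lines of Python. This is only a cycle-by-cycle illustration under stated assumptions: the function name `write_handshake` and the two signal traces are hypothetical, and the write busy signal is left out because its exact timing is not specified here. A request is taken in exactly those cycles where the valid and ready signals are both high.

```python
def write_handshake(valid_trace, ready_trace):
    """Return the cycle indices at which a write request is accepted."""
    accepted = []
    for cycle, (valid, ready) in enumerate(zip(valid_trace, ready_trace)):
        if valid and ready:   # both high: the control unit takes the request
            accepted.append(cycle)
    return accepted

# Hypothetical traces: the core asserts valid in cycles 1-3,
# the addressing unit is ready in cycles 0, 3 and 4.
accepted_cycles = write_handshake(
    valid_trace=[0, 1, 1, 1, 0],
    ready_trace=[1, 0, 0, 1, 1],
)
print(accepted_cycles)
```

In this trace the core has to hold its request for two extra cycles until the addressing unit becomes ready.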
In the write operation, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work as a pipeline.
As shown in FIG. 21, the processor core sends a write request over the system bus. If the write request is a table look-up request, the selector selects the synchronization register. The read/write request buffer receives and caches the write request. It then sends an offset address request to the second addressing unit over the internal bus and forwards the write request to the synchronization register. In response to the offset address request, the second addressing unit reads from the memory the offset address of the data, sends the offset address to the first address calculation unit over the internal bus, and sends an offset address valid signal to the synchronization register over the internal bus. On receiving the offset address valid signal, the synchronization register sends the write request to the pipeline controllers of each stage, starting the pipeline. The first-stage controller sends control signals to the first address calculation unit and the first data processing unit, and the second-stage controller sends control signals to the first conflict processing unit. In the addressing process of the first addressing unit, the first data transceiving unit is at the same stage as the read/write request buffer, the first address calculation unit and the first data processing unit are in the first stage of the pipeline, and the first conflict processing unit is in the second stage. If the write request is not a table look-up request, the selector selects the read/write request buffer, and the write request is sent directly to the pipeline controllers of each stage to start the pipeline.
Similarly, a pipeline stall mechanism is provided for the write operation. When the processor core cannot send data to the first data transceiving unit over the system bus, it sends a bus pause signal over the system bus to the read/write request buffer, the synchronization register, and the first-stage and second-stage pipeline controllers; on receiving the bus pause signal, these units suspend the pipeline. When the first conflict processing unit finds that the vector address has a bank conflict, it sends a conflict pause signal to the read/write request buffer, the synchronization register, and the first-stage and second-stage pipeline controllers; on receiving the conflict pause signal, these units suspend the pipeline. After the bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal, and on receiving it the same units restart the pipeline.
By adopting a pipeline, this embodiment lets the modules of the first addressing unit execute in parallel, which can substantially improve the working efficiency of the first addressing unit and reduce addressing time.
Before the second addressing unit reads the offset address from the memory, the processor core sends the offset address to the second addressing unit over the system bus, and the second addressing unit writes it into the memory. This operation is similar to the operation, described above, in which the first addressing unit writes data to the memory. As noted earlier, the second addressing unit has the same structure as the first addressing unit; when the second addressing unit writes an offset address to the memory, the offset address simply plays the role of the data being written, so the second addressing unit can store it using the same operation the first addressing unit uses to write data.
Another embodiment of the present disclosure provides an addressing method for a processor. In this embodiment, as shown in FIG. 22, the addressing module includes multiple groups of addressing units, each of which may be a group of addressing units as in the previous embodiment.
Each group of addressing units includes two identical addressing units. Each of the two addressing units communicates with the processor core over the system bus. An internal bus is provided in the addressing module, and the two addressing units communicate with each other over it. When the addressing method of this embodiment is performed, one of the two addressing units reads and writes data and may be called the table addressing unit, while the other reads and writes the offset address of that data in the memory and may be called the offset address addressing unit.
The addressing method of this embodiment can be performed by multiple groups of addressing units in parallel. Each group can communicate with the processor core over the system bus and read and write the memory. When the processor core needs to read or write multiple groups of data at the same time, each group of addressing units can complete its own addressing task independently. This embodiment does not limit the number of groups, which can be chosen according to actual needs. Compared with a single group of addressing units, this embodiment can multiply the addressing efficiency of the processor and greatly improve its addressing capability.
Yet another embodiment of the present disclosure provides an addressing method for a processor. In this embodiment, a group of addressing units obtains the base address or the offset address through ping-pong addressing.
Obtaining the offset address through ping-pong addressing is described below with reference to FIG. 23. As shown in FIG. 23, a group of addressing units includes a third addressing unit, a fourth addressing unit, and a fifth addressing unit. The ping-pong addressing includes:
the processor core writes offset addresses into the memory alternately through the fourth addressing unit and the fifth addressing unit;
the third addressing unit obtains the base address sent by the processor core, and obtains the offset addresses stored in the memory alternately through the fourth addressing unit and the fifth addressing unit.
The third addressing unit serves as the table addressing unit, and the fourth and fifth addressing units serve as offset address addressing units. When the processor core reads or writes multiple groups of data, it can write the next group of offset addresses into the memory through the fifth addressing unit while the third addressing unit sends an offset address request to the fourth addressing unit; on receiving the request, the fourth addressing unit reads the previous group of offset addresses from the memory and sends it to the third addressing unit. The fourth and fifth addressing units then swap roles: the processor core writes the group after next into the memory through the fourth addressing unit while the third addressing unit sends an offset address request to the fifth addressing unit, which on receiving the request reads the next group of offset addresses from the memory and sends it to the third addressing unit. Switching back and forth in this way, the third addressing unit obtains offset addresses alternately from the fourth and fifth addressing units, achieving ping-pong addressing of the offset address.
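The alternation between the fourth and fifth addressing units can be written out as a small schedule. The Python sketch below is only an illustration: the unit names `unit4` and `unit5`, the function `ping_pong_schedule`, and the assumption that the first group of offset addresses was already stored through `unit4` are all hypothetical. In each step one unit serves the previously stored group to the table addressing unit while the other stores the next group, and the two swap roles on the following step.

```python
def ping_pong_schedule(offset_sets):
    """Return (step, writer, written_set, reader, read_set) tuples."""
    units = ["unit4", "unit5"]   # hypothetical names for the two offset units
    schedule = []
    # offset_sets[0] is assumed to be already stored via unit4 before step 0.
    for step in range(len(offset_sets) - 1):
        reader = units[step % 2]         # serves the previously stored group
        writer = units[(step + 1) % 2]   # stores the next group in the same step
        schedule.append(
            (step, writer, offset_sets[step + 1], reader, offset_sets[step])
        )
    return schedule

for entry in ping_pong_schedule(["A", "B", "C"]):
    print(entry)
```

Because reading one group overlaps with writing the next, the table addressing unit does not have to wait for an offset write to finish before it requests offsets.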
Obtaining the base address through ping-pong addressing is described below with reference to FIG. 24. As shown in FIG. 24, a group of addressing units includes a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit. The ping-pong addressing includes:
the processor core writes the offset address into the memory through the eighth addressing unit;
the sixth addressing unit and the seventh addressing unit alternately obtain the base address sent by the processor core, and obtain the offset address stored in the memory through the eighth addressing unit.
The sixth and seventh addressing units serve as table addressing units, and the eighth addressing unit serves as the offset address addressing unit. When the processor core reads or writes multiple groups of data, it sends the next base address to the seventh addressing unit while the sixth addressing unit sends an offset address request to the eighth addressing unit; on receiving the request, the eighth addressing unit reads the offset address from the memory and sends it to the sixth addressing unit. The sixth and seventh addressing units then swap roles: the processor core sends the base address after next to the sixth addressing unit while the seventh addressing unit sends an offset address request to the eighth addressing unit, which on receiving the request reads the offset address from the memory and sends it to the seventh addressing unit. Switching back and forth in this way, the sixth and seventh addressing units alternately obtain the offset address from the eighth addressing unit, achieving ping-pong addressing of the base address.
As can be seen, with ping-pong addressing in this embodiment, three or more addressing units can write and read base addresses and offset addresses in parallel, which improves the addressing capability of the processor; in large-scale table look-up addressing in particular, addressing efficiency can be greatly improved.
A further embodiment of the present disclosure provides a processor. Referring to FIG. 2, the processor includes a processor core, an addressing module, and a memory. The addressing module may be integrated inside the processor.
The addressing module is used to obtain the base address and the offset address of data in the memory, and to obtain the storage address of the data in the memory from the base address and the offset address.
The processor core can access the data at the storage address of the memory through the addressing module.
The addressing module may include one or more groups of addressing units. As shown in FIG. 3, each group includes two identical addressing units, each of which communicates with the processor core over the system bus. An internal bus is provided in the addressing module, and the two addressing units communicate with each other over it. One of the two addressing units reads and writes data and may be called the table addressing unit; the other reads and writes the offset address of that data in the memory and may be called the offset address addressing unit. For convenience, the two are referred to below as the first addressing unit and the second addressing unit, and the processor is described taking the first addressing unit as the table addressing unit and the second addressing unit as the offset address addressing unit. Those skilled in the art will understand, however, that the roles may be swapped, with the first addressing unit serving as the offset address addressing unit and the second addressing unit as the table addressing unit.
In the processor of this embodiment, an addressing module is provided in the processor, and the storage address of the data in the memory is calculated by the addressing module rather than by the processor core. All addressing operations are performed by the addressing module without involving the processor core in the address calculation, which improves the efficiency of table look-up addressing compared with a general processor.
When the processor core needs to read data from the memory, the first addressing unit is used to obtain the base address and the offset address of the data in the memory.
Referring to FIG. 4, the first addressing unit includes a first address calculation unit, a first conflict processing unit, a first data processing unit, a first data transceiving unit, and a first control unit.
The first control unit can communicate with the processor core over the system bus and controls the operation of the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit. When the processor core needs to read data from the memory, it can send a read request to the first control unit over the system bus; in response, the first control unit can send an offset address request to the second addressing unit over the internal bus. In response to the offset address request, the second addressing unit can read from the memory the offset address of the data, send it to the first address calculation unit over the internal bus, and send an offset address valid signal to the first control unit over the internal bus. In response to the offset address valid signal, the first control unit can start the other units of the first addressing unit to perform the read operation.
The first address calculation unit is used to receive the base address of the data sent by the processor core and the offset address sent by the second addressing unit, and to obtain the vector address of the data from the base address and the offset address. As shown in FIG. 5, the first address calculation unit includes a base address selector, an offset address selector, and an adder.
The base address selector is used to select the base address sent by the processor core. The offset address selector is used to select the offset address sent by the second addressing unit over the internal bus. For table look-up addressing, the number of offset addresses corresponds to the number of banks in the memory: when the memory includes N banks, there are N offset addresses.
The first address calculation unit is also used to obtain the storage address of the data in the memory from the base address and the offset address.
The adder of the first address calculation unit adds the base address to each of the N offset addresses to obtain the storage address of the data; this storage address is a vector address including 16 addresses.
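The adder step amounts to adding one scalar base address to every offset. A minimal Python sketch, reusing the offset list from the FIG. 19 write example (the function name `vector_address` and the second base address 256 are illustrative assumptions):

```python
def vector_address(base, offsets):
    """Add the base address to each offset address to form the vector address."""
    return [base + off for off in offsets]

# Offset addresses taken from the FIG. 19 write example (N = 16).
offsets = [0, 1, 2, 3, 68, 69, 70, 71, 136, 137, 138, 139, 204, 205, 206, 207]

print(vector_address(0, offsets))         # base address 0, as in the example
print(vector_address(256, offsets)[:4])   # hypothetical non-zero base address
```

With base address 0 the vector address is identical to the offset list, which matches the example.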
After the vector address is obtained, the processor core reads the data stored at the vector address from the memory through the first addressing unit. When the vector address has a bank conflict, the processor core can still read the data at the vector address through the first addressing unit.
The first conflict processing unit is used to determine whether a bank conflict exists. When a bank conflict exists, the first conflict processing unit can use a conflict resolution mechanism to read the data from the vector address and send the data to the first data processing unit; the first data processing unit is used to process the data and send the processed data to the first data transceiving unit; and the first data transceiving unit is used to send the processed data to the processor core.
As shown in FIG. 7, the first conflict processing unit includes a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit.
In the conflict resolution mechanism, the vector address sent by the first address calculation unit is input directly to the address strobe, and at the same time the vector address buffer caches the vector address.
The address strobe is used to select the vector address sent directly by the first address calculation unit, so that this vector address is output to the conflict judgment unit.
The conflict judgment unit is used to check the vector address:
When the vector address has a bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address strobe, so that the address strobe selects the vector address output by the vector address buffer. In this way the vector address input to the conflict judgment unit is held unchanged during conflict processing.
The address mapping unit is used to map the vector address to physical addresses of the memory.
The read data reorganization unit is used to read the data at the physical addresses, reorganize the data, and send the reorganized data to the first data processing unit.
The conflict judgment unit then generates a conflict flag invalid signal. In response, the address strobe selects the vector address of the next data sent by the first address calculation unit, the vector address buffer caches that vector address at the same time, and the conflict judgment unit goes on to handle any bank conflict of the next data.
The address mapping unit is used to group together, across the banks, the first cell of each bank that corresponds to the vector address, then the second cell of each bank that corresponds to the vector address, and so on up to the n-th cell, obtaining n groups of cells in total, and to select these n groups of cells of the memory one after another.
The read data reorganization unit can read the data at the vector address and reorganize it as follows:
The n groups of cells are read one after another in the order in which they are selected, and the data read from the n groups is rearranged in ascending address order to obtain the reorganized data. After reorganization the data is arranged according to its actual storage location in the memory, and the read data reorganization unit sends it to the first data processing unit for use by the processor core.
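The read-side reorganization can be sketched as a merge followed by a sort on address. The Python below is an illustration only: the function name `reorganize_read_data`, the representation of each cell group as an address-to-data dictionary, and the sample values are assumptions. Each group models what one clock cycle returns, and the result is ordered by ascending address regardless of the order in which the groups were selected.

```python
def reorganize_read_data(groups):
    """Merge the data read from n cell groups into ascending address order.

    groups: list of {address: data} dicts, one per selected cell group,
    given in the order in which the groups were read.
    """
    merged = {}
    for group in groups:       # one group is read per clock cycle
        merged.update(group)
    return [merged[addr] for addr in sorted(merged)]

# Hypothetical read results for four cell groups.
groups = [{0: "a", 1: "b", 2: "c", 3: "d"}, {4: "e", 5: "f"}, {8: "g"}, {12: "h"}]
print(reorganize_read_data(groups))
print(reorganize_read_data(groups[::-1]))   # reverse selection order, same result
```

Sorting on address is what makes the output independent of the selection order, which is consistent with the selection order being a free choice.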
至此,该组数据的冲突处理结束,冲突判断单元用于产生一冲突标志失效信号。响应于冲突标志失效信号,地址选择器用于选通第一地址计算单元发送的下一组数据的向量地址,向量地址缓存器用于同时缓存下一组数据的向量地址,冲突判断单元用于继续对下一组数据的Bank冲突进行处理。So far, the conflict processing of the group of data ends, and the conflict judgment unit is used to generate a conflict flag failure signal. In response to the conflict flag invalidation signal, the address selector is used to select the vector address of the next set of data sent by the first address calculation unit, the vector address buffer is used to buffer the vector address of the next set of data at the same time, and the conflict judgment unit is used to continue to check the vector address of the next set of data. Bank conflicts of the next set of data are processed.
在冲突解决机制中,冲突判断单元用于对向量地址进行判断,当向量地址不存在Bank冲突时,冲突判断单元用于产生冲突标志失效信号。地址映射单元用于将向量地址映射至存储器的物理地址。读数据重组单元用于读取物理地址的数据,无需对读取的数据进行重组,直接将读取的数据发送至第一数据处理单元。由于不存在Bank冲突,一个时钟周期即可完成数据的读取。响应于冲突标志失效信号,地址选择器用于选通下一组数据的向量地址。In the conflict resolution mechanism, the conflict judgment unit is used to judge the vector address. When there is no bank conflict in the vector address, the conflict judgment unit is used to generate a conflict flag failure signal. The address mapping unit is used to map the vector address to the physical address of the memory. The read data reorganization unit is used to read the data of the physical address, without reorganizing the read data, and directly sends the read data to the first data processing unit. Since there is no bank conflict, the data can be read in one clock cycle. In response to the conflict flag failure signal, the address selector is used to select the vector address of the next set of data.
A vector address whose individual addresses all correspond to different Banks of the memory is regarded as free of Bank conflicts. In this embodiment, the absence of a Bank conflict also covers the following case:
When the width of a Bank is m bytes, the vector address is divided evenly into m groups or 2×m groups. If every group of addresses corresponds to one cell of one Bank, the vector address is considered free of Bank conflicts, and the first conflict processing unit performs address splicing on the vector address.
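The conflict test described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the low-order interleaved mapping in `bank_and_row`, the function names, and the defaults m=4 and 16 Banks are assumptions made for the example. Under this assumed mapping, addresses that fall in the same cell of the same Bank do not conflict, while two addresses in different cells of one Bank do.

```python
from collections import defaultdict


def bank_and_row(addr, m=4, n_banks=16):
    """Map a byte address to (bank, row).

    Assumed mapping (hypothetical): consecutive m-byte cells are
    interleaved across the banks, so cell index = addr // m.
    """
    cell = addr // m
    return cell % n_banks, cell // n_banks


def has_bank_conflict(vector_address, m=4, n_banks=16):
    """Return True if any bank is accessed at more than one cell (row)."""
    rows_per_bank = defaultdict(set)
    for addr in vector_address:
        bank, row = bank_and_row(addr, m, n_banks)
        rows_per_bank[bank].add(row)
    return any(len(rows) > 1 for rows in rows_per_bank.values())
```

For example, the four byte addresses 0 to 3 share one cell of Bank 0 and are conflict-free in this model, whereas addresses 0 and 64 land in two different cells of Bank 0 and conflict.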
After receiving the data sent by the first conflict processing unit, the first data processing unit decides, according to the data width required by the processor core, whether to process the data further. When the processor core needs only some of the bytes of each cell of each Bank, rather than all the bytes read from each cell, the first data processing unit splices the required bytes of the data sent by the first conflict processing unit to generate the data needed by the processor core, and sends the spliced data to the first data transceiver unit.
Specifically, for the data sent by the first conflict processing unit, when the width of a Bank is m bytes and the processor core needs k bytes out of every m bytes of the data, the first data processing unit selects those k bytes from every m bytes to obtain N×k bytes, where k ≤ log₂ m. It then combines every m bytes of the N×k bytes to obtain m×k blocks of data, each m bytes wide.
When the processor core needs all the bytes read from each cell of each Bank, the first data processing unit does not splice the data sent by the first conflict processing unit, but sends it directly to the first data transceiver unit.
The data splicing process has been described above taking k=1 as an example. Since k ≤ log₂ m, k can also be 2 when m=4, that is, the processor core needs 2 bytes out of every 4 bytes. In this case the first data processing unit operates much as it does for k=1: it selects the 2 bytes required by the processor core from every 4 bytes, obtaining 16×2=32 bytes, and then combines every 4 of the 32 selected bytes to obtain 8 blocks of data, each 4 bytes wide. This is the data required by the processor core, and it is sent to the first data transceiver unit.
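The splicing step can be illustrated with a short sketch. It assumes, purely for illustration, that the k required bytes are the first k bytes of every m-byte cell; the text does not say which bytes are selected, and the function name `splice` is hypothetical.

```python
def splice(data, m, k):
    """Select k bytes from every m bytes of `data`, then regroup the
    selected bytes into blocks that are m bytes wide.

    Assumption: the k wanted bytes are the first k bytes of each cell.
    """
    if len(data) % m != 0 or not 0 < k <= m:
        raise ValueError("data must hold whole m-byte cells and 0 < k <= m")
    selected = bytearray()
    for i in range(0, len(data), m):
        selected += data[i:i + k]          # keep k bytes of this cell
    if len(selected) % m != 0:
        raise ValueError("selected bytes do not fill whole m-byte blocks")
    return [bytes(selected[i:i + m]) for i in range(0, len(selected), m)]
```

With 16 Banks of 4 bytes (64 bytes read), `splice(data, 4, 2)` yields 8 blocks of 4 bytes and `splice(data, 4, 1)` yields 4 blocks, matching the counts in the example above.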
The first data transceiver unit includes a receive buffer and a send buffer. The send buffer buffers the data sent by the first data processing unit and sends the buffered data to the processor core over the system bus. The depths of the receive buffer and the send buffer can be set as required; the minimum depth of the receive buffer is 2, and the minimum depth of the send buffer is 0.
At this point, the operation in which the processor core reads data from the memory through the first addressing unit is complete.
In the processor of this embodiment, a first addressing unit is provided in the processor, and the first addressing unit obtains the base address and the offset address and calculates the vector address. When the vector address has a Bank conflict, the first addressing unit resolves it using the conflict resolution mechanism, so the processor core does not need to handle the Bank conflict. While the Bank conflict is being handled, the processor core can still perform other operations without waiting for the conflict to be resolved. The addressing method of this embodiment can therefore significantly improve the efficiency of the processor and increase its computing speed.
The first addressing unit can obtain the base addresses and offset addresses of multiple groups of data in several different modes.
One mode is the offset address update mode. In this mode, the base address of the multiple groups of data remains unchanged, and the offset address of each group of data comes from the second addressing unit.
The first address calculation unit obtains the base address sent by the processor core; the second addressing unit reads, in turn, the offset address of each group of data from the memory; and the first address calculation unit obtains the offset address read by the second addressing unit.
As shown in FIG. 5, when multiple groups of data are read in the offset address update mode, the base address selector selects the base address sent by the processor core. Each time a group of data is read, the offset address selector selects the offset address of that group, sent by the second addressing unit over the internal bus. The adder sums the base address and the offset address of the group to obtain the vector address of the group and sends the vector address to the first conflict processing unit; the data is then sent to the processor core via the first conflict processing unit, the first data processing unit and the first data transceiver unit, completing the read of that group. For every group of data, the offset address selector selects the offset address sent by the second addressing unit over the internal bus, so that multiple groups of data can be read.
Another mode is the base address update mode. In this mode, the offset addresses of the multiple groups of data come from the second addressing unit and are the same for every group; the base address of each group of data is obtained by updating an initial base address value.
In addition to the base address update mode and the offset address update mode, the processor of this embodiment provides a fixed offset address mode. As shown in FIG. 5, in the fixed offset address mode the first address calculation unit obtains the base address sent by the processor core, and the base address selector selects that base address and feeds it to the adder. The processor core also sends a fixed offset address to the first addressing unit; the offset address selector of the first address calculation unit selects the fixed offset address and feeds it to the adder. The adder of the first address calculation unit adds the base address and the offset address to obtain the vector address. The fixed offset address mode can be used for linear addressing, strided addressing and other addressing scenarios.
The processor of this embodiment thus provides an offset address update mode, a base address update mode and a fixed offset address mode, which can be chosen flexibly according to the actual situation, improving the flexibility of table lookup addressing.
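The three modes differ only in where the adder's two inputs come from, which the following sketch tries to make concrete. It is a simplified model under stated assumptions: offsets are treated as one value per lane, the base address update rule is taken to be a constant `stride` added per group (the text only says the initial base address is updated), and all function names are hypothetical.

```python
def vector_address(base, offsets):
    """Adder: one lane address per offset, base + offset."""
    return [base + off for off in offsets]


def offset_update_mode(base, offset_groups):
    """Base fixed by the core; each group brings its own offsets
    (in the embodiment these come from the second addressing unit)."""
    return [vector_address(base, offsets) for offsets in offset_groups]


def base_update_mode(base_init, stride, offsets, n_groups):
    """Same offsets for every group; the base is updated per group.
    Assumption: the update is a constant stride."""
    return [vector_address(base_init + g * stride, offsets)
            for g in range(n_groups)]


def fixed_offset_mode(base, fixed_offsets):
    """Base and a fixed offset both supplied by the processor core."""
    return vector_address(base, fixed_offsets)
```

A strided access, for instance, could be expressed in this model as `fixed_offset_mode(0x100, [i * 8 for i in range(4)])`.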
The processor of this embodiment supports task-driven addressing. When the processor core performs a read, it generates a set of read instructions, which together amount to one task instruction, and sends the task instruction to the first addressing unit over the system bus; the entire addressing process is then carried out by the first addressing unit. The data that the first addressing unit reads from the memory is sent back to the processor core over the system bus, and the processor core continues with subsequent operations once it has received the data. With this task-driven addressing, when the processor core needs to read data from the memory it only has to send the task instruction to the first addressing unit and need not concern itself with the addressing details; even a Bank conflict is handled by the first addressing unit. Compared with a conventional processor, the operation of the processor core is simplified and its efficiency is improved.
In this embodiment, the first control unit of the first addressing unit can communicate with the processor core through a handshake protocol.
As shown in FIG. 13, the processor core communicates with the first addressing unit over the system bus, which includes a clock signal line and read request valid, read request ready, read request, read data valid, read data ready and read data lines. The first addressing unit is driven by the clock signal line. When the processor core needs to read data from the memory, it uses the handshake protocol to send the task instruction to the first addressing unit and to receive data from it. A high read request valid signal indicates that the read request signal is valid; when the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to read the data from the memory. A high read data valid signal indicates that the read data is valid; when the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiver unit to send the data to the processor core.
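The valid/ready rule in the handshake above can be modelled in a few lines: a transfer takes place only in a clock cycle where both signals are high. The sketch below is a cycle-level illustration, not the bus of FIG. 13; the function name and the list-per-cycle representation of the signals are assumptions.

```python
def handshake_transfers(valid, ready, payload):
    """Return the payloads accepted by the receiver.

    Each argument holds one value per clock cycle. A payload is
    transferred only in cycles where valid and ready are both high.
    """
    return [p for v, r, p in zip(valid, ready, payload) if v and r]
```

In this model a sender that raises valid while ready is low simply holds its payload until a cycle in which ready is also high.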
In this embodiment, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to operate as a pipeline.
As shown in FIG. 14, the first control unit includes a read/write request buffer, a synchronization register, a selector, and the first-, second-, third-, fourth- and fifth-stage controllers of the pipeline.
The processor core sends a read request over the system bus. If the read request is a table lookup request, the selector selects the synchronization register. The read/write request buffer receives and buffers the read request, then sends an offset address request to the second addressing unit over the internal bus and forwards the read request to the synchronization register. In response to the offset address request, the second addressing unit reads from the memory the offset address of the data, sends the offset address to the first address calculation unit over the internal bus, and sends an offset address valid signal to the synchronization register over the internal bus. On receiving the offset address valid signal, the synchronization register sends the read request to the pipeline controllers of all stages, starting the pipeline. The first- to fifth-stage controllers send control signals to the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit: during addressing, the first address calculation unit occupies the first stage of the pipeline, the first conflict processing unit the second stage, the first data processing unit the third and fourth stages, and the first data transceiver unit the fifth stage. If the read request is not a table lookup request, the selector selects the read/write request buffer, and the read request is sent directly to the pipeline controllers of all stages, starting the pipeline.
The first addressing unit of this embodiment also provides a pipeline pause mechanism. When the processor core cannot receive, over the system bus, the data sent by the first data transceiver unit, the processor core can send a bus pause signal over the system bus to the read/write request buffer, the synchronization register and the first- to fifth-stage pipeline controllers, which pause the pipeline on receiving it. When the first conflict processing unit finds that the vector address has a Bank conflict, it sends a conflict pause signal to the read/write request buffer, the synchronization register and the first- to fifth-stage controllers, which pause the pipeline on receiving the conflict pause signal. Once the Bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal, and on receiving it the read/write request buffer, the synchronization register and the first- to fifth-stage controllers restart the pipeline.
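The effect of the pause mechanism on a five-stage pipeline can be sketched with a toy cycle-level model. It is an illustration only: the model freezes every stage in a paused cycle, does not distinguish the bus pause from the conflict pause, and the names `run_pipeline` and `stall_cycles` are hypothetical.

```python
def run_pipeline(requests, stall_cycles, n_stages=5):
    """Advance requests through an n-stage pipeline.

    `stall_cycles` is a set of cycle numbers in which a pause signal
    is asserted; in those cycles no stage advances. Returns a list of
    (request, cycle) pairs giving the cycle in which each request
    leaves the last stage.
    """
    stages = [None] * n_stages
    pending = list(requests)
    done = []
    cycle = 0
    while pending or any(s is not None for s in stages):
        if cycle not in stall_cycles:
            if stages[-1] is not None:
                done.append((stages[-1], cycle))
            next_in = pending.pop(0) if pending else None
            stages = [next_in] + stages[:-1]   # shift every stage by one
        cycle += 1
    return done
```

In this model a single paused cycle delays every request still in flight by exactly one cycle, while requests keep their order.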
By using a pipeline, this embodiment allows the modules of the first addressing unit to execute in parallel as pipeline stages, which can greatly improve the working efficiency of the first addressing unit, shorten the addressing time and improve addressing efficiency.
Referring to FIG. 4, the second addressing unit includes a second control unit, a second address calculation unit, a second conflict processing unit, a second data processing unit and a second data transceiver unit.
When the second addressing unit reads an offset address from the memory, the second conflict processing unit reads the offset address from the memory and sends it to the second data processing unit; the second data processing unit processes the offset address and sends the processed offset address to the second data transceiver unit; and the second data transceiver unit sends the processed offset address to the first addressing unit.
The second addressing unit differs from the first addressing unit in that the second data transceiver unit sends the processed offset address to the first addressing unit over the internal bus, whereas the first data transceiver unit of the first addressing unit sends data to the processor core over the system bus. Otherwise, the second control unit, the second address calculation unit, the second conflict processing unit, the second data processing unit and the second data transceiver unit operate similarly to the corresponding units of the first addressing unit.
When the processor core writes data to the memory, part of the operation of the first addressing unit is similar to the read operation. The processor core can also write the data to the vector address of the memory through the first addressing unit.
When there is a Bank conflict, the first data transceiver unit receives the data sent by the processor core and sends it to the first data processing unit; the first data processing unit processes the data and sends the processed data to the first conflict processing unit; and the first conflict processing unit writes the data to the vector address using the conflict resolution mechanism.
The receive buffer receives and buffers the data sent by the processor core over the system bus, and sends the data to the first data processing unit.
After receiving the data sent by the first data transceiver unit, the first data processing unit decides, according to the width of the data written by the processor core, whether to process the data further. When the processor core writes only some of the bytes of each cell of each Bank, rather than all the bytes of each cell, the first data processing unit splits the data sent by the first data transceiver unit to generate the data to be written to the memory, and sends the split data to the first conflict processing unit.
When the data sent by the first data transceiver unit comprises m×k blocks, each m bytes wide, the first data processing unit splits the m bytes of each block to obtain N×k bytes, such that each k bytes correspond to k addresses of one Bank, where N is the number of Banks in the memory, the width of a Bank is m bytes, and k ≤ log₂ m.
When the processor core is to write all the bytes of each cell of each Bank, that is, the processor core is to write 64 bytes to the memory, the first data processing unit does not split the data sent by the first data transceiver unit, but sends it directly to the first conflict processing unit.
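The write-side split is the inverse of the read-side splice, and can be sketched as below. The sketch assumes that the blocks are simply concatenated and cut into consecutive k-byte pieces, one piece per Bank in Bank order; the text does not fix this ordering, and the function name `split_blocks` is hypothetical.

```python
def split_blocks(blocks, m, k):
    """Split m-byte-wide blocks into k-byte pieces, one piece per Bank.

    Assumption: pieces are taken in order from the concatenated blocks,
    so piece i is destined for the i-th Bank.
    """
    if any(len(b) != m for b in blocks):
        raise ValueError("every block must be m bytes wide")
    flat = b"".join(blocks)
    if len(flat) % k != 0:
        raise ValueError("total length must be a multiple of k")
    return [flat[i:i + k] for i in range(0, len(flat), k)]
```

For m=4 and k=2, eight 4-byte blocks become sixteen 2-byte pieces, that is, N×k = 32 bytes spread over N = 16 Banks.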
As shown in FIG. 18, the first conflict processing unit further includes a write data buffer, a write data selector and a write data reorganization unit.
The vector address sent by the first address calculation unit is input directly to the address selector, and at the same time the vector address buffer buffers the vector address.
The address selector selects the vector address sent directly by the first address calculation unit, so that this vector address is output to the conflict judgment unit.
The data sent by the first data processing unit is passed to the write data selector, and at the same time the write data buffer buffers that data.
The conflict judgment unit checks the vector address:
When the vector address has a Bank conflict, the conflict judgment unit generates a conflict flag valid signal and feeds it back to the address selector and the write data selector, so that the address selector selects the vector address output by the vector address buffer and the write data selector selects the data output by the write data buffer. In this way, the vector address input to the conflict judgment unit and the data input to the write data reorganization unit remain unchanged during conflict processing;
The write data reorganization unit reorganizes the data;
The address mapping unit maps the vector address to physical addresses of the memory, and the write data reorganization unit writes the reorganized data to those physical addresses.
The conflict judgment unit then generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address selector selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer simultaneously buffers that vector address, the write data selector selects the next group of data sent by the first data processing unit, the next group of data is buffered in the write data buffer, and the conflict judgment unit goes on to handle any Bank conflict in the next group of data.
The data reorganization unit reorganizes the data as follows:
Determine the first cell of each Bank that corresponds to the vector address, and arrange the data corresponding to these first cells into one row in ascending address order;
Determine the second cell of each Bank that corresponds to the vector address, and arrange the data corresponding to these second cells into one row in ascending address order;
And so on, until the n-th cell of each Bank that corresponds to the vector address is determined and the data corresponding to these n-th cells is arranged into one row in ascending address order, giving n rows of data in total.
The address mapping unit gates, in turn, the n groups of cells corresponding to the n rows of data;
The write data reorganization unit writes the n rows of data into the n groups of cells in turn, following the gating order of the n groups of cells.
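The row-building steps above can be sketched as follows. This is an interpretation under assumptions, not the embodiment's logic: Banks are assumed to be low-order interleaved (cell index = address // m), the "first", "second", ... cell of a Bank is taken to mean its accessed cells in ascending order, and the function name `reorganize_for_write` is hypothetical.

```python
from collections import defaultdict


def reorganize_for_write(lanes, m=4, n_banks=16):
    """Group (address, value) lanes into rows that are conflict-free.

    Row i collects, for every bank, the data destined for that bank's
    i-th accessed cell; each row is sorted by ascending address. The
    n rows can then be written one after another.
    """
    per_bank = defaultdict(dict)            # bank -> {row: [(addr, value)]}
    for addr, value in lanes:
        cell = addr // m
        bank, row = cell % n_banks, cell // n_banks
        per_bank[bank].setdefault(row, []).append((addr, value))
    n = max((len(rows) for rows in per_bank.values()), default=0)
    out = [[] for _ in range(n)]
    for rows in per_bank.values():
        for i, row in enumerate(sorted(rows)):
            out[i].extend(rows[row])
    return [sorted(r) for r in out]
```

In this model the number of rows n equals the largest number of distinct cells accessed in any single Bank, so a conflict-free vector address produces exactly one row.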
At this point, conflict processing for this group of data is complete, and the conflict judgment unit generates a conflict flag invalid signal. In response to the conflict flag invalid signal, the address selector selects the vector address of the next group of data sent by the first address calculation unit, the vector address buffer simultaneously buffers that vector address, the write data selector selects the next group of data sent by the first data processing unit, the next group of data is buffered in the write data buffer, and the conflict judgment unit goes on to handle any Bank conflict in the next group of data.
In the conflict resolution mechanism, the conflict judgment unit checks the vector address. When the vector address has no Bank conflict, the conflict judgment unit generates a conflict flag invalid signal; the address mapping unit maps the vector address to physical addresses of the memory; the write data reorganization unit writes the data to the physical addresses; and, in response to the conflict flag invalid signal, the address selector selects the vector address of the next group of data.
At this point, the operation in which the processor core writes data to the memory through the first addressing unit is complete.
As with the read operation, a first addressing unit is provided in the processor, and the first addressing unit obtains the base address and the offset address and calculates the vector address. When the vector address has a Bank conflict, the first addressing unit resolves it using the conflict resolution mechanism, so the processor core does not need to handle the Bank conflict. While the Bank conflict is being handled, the processor core can still perform other operations without waiting for the conflict to be resolved. The addressing method of this embodiment can therefore significantly improve the efficiency of the processor and increase its computing speed.
As shown in FIG. 20, the processor core communicates with the first addressing unit over the system bus, which includes a clock signal line and write request valid, write request ready, write request, write data and write busy lines. The first addressing unit is driven by the clock signal line. When the processor core needs to write data to the memory, it can use the handshake protocol to send the task instruction and the data to the first addressing unit. A high write request valid signal indicates that the write request signal and the write data are valid; when the write request valid signal and the write request ready signal are both high, the first control unit can read the write request from the processor core, and the write busy signal is pulled high. The first control unit then controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to write the data to the memory, and the write busy signal is pulled low.
In the write operation, the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit and the first data transceiver unit to operate as a pipeline.
如图21所示,处理器核可通过系统总线发送一写请求,如果该写请求为查表请求,则选择器用于选通同步寄存器。读写请求缓存接收该写请求,并将写请求缓存。收到写请求后,读写请求缓存可通过内部总线向第二寻址单元发送偏移地址请求,并将写请求发送至同步寄存器。响应于该偏移地址请求,第二寻址单元用于从存储器中读取所述数据在存储器中的偏移地址,将偏移地址通过内部总线发送给第一地址计算单元,并通过内部总线向同步发送一偏移地址有效信号。收到偏移地址有效信号后,同步寄存器用于将写请求发送给各级流水线控制器,启动流水线操作。第一级、第二级控制器用于分别向第一地址计算、第一数据处理单元和第一冲突处理单元发送控制信号,在第一寻址单元的寻址过程中,第一数据收发单元与读写请求缓存位于同一级,第一地址计算单元和第一数据处理单元位于流水线的第一级、第一冲突处理单元位于流水线的第二级。如果该写请求不是查表请求,则选择器选通读写请求缓存,将写请求直接发送给各级流水线控制器,启动流水线操作。As shown in Figure 21, the processor core can send a write request through the system bus. If the write request is a table lookup request, the selector is used to gate the synchronization register. The read and write request cache receives the write request and caches the write request. After receiving the write request, the read and write request buffer can send an offset address request to the second addressing unit through the internal bus, and send the write request to the synchronization register. In response to the offset address request, the second addressing unit is used to read the offset address of the data in the memory from the memory, and send the offset address to the first address calculation unit through the internal bus, and through the internal bus Send an offset address valid signal to the synchronization. After receiving the offset address valid signal, the synchronization register is used to send the write request to the pipeline controllers at all levels to start the pipeline operation. The first-level and second-level controllers are used to respectively send control signals to the first address calculation, the first data processing unit, and the first conflict processing unit. 
During the addressing process of the first addressing unit, the first data transceiver unit and the The read and write request cache is located in the same stage, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline. If the write request is not a table lookup request, the selector strobes the read and write request cache, sends the write request directly to the pipeline controllers at all levels, and starts the pipeline operation.
Similarly, a pipeline pause mechanism is also provided for write operations. When the processor core cannot send data to the first data transceiver unit over the system bus, the processor core can send a bus pause signal over the system bus to the read request cache, the synchronization register, and the first-stage and second-stage pipeline controllers. Once they receive the bus pause signal, the pipeline pauses. When the first conflict processing unit detects a bank conflict in the vector address, it sends a conflict pause signal to the read request cache, the synchronization register, and the first-stage and second-stage pipeline controllers; once they receive that pause signal, the pipeline pauses. After the bank conflict has been handled, the first conflict processing unit sends a conflict recovery signal, and on receiving it the read request cache, the synchronization register, and the first-stage and second-stage pipeline controllers restart the pipeline.
By adopting a pipeline, this embodiment lets the modules of the first addressing unit run in parallel as pipeline stages, which can greatly improve the working efficiency of the first addressing unit, shorten the addressing time, and thereby improve addressing efficiency.
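The pause-and-resume behaviour described above can be illustrated with a toy cycle-level model. The sketch below is not the circuit of the embodiment: the function names, the bank mapping `address % num_banks`, and the assumption that a conflicting request holds the second stage for as many cycles as its most heavily hit bank are illustrative choices that do not come from the text.

```python
from collections import Counter, deque


def conflict_cycles(addresses, num_banks):
    """Cycles stage 2 needs for one vector address (assumed model).

    Assumption: bank = address % num_banks, and addresses that fall in
    the same bank are served one per cycle.
    """
    hits = Counter(address % num_banks for address in addresses)
    return max(hits.values(), default=1)


def simulate_pipeline(requests, num_banks=4):
    """Return the cycles a two-stage pipeline needs for all requests.

    Stage 1 stands in for address calculation and data processing,
    stage 2 for conflict handling. While stage 2 is busy with a bank
    conflict, nothing advances (the "conflict pause"); when it is done,
    the pipeline moves again (the "conflict recovery").
    """
    pending = deque(requests)
    stage1 = stage2 = None
    remaining = 0  # cycles stage 2 still needs for its current request
    cycles = 0
    while True:
        if remaining == 0:  # no pause active: every stage advances
            stage2 = stage1
            stage1 = pending.popleft() if pending else None
            if stage2 is not None:
                remaining = conflict_cycles(stage2, num_banks)
        if stage1 is None and stage2 is None:
            return cycles
        cycles += 1
        if remaining > 0:
            remaining -= 1
```

In this model, with four banks, two conflict-free requests drain in three cycles, whereas a first request whose four addresses all fall in one bank stretches the run to six cycles because the first stage is held while the second stage works through the conflict.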
Before the second addressing unit reads the offset address from the memory, the processor core can send the offset address to the second addressing unit over the system bus, and the second addressing unit writes the offset address into the memory. This write is similar to the operation, described above, by which the first addressing unit writes data to the memory.
The addressing module of this embodiment may include multiple groups of addressing units, and the processor of this embodiment can have these groups perform data read and write operations in parallel. Each group of addressing units can communicate with the processor core over the system bus and can read and write the memory. When the processor core needs to read or write multiple groups of data at the same time, each group of addressing units can complete its own addressing task independently. The number of groups is not limited in this embodiment and can be chosen according to actual requirements. Compared with a single group of addressing units, this embodiment can multiply the addressing efficiency of the processor and greatly improve its addressing capability.
A group of addressing units in this embodiment can obtain the base address or the offset address by ping-pong addressing. As shown in Figure 23, a group of addressing units includes a third addressing unit, a fourth addressing unit, and a fifth addressing unit. The ping-pong addressing includes:
The processor core can write offset addresses into the memory alternately through the fourth addressing unit and the fifth addressing unit;
The third addressing unit obtains the base address sent by the processor core and obtains the offset addresses stored in the memory alternately through the fourth addressing unit and the fifth addressing unit.
The third addressing unit serves as the table addressing unit, and the fourth and fifth addressing units serve as offset address addressing units. When the processor core reads or writes multiple groups of data, while one set of offset addresses is being written into the memory through the fifth addressing unit, the third addressing unit can send an offset address request to the fourth addressing unit. On receiving the request, the fourth addressing unit can read the previous set of offset addresses from the memory and send that set to the third addressing unit. The fourth and fifth addressing units then swap roles: the processor core can write the next set of offset addresses into the memory through the fourth addressing unit while the third addressing unit sends an offset address request to the fifth addressing unit, which on receiving the request can read its set of offset addresses from the memory and send it to the third addressing unit. By switching back and forth in this way, the third addressing unit obtains offset addresses alternately from the fourth and fifth addressing units, which realizes ping-pong addressing of the offset address.
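The alternation just described can be replayed in a few lines of Python. This is only a sketch under stated assumptions: the function name `ping_pong_offsets`, the dictionary standing in for the memory regions reached through units 4 and 5, and the step in which unit 3 adds the base address to each offset are illustrative and are not taken from the text.

```python
def ping_pong_offsets(base, offset_sets):
    """Replay the offset-address ping-pong and return (addresses, log).

    Units 4 and 5 are the offset addressing units; unit 3 is the table
    addressing unit. In each step one offset unit stores a new set of
    offsets while the other hands the previous set to unit 3.
    """
    stored = {4: None, 5: None}  # offsets last written through each unit
    addresses = []               # what unit 3 computes, one list per set
    log = []                     # (step, writing unit, reading unit)
    total = len(offset_sets)
    for step in range(total + 1):
        writer = 4 if step % 2 == 0 else 5
        reader = 5 if writer == 4 else 4
        reading = step >= 1      # nothing has been stored before step 0
        writing = step < total   # nothing is left to store in the last step
        if reading:
            # Assumed: unit 3 adds the base address to each offset.
            addresses.append([base + offset for offset in stored[reader]])
        if writing:
            stored[writer] = list(offset_sets[step])
        log.append((step, writer if writing else None, reader if reading else None))
    return addresses, log
```

Calling `ping_pong_offsets(100, [[0, 1], [4, 8], [2, 3]])` produces a log in which every middle step has one unit writing and the other reading, which is the overlap that ping-pong addressing is meant to provide.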
Obtaining the base address by ping-pong addressing is described below with reference to Figure 24. As shown in Figure 24, a group of addressing units includes a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit. The ping-pong addressing includes:
The processor core can write the offset address into the memory through the eighth addressing unit;
The sixth addressing unit and the seventh addressing unit can alternately obtain the base address sent by the processor core, and can obtain the offset address stored in the memory through the eighth addressing unit.
The sixth and seventh addressing units serve as table addressing units, and the eighth addressing unit serves as the offset address addressing unit. When the processor core reads or writes multiple groups of data, while a base address is being sent to the seventh addressing unit, the sixth addressing unit can send an offset address request to the eighth addressing unit. On receiving the request, the eighth addressing unit can read the offset address from the memory and send it to the sixth addressing unit. The sixth and seventh addressing units then swap roles: the processor core can send the next base address to the sixth addressing unit while the seventh addressing unit sends an offset address request to the eighth addressing unit, which on receiving the request can read the offset address from the memory and send it to the seventh addressing unit. By switching back and forth in this way, the sixth and seventh addressing units obtain the offset address from the eighth addressing unit in turn, which realizes ping-pong addressing of the base address.
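The base-address variant can be sketched the same way. Again this is a hedged illustration, not the embodiment itself: the function name `ping_pong_bases`, the plain list standing in for the offsets served by unit 8, and the addition of base and offset inside the table addressing unit are assumptions made for the example.

```python
def ping_pong_bases(bases, offsets):
    """Replay the base-address ping-pong.

    Units 6 and 7 are the table addressing units and unit 8 is the
    offset addressing unit. Returns a list of (unit, addresses) pairs
    in the order the addresses are produced.
    """
    held = {6: None, 7: None}  # base address currently held by each unit
    produced = []
    for step in range(len(bases) + 1):
        receiver = 6 if step % 2 == 0 else 7
        requester = 7 if receiver == 6 else 6
        if step >= 1:
            # The requester asks unit 8 for the offsets, modelled here
            # as a plain list. Assumed: it then adds its base to each.
            fetched = list(offsets)
            produced.append((requester, [held[requester] + o for o in fetched]))
        if step < len(bases):
            held[receiver] = bases[step]
    return produced
```

With three base addresses, the results come alternately from units 6, 7, and 6 again, while the same offsets are reused for every base address.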
Thus, with ping-pong addressing, three or more addressing units in this embodiment can write and read base addresses and offset addresses in parallel, which improves the addressing capability of the processor; in large-scale table lookup addressing in particular, addressing efficiency can be greatly improved.
Another embodiment of the present disclosure provides a movable platform. The movable platform includes a body; the body contains at least one circuit; and the circuit includes at least one processor of the above embodiments.
The movable platform can be any vehicle or carrier that can move, such as, but not limited to, a robot, an unmanned aerial vehicle, an unmanned vehicle, or an unmanned ship. Taking an unmanned aerial vehicle as an example, and referring to Figure 25, the body of the unmanned aerial vehicle may have a housing. The housing may be formed from a single integral piece, two integral pieces, or multiple parts. The housing may include a single cavity or multiple cavities, and one or more components can be placed in each cavity. A component may be, for example, at least one circuit board, one or more sensors, one or more communication units, or any other type of component. Each circuit board may include one or more processors of the above embodiments, and the processors are used to perform functions such as flight control, navigation, and image processing.
Another embodiment of the present disclosure provides an electronic device. The electronic device includes a housing; at least one circuit is provided in the housing; and the circuit includes at least one processor as described in the above embodiments.
The electronic device of this embodiment, as shown in Figure 26, may be a remote controller, in particular a remote controller for a movable platform. The electronic device may also be any portable or non-portable device, such as, but not limited to, a smartphone or mobile phone, a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a media content player, a video game station or system, a virtual reality system, an augmented reality system, a wearable device (for example, a watch, glasses, gloves, or headwear), a gesture recognition device, a microphone, or a device capable of providing or rendering image data.
Another embodiment of the present disclosure provides a computer-readable storage medium that stores executable instructions. When executed by one or more processors, the executable instructions can cause the one or more processors to perform the addressing method of the above embodiments.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions above can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; where no conflict arises, the features of the embodiments of the present disclosure can be combined arbitrarily; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (78)

  1. An addressing method for a processor, wherein the processor comprises a processor core, an addressing module, and a memory, and the addressing method comprises:
    the addressing module obtaining a base address and an offset address of data in the memory;
    the addressing module obtaining a storage address of the data in the memory according to the base address and the offset address; and
    the processor core accessing the data at the storage address through the addressing module.
  2. The addressing method for a processor according to claim 1, wherein the addressing module comprises at least one group of addressing units, and the addressing method is performed by the at least one group of addressing units.
  3. The addressing method for a processor according to claim 2, wherein the group of addressing units comprises at least a first addressing unit, and the storage address is a vector address;
    the processor core accessing the data at the storage address through the addressing module comprises:
    the processor core accessing the data at the vector address through the first addressing unit; and, when the vector address has a storage block conflict, the first addressing unit accessing the data by means of a conflict resolution mechanism.
  4. The addressing method for a processor according to claim 3, wherein the addressing method further comprises: before the addressing module obtains the base address and the offset address of the data in the memory,
    in response to code that accesses the data, the processor core generating a task instruction for accessing the data and sending the task instruction to the first addressing unit.
  5. The addressing method for a processor according to claim 4, wherein the first addressing unit comprises a first address calculation unit, a first conflict processing unit, a first data processing unit, and a first data transceiver unit.
  6. The addressing method for a processor according to claim 5, wherein
    the processor core accessing the data at the storage address through the first addressing unit comprises:
    the first conflict processing unit reading the data from the vector address by means of the conflict resolution mechanism and sending the data to the first data processing unit;
    the first data processing unit processing the data and sending the processed data to the first data transceiver unit; and
    the first data transceiver unit sending the processed data to the processor core.
  7. The addressing method for a processor according to claim 6, wherein the first conflict processing unit comprises a conflict judgment unit, an address mapping unit, an address strobe, and a read data reorganization unit;
    the conflict resolution mechanism comprises:
    the address strobe gating the vector address so that the vector address is output to the conflict judgment unit;
    when the vector address has a storage block conflict, the conflict judgment unit generating a conflict flag valid signal;
    in response to the conflict flag valid signal, the address strobe continuing to gate the vector address; the address mapping unit mapping the vector address to a physical address of the memory; and the read data reorganization unit reading the data at the physical address, reorganizing the data, and sending the reorganized data to the first data processing unit;
    the conflict judgment unit generating a conflict flag invalid signal; and
    in response to the conflict flag invalid signal, the address selector gating the vector address of the next group of data.
  8. The addressing method for a processor according to claim 7, wherein
    the address mapping unit mapping the vector address to the physical address of the memory comprises:
    grouping together the first storage unit of each storage block that corresponds to the vector address, grouping together the second storage unit of each storage block that corresponds to the vector address, and so on up to the n-th storage unit of each storage block that corresponds to the vector address, to obtain n groups of storage units in total, and gating the n groups of storage units in sequence;
    the read data reorganization unit reading the data at the vector address and reorganizing the data comprises:
    reading the data stored in the n groups of storage units one group at a time in the order in which the n groups are gated, and rearranging the data stored in the n groups of storage units in ascending order of address to obtain the reorganized data.
  9. The addressing method for a processor according to claim 7, wherein
    when the vector address has no storage block conflict, the conflict judgment unit generates a conflict flag invalid signal;
    the address mapping unit maps the vector address to the physical address of the memory;
    the read data reorganization unit reads the data at the physical address and sends the read data to the first data processing unit; and
    in response to the conflict flag invalid signal, the address selector gates the vector address of the next group of data.
  10. The addressing method for a processor according to claim 9, wherein the vector address having no storage block conflict comprises:
    the vector addresses being evenly divided into m groups or 2×m groups, with each group of addresses corresponding to one storage unit of one storage block of the memory;
    wherein the width of the storage block is m bytes.
  11. The addressing method for a processor according to claim 5, wherein
    the processor core accessing the data at the storage address through the first addressing unit comprises:
    the first data transceiver unit receiving the data sent by the processor core and sending the data to the first data processing unit;
    the first data processing unit processing the data and sending the processed data to the first conflict processing unit; and
    the first conflict processing unit writing the data to the vector address by means of the conflict resolution mechanism.
  12. The addressing method for a processor according to claim 11, wherein the first conflict processing unit comprises a conflict judgment unit, an address mapping unit, an address strobe, a data strobe, and a write data reorganization unit;
    the conflict resolution mechanism comprises:
    the address strobe gating the vector address so that the vector address is output to the conflict judgment unit;
    the data strobe gating the data so that the data is output to the write data reorganization unit;
    when the vector address has a storage block conflict, the conflict judgment unit generating a conflict flag valid signal;
    in response to the conflict flag valid signal, the address strobe continuing to gate the vector address and the data strobe continuing to gate the data;
    the write data reorganization unit reorganizing the data, the address mapping unit mapping the vector address to a physical address of the memory, and the write data reorganization unit writing the reorganized data into the memory;
    the conflict judgment unit generating a conflict flag invalid signal; and
    in response to the conflict flag invalid signal, the data strobe gating the next group of data and the address selector gating the vector address of the next group of data.
  13. The addressing method for a processor according to claim 12, wherein
    the write data reorganization unit reorganizing the data comprises:
    determining the first storage unit of each storage block that corresponds to the vector address, and arranging the data corresponding to the first storage units into one row in ascending order of address;
    determining the second storage unit of each storage block that corresponds to the vector address, and arranging the data corresponding to the second storage units into one row in ascending order of address;
    and so on until the n-th storage unit of each storage block that corresponds to the vector address is determined and the data corresponding to the n-th storage units is arranged into one row in ascending order of address, to obtain n rows of data in total;
    the address mapping unit mapping the vector address to the physical address of the memory comprises:
    the address mapping unit gating, in sequence, the n groups of storage units corresponding to the n rows of data;
    the write data reorganization unit writing the reorganized data into the memory comprises:
    writing the n rows of data into the n groups of storage units one row at a time in the order in which the n groups of storage units are gated.
  14. The addressing method for a processor according to claim 12, wherein
    when the vector address has no storage block conflict, the conflict judgment unit generates a conflict flag invalid signal;
    the address mapping unit maps the vector address to the physical address of the memory;
    the write data reorganization unit writes the data to the physical address; and
    in response to the conflict flag invalid signal, the address selector gates the vector address of the next group of data.
  15. The addressing method for a processor according to claim 6, wherein the first data processing unit processing the data comprises:
    for the data sent by the first conflict processing unit, the first data processing unit splicing together some of its bytes.
  16. The addressing method for a processor according to claim 15, wherein the first data processing unit splicing together some of the bytes comprises:
    for the data sent by the first conflict processing unit, when k bytes are to be read out of every m bytes, selecting the k bytes from every m bytes to obtain N×k bytes, where k ≤ log₂ m;
    combining every m bytes of the N×k bytes to obtain m×k blocks of data, each block being m bytes wide;
    wherein N is the number of storage blocks of the memory, and the width of the storage block is m bytes.
  17. The addressing method for a processor according to claim 11, wherein the first data processing unit processing the data comprises:
    the first data processing unit splitting the data.
  18. The addressing method for a processor according to claim 17, wherein the first data processing unit splitting the data comprises:
    when the data comprises m×k blocks, each block being m bytes wide, splitting the m bytes of each block to obtain N×k bytes, such that every k bytes correspond to k addresses of one storage block;
    wherein N is the number of storage blocks of the memory, the width of the storage block is m bytes, and k ≤ log₂ m.
  19. The addressing method for a processor according to claim 3, wherein
    the addressing module obtaining the base address and the offset address of the data in the memory comprises:
    the first addressing unit obtaining the base addresses and the offset addresses of multiple groups of the data based on multiple modes.
  20. The addressing method for a processor according to claim 19, wherein the multiple modes comprise at least an offset address update mode and a base address update mode.
  21. The addressing method for a processor according to claim 20, wherein the group of addressing units further comprises at least a second addressing unit;
    the offset address update mode comprises:
    the first address calculation unit obtaining the base address sent by the processor core;
    the second addressing unit reading, in sequence, the offset address of each group of the data in the memory; and
    the first address calculation unit obtaining the offset address read by the second addressing unit.
  22. The addressing method for a processor according to claim 21, wherein
    the addressing module obtaining the storage address of the data in the memory according to the base address and the offset address comprises:
    the first address calculation unit adding, in sequence, the base address sent by the processor core to the offset address of each group of the data to obtain the vector address of each group of the data.
  23. The addressing method for a processor according to claim 20, wherein the group of addressing units further comprises at least a second addressing unit;
    the base address update mode comprises:
    the first address calculation unit obtaining, in sequence, the base address update value of each group of the data sent by the processor core;
    the second addressing unit cyclically reading the same offset address in the memory for each group of the data; and
    the first address calculation unit adding, in sequence, the base address update value of each group of the data to the base address of the previous group of the data to obtain the base address of each group of the data.
  24. The addressing method for a processor according to claim 23, wherein
    the addressing module obtaining the storage address of the data in the memory according to the base address and the offset address comprises:
    the first address calculation unit adding, in sequence, the base address of each group of the data to the same offset address to obtain the vector address of each group of the data.
  25. The addressing method for a processor according to claim 21 or 23, wherein the addressing method further comprises:
    the second addressing unit obtaining the offset address sent by the processor core and writing the offset address into the memory.
  26. The addressing method for a processor according to claim 25, wherein the second addressing unit comprises a second conflict processing unit, a second data processing unit, and a second data transceiver unit;
    the second addressing unit reading the offset address of the data in the memory comprises:
    the second conflict processing unit reading the offset address from the memory by means of the conflict resolution mechanism and sending the offset address to the second data processing unit;
    the second data processing unit processing the offset address and sending the processed offset address to the second data transceiver unit; and
    the second data transceiver unit sending the processed offset address to the first addressing unit.
  27. 如权利要求3所述的处理器的寻址方法,其特征在于,The addressing method of the processor according to claim 3, wherein:
    所述寻址模块获取数据在所述存储器的基地址以及偏移地址,包括:The acquisition of the base address and offset address of data in the memory by the addressing module includes:
    所述第一地址计算单元获取所述处理器核发送的基地址;The first address calculation unit obtains the base address sent by the processor core;
    所述第一地址计算单元获取所述处理器核发送的偏移地址;The first address calculation unit obtains the offset address sent by the processor core;
    所述寻址模块根据所述基地址和所述偏移地址得到所述数据在所述存储器的存储地址,包括:The addressing module obtains the storage address of the data in the memory according to the base address and the offset address, including:
    所述第一地址计算单元将所述基地址与所述偏移地址相加,得到所述向量地址。The first address calculation unit adds the base address and the offset address to obtain the vector address.
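The addition recited in claim 27 can be illustrated with a minimal sketch. It assumes, for illustration only, that the offset address is a list of per-lane offsets and that the vector address is the list of per-lane sums; the function name `vector_address` and the example values are hypothetical and do not come from the claims.

```python
def vector_address(base, offsets):
    """Add one base address to every per-lane offset (assumed model)."""
    return [base + off for off in offsets]

# Hypothetical example: a base of 0x1000 and four word offsets.
addrs = vector_address(0x1000, [0, 4, 8, 12])
```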
  28. 如权利要求2所述的处理器的寻址方法,其特征在于,所述寻址模块包括:多组寻址单元;所述寻址方法由所述多组寻址单元并行执行。The addressing method of the processor according to claim 2, wherein the addressing module comprises: multiple groups of addressing units; and the addressing method is executed in parallel by the multiple groups of addressing units.
  29. 如权利要求2所述的处理器的寻址方法,其特征在于,所述一组寻址单元通过乒乓寻址方式获取所述基地址或所述偏移地址。The addressing method of the processor according to claim 2, wherein the group of addressing units obtains the base address or the offset address in a ping-pong addressing manner.
  30. 如权利要求29所述的处理器的寻址方法,其特征在于,所述一组寻址单元至少包括:第三寻址单元、第四寻址单元和第五寻址单元;The addressing method for a processor according to claim 29, wherein the group of addressing units at least includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit;
    所述一组寻址单元通过乒乓寻址方式获取所述偏移地址,包括:The group of addressing units obtains the offset address in a ping-pong addressing manner, including:
    所述处理器核通过所述第四寻址单元和所述第五寻址单元将所述偏移地址交替写入所述存储器;The processor core alternately writes the offset address into the memory through the fourth addressing unit and the fifth addressing unit;
    所述第三寻址单元获取所述处理器核发送的所述基地址,并通过所述第四寻址单元和所述第五寻址单元交替获取存储在所述存储器中的所述偏移地址。The third addressing unit obtains the base address sent by the processor core, and alternately obtains, through the fourth addressing unit and the fifth addressing unit, the offset address stored in the memory.
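The ping-pong acquisition of offset addresses in claim 30 resembles double buffering. The sketch below is a software analogy under stated assumptions, not the claimed hardware: the two list slots stand in for the fourth and fifth addressing units, and the class name `PingPongOffsets` and its methods are hypothetical.

```python
class PingPongOffsets:
    """Two alternating offset buffers (stand-ins for the fourth and fifth units)."""

    def __init__(self):
        self.buffers = [None, None]
        self.write_sel = 0  # buffer the processor core writes next
        self.read_sel = 0   # buffer the third addressing unit reads next

    def write(self, offsets):
        # The core writes a group of offsets, then switches to the other buffer.
        self.buffers[self.write_sel] = list(offsets)
        self.write_sel ^= 1

    def read(self):
        # The reader takes a group of offsets, then switches to the other buffer.
        offsets = self.buffers[self.read_sel]
        self.read_sel ^= 1
        return offsets


pp = PingPongOffsets()
pp.write([0, 1, 2])
pp.write([10, 11, 12])
first = pp.read()
pp.write([20, 21, 22])  # refills buffer 0 while buffer 1 is still unread
second = pp.read()
third = pp.read()
```

In this model the writer can refill one buffer while the other still holds unread offsets, which is the usual motivation for a ping-pong arrangement.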
  31. 如权利要求29所述的处理器的寻址方法,其特征在于,所述一组寻址单元至少包括:第六寻址单元、第七寻址单元和第八寻址单元;The addressing method for a processor according to claim 29, wherein the group of addressing units at least includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit;
    所述一组寻址单元通过乒乓寻址方式获取所述基地址,包括:The group of addressing units acquiring the base address in a ping-pong addressing manner includes:
    所述处理器核通过所述第八寻址单元将所述偏移地址写入所述存储器;The processor core writes the offset address into the memory through the eighth addressing unit;
    所述第六寻址单元和所述第七寻址单元交替获取所述处理器核发送的所述基地址,并通过所述第八寻址单元获取存储在所述存储器中的所述偏移地址。The sixth addressing unit and the seventh addressing unit alternately obtain the base address sent by the processor core, and obtain, through the eighth addressing unit, the offset address stored in the memory.
  32. 如权利要求5所述的处理器的寻址方法,其特征在于,所述第一寻址单元还包括:第一控制单元;The addressing method for a processor according to claim 5, wherein the first addressing unit further comprises: a first control unit;
    所述寻址方法还包括:The addressing method further includes:
    所述第一控制单元通过握手协议与所述处理器核通信。The first control unit communicates with the processor core through a handshake protocol.
  33. 如权利要求32所述的处理器的寻址方法,其特征在于,所述第一控制单元控制所述第一地址计算单元、第一冲突处理单元、第一数据处理单元和第一数据收发单元以流水线的方式工作。The addressing method of the processor according to claim 32, wherein the first control unit controls the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit to work in a pipelined manner.
  34. 如权利要求33所述的处理器的寻址方法,其特征在于,所述处理器核通过所述第一寻址单元访问所述向量地址的所述数据,包括:The addressing method for a processor according to claim 33, wherein the processor core accessing the data of the vector address through the first addressing unit comprises:
    所述处理器核通过所述第一寻址单元从所述向量地址读取所述数据,且所述第一地址计算单元位于所述流水线的第一级、所述第一冲突处理单元位于所述流水线的第二级、所述第一数据处理单元位于所述流水线的第三级和第四级、所述第一数据收发单元位于所述流水线的第五级。The processor core reads the data from the vector address through the first addressing unit, wherein the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage of the pipeline, the first data processing unit is located in the third and fourth stages of the pipeline, and the first data transceiving unit is located in the fifth stage of the pipeline.
  35. 如权利要求34所述的处理器的寻址方法,其特征在于,所述第一控制单元包括:读写请求缓存;The addressing method of the processor according to claim 34, wherein the first control unit comprises: a read and write request cache;
    所述处理器核通过所述第一寻址单元访问所述向量地址的所述数据,包括:The access by the processor core to the data of the vector address through the first addressing unit includes:
    所述处理器核通过所述第一寻址单元将所述数据写入所述向量地址,且所述第一数据收发单元与所述读写请求缓存位于同一级、所述第一地址计算单元和所述第一数据处理单元位于所述流水线的第一级、所述第一冲突处理单元位于所述流水线的第二级。The processor core writes the data to the vector address through the first addressing unit, wherein the first data transceiving unit and the read-write request cache are located in the same stage, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline.
  36. 如权利要求32所述的处理器的寻址方法,其特征在于,所述握手协议包括:The addressing method for a processor according to claim 32, wherein the handshake protocol comprises:
    当读请求有效信号和读请求备好信号同为高时,所述第一控制单元从所述处理器核读取读请求;When the read request valid signal and the read request ready signal are both high, the first control unit reads the read request from the processor core;
    当读数据有效信号和读数据备好信号同为高时,所述第一控制单元控制所述第一数据收发单元将所述数据发送至所述处理器核。When the read data valid signal and the read data ready signal are both high, the first control unit controls the first data transceiver unit to send the data to the processor core.
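The read handshake of claim 36 follows the familiar valid/ready pattern: a transfer takes place only in a cycle where both signals are high. The cycle-by-cycle model below is a simplified sketch; the function name `handshake_transfers` and the sample waveforms are hypothetical and only illustrate the rule.

```python
def handshake_transfers(valid, ready, payload):
    """Return the payloads accepted in cycles where valid and ready are both high."""
    return [p for v, r, p in zip(valid, ready, payload) if v and r]


# Hypothetical waveforms, one entry per clock cycle.
valid = [1, 1, 0, 1]
ready = [0, 1, 1, 1]
payload = ["req_a", "req_a", None, "req_b"]
accepted = handshake_transfers(valid, ready, payload)
```

In cycle 0 the request is valid but not yet accepted, so the sender keeps presenting it until the ready signal also goes high in cycle 1.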
  37. 如权利要求32所述的处理器的寻址方法,其特征在于,所述握手协议包括:The addressing method for a processor according to claim 32, wherein the handshake protocol comprises:
    当写请求有效信号和写请求备好信号同为高时,所述第一控制单元从所述处理器核读取写请求和数据,写繁忙信号拉高;When the write request valid signal and the write request ready signal are both high, the first control unit reads the write request and data from the processor core, and the write busy signal is pulled high;
    所述第一控制单元将所述数据写入所述存储器,写繁忙信号拉低。The first control unit writes the data into the memory, and the write busy signal is pulled low.
  38. 一种处理器,其特征在于,所述处理器包括:处理器核、寻址模块和存储器;A processor, characterized in that the processor includes: a processor core, an addressing module, and a memory;
    所述寻址模块用于获取数据在所述存储器的基地址以及偏移地址,并根据所述基地址和所述偏移地址得到所述数据在所述存储器的存储地址;The addressing module is configured to obtain the base address and the offset address of the data in the memory, and obtain the storage address of the data in the memory according to the base address and the offset address;
    所述处理器核可通过所述寻址模块访问所述存储器在所述存储地址的所述数据。The processor core may access the data of the memory at the storage address through the addressing module.
  39. 如权利要求38所述的处理器,其特征在于,所述寻址模块包括:至少一组寻址单元;所述至少一组寻址单元用于执行所述寻址模块的操作。The processor of claim 38, wherein the addressing module comprises: at least one set of addressing units; the at least one set of addressing units is used to perform operations of the addressing module.
  40. 如权利要求39所述的处理器,其特征在于,所述一组寻址单元至少包括:第一寻址单元;所述存储地址为向量地址;The processor according to claim 39, wherein the set of addressing units at least comprises: a first addressing unit; and the storage address is a vector address;
    所述处理器核可通过所述第一寻址单元访问所述向量地址的所述数据;当所述向量地址存在存储块冲突时,所述第一寻址单元可利用冲突解决机制访问所述数据。The processor core may access the data of the vector address through the first addressing unit; when there is a memory block conflict in the vector address, the first addressing unit may use a conflict resolution mechanism to access the data data.
  41. 如权利要求40所述的处理器,其特征在于,The processor of claim 40, wherein:
    所述处理器核还可响应于访问所述数据的代码而生成访问所述数据的任务指令,并将所述任务指令发送至所述第一寻址单元。The processor core may also generate a task instruction for accessing the data in response to the code for accessing the data, and send the task instruction to the first addressing unit.
  42. 如权利要求41所述的处理器,其特征在于,所述第一寻址单元包括:第一地址计算单元、第一冲突处理单元、第一数据处理单元和第一数据收发单元。The processor of claim 41, wherein the first addressing unit comprises: a first address calculation unit, a first conflict processing unit, a first data processing unit, and a first data transceiving unit.
  43. 如权利要求42所述的处理器,其特征在于,The processor of claim 42, wherein:
    所述第一冲突处理单元可利用所述冲突解决机制从所述向量地址读取所述数据,并将所述数据发送给所述第一数据处理单元;The first conflict processing unit may use the conflict resolution mechanism to read the data from the vector address, and send the data to the first data processing unit;
    所述第一数据处理单元用于对所述数据进行处理,并将处理后的所述数据发送至所述第一数据收发单元;The first data processing unit is configured to process the data, and send the processed data to the first data transceiving unit;
    所述第一数据收发单元用于将处理后的所述数据发送至所述处理器核。The first data transceiver unit is configured to send the processed data to the processor core.
  44. 如权利要求43所述的处理器,其特征在于,所述第一冲突处理单元包括:冲突判断单元、地址映射单元、地址选通器和读数据重组单元;The processor of claim 43, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, and a read data recombination unit;
    在所述冲突解决机制中:In the conflict resolution mechanism:
    所述地址选通器用于选通所述向量地址,使所述向量地址输出至所述冲突判断单元;The address strobe is used to gate the vector address so that the vector address is output to the conflict judgment unit;
    所述冲突判断单元用于当所述向量地址存在存储块冲突时,产生冲突标志有效信号;The conflict judgment unit is configured to generate a conflict flag valid signal when there is a storage block conflict in the vector address;
    响应于所述冲突标志有效信号,所述地址选通器还用于保持选通所述向量地址,所述地址映射单元用于将所述向量地址映射至所述存储器的物理地址,所述读数据重组单元用于读取所述物理地址的数据,对所述数据进行重组,并将重组后的所述数据发送至所述第一数据处理单元;In response to the conflict flag valid signal, the address strobe is further configured to keep gating the vector address, the address mapping unit is configured to map the vector address to a physical address of the memory, and the read data reorganization unit is configured to read the data at the physical address, reorganize the data, and send the reorganized data to the first data processing unit;
    所述冲突判断单元还用于产生冲突标志失效信号;The conflict judgment unit is also used to generate a conflict flag failure signal;
    响应于所述冲突标志失效信号,所述地址选择器还用于选通下一组数据的向量地址。In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
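The conflict judgment of claim 44 can be sketched as a check for two addresses of the same vector falling in the same storage block. The claims do not state how addresses map to storage blocks, so the sketch assumes a word-interleaved mapping (block index = address modulo the number of blocks); `has_block_conflict` is a hypothetical name.

```python
def has_block_conflict(addrs, num_blocks):
    """True when at least two addresses map to the same storage block.

    Assumed mapping: block index = address % num_blocks.
    """
    blocks = [a % num_blocks for a in addrs]
    return len(set(blocks)) < len(blocks)


no_conflict = has_block_conflict([0, 1, 2, 3], 4)
conflict = has_block_conflict([0, 4, 1], 4)
```

Under this simple model, two identical addresses in one vector are also reported as a conflict; real hardware might treat that case differently.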
  45. 如权利要求44所述的处理器,其特征在于,The processor of claim 44, wherein:
    所述地址映射单元还用于分别将各个存储块的与所述向量地址对应的第一个存储单元编为一组、与所述向量地址对应的第二个存储单元编为一组、直至与所述向量地址对应的第n个存储单元编为一组,共得到n组存储单元,并依次选通n组存储单元;The address mapping unit is further configured to group the first storage units of the respective storage blocks corresponding to the vector address into one group, group the second storage units corresponding to the vector address into one group, and so on until the n-th storage units corresponding to the vector address are grouped into one group, thereby obtaining n groups of storage units in total, and to gate the n groups of storage units in sequence;
    所述读数据重组单元还用于按照所述n组存储单元的选通顺序,依次读取所述n组存储单元存储的数据,并将所述n组存储单元存储的数据按照地址由小到大的顺序重新排列,得到重组后的所述数据。The read data reorganization unit is further configured to read, in the gating order of the n groups of storage units, the data stored in the n groups of storage units in sequence, and to rearrange the data stored in the n groups of storage units in ascending order of address to obtain the reorganized data.
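The grouping and reordering of claim 45 can be sketched in software as follows. This is an illustrative model under stated assumptions: memory is a flat list, the storage block of an address is taken as address modulo the number of blocks, and the vector address is non-empty. The name `conflict_free_read` is hypothetical.

```python
from collections import defaultdict


def conflict_free_read(memory, addrs, num_blocks):
    """Read conflicting addresses in n rounds, one access per block per round."""
    per_block = defaultdict(list)
    for a in addrs:
        per_block[a % num_blocks].append(a)  # assumed block mapping

    n = max(len(v) for v in per_block.values())  # number of rounds needed
    rounds, fetched = [], {}
    for k in range(n):
        # Group k holds the k-th requested unit of every block that has one.
        group = [v[k] for v in per_block.values() if k < len(v)]
        rounds.append(group)
        for a in group:
            fetched[a] = memory[a]

    # Reorganize: return the data in ascending order of address.
    data = [fetched[a] for a in sorted(fetched)]
    return rounds, data


memory = list(range(100, 116))
rounds, data = conflict_free_read(memory, [0, 4, 1, 8], 4)
```

Here addresses 0, 4 and 8 all fall in block 0, so three rounds are needed, while address 1 is served in the first round together with address 0.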
  46. 如权利要求44所述的处理器,其特征在于,The processor of claim 44, wherein:
    所述冲突判断单元还用于当所述向量地址不存在存储块冲突时,产生冲突标志失效信号;The conflict judgment unit is further configured to generate a conflict flag failure signal when there is no storage block conflict in the vector address;
    所述地址映射单元还用于将所述向量地址映射至所述存储器的物理地址;The address mapping unit is further configured to map the vector address to the physical address of the memory;
    所述读数据重组单元还用于读取所述物理地址的数据,并将读取的所述数据发送至所述第一数据处理单元;The read data recombination unit is further configured to read the data of the physical address, and send the read data to the first data processing unit;
    响应于所述冲突标志失效信号,所述地址选择器还用于选通下一组数据的向量地址。In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
  47. 如权利要求46所述的处理器,其特征在于,所述向量地址不存在存储块冲突包括:The processor according to claim 46, wherein the absence of a storage block conflict in the vector address comprises:
    将向量地址平均分为m组或2×m组,且每组地址均对应于所述存储器的一个所述存储块的一个存储单元;Dividing the vector addresses into m groups or 2×m groups evenly, and each group address corresponds to one storage unit of one storage block of the memory;
    其中,所述存储块的位宽为m个字节。Wherein, the bit width of the storage block is m bytes.
  48. 如权利要求42所述的处理器,其特征在于,The processor of claim 42, wherein:
    所述第一数据收发单元用于接收所述处理器核发送的所述数据,并将所述数据发送至所述第一数据处理单元;The first data transceiving unit is configured to receive the data sent by the processor core, and send the data to the first data processing unit;
    所述第一数据处理单元用于对所述数据进行处理,并将处理后的所述数据发送至所述第一冲突处理单元;The first data processing unit is configured to process the data, and send the processed data to the first conflict processing unit;
    所述第一冲突处理单元可利用所述冲突解决机制将所述数据写入所述向量地址。The first conflict processing unit may use the conflict resolution mechanism to write the data into the vector address.
  49. 如权利要求48所述的处理器,其特征在于,所述第一冲突处理单元包括:冲突判断单元、地址映射单元、地址选通器、数据选通器和写数据重组单元;The processor of claim 48, wherein the first conflict processing unit comprises: a conflict judgment unit, an address mapping unit, an address strobe, a data strobe, and a write data recombination unit;
    在所述冲突解决机制中:In the conflict resolution mechanism:
    所述地址选通器用于选通所述向量地址,使所述向量地址输出至所述冲突判断单元;The address strobe is used to gate the vector address so that the vector address is output to the conflict judgment unit;
    所述数据选通器还用于选通所述数据,使所述数据输出至所述写数据重组单元;The data strobe is also used to gate the data so that the data is output to the write data recombination unit;
    所述冲突判断单元用于当所述向量地址存在存储块冲突时,产生冲突标志有效信号;The conflict judgment unit is configured to generate a conflict flag valid signal when there is a storage block conflict in the vector address;
    响应于所述冲突标志有效信号,所述地址选通器还用于保持选通所述向量地址,所述数据选通器还用于保持选通所述数据;In response to the conflict flag valid signal, the address strobe is also used to keep gating the vector address, and the data strobe is also used to keep gating the data;
    所述写数据重组单元用于对所述数据进行重组,所述地址映射单元用于将所述向量地址映射至所述存储器的物理地址,所述写数据重组单元将还用于重组后的数据写入所述存储器;The write data reorganization unit is configured to reorganize the data, the address mapping unit is configured to map the vector address to a physical address of the memory, and the write data reorganization unit is further configured to write the reorganized data into the memory;
    所述冲突判断单元还用于产生冲突标志失效信号;The conflict judgment unit is also used to generate a conflict flag failure signal;
    响应于所述冲突标志失效信号,所述数据选通器还用于选通下一组数据,所述地址选择器还用于选通所述下一组数据的向量地址。In response to the conflict flag failure signal, the data strobe is also used to strobe the next set of data, and the address selector is also used to strobe the vector address of the next set of data.
  50. 如权利要求49所述的处理器,其特征在于,The processor of claim 49, wherein:
    所述写数据重组单元还用于:The write data reorganization unit is also used for:
    确定各个存储块的与所述向量地址对应的第一个存储单元,按照地址由小到大的顺序将对应于第一个存储单元的数据编为一行;Determine the first storage unit of each storage block corresponding to the vector address, and group the data corresponding to the first storage unit into a row according to the order of address ascending;
    确定各个存储块的与所述向量地址对应的第二个存储单元,按照地址由小到大的顺序将对应于第二个存储单元的数据编为一行;Determine the second storage unit corresponding to the vector address of each storage block, and compile the data corresponding to the second storage unit into one row according to the order of address from small to large;
    直至确定各个存储块的与所述向量地址对应的第n个存储单元,按照地址由小到大的顺序将对应于第n个存储单元的数据编为一行,共得到n行数据;Until the nth storage unit corresponding to the vector address of each storage block is determined, the data corresponding to the nth storage unit is compiled into one row according to the address ascending order, and a total of n rows of data are obtained;
    所述地址映射单元还用于依次选通所述n行数据对应的n组存储单元;The address mapping unit is further configured to sequentially select n groups of storage units corresponding to the n rows of data;
    所述写数据重组单元还用于按照所述n组存储单元的选通顺序,依次将所述n行数据写入n组存储单元。The write data reorganization unit is further configured to sequentially write the n rows of data into the n groups of storage units according to the gating sequence of the n groups of storage units.
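The row-wise write reorganization of claim 50 mirrors the read case. The sketch below makes the same assumptions (flat list memory, block index = address modulo the number of blocks, non-empty vector address); `conflict_free_write` is a hypothetical name.

```python
from collections import defaultdict


def conflict_free_write(memory, addrs, values, num_blocks):
    """Write conflicting addresses as n rows, one access per block per row."""
    per_block = defaultdict(list)
    for a, v in zip(addrs, values):
        per_block[a % num_blocks].append((a, v))  # assumed block mapping

    n = max(len(items) for items in per_block.values())
    rows = []
    for k in range(n):
        # Row k: the k-th item of every block, in ascending order of address.
        row = sorted(items[k] for items in per_block.values() if k < len(items))
        rows.append(row)
        for a, v in row:
            memory[a] = v  # each row touches every block at most once
    return rows


memory = [0] * 12
rows = conflict_free_write(memory, [5, 0, 4, 1], [50, 10, 40, 11], 4)
```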
  51. 如权利要求49所述的处理器,其特征在于,The processor of claim 49, wherein:
    所述冲突判断单元还用于当所述向量地址不存在存储块冲突时,产生冲突标志失效信号;The conflict judgment unit is further configured to generate a conflict flag failure signal when there is no storage block conflict in the vector address;
    所述地址映射单元还用于将所述向量地址映射至所述存储器的物理地址;The address mapping unit is further configured to map the vector address to the physical address of the memory;
    所述写数据重组单元还用于将所述数据写入所述物理地址;The write data reorganization unit is also used to write the data into the physical address;
    响应于所述冲突标志失效信号,所述地址选择器还用于选通下一组数据的向量地址。In response to the conflict flag failure signal, the address selector is also used to select the vector address of the next set of data.
  52. 如权利要求43所述的处理器,其特征在于,对于所述第一冲突处理单元发送的数据,所述第一数据处理单元还用于对其中的部分字节进行拼接。The processor of claim 43, wherein for the data sent by the first conflict processing unit, the first data processing unit is further configured to splice some of the bytes therein.
  53. 如权利要求52所述的处理器,其特征在于,所述第一数据处理单元还用于:The processor of claim 52, wherein the first data processing unit is further configured to:
    对于所述第一冲突处理单元发送的数据,当需要从其每m个字节中选择k个字节读取,则从每m个字节中选择所述k个字节,得到N×k个字节;其中k≤log₂m;For the data sent by the first conflict processing unit, when k bytes need to be selected and read from every m bytes thereof, the k bytes are selected from every m bytes to obtain N×k bytes, where k ≤ log₂ m;
    将所述N×k个字节的每m个字节组合在一起,得到m×k块、每块宽度为m个字节的数据;Combine each m bytes of the N×k bytes together to obtain m×k blocks of data with a width of m bytes each;
    其中,N为所述存储器的存储块数量;所述存储块的位宽为m个字节。Wherein, N is the number of storage blocks of the memory; the bit width of the storage block is m bytes.
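The byte splicing of claim 53 can be sketched as a select-then-repack step. The claim does not say which k bytes of each m-byte word are chosen, so the sketch assumes the first k bytes; it also assumes the number of selected bytes is a multiple of m. The function name `splice_bytes` is hypothetical.

```python
def splice_bytes(data, m, k):
    """Keep k bytes out of every m, then repack the result into m-byte blocks.

    Assumption: the first k bytes of each m-byte word are the ones selected.
    """
    selected = b"".join(data[i:i + k] for i in range(0, len(data), m))
    return [selected[i:i + m] for i in range(0, len(selected), m)]


# Hypothetical example: N = 4 storage blocks of m = 4 bytes, k = 2 (k <= log2(m)).
blocks = splice_bytes(bytes(range(16)), m=4, k=2)
```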
  54. 如权利要求48所述的处理器,其特征在于,所述第一数据处理单元还用于对所述数据进行拆分。The processor of claim 48, wherein the first data processing unit is further configured to split the data.
  55. 如权利要求54所述的处理器,其特征在于,所述第一数据处理单元还用于:The processor of claim 54, wherein the first data processing unit is further configured to:
    当所述数据包括m×k块且每块宽度为m个字节时,对每块的m个字节进行拆分,得到N×k个字节,使每k个字节分别对应一个存储块的k个地址;When the data includes m×k blocks and the width of each block is m bytes, the m bytes of each block are split to obtain N×k bytes, so that each k bytes corresponds to one storage K addresses of the block;
    其中,N为所述存储器的存储块数量;所述存储块的位宽为m个字节;k≤log₂m。Wherein, N is the number of storage blocks of the memory; the bit width of the storage block is m bytes; k ≤ log₂ m.
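The splitting of claim 55 is the inverse of splicing: m-byte blocks are cut back into k-byte pieces, one piece per storage block. The sketch assumes that the pieces are assigned to storage blocks in order; `split_bytes` is a hypothetical name.

```python
def split_bytes(blocks, k):
    """Cut m-byte blocks into consecutive k-byte pieces, one per storage block."""
    flat = b"".join(blocks)
    return [flat[i:i + k] for i in range(0, len(flat), k)]


# Hypothetical example: two 4-byte blocks split into four 2-byte pieces.
pieces = split_bytes([bytes([0, 1, 4, 5]), bytes([8, 9, 12, 13])], k=2)
```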
  56. 如权利要求40所述的处理器,其特征在于,所述第一寻址单元包括:基地址选择器和偏移地址选择器;The processor of claim 40, wherein the first addressing unit comprises: a base address selector and an offset address selector;
    所述基地址更新单元和所述偏移地址选择器可基于多种模式获取多组所述数据的所述基地址以及所述偏移地址。The base address update unit and the offset address selector may obtain the base address and the offset address of multiple sets of the data based on multiple modes.
  57. 如权利要求56所述的处理器,其特征在于,所述多种模式至少包括:偏移地址更新模式、基地址更新模式。The processor of claim 56, wherein the multiple modes include at least: an offset address update mode and a base address update mode.
  58. 如权利要求57所述的处理器,其特征在于,所述一组寻址单元至少还包括:第二寻址单元;The processor of claim 57, wherein the set of addressing units at least further comprises: a second addressing unit;
    在所述偏移地址更新模式中:In the offset address update mode:
    所述基地址选择器用于选择获取所述处理器核发送的基地址;The base address selector is used to select and obtain the base address sent by the processor core;
    所述第二寻址单元用于依次读取每组所述数据在所述存储器的所述偏移地址;The second addressing unit is used to sequentially read the offset address of each group of the data in the memory;
    所述偏移地址选择器用于选择所述第二寻址单元读取的所述偏移地址。The offset address selector is used to select the offset address read by the second addressing unit.
  59. 如权利要求58所述的处理器,其特征在于,The processor of claim 58, wherein:
    所述第一地址计算单元还包括:加法器;The first address calculation unit further includes: an adder;
    所述加法器用于依次将所述处理器核发送的所述基地址与每组所述数据的所述偏移地址相加,得到所述向量地址。The adder is configured to sequentially add the base address sent by the processor core and the offset address of each group of data to obtain the vector address.
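In the offset address update mode of claims 58 and 59, the base address stays fixed while a new group of offsets is read for each group of data. A minimal sketch, assuming each group of offsets is a list of per-lane values; `offset_update_mode` is a hypothetical name.

```python
def offset_update_mode(base, offset_groups):
    """One fixed base address; each group of data brings its own offsets."""
    return [[base + off for off in offsets] for offsets in offset_groups]


vector_addrs = offset_update_mode(100, [[0, 1], [8, 9]])
```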
  60. 如权利要求57所述的处理器的寻址方法,其特征在于,所述一组寻址单元至少还包括:第二寻址单元;The addressing method for a processor according to claim 57, wherein the group of addressing units at least further comprises: a second addressing unit;
    在所述基地址更新模式中:In the base address update mode:
    所述第一地址计算单元用于依次获取所述处理器核发送的每组所述数据的基地址更新值;The first address calculation unit is configured to sequentially obtain the base address update value of each group of the data sent by the processor core;
    所述第二寻址单元用于循环读取每组所述数据在所述存储器的同一所述偏移地址;The second addressing unit is used to read the same offset address of each group of data in the memory in a loop;
    所述偏移地址选择器用于选择所述第二寻址单元读取的同一所述偏移地址;The offset address selector is used to select the same offset address read by the second addressing unit;
    所述第一地址计算单元还用于依次将每组所述数据的所述基地址更新值累加至前一组所述数据的基地址,得到每组所述数据的基地址。The first address calculation unit is further configured to sequentially accumulate the updated value of the base address of each group of data to the base address of the previous group of data to obtain the base address of each group of data.
  61. 如权利要求60所述的处理器的寻址方法,其特征在于,所述第一地址计算单元还包括:加法器;The addressing method of the processor according to claim 60, wherein the first address calculation unit further comprises: an adder;
    所述加法器用于依次将每组所述数据的基地址与同一所述偏移地址相加,得到每组所述数据的所述向量地址。The adder is used to sequentially add the base address of each group of data to the same offset address to obtain the vector address of each group of data.
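In the base address update mode of claims 60 and 61, the same offsets are reused while the base address advances by an update value for each group of data. The sketch assumes that the first update value is applied to an initial base address, which the claims do not state explicitly; `base_update_mode` is a hypothetical name.

```python
def base_update_mode(initial_base, base_updates, offsets):
    """Accumulate each update onto the previous base, then add the shared offsets."""
    vector_addrs = []
    base = initial_base
    for update in base_updates:
        base += update  # base of this group = previous base + update value
        vector_addrs.append([base + off for off in offsets])
    return vector_addrs


vector_addrs = base_update_mode(100, [0, 16, 16], [0, 1, 2])
```

This pattern suits strided accesses, where every group of data has the same internal layout and only its starting address changes.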
  62. 如权利要求58或60所述的处理器,其特征在于,The processor of claim 58 or 60, wherein:
    所述第二寻址单元还用于获取所述处理器核发送的所述偏移地址,并将所述偏移地址写入所述存储器。The second addressing unit is further configured to obtain the offset address sent by the processor core, and write the offset address into the memory.
  63. 如权利要求62所述的处理器,其特征在于,所述第二寻址单元包括:第二冲突处理单元、第二数据处理单元和第二数据收发单元;The processor of claim 62, wherein the second addressing unit comprises: a second conflict processing unit, a second data processing unit, and a second data transceiving unit;
    所述第二冲突处理单元可利用所述冲突解决机制从所述存储器读取所述偏移地址,并将所述偏移地址发送给所述第二数据处理单元;The second conflict processing unit may use the conflict resolution mechanism to read the offset address from the memory, and send the offset address to the second data processing unit;
    所述第二数据处理单元用于对所述偏移地址进行处理,并将处理后的所述偏移地址发送至所述第二数据收发单元;The second data processing unit is configured to process the offset address, and send the processed offset address to the second data transceiving unit;
    所述第二数据收发单元用于将处理后的所述偏移地址发送至所述第一寻址单元。The second data transceiving unit is configured to send the processed offset address to the first addressing unit.
  64. 如权利要求40所述的处理器,其特征在于,所述第一寻址单元包括:基地址选择器、偏移地址选择器和加法器;The processor of claim 40, wherein the first addressing unit comprises: a base address selector, an offset address selector, and an adder;
    所述基地址选择器用于选择所述处理器核发送的所述基地址;The base address selector is used to select the base address sent by the processor core;
    所述偏移地址选择器用于选择所述处理器核发送的所述偏移地址;The offset address selector is used to select the offset address sent by the processor core;
    所述加法器用于将所述基地址与所述偏移地址相加,得到所述向量地址。The adder is used to add the base address and the offset address to obtain the vector address.
  65. 如权利要求39所述的处理器,其特征在于,所述寻址模块包括:多组寻址单元;所述多组寻址单元用于并行执行所述寻址模块的操作。The processor of claim 39, wherein the addressing module comprises: multiple groups of addressing units; and the multiple groups of addressing units are used to execute operations of the addressing module in parallel.
  66. 如权利要求39所述的处理器,其特征在于,所述一组寻址单元可通过乒乓寻址方式获取所述基地址或所述偏移地址。The processor of claim 39, wherein the group of addressing units can obtain the base address or the offset address in a ping-pong addressing manner.
  67. 如权利要求66所述的处理器,其特征在于,所述一组寻址单元至少包括:第三寻址单元、第四寻址单元和第五寻址单元;The processor of claim 66, wherein the set of addressing units at least includes: a third addressing unit, a fourth addressing unit, and a fifth addressing unit;
    所述处理器核还用于通过所述第四寻址单元和所述第五寻址单元将所述偏移地址交替写入所述存储器;The processor core is further configured to alternately write the offset address into the memory through the fourth addressing unit and the fifth addressing unit;
    所述第三寻址单元用于获取所述处理器核发送的所述基地址,并通过所述第四寻址单元和所述第五寻址单元交替获取存储在所述存储器中的所述偏移地址。The third addressing unit is configured to obtain the base address sent by the processor core, and to alternately obtain, through the fourth addressing unit and the fifth addressing unit, the offset address stored in the memory.
  68. 如权利要求66所述的处理器,其特征在于,所述一组寻址单元至少包括:第六寻址单元、第七寻址单元和第八寻址单元;The processor of claim 66, wherein the set of addressing units at least includes: a sixth addressing unit, a seventh addressing unit, and an eighth addressing unit;
    所述处理器核还用于通过所述第八寻址单元将所述偏移地址写入所述存储器;The processor core is further configured to write the offset address into the memory through the eighth addressing unit;
    所述第六寻址单元和所述第七寻址单元用于交替获取所述处理器核发送的所述基地址,并通过所述第八寻址单元获取存储在所述存储器中的所述偏移地址。The sixth addressing unit and the seventh addressing unit are configured to alternately obtain the base address sent by the processor core, and to obtain, through the eighth addressing unit, the offset address stored in the memory.
  69. 如权利要求42所述的处理器,其特征在于,所述第一寻址单元还包括:第一控制单元;The processor of claim 42, wherein the first addressing unit further comprises: a first control unit;
    所述第一控制单元可通过握手协议与所述处理器核通信。The first control unit may communicate with the processor core through a handshake protocol.
  70. 如权利要求69所述的处理器,其特征在于,所述第一控制单元用于使所述第一地址计算单元、第一冲突处理单元、第一数据处理单元和第一数据收发单元以流水线的方式工作。The processor of claim 69, wherein the first control unit is configured to make the first address calculation unit, the first conflict processing unit, the first data processing unit, and the first data transceiving unit work in a pipelined manner.
  71. 如权利要求70所述的处理器,其特征在于,The processor of claim 70, wherein:
    所述处理器核还可通过所述第一寻址单元从所述向量地址读取所述数据,且所述第一地址计算单元位于所述流水线的第一级、所述第一冲突处理单元位于所述流水线的第二级、所述第一数据处理单元位于所述流水线的第三级和第四级、所述第一数据收发单元位于所述流水线的第五级。The processor core may also read the data from the vector address through the first addressing unit, wherein the first address calculation unit is located in the first stage of the pipeline, the first conflict processing unit is located in the second stage of the pipeline, the first data processing unit is located in the third and fourth stages of the pipeline, and the first data transceiving unit is located in the fifth stage of the pipeline.
  72. 如权利要求71所述的处理器,其特征在于,所述第一控制单元包括:读写请求缓存;The processor of claim 71, wherein the first control unit comprises: a read and write request cache;
    所述处理器核还可通过所述第一寻址单元将所述数据写入所述向量地址,且所述第一数据收发单元与所述读写请求缓存位于同一级、所述第一地址计算单元和所述第一数据处理单元位于所述流水线的第一级、所述第一冲突处理单元位于所述流水线的第二级。The processor core may also write the data to the vector address through the first addressing unit, wherein the first data transceiving unit and the read-write request cache are located in the same stage, the first address calculation unit and the first data processing unit are located in the first stage of the pipeline, and the first conflict processing unit is located in the second stage of the pipeline.
  73. 如权利要求69所述的处理器,其特征在于,在握手协议中:The processor of claim 69, wherein, in the handshake protocol:
    当读请求有效信号和读请求备好信号同为高时,所述第一控制单元还用于从所述处理器核读取读请求;When the read request valid signal and the read request ready signal are both high, the first control unit is further configured to read the read request from the processor core;
    当读数据有效信号和读数据备好信号同为高时,所述第一控制单元还用于控制所述第一数据收发单元将所述数据发送至所述处理器核。When the read data valid signal and the read data ready signal are both high, the first control unit is further configured to control the first data transceiver unit to send the data to the processor core.
  74. 如权利要求69所述的处理器,其特征在于,在握手协议中:The processor of claim 69, wherein, in the handshake protocol:
    当写请求有效信号和写请求备好信号同为高时,所述第一控制单元还用于从所述处理器核读取写请求和数据,写繁忙信号拉高;When the write request valid signal and the write request ready signal are both high, the first control unit is also used to read the write request and data from the processor core, and the write busy signal is pulled high;
    所述第一控制单元还用于将所述数据写入所述存储器,写繁忙信号拉低。The first control unit is also used to write the data into the memory, and the write busy signal is pulled low.
  75. 一种计算机可读存储介质,其特征在于,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至37中任一项权利要求所述的寻址方法。A computer-readable storage medium, characterized by comprising instructions, which when run on a computer, causes the computer to execute the addressing method according to any one of claims 1 to 37.
  76. 一种可移动平台,其特征在于,所述可移动平台包括:机身;所述机身包括:至少一个电路;所述电路包括:至少一个如权利要求38至74任一项所述的处理器。A movable platform, characterized in that the movable platform comprises: a body; the body comprises: at least one circuit; and the circuit comprises: at least one processor according to any one of claims 38 to 74.
  77. 一种电子设备,其特征在于,所述电子设备包括:壳体;所述壳体内设有:至少一个电路;所述电路包括:至少一个如权利要求38至74任一项所述的处理器。An electronic device, characterized in that the electronic device comprises: a housing; the housing is provided with: at least one circuit; and the circuit comprises: at least one processor according to any one of claims 38 to 74.
  78. 一种包括指令的计算机程序产品,其特征在于,当所述指令在计算机上运行时,使得计算机执行如权利要求1至37中任一项权利要求所述的寻址方法。A computer program product comprising instructions, characterized in that, when the instructions are run on a computer, the computer executes the addressing method according to any one of claims 1 to 37.
PCT/CN2020/086985 2020-04-26 2020-04-26 Addressing method for processor, processor, movable platform, and electronic device WO2021217293A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080004613.6A CN112639747A (en) 2020-04-26 2020-04-26 Addressing method of processor, movable platform and electronic equipment
PCT/CN2020/086985 WO2021217293A1 (en) 2020-04-26 2020-04-26 Addressing method for processor, processor, movable platform, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/086985 WO2021217293A1 (en) 2020-04-26 2020-04-26 Addressing method for processor, processor, movable platform, and electronic device

Publications (1)

Publication Number Publication Date
WO2021217293A1 true WO2021217293A1 (en) 2021-11-04

Family

ID=75291167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086985 WO2021217293A1 (en) 2020-04-26 2020-04-26 Addressing method for processor, processor, movable platform, and electronic device

Country Status (2)

Country Link
CN (1) CN112639747A (en)
WO (1) WO2021217293A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672499A (en) * 2021-07-20 2021-11-19 平头哥(杭州)半导体有限公司 Method and system for tracking target variable in executable program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810111A (en) * 2012-11-08 2014-05-21 国际商业机器公司 Address Generation In An Active Memory Device
CN104919417A (en) * 2012-12-29 2015-09-16 英特尔公司 Apparatus and method for tracking TLB flushes on a per thread basis
US20160026572A1 (en) * 2014-07-22 2016-01-28 International Business Machines Corporation Cache line crossing load techniques
CN107608913A (en) * 2017-09-25 2018-01-19 郑州云海信息技术有限公司 A kind of CPU addressing methods, device and its CPU addressing equipment used
CN110457198A (en) * 2018-05-07 2019-11-15 龙芯中科技术有限公司 Debugging message output method, device and storage medium

Also Published As

Publication number Publication date
CN112639747A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN107301455B (en) Hybrid cube storage system for convolutional neural network and accelerated computing method
CN110390385B (en) BNRP-based configurable parallel general convolutional neural network accelerator
CN109284825B (en) Apparatus and method for performing LSTM operations
US20220051088A1 (en) Artificial intelligence accelerator, artificial intelligence acceleration device, artificial intelligence acceleration chip, and data processing method
US9378533B2 (en) Central processing unit, GPU simulation method thereof, and computing system including the same
TW202024922A (en) Method and apparatus for accessing tensor data
WO2021217293A1 (en) Addressing method for processor, processor, movable platform, and electronic device
CN114911596B (en) Scheduling method and device for model training, electronic equipment and storage medium
CN115687229A (en) AI training board card, server based on AI training board card, server cluster based on AI training board card and distributed training method based on AI training board card
US11941528B2 (en) Neural network training in a distributed system
WO2022179075A1 (en) Data processing method and apparatus, computer device and storage medium
WO2023071566A1 (en) Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
CN111078286B (en) Data communication method, computing system and storage medium
EP4142217A1 (en) Inter-node communication method and device based on multiple processing nodes
WO2022027172A1 (en) Data processing apparatus, method, and system, and neural network accelerator
CN114371920A (en) Network function virtualization system based on graphic processor accelerated optimization
CN117940934A (en) Data processing apparatus and method
CN115904681A (en) Task scheduling method and device and related products
WO2021179286A1 (en) Data processing method, prediction method, and calculation device for convolutional neural network, and storage medium
CN117114055B (en) FPGA binary neural network acceleration method for industrial application scene
US11983128B1 (en) Multidimensional and multiblock tensorized direct memory access descriptors
CN117493237B (en) Computing device, server, data processing method, and storage medium
US11467836B2 (en) Executing cross-core copy instructions in an accelerator to temporarily store an operand that cannot be accommodated by on-chip memory of a primary core into a secondary core
CN116166605B (en) Data hybrid transmission method, device, DMA controller, medium and system
CN115129233B (en) Data processing device, method and related product

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20932864; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 20932864; Country of ref document: EP; Kind code of ref document: A1)