WO2023208087A1 - Data processing method and apparatus, and related device - Google Patents

Data processing method and apparatus, and related device

Info

Publication number
WO2023208087A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
accelerator
address
state
Prior art date
Application number
PCT/CN2023/091041
Other languages
English (en)
Chinese (zh)
Inventor
XIONG Ying
XU Dong
LU Xiao
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023208087A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems

Definitions

  • the present application relates to the field of computers, and in particular, to a data processing method, device and related equipment.
  • accelerators: also called accelerated processing units
  • ASICs: application-specific integrated circuits
  • FPGAs: field-programmable gate arrays
  • ML: machine learning
  • accelerators have high requirements for real-time reading and writing of data in memory.
  • the current access frequency of the accelerator to data in the memory is not balanced with the available bandwidth provided by the memory to the accelerator.
  • when the accelerator needs to perform read and write operations on the data in the memory, it often has to wait an additional period of time, resulting in a decrease in the processing performance of the accelerator.
  • This application provides a data processing method, device and related equipment to solve the problem of accelerator processing performance degradation caused by imbalance between the accelerator's access frequency to data in the memory and the available bandwidth provided by the memory to the accelerator.
  • a data processing method is provided.
  • the method is applicable to an accelerator.
  • the accelerator includes a first memory.
  • the method may include the following steps: the accelerator obtains a data processing instruction, where the instruction includes the address of data in the second memory; the data read/write efficiency of the first memory is greater than that of the second memory; the accelerator reads the data from the first memory according to the instruction and processes it, where the first memory is used to store the data to be cached from the second memory, and the data to be cached includes data historically accessed by the accelerator.
  • the accelerator stores the data to be cached from the second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when the accelerator processes a data processing instruction it can interact directly with the first memory, which has higher read/write efficiency, thereby improving the accelerator's data read/write efficiency.
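As a rough illustration (not taken from the patent itself), the two-tier arrangement described above can be sketched in Python; the class and attribute names are invented for illustration, and a real accelerator would implement this in hardware:

```python
class TwoTierMemory:
    """Sketch of a small, fast 'first memory' caching entries from a
    larger, slower 'second memory' (names are illustrative)."""

    def __init__(self, capacity):
        self.first = {}            # fast memory: second-memory address -> data
        self.second = {}           # slow backing store: address -> data
        self.capacity = capacity   # cache capacity of the first memory

    def write(self, addr, data):
        self.second[addr] = data             # authoritative copy
        if addr in self.first or len(self.first) < self.capacity:
            self.first[addr] = data          # keep the hot copy coherent

    def read(self, addr):
        if addr in self.first:               # cache hit: fast path
            return self.first[addr]
        data = self.second[addr]             # cache miss: slow path
        if len(self.first) < self.capacity:
            self.first[addr] = data          # fill for future accesses
        return data
```

A read that hits the first memory never touches the slower second memory, which is the efficiency gain the passage describes.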
  • the accelerator is implemented through application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology
  • the first memory includes static random access memory (SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), etc.
  • the computing device where the accelerator is located may include a central processing unit (CPU) and an accelerator.
  • the accelerator may be a system-level chip implemented through FPGA, ASIC and other technologies. Specifically, it may be used in the computing device 100 to assist the CPU in processing special types of computing tasks.
  • the above special types of computing tasks can be graphics processing, vector calculations, machine learning, etc.
  • the accelerator can be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator may also be a CPU.
  • the computing device may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is an accelerator. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the first memory and the second memory may both be original internal memories of the accelerator; or the first memory is an original internal memory of the accelerator and the second memory is an original memory outside the accelerator, for example, a memory of the computing device where the accelerator is located.
  • the method provided by this application can be combined with specific application scenarios to select an appropriate memory inside the accelerator as the first memory. If there are multiple memories inside the accelerator, the memory with lower read/write efficiency can serve as the second memory and the one with higher read/write efficiency as the first memory. If the accelerator does not contain multiple memories, or its memories have the same read/write efficiency, then the memory inside the accelerator can be used as the first memory, and a memory outside the accelerator with lower read/write efficiency but larger storage capacity can be used as the second memory.
  • the above implementation uses the original memory hardware inside the accelerator and improves the accelerator 120's data read/write efficiency through algorithms. No additional cache hardware needs to be deployed, reducing the cost of implementing the solution; for accelerators with small hardware specifications and sizes implemented with ASIC or FPGA technology in particular, the solution is more practical to implement.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • the accelerator may store data in the second memory, and then store the data to be cached in the second memory in the first memory.
  • the accelerator can update the data to be cached to the first memory in real time, or can update the data to be cached to the first memory in a delayed manner or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator. In this way, when the accelerator accesses the data again, it can directly interact with the first memory, which has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator.
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory is the more frequently accessed data stored in the second memory. Since the first memory has higher read/write efficiency, the accelerator can interact directly with the first memory when processing frequently accessed data, thereby improving its read/write efficiency.
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
  • when the data in the first memory reaches the storage threshold, data whose access frequency is not higher than the second threshold can be deleted from the first memory, and data accessed by the accelerator from the second memory can then continue to be stored in the first memory. Deleting the entries whose access frequency is not higher than the second threshold ensures that the data kept in the first memory is the more frequently accessed data.
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
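The frequency-based eviction described above can be sketched as a small helper; the function and parameter names are ours, and the patent leaves the exact policy open:

```python
def evict_cold_entries(cache, freq, storage_threshold, second_threshold):
    """When the cache has reached its storage threshold, drop entries
    whose access frequency is not higher than the second threshold
    (a sketch of the eviction rule described above; names are illustrative).

    cache: second-memory address -> cached data
    freq:  second-memory address -> observed access count
    """
    if len(cache) < storage_threshold:
        return                                  # still room: nothing to evict
    cold = [addr for addr, f in freq.items() if f <= second_threshold]
    for addr in cold:
        cache.pop(addr, None)                   # free the slot
        freq.pop(addr, None)                    # forget its counter
```

After eviction, newly accessed data from the second memory can again be stored in the freed slots.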
  • the data to be cached may also be data recently accessed by the accelerator.
  • the latest here may refer to data accessed by the accelerator within a time range.
  • the time range here may be determined according to the storage capacity of the first memory. Specifically, when the amount of data in the first memory reaches the storage threshold, the entries in the first memory are sorted by access time, the data whose last access is furthest from the current time is deleted, and so on, ensuring that the data stored in the first memory is the most recently accessed.
  • the data recently accessed by the accelerator may be the data accessed by the accelerator within a preset time range. That is to say, if the current time is T, the preset time range may run from time T-t to time T, where the size of t can be determined according to the specific application scenario and is not specifically limited in this application.
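A recency-based cache of this kind is essentially an LRU (least-recently-used) policy; as a sketch (the class name is ours, and the patent does not mandate LRU specifically):

```python
from collections import OrderedDict

class RecencyCache:
    """Keeps only the most recently accessed entries, evicting the one
    whose last access is furthest in the past (an LRU sketch of the
    'recently accessed data' policy; names are illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()        # insertion order == recency order

    def access(self, addr, data):
        if addr in self.entries:
            self.entries.move_to_end(addr)  # refresh this entry's recency
        self.entries[addr] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recent entry
```

Each access moves the entry to the "most recent" end, so the entry at the other end is always the eviction candidate.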
  • the data to be cached can also include prefetched data.
  • the storage controller can determine the data that the accelerator may access through a prefetch algorithm, extract it from the second memory in advance, and store it in the first memory with higher read/write efficiency. In this way, when the accelerator requests to read the data, it can interact directly with the first memory, thereby improving the accelerator's read/write efficiency.
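The patent does not name a specific prefetch algorithm; a common choice is sequential (stride) prefetching, sketched below with invented names and an assumed stride policy:

```python
def prefetch_next(cache, backing, addr, stride=1, count=2):
    """After a read at `addr`, speculatively pull the next `count`
    addresses from the slow backing memory into the cache.
    (Sequential-stride policy is an assumption; names are illustrative.)"""
    for i in range(1, count + 1):
        nxt = addr + i * stride
        if nxt in backing and nxt not in cache:
            cache[nxt] = backing[nxt]       # fill ahead of the access stream
```

If the accelerator then reads the prefetched addresses, those reads hit the fast first memory instead of the second memory.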
  • the data to be cached may also include more types of data, which may be determined based on the application scenario of the accelerator, which is not specifically limited in this application.
  • the accelerator configures the first memory and obtains configuration information of the first memory.
  • the configuration information includes one or more of cache switch state, cache address range, and cache capacity, where the cache switch state is used to indicate whether the accelerator uses the first memory.
  • the cache address range is used to instruct the accelerator to store data with a storage address within the cache address range in the first memory.
  • the cache capacity is used to indicate the capacity of the first memory.
  • configuring the cache switch to on means using the first memory for data storage, and being off means not using the first memory for data storage.
  • Configuring the address range as the target address range means that the data stored in the first memory is the cache data of the data stored in the target address range of the second memory.
  • the cache depth configured as D means that the storage capacity of the first memory is D.
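The configuration fields above (switch state, address range, capacity/depth D) can be illustrated with a small structure; the field names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Sketch of the first-memory configuration information described
    above (field names are illustrative, not from the patent)."""
    enabled: bool       # cache switch state: whether the first memory is used
    addr_range: range   # second-memory addresses eligible for caching
    capacity: int       # cache depth D: storage capacity of the first memory

    def should_cache(self, addr):
        """Cache only when the switch is on and the address falls in range."""
        return self.enabled and addr in self.addr_range
```

With the switch off, every access bypasses the first memory; with it on, only addresses inside the configured range are cached.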
  • the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  • the data processing instructions may be instructions generated by the accelerator, or instructions sent to it by the CPU in the computing device where the accelerator is located.
  • the data processing instructions may be data write instructions or data read instructions, and may also include other instructions for business processing after the data is read out.
  • the accelerator can perform operations such as updating, deleting, and merging on the data, and can also perform computations such as matrix multiplication and convolution on the multiple pieces of data read. This can be determined according to the accelerator's processing business; this application does not specifically limit the specific process of subsequent data processing.
  • the write address in the data write instruction is the second storage address of the data. If the address memory already stores a correspondence between this second storage address and a first storage address, it means that a historical version of the data has been written to that first storage address of the first memory. In this case, the historical version corresponding to the first storage address can be updated, and the data is sent to the second memory to request an update of the data corresponding to the second storage address.
  • the accelerator can first determine the first storage address corresponding to the second storage address, store the data carried by the data write instruction at that first storage address of the first memory, and store the correspondence between the first storage address and the second storage address in the address memory. It should be noted that when the storage controller writes data for the first time, it can determine the first storage address based on the currently free addresses of the first memory; this application does not limit the storage controller's address allocation strategy.
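The write path just described can be sketched as follows; the free-list allocation is an assumption (the patent explicitly leaves the allocation strategy open), and all names are illustrative:

```python
def handle_write(addr_mem, first_mem, free_addrs, second_addr, data):
    """Write-path sketch: `addr_mem` maps a second-memory address to its
    first-memory address.  A repeat write updates the cached copy in
    place; a first write allocates a free first-memory slot.
    (Names and the free-list policy are illustrative assumptions.)"""
    if second_addr in addr_mem:            # a historical version is cached
        first_addr = addr_mem[second_addr]
    else:                                  # first write: allocate a slot
        first_addr = free_addrs.pop()
        addr_mem[second_addr] = first_addr # record the correspondence
    first_mem[first_addr] = data           # update the cached copy
    return first_addr                      # second memory is updated separately
```

The returned first-memory address would also be used when forwarding the update request to the second memory.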
  • the read address in the data read instruction is the second storage address of the data.
  • the storage controller can match the second storage address against the addresses in the address memory. If the address memory includes the second storage address, the corresponding first storage address can be obtained from it, and the data can then be read from the first memory.
  • the accelerator can read the data from the second memory according to the second storage address.
  • the accelerator can also determine a first storage address for the data and store the data read from the second memory into the first memory, so that the accelerator can subsequently read the data directly from the first memory with higher read/write efficiency; the mapping relationship between the first storage address and the second storage address of the data is stored in the address memory.
  • the above implementation uses the address memory to record the correspondence between first and second storage addresses, thereby realizing addressing of the data to be cached. In this way, on receiving a data processing instruction carrying the second storage address, the accelerator can read the data from the first memory according to the addresses recorded in the address memory, interacting directly with the faster first memory and thereby improving its data read/write efficiency.
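The matching read path, complementary to the write-path description above, can be sketched like this (all names are illustrative assumptions):

```python
def handle_read(addr_mem, first_mem, second_mem, free_addrs, second_addr):
    """Read-path sketch of the addressing scheme above: a hit in the
    address memory is served from the fast first memory; a miss falls
    back to the second memory and, if a slot is free, fills the cache.
    (Names and the fill policy are illustrative assumptions.)"""
    if second_addr in addr_mem:                   # hit: fast first memory
        return first_mem[addr_mem[second_addr]]
    data = second_mem[second_addr]                # miss: slow second memory
    if free_addrs:                                # fill for later reads
        first_addr = free_addrs.pop()
        first_mem[first_addr] = data
        addr_mem[second_addr] = first_addr        # record the mapping
    return data
```

A second read of the same address then resolves entirely through the address memory and the first memory.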
  • the accelerator includes a state information memory, which is used to store state information of the data; the state information includes a first state and a second state.
  • the first state is used to indicate that the data is in a modified state and is being written to the first memory or the second memory. At this time, the data cannot be read from the first memory, otherwise the wrong version of data may be read.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory at this time.
  • when the accelerator processes the data write instruction, if the state information of the data is the first state, that is, the data is currently being modified, the accelerator can first set the state information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory for data updating.
  • the data in the data processing instruction can be written to the first memory for data update, and the data can also be written into the second memory for data update.
  • when the accelerator processes the data read instruction, if the address memory does not include the read address, that is, the data does not exist in the first memory, the accelerator sets the state information of the data to the first state and reads the data from the second memory. After the data is retrieved, if the state information is still the first state, the accelerator sets it to the second state and stores the data at the first storage address of the first memory. If the state information has already changed to the second state, it means the storage controller has updated a newer version of the data, and the retrieved data can no longer be stored in the first memory. It should be understood that if a new version of the data is written into the first memory and the second memory during the read process, the state information ensures that the cached data is the latest version.
  • the address memory includes a read address and the status information is the first state, that is, the previous version of the data is being operated on, the latest version of the data can be read from the second memory. If the address memory includes a read address and the status information is in the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory.
  • the above state information can be represented by binary characters.
  • the first state is represented by the character "1" and the second state by the character "0". The first state and the second state can also be distinguished by other identifiers; this application does not specifically limit this.
  • when the state information of the data is the first state, it means the data is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data might be read. When the state information is the second state, the data is not being modified and can be read from the first memory. Recording the state of data through the state information memory ensures that the user never reads a wrong version and that the data recorded in the first memory is the latest version.
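A state-aware read along the lines just described can be sketched as follows; the state encoding ("1" = being modified, "0" = stable) follows the passage above, while the function and variable names are ours:

```python
MODIFIED, CLEAN = 1, 0   # first state "1": being modified; second state "0": stable

def safe_read(state_mem, first_mem, second_mem, addr_mem, second_addr):
    """State-aware read sketch: while an entry is in the first (modified)
    state its cached copy may be stale, so the read is served from the
    second memory instead (names are illustrative assumptions)."""
    cached = second_addr in addr_mem
    if cached and state_mem.get(second_addr, CLEAN) == CLEAN:
        return first_mem[addr_mem[second_addr]]   # stable: fast cached copy
    return second_mem[second_addr]                # modified or absent: authoritative copy
```

The state bit thus acts as a guard that keeps a concurrent writer from exposing a half-updated cached value to the reader.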
  • in a second aspect, a data processing device is provided, which includes a first memory.
  • the data processing device includes: an acquisition unit configured to acquire a data processing instruction, where the instruction includes the address of data in the second memory; the first memory is used to store the data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing device.
  • the accelerator stores the data to be cached from the second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when the accelerator processes data processing instructions it can interact directly with the more efficient first memory, thereby improving its data read/write efficiency.
  • the data processing device includes a status information memory.
  • the status information memory is used to store status information of the data.
  • the status information includes a first state and a second state, where the first state is used to indicate that the data is in a modified state and the second state is used to indicate that the data is not in a modified state.
  • the data processing device includes an address memory, and the address memory is used to store a correspondence between the address of the data in the first memory and the address of the data in the second memory.
  • the data processing instruction includes a data write instruction; the processing unit is configured to, when the status information of the data is the first state and the data processing instruction is a data write instruction, set the status information of the data to the second state, read the data from the first memory, update it according to the data processing instruction, and store the updated data in the first memory.
  • the data processing instruction includes a data read instruction; the processing unit is configured to determine, before the data processing device obtains the data, that the address in the data processing instruction is not included in the address memory; the processing unit is configured to set the status information of the data to the first state and read the data from the second memory; and the processing unit is configured to, when the status information of the data is the first state, set it to the second state and store the data at the first storage address of the first memory.
  • the device further includes a configuration unit configured to configure the first memory and obtain configuration information of the first memory.
  • the configuration information includes one or more of cache switch state, cache address range, and cache capacity.
  • the cache switch state is used to indicate whether the data processing device uses the first memory; the cache address range is used to instruct the data processing device to store data whose storage address falls within the cache address range in the first memory; the cache capacity is used to indicate the capacity of the first memory.
  • the data processing device is implemented through ASIC or FPGA technology
  • the first memory includes one or more of SRAM, register, SCM, and CAM.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • in a third aspect, an accelerator is provided, which includes a processor and a power supply circuit.
  • the power supply circuit is used to supply power to the processor.
  • the processor is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • in a fourth aspect, a computing device is provided, which includes a CPU and an accelerator.
  • the CPU is used to run instructions to implement business functions of the computing device.
  • the accelerator is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • Figure 2 is a schematic structural diagram of another computing device provided by this application.
  • Figure 3 is a schematic structural diagram of an accelerator provided by this application.
  • Figure 4 is a schematic structural diagram of a data processing device provided by this application.
  • Figure 5 is a schematic flowchart of steps in an application scenario of a data processing method provided by this application.
  • Figure 6 is a schematic flowchart of steps for processing data read instructions in a data processing method provided by this application.
  • Figure 7 is a schematic flowchart of steps for processing data write instructions in a data processing method provided by this application.
  • This application provides a computing device.
  • the accelerator in the computing device includes a first memory.
  • the accelerator stores the data to be cached from the second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when the accelerator processes data processing instructions it can interact directly with the first memory with higher read/write efficiency, thereby improving the accelerator's data read/write efficiency.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140 and a storage medium 150.
  • communication connections can be established between the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 through a bus.
  • the number of the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 may be one or more, and is not specifically limited in this application.
  • the processor 110 and the accelerator 120 may be hardware accelerators or a combination of hardware accelerators.
  • the above-mentioned hardware accelerator is an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (GAL), or any combination thereof.
  • the processor 110 is used to execute instructions in the storage medium 150 to implement the business functions of the computing device 100 .
  • the processor 110 can be a central processing unit (CPU), and the accelerator 120 (also called an accelerated processing unit (APU)) can be a system-level chip implemented through FPGA, ASIC and other technologies.
  • the accelerator 120 is a processing unit in the computing device 100 for assisting the CPU in processing special types of computing tasks.
  • the special types of computing tasks may be graphics processing, vector calculations, machine learning, etc.
  • the accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator 120 may also be a CPU.
  • the computing device 100 may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120. It should be understood that the above examples are for illustration. There are no specific limitations in this application.
  • the storage medium 150 is a carrier for storing data, such as a hard disk, a USB flash drive, flash memory, an SD card (secure digital memory card), a memory stick, etc. The hard disk can be a hard disk drive (HDD, i.e. a mechanical hard disk), a solid-state drive (SSD), etc., and is not specifically limited in this application.
  • the storage medium 150 may include a second memory, and in a specific implementation, the second memory may be DDR.
  • the communication interface 140 is a wired interface (such as an Ethernet interface), an internal interface (such as a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless LAN interface) for communicating with other servers or units.
  • Bus 160 is a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, etc.
  • the bus 160 is divided into an address bus, a data bus, a control bus, etc. For the sake of clear explanation, various buses are marked as the bus 160 in the figure.
  • the second memory 130 includes volatile memory or non-volatile memory, or both volatile and non-volatile memory.
• the volatile memory may be random access memory (RAM). By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
  • the accelerator 120 may also include a first memory 121, where the data reading and writing efficiency of the first memory 121 is greater than the data reading and writing efficiency of the second memory 130.
• the first memory 121 may include static random access memory (static RAM, SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), etc., which is not specifically limited in this application.
• the data to be cached stored in the second memory 130 can be cached in the first memory 121. In this way, when the accelerator 120 needs to access the data, it reads the data from the high-performance first memory 121, which makes up for the difference in processing speed between the accelerator 120 and the low-performance memory and improves the data read/write efficiency of the accelerator 120.
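The two-level read path described above can be sketched in a few lines of Python; `TwoLevelMemory` and the dict-based memories are illustrative stand-ins for the first memory 121 and second memory 130, not the patent's implementation:

```python
# Hypothetical sketch: check the fast first memory before falling back
# to the slower second memory, and fill the fast memory on a miss.
class TwoLevelMemory:
    def __init__(self, second_memory):
        self.first = {}               # fast memory (e.g. SRAM): addr -> data
        self.second = second_memory   # slow memory (e.g. DDR): addr -> data

    def read(self, addr):
        if addr in self.first:        # hit: serve from the fast memory
            return self.first[addr]
        data = self.second[addr]      # miss: fetch from the slow memory
        self.first[addr] = data       # cache it for subsequent accesses
        return data

mem = TwoLevelMemory({0x10: "a", 0x20: "b"})
assert mem.read(0x10) == "a"   # first access: filled from second memory
assert 0x10 in mem.first       # now cached in the first memory
assert mem.read(0x10) == "a"   # second access: served from the first memory
```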
  • the second memory 130 in Figure 1 can be the original memory inside the computing device 100
  • the first memory 121 can be the original memory inside the accelerator 120.
• This application uses the original memory hardware inside the computing device 100 and the accelerator 120 and improves the data read/write efficiency of the accelerator 120 through algorithms, without the need to deploy additional cache hardware, reducing the implementation cost of the solution and making it more practical.
  • Figure 1 is an exemplary division method of the present application.
• the second memory 130 can also be deployed inside the accelerator 120. For example, if the accelerator 120 itself includes at least two memories, one of which has a higher read/write efficiency than the other, then the memory with lower read/write efficiency inside the accelerator 120 can be used as the second memory 130, and the memory with higher read/write efficiency can be used as the first memory 121.
  • the communication between the first memory 121 and the second memory 130 may be off-chip communication.
  • the bus can be an off-chip bus.
• the off-chip bus here generally refers to the public information channel between the CPU and external devices, such as the above-mentioned PCIe bus, EISA bus, UB bus, CXL bus, CCIX bus, GenZ bus, etc., which is not specifically limited in this application.
• the communication between the first memory 121 and the second memory 130 may be on-chip communication, and the bus between the first memory 121 and the second memory 130 may be an on-chip bus, such as an advanced eXtensible interface (AXI) bus, an advanced microcontroller bus architecture (AMBA) bus, etc., which is not specifically limited in this application.
  • the data to be cached stored in the second memory 130 can be cached in the first memory 121.
• when the accelerator 120 needs to access the data in the second memory 130, it can read the data from the first memory 121, which has higher read/write efficiency, thereby making up for the difference in processing speed between the accelerator 120 and the low-performance memory, and improving the data read/write efficiency of the accelerator 120.
• the first memory 121 and the second memory 130 in Figure 2 are the original memories inside the accelerator 120. The data read/write efficiency of the accelerator 120 is improved through algorithms without deploying additional cache hardware, reducing the implementation cost of the solution and making it more practical.
• Figure 3 is a schematic structural diagram of an accelerator provided by this application; Figure 3 is an exemplary division method.
  • the accelerator 120 may include a storage controller 122, a first memory 121, a status information memory 123 and an address memory 124, where the storage controller 122, A communication connection is established between the first memory 121, the status information memory 123 and the address memory 124 through an internal bus.
• for the internal bus, reference can be made to the description of the bus 160, which will not be repeated here.
  • the accelerator 120 may also include a power supply circuit, and the power supply circuit may provide power to the memory controller 122 .
• the storage controller 122 can be implemented by a hardware logic circuit, for example, an application-specific integrated circuit (ASIC), to implement various functions of the accelerator 120.
  • the power supply circuit may be located in the same accelerator as the storage controller 122, or may be located in another accelerator other than the accelerator where the storage controller 122 is located.
  • the power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management accelerator, a power management processor, or a power management control circuit.
  • accelerator 120 is an independent accelerator.
  • the first memory 121 is used to store data
  • the status information memory 123 is used to store the status information of the data
  • the address memory 124 is used to store the address information of the data.
• the first memory 121 may be a memory with higher read/write efficiency than the second memory 130, such as SRAM; the storage space required for status information is small, but it needs to stay synchronized with the status of the data, so the status information memory 123 can be a register; the address memory 124 can be a CAM.
• the CAM is a content-addressed memory whose working mechanism is to compare an input data item with all data items stored in the CAM to determine whether the input data item matches them. Therefore, the address memory 124 used to store the address information of the data can be implemented using the CAM in the accelerator 120. In this way, when the user requests access to data, the CAM can match the address of the requested data against the address information stored in the CAM; if it matches, it means that the data has already been stored in the first memory 121. It should be understood that the above examples are for illustration, and the address memory 124 can also be implemented using other memories, which is not specifically limited in this application.
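A CAM's match-all-entries behavior, as used here for the address memory 124, might be emulated as follows (a software sketch; a real CAM compares all stored entries in parallel in hardware, and all names are illustrative):

```python
def cam_match(cam_entries, query_addr):
    """Compare the query against every stored entry, as a CAM does in parallel."""
    for second_addr, first_addr in cam_entries:
        if second_addr == query_addr:
            return first_addr  # hit: data is cached at this first-memory address
    return None                # miss: data is not in the first memory

# Entries map second-memory addresses to first-memory slots.
cam = [(0x100, 0), (0x200, 1), (0x300, 2)]
assert cam_match(cam, 0x200) == 1     # matching address found
assert cam_match(cam, 0x400) is None  # no match: data not cached
```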
• the first memory 121 in this application is implemented using the memory in the accelerator 120 and does not require additional cache deployment. The cache function of accelerators such as FPGAs and ASICs is thus realized at very low hardware cost, and the software implementation only needs the online programming function of the FPGA or ASIC. This can solve the problem of the accelerator's read/write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
• the storage controller 122 can obtain a data processing instruction, wherein the data processing instruction includes an address of the second memory 130, read the data from the first memory 121 according to the address of the second memory 130, and then process the data according to the data processing instruction.
  • data processing instructions may be data processing instructions generated by the accelerator 120 during business processing, or may be data processing instructions sent by the processor 110 to the accelerator 120, which is not specifically limited in this application.
  • the storage controller 122 may store data in the second memory 130, and then store the data to be cached in the second memory 130 in the first memory 121.
  • the storage controller 122 can update the data to be cached to the first memory 121 in real time, or can update the data to be cached to the first memory 121 with a delay or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator 120 . In this way, when the accelerator 120 accesses the data again, it can directly interact with the first memory 121 that has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator 120 .
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory 121 is the data with a higher access frequency stored in the second memory 130. Since the first memory has a faster reading and writing efficiency, the accelerator 120 can directly process the frequently accessed data. The first memory 121 interacts, thereby improving the reading and writing efficiency of the accelerator 120 .
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
• when the data in the first memory 121 reaches the storage threshold, the data in the first memory 121 whose access frequency is not higher than the second threshold is deleted, and the data accessed by the accelerator 120 from the second memory 130 then continues to be stored in the first memory 121. In this way, the data stored in the first memory 121 is the more frequently accessed data of the accelerator 120.
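The frequency-based admission (first threshold) and eviction (second threshold) policy described above could be sketched like this; the class name, threshold handling, and Counter bookkeeping are assumptions for illustration, not the patent's algorithm:

```python
from collections import Counter

class FrequencyCache:
    """Admit data whose access frequency exceeds the first threshold; when
    full, evict data whose frequency is not above the second threshold."""
    def __init__(self, capacity, first_threshold, second_threshold):
        self.capacity = capacity
        self.t1 = first_threshold
        self.t2 = second_threshold
        self.freq = Counter()
        self.cache = {}

    def access(self, addr, data):
        self.freq[addr] += 1
        if addr in self.cache or self.freq[addr] <= self.t1:
            return data
        if len(self.cache) >= self.capacity:
            # evict cached entries whose frequency is not above the second threshold
            for a in [a for a, f in self.freq.items()
                      if a in self.cache and f <= self.t2]:
                del self.cache[a]
        if len(self.cache) < self.capacity:
            self.cache[addr] = data    # admit the frequently accessed entry
        return data

fc = FrequencyCache(capacity=2, first_threshold=1, second_threshold=1)
fc.access(0xA, "a")                  # frequency 1: below threshold, not cached
assert 0xA not in fc.cache
fc.access(0xA, "a")                  # frequency 2: exceeds threshold, cached
assert 0xA in fc.cache
```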
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
• the data to be cached may also be data recently accessed by the accelerator 120, where "recently" refers to data accessed by the accelerator 120 within a time range that can be determined according to the storage capacity of the first memory 121. Specifically, when the amount of data in the first memory 121 reaches the storage threshold, the access times of the data in the first memory 121 are sorted, and the data whose last access is furthest from the current time is deleted, and so on, so that the data stored in the first memory 121 is always the most recently accessed data.
• the data recently accessed by the accelerator 120 may be the data accessed by the accelerator 120 within a preset time range. That is to say, if the current time is T, the preset time range may be the period from time T-t to time T, where the size of t can be determined according to specific application scenarios and is not specifically limited in this application.
• the recently accessed data in the second memory 130 is very likely to be accessed again by the accelerator 120. Storing it in the first memory 121 allows the accelerator 120, when accessing the data again, to interact directly with the faster first memory 121, thereby improving the data read/write efficiency of the accelerator 120.
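Keeping only the most recently accessed data, as described above, is essentially a least-recently-used policy; a minimal sketch using Python's OrderedDict (an assumed, illustrative stand-in for the first memory 121):

```python
from collections import OrderedDict

class LRUCache:
    """Keep only the most recently accessed entries, evicting the entry
    whose last access is furthest in the past when capacity is reached."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, addr, data):
        if addr in self.entries:
            self.entries.move_to_end(addr)        # mark as most recently used
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[addr] = data
        return self.entries[addr]

lru = LRUCache(2)
lru.access("a", 1)
lru.access("b", 2)
lru.access("a", 1)          # "a" becomes the most recently accessed
lru.access("c", 3)          # evicts "b", the least recently accessed
assert list(lru.entries) == ["a", "c"]
```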
  • the data to be cached may also include prefetched data.
• the storage controller 122 may determine the data that the accelerator 120 is likely to access through a prefetching algorithm, extract it from the second memory 130 in advance, and store it in the first memory 121. In this way, when the accelerator 120 requests to read the data, it can directly interact with the first memory 121, thereby improving the read/write efficiency of the accelerator 120.
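A simple sequential prefetcher illustrates the idea; the stride and depth parameters are hypothetical, since the patent does not specify a particular prefetching algorithm:

```python
def prefetch_candidates(accessed_addr, stride=1, depth=2):
    """Guess the next addresses a sequential workload will touch."""
    return [accessed_addr + stride * i for i in range(1, depth + 1)]

second_memory = {a: f"data{a}" for a in range(10)}
first_memory = {}

addr = 3
first_memory[addr] = second_memory[addr]   # demand fetch for the current access
for a in prefetch_candidates(addr):        # prefetch ahead of use
    first_memory[a] = second_memory[a]

assert sorted(first_memory) == [3, 4, 5]   # prefetched data ready in fast memory
```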
  • the data to be cached can also include more types of data, which can be specifically determined according to the application scenario of the accelerator 120, which is not specifically limited in this application.
• before the storage controller 122 stores data in the first memory 121, the user can configure the storage controller 122.
  • the specific configuration content may include configuring the cache switch, configuring the address range, configuring the cache depth, etc.
• when the cache switch is configured as on, it means that the first memory 121 is used for data storage; when it is configured as off, it means that the first memory 121 is not used for data storage.
  • the address range configured as the target address range means that the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the cache depth configuration of D means that the storage capacity of the first memory 121 is D.
  • the user can also perform other configurations on the storage controller 122, which can be determined based on the actual business environment, and is not specifically limited in this application.
  • the cache depth is configured as 2M
  • the cache switch is configured as on
• the address range is configured as add0~add5
• the data that the accelerator 120 requests to write to addresses add0~add5 will be cached in the first memory 121 first.
• the data of add0~add5 can also be written to the second memory 130 with a delay, while data requested to be written to other addresses in the second memory 130 can be written directly to the second memory 130.
  • the cache switch can also be turned off. It should be understood that the above examples are for illustration and are not specifically limited in this application.
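The three configuration items (cache switch, address range, cache depth) and the resulting gating decision might look like this; all names and the 2M depth value are illustrative, taken from the example above:

```python
class CacheConfig:
    def __init__(self, switch_on, addr_range, depth):
        self.switch_on = switch_on               # whether the first memory is used
        self.addr_lo, self.addr_hi = addr_range  # cacheable second-memory range
        self.depth = depth                       # first-memory capacity

    def should_cache(self, addr):
        """Data goes through the first memory only when the switch is on
        and the address falls inside the configured range."""
        return self.switch_on and self.addr_lo <= addr <= self.addr_hi

cfg = CacheConfig(switch_on=True, addr_range=(0x0, 0x5), depth=2 * 1024 * 1024)
assert cfg.should_cache(0x3)       # within add0~add5: cached first
assert not cfg.should_cache(0x9)   # outside the range: straight to second memory
cfg.switch_on = False
assert not cfg.should_cache(0x3)   # switch off: first memory unused
```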
  • the data processing instructions obtained by the storage controller 122 may be data read instructions or data write instructions.
  • the data write instructions include the second storage address of the data.
• the second storage address is a storage address in the second memory 130. The storage controller 122 can first determine whether the cache switch is on. If the cache switch is on, it determines whether the address carried by the data processing instruction is within the address range configured by the user. If it is within the configured address range, the storage controller 122 can store the data in the first memory 121; otherwise, the data is stored in the second memory 130.
• similarly, when the storage controller 122 obtains a data read instruction, if it determines that the cache switch is on and the read address is within the address range configured by the user, it reads the data from the first memory 121; otherwise, the data is read from the second memory 130, which will not be described again here.
  • the storage controller 122 may store the correspondence between the first storage address and the second storage address of the data in the address memory 124 .
• when the storage controller 122 obtains a data write instruction, the write address in the data write instruction is the second storage address of the data. If the address memory 124 already stores a correspondence between the second storage address and a first storage address, it means that a historical version of the data has been written to the first storage address of the first memory 121. At this time, the historical version of the data corresponding to the first storage address can be updated, and the data can be sent to the second memory 130 to request updating of the data corresponding to the second storage address.
• optionally, the storage controller 122 can first determine the first storage address corresponding to the second storage address, then store the data carried in the data write instruction at the first storage address of the first memory 121, and store the correspondence between the first storage address and the second storage address of the data in the address memory 124. It should be noted that when the storage controller 122 writes data for the first time, it can determine the first storage address of the data based on the current free addresses of the first memory 121; this application does not limit the address allocation strategy of the storage controller 122.
• when the storage controller 122 obtains a data read instruction, the read address in the data read instruction is the second storage address of the data. At this time, the storage controller 122 can match the second storage address against the addresses in the address memory 124. If the address memory 124 includes the second storage address, the corresponding first storage address can be obtained according to the second storage address of the data, and the data can then be read from the first memory 121.
• if the address memory 124 does not include the second storage address, the storage controller 122 can read the data from the second memory 130 according to the second storage address. At the same time, the storage controller 122 can also determine a first storage address for the data, store the data read from the second memory 130 into the first memory 121 so that the accelerator 120 can subsequently read the data directly from the first memory 121, and store the mapping relationship between the first storage address and the second storage address of the data in the address memory 124.
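The read path just described, including the miss fill and the recording of the address mapping, can be sketched as follows; the list-based first memory and dict-based address map are illustrative simplifications, not the hardware design:

```python
class StorageController:
    def __init__(self, second_memory):
        self.first_memory = []    # data slots, indexed by first storage address
        self.address_map = {}     # second storage address -> first storage address
        self.second_memory = second_memory

    def read(self, second_addr):
        if second_addr in self.address_map:          # address-memory hit
            return self.first_memory[self.address_map[second_addr]]
        data = self.second_memory[second_addr]       # miss: go to second memory
        first_addr = len(self.first_memory)          # allocate a free slot
        self.first_memory.append(data)
        self.address_map[second_addr] = first_addr   # record the mapping
        return data

ctrl = StorageController({0xA0: "x", 0xB0: "y"})
assert ctrl.read(0xA0) == "x"        # miss: filled from second memory
assert ctrl.address_map[0xA0] == 0   # mapping recorded in the address memory
assert ctrl.read(0xA0) == "x"        # hit: served from the first memory
```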
  • the accelerator 120 includes a status information memory 123.
  • the status information memory 123 is used to store the status information of the data.
• before writing data, the storage controller 122 may first determine the status information of the data, and write the data into the first memory 121 or the second memory 130 according to the status information.
  • the status information includes a first state and a second state. The first state is used to indicate that the data is in a modification state and is being written to the first memory 121 or the second memory 130. At this time, the data cannot be read from the first memory 121 or the second memory 130.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory 121 at this time.
• when the accelerator 120 processes a data write instruction, if the status information of the data is in the first state, that is, the data is currently being modified, the accelerator can first set the status information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory 130 for data update. If the status information of the data is in the second state, the data in the data processing instruction can be written to the first memory 121 for data update, and the data is also written into the second memory 130 for data update.
• when the accelerator 120 processes a data read instruction, if the address memory does not include the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the state information of the data to the first state and reads the data from the second memory 130. After retrieving the data, if the status information of the data is still in the first state, the accelerator 120 sets it to the second state and stores the data at the first storage address of the first memory 121. If the status information has already changed to the second state, it means that the storage controller 122 has written a newer version of the data in the meantime, and the retrieved data is no longer stored in the first memory 121. It should be understood that if a new version of the data is written into the first memory 121 and the second memory 130 during the data reading process, the status information ensures that the data kept is the latest version.
  • the address memory 124 includes a read address and the status information is the first state, that is, the previous version of the data is being operated on, the latest version of the data can be read from the second memory 130 at this time. If the address memory 124 includes a read address and the status information is in the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory 121 at this time.
  • the above state information can be represented by binary characters.
  • the first state is represented by the character "1”
  • the second state is represented by the character "0”.
• the first state and the second state can also be distinguished by other identifiers, which is not specifically limited in this application.
  • the above-mentioned accelerator 120 can also be a CPU
  • the first memory can be a memory in the CPU, such as an SRAM in the CPU
• the storage controller 122 can cache data in combination with the multi-level cache architecture of the CPU, for example as a third-level or fourth-level cache of the CPU, which enables more levels of CPU cache and reduces the implementation cost of multi-level caching while reducing hardware complexity.
• the data processing instructions include not only the above-mentioned data read instructions and data write instructions, but also other instructions for performing business processing after reading the data. For example, after the accelerator 120 reads the data from the first memory, the data can be updated, deleted, merged, etc., and multiple pieces of read data can also be calculated and processed, such as matrix multiplication, convolution operations, etc. The details can be determined according to the processing business of the accelerator 120; this application does not specifically limit the process of subsequent data processing.
  • FIG 4 is a schematic structural diagram of a data processing device provided by this application.
  • the data processing device 400 can be the accelerator 120 in Figure 3.
• the data processing device 400 can include a configuration module 1221, an acquisition module 1225 and a processing module 1226, wherein the processing module 1226 may include a search and write data update module 1222, a read data return processing module 1223 and a read data update module 1224. It should be understood that each of the above modules may correspond to a circuit module in an ASIC or FPGA.
• the functions of the acquisition module 1225, the configuration module 1221 and the processing module 1226 in Figure 4 can be implemented by the storage controller 122 in Figure 3. The data processing device 400 shown in Figure 4 uses the division method of the application scenario shown in Figure 1, that is, the application scenario in which the second memory 130 is deployed outside the accelerator. It should be understood that in the application scenario shown in Figure 2, the second memory 130 is deployed inside the data processing device 400, which will not be illustrated again here.
  • the acquisition module 1225 is used to acquire data processing instructions generated by the accelerator 120, which may include data writing instructions and data reading instructions.
  • the configuration module 1221 is used to receive configuration information input by the user.
• the configuration information may include information configuring the cache switch, information configuring the address range, and information configuring the cache depth. If the cache switch is configured as on, the first memory 121 is used for data storage; if it is configured as off, the first memory 121 is not used for data storage.
  • the information configuring the address range includes the target address range, that is, the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the information configuring the cache depth includes the cache depth D, that is, the storage capacity of the first memory 121 is D.
• the search and write data update module 1222 is used to obtain the data write instructions generated by the accelerator 120 and process them. Specifically, the search and write data update module 1222 may first query, according to the second storage address of the data carried in the data write instruction, whether the second storage address exists in the address memory 124. If the second storage address exists in the address memory 124, it queries whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the first state of the data to the second state, reads the data from the first memory 121 according to the first storage address and updates it, rewrites the updated data into the first memory 121, and then writes the updated data into the second memory 130; if it is the second state, it directly writes the data into the first memory 121 and then writes the data into the second memory 130.
• if the second storage address does not exist in the address memory 124, the first storage address corresponding to the second storage address is determined according to the current storage capacity of the first memory 121, the correspondence between the first storage address and the second storage address is stored in the address memory 124, and the data is stored in the first memory 121 and the second memory 130.
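Under the assumption that the first state and second state are encoded as "1" and "0" (as suggested earlier), the write path of module 1222 might be sketched like this; it is a simplified single-threaded model with illustrative names, not the hardware implementation:

```python
MODIFYING, IDLE = 1, 0   # first state ("1") and second state ("0")

def handle_write(second_addr, data, address_map, state, first_mem, second_mem):
    if second_addr in address_map:          # hit: a cached version exists
        first_addr = address_map[second_addr]
        if state.get(second_addr) == MODIFYING:
            state[second_addr] = IDLE       # take over the in-flight modification
        first_mem[first_addr] = data        # update the cached copy first
    else:                                   # miss: allocate a first-memory slot
        first_addr = len(first_mem)
        first_mem.append(data)
        address_map[second_addr] = first_addr
    second_mem[second_addr] = data          # then write through to second memory

addr_map, st, fm, sm = {}, {}, [], {}
handle_write(0x10, "v1", addr_map, st, fm, sm)   # first write: miss path
handle_write(0x10, "v2", addr_map, st, fm, sm)   # second write: hit path
assert fm[addr_map[0x10]] == "v2" and sm[0x10] == "v2"
```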
  • the read data return processing module 1223 and the read data update module 1224 are used to obtain the data read instructions generated by the accelerator 120 and process the data read instructions. Among them, the read data return processing module 1223 is mainly used to read data, and the read data update module 1224 is mainly used to update data.
• the read data return processing module 1223 can first query, according to the second storage address of the data carried in the data read instruction, whether the second storage address exists in the address memory 124. If the second storage address exists in the address memory 124, it queries whether the status information of the data in the status information memory 123 is the first state. If it is the first state, it means that the data is being modified, so the data can be read from the second memory 130; if it is the second state, it means that the data is not being modified, so the data can be read from the first memory 121.
• if the second storage address does not exist in the address memory 124, the read data update module 1224 can first set the status information of the data in the status information memory 123 to the first state, then determine the first storage address corresponding to the second storage address according to the storage capacity of the first memory 121, and store the correspondence between the first storage address and the second storage address in the address memory 124.
• after the read data return processing module 1223 reads the data from the second memory, the read data update module 1224 determines whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the state information to the second state and updates the data to the first memory; if it is the second state, it does not update the data to the first memory. It should be understood that the second state means that the search and write data update module 1222 has updated the data in the first memory in the meantime, so there is no need to update the data to the first memory.
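The race check performed by the read data update module 1224 — only installing the fetched copy if the state is still the first state — can be sketched as follows; names and the 1/0 state encoding are illustrative assumptions:

```python
MODIFYING, IDLE = 1, 0   # first state ("1") and second state ("0")

def fill_after_read(second_addr, fetched, state, first_mem_by_addr):
    """Called after fetching from the second memory. Install the fetched copy
    only if no write overtook the read in the meantime."""
    if state.get(second_addr) == MODIFYING:
        state[second_addr] = IDLE
        first_mem_by_addr[second_addr] = fetched   # safe: no concurrent write
        return True
    return False   # a write already placed a newer version; keep it

state, fm = {0x10: MODIFYING}, {}
assert fill_after_read(0x10, "old", state, fm)    # no race: copy installed
state[0x20] = IDLE                                # a write flipped the state first
assert not fill_after_read(0x20, "stale", state, fm)
assert 0x20 not in fm                             # stale copy was not installed
```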
  • Figure 4 is an exemplary division method.
• the storage controller 122 provided in this application can also be divided into more modules. For example, the step of searching the address memory 124 for the storage address carried in the data processing instruction is implemented by the search and write data update module 1222, and the search and write data update module 1222 can also be further divided into a search module and a write data update module, which is not specifically limited in this application.
  • the accelerator provided by the present application stores the data to be cached in the second memory in the first memory, where the read and write efficiency of the first memory is greater than the read and write efficiency of the second memory, so that the accelerator is faster when reading and writing data. It can directly interact with the first memory with high reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
• the first memory is implemented by the memory in the accelerator, such as SRAM, registers, CAM, etc., without the need to deploy an additional cache. This realizes the cache function of accelerators such as FPGAs and ASICs at very low hardware cost, and can solve the problem of the accelerator's read/write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
  • Figure 5 is a schematic flowchart of the steps of the data processing method provided by this application in an application scenario.
  • Figure 6 is a schematic flowchart of the steps of processing data reading instructions in a data processing method provided by this application.
• Figure 7 is a schematic flowchart of the steps of processing data write instructions in a data processing method provided by this application. Simply put, Figure 6 is used to describe step 7 in Figure 5, and Figure 7 is used to describe step 8 in Figure 5. As shown in Figures 5 to 7, the method may include the following steps:
  • Step 1 Configure the storage controller 122. This step can be implemented by the configuration module 1221 in Figure 4.
  • the configuration content includes one or more of cache switch status, cache address range, and cache capacity, where the cache switch status is used to indicate whether the accelerator uses the first memory 121, and the cache address range is used to indicate that the accelerator will store Data whose address is within the cache address range is stored in the first memory 121 , and the cache capacity is used to indicate the capacity of the first memory 121 .
  • the cache depth configuration of D means that the storage capacity of the first memory 121 is D.
  • the user can also perform other configurations on the storage controller 122, which can be determined based on the actual business environment, and is not specifically limited in this application.
  • Step 2 The storage controller 122 obtains data processing instructions.
• the data processing instruction may be a data read instruction or a data write instruction. This step can be implemented by the acquisition module 1225 in Figure 4.
  • Step 3 The storage controller 122 determines whether the cache switch is on. If the cache switch is on, step 4 is performed. If the cache switch is off, step 9 is performed. Wherein, when the cache switch is configured to be on, it means that the first memory 121 is used for data storage, and when it is set to off, it means that the first memory 121 is not used for data storage. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 4 The storage controller 122 determines whether the address range is set. If the address range is set, step 5 is performed. If the address range is not set, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
  • Optionally, when the address range is not set, since the cache switch is already on, that is, the user intends to use the cache function of the first memory 121, the user can be prompted to set the cache address range; alternatively, the processing flow of step 5 can be skipped and step 6 executed directly.
  • Step 5 The storage controller 122 determines whether the address is within the range, where the address refers to the address carried in the data access request received in step 2, and the range refers to the cache address range configured in step 1. If the address is within the configured cache address range, step 6 is performed; if it is not within the cache address range, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 6 The storage controller 122 determines whether the data processing instruction is a data read instruction. If it is a data read instruction, step 7 is performed. If it is not a data read instruction, step 8 is performed. This step can be implemented by the configuration module 1221.
  • Step 7 The storage controller 122 processes the data read instruction.
  • the process of processing the data read instruction in step 7 will be described in detail in the embodiment of Figure 6. This step can be implemented by the read data return processing module 1223 and the read data update module 1224 in Figure 4.
  • Step 8 The storage controller 122 processes the data write instruction. Among them, the process of processing the data writing instruction in step 8 will be described in detail in the embodiment of FIG. 7 . This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 9 The storage controller 122 issues a read request or write request to the second memory.
  • For example, if the second memory 130 is DDR and the data processing instruction is a data read instruction, the storage controller 122 can issue a DDR read request; if the data processing instruction is a data write instruction, the storage controller 122 can issue a DDR write request.
  • the above example is for illustration and is not specifically limited in this application.
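The dispatch performed in steps 3 to 6 (with step 9 as the fall-through) can be sketched in software. This is a hedged model under stated assumptions: function and label names are illustrative, not the patent's own, and the real decision is made in the configuration module 1221 hardware.

```python
from typing import Optional

# Sketch of the step 3-6/9 dispatch: decide whether a data processing
# instruction is served by the first memory 121 (cache path) or
# forwarded to the second memory 130 (e.g., as a DDR read/write request).
def dispatch(switch_on: bool, addr_range: Optional[range],
             addr: int, is_read: bool) -> str:
    if not switch_on:                 # step 3: cache switch is off
        return "second_memory"       # step 9: issue DDR read/write request
    if addr_range is None:            # step 4: address range not set
        return "second_memory"       # step 9 (or prompt the user, per the text)
    if addr not in addr_range:        # step 5: address outside cached range
        return "second_memory"       # step 9
    # step 6: address is cached; split by instruction type
    return "read_path" if is_read else "write_path"  # step 7 / step 8
```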
  • Figure 6 is a flowchart of the processing steps of the data read instruction provided by this application. As shown in Figure 6, step 7 may include the following steps:
  • Step 71 Determine whether the same address is stored in the address memory 124, where the data read instruction carries the read address of the data, and the read address is an address in the second memory 130. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the address memory 124 may be a CAM, that is, a memory addressed by content. The working mechanism of a CAM is to compare an input data item with all data items stored in the CAM to determine whether the input data item matches any of them. The address memory 124, which stores the address information of the data, can therefore be implemented using a CAM in the accelerator 120. In this way, when a user requests to read data, the CAM matches the read address against the address information stored in the CAM; if they match, the data has already been stored in the first memory 121. Optionally, the address memory 124 can also be implemented using other memories, in which case the storage controller 122 implements the function of obtaining the address from the address memory 124 and matching it with the address carried in the data read instruction. This application does not specifically limit this.
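The match-by-content behavior of the address memory 124 can be modeled as follows. This is a software sketch of CAM behavior, not a hardware implementation; the class and method names are assumptions made for illustration.

```python
# Minimal model of the content-addressed lookup the address memory 124
# performs: an input (second-memory) address is compared against all
# stored entries, and a hit returns the associated location in the
# first memory 121, while a miss returns None.
class AddressMemory:
    def __init__(self):
        self.entries = {}  # second-memory address -> first-memory address

    def store(self, second_addr: int, first_addr: int) -> None:
        # record the correspondence between the two storage addresses
        self.entries[second_addr] = first_addr

    def match(self, second_addr: int):
        # CAM-style match of the input against every stored data item
        return self.entries.get(second_addr)  # None means no match
```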
  • If step 71 determines that the same address exists in the address memory 124, the latest or a historical version of the data has been stored in the first memory 121, and step 72 can be executed. If it does not exist, no version of the data is stored in the first memory 121, so step 75 can be performed.
  • Step 72 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the status information memory 123 may be a register, and step 72 may be implemented by the register determining whether the status information is the first state; alternatively, the status information memory 123 may implement only the storage function, and the storage controller 122 may implement the determination function. That is, the storage controller 122 obtains the status information of the data from the status information memory 123 and determines whether it is in the first state. This application does not specifically limit this.
  • When the status information of the data is the first state (high), step 73 is executed; when the status information of the data is the second state (low), step 74 is executed.
  • Step 73 Read data from the second memory 130.
  • Optionally, the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • When the status information of the data is the first state, referring to the foregoing content, the data may still be in the process of being fetched from the second memory 130; reading the data directly from the second memory 130 at this time avoids read errors and improves the accuracy of data reading.
  • Step 74 Read data from the first memory. Specifically, the storage controller 122 may query the address memory 124 to determine the first storage address of the data in the first memory according to the read address carried in the data read instruction, and then read the data from the first storage address. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Step 75 Set the state information to the first state, and store the correspondence between the first storage address and the second storage address.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • the above corresponding relationship will be stored in the address memory 124.
  • This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 76 Read data from the second memory.
  • the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, then the data read instruction may be a DDR read request. This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 77 Determine whether the status information is the first status (high bit). If it is the first state, steps 78 to 79 are executed. If it is the second state (low level), steps 78 to 79 are not executed. This step can be implemented by the read data update module 1224 in Figure 4.
  • If the status information has changed to the second state (low): referring to the foregoing content, during the data write process, if the status information is found to be the first state (high), the storage controller 122 modifies the status information of the data from the first state (high) to the second state (low). This means that the data has been modified between step 75 and step 79, and the storage controller 122 is writing or has already written the latest version of the data to the first memory 121. Therefore, the data read at this time is not written into the first memory 121, which avoids overwriting the new version of the data.
  • If the status information has not changed to the second state (low), the data has not been modified between step 75 and step 79. Therefore, the data read from the second memory 130 at this time can be updated into the first memory 121.
  • Step 78 Modify the status information to the second status (low). This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 79 Update the data to the first memory 121. This step can be implemented by the read data update module 1224 in Figure 4.
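Steps 71 to 79 can be sketched end to end as follows. This is a hedged software model under stated assumptions: the dictionaries stand in for the address memory 124, status information memory 123, first memory 121 and second memory 130, and all names are illustrative rather than the patent's identifiers.

```python
FIRST, SECOND = 1, 0  # the "1"/high state and "0"/low state from the text

# Sketch of the read path: serve a hit from the first memory 121
# (state SECOND) or from the second memory 130 (state FIRST, fetch in
# flight), and on a miss fetch from the second memory, installing the
# data in the cache only if no write modified it in the meantime.
def handle_read(addr, addr_mem, state, first_mem, second_mem, alloc_slot):
    slot = addr_mem.get(addr)
    if slot is not None:                 # step 71: address match found
        if state[addr] == FIRST:         # step 72: fetch still in flight
            return second_mem[addr]      # step 73: read second memory
        return first_mem[slot]           # step 74: read first memory
    state[addr] = FIRST                  # step 75: set first state
    slot = alloc_slot(addr)
    addr_mem[addr] = slot                # step 75: store address mapping
    data = second_mem[addr]              # step 76: read second memory
    if state[addr] == FIRST:             # step 77: unmodified meanwhile
        state[addr] = SECOND             # step 78: set second state
        first_mem[slot] = data           # step 79: update first memory
    return data
```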
  • Figure 7 is a flowchart of the processing steps of the data write instruction provided by this application. As shown in Figure 7, step 8 may include the following steps:
  • Step 81 Determine whether the same address exists.
  • The data write instruction carries the write address of the data.
  • Step 81 can determine whether the write address exists in the address memory 124.
  • this step can be implemented by the search and write data update module 1222 in Figure 4 .
  • If the same address exists, step 82 is executed; if the same address does not exist, step 84 is executed.
  • Step 82 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • If the status information is the first state, step 83, step 85 and step 86 are executed.
  • If the status information is the second state, step 85 and step 86 are executed.
  • Step 83 Set the status information to the second state.
  • the second state can also be represented by the low-order state or the "0" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • It can be understood that when the status information is in the first state, the data is undergoing steps 75 to 78. The data received in step 83 is the latest version, so setting the status information to the second state prevents the old version of the data from steps 75 to 78 from overwriting the current new version.
  • After step 83, step 85 and step 86 can be executed.
  • Step 84 Store the correspondence between the first storage address and the second storage address into the address memory 124. It can be understood that when the address carried in the data write instruction does not exist in the address memory 124, no historical version of the data has been written to the first memory 121. Therefore, a write address in the first memory 121, that is, the first storage address, can be allocated for the data, and the first storage address is then stored into the address memory 124. In this way, when the storage controller 122 receives a read request for the data, the first storage address of the data can be obtained through the address memory 124 and the data can then be read, thereby achieving data caching. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • After step 84, step 85 and step 86 can be executed.
  • Step 85 Write data to the first memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 86 Write data to the second memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
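Steps 81 to 86 can likewise be sketched as a single function. As with the read-path sketch, this is a hedged model: dictionaries stand in for the hardware memories, and the names are assumptions for illustration.

```python
FIRST, SECOND = 1, 0  # the "1"/high state and "0"/low state from the text

# Sketch of the write path: a write always lands in both memories
# (steps 85-86). If a read-fetch for the same address is in flight
# (state FIRST), the state is first dropped to SECOND (step 83) so the
# stale data from steps 75-79 cannot overwrite this newer version.
def handle_write(addr, data, addr_mem, state, first_mem, second_mem, alloc_slot):
    slot = addr_mem.get(addr)
    if slot is not None:                 # step 81: address match found
        if state.get(addr) == FIRST:     # step 82: fetch in flight
            state[addr] = SECOND         # step 83: block the in-flight update
    else:                                # step 84: allocate a first-memory
        slot = alloc_slot(addr)          # address and record the mapping
        addr_mem[addr] = slot
    first_mem[slot] = data               # step 85: write first memory
    second_mem[addr] = data              # step 86: write second memory
```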
  • In summary, the data processing method provided by this application stores data in the first memory and stores the data of the first memory in the second memory, where the read and write efficiency of the first memory is greater than that of the second memory. This allows the accelerator to interact directly with the first memory, which has higher read and write efficiency, when reading and writing data, thereby improving the data read and write efficiency of the accelerator.
  • Moreover, the first memory is implemented by memory inside the accelerator, such as SRAM, registers, and CAM, without the need to deploy an additional cache. This realizes the cache function of accelerators such as FPGAs and ASICs at very low hardware cost, and can simply, efficiently and cost-effectively solve the problem of accelerator read and write efficiency being limited by the memory bandwidth bottleneck.
  • the embodiment of the present application provides an accelerator.
  • the accelerator includes a processor and a power supply circuit.
  • The power supply circuit is used to supply power to the processor, and the processor is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
  • Embodiments of the present application provide a computing device.
  • the computing device includes a CPU and an accelerator.
  • The CPU is used to run instructions to implement the business functions of the computing device, and the accelerator is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • the above-described embodiments are implemented in whole or in part in the form of a computer program product.
  • a computer program product includes at least one computer instruction.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example, from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, fiber optic cable, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
  • The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage node, such as a server or a data center, that contains at least one usable medium.
  • The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium.
  • The semiconductor medium may be an SSD.

Abstract

This application provides a data processing method and apparatus, and an electronic device. The method is applied to an accelerator. The accelerator includes a first memory. The method includes the following steps: the accelerator obtains a data processing instruction, the data processing instruction including an address of data in a second memory, and the data read-write efficiency of the first memory being higher than that of the second memory; and the accelerator reads the data from the first memory according to the data processing instruction and processes the data, the first memory being used to store data to be cached from the second memory, and the data to be cached including data historically accessed by the accelerator, so that the accelerator can interact directly with the first memory, which has high read-write efficiency, when reading and writing data. As a result, the data read-write efficiency of the accelerator is improved, and the problem of the accelerator's read-write efficiency being limited by the memory bandwidth bottleneck is solved.
PCT/CN2023/091041 2022-04-27 2023-04-27 Procédé et appareil de traitement de données, et dispositif associé WO2023208087A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210451716.8A CN117008810A (zh) 2022-04-27 2022-04-27 一种数据处理方法、装置及相关设备
CN202210451716.8 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023208087A1 true WO2023208087A1 (fr) 2023-11-02

Family

ID=88517853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091041 WO2023208087A1 (fr) 2022-04-27 2023-04-27 Procédé et appareil de traitement de données, et dispositif associé

Country Status (2)

Country Link
CN (1) CN117008810A (fr)
WO (1) WO2023208087A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566978A (zh) * 2010-09-30 2012-07-11 Nxp股份有限公司 存储器加速器缓冲器置换方法及系统
CN105074677A (zh) * 2013-03-12 2015-11-18 英派尔科技开发有限公司 加速器缓存器访问
CN106415485A (zh) * 2014-01-23 2017-02-15 高通股份有限公司 用于动态语言中的内联高速缓存的硬件加速
WO2018179873A1 (fr) * 2017-03-28 2018-10-04 日本電気株式会社 Bibliothèque pour ordinateur muni d'un accélérateur, et accélérateur
CN111752867A (zh) * 2019-03-29 2020-10-09 英特尔公司 共享加速器存储器系统和方法

Also Published As

Publication number Publication date
CN117008810A (zh) 2023-11-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795519

Country of ref document: EP

Kind code of ref document: A1