WO2023208087A1 - Data processing method and apparatus and related device - Google Patents

Data processing method and apparatus and related device Download PDF

Info

Publication number
WO2023208087A1
WO2023208087A1 · PCT/CN2023/091041 · CN2023091041W
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
accelerator
address
state
Prior art date
Application number
PCT/CN2023/091041
Other languages
French (fr)
Chinese (zh)
Inventor
熊鹰 (Xiong Ying)
徐栋 (Xu Dong)
卢霄 (Lu Xiao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023208087A1 publication Critical patent/WO2023208087A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems

Definitions

  • the present application relates to the field of computers, and in particular, to a data processing method, device and related equipment.
  • accelerators are also called accelerated processing units (APUs)
  • ASICs application-specific integrated circuits
  • FPGAs field-programmable gate arrays
  • ML machine learning
  • accelerators have high requirements for real-time reading and writing of data in memory.
  • the current access frequency of the accelerator to data in the memory is not balanced with the available bandwidth provided by the memory to the accelerator.
  • when the accelerator needs to perform read and write operations on the data in the memory, it often needs to wait for an additional period of time, resulting in a decrease in the processing performance of the accelerator.
  • This application provides a data processing method, device and related equipment to solve the problem of accelerator processing performance degradation caused by imbalance between the accelerator's access frequency to data in the memory and the available bandwidth provided by the memory to the accelerator.
  • a data processing method is provided.
  • the method is applicable to an accelerator.
  • the accelerator includes a first memory.
  • the method may include the following steps: the accelerator obtains a data processing instruction, where the instruction includes the address of data in a second memory, and the data reading and writing efficiency of the first memory is greater than that of the second memory; the accelerator reads the data from the first memory according to the instruction and processes it, where the first memory is used to store the data to be cached from the second memory.
  • the data to be cached includes data historically accessed by the accelerator.
  • the accelerator stores the data to be cached in the second memory into the first memory, where the reading and writing efficiency of the first memory is greater than that of the second memory, so that when processing a data processing instruction the accelerator can interact directly with the first memory, which has higher reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the accelerator is implemented through application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology
  • the first memory includes static random access memory (SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), etc.
  • the computing device where the accelerator is located may include a central processing unit (CPU) and an accelerator.
  • the accelerator may be a system-level chip implemented through FPGA, ASIC and other technologies. Specifically, it may be used in the computing device 100 to assist the CPU in processing special types of computing tasks.
  • the above special types of computing tasks can be graphics processing, vector calculations, machine learning, etc.
  • the accelerator can be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator may also be a CPU.
  • the computing device may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is an accelerator. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the first memory and the second memory may both be original internal memories of the accelerator; or the first memory is an original internal memory of the accelerator and the second memory is an original memory outside the accelerator, for example, a memory of the computing device where the accelerator is located.
  • the method provided by this application can select an appropriate memory inside the accelerator as the first memory based on the specific application scenario. If there are multiple memories inside the accelerator, the memory with lower reading and writing efficiency can serve as the second memory and the one with higher reading and writing efficiency as the first memory. If there are not multiple memories inside the accelerator, or the multiple memories have the same reading and writing efficiency, then a memory inside the accelerator can be used as the first memory, and a memory outside the accelerator with lower reading and writing efficiency but larger storage capacity can be used as the second memory.
  • the above implementation uses the memory hardware originally inside the accelerator and improves the data reading and writing efficiency of the accelerator 120 through algorithms; no additional cache hardware needs to be deployed, which reduces the cost of implementing the solution. The solution is especially practicable for accelerators with small hardware specifications and sizes implemented with ASIC or FPGA technology.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • the accelerator may store data in the second memory, and then store the data to be cached in the second memory in the first memory.
  • the accelerator can update the data to be cached to the first memory in real time, or can update the data to be cached to the first memory in a delayed manner or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator. In this way, when the accelerator accesses the data again, it can directly interact with the first memory, which has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator.
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory is the data in the second memory with higher access frequency. Since the first memory has higher reading and writing efficiency, the accelerator can interact directly with the first memory when processing frequently accessed data, thereby improving the reading and writing efficiency of the accelerator.
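The frequency-threshold admission policy above can be sketched as a small model (illustrative Python; the names `FrequencyCache` and `first_threshold` are assumptions, not terms defined by the application):

```python
# Sketch: admit data into the fast first memory only once its access
# count exceeds a threshold; hits are then served from the first memory.

class FrequencyCache:
    def __init__(self, first_threshold):
        self.first_threshold = first_threshold
        self.access_count = {}   # address -> number of accesses observed
        self.first_memory = {}   # cached address -> data

    def access(self, address, second_memory):
        self.access_count[address] = self.access_count.get(address, 0) + 1
        if address in self.first_memory:          # hit: fast path
            return self.first_memory[address]
        data = second_memory[address]             # miss: read the slow memory
        if self.access_count[address] > self.first_threshold:
            self.first_memory[address] = data     # frequency exceeds threshold: cache it
        return data
```

A real accelerator would implement this in hardware; the sketch only shows how the first threshold separates frequently accessed data from the rest.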
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
  • when the data in the first memory reaches the storage threshold, the data in the first memory whose access frequency is not higher than the second threshold can be deleted, and the data accessed by the accelerator from the second memory can then continue to be stored in the first memory.
  • in this way, the data kept in the first memory is data that the accelerator accesses with higher frequency.
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
  • the data to be cached may also be data recently accessed by the accelerator.
  • the latest here may refer to data accessed by the accelerator within a time range.
  • the time range here may be determined according to the storage capacity of the first memory. Specifically, when the amount of data in the first memory reaches the storage threshold, the entries in the first memory are sorted by access time, and the entry whose access time is furthest from the current time is deleted, and so on, thereby ensuring that the data stored in the first memory is the most recently accessed data.
  • the data recently accessed by the accelerator may be the data accessed by the accelerator within a preset time range. That is to say, if the current time is T, the preset time range may be from time T-t to time T, where the size of t can be determined according to the specific application scenario and is not specifically limited in this application.
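The recency-based eviction described above behaves like a least-recently-used cache. A minimal sketch (illustrative Python; `RecencyCache` and `storage_threshold` are hypothetical names):

```python
from collections import OrderedDict

# Sketch: keep only the most recently accessed data in the first memory.
# When the storage threshold is reached, the entry whose access time is
# furthest from the current time (the least recently used one) is deleted.

class RecencyCache:
    def __init__(self, storage_threshold):
        self.storage_threshold = storage_threshold
        self.first_memory = OrderedDict()   # ordered oldest -> newest access

    def access(self, address, second_memory):
        if address in self.first_memory:
            self.first_memory.move_to_end(address)   # refresh access time
            return self.first_memory[address]
        data = second_memory[address]
        if len(self.first_memory) >= self.storage_threshold:
            self.first_memory.popitem(last=False)    # evict least recently accessed
        self.first_memory[address] = data
        return data
```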
  • the data to be cached can also include prefetched data.
  • the storage controller can determine the data that the accelerator is likely to access through a prefetch algorithm, extract it from the second memory in advance, and store it in the first memory with higher reading and writing efficiency. In this way, when the accelerator requests to read the data, it can interact directly with the first memory, thereby improving the reading and writing efficiency of the accelerator.
  • the data to be cached may also include more types of data, which may be determined based on the application scenario of the accelerator, which is not specifically limited in this application.
  • the accelerator configures the first memory and obtains configuration information of the first memory.
  • the configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity, where the cache switch state is used to indicate whether the accelerator uses the first memory.
  • the cache address range is used to instruct the accelerator to store data with a storage address within the cache address range in the first memory.
  • the cache capacity is used to indicate the capacity of the first memory.
  • configuring the cache switch to on means using the first memory for data storage, and being off means not using the first memory for data storage.
  • Configuring the address range as the target address range means that the data stored in the first memory is the cache data of the data stored in the target address range of the second memory.
  • the cache depth configured as D means that the storage capacity of the first memory is D.
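The three configuration items above can be sketched as a small structure (illustrative Python; the field names `switch_on`, `address_range`, and `capacity` are assumptions, not terms from the application):

```python
from dataclasses import dataclass

# Sketch of the configuration information: cache switch state, cache
# address range, and cache capacity (the cache depth D).

@dataclass
class CacheConfig:
    switch_on: bool        # whether the accelerator uses the first memory at all
    address_range: range   # second-memory addresses eligible for caching
    capacity: int          # storage capacity D of the first memory

    def should_cache(self, address):
        # Data is cached only if the switch is on and its storage address
        # falls inside the configured cache address range.
        return self.switch_on and address in self.address_range

cfg = CacheConfig(switch_on=True, address_range=range(0x1000, 0x2000), capacity=512)
```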
  • the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  • the data processing instructions may be instructions generated by the accelerator, or instructions sent to it by the CPU in the computing device where the accelerator is located.
  • the data processing instructions may be data write instructions or data read instructions, and may also include other instructions for business processing after the data is read out.
  • the accelerator can perform operations such as updating, deleting, and merging on the data, and can also perform computations on multiple pieces of read data, such as matrix multiplication and convolution operations; this can be determined according to the processing business of the accelerator, and this application does not specifically limit the specific process of subsequent data processing.
  • the write address in the data write instruction is the second storage address of the data. If the address memory already stores a correspondence between this second storage address and a first storage address, it means that a historical version of the data has been written to the first storage address of the first memory. In this case, the historical version of the data corresponding to the first storage address can be updated, and the data is sent to the second memory to request an update of the data corresponding to the second storage address.
  • the accelerator can first determine the first storage address corresponding to the second storage address, store the data carried by the data write instruction at the first storage address of the first memory, and store the correspondence between the first storage address and the second storage address in the address memory. It should be noted that when the storage controller writes data for the first time, it can determine the first storage address of the data based on the currently free addresses of the first memory; this application does not limit the address allocation strategy of the storage controller.
  • the read address in the data read instruction is the second storage address of the data.
  • the storage controller can match the second storage address against the addresses in the address memory. If the address memory includes the second storage address, the corresponding first storage address can be obtained according to the second storage address of the data, and the data can then be read from the first memory.
  • the accelerator can read the data from the second memory according to the second storage address.
  • the accelerator can also determine a first storage address for the data, store the data read from the second memory into the first memory so that it can subsequently read the data directly from the first memory with higher reading and writing efficiency, and store the mapping relationship between the first storage address and the second storage address in the address memory.
  • the above implementation uses the address memory to record the correspondence between the first storage address and the second storage address, thereby realizing the addressing of the data to be cached. In this way, on receiving a data processing instruction carrying the second storage address, the accelerator can read the data from the first memory according to the addresses recorded in the address memory, and thus interact directly with the first memory, which has a faster read/write speed, improving the data reading and writing efficiency of the accelerator.
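The address-memory lookup described above can be sketched as follows (illustrative Python; `AddressMemory`, `read`, and `next_free` are hypothetical names, and a real implementation would allocate first-memory addresses according to its own strategy):

```python
# Sketch: the address memory maps second-memory addresses to first-memory
# addresses. On a read, the second storage address in the instruction is
# matched against this mapping; a hit yields the first storage address, a
# miss falls back to the second memory and records a new mapping.

class AddressMemory:
    def __init__(self):
        self.second_to_first = {}   # second storage address -> first storage address

    def lookup(self, second_addr):
        return self.second_to_first.get(second_addr)

    def record(self, second_addr, first_addr):
        self.second_to_first[second_addr] = first_addr

def read(second_addr, addr_mem, first_memory, second_memory, next_free):
    first_addr = addr_mem.lookup(second_addr)
    if first_addr is not None:            # hit in the address memory
        return first_memory[first_addr]
    data = second_memory[second_addr]     # miss: read the slow second memory
    first_memory[next_free] = data        # cache for subsequent reads
    addr_mem.record(second_addr, next_free)
    return data
```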
  • the accelerator includes a state information memory, which is used to store state information of the data; the state information includes a first state and a second state.
  • the first state is used to indicate that the data is in a modified state and is being written to the first memory or the second memory. At this time, the data cannot be read from the first memory, otherwise the wrong version of data may be read.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory at this time.
  • when the accelerator processes a data write instruction, if the state information of the data is the first state, that is, the data is currently being modified, the accelerator can first set the state information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory for data updating.
  • the data in the data processing instruction can be written to the first memory for data updating, and the data can also be written into the second memory for data updating.
  • when the accelerator processes a data read instruction, if the address memory does not include the read address, that is, the data does not exist in the first memory, the accelerator sets the state information of the data to the first state and reads the data from the second memory. After the data is retrieved, if the state information of the data is still the first state, the accelerator sets it to the second state and stores the data at the first storage address of the first memory. If the state information has already changed to the second state, it means that the storage controller has updated the data to the latest version, and the retrieved data can no longer be stored in the first memory. It should be understood that if a new version of the data is written into the first memory and the second memory during the read process, the state information ensures that the cached data is the latest version.
  • if the address memory includes the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data can be read from the second memory. If the address memory includes the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory.
  • the above state information can be represented by binary characters.
  • for example, the first state is represented by the character "1" and the second state by the character "0".
  • the first state and the second state can also be distinguished by other identifiers, which this application does not specifically limit.
  • when the state information of the data is the first state, it means the data is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data may be read. When the state information of the data is the second state, it means the data is not in a modified state, and the data can be read from the first memory. Recording the state information of the data in the state information memory ensures that the user does not read a wrong version of the data and that the data recorded in the first memory is the latest version.
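The state-guarded read and write paths above can be sketched as follows (illustrative Python; all names are assumptions, and this sequential model only approximates the concurrent behaviour the application describes):

```python
# Sketch: a per-address state bit guards cache installs. "1" (first
# state) means the data is being fetched/modified; "0" (second state)
# means it is stable. A fetched value is installed into the first memory
# only if the state is still the first state when the read completes,
# i.e. no newer version was written in the meantime.

FIRST_STATE, SECOND_STATE = 1, 0

class StateMemory:
    def __init__(self):
        self.state = {}   # second storage address -> state bit

    def get(self, addr):
        return self.state.get(addr, SECOND_STATE)

    def set(self, addr, value):
        self.state[addr] = value

def read_miss(addr, states, first_memory, second_memory):
    states.set(addr, FIRST_STATE)         # mark: data is being fetched
    data = second_memory[addr]            # slow read from the second memory
    if states.get(addr) == FIRST_STATE:   # no newer write arrived meanwhile
        states.set(addr, SECOND_STATE)
        first_memory[addr] = data         # safe to install in the first memory
    return data

def write(addr, data, states, first_memory, second_memory):
    # A write installs the new version in both memories and leaves the
    # entry in the stable (second) state.
    first_memory[addr] = data
    second_memory[addr] = data
    states.set(addr, SECOND_STATE)
```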
  • a data processing device in a second aspect, includes a first memory.
  • the data processing device includes: an acquisition unit configured to acquire a data processing instruction, where the instruction includes the address of data in the second memory.
  • the first memory is used to store the data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing device.
  • the accelerator stores the data to be cached in the second memory into the first memory, where the reading and writing efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions the accelerator can interact directly with the first memory with higher reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the data processing device includes a status information memory.
  • the status information memory is used to store status information of the data.
  • the state information includes a first state and a second state, where the first state is used to indicate that the data is in a modified state, and the second state is used to indicate that the data is not in a modified state.
  • the data processing device includes an address memory, and the address memory is used to store a correspondence between the address of the data in the first memory and the address of the data in the second memory.
  • the data processing instruction includes a data write instruction; the processing unit is configured to, when the state information of the data is the first state and the data processing instruction is a data write instruction, set the state information of the data to the second state, read the data from the first memory, update it according to the data processing instruction, and store the updated data in the first memory.
  • the data processing instruction includes a data read instruction; the processing unit is configured to determine, before the data processing device obtains the data, that the address in the data processing instruction is not included in the address memory; the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and the processing unit is configured to, when the state information of the data is the first state, set it to the second state and store the data at the first storage address of the first memory.
  • the device further includes a configuration unit configured to configure the first memory and obtain configuration information of the first memory.
  • the configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity.
  • the cache switch state is used to indicate whether the data processing device uses the first memory
  • the cache address range is used to instruct the data processing device to store data whose storage address falls within the cache address range in the first memory
  • the cache capacity is used to indicate the capacity of the first memory.
  • the data processing device is implemented through ASIC or FPGA technology
  • the first memory includes one or more of SRAM, register, SCM, and CAM.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • an accelerator in a third aspect, includes a processor and a power supply circuit.
  • the power supply circuit is used to supply power to the processor.
  • the processor is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • a computing device in a fourth aspect, includes a CPU and an accelerator.
  • the CPU is used to run instructions to implement business functions of the computing device.
  • the accelerator is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • FIG. 2 is a schematic structural diagram of another computing device provided by this application.
  • FIG. 3 is a schematic structural diagram of an accelerator provided by this application.
  • Figure 4 is a schematic structural diagram of a data processing device provided by this application.
  • Figure 5 is a schematic flowchart of steps in an application scenario of a data processing method provided by this application.
  • Figure 6 is a schematic flowchart of steps for processing data reading instructions in a data processing method provided by this application;
  • FIG. 7 is a schematic flowchart of steps for processing data writing instructions in a data processing method provided by this application.
  • This application provides a computing device.
  • the accelerator in the computing device includes a first memory.
  • the accelerator stores the data to be cached in the second memory in the first memory, where the read and write efficiency of the first memory is greater than the read and write efficiency of the second memory, so that when the accelerator processes the data processing instructions, it can directly communicate with The first memory with higher reading and writing efficiency interacts with each other, thereby improving the data reading and writing efficiency of the accelerator.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140 and a storage medium 150.
  • communication connections can be established between the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 through a bus.
  • the number of the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 may be one or more, and is not specifically limited in this application.
  • the processor 110 and the accelerator 120 may be hardware accelerators or a combination of hardware accelerators.
  • the above-mentioned hardware accelerator is an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (GAL), or any combination thereof.
  • the processor 110 is used to execute instructions in the storage medium 150 to implement the business functions of the computing device 100 .
  • the processor 110 can be a central processing unit (CPU), and the accelerator 120 (also called an accelerated processing unit (APU)) can be a system-level chip implemented through FPGA, ASIC and other technologies.
  • the accelerator 120 is a processing unit in the computing device 100 for assisting the CPU in processing special types of computing tasks.
  • the special types of computing tasks may be graphics processing, vector calculations, machine learning, etc.
  • the accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator 120 may also be a CPU.
  • the computing device 100 may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120. It should be understood that the above examples are for illustration. There are no specific limitations in this application.
  • the storage medium 150 is a carrier for storing data, such as a hard disk, a USB flash drive, flash memory, an SD card (secure digital memory card), a memory stick, etc.
  • the hard disk can be a hard disk drive (HDD), a solid state drive (SSD), etc., and is not specifically limited in this application.
  • the storage medium 150 may include a second memory, and in a specific implementation, the second memory may be DDR.
  • the communication interface 140 is a wired interface (such as an Ethernet interface), an internal interface (such as a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless LAN interface) for communicating with other servers or units.
  • Bus 160 is a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, etc.
  • the bus 160 can be divided into an address bus, a data bus, a control bus, etc. For clarity of explanation, the various buses are all marked as the bus 160 in the figure.
  • the second memory 130 includes volatile memory or non-volatile memory, or both volatile and non-volatile memory.
  • the volatile memory may be random access memory (RAM). By way of example and not limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the accelerator 120 may also include a first memory 121, where the data reading and writing efficiency of the first memory 121 is greater than the data reading and writing efficiency of the second memory 130.
  • the first memory 121 may include static random access memory (static RAM, SRAM), storage class memory (storage class memory, SCM), registers, content-addressable memory (content-addressable memory, CAM), etc., which this application does not specifically limit.
  • the data to be cached stored in the second memory 130 can be cached in the first memory 121, improving read and write efficiency. In this way, when the accelerator 120 needs to access the data, it can read the data from the high-performance first memory 121, thereby making up for the difference in processing speed between the accelerator 120 and the low-performance memory and improving the data reading and writing efficiency of the accelerator 120.
  • the second memory 130 in Figure 1 can be the original memory inside the computing device 100
  • the first memory 121 can be the original memory inside the accelerator 120.
  • This application uses the original memory inside the computing device 100 and the accelerator 120 together with a caching algorithm to improve the data reading and writing efficiency of the accelerator 120, without the need to deploy additional cache hardware, reducing the implementation cost of the solution and making it easier to implement.
  • Figure 1 is an exemplary division method of the present application.
  • the second memory 130 can also be deployed inside the accelerator 120.
  • if the accelerator 120 itself includes at least two memories, one of which has higher reading and writing efficiency than the other, then the memory with lower reading and writing efficiency inside the accelerator 120 can be used as the second memory 130, and the memory with higher reading and writing efficiency can be used as the first memory 121.
  • the communication between the first memory 121 and the second memory 130 may be off-chip communication.
  • the bus can be an off-chip bus.
  • the off-chip bus here generally refers to the public information channel between the CPU and external devices, such as the above-mentioned PCIe bus, EISA bus, UB bus, CXL bus, CCIX bus, GenZ bus, etc., which this application does not specifically limit.
  • the communication between the first memory 121 and the second memory 130 may be on-chip communication, and the bus between the first memory 121 and the second memory 130 may be an on-chip bus, such as an advanced extensible interface (AXI) bus, an advanced microcontroller bus architecture (AMBA) bus, etc., which are not specifically limited in this application.
  • the data to be cached stored in the second memory 130 can be cached in the first memory 121.
  • when the accelerator 120 needs to access the data in the second memory 130, it can read the data from the first memory 121, which has higher reading and writing efficiency, thereby making up for the difference in processing speed between the accelerator 120 and the low-performance memory and improving the data reading and writing efficiency of the accelerator 120.
  • first memory 121 and the second memory 130 in Figure 2 are the original memories inside the accelerator 120.
  • the data reading and writing efficiency of the accelerator 120 is improved through algorithms, without the need to deploy additional caching hardware, reducing the implementation cost of the solution and making it easier to implement.
  • Figure 3 is a schematic structural diagram of an accelerator provided by this application; Figure 3 is an exemplary division method.
  • the accelerator 120 may include a storage controller 122, a first memory 121, a status information memory 123 and an address memory 124, where a communication connection is established between the storage controller 122, the first memory 121, the status information memory 123 and the address memory 124 through an internal bus.
  • for the internal bus, reference can be made to the description of the bus 160, which will not be repeated here.
  • the accelerator 120 may also include a power supply circuit, and the power supply circuit may provide power to the memory controller 122 .
  • the memory controller 122 can be implemented by a hardware logic circuit, for example, an application specific integrated circuit (ASIC) to implement various functions of the accelerator 120 .
  • the power supply circuit may be located in the same accelerator as the storage controller 122, or may be located in another accelerator other than the accelerator where the storage controller 122 is located.
  • the power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management accelerator, a power management processor, or a power management control circuit.
  • accelerator 120 is an independent accelerator.
  • the first memory 121 is used to store data
  • the status information memory 123 is used to store the status information of the data
  • the address memory 124 is used to store the address information of the data.
  • the first memory 121 may be a memory with a greater reading and writing efficiency than the second memory 130 , such as SRAM; the storage space required for status information is small, but it needs to be synchronized with the status of the data, so the status information memory 123 can be a register (register); the address memory 124 can be a CAM.
  • the CAM is a content-addressed memory whose working mechanism is to compare an input data item with all data items stored in the CAM to determine whether the input data item matches any stored data item, so the address memory 124 used to store the address information of the data can be implemented using the CAM in the accelerator 120. In this way, when the user requests to access data, the CAM can match the address of the data requested by the user with the address information stored in the CAM; if it matches, it means that the data has been stored in the first memory 121. It should be understood that the above examples are for illustration, and the address memory 124 can also be implemented using other memories, which is not specifically limited in this application.
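The CAM-style lookup described above can be sketched in Python. This is an illustration only, not the patent's implementation; the class `AddressMemory` and its method names are hypothetical:

```python
class AddressMemory:
    """Sketch of the address memory 124: maps second-memory addresses
    (the search key, as a CAM would match it) to first-memory addresses."""

    def __init__(self):
        self.entries = {}  # second_address -> first_address

    def store(self, second_address, first_address):
        # Record that the data at second_address is cached at first_address.
        self.entries[second_address] = first_address

    def match(self, second_address):
        # A real CAM compares the input against every stored entry in
        # parallel; a hit means the data is already in the first memory 121.
        return self.entries.get(second_address)  # None on a miss

am = AddressMemory()
am.store("add3", 0x10)
assert am.match("add3") == 0x10   # hit: data cached in first memory
assert am.match("add7") is None   # miss: data only in second memory
```

A hardware CAM performs this comparison against all entries in a single cycle; the dictionary here only models the hit/miss behavior, not the parallelism.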
  • the first memory 121 in this application is implemented using the memory in the accelerator 120, and does not require additional cache deployment.
  • the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, and the software implementation only needs the online programming function of the FPGA or ASIC, which can solve the problem of the accelerator's read and write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
  • the storage controller 122 can obtain a data processing instruction, where the data processing instruction includes an address of the second memory 130, read the data from the first memory according to the address of the second memory 130, and then process the data according to the data processing instruction.
  • data processing instructions may be data processing instructions generated by the accelerator 120 during business processing, or may be data processing instructions sent by the processor 110 to the accelerator 120, which is not specifically limited in this application.
  • the storage controller 122 may store data in the second memory 130, and then store the data to be cached in the second memory 130 in the first memory 121.
  • the storage controller 122 can update the data to be cached to the first memory 121 in real time, or can update the data to be cached to the first memory 121 with a delay or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator 120 . In this way, when the accelerator 120 accesses the data again, it can directly interact with the first memory 121 that has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator 120 .
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory 121 is the data with higher access frequency stored in the second memory 130. Since the first memory has faster reading and writing efficiency, the accelerator 120 can interact directly with the first memory 121 for the frequently accessed data, thereby improving the reading and writing efficiency of the accelerator 120.
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
  • when the amount of data in the first memory 121 reaches the storage threshold, the data in the first memory 121 whose access frequency is not higher than the second threshold can be deleted, and then the data accessed by the accelerator 120 from the second memory 130 can continue to be stored in the first memory 121, so that the data stored in the first memory 121 is data with higher access frequency accessed by the accelerator 120.
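The frequency-based eviction policy described above can be sketched as follows; this is a minimal illustration, and the function name and data layout are assumptions, not the patent's design:

```python
def evict_cold_entries(access_counts, second_threshold):
    """Delete entries whose access frequency is not higher than the
    second threshold, keeping only frequently accessed data cached."""
    return {addr: count for addr, count in access_counts.items()
            if count > second_threshold}

# Hypothetical per-address access counters for data in the first memory.
cache_stats = {"add0": 12, "add1": 2, "add2": 7}
kept = evict_cold_entries(cache_stats, second_threshold=2)
assert kept == {"add0": 12, "add2": 7}  # add1 was accessed too rarely
```

In hardware this would run when the first memory reaches its storage threshold, freeing slots for new data fetched from the second memory.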
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
  • the data to be cached may also be data recently accessed by the accelerator 120, where "recently" may refer to data accessed by the accelerator 120 within a time range, and the time range here may be determined according to the storage capacity of the first memory 121. Specifically, when the amount of data in the first memory 121 reaches the storage threshold, the access times of the data in the first memory 121 are sorted, and the data whose access time is furthest from the current time is deleted, and so on, ensuring that the data stored in the first memory 121 is the most recently accessed data.
  • the data recently accessed by the accelerator 120 may be the data accessed by the accelerator 120 within a preset time range. That is to say, if the current time is T, the preset time range may be the period from time T-t to time T, where the size of t can be determined according to specific application scenarios and is not specifically limited in this application.
  • the possibility of recently accessed data in the second memory 130 being accessed again by the accelerator 120 is very high, so it is stored in the first memory 121; when the accelerator 120 accesses the data again, it can interact directly with the faster first memory 121, thereby improving the data reading and writing efficiency of the accelerator 120.
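The recency-based policy amounts to least-recently-used (LRU) eviction. A minimal sketch, assuming a software model of the first memory (the class name and capacity handling are illustrative):

```python
from collections import OrderedDict

class RecencyCache:
    """Keeps only the most recently accessed entries, evicting the entry
    whose access time is furthest from the present when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered oldest-access first

    def access(self, address, value):
        if address in self.entries:
            self.entries.move_to_end(address)  # refresh recency
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recent

c = RecencyCache(capacity=2)
c.access("add0", b"a")
c.access("add1", b"b")
c.access("add2", b"c")                 # exceeds capacity, add0 evicted
assert list(c.entries) == ["add1", "add2"]
```

Tracking order of access replaces the explicit sort over access timestamps described in the text, but yields the same eviction victim.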
  • the data to be cached may also include prefetched data.
  • the storage controller 122 may determine the data that the accelerator 120 may access through a prefetching algorithm, extract it from the second memory 130 in advance, and store it in the first memory 121. In this way, when the accelerator 120 requests to read the data, it can interact directly with the first memory 121, thereby improving the reading and writing efficiency of the accelerator 120.
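The patent does not specify the prefetching algorithm; as one hedged example, a simple sequential (stride-based) prefetcher could look like this, where all names and the stride policy are assumptions:

```python
def prefetch_addresses(last_address, stride, count):
    """Guess the next addresses the accelerator will touch, so they can
    be fetched into the first memory before they are requested."""
    return [last_address + stride * i for i in range(1, count + 1)]

def fill_cache(cache, second_memory, addresses):
    # Copy each predicted address from the second memory into the cache
    # (the model of the first memory 121) ahead of time.
    for addr in addresses:
        if addr in second_memory:
            cache[addr] = second_memory[addr]

second_memory = {0x100: "d0", 0x104: "d1", 0x108: "d2"}
cache = {}
fill_cache(cache, second_memory, prefetch_addresses(0x100, stride=4, count=2))
assert cache == {0x104: "d1", 0x108: "d2"}  # prefetched before any request
```

Any other predictor (history tables, access-pattern detection) could replace `prefetch_addresses` without changing the fill step.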
  • the data to be cached can also include more types of data, which can be specifically determined according to the application scenario of the accelerator 120, which is not specifically limited in this application.
  • before the storage controller 122 stores data in the first memory 121, the user can configure the storage controller 122.
  • the specific configuration content may include configuring the cache switch, configuring the address range, configuring the cache depth, etc.
  • when the cache switch is configured to be on it means that the first memory 121 is used for data storage, and when it is set to off, it means that the first memory 121 is not used for data storage.
  • the address range configured as the target address range means that the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the cache depth configuration of D means that the storage capacity of the first memory 121 is D.
  • the user can also perform other configurations on the storage controller 122, which can be determined based on the actual business environment, and is not specifically limited in this application.
  • for example, if the cache depth is configured as 2M, the cache switch is configured as on, and the address range is configured as add0 to add5, then the data that the accelerator 120 requests to write to add0 to add5 will first be cached in the first memory 121.
  • the data of add0 to add5 can also be written to the second memory 130 with a delay, while data requested to be written to other addresses in the second memory 130 can be directly written to the second memory 130 .
  • the cache switch can also be turned off. It should be understood that the above examples are for illustration and are not specifically limited in this application.
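The configuration items above (cache switch, address range, cache depth) can be modeled as a small record; this is a sketch under assumed field names, not the patent's register layout:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative model of the user configuration of the storage
    controller 122; all field names are hypothetical."""
    switch_on: bool
    addr_low: int
    addr_high: int   # inclusive target address range in the second memory
    depth: int       # storage capacity of the first memory 121

    def should_cache(self, address):
        # Data is cached only when the switch is on and the address falls
        # inside the configured range of the second memory 130.
        return self.switch_on and self.addr_low <= address <= self.addr_high

# Example matching the text: switch on, range add0..add5, depth 2M.
cfg = CacheConfig(switch_on=True, addr_low=0, addr_high=5, depth=2 * 1024 * 1024)
assert cfg.should_cache(3)        # within add0..add5: cached first
assert not cfg.should_cache(9)    # other addresses go straight to DDR
cfg.switch_on = False
assert not cfg.should_cache(3)    # switch off disables the cache entirely
```

Writes to addresses outside the range bypass the first memory, matching the behavior described for add0 to add5 versus other addresses.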
  • the data processing instructions obtained by the storage controller 122 may be data read instructions or data write instructions.
  • the data write instructions include the second storage address of the data.
  • the second storage address is the storage address of the second memory 130. The storage controller 122 can first determine whether the cache switch is on. If the cache switch is on, it determines whether the address carried by the data processing instruction is within the address range configured by the user. If it is within the configured address range, the storage controller 122 can store the data in the first memory 121; otherwise, the data can be stored in the second memory 130.
  • similarly, when the storage controller 122 obtains a data read instruction, if it determines that the cache switch is on and the read address is within the address range configured by the user, it reads the data from the first memory 121; otherwise, the data is read from the second memory 130, which will not be described again here.
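The routing decision just described, switch check, range check, then memory selection, can be condensed into one function. This is an interpretive sketch, not the controller's actual logic:

```python
def route_access(switch_on, in_range, hit_in_first_memory, is_read):
    """Decide which memory serves a data processing instruction,
    following the checks described in the text."""
    if not (switch_on and in_range):
        return "second_memory"      # cache bypassed entirely
    if is_read and not hit_in_first_memory:
        return "second_memory"      # read of data not yet cached
    return "first_memory"

assert route_access(True, True, True, is_read=True) == "first_memory"
assert route_access(True, True, False, is_read=True) == "second_memory"
assert route_access(False, True, True, is_read=True) == "second_memory"
assert route_access(True, True, False, is_read=False) == "first_memory"
```

Note that a write inside the configured range goes to the first memory even when the address is not yet cached, since the controller allocates a first-memory slot for it.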
  • the storage controller 122 may store the correspondence between the first storage address and the second storage address of the data in the address memory 124 .
  • the write address in a data write instruction is the second storage address of the data. If the address memory 124 already stores a correspondence between this second storage address and a first storage address, it means that a historical version of the data has been written to the first storage address of the first memory 121. At this time, the historical version of the data corresponding to the first storage address can be updated, and the data can be sent to the second memory 130 to request updating of the data corresponding to the second storage address.
  • otherwise, the storage controller 122 can first determine the first storage address corresponding to the second storage address, then store the data carried in the data write instruction at the first storage address of the first memory 121, and store the correspondence between the first storage address and the second storage address of the data in the address memory 124. It should be noted that when the storage controller 122 writes data for the first time, it can determine the first storage address of the data based on the current free addresses of the first memory 121; this application does not limit the address allocation strategy of the storage controller 122.
  • when the storage controller 122 obtains a data read instruction, the read address in the data read instruction is the second storage address of the data. At this time, the storage controller 122 can match the second storage address against the addresses in the address memory 124. If the address memory 124 includes the second storage address, the corresponding first storage address can be obtained according to the second storage address of the data, and the data can then be read from the first memory 121.
  • otherwise, the storage controller 122 can read the data from the second memory 130 according to the second storage address. At the same time, the storage controller 122 can also determine a first storage address for the data, store the data read from the second memory 130 into the first memory 121 so that the accelerator 120 can subsequently read the data directly from the first memory 121, and store the mapping relationship between the first storage address and the second storage address of the data in the address memory 124.
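The read path just described, hit returns the cached copy, miss installs the data and records the mapping, can be sketched as follows; the function signature and allocator are assumptions for illustration:

```python
def read(address_memory, first_memory, second_memory, second_address,
         allocate_first_address):
    """Read-path sketch: a hit is served from the first memory; a miss
    reads the second memory, installs the data in the first memory and
    records the address mapping so the next read is served fast."""
    first_address = address_memory.get(second_address)
    if first_address is not None:
        return first_memory[first_address]            # cache hit
    data = second_memory[second_address]              # cache miss
    first_address = allocate_first_address()          # pick a free slot
    first_memory[first_address] = data
    address_memory[second_address] = first_address    # record mapping
    return data

address_memory, first_memory = {}, {}
second_memory = {0x40: "payload"}
alloc = iter(range(100)).__next__   # trivial free-slot allocator
assert read(address_memory, first_memory, second_memory, 0x40, alloc) == "payload"
assert address_memory == {0x40: 0}  # mapping installed on the miss
assert read(address_memory, first_memory, second_memory, 0x40, alloc) == "payload"
```

The second call hits in `first_memory` without touching `second_memory`, which is the efficiency gain the text describes.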
  • the accelerator 120 includes a status information memory 123.
  • the status information memory 123 is used to store the status information of the data.
  • when processing a data write instruction, the storage controller 122 may first determine the status information of the data, and write the data according to the status information.
  • the status information includes a first state and a second state. The first state is used to indicate that the data is in a modification state and is being written to the first memory 121 or the second memory 130. At this time, the data cannot be read from the first memory 121 or the second memory 130.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory 121 at this time.
  • when the accelerator 120 processes a data write instruction, if the status information of the data is the first state, that is, the data is currently being modified, the accelerator can first set the status information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory 130 for data update.
  • if the status information of the data is the second state, the data in the data processing instruction can be written to the first memory 121 for data update, and the data is also written into the second memory 130 for data update.
  • when the accelerator 120 processes a data read instruction, if the address memory does not include the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the status information of the data to the first state and reads the data from the second memory 130. After retrieving the data, if the status information of the data is still the first state, the accelerator 120 sets the status information of the data to the second state and stores the data at the first storage address of the first memory 121. If the status information has already changed to the second state, it means that the storage controller 122 has written a newer version of the data, and the retrieved data should no longer be stored in the first memory 121. It should be understood that if a new version of the data is written into the first memory 121 and the second memory 130 during the reading process, the status information ensures that the cached data is the latest version.
  • if the address memory 124 includes the read address and the status information is the first state, that is, a previous version of the data is being operated on, the latest version of the data can be read from the second memory 130. If the address memory 124 includes the read address and the status information is the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory 121.
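The state-dependent read selection can be sketched as follows. This is an illustrative reading of the rule above ("1" for the first state, "0" for the second, as the text later suggests), not the controller's circuit:

```python
MODIFYING, STABLE = "1", "0"   # first state / second state

def read_with_state(in_address_memory, state, first_copy, second_copy):
    """If the entry exists but is in the first (modifying) state, the
    first-memory copy may be stale, so serve the second memory; in the
    second (stable) state the first-memory copy is the latest version."""
    if in_address_memory and state == STABLE:
        return first_copy       # cached copy is current
    return second_copy          # being modified, or not cached at all

assert read_with_state(True, STABLE, "v2", "v2") == "v2"     # first memory
assert read_with_state(True, MODIFYING, "v1", "v2") == "v2"  # second memory
assert read_with_state(False, STABLE, None, "v2") == "v2"    # not cached yet
```

The state bit thus acts as a one-bit consistency guard between the two copies of each cached datum.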
  • the above state information can be represented by binary characters; for example, the first state is represented by the character "1" and the second state by the character "0". The first state and the second state can also be distinguished by other identifiers, which this application does not specifically limit.
  • the above-mentioned accelerator 120 can also be a CPU
  • the first memory can be a memory in the CPU, such as an SRAM in the CPU
  • the storage controller 122 can cache data in combination with the multi-level cache architecture of the CPU, for example as the third-level or fourth-level cache of the CPU, enabling more levels of CPU cache and reducing the implementation cost of multi-level caching while reducing hardware complexity.
  • the data processing instructions not only include the above-mentioned data read instructions and data write instructions, but also include other instructions for performing business processing after reading the data. For example, after the accelerator 120 reads the data from the first memory, the data can be updated, deleted, merged, etc., and multiple pieces of read data can also be calculated and processed, for example with matrix multiplication or convolution operations. The details can be determined according to the processing business of the accelerator 120; this application does not specifically limit the process of subsequent processing of the data.
  • FIG 4 is a schematic structural diagram of a data processing device provided by this application.
  • the data processing device 400 can be the accelerator 120 in Figure 3.
  • the data processing device 400 may include a configuration module 1221, an acquisition module 1225 and a processing module 1226, wherein the processing module 1226 may include a search and write data update module 1222, a read data return processing module 1223, and a read data update module 1224. It should be understood that each of the above modules may correspond to a circuit module in an ASIC or FPGA.
  • the functions of the acquisition module 1225, the configuration module 1221 and the processing module 1226 in Figure 4 can be implemented by the storage controller 122 in Figure 3. The data processing device 400 shown in Figure 4 corresponds to the division method of the application scenario shown in Figure 1, that is, the application scenario in which the second memory 130 is deployed outside the accelerator. It should be understood that in the application scenario shown in Figure 2, the second memory 130 is deployed inside the data processing device 400, which will not be described again here.
  • the acquisition module 1225 is used to acquire data processing instructions generated by the accelerator 120, which may include data writing instructions and data reading instructions.
  • the configuration module 1221 is used to receive configuration information input by the user.
  • the configuration information may include information configuring the cache switch, information configuring the address range, and information configuring the cache depth. When the information configuring the cache switch is on, it means the first memory 121 is used for data storage; when it is off, it means the first memory 121 is not used for data storage.
  • the information configuring the address range includes the target address range, that is, the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the information configuring the cache depth includes the cache depth D, that is, the storage capacity of the first memory 121 is D.
  • the search and write data update module 1222 is used to obtain the data write instructions generated by the accelerator 120 and process them. Specifically, the search and write data update module 1222 may first query whether the second storage address exists in the address memory 124 according to the second storage address of the data carried in the data write instruction. If the second storage address exists in the address memory 124, it then queries whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the first state of the data to the second state, reads the data from the first memory 121 according to the first storage address and updates it, rewrites the updated data into the first memory 121, and then writes the updated data into the second memory 130. If it is the second state, the data is written directly into the first memory 121 and then into the second memory 130.
  • if the second storage address does not exist in the address memory 124, the first storage address corresponding to the second storage address is determined according to the current storage capacity of the first memory 121, the correspondence between the first storage address and the second storage address is stored in the address memory 124, and the data is stored in the first memory 121 and the second memory 130.
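The write-path behavior of the search and write data update module can be sketched in software. A minimal sketch under assumed names, with "1"/"0" for the first/second state as in the text:

```python
def handle_write(second_address, data, address_memory, state_memory,
                 first_memory, second_memory, allocate):
    """Write-path sketch: known addresses are updated in place (clearing
    the modifying state first), new addresses get a slot and a mapping;
    the second memory is always updated as well."""
    if second_address in address_memory:
        if state_memory.get(second_address) == "1":  # first state
            state_memory[second_address] = "0"       # back to second state
        first_memory[address_memory[second_address]] = data
    else:
        first_address = allocate()                   # pick a free slot
        address_memory[second_address] = first_address
        first_memory[first_address] = data
    second_memory[second_address] = data             # keep DDR in sync

address_memory, state_memory, first_memory, second_memory = {}, {}, {}, {}
alloc = iter(range(100)).__next__
handle_write(0x20, "v1", address_memory, state_memory,
             first_memory, second_memory, alloc)     # first write: allocate
handle_write(0x20, "v2", address_memory, state_memory,
             first_memory, second_memory, alloc)     # second write: update
assert first_memory == {0: "v2"} and second_memory == {0x20: "v2"}
```

Both copies end at the newest version, which is the invariant the state bit is protecting in the hardware flow.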
  • the read data return processing module 1223 and the read data update module 1224 are used to obtain the data read instructions generated by the accelerator 120 and process the data read instructions. Among them, the read data return processing module 1223 is mainly used to read data, and the read data update module 1224 is mainly used to update data.
  • the read data return processing module 1223 can first query whether the second storage address exists in the address memory 124 according to the second storage address of the data carried in the data read instruction. If the second storage address exists in the address memory 124, it then queries whether the status information of the data in the status information memory 123 is the first state. If it is the first state, the data is being modified, so the data can be read from the second memory 130; if it is the second state, the data is not being modified, so the data can be read from the first memory 121.
  • if the second storage address does not exist in the address memory 124, the read data update module 1224 can first set the status information of the data in the status information memory 123 to the first state, then determine the first storage address corresponding to the second storage address according to the storage capacity of the first memory 121, and store the correspondence between the first storage address and the second storage address in the address memory 124.
  • after the read data return processing module 1223 reads the data from the second memory, the read data update module 1224 determines whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the state information to the second state and then updates the data to the first memory; if it is the second state, it does not update the data to the first memory. It should be understood that if it is the second state, it means that the search and write data update module 1222 has updated the data in the first memory during this period, so there is no need to update the data to the first memory.
  • Figure 4 is an exemplary division method.
  • the memory controller 122 provided in this application can also be divided into more modules.
  • for example, the step of searching the address memory 124 for the first storage address corresponding to a data processing instruction is implemented by the search and write data update module 1222.
  • the search and write data update module 1222 can also be further divided into a search module and a write data update module, which is not specifically limited in this application.
  • the accelerator provided by the present application stores the data to be cached in the second memory in the first memory, where the read and write efficiency of the first memory is greater than that of the second memory, so that when reading and writing data, the accelerator can interact directly with the first memory, which has high reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the first memory is implemented by the memory in the accelerator, such as SRAM, registers, CAM, etc., without the need to deploy an additional cache, realizing the cache function of accelerators such as FPGAs and ASICs at very low hardware cost, which can solve the problem of the accelerator's read and write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
  • Figure 5 is a schematic flowchart of the steps of the data processing method provided by this application in an application scenario.
  • Figure 6 is a schematic flowchart of the steps of processing data reading instructions in a data processing method provided by this application.
  • Figure 7 is a schematic flowchart of the steps of processing data write instructions in a data processing method provided by this application. Simply put, Figure 6 is used to describe step 7 in Figure 5, and Figure 7 is used to describe step 8 in Figure 5. As shown in Figures 5 to 7, the method may include the following steps:
  • Step 1 Configure the storage controller 122. This step can be implemented by the configuration module 1221 in Figure 4.
  • the configuration content includes one or more of cache switch status, cache address range, and cache capacity, where the cache switch status is used to indicate whether the accelerator uses the first memory 121, and the cache address range is used to indicate that the accelerator will store Data whose address is within the cache address range is stored in the first memory 121 , and the cache capacity is used to indicate the capacity of the first memory 121 .
  • Step 2 The storage controller 122 obtains data processing instructions.
  • the data processing instruction may be a data read instruction or a data write instruction. This step can be implemented by the acquisition module 1225 in Figure 4.
  • Step 3 The storage controller 122 determines whether the cache switch is on. If the cache switch is on, step 4 is performed. If the cache switch is off, step 9 is performed. Wherein, when the cache switch is configured to be on, it means that the first memory 121 is used for data storage, and when it is set to off, it means that the first memory 121 is not used for data storage. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 4 The storage controller 122 determines whether the address range is set. If the address range is set, step 5 is performed. If the address range is not set, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
  • Optionally, when the address range is not set, since the cache switch is already on, that is, the user wants to use the cache function of the first memory 121, the user can be prompted to set the cache address range; alternatively, the processing flow of step 5 can be skipped and step 6 executed directly.
  • Step 5 The storage controller 122 determines whether the address is within the range, where the address is the address carried in the data access request received in step 2, and the range is the cache address range configured in step 1. If the address is within the configured cache address range, go to step 6; if it is not, go to step 9. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 6 The storage controller 122 determines whether the data processing instruction is a data read instruction. If it is a data read instruction, step 7 is performed. If it is not a data read instruction, step 8 is performed. This step can be implemented by the configuration module 1221.
  • Step 7 The storage controller 122 processes the data read instruction.
  • the process of processing the data read instruction in step 7 will be described in detail in the embodiment of Figure 6. This step can be implemented by the read data return processing module 1223 and the read data update module 1224 in Figure 4.
  • Step 8 The storage controller 122 processes the data write instruction. Among them, the process of processing the data writing instruction in step 8 will be described in detail in the embodiment of FIG. 7 . This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 9 The storage controller 122 issues a read request or write request to the second memory.
  • For example, if the second memory is DDR and the data access request is a data read instruction, the storage controller 122 can issue a DDR read request; if the data access request is a data write instruction, the storage controller 122 can issue a DDR write request.
  • The above example is for illustration and is not specifically limited in this application.
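  • Steps 3 to 6 above amount to a routing decision for each instruction: the cached path is taken only when the cache switch is on, an address range is configured, and the address falls inside it; everything else goes to the second memory (step 9). A minimal sketch of that decision, with illustrative names:

```python
def dispatch(switch_on, addr_range, addr, is_read):
    """Route a data processing instruction as in steps 3-9 (names are assumptions).

    Returns "step7" (cached read path), "step8" (cached write path),
    or "step9" (forward to the second memory, e.g. a DDR read/write request).
    """
    if not switch_on:                        # step 3: cache switch is off
        return "step9"
    if addr_range is None:                   # step 4: no cache address range set
        return "step9"
    low, high = addr_range
    if not (low <= addr <= high):            # step 5: address outside the cached range
        return "step9"
    return "step7" if is_read else "step8"   # step 6: read vs. write instruction

print(dispatch(True, (0x1000, 0x1FFF), 0x1234, is_read=True))    # step7
print(dispatch(True, (0x1000, 0x1FFF), 0x1234, is_read=False))   # step8
print(dispatch(False, (0x1000, 0x1FFF), 0x1234, is_read=True))   # step9
```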
  • Figure 6 is a flow chart of the processing steps of the data read instruction provided by this application. As shown in Figure 6, step 7 may include the following steps:
  • Step 71 Determine whether the same address is stored in the address memory 124, where the data read instruction carries the read address of the data, and that read address is an address of the second memory 130. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the address memory 124 may be a CAM, that is, a memory addressed by content.
  • The working mechanism of a CAM is to compare an input data item with all data items stored in the CAM to determine whether the input matches any stored item.
  • Therefore, the address memory 124, which stores the address information of the data, can be implemented using a CAM in the accelerator 120. In this way, when the user requests to read data, the CAM can match the read address against the address information it stores; if they match, the data has already been stored in the first memory 121.
  • Optionally, the address memory 124 can also be implemented using other memories, in which case the memory controller 122 implements the function of obtaining addresses from the address memory 124 and matching them against the address carried in the data read instruction. This application does not specifically limit this.
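  • The CAM behaviour described above, comparing an input address against every stored entry and reporting a match, can be modelled in software by an associative lookup. The sketch below is a behavioural model only (a dictionary standing in for the CAM hardware); the class and method names are assumptions.

```python
class AddressMemory:
    """Behavioural model of the address memory 124: it stores the correspondence
    between addresses of the second memory 130 and storage slots of the first
    memory 121, and answers CAM-style match queries."""

    def __init__(self):
        self._entries = {}  # second-memory address -> first-memory address

    def match(self, second_addr):
        """Compare `second_addr` with all stored entries; return the matching
        first-memory address on a hit, or None on a miss."""
        return self._entries.get(second_addr)

    def store(self, second_addr, first_addr):
        """Record a new address correspondence (as in steps 75 and 84)."""
        self._entries[second_addr] = first_addr

am = AddressMemory()
am.store(0x1234, 7)       # data at second-memory address 0x1234 is cached in slot 7
print(am.match(0x1234))   # 7    -> hit: the data is already in the first memory
print(am.match(0x9999))   # None -> miss: the data is not cached
```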
  • If step 71 determines that the same address exists in the address memory 124, the latest or a historical version of the data has already been stored in the first memory 121, and step 72 can be executed. If it does not exist, no version of this data is stored in the first memory 121, and step 75 can be performed.
  • Step 72 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the status information memory 123 may be a register, and step 72 may be implemented by the register itself determining whether the status information is the first state; alternatively, the status information memory 123 may implement only the storage function while the storage controller 122 implements the determination function, that is, the storage controller 122 obtains the status information of the data from the status information memory 123 and determines whether it is the first state. This application does not specifically limit this.
  • When the status information of the data is the first state (high), step 73 is executed; when it is the second state (low), step 74 is executed.
  • Step 73 Read data from the second memory 130.
  • Specifically, the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • It can be understood that when the status information of the data is the first state, referring to the foregoing content, the data may be in the process of being retrieved from the second memory 130; reading the data directly from the second memory 130 at this time can avoid read errors and improve the accuracy of data reading.
  • Step 74 Read data from the first memory. Specifically, the storage controller 122 may query the address memory 124 to determine the first storage address of the data in the first memory according to the read address carried in the data read instruction, and then read the data from the first storage address. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Step 75 Set the state information to the first state, and store the correspondence between the first storage address and the second storage address.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • the above corresponding relationship will be stored in the address memory 124.
  • This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 76 Read data from the second memory.
  • the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, then the data read instruction may be a DDR read request. This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 77 Determine whether the status information is the first status (high bit). If it is the first state, steps 78 to 79 are executed. If it is the second state (low level), steps 78 to 79 are not executed. This step can be implemented by the read data update module 1224 in Figure 4.
  • It can be understood that if the status information has changed to the second state (low), referring to the foregoing content, during the data write process the storage controller 122 modifies the status information of the data from the first state (high) to the second state (low). This means the data has been modified between step 75 and step 79, and the storage controller 122 is writing or has written the latest version of the data to the first memory 121; therefore, the data read at this time should not be written into the first memory 121, so as to avoid overwriting the newer version.
  • If the status information has not changed to the second state (low), the data has not been modified between step 75 and step 79, so the data read from the second memory 130 at this time can be updated into the first memory 121.
  • Step 78 Modify the status information to the second status (low). This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 79 Update the data to the first memory 121. This step can be implemented by the read data update module 1224 in Figure 4.
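  • Putting steps 71 to 79 together, the read path can be sketched as below. The two-state flag is what makes the miss path safe: step 75 raises it before fetching from the second memory, and step 77 installs the fetched data into the first memory only if no write lowered the flag in the meantime. All helper names are assumptions of this sketch.

```python
FIRST, SECOND = 1, 0  # status information: first state ("1"/high), second state ("0"/low)

def handle_read(addr, addr_mem, state_mem, first_mem, second_mem, alloc_slot):
    """Sketch of the read-instruction flow (steps 71-79)."""
    slot = addr_mem.get(addr)                 # step 71: CAM-style address match
    if slot is not None:
        if state_mem.get(addr) == FIRST:      # steps 72-73: a fetch is in flight,
            return second_mem[addr]           #   so read from the second memory
        return first_mem[slot]                # step 74: hit, read from the first memory
    slot = alloc_slot()                       # step 75: set the state and store the
    addr_mem[addr] = slot                     #   address correspondence
    state_mem[addr] = FIRST
    data = second_mem[addr]                   # step 76: read from the second memory
    if state_mem[addr] == FIRST:              # step 77: no write happened meanwhile
        state_mem[addr] = SECOND              # step 78
        first_mem[slot] = data                # step 79: update the first memory
    return data

addr_mem, state_mem, first_mem = {}, {}, {}
second_mem = {0x10: "v1"}
slots = iter(range(8))
read = lambda a: handle_read(a, addr_mem, state_mem, first_mem, second_mem, lambda: next(slots))
print(read(0x10))  # "v1": miss, fetched from the second memory and then cached
second_mem[0x10] = "changed-behind-cache"
print(read(0x10))  # "v1": hit, now served from the first memory
```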
  • Figure 7 is a flow chart of the processing steps of the data write instruction provided by this application. As shown in Figure 7, step 8 may include the following steps:
  • Step 81 Determine whether the same address exists.
  • Specifically, the data write instruction carries the write address of the data, and step 81 can determine whether the write address exists in the address memory 124.
  • this step can be implemented by the search and write data update module 1222 in Figure 4 .
  • If the same address exists, step 82 is executed; if the same address does not exist, step 84 is executed.
  • Step 82 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • If the status information is the first state, step 83, step 85 and step 86 are executed.
  • If the status information is the second state, step 85 and step 86 are executed.
  • Step 83 Set the status information to the second state.
  • the second state can also be represented by the low-order state or the "0" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • It can be understood that when the status information is the first state, the data is undergoing steps 75 to 78. The data received in step 83 is the latest version, so setting the status information to the second state prevents the old version of the data from steps 75 to 78 from overwriting the current new version.
  • After step 83, step 85 and step 86 can be executed.
  • Step 84 Store the correspondence between the first storage address and the second storage address in the address memory 124. It can be understood that when the address carried in the data write instruction does not exist in the address memory 124, no historical version of the data has been written to the first memory 121. Therefore, a write address in the first memory 121, namely the first storage address, can be allocated for the data, and the first storage address is then stored in the address memory 124. In this way, when the storage controller 122 receives a read request for the data, the first storage address of the data can be obtained through the address memory and the data can then be read, achieving data caching. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • After step 84, step 85 and step 86 can be executed.
  • Step 85 Write data to the first memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 86 Write data to the second memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
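  • The write path of steps 81 to 86 can be sketched in the same style. The key detail is step 83: lowering the state flag while a read-miss fill is in flight prevents the older data fetched by that fill from overwriting the data written here. Helper names are assumptions of this sketch.

```python
FIRST, SECOND = 1, 0  # status information: first state ("1"/high), second state ("0"/low)

def handle_write(addr, data, addr_mem, state_mem, first_mem, second_mem, alloc_slot):
    """Sketch of the write-instruction flow (steps 81-86)."""
    slot = addr_mem.get(addr)             # step 81: does the same address exist?
    if slot is not None:
        if state_mem.get(addr) == FIRST:  # step 82: a read-miss fill is in flight
            state_mem[addr] = SECOND      # step 83: cancel it so stale data cannot
    else:                                 #   overwrite this newer version
        slot = alloc_slot()               # step 84: allocate a first storage address
        addr_mem[addr] = slot             #   and store the correspondence
    first_mem[slot] = data                # step 85: write to the first memory
    second_mem[addr] = data               # step 86: write to the second memory

addr_mem, state_mem, first_mem, second_mem = {}, {}, {}, {}
slots = iter(range(8))
alloc = lambda: next(slots)
handle_write(0x20, "new", addr_mem, state_mem, first_mem, second_mem, alloc)
print(first_mem[addr_mem[0x20]], second_mem[0x20])  # new new

# A write arriving while a read-miss fill is pending lowers the flag (step 83):
addr_mem[0x30], state_mem[0x30] = alloc(), FIRST
handle_write(0x30, "latest", addr_mem, state_mem, first_mem, second_mem, alloc)
print(state_mem[0x30] == SECOND)  # True
```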
  • In summary, the data processing method provided by this application stores the to-be-cached data of the second memory in the first memory, where the read-write efficiency of the first memory is greater than that of the second memory.
  • This allows the accelerator to interact directly with the first memory, which has higher read-write efficiency, when reading and writing data, thereby improving the data read-write efficiency of the accelerator.
  • Moreover, the first memory is implemented using memory already inside the accelerator, such as SRAM, registers, or CAM, without deploying an additional cache; the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, simply, efficiently and cost-effectively solving the problem of accelerator read-write efficiency being limited by the memory bandwidth bottleneck.
  • the embodiment of the present application provides an accelerator.
  • the accelerator includes a processor and a power supply circuit.
  • The power supply circuit is used to supply power to the processor, and the processor is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
  • Embodiments of the present application provide a computing device.
  • the computing device includes a CPU and an accelerator.
  • The CPU is used to run instructions to implement the business functions of the computing device, and the accelerator is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments are implemented in whole or in part in the form of a computer program product.
  • a computer program product includes at least one computer instruction.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, e.g., from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic cable, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • A computer-readable storage medium may be any medium that can be accessed by a computer, or a data storage node, such as a server or data center, containing at least one medium collection.
  • The medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., high-density digital video disc (DVD)), or a semiconductor medium.
  • The semiconductor medium may be a solid state drive (SSD).

Abstract

The present application provides a data processing method and apparatus and a related device. The method is applied to an accelerator. The accelerator comprises a first memory. The method comprises the following steps: the accelerator obtains a data processing instruction, wherein the data processing instruction comprises an address of data in a second memory, and the data read-write efficiency of the first memory is greater than that of the second memory; and the accelerator reads the data from the first memory according to the data processing instruction and processes the data, wherein the first memory is used for storing data to be cached of the second memory, and the data to be cached comprises data historically accessed by the accelerator, so that the accelerator can directly interact with the first memory having high read-write efficiency when reading and writing data. Therefore, the data read-write efficiency of the accelerator is improved, and the problem that the read-write efficiency of the accelerator is limited by the internal memory bandwidth bottleneck is solved.

Description

A data processing method, device and related equipment

This application claims priority to Chinese patent application No. 202210451716.8, entitled "A data processing method, device and related equipment" and filed with the China Patent Office on April 27, 2022, the entire contents of which are incorporated herein by reference.
Technical field

The present application relates to the field of computers, and in particular to a data processing method, device and related equipment.
Background

With the continuous development of science and technology, accelerators (also called acceleration processing units) such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) have taken on an increasingly important role in data processing, for example in matrix operations, image processing, and machine learning (ML). In these fields, accelerators have high requirements for the real-time reading and writing of data in memory. However, the frequency with which an accelerator accesses data in memory is currently unbalanced with the available bandwidth that the memory provides to the accelerator; when the accelerator needs to perform read and write operations on data in memory, it often has to wait an additional period of time, degrading the accelerator's processing performance.
Summary

This application provides a data processing method, device and related equipment, to solve the problem that the accelerator's frequency of access to data in memory is unbalanced with the available bandwidth that the memory provides to the accelerator, causing the accelerator's processing performance to degrade.
In a first aspect, a data processing method is provided. The method can be applied to an accelerator that includes a first memory, and may include the following steps: the accelerator obtains a data processing instruction, where the data processing instruction includes the address of data in a second memory, and the data read-write efficiency of the first memory is greater than that of the second memory; the accelerator reads the data from the first memory according to the data processing instruction and processes it, where the first memory is used to store the to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the accelerator.

By implementing the method described in the first aspect, the accelerator stores the to-be-cached data of the second memory in the first memory, whose read-write efficiency is greater than that of the second memory, so that when handling data processing instructions the accelerator can interact directly with the more efficient first memory, thereby improving the accelerator's data read-write efficiency.
In a possible implementation, the accelerator is implemented with application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes static random access memory (static RAM, SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), and so on, which is not specifically limited in this application.
In a specific implementation, the computing device where the accelerator is located may include a central processing unit (CPU) and the accelerator. The accelerator may be a system-on-chip implemented with FPGA, ASIC or similar technology; specifically, it may be a processing unit in the computing device 100 used to assist the CPU in handling special types of computing tasks, such as graphics processing, vector computation, or machine learning. The accelerator can be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), and so on. Optionally, the accelerator may also be a CPU; in other words, the computing device may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator. It should be understood that the above examples are for illustration and are not specifically limited in this application.

In a specific implementation, the first memory and the second memory may both be original internal memories of the accelerator; alternatively, the first memory is an original internal memory of the accelerator and the second memory is an original memory outside the accelerator, for example a memory in the computing device where the accelerator is located. It should be understood that with the method provided by this application, a suitable memory inside the accelerator can be selected as the first memory according to the specific application scenario: if the accelerator contains multiple memories, the one with lower read-write efficiency can serve as the second memory and the one with higher read-write efficiency as the first memory; if the accelerator does not contain multiple memories, or its memories have the same read-write efficiency, the memory inside the accelerator can serve as the first memory, and a memory outside the accelerator with lower read-write efficiency but larger capacity can serve as the second memory.

In the above implementation, the original memory hardware inside the accelerator is used, and the data read-write efficiency of the accelerator 120 is improved algorithmically without deploying additional cache hardware, reducing the cost of implementing the solution; the solution is especially practicable for accelerators with small hardware form factors implemented with ASIC or FPGA technology.
In a possible implementation, the data to be cached includes data whose access frequency is higher than a first threshold.

In a specific implementation, the accelerator may store data in the second memory and then store the to-be-cached data of the second memory in the first memory. The accelerator may update the to-be-cached data to the first memory in real time, with a delay, or according to a certain algorithm, which is not specifically limited in this application. The data to be cached includes data historically accessed by the accelerator; in this way, when the accelerator accesses such data again, it can interact directly with the faster first memory, improving the accelerator's data read-write efficiency.

Optionally, the data to be cached may include data whose historical access frequency is higher than the first threshold. That is, the data stored in the first memory is the frequently accessed data of the second memory; since the first memory reads and writes faster, the accelerator can interact directly with the first memory when processing frequently accessed data, improving its read-write efficiency. The size of the first threshold can be determined by the specific application scenario and is not specifically limited in this application.

Optionally, when the data in the first memory reaches a storage threshold, data in the first memory whose access frequency is not higher than a second threshold can be deleted, after which data the accelerator accesses in the second memory continues to be stored in the first memory; each time the storage threshold is reached, such low-frequency data is deleted again, so that the data stored in the first memory is frequently accessed data that has been accessed by the accelerator. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.

Optionally, the data to be cached may also be data recently accessed by the accelerator, where "recently" refers to data accessed by the accelerator within a time range that can be determined by the storage capacity of the first memory. Specifically, when the amount of data in the first memory reaches the storage threshold, the access times of the entries in the first memory are sorted and the entry whose access time is furthest from the current time is deleted, and so on, ensuring that the data stored in the first memory was accessed most recently.
Optionally, the data recently accessed by the accelerator may be the data accessed by the accelerator within a preset time range; that is, if the current time is T, the preset time range may be the interval from time T-t to time T, where the size of t can be determined by the specific application scenario and is not specifically limited in this application.

Optionally, the data to be cached may also include prefetched data. Simply put, the storage controller can use a prefetch algorithm to determine data the accelerator is likely to access, extract it from the second memory in advance, and store it in the faster first memory; then, when the accelerator requests to read that data, it can interact directly with the first memory, improving the accelerator's read-write efficiency.

It should be understood that the data to be cached may also include more types of data, which can be determined by the accelerator's application scenario and is not specifically limited in this application.

In the above implementations, since frequently accessed, recently accessed, or prefetched data is very likely to be accessed again by the accelerator, storing it in the first memory allows the accelerator to interact directly with the faster first memory when it accesses that data again, improving the accelerator's data read-write efficiency.
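As an illustration of the eviction described above (when the first memory reaches its storage threshold, the entry whose access time is furthest from the present is deleted), the behaviour can be sketched with a small least-recently-used cache. This is a software model only, and all names are assumptions.

```python
from collections import OrderedDict

class LRUFirstMemory:
    """Keeps at most `depth` cached entries; on overflow, the entry accessed
    longest ago is deleted, as in the access-time sorting described above."""

    def __init__(self, depth):
        self.depth = depth
        self._data = OrderedDict()  # second-memory address -> cached data

    def access(self, addr, fetch_from_second_memory):
        if addr in self._data:
            self._data.move_to_end(addr)        # refresh this entry's access time
            return self._data[addr]
        value = fetch_from_second_memory(addr)  # miss: read from the second memory
        self._data[addr] = value
        if len(self._data) > self.depth:        # storage threshold reached:
            self._data.popitem(last=False)      # evict the least recently accessed
        return value

mem = LRUFirstMemory(depth=2)
second_memory = {1: "a", 2: "b", 3: "c"}
for addr in (1, 2, 1, 3):  # address 2 becomes the least recently accessed
    mem.access(addr, second_memory.__getitem__)
print(list(mem._data))  # [1, 3]: entry 2 was evicted
```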
In a possible implementation, the accelerator configures the first memory and obtains configuration information of the first memory. The configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity, where the cache switch state indicates whether the accelerator uses the first memory, the cache address range indicates that the accelerator stores data whose storage address falls within the range in the first memory, and the cache capacity indicates the capacity of the first memory.

In a specific implementation, configuring the cache switch as on means the first memory is used for data storage, and off means it is not. Configuring the address range as a target address range means that the data stored in the first memory is a cache of the data stored in the target address range of the second memory. Configuring the cache depth as D means the storage capacity of the first memory is D.

In the above implementation, by configuring the first memory, the user can choose whether to enable the cache function of the storage controller and can set the address space and capacity of the cache according to business needs, making the solution of this application applicable to more application scenarios with better flexibility.
在一可能的实现方式中,加速器包括地址存储器,地址存储器用于存储数据在第一存储器中的地址与数据在第二存储器中的地址之间的对应关系。In a possible implementation, the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
可选地,数据处理指令可以是加速器生成的指令,也可以是加速器所在的计算设备中的CPU向其发送的指令,数据处理指令可以是数据写入指令或者数据读取指令,还可包括将数据读取出来后进行业务处理的其他指令,比如加速器将数据从第一存储器中读取出来后,可以对数据进行更新、删除、合并等操作,还可以对读取到的多个数据进行计算处理,比如矩阵乘法、卷积操作等等,具体可根据加速器的处理业务决定,本申请对数据后续进行处理的具体流程具体限定。Optionally, the data processing instructions may be instructions generated by the accelerator, or instructions sent to it by the CPU in the computing device where the accelerator is located. The data processing instructions may be data writing instructions or data reading instructions, and may also include Other instructions for business processing after the data is read out. For example, after the accelerator reads the data from the first memory, it can update, delete, merge and other operations on the data, and can also perform calculations on the multiple data read. Processing, such as matrix multiplication, convolution operations, etc., can be determined according to the processing business of the accelerator. This application specifically limits the specific process of subsequent data processing.
In a specific implementation, when the accelerator obtains a data write instruction, the write address in the data write instruction is the second storage address of the data. If the second storage address is already stored in the address memory in correspondence with a first storage address, a historical version of the data has already been written to the first storage address of the first memory. In this case, the historical version of the data corresponding to the first storage address may simply be updated, and the data is sent to the second memory to request an update of the data corresponding to the second storage address.
If the second storage address is not in the address memory, the data is being written to the first memory for the first time. The accelerator may first determine the first storage address corresponding to the second storage address, then store the data carried in the data write instruction at the first storage address of the first memory, and store the correspondence between the first storage address and the second storage address of the data in the address memory. It should be noted that, when writing data for the first time, the storage controller may determine the first storage address of the data based on the currently free addresses of the first memory; this application does not limit the address allocation policy of the storage controller.
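The write path described above can be sketched as follows. This is a simplified illustrative model rather than the patented implementation: the address memory is modeled as a dictionary, the two memories as dictionaries keyed by address, and the free-address allocator as a counter; all class and attribute names are assumptions.

```python
class WriteThroughCache:
    """Minimal model of the accelerator's write path: the address memory
    maps a second-memory address to a first-memory address, and a write
    updates both memories (write-through)."""

    def __init__(self):
        self.first_mem = {}    # fast memory, keyed by first storage address
        self.second_mem = {}   # slow memory, keyed by second storage address
        self.addr_map = {}     # second address -> first address ("address memory")
        self.next_free = 0     # trivial free-address allocator for first_mem

    def write(self, second_addr, data):
        if second_addr in self.addr_map:
            # Hit: a historical version already lives in the first memory.
            first_addr = self.addr_map[second_addr]
        else:
            # Miss: first write of this data; allocate a first storage
            # address and record the correspondence in the address memory.
            first_addr = self.next_free
            self.next_free += 1
            self.addr_map[second_addr] = first_addr
        self.first_mem[first_addr] = data
        # Also send the data to the second memory to keep it up to date.
        self.second_mem[second_addr] = data
```

For example, two writes to the same second storage address reuse the same first-memory slot instead of allocating a new one.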
Similarly, when the storage controller obtains a data read instruction, the read address in the data read instruction is the second storage address of the data. The storage controller may match the second storage address against the addresses in the address memory. If the address memory includes the second storage address, the corresponding first storage address may be obtained based on the second storage address of the data, and the data may then be read from the first memory.
If the second storage address is not stored in the address memory, the data is not stored in the first memory, and the accelerator may read the data from the second memory based on the second storage address. At the same time, the accelerator may also determine a first storage address for the data, store the data read from the second memory into the first memory so that subsequent accesses can be served directly from the first memory with its higher read/write efficiency, and store the mapping between the first storage address and the second storage address of the data in the address memory.
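The read path can be sketched in the same style. Again this is an illustrative model only: the backing second memory is supplied as a dictionary, and the structure and method names are assumptions.

```python
class ReadPathCache:
    """Minimal model of the accelerator's read path: a hit in the address
    memory is served from the fast first memory; a miss fetches the data
    from the second memory and installs it for later accesses."""

    def __init__(self, second_mem):
        self.second_mem = second_mem   # backing (slow) memory
        self.first_mem = {}
        self.addr_map = {}             # second address -> first address
        self.next_free = 0

    def read(self, second_addr):
        if second_addr in self.addr_map:
            # Address memory hit: read directly from the first memory.
            return self.first_mem[self.addr_map[second_addr]]
        # Miss: read from the second memory, then cache the data so the
        # next access hits the faster first memory.
        data = self.second_mem[second_addr]
        first_addr = self.next_free
        self.next_free += 1
        self.first_mem[first_addr] = data
        self.addr_map[second_addr] = first_addr
        return data
```

The first read of an address is a miss that populates the mapping; every later read of the same address is served from the first memory.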
In the foregoing implementation, the address memory records the correspondence between the first storage address and the second storage address, thereby enabling addressing of the data to be cached. In this way, when a data processing instruction carrying the second storage address is received, the data can be read from the first memory based on the addresses recorded in the address memory, so that the accelerator can interact directly with the faster first memory, improving the data read/write efficiency of the accelerator.
In a possible implementation, the accelerator includes a state information memory, and the state information memory is used to store state information of the data. The state information includes a first state and a second state. The first state indicates that the data is in a modified state, that is, it is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data might be read. The second state indicates that the data is not in a modified state, and the data may then be read from the first memory.
In a specific implementation, when the accelerator processes a data write instruction and the state information of the data is the first state, that is, the data is currently being modified, the accelerator may first set the state information of the data to the second state, then write the data to the first storage address, and then write the data to the second memory to update it.
Optionally, if the state information of the data is the second state, that is, no operation is currently being performed on the data, the data in the data processing instruction may be written to the first memory to update it, and the data may also be written to the second memory to update it.
In a specific implementation, when the accelerator processes a data read instruction and the address memory does not include the read address, that is, the data is not in the first memory, the accelerator sets the state information of the data to the first state and reads the data from the second memory. After the data is retrieved, if the state information of the data is still the first state, the accelerator sets the state information to the second state and stores the data at the first storage address of the first memory. If the state information has instead changed to the second state, the storage controller has already updated the data to its latest version, and the fetched data need not be stored in the first memory. It should be understood that if a new version of the data is written to the first memory and the second memory while the read is in progress, the state information ensures that the data obtained is the latest version.
Optionally, if the address memory includes the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data may be read from the second memory. If the address memory includes the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data may be read from the first memory.
In a specific implementation, the state information may be represented by binary characters, for example, the first state by the character "1" and the second state by the character "0"; the first state and the second state may also be distinguished by other identifiers, which is not specifically limited in this application.
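The state-flag protocol above, in which a write that lands while a miss fetch is in flight causes the stale fetched copy to be discarded, can be sketched for a single data entry as follows. This is an illustrative model under simplifying assumptions (a single entry, no real concurrency); the constants and method names are not the patent's interface.

```python
MODIFYING = 1   # first state: a write or miss fetch is in progress
STABLE = 0      # second state: the data may be read from the first memory

class StateTrackedEntry:
    """Model of the per-data state flag: a concurrent write flips the
    state to STABLE, and an in-flight miss fetch then discards its
    (older) result instead of overwriting the newer version."""

    def __init__(self):
        self.state = STABLE
        self.first_mem_value = None   # the copy held in the first memory

    def begin_miss_fetch(self):
        # Before fetching from the second memory, mark the data as modifying.
        self.state = MODIFYING

    def complete_miss_fetch(self, fetched_value):
        if self.state == MODIFYING:
            # No concurrent write happened: install the fetched copy.
            self.state = STABLE
            self.first_mem_value = fetched_value
            return True
        # A concurrent write already stored a newer version; drop the fetch.
        return False

    def write(self, value):
        # A write leaves the newest version in the first memory and the
        # entry in the stable state, signalling any in-flight fetch.
        self.first_mem_value = value
        self.state = STABLE
```

If a write occurs between `begin_miss_fetch` and `complete_miss_fetch`, the fetched value is discarded and the first memory keeps the newer version.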
In the foregoing implementation, when the state information of the data is the first state, the data is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version might be read. When the state information of the data is the second state, the data is not in a modified state and may be read from the first memory. Recording the state information of the data in the state information memory ensures that the user does not read a wrong version of the data and that the data recorded in the first memory is the latest version.
According to a second aspect, a data processing apparatus is provided. The data processing apparatus includes a first memory, and further includes: an obtaining unit, configured to obtain a data processing instruction, where the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than that of the second memory; and a processing unit, configured to read the data from the first memory according to the data processing instruction and process the data, where the first memory is used to store data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing apparatus.
With the method described in the second aspect, the accelerator stores the data to be cached from the second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions, the accelerator can interact directly with the first memory with its higher read/write efficiency, thereby improving the data read/write efficiency of the accelerator.
In a possible implementation, the data processing apparatus includes a state information memory, and the state information memory is used to store state information of the data. The state information includes a first state and a second state, where the first state indicates that the data is in a modified state and the second state indicates that the data is not in a modified state.
In a possible implementation, the data processing apparatus includes an address memory, and the address memory is used to store a correspondence between the address of the data in the first memory and the address of the data in the second memory.
In a possible implementation, the data processing instruction includes a data write instruction; and the processing unit is configured to: when the state information of the data is the first state and the data processing instruction is a data write instruction, set the state information of the data to the second state, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
In a possible implementation, the data processing instruction includes a data read instruction; the processing unit is configured to determine, before the data processing apparatus obtains the data processing instruction, that the address memory does not include the address in the data processing instruction; the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and the processing unit is configured to: when the state information of the data is the first state, set the state information of the data to the second state and store the data at the first storage address of the first memory.
In a possible implementation, the apparatus further includes a configuration unit, configured to configure the first memory and obtain configuration information of the first memory. The configuration information includes one or more of a cache switch state, a cache address range, and a cache capacity, where the cache switch state indicates whether the data processing apparatus uses the first memory, the cache address range instructs the data processing apparatus to store data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
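The configuration information described above can be sketched as a simple record. The field names, types, and the `should_cache` helper are illustrative assumptions, not the patent's interface.

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative configuration record for the first memory."""
    cache_enabled: bool       # cache switch state: whether the first memory is used
    cache_addr_range: range   # only addresses in this range are cached
    cache_capacity: int       # capacity of the first memory, in bytes

    def should_cache(self, second_addr: int) -> bool:
        # Data is cached only when the switch is on and the storage
        # address falls inside the configured cache address range.
        return self.cache_enabled and second_addr in self.cache_addr_range
```

For example, with the switch off, no address is cached regardless of the configured range.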
In a possible implementation, the data processing apparatus is implemented using ASIC or FPGA technology, and the first memory includes one or more of an SRAM, a register, an SCM, and a CAM.
In a possible implementation, the data to be cached includes data whose access frequency is higher than a first threshold.
According to a third aspect, an accelerator is provided. The accelerator includes a processor and a power supply circuit, where the power supply circuit is configured to supply power to the processor, and the processor is configured to implement the functions of the operation steps performed by the accelerator described in the first aspect.
According to a fourth aspect, a computing device is provided. The computing device includes a CPU and an accelerator, where the CPU is configured to run instructions to implement service functions of the computing device, and the accelerator is configured to implement the functions of the operation steps performed by the accelerator described in the first aspect.
Based on the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a computing device provided by this application;
Figure 2 is a schematic structural diagram of another computing device provided by this application;
Figure 3 is a schematic structural diagram of an accelerator provided by this application;
Figure 4 is a schematic structural diagram of a data processing apparatus provided by this application;
Figure 5 is a schematic flowchart of the steps of a data processing method provided by this application in an application scenario;
Figure 6 is a schematic flowchart of the steps of processing a data read instruction in a data processing method provided by this application;
Figure 7 is a schematic flowchart of the steps of processing a data write instruction in a data processing method provided by this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. Clearly, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
First, the application scenarios involved in this application are described.
When an accelerator reads and writes data in memory, it must wait an additional period of time, so the processing performance of the accelerator is limited by the memory bandwidth. To solve this problem, this application provides a computing device in which the accelerator includes a first memory. The accelerator stores the data to be cached from a second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions, the accelerator can interact directly with the first memory with its higher read/write efficiency, thereby improving the data read/write efficiency of the accelerator.
The computing device provided by this application is described in detail below with reference to the accompanying drawings. Figure 1 is a schematic structural diagram of a computing device provided by this application. As shown in Figure 1, the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140, and a storage medium 150, among which communication connections may be established through a bus. The number of each of the processor 110, the accelerator 120, the second memory 130, the communication interface 140, and the storage medium 150 may be one or more, which is not specifically limited in this application.
The processor 110 and the accelerator 120 may each be a hardware accelerator or a combination of hardware accelerators. The hardware accelerator may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 110 is configured to execute instructions in the storage medium 150 to implement the service functions of the computing device 100.
Specifically, the processor 110 may be a central processing unit (CPU), and the accelerator 120 (which may also be called an accelerated processing unit (APU)) may be a system-on-chip implemented using FPGA, ASIC, or similar technology. The accelerator 120 is a processing unit in the computing device 100 that assists the CPU in processing special types of computing tasks, such as graphics processing, vector computation, or machine learning. The accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), or the like.
Optionally, the accelerator 120 may also be a CPU. In other words, the computing device 100 may include multiple processors, for example CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120. It should be understood that the foregoing example is for illustration and is not specifically limited in this application.
The storage medium 150 is a carrier for storing data, such as a hard disk, a USB flash drive, flash memory, a secure digital memory card (SD card), or a memory stick. The hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, or the like, which is not specifically limited in this application. The storage medium 150 may include the second memory; in a specific implementation, the second memory may be DDR.
The communication interface 140 may be a wired interface (for example, an Ethernet interface), an internal interface (for example, a Peripheral Component Interconnect Express (PCIe) bus interface), or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or units.
The bus 160 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus 160 may be divided into an address bus, a data bus, a control bus, and so on. For clarity of description, the various buses are all marked as the bus 160 in the figure.
The second memory 130 includes a volatile memory or a non-volatile memory, or includes both a volatile memory and a non-volatile memory. The volatile memory may be a random access memory (RAM), for example, a double data rate synchronous dynamic random access memory (DDR), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), or a direct rambus random access memory (DR RAM).
Further, the accelerator 120 may also include a first memory 121, where the data read/write efficiency of the first memory 121 is greater than that of the second memory 130. In a specific implementation, the first memory 121 may include a static random access memory (SRAM), a storage class memory (SCM), a register, a content-addressable memory (CAM), or the like, which is not specifically limited in this application.
In this embodiment of this application, the data to be cached that is stored in the second memory 130 may be cached in the first memory 121. In this way, when the accelerator 120 needs to access data in the second memory 130, it can read the data from the first memory 121 with its higher read/write efficiency, thereby compensating for the difference in processing speed between the accelerator 120 and the lower-performance memory and improving the data read/write efficiency of the accelerator 120.
It should be noted that the second memory 130 in Figure 1 may be the original memory inside the computing device 100, and the first memory 121 may be the original memory inside the accelerator 120. By using the existing hardware storage inside the computing device 100 and the accelerator 120, this application improves the data read/write efficiency of the accelerator 120 through an algorithm, without deploying additional cache hardware, which reduces the implementation cost of the solution. Especially for an accelerator 120 with a small hardware footprint, the solution is more practical to implement.
It should be understood that Figure 1 shows an exemplary division. Optionally, as shown in Figure 2, the second memory 130 may also be deployed inside the accelerator 120. In other words, the accelerator 120 itself includes at least two memories, one of which has a higher read/write efficiency than the other; in that case, the memory inside the accelerator 120 with the lower read/write efficiency may serve as the second memory 130, and the memory with the higher read/write efficiency may serve as the first memory 121.
It should be noted that when the second memory 130 in Figure 1 is a memory external to the accelerator 120, the communication between the first memory 121 and the second memory 130 may be off-chip communication, and the bus between them may be an off-chip bus. An off-chip bus here broadly refers to a public information channel between the CPU and external devices, such as the aforementioned PCIe bus, EISA bus, UB bus, CXL bus, CCIX bus, or GenZ bus, which is not specifically limited in this application.
When the second memory 130 in Figure 2 is a memory inside the accelerator 120, the communication between the first memory 121 and the second memory 130 may be in-band communication, and the bus between them may be an on-chip bus, such as an advanced eXtensible interface (AXI) bus or an advanced microcontroller bus architecture (AMBA) bus, which is not specifically limited in this application.
Similarly, in the scenario shown in Figure 2, the data to be cached that is stored in the second memory 130 may be cached in the first memory 121. In this way, when the accelerator 120 needs to access data in the second memory 130, it can read the data from the first memory 121 with its higher read/write efficiency, thereby compensating for the difference in processing speed between the accelerator 120 and the lower-performance memory and improving the data read/write efficiency of the accelerator 120.
It can be understood that the first memory 121 and the second memory 130 in Figure 2 are the original memories inside the accelerator 120. By using the existing hardware memories inside the accelerator 120, the data read/write efficiency of the accelerator 120 is improved through an algorithm, without deploying additional cache hardware, which reduces the implementation cost of the solution. Especially for an accelerator 120 with a small hardware footprint, the solution is more practical to implement.
Further, the accelerator 120 may be further divided into multiple unit modules. Figure 3 is a schematic structural diagram of an accelerator provided by this application and shows an exemplary division. As shown in Figure 3, the accelerator 120 may include a storage controller 122, the first memory 121, a state information memory 123, and an address memory 124, among which communication connections are established through an internal bus. For the internal bus, reference may be made to the description of the bus 160, which is not repeated here.
In addition, the accelerator 120 may further include a power supply circuit, and the power supply circuit may supply power to the storage controller 122. The storage controller 122 may be implemented by a hardware logic circuit, for example, an application-specific integrated circuit (ASIC) implementing the various functions of the accelerator 120. The power supply circuit may be located in the same accelerator as the storage controller 122, or in another accelerator other than the one where the storage controller 122 is located. The power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management accelerator, a power management processor, or a power management control circuit. Optionally, the accelerator 120 is an independent accelerator.
The first memory 121 is used to store data, the state information memory 123 is used to store the state information of the data, and the address memory 124 is used to store the address information of the data. The first memory 121 may be a memory whose read/write efficiency is greater than that of the second memory 130, such as an SRAM. The state information requires little storage space but must stay synchronized with the state of the data, so the state information memory 123 may be a register. The address memory 124 may be a CAM. It should be understood that a CAM is a memory addressed by content: its working mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input data item matches any stored data item. The address memory 124 used to store the address information of the data can therefore be implemented using a CAM inside the accelerator 120. In this way, when a user requests access to data, the CAM can match the address of the requested data against the address information stored in the CAM; a match indicates that the data is already stored in the first memory 121. It should be understood that the foregoing example is for illustration, and the address memory 124 may also be implemented using other memories, which is not specifically limited in this application.
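The CAM matching mechanism described above can be sketched as follows. In hardware, a CAM compares the input against all entries in parallel in a single cycle; the loop below is only a behavioral model, and the class and method names are illustrative.

```python
class SimpleCAM:
    """Toy behavioral model of a content-addressable memory: a lookup
    compares the input key against every stored entry and returns the
    associated value (here, a first-memory address) on a match."""

    def __init__(self, capacity):
        self.entries = []       # list of (key, value) pairs
        self.capacity = capacity

    def insert(self, key, value):
        if len(self.entries) >= self.capacity:
            raise RuntimeError("CAM full")
        self.entries.append((key, value))

    def lookup(self, key):
        # Compare the input against all stored entries; a match means the
        # data is already held in the first memory.
        for stored_key, value in self.entries:
            if stored_key == key:
                return value
        return None             # miss: the data is not in the first memory
```

A hit returns the first-memory address recorded for the requested second-memory address; a miss returns `None`.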
It can be understood that the first memory 121 in this application is implemented by memory inside the accelerator 120, so no additional cache needs to be deployed: the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, and the software side can be implemented simply through the online programming capability of the FPGA or ASIC. This solves, in a simple, efficient, and low-cost way, the problem that the read/write efficiency of an accelerator is limited by the memory bandwidth bottleneck.
In this embodiment of the present application, the memory controller 122 may obtain a data processing instruction, where the data processing instruction includes an address of the second memory 130, read the data from the first memory according to the address of the second memory 130, and then process the data according to the data processing instruction.
It should be noted that the above data processing instruction may be generated by the accelerator 120 in the course of processing a service, or may be sent by the processor 110 to the accelerator 120, which is not specifically limited in this application.
In a specific implementation, the memory controller 122 may store data in the second memory 130 and then store the to-be-cached data from the second memory 130 in the first memory 121. The memory controller 122 may update the to-be-cached data to the first memory 121 in real time, with a delay, or according to a certain algorithm, which is not specifically limited in this application. The to-be-cached data includes data historically accessed by the accelerator 120. In this way, when the accelerator 120 accesses the data again, it can interact directly with the first memory 121, which has a higher read/write speed, thereby improving the data read/write efficiency of the accelerator 120.
Optionally, the to-be-cached data may include data whose historical access frequency is higher than a first threshold. That is, the data stored in the first memory 121 is the more frequently accessed portion of the data stored in the second memory 130. Since the first memory has higher read/write efficiency, the accelerator 120 can interact directly with the first memory 121 when processing frequently accessed data, improving its read/write efficiency. The value of the first threshold may be determined according to the specific application scenario and is not specifically limited in this application.
Optionally, when the data in the first memory 121 reaches a storage threshold, data in the first memory 121 whose access frequency is not higher than a second threshold may be deleted, and data accessed by the accelerator 120 in the second memory 130 then continues to be stored in the first memory 121. Each time the data in the first memory 121 again reaches the storage threshold, the data whose access frequency is not higher than the second threshold is deleted, so that the data kept in the first memory 121 is data that is accessed frequently by the accelerator 120. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
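The frequency-threshold eviction described above can be sketched as follows. The function name, the per-address access counts, and the concrete threshold value are assumptions made for the example; the patent leaves the thresholds to the application scenario.

```python
def evict_cold(access_counts, second_threshold):
    """Keep only entries whose access count exceeds the second threshold,
    modeling the deletion step when the storage threshold is reached."""
    return {addr: count for addr, count in access_counts.items()
            if count > second_threshold}

# Hypothetical access counts for three cached addresses.
counts = {"add0": 9, "add1": 2, "add2": 5}
print(evict_cold(counts, second_threshold=2))  # {'add0': 9, 'add2': 5}
```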
Optionally, the to-be-cached data may also be the data most recently accessed by the accelerator 120, where "recently" refers to data accessed by the accelerator 120 within a time range. That time range may be determined according to the storage capacity of the first memory 121. Specifically, when the amount of data in the first memory 121 reaches the storage threshold, the access times of the data items in the first memory 121 are sorted, the data item whose access time is furthest from the current time is deleted, and so on, thereby ensuring that the data stored in the first memory 121 is the most recently accessed data.
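The sort-and-delete policy above is, in effect, least-recently-used replacement. A minimal software sketch, with illustrative names and an assumed capacity of two entries, could look like this:

```python
from collections import OrderedDict

class RecentCache:
    """Keeps only the most recently accessed entries: when the storage
    threshold (capacity) is exceeded, the entry whose access time is
    furthest in the past is deleted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered from oldest to newest access

    def access(self, addr, value):
        if addr in self.data:
            self.data.move_to_end(addr)    # refresh the access time
        self.data[addr] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently accessed

cache = RecentCache(capacity=2)
cache.access("add0", "a")
cache.access("add1", "b")
cache.access("add0", "a")  # add0 becomes the most recent entry
cache.access("add2", "c")  # threshold exceeded: add1 is evicted
print(sorted(cache.data))  # ['add0', 'add2']
```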
Optionally, the data recently accessed by the accelerator 120 may be data accessed by the accelerator 120 within a preset time range. That is, if the current time is T, the preset time range may be the period from time T-t to time T, where the value of t may be determined according to the specific application scenario and is not specifically limited in this application.
It can be understood that recently accessed data in the second memory 130 is very likely to be accessed again by the accelerator 120, so it is stored in the first memory 121. When the accelerator 120 accesses the data again, it can interact directly with the faster first memory 121, improving the data read/write efficiency of the accelerator 120.
Optionally, the to-be-cached data may also include prefetched data. In short, the memory controller 122 may use a prefetching algorithm to determine data that the accelerator 120 is likely to access, fetch it from the second memory 130 in advance, and store it in the first memory 121, which has higher read/write efficiency. In this way, when the accelerator 120 requests the data, it can interact directly with the first memory 121, improving the accelerator's read/write efficiency.
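The patent does not fix a particular prefetching algorithm, so as one hedged illustration, a simple sequential (next-address) predictor could be sketched as follows; the function name and parameters are assumptions for the example.

```python
def prefetch_candidates(accessed_addr, degree=2, stride=1):
    """Predict the next `degree` addresses after a demand access,
    assuming a sequential access pattern with a fixed stride."""
    return [accessed_addr + stride * i for i in range(1, degree + 1)]

# After the accelerator touches address 100, fetch 101 and 102 ahead
# of time into the first memory.
print(prefetch_candidates(100))          # [101, 102]
print(prefetch_candidates(0, degree=3))  # [1, 2, 3]
```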
It should be understood that the to-be-cached data may also include more types of data, which can be determined according to the application scenario of the accelerator 120 and is not specifically limited in this application.
In one embodiment, before the memory controller 122 stores data in the first memory 121, the user may configure the memory controller 122. The configuration may include a cache switch, an address range, a cache depth, and so on. Setting the cache switch to on means that the first memory 121 is used for data storage; setting it to off means that the first memory 121 is not used for data storage. Configuring the address range as a target address range means that the data stored in the first memory 121 is a cached copy of the data stored in the target address range of the second memory 130. Configuring the cache depth as D means that the storage capacity of the first memory 121 is D. Of course, the user may also apply other configurations to the memory controller 122, which may be determined according to the actual service environment and are not specifically limited in this application.
For example, suppose the capacity of the first memory is 2.5M, the cache depth is configured as 2M, the cache switch is configured as on, and the address range is configured as add0~add5. Then data that the accelerator 120 requests to write to add0~add5 will first be cached in the first memory 121; optionally, the data of add0~add5 may also be written to the second memory 130 with a delay, while data requested to be written to other addresses of the second memory 130 may be written directly to the second memory 130. If the accelerator 120 does not need to cache data, the cache switch can be turned off. It should be understood that the above example is for illustration and is not specifically limited in this application.
It can be understood that by configuring the memory controller 122, whether to enable its cache function can be chosen according to service requirements, and the address space and capacity of the cache can be set, so that the solution of this application is applicable to more application scenarios and offers better flexibility.
Further, the data processing instruction obtained by the memory controller 122 may be a data read instruction or a data write instruction. When the memory controller 122 obtains a data write instruction, the instruction includes a second storage address of the data, where the second storage address is a storage address in the second memory 130. The controller may first determine whether the cache switch is on; if it is, the controller determines whether the address carried in the data processing instruction falls within the user-configured address range. If it does, the memory controller 122 may store the data in the first memory 121; otherwise, it stores the data in the second memory 130.
Similarly, when the memory controller 122 obtains a data read instruction, it reads the data from the first memory 121 only after determining that the cache switch is on and that the read address falls within the user-configured address range; otherwise, it reads the data from the second memory 130, which will not be repeated here.
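The two routing checks above (cache switch on, and address within the configured range) can be sketched as a single decision function; the function and field names here are illustrative, not terms from the patent.

```python
def route_access(cache_on, addr_range, second_addr):
    """Return which memory serves an access: the first memory only if
    the cache switch is on and the address is in the configured range."""
    lo, hi = addr_range
    if cache_on and lo <= second_addr <= hi:
        return "first_memory"
    return "second_memory"

print(route_access(True, (0, 5), 3))   # first_memory: cached range
print(route_access(True, (0, 5), 9))   # second_memory: outside range
print(route_access(False, (0, 5), 3))  # second_memory: switch is off
```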
In one embodiment, the memory controller 122 may store the correspondence between the first storage address and the second storage address of the data in the address memory 124.
In a specific implementation, when the memory controller 122 obtains a data write instruction, the write address in the instruction is the second storage address of the data. If the address memory 124 already stores this second storage address in correspondence with a first storage address, a historical version of the data has already been written to the first storage address in the first memory 121. In this case, the historical version at the first storage address is simply updated, and the data is sent to the second memory 130 to request an update of the data at the second storage address.
If the address memory 124 does not contain the second storage address, the data is being written to the first memory 121 for the first time. The memory controller 122 may first determine the first storage address corresponding to the second storage address, then store the data carried in the write instruction at that first storage address in the first memory 121, and store the correspondence between the first storage address and the second storage address in the address memory 124. It should be noted that when writing data for the first time, the memory controller 122 may determine the first storage address according to the currently free addresses of the first memory 121; this application does not limit the address allocation strategy of the memory controller 122.
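A minimal sketch of this write path, combining the update-in-place case and the first-write allocation case, is shown below. The naive free-slot allocation and write-through to the second memory are assumptions for the example; the patent leaves the allocation strategy open.

```python
def write(second_addr, data, addr_map, first_mem, second_mem):
    """Write path: update the cached copy if mapped, otherwise allocate
    a first-memory slot and record the address correspondence; in both
    cases the second memory is also updated."""
    if second_addr not in addr_map:
        addr_map[second_addr] = len(first_mem)  # first write: allocate slot
        first_mem.append(None)
    first_mem[addr_map[second_addr]] = data     # update cached copy
    second_mem[second_addr] = data              # update the second memory

addr_map, first_mem, second_mem = {}, [], {}
write(0x10, "v1", addr_map, first_mem, second_mem)  # first write: allocates
write(0x10, "v2", addr_map, first_mem, second_mem)  # hit: updates in place
print(first_mem, second_mem)  # ['v2'] {16: 'v2'}
```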
Similarly, when the memory controller 122 obtains a data read instruction, the read address in the instruction is the second storage address of the data. The memory controller 122 may match this second storage address against the addresses in the address memory 124; if the address memory 124 contains the second storage address, the corresponding first storage address is obtained according to the second storage address, and the data is read from the first memory 121.
It should be noted that if the second storage address is not stored in the address memory 124, the data is not stored in the first memory 121, so the memory controller 122 may read the data from the second memory 130 according to the second storage address. At the same time, the memory controller 122 may also determine a first storage address for the data and store the data read from the second memory 130 in the first memory 121, so that the accelerator 120 can subsequently read it directly from the more efficient first memory 121, and store the mapping between the first storage address and the second storage address in the address memory 124.
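The read path of the two paragraphs above (hit: serve from the first memory; miss: fetch from the second memory, fill the cache, record the mapping) can be sketched as follows, again with illustrative structures and a naive free-slot allocation.

```python
def read(second_addr, addr_map, first_mem, second_mem):
    """Read path: hit in the address memory serves from the first
    memory; a miss fetches from the second memory and fills the cache."""
    if second_addr in addr_map:                  # hit
        return first_mem[addr_map[second_addr]]
    data = second_mem[second_addr]               # miss: go to second memory
    first_addr = len(first_mem)                  # naive free-slot allocation
    first_mem.append(data)                       # fill the first memory
    addr_map[second_addr] = first_addr           # record the mapping
    return data

second_mem = {0x10: "A", 0x20: "B"}
first_mem, addr_map = [], {}
print(read(0x10, addr_map, first_mem, second_mem))  # miss, returns "A"
print(read(0x10, addr_map, first_mem, second_mem))  # hit, returns "A"
print(addr_map)  # {16: 0}
```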
In one embodiment, the accelerator 120 includes a state information memory 123 used to store the state information of the data. In a specific implementation, the memory controller 122 may first determine the state information of the data and, according to that state information, write the data to the first storage address of the first memory 121. The state information includes a first state and a second state. The first state indicates that the data is in a modification state, that is, it is being written to the first memory 121 or the second memory 130; at this time the data must not be read from the first memory 121, otherwise a wrong version of the data might be read. The second state indicates that the data is not in a modification state, in which case the data can be read from the first memory 121.
In a specific implementation, when the accelerator 120 processes a data write instruction and the state information of the data is the first state, that is, the data is currently being modified, the accelerator may first set the state information of the data to the second state, then write the data to the first storage address, and then write the data to the second memory 130 to update it.
Optionally, if the state information of the data is the second state, that is, the data is not currently being operated on, the data in the data processing instruction can be written to the first memory 121 to update it, and the data is also written to the second memory 130 for updating.
In a specific implementation, when the accelerator 120 processes a data read instruction and the address memory does not contain the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the state information of the data to the first state and reads the data from the second memory 130. After the data is retrieved, if the state information of the data is still the first state, the accelerator 120 sets the state information to the second state and stores the data at the first storage address of the first memory 121. If the state information has already changed to the second state, the memory controller 122 has updated the data to its latest version in the meantime, and the fetched data need not be stored in the first memory 121. It should be understood that if a new version of the data is written to the first memory 121 and the second memory 130 while the read is in progress, the state information ensures that the cached data is the latest version.
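The race check above, where the miss-fill is discarded if a concurrent write already installed newer data, can be sketched as follows. The state encoding and function name are illustrative; only the first-state/second-state logic comes from the description.

```python
FIRST_STATE, SECOND_STATE = 1, 0  # 1 = being modified / fetch in flight

def finish_miss_fill(state, fetched, first_mem, first_addr):
    """Called when the second-memory fetch returns. Fill the first
    memory only if no concurrent write flipped the state meanwhile."""
    if state[first_addr] == FIRST_STATE:
        state[first_addr] = SECOND_STATE
        first_mem[first_addr] = fetched  # safe: no newer write arrived
    # else: a write already installed the latest version; keep it

# No race: the fetch completes while the state is still FIRST_STATE.
state, first_mem = {0: FIRST_STATE}, {0: None}
finish_miss_fill(state, "old", first_mem, 0)
print(first_mem[0])  # old

# Race: a concurrent write updated slot 0 and set SECOND_STATE.
state, first_mem = {0: SECOND_STATE}, {0: "new"}
finish_miss_fill(state, "old", first_mem, 0)
print(first_mem[0])  # new (the stale fetch is discarded)
```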
Optionally, if the address memory 124 contains the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data may be read from the second memory 130. If the address memory 124 contains the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data may be read from the first memory 121.
In a specific implementation, the above state information may be represented by binary characters; for example, the first state may be represented by the character "1" and the second state by the character "0". The first state and the second state may also be distinguished by other identifiers, which is not specifically limited in this application.
Optionally, the above accelerator 120 may also be a CPU, and the first memory may be a memory inside the CPU, such as the CPU's SRAM. The memory controller 122 may cache data in combination with the CPU's multi-level cache architecture, for example as a fourth-level cache of the CPU, thereby implementing more cache levels for the CPU while reducing hardware complexity and lowering the implementation cost of multi-level caching.
It should be noted that the data processing instructions include not only the above data read and write instructions, but may also include other instructions for service processing after the data has been read out. For example, after the accelerator 120 reads data from the first memory, it may update, delete, or merge the data, or perform computations on multiple read data items, such as matrix multiplication or convolution operations. The specifics may be determined by the service processed by the accelerator 120; this application does not specifically limit the subsequent processing flow of the data.
FIG. 4 is a schematic structural diagram of a data processing apparatus provided by this application. The data processing apparatus 400 may be the accelerator 120 in FIG. 3. As shown in FIG. 4, the data processing apparatus 400 may include a configuration module 1221, an obtaining module 1225, and a processing module 1226, where the processing module 1226 may include a lookup-and-write-data-update module 1222, a read-data-return processing module 1223, and a read-data-update module 1224. It should be understood that each of the above modules may correspond to a circuit module in an ASIC or FPGA.
It should be noted that the functions of the obtaining module 1225, the configuration module 1221, and the processing module 1226 in FIG. 4 may be implemented by the memory controller 122 in FIG. 3. In addition, the data processing apparatus 400 shown in FIG. 4 reflects the module division under the application scenario of FIG. 1, that is, the scenario in which the second memory 130 is deployed outside the accelerator. It should be understood that in the application scenario shown in FIG. 2, the second memory 130 is deployed inside the data processing apparatus 400, which is not repeated here.
The obtaining module 1225 is used to obtain the data processing instructions generated by the accelerator 120, which may include data write instructions and data read instructions.
The configuration module 1221 is used to receive configuration information input by the user. The configuration information may include information for configuring the cache switch, information for configuring the address range, and information for configuring the cache depth. The cache switch information being "on" means that the first memory 121 is used for data storage; "off" means that the first memory 121 is not used for data storage. The address range information includes the target address range, that is, the data stored in the first memory 121 is a cached copy of the data stored in the target address range of the second memory 130. The cache depth information includes the cache depth D, that is, the storage capacity of the first memory 121 is D.
The lookup-and-write-data-update module 1222 is used to obtain the data write instruction generated by the accelerator 120 and process it. Specifically, the module 1222 may first query, according to the second storage address of the data carried in the write instruction, whether the second storage address exists in the address memory 124. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state. If it is the first state, the module changes the first state of the data to the second state, reads the data from the first memory 121 according to the first storage address and updates it, writes the updated data back to the first memory 121, and then writes the updated data to the second memory 130. If it is the second state, the module writes the data directly to the first memory 121 and then writes the data to the second memory 130.
If the second storage address does not exist in the address memory 124, the module determines, according to the current storage capacity of the first memory 121, the first storage address corresponding to the second storage address, stores the correspondence between the first storage address and the second storage address in the address memory 124, and then stores the data in the first memory 121 and the second memory 130.
The read-data-return processing module 1223 and the read-data-update module 1224 are used to obtain the data read instruction generated by the accelerator 120 and process it. The read-data-return processing module 1223 is mainly used to read the data, and the read-data-update module 1224 is mainly used to update the data.
Specifically, the read-data-return processing module 1223 may first query, according to the second storage address of the data carried in the read instruction, whether the second storage address exists in the address memory 124. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state. The first state means that the data is being modified, so the data can be read from the second memory 130; the second state means that the data has not been modified, so the data can be read from the first memory 121.
If the second storage address does not exist in the address memory 124, the read-data-update module 1224 may first set the state information of the data in the state information memory 123 to the first state, then determine the first storage address corresponding to the second storage address according to the storage capacity of the first memory 121 and store the correspondence between the first storage address and the second storage address in the address memory 124. The read-data-return processing module 1223 reads the data from the second memory. The read-data-update module 1224 then checks whether the state information of the data in the state information memory 123 is still the first state. If it is the first state, the module changes the state information to the second state and updates the data to the first memory; if it is the second state, it does not update the data to the first memory. It should be understood that the second state indicates that the lookup-and-write-data-update module 1222 has updated the data in the first memory in the meantime, so the fetched data no longer needs to be written to the first memory.
It should be noted that FIG. 4 shows an exemplary division. The memory controller 122 provided in this application may also be divided into more modules. For example, in FIG. 4, the step of looking up the second storage address of the data processing instruction in the address memory 124 is performed by the lookup-and-write-data-update module 1222; in a specific implementation, this module may be further divided into a lookup module and a write-data-update module, which is not specifically limited in this application.
In summary, the accelerator provided by this application stores the to-be-cached data of the second memory in the first memory, where the read/write efficiency of the first memory is higher than that of the second memory, so that the accelerator can interact directly with the more efficient first memory when reading and writing data, improving its data read/write efficiency. At the same time, the first memory is implemented by memory inside the accelerator, such as SRAM, registers, or CAM, so no additional cache needs to be deployed; the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, solving, in a simple, efficient, and low-cost way, the problem that the read/write efficiency of an accelerator is limited by the memory bandwidth bottleneck.
The data processing method provided by this application is explained below with reference to FIG. 5 to FIG. 7. The methods described in FIG. 5 to FIG. 7 may be applied to the accelerator 120 in FIG. 1 to FIG. 4. FIG. 5 is a schematic flowchart of the steps of the data processing method provided by this application in one application scenario, FIG. 6 is a schematic flowchart of the steps of processing a data read instruction in the data processing method provided by this application, and FIG. 7 is a schematic flowchart of the steps of processing a data write instruction in the data processing method provided by this application. In short, FIG. 6 describes step 7 in FIG. 5, and FIG. 7 describes step 8 in FIG. 5. As shown in FIG. 5 to FIG. 7, the method may include the following steps:
Step 1. Configure the memory controller 122. This step may be implemented by the configuration module 1221 in FIG. 4.
In a specific implementation, the configuration content includes one or more of a cache switch state, a cache address range, and a cache capacity. The cache switch state indicates whether the accelerator uses the first memory 121; the cache address range instructs the accelerator to store, in the first memory 121, data whose storage addresses fall within the cache address range; and the cache capacity indicates the capacity of the first memory 121. Configuring the cache depth as D means that the storage capacity of the first memory 121 is D. Of course, the user may also apply other configurations to the memory controller 122, which may be determined according to the actual service environment and are not specifically limited in this application.
可以理解的,通过对存储控制器122进行配置,可以根据业务需求选择是否开启存储控制器122的缓存功能,设置缓存的地址空间和容量,使得本申请的方案适用于更多的应用场景,方案灵活性更好。It can be understood that by configuring the storage controller 122, you can choose whether to enable the cache function of the storage controller 122 according to business needs, and set the address space and capacity of the cache, so that the solution of this application is suitable for more application scenarios. Flexibility is better.
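For illustration only, the configuration items described above (cache switch state, cache address range, cache capacity) can be sketched as a small software model. The class and field names below are assumptions chosen for readability and are not part of the disclosed hardware design:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CacheConfig:
    """Hypothetical software model of the storage controller 122 configuration."""
    cache_enabled: bool = False                   # cache switch state: use first memory 121 or not
    addr_range: Optional[Tuple[int, int]] = None  # cache address range (start, end), or None if unset
    depth: int = 0                                # cache capacity D of the first memory 121

    def covers(self, addr: int) -> bool:
        """True when a second-memory address falls within the configured cache range."""
        if self.addr_range is None:
            return False
        start, end = self.addr_range
        return start <= addr <= end

# Example: enable the cache for addresses 0x1000..0x1FFF with depth 256.
cfg = CacheConfig(cache_enabled=True, addr_range=(0x1000, 0x1FFF), depth=256)
```

A real controller would hold these values in configuration registers; the model only captures the decision logic they drive.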
Step 2. The storage controller 122 obtains a data processing instruction. In a specific implementation, the data processing instruction may be a data read instruction or a data write instruction. This step can be implemented by the acquisition module 1225 in Figure 4.
Step 3. The storage controller 122 determines whether the cache switch is on. If the cache switch is on, step 4 is performed; if the cache switch is off, step 9 is performed. Configuring the cache switch as on means that the first memory 121 is used for data storage, and configuring it as off means that the first memory 121 is not used for data storage. This step can be implemented by the configuration module 1221 in Figure 4.
Step 4. The storage controller 122 determines whether the address range is set. If the address range is set, step 5 is performed; if the address range is not set, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
Optionally, when the address range is not set, since the cache switch is already on, that is, the user wants to use the caching function of the first memory 121, the user may be prompted to set the cache address range, or the processing flow of step 5 may be skipped and step 6 executed directly.
Step 5. The storage controller 122 determines whether the address is within the range, where the address refers to the address of the data access request received in step 2, and the range refers to the cache address range configured in step 1. If the address is within the cache address range configured in step 1, step 6 is performed; if it is not within the cache address range, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
Step 6. The storage controller 122 determines whether the data processing instruction is a data read instruction. If it is a data read instruction, step 7 is performed; if it is not a data read instruction, step 8 is performed. This step can be implemented by the configuration module 1221.
Step 7. The storage controller 122 processes the data read instruction. The process of processing the data read instruction in step 7 is described in detail in the embodiment of Figure 6. This step can be implemented by the read data return processing module 1223 and the read data update module 1224 in Figure 4.
Step 8. The storage controller 122 processes the data write instruction. The process of processing the data write instruction in step 8 is described in detail in the embodiment of Figure 7. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
Step 9. The storage controller 122 issues a read request or a write request to the second memory. In a specific implementation, if the data access request is a data read instruction and the second memory 130 is DDR, the storage controller 122 may issue a DDR read request; if the data access request is a data write instruction, the storage controller 122 may issue a DDR write request. The above example is for illustration and is not specifically limited in this application.
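Steps 3 to 9 above form a dispatch decision. A minimal sequential sketch of that decision, assuming a simplified request consisting of an operation type plus an address (the function name and its return labels are illustrative, not part of the disclosure), is:

```python
def dispatch(cache_enabled, addr_range, op, addr):
    """Steps 3-9: route a request to the cache path or straight to the second memory.

    The return values name the branch taken; a real storage controller 122
    would issue the corresponding DDR or first-memory access instead.
    """
    if not cache_enabled:                  # step 3: cache switch is off
        return "second_memory"             # step 9: issue a DDR read/write request
    if addr_range is None:                 # step 4: cache address range not set
        return "second_memory"
    lo, hi = addr_range
    if not (lo <= addr <= hi):             # step 5: address outside the cache range
        return "second_memory"
    # step 6: split by instruction type into step 7 (read) or step 8 (write)
    return "read_path" if op == "read" else "write_path"
```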
Next, with reference to Figure 6, how step 7 processes the data read instruction is described in detail. Figure 6 is a flowchart of the processing steps of the data read instruction provided by this application. As shown in Figure 6, step 7 may include the following steps:
Step 71. The address memory 124 determines whether the same address is stored in the address memory 124, where the data read instruction carries the read address of the data, and that address is an address of the second memory 130. This step can be implemented by the read data return processing module 1223 in Figure 4.
With reference to the foregoing, the address memory 124 may be a CAM, where a CAM is a memory addressed by content. Its working mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input data item matches any data item stored in the CAM. Therefore, the address memory 124 used to store the address information of the data can be implemented using a CAM inside the accelerator 120. In this way, when a user requests to read data, the CAM can match the read address against the address information stored in the CAM; if they match, the data is already stored in the first memory 121.
It should be understood that the above example is for illustration. The address memory 124 may also be implemented using other memories, with the storage controller 122 implementing the function of obtaining addresses from the address memory 124 and matching them against the address carried in the data read instruction. This application does not specifically limit this.
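As a rough software stand-in for the CAM behavior just described, a dictionary can model the match/no-match result, though not the parallel comparison a hardware CAM performs in a single cycle. The class name and method names are hypothetical:

```python
class CAMModel:
    """Software stand-in for the content-addressed address memory 124."""

    def __init__(self):
        self._entries = {}  # second-memory address -> first-memory address

    def store(self, second_addr, first_addr):
        """Record the correspondence between the two storage addresses."""
        self._entries[second_addr] = first_addr

    def match(self, second_addr):
        """Return the first-memory address on a hit, or None on a miss (step 71)."""
        return self._entries.get(second_addr)
```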
It should be noted that after step 71 determines whether the same address exists in the address memory 124: if it exists, the latest or a historical version of the data is already stored in the first memory 121, and step 72 can be performed; if it does not exist, no version of the data has been stored in the first memory 121, and step 75 can be performed.
Step 72. Determine whether the state information is the first state. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. This step can be implemented by the read data return processing module 1223 in Figure 4.
In a specific implementation, the state information memory 123 may be a register, and step 72 may be implemented by a register capable of determining whether the state information is the first state. Alternatively, the state information memory 123 may implement only the storage function while the storage controller 122 implements the determination function; for example, the storage controller 122 obtains the state information of the data from the state information memory 123 and determines whether it is in the first state. This application does not specifically limit this.
It should be noted that when the state information of the data is the first state (high), step 73 is performed; when it is the second state (low), step 74 is performed.
Step 73. Read the data from the second memory 130. Specifically, the storage controller 122 may issue a data read instruction to the second memory 130; if the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data return processing module 1223 in Figure 4.
It can be understood that if the state information of the data is the first state, with reference to the foregoing, the data may still be in the process of being retrieved from the second memory 130. Reading the data directly from the second memory 130 at this time avoids read errors and improves the accuracy of data reading.
Step 74. Read the data from the first memory. Specifically, the storage controller 122 may, according to the read address carried in the data read instruction, query the address memory 124 to determine the first storage address of the data in the first memory, and then read the data from the first storage address. This step can be implemented by the read data return processing module 1223 in Figure 4.
Step 75. Set the state information to the first state, and store the correspondence between the first storage address and the second storage address. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. The above correspondence is stored in the address memory 124. This step can be implemented by the read data update module 1224 in Figure 4.
Step 76. Read the data from the second memory. Specifically, the storage controller 122 may issue a data read instruction to the second memory 130; if the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data update module 1224 in Figure 4.
Step 77. Determine whether the state information is the first state (high). If it is the first state, steps 78 to 79 are performed; if it is the second state (low), steps 78 to 79 are not performed. This step can be implemented by the read data update module 1224 in Figure 4.
It can be understood that if the state information has changed to the second state (low), with reference to the foregoing, during a data write process the storage controller 122 modifies the state information of the data from the first state (high) to the second state (low) when it finds the state information in the first state. This indicates that between step 75 and step 79 the data was modified, and the storage controller 122 is writing, or has already written, the latest version of the data to the first memory 121. Therefore, the data read at this time may be left unwritten to the first memory 121, so as not to overwrite the newer version of the data.
Similarly, if the state information has not changed to the second state (low), the data was not modified between step 75 and step 79, so the data read from the second memory 130 at this time can be updated into the first memory 121.
Step 78. Modify the state information to the second state (low). This step can be implemented by the read data update module 1224 in Figure 4.
Step 79. Update the data into the first memory 121. This step can be implemented by the read data update module 1224 in Figure 4.
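Steps 71 to 79 can be summarized in a sequential sketch. The model below is an assumption-laden simplification: plain dictionaries stand in for the CAM, the state information memory 123, and the two memories, a first-memory slot is allocated by counting entries, and everything runs sequentially, whereas the real design is concurrent hardware in which a write can intervene between steps 75 and 77:

```python
def handle_read(cam, state, first_mem, second_mem, addr):
    """Steps 71-79 of the read path (hypothetical software model).

    cam:        second-memory address -> first-memory slot (address memory 124)
    state:      address -> 1 (first state, high) or 0 (second state, low)
    first_mem:  slot -> value (first memory 121)
    second_mem: address -> value (second memory 130)
    """
    slot = cam.get(addr)
    if slot is not None:                  # step 71: address found in the CAM
        if state.get(addr) == 1:          # step 72: first state, data may be in flight
            return second_mem[addr]       # step 73: read the second memory directly
        return first_mem[slot]            # step 74: read the cached copy
    # Miss: steps 75-79.
    slot = len(first_mem)                 # allocate a first-memory slot (assumption)
    cam[addr] = slot                      # step 75: store the address correspondence
    state[addr] = 1                       #          and set the first state
    value = second_mem[addr]              # step 76: fetch from the second memory
    if state[addr] == 1:                  # step 77: still first state (always true in
        state[addr] = 0                   #          this sequential model); step 78
        first_mem[slot] = value           # step 79: update the first memory
    return value
```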
Next, with reference to Figure 7, how step 8 processes the data write instruction is described in detail. Figure 7 is a flowchart of the processing steps of the data write instruction provided by this application. As shown in Figure 7, step 8 may include the following steps:
Step 81. Determine whether the same address exists. The data write instruction carries the write address of the data, and step 81 can determine whether that write address exists in the address memory 124. For the specific implementation of this step, refer to the description of step 71 above, which is not repeated here. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
If the same address exists, step 82 is performed; if the same address does not exist, step 84 is performed.
Step 82. Determine whether the state information is the first state. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
When the state information is the first state (high), step 83, step 85, and step 86 are performed; when the state information is the second state (low), step 85 and step 86 are performed.
Step 83. Set the state information to the second state. The second state may also be represented by a low state, or by a "0" state, which is not specifically limited in this application. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
It can be understood, with reference to step 75, that when the state information is the first state, the data is currently going through steps 75 to 78. The data received in step 83 is then the latest version of the data, so setting the state information to the second state prevents the older version of the data still in steps 75 to 78 from overwriting the current newer version.
It should be noted that after step 83 is performed, step 85 and step 86 can be performed.
Step 84. Store the correspondence between the first storage address and the second storage address in the address memory 124. It can be understood that when the address carried in the data write instruction does not exist in the address memory 124, no historical version of the data has been written to the first memory 121. At this time, a write address in the first memory 121, that is, the first storage address, can be allocated for the data, and the first storage address is then stored in the address memory 124. In this way, when the storage controller 122 receives a read request for the data, it can obtain the first storage address of the data through the address memory and then read the data, thereby implementing data caching. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
It should be noted that after step 84 is performed, step 85 and step 86 can be performed.
Step 85. Write the data to the first memory. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
Step 86. Write the data to the second memory. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
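Steps 81 to 86 admit a similar sequential sketch, under the same assumptions as the read-path model (dictionaries for the CAM, state memory, and both memories; all names are illustrative rather than part of the disclosure):

```python
def handle_write(cam, state, first_mem, second_mem, addr, value):
    """Steps 81-86 of the write path (hypothetical software model).

    cam:        second-memory address -> first-memory slot (address memory 124)
    state:      address -> 1 (first state, high) or 0 (second state, low)
    first_mem:  slot -> value (first memory 121)
    second_mem: address -> value (second memory 130)
    """
    slot = cam.get(addr)
    if slot is not None:                 # step 81: address already cached
        if state.get(addr) == 1:         # step 82: a read refill is in flight
            state[addr] = 0              # step 83: drop to the second state so the
                                         #          stale refill cannot overwrite us
    else:                                # step 84: first write of this address
        slot = len(first_mem)            # allocate a first-memory slot (assumption)
        cam[addr] = slot                 # store the address correspondence
    first_mem[slot] = value              # step 85: write the first memory
    second_mem[addr] = value             # step 86: write through to the second memory
```

Writing both memories in steps 85 and 86 is a write-through policy, which keeps the second memory 130 consistent with the cached copy at all times.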
In summary, in the data processing method provided by this application, data is stored in the first memory, and the data of the first memory is stored in the second memory, where the read/write efficiency of the first memory is greater than that of the second memory. The accelerator can therefore interact directly with the more efficient first memory when reading and writing data, improving the accelerator's data read/write efficiency. At the same time, the first memory is implemented by memory inside the accelerator, such as SRAM, registers, or a CAM, so no additional cache needs to be deployed. The caching function of accelerators such as FPGAs and ASICs is thus realized at very low hardware cost, and the problem of accelerator read/write efficiency being limited by the memory bandwidth bottleneck can be solved simply, efficiently, and at low cost.
An embodiment of this application provides an accelerator. The accelerator includes a processor and a power supply circuit. The power supply circuit is used to supply power to the processor, and the processor is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
An embodiment of this application provides a computing device. The computing device includes a CPU and an accelerator. The CPU is used to run instructions to implement the service functions of the computing device, and the accelerator is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the above embodiments are implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage node, such as a server or a data center, containing at least one set of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present invention, and such modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

  1. A data processing method, wherein the method is applied to an accelerator, the accelerator includes a first memory, and the method includes:
    obtaining, by the accelerator, a data processing instruction, wherein the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than the data read/write efficiency of the second memory; and
    reading, by the accelerator, the data from the first memory according to the data processing instruction, and processing the data, wherein the first memory is used to store to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the accelerator.
  2. The method according to claim 1, wherein the accelerator includes a state information memory, the state information memory is used to store state information of the data, and the state information includes a first state and a second state, wherein the first state indicates that the data is in a modified state, and the second state indicates that the data is not in a modified state.
  3. The method according to claim 1 or 2, wherein the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  4. The method according to any one of claims 1 to 3, wherein the data processing instruction includes a data write instruction, and the reading, by the accelerator, the data from the first memory according to the data processing instruction and processing the data includes:
    when the state information of the data is a first state and the data processing instruction is a data write instruction, setting, by the accelerator, the state information of the data to a second state, reading and updating the data from the first memory according to the data processing instruction, and storing the updated data in the first memory.
  5. The method according to claim 4, wherein the data processing instruction includes a data read instruction, and before the accelerator obtains the data processing instruction, the method further includes:
    determining, by the accelerator, that the address memory does not include the address in the data processing instruction;
    setting, by the accelerator, the state information of the data to the first state, and reading the data from the second memory; and
    when the state information of the data is the first state, setting, by the accelerator, the state information of the data to the second state, and storing the data in the first memory.
  6. The method according to any one of claims 1 to 5, wherein the method further includes: configuring, by the accelerator, the first memory to obtain configuration information of the first memory, the configuration information including one or more of a cache switch state, a cache address range, and a cache capacity, wherein the cache switch state indicates whether the accelerator uses the first memory, the cache address range indicates that the accelerator stores data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
  7. The method according to any one of claims 1 to 6, wherein the accelerator is implemented by application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes one or more of a static random access memory (SRAM), a storage-class memory (SCM), a register, and a content-addressable memory (CAM).
  8. The method according to any one of claims 1 to 7, wherein the to-be-cached data includes data whose access frequency is higher than a first threshold.
  9. A data processing apparatus, wherein the data processing apparatus includes a first memory, and the data processing apparatus includes:
    an acquisition unit, configured to obtain a data processing instruction, wherein the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than the data read/write efficiency of the second memory; and
    a processing unit, configured to read the data from the first memory according to the data processing instruction and process the data, wherein the first memory is used to store to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the data processing apparatus.
  10. The apparatus according to claim 9, wherein the data processing apparatus includes a state information memory, the state information memory is used to store state information of the data, and the state information includes a first state and a second state, wherein the first state indicates that the data is in a modified state, and the second state indicates that the data is not in a modified state.
  11. The apparatus according to claim 9 or 10, wherein the data processing apparatus includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  12. The apparatus according to any one of claims 9 to 11, wherein the data processing instruction includes a data write instruction; and
    the processing unit is configured to: when the state information of the data is a first state and the data processing instruction is a data write instruction, set the state information of the data to a second state, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
  13. The apparatus according to claim 12, wherein the data processing instruction includes a data read instruction;
    the processing unit is configured to determine, before the data processing apparatus obtains the data processing instruction, that the address memory does not include the address in the data processing instruction;
    the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and
    the processing unit is configured to: when the state information of the data is the first state, set the state information of the data to the second state and store the data in the first memory.
  14. The apparatus according to any one of claims 9 to 13, wherein the apparatus further includes a configuration unit, the configuration unit being configured to configure the first memory to obtain configuration information of the first memory, the configuration information including one or more of a cache switch state, a cache address range, and a cache capacity, wherein the cache switch state indicates whether the data processing apparatus uses the first memory, the cache address range indicates that the data processing apparatus stores data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
  15. The apparatus according to any one of claims 9 to 14, wherein the data processing apparatus is implemented by application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes one or more of a static random access memory (SRAM), a storage-class memory (SCM), a register, and a content-addressable memory (CAM).
  16. The apparatus according to any one of claims 9 to 15, wherein the to-be-cached data includes data whose access frequency is higher than a first threshold.
  17. An accelerator, characterized in that the accelerator comprises a processor and a power supply circuit, the power supply circuit is configured to supply power to the processor, and the processor is configured to implement the functions of the operation steps performed by the accelerator according to any one of claims 1 to 8.
  18. A computing device, characterized in that the computing device comprises a central processing unit (CPU) and an accelerator, the CPU is configured to run instructions to implement service functions of the computing device, and the accelerator is configured to implement the functions of the operation steps performed by the accelerator according to any one of claims 1 to 8.
PCT/CN2023/091041 2022-04-27 2023-04-27 Data processing method and apparatus and related device WO2023208087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210451716.8A CN117008810A (en) 2022-04-27 2022-04-27 Data processing method and device and related equipment
CN202210451716.8 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023208087A1 true WO2023208087A1 (en) 2023-11-02

Family

ID=88517853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091041 WO2023208087A1 (en) 2022-04-27 2023-04-27 Data processing method and apparatus and related device

Country Status (2)

Country Link
CN (1) CN117008810A (en)
WO (1) WO2023208087A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566978A (en) * 2010-09-30 2012-07-11 NXP B.V. Memory accelerator buffer replacement method and system
CN105074677A (en) * 2013-03-12 2015-11-18 Empire Technology Development LLC Accelerator buffer access
CN106415485A (en) * 2014-01-23 2017-02-15 Qualcomm Incorporated Hardware acceleration for inline caches in dynamic languages
WO2018179873A1 (en) * 2017-03-28 2018-10-04 NEC Corporation Library for computer provided with accelerator, and accelerator
CN111752867A (en) * 2019-03-29 2020-10-09 Intel Corporation Shared accelerator memory system and method


Also Published As

Publication number Publication date
CN117008810A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
JP2019067417A (en) Final level cache system and corresponding method
US11650940B2 (en) Storage device including reconfigurable logic and method of operating the storage device
US9697111B2 (en) Method of managing dynamic memory reallocation and device performing the method
KR20180054394A (en) A solid state storage device comprising a Non-Volatile Memory Express (NVMe) controller for managing a Host Memory Buffer (HMB), a system comprising the same and method for managing the HMB of a host
US11675709B2 (en) Reading sequential data from memory using a pivot table
US11449230B2 (en) System and method for Input/Output (I/O) pattern prediction using recursive neural network and proaction for read/write optimization for sequential and random I/O
US7222217B2 (en) Cache residency test instruction
US8838873B2 (en) Methods and apparatus for data access by a reprogrammable circuit module
WO2023125524A1 (en) Data storage method and system, storage access configuration method and related device
CN110597742A (en) Improved storage model for computer system with persistent system memory
US20150121033A1 (en) Information processing apparatus and data transfer control method
EP4242819A1 (en) System and method for efficiently obtaining information stored in an address space
WO2023208087A1 (en) Data processing method and apparatus and related device
WO2023124304A1 (en) Chip cache system, data processing method, device, storage medium, and chip
US6324633B1 (en) Division of memory into non-binary sized cache and non-cache areas
KR20230117692A (en) Device, system and method for hybrid database scan acceleration
CN116541415A (en) Apparatus, system and method for acceleration
Awamoto et al. Designing a storage software stack for accelerators
US10216524B2 (en) System and method for providing fine-grained memory cacheability during a pre-OS operating environment
US10776011B2 (en) System and method for accessing a storage device
US20240061784A1 (en) System and method for performing caching in hashed storage
US20230384960A1 (en) Storage system and operation method therefor
US20230359389A1 (en) Operation method of host configured to communicate with storage devices and memory devices, and system including storage devices and memory devices
US20240078036A1 (en) Hybrid memory management systems and methods with in-storage processing and attribute data management
CN116738510A (en) System and method for efficiently obtaining information stored in address space

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795519

Country of ref document: EP

Kind code of ref document: A1