WO2023208087A1 - Data processing method and apparatus and related device - Google Patents

Data processing method and apparatus and related device Download PDF

Info

Publication number
WO2023208087A1
WO2023208087A1 · PCT/CN2023/091041 · CN2023091041W
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
accelerator
address
state
Prior art date
Application number
PCT/CN2023/091041
Other languages
French (fr)
Chinese (zh)
Inventor
熊鹰 (Xiong Ying)
徐栋 (Xu Dong)
卢霄 (Lu Xiao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023208087A1 publication Critical patent/WO2023208087A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems

Definitions

  • the present application relates to the field of computers, and in particular, to a data processing method, device and related equipment.
  • accelerators are also called accelerated processing units (APUs)
  • ASICs application-specific integrated circuits
  • FPGAs field-programmable gate arrays
  • ML machine learning
  • accelerators have high requirements for real-time reading and writing of data in memory.
  • the current access frequency of the accelerator to data in the memory is not balanced with the available bandwidth provided by the memory to the accelerator.
  • when the accelerator needs to perform read and write operations on the data in the memory, it often needs to wait for an additional period of time, resulting in a decrease in the processing performance of the accelerator.
  • This application provides a data processing method, device and related equipment to solve the problem of accelerator processing performance degradation caused by imbalance between the accelerator's access frequency to data in the memory and the available bandwidth provided by the memory to the accelerator.
  • a data processing method is provided.
  • the method is applicable to an accelerator.
  • the accelerator includes a first memory.
  • the method may include the following steps: the accelerator obtains a data processing instruction, where the instruction includes the address of data in a second memory, and the data reading and writing efficiency of the first memory is greater than that of the second memory; the accelerator reads the data from the first memory according to the instruction and processes it, where the first memory is used to store the data to be cached from the second memory.
  • the data to be cached includes data historically accessed by the accelerator.
  • the accelerator stores the data to be cached in the second memory into the first memory, where the reading and writing efficiency of the first memory is greater than that of the second memory, so that when processing a data processing instruction the accelerator can interact directly with the first memory, which has higher reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the accelerator is implemented through application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology
  • the first memory includes static random access memory (SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), etc.
  • the computing device where the accelerator is located may include a central processing unit (CPU) and an accelerator.
  • the accelerator may be a system-level chip implemented through FPGA, ASIC and other technologies. Specifically, it may be used in the computing device 100 to assist the CPU in processing special types of computing tasks.
  • the above special types of computing tasks can be graphics processing, vector calculations, machine learning, etc.
  • the accelerator can be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator may also be a CPU.
  • the computing device may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is an accelerator. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the first memory and the second memory may both be original internal memories of the accelerator; or the first memory is an original internal memory of the accelerator and the second memory is an original memory outside the accelerator, for example, a memory of the computing device where the accelerator is located.
  • the method provided by this application can select an appropriate memory inside the accelerator as the first memory based on the specific application scenario. If there are multiple memories inside the accelerator, the memory with lower reading and writing efficiency can serve as the second memory and the one with higher reading and writing efficiency as the first memory. If there are not multiple memories inside the accelerator, or the multiple memories have the same reading and writing efficiency, then a memory inside the accelerator can be used as the first memory, and a memory outside the accelerator with lower reading and writing efficiency but larger storage capacity can be used as the second memory.
  • the above implementation uses the memory hardware originally inside the accelerator and improves the data reading and writing efficiency of the accelerator 120 through algorithms; no additional cache hardware needs to be deployed, which reduces the cost of implementing the solution. The solution is especially practicable for accelerators with small hardware specifications and sizes implemented with ASIC or FPGA technology.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • the accelerator may store data in the second memory, and then store the data to be cached in the second memory in the first memory.
  • the accelerator can update the data to be cached to the first memory in real time, or can update the data to be cached to the first memory in a delayed manner or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator. In this way, when the accelerator accesses the data again, it can directly interact with the first memory, which has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator.
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory is the data in the second memory with higher access frequency. Since the first memory has higher reading and writing efficiency, the accelerator can interact directly with the first memory when processing frequently accessed data, thereby improving the reading and writing efficiency of the accelerator.
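The frequency-threshold admission policy above can be sketched as a small model (illustrative Python; the names `FrequencyCache` and `first_threshold` are assumptions, not terms defined by the application):

```python
# Sketch: admit data into the fast first memory only once its access
# count exceeds a threshold; hits are then served from the first memory.

class FrequencyCache:
    def __init__(self, first_threshold):
        self.first_threshold = first_threshold
        self.access_count = {}   # address -> number of accesses observed
        self.first_memory = {}   # cached address -> data

    def access(self, address, second_memory):
        self.access_count[address] = self.access_count.get(address, 0) + 1
        if address in self.first_memory:          # hit: fast path
            return self.first_memory[address]
        data = second_memory[address]             # miss: read the slow memory
        if self.access_count[address] > self.first_threshold:
            self.first_memory[address] = data     # frequency exceeds threshold: cache it
        return data
```

A real accelerator would implement this in hardware; the sketch only shows how the first threshold separates frequently accessed data from the rest.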
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
  • when the data in the first memory reaches the storage threshold, the data in the first memory whose access frequency is not higher than the second threshold can be deleted, and the data accessed by the accelerator from the second memory can then continue to be stored in the first memory.
  • in this way, the data kept in the first memory is data that the accelerator accesses with higher frequency.
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
  • the data to be cached may also be data recently accessed by the accelerator.
  • the latest here may refer to data accessed by the accelerator within a time range.
  • the time range here may be determined according to the storage capacity of the first memory. Specifically, when the amount of data in the first memory reaches the storage threshold, the entries in the first memory are sorted by access time, and the entry whose access time is furthest from the current time is deleted, and so on, thereby ensuring that the data stored in the first memory is the most recently accessed data.
  • the data recently accessed by the accelerator may be the data accessed by the accelerator within a preset time range. That is to say, if the current time is T, the preset time range may be from time T-t to time T, where the size of t can be determined according to the specific application scenario and is not specifically limited in this application.
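The recency-based eviction described above behaves like a least-recently-used cache. A minimal sketch (illustrative Python; `RecencyCache` and `storage_threshold` are hypothetical names):

```python
from collections import OrderedDict

# Sketch: keep only the most recently accessed data in the first memory.
# When the storage threshold is reached, the entry whose access time is
# furthest from the current time (the least recently used one) is deleted.

class RecencyCache:
    def __init__(self, storage_threshold):
        self.storage_threshold = storage_threshold
        self.first_memory = OrderedDict()   # ordered oldest -> newest access

    def access(self, address, second_memory):
        if address in self.first_memory:
            self.first_memory.move_to_end(address)   # refresh access time
            return self.first_memory[address]
        data = second_memory[address]
        if len(self.first_memory) >= self.storage_threshold:
            self.first_memory.popitem(last=False)    # evict least recently accessed
        self.first_memory[address] = data
        return data
```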
  • the data to be cached can also include prefetched data.
  • the storage controller can determine the data that the accelerator is likely to access through a prefetch algorithm, extract it from the second memory in advance, and store it in the first memory with higher reading and writing efficiency. In this way, when the accelerator requests to read the data, it can interact directly with the first memory, thereby improving the reading and writing efficiency of the accelerator.
  • the data to be cached may also include more types of data, which may be determined based on the application scenario of the accelerator, which is not specifically limited in this application.
  • the accelerator configures the first memory and obtains configuration information of the first memory.
  • the configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity, where the cache switch state is used to indicate whether the accelerator uses the first memory.
  • the cache address range is used to instruct the accelerator to store data with a storage address within the cache address range in the first memory.
  • the cache capacity is used to indicate the capacity of the first memory.
  • configuring the cache switch to on means using the first memory for data storage, and being off means not using the first memory for data storage.
  • Configuring the address range as the target address range means that the data stored in the first memory is the cache data of the data stored in the target address range of the second memory.
  • the cache depth configured as D means that the storage capacity of the first memory is D.
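The three configuration items above can be sketched as a small structure (illustrative Python; the field names `switch_on`, `address_range`, and `capacity` are assumptions, not terms from the application):

```python
from dataclasses import dataclass

# Sketch of the configuration information: cache switch state, cache
# address range, and cache capacity (the cache depth D).

@dataclass
class CacheConfig:
    switch_on: bool        # whether the accelerator uses the first memory at all
    address_range: range   # second-memory addresses eligible for caching
    capacity: int          # storage capacity D of the first memory

    def should_cache(self, address):
        # Data is cached only if the switch is on and its storage address
        # falls inside the configured cache address range.
        return self.switch_on and address in self.address_range

cfg = CacheConfig(switch_on=True, address_range=range(0x1000, 0x2000), capacity=512)
```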
  • the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  • the data processing instructions may be instructions generated by the accelerator, or instructions sent to it by the CPU in the computing device where the accelerator is located.
  • the data processing instructions may be data write instructions or data read instructions, and may also include other instructions for business processing after the data is read out.
  • the accelerator can perform operations such as updating, deleting, and merging on the data, and can also perform computations on multiple pieces of read data, such as matrix multiplication and convolution operations; this can be determined according to the processing business of the accelerator, and this application does not specifically limit the specific process of subsequent data processing.
  • the write address in the data write instruction is the second storage address of the data. If the address memory already stores a correspondence between this second storage address and a first storage address, it means that a historical version of the data has been written to the first storage address of the first memory. In this case, the historical version of the data corresponding to the first storage address can be updated, and the data is sent to the second memory to request an update of the data corresponding to the second storage address.
  • the accelerator can first determine the first storage address corresponding to the second storage address, store the data carried by the data write instruction at the first storage address of the first memory, and store the correspondence between the first storage address and the second storage address in the address memory. It should be noted that when the storage controller writes data for the first time, it can determine the first storage address of the data based on the currently free addresses of the first memory; this application does not limit the address allocation strategy of the storage controller.
  • the read address in the data read instruction is the second storage address of the data.
  • the storage controller can match the second storage address against the addresses in the address memory. If the address memory includes the second storage address, the corresponding first storage address can be obtained according to the second storage address of the data, and the data can then be read from the first memory.
  • the accelerator can read the data from the second memory according to the second storage address.
  • the accelerator can also determine a first storage address for the data, store the data read from the second memory into the first memory so that it can subsequently read the data directly from the first memory with higher reading and writing efficiency, and store the mapping relationship between the first storage address and the second storage address in the address memory.
  • the above implementation uses the address memory to record the correspondence between the first storage address and the second storage address, thereby realizing the addressing of the data to be cached. In this way, on receiving a data processing instruction carrying the second storage address, the accelerator can read the data from the first memory according to the addresses recorded in the address memory, and thus interact directly with the first memory, which has a faster read/write speed, improving the data reading and writing efficiency of the accelerator.
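The address-memory lookup described above can be sketched as follows (illustrative Python; `AddressMemory`, `read`, and `next_free` are hypothetical names, and a real implementation would allocate first-memory addresses according to its own strategy):

```python
# Sketch: the address memory maps second-memory addresses to first-memory
# addresses. On a read, the second storage address in the instruction is
# matched against this mapping; a hit yields the first storage address, a
# miss falls back to the second memory and records a new mapping.

class AddressMemory:
    def __init__(self):
        self.second_to_first = {}   # second storage address -> first storage address

    def lookup(self, second_addr):
        return self.second_to_first.get(second_addr)

    def record(self, second_addr, first_addr):
        self.second_to_first[second_addr] = first_addr

def read(second_addr, addr_mem, first_memory, second_memory, next_free):
    first_addr = addr_mem.lookup(second_addr)
    if first_addr is not None:            # hit in the address memory
        return first_memory[first_addr]
    data = second_memory[second_addr]     # miss: read the slow second memory
    first_memory[next_free] = data        # cache for subsequent reads
    addr_mem.record(second_addr, next_free)
    return data
```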
  • the accelerator includes a state information memory, which is used to store state information of the data; the state information includes a first state and a second state.
  • the first state is used to indicate that the data is in a modified state and is being written to the first memory or the second memory. At this time, the data cannot be read from the first memory, otherwise the wrong version of data may be read.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory at this time.
  • when the accelerator processes a data write instruction, if the state information of the data is the first state, that is, the data is currently being modified, the accelerator can first set the state information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory for data updating.
  • the data in the data processing instruction can be written to the first memory for data updating, and the data can also be written into the second memory for data updating.
  • when the accelerator processes a data read instruction, if the address memory does not include the read address, that is, the data does not exist in the first memory, the accelerator sets the state information of the data to the first state and reads the data from the second memory. After the data is retrieved, if the state information of the data is still the first state, the accelerator sets it to the second state and stores the data at the first storage address of the first memory. If the state information has already changed to the second state, it means that the storage controller has updated the data to the latest version, and the retrieved data can no longer be stored in the first memory. It should be understood that if a new version of the data is written into the first memory and the second memory during the read process, the state information ensures that the cached data is the latest version.
  • if the address memory includes the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data can be read from the second memory. If the address memory includes the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory.
  • the above state information can be represented by binary characters.
  • for example, the first state is represented by the character "1" and the second state by the character "0".
  • the first state and the second state can also be distinguished by other identifiers, which this application does not specifically limit.
  • when the state information of the data is the first state, it means the data is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data may be read. When the state information of the data is the second state, it means the data is not in a modified state, and the data can be read from the first memory. Recording the state information of the data in the state information memory ensures that the user does not read a wrong version of the data and that the data recorded in the first memory is the latest version.
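The state-guarded read and write paths above can be sketched as follows (illustrative Python; all names are assumptions, and this sequential model only approximates the concurrent behaviour the application describes):

```python
# Sketch: a per-address state bit guards cache installs. "1" (first
# state) means the data is being fetched/modified; "0" (second state)
# means it is stable. A fetched value is installed into the first memory
# only if the state is still the first state when the read completes,
# i.e. no newer version was written in the meantime.

FIRST_STATE, SECOND_STATE = 1, 0

class StateMemory:
    def __init__(self):
        self.state = {}   # second storage address -> state bit

    def get(self, addr):
        return self.state.get(addr, SECOND_STATE)

    def set(self, addr, value):
        self.state[addr] = value

def read_miss(addr, states, first_memory, second_memory):
    states.set(addr, FIRST_STATE)         # mark: data is being fetched
    data = second_memory[addr]            # slow read from the second memory
    if states.get(addr) == FIRST_STATE:   # no newer write arrived meanwhile
        states.set(addr, SECOND_STATE)
        first_memory[addr] = data         # safe to install in the first memory
    return data

def write(addr, data, states, first_memory, second_memory):
    # A write installs the new version in both memories and leaves the
    # entry in the stable (second) state.
    first_memory[addr] = data
    second_memory[addr] = data
    states.set(addr, SECOND_STATE)
```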
  • a data processing device in a second aspect, includes a first memory.
  • the data processing device includes: an acquisition unit configured to acquire a data processing instruction, where the instruction includes the address of data in the second memory.
  • the first memory is used to store the data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing device.
  • the accelerator stores the data to be cached in the second memory into the first memory, where the reading and writing efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions the accelerator can interact directly with the first memory with higher reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the data processing device includes a status information memory.
  • the status information memory is used to store status information of the data.
  • the state information includes a first state and a second state, where the first state is used to indicate that the data is in a modified state, and the second state is used to indicate that the data is not in a modified state.
  • the data processing device includes an address memory, and the address memory is used to store a correspondence between the address of the data in the first memory and the address of the data in the second memory.
  • the data processing instruction includes a data write instruction; the processing unit is configured to, when the state information of the data is the first state and the data processing instruction is a data write instruction, set the state information of the data to the second state, read the data from the first memory, update it according to the data processing instruction, and store the updated data in the first memory.
  • the data processing instruction includes a data read instruction; the processing unit is configured to determine, before the data processing device obtains the data, that the address in the data processing instruction is not included in the address memory; the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and the processing unit is configured to, when the state information of the data is the first state, set it to the second state and store the data at the first storage address of the first memory.
  • the device further includes a configuration unit configured to configure the first memory and obtain configuration information of the first memory.
  • the configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity.
  • the cache switch state is used to indicate whether the data processing device uses the first memory
  • the cache address range is used to instruct the data processing device to store data whose storage address falls within the cache address range in the first memory
  • the cache capacity is used to indicate the capacity of the first memory.
  • the data processing device is implemented through ASIC or FPGA technology
  • the first memory includes one or more of SRAM, register, SCM, and CAM.
  • the data to be cached includes data with an access frequency higher than the first threshold.
  • an accelerator in a third aspect, includes a processor and a power supply circuit.
  • the power supply circuit is used to supply power to the processor.
  • the processor is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • a computing device in a fourth aspect, includes a CPU and an accelerator.
  • the CPU is used to run instructions to implement business functions of the computing device.
  • the accelerator is used to implement the functions of the operation steps performed by the accelerator described in the first aspect.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • FIG. 2 is a schematic structural diagram of another computing device provided by this application.
  • FIG. 3 is a schematic structural diagram of an accelerator provided by this application.
  • Figure 4 is a schematic structural diagram of a data processing device provided by this application.
  • Figure 5 is a schematic flowchart of steps in an application scenario of a data processing method provided by this application.
  • Figure 6 is a schematic flowchart of steps for processing data reading instructions in a data processing method provided by this application;
  • FIG. 7 is a schematic flowchart of steps for processing data writing instructions in a data processing method provided by this application.
  • This application provides a computing device.
  • the accelerator in the computing device includes a first memory.
  • the accelerator stores the data to be cached in the second memory in the first memory, where the read and write efficiency of the first memory is greater than the read and write efficiency of the second memory, so that when the accelerator processes the data processing instructions, it can directly communicate with The first memory with higher reading and writing efficiency interacts with each other, thereby improving the data reading and writing efficiency of the accelerator.
  • Figure 1 is a schematic structural diagram of a computing device provided by this application.
  • the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140 and a storage medium 150.
  • communication connections can be established between the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 through a bus.
  • the number of the processor 110, the accelerator 120, the second memory 130, the communication interface 140 and the storage medium 150 may be one or more, and is not specifically limited in this application.
  • the processor 110 and the accelerator 120 may be hardware accelerators or a combination of hardware accelerators.
  • the above-mentioned hardware accelerator is an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (GAL), or any combination thereof.
  • the processor 110 is used to execute instructions in the storage medium 150 to implement the business functions of the computing device 100 .
  • the processor 110 can be a central processing unit (CPU), and the accelerator 120 (also called an accelerated processing unit (APU)) can be a system-level chip implemented through FPGA, ASIC and other technologies.
  • the accelerator 120 is a processing unit in the computing device 100 for assisting the CPU in processing special types of computing tasks.
  • the special types of computing tasks may be graphics processing, vector calculations, machine learning, etc.
  • the accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), etc.
  • the accelerator 120 may also be a CPU.
  • the computing device 100 may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120. It should be understood that the above examples are for illustration. There are no specific limitations in this application.
  • the storage medium 150 is a carrier for storing data, such as a hard disk, a USB flash drive, flash memory, an SD card (secure digital memory card), a memory stick, etc.
  • the hard disk can be a hard disk drive (HDD), a solid state drive (SSD), etc., and is not specifically limited in this application.
  • the storage medium 150 may include a second memory, and in a specific implementation, the second memory may be DDR.
  • the communication interface 140 is a wired interface (such as an Ethernet interface), an internal interface (such as a Peripheral Component Interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless LAN interface) for communicating with other servers or units.
  • Bus 160 is a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, etc.
  • the bus 160 can be divided into an address bus, a data bus, a control bus, etc. For clarity of explanation, the various buses are all marked as the bus 160 in the figure.
  • the second memory 130 includes volatile memory or non-volatile memory, or both volatile and non-volatile memory.
  • the volatile memory may be random access memory (RAM). By way of example and not limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the accelerator 120 may also include a first memory 121, where the data reading and writing efficiency of the first memory 121 is greater than the data reading and writing efficiency of the second memory 130.
  • the first memory 121 may include static random access memory (static RAM, SRAM), storage class memory (storage class memory, SCM), registers, content-addressable memory (content-addressable memory, CAM), etc., which this application does not specifically limit.
  • the data to be cached stored in the second memory 130 can be cached in the first memory 121, improving read and write efficiency. In this way, when the accelerator 120 needs to access the data, it can read the data from the high-performance first memory 121, thereby making up for the difference in processing speed between the accelerator 120 and the low-performance memory and improving the data reading and writing efficiency of the accelerator 120.
  • the second memory 130 in Figure 1 can be the original memory inside the computing device 100
  • the first memory 121 can be the original memory inside the accelerator 120.
  • This application uses the original memory inside the computing device 100 and the accelerator 120 together with a caching algorithm to improve the data reading and writing efficiency of the accelerator 120, without the need to deploy additional cache hardware, reducing the implementation cost of the solution and making it easier to implement.
  • Figure 1 is an exemplary division method of the present application.
  • the second memory 130 can also be deployed inside the accelerator 120.
  • if the accelerator 120 itself includes at least two memories, one of which has higher reading and writing efficiency than the other, then the memory with lower reading and writing efficiency inside the accelerator 120 can be used as the second memory 130, and the memory with higher reading and writing efficiency can be used as the first memory 121.
  • the communication between the first memory 121 and the second memory 130 may be off-chip communication.
  • the bus can be an off-chip bus.
  • the off-chip bus here generally refers to the public information channel between the CPU and external devices, such as the above-mentioned PCIe bus, EISA bus, UB bus, CXL bus, CCIX bus, GenZ bus, etc., which this application does not specifically limit.
  • the communication between the first memory 121 and the second memory 130 may be on-chip communication, and the bus between the first memory 121 and the second memory 130 may be an on-chip bus, such as an advanced extensible interface (AXI) bus, an advanced microcontroller bus architecture (AMBA) bus, etc., which are not specifically limited in this application.
  • the data to be cached stored in the second memory 130 can be cached in the first memory 121.
  • when the accelerator 120 needs to access the data in the second memory 130, it can read the data from the first memory 121, which has higher reading and writing efficiency, thereby making up for the difference in processing speed between the accelerator 120 and the low-performance memory and improving the data reading and writing efficiency of the accelerator 120.
  • first memory 121 and the second memory 130 in Figure 2 are the original memories inside the accelerator 120.
  • the data reading and writing efficiency of the accelerator 120 is improved through algorithms, without the need to deploy additional caching hardware, reducing the implementation cost of the solution and making it easier to implement.
  • Figure 3 is a schematic structural diagram of an accelerator provided by this application; Figure 3 is an exemplary division method.
  • the accelerator 120 may include a storage controller 122, a first memory 121, a status information memory 123 and an address memory 124, where a communication connection is established between the storage controller 122, the first memory 121, the status information memory 123 and the address memory 124 through an internal bus.
  • for the internal bus, reference can be made to the description of the bus 160, which will not be repeated here.
  • the accelerator 120 may also include a power supply circuit, and the power supply circuit may provide power to the memory controller 122 .
  • the memory controller 122 can be implemented by a hardware logic circuit, for example, an application specific integrated circuit (ASIC) to implement various functions of the accelerator 120 .
  • the power supply circuit may be located in the same accelerator as the storage controller 122, or may be located in another accelerator other than the accelerator where the storage controller 122 is located.
  • the power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management accelerator, a power management processor, or a power management control circuit.
  • accelerator 120 is an independent accelerator.
  • the first memory 121 is used to store data
  • the status information memory 123 is used to store the status information of the data
  • the address memory 124 is used to store the address information of the data.
  • the first memory 121 may be a memory with a greater reading and writing efficiency than the second memory 130 , such as SRAM; the storage space required for status information is small, but it needs to be synchronized with the status of the data, so the status information memory 123 can be a register (register); the address memory 124 can be a CAM.
  • the CAM is a content-addressed memory whose working mechanism is to compare an input data item with all data items stored in the CAM to determine whether the input data item matches any stored data item, so the address memory 124 used to store the address information of the data can be implemented using the CAM in the accelerator 120. In this way, when the user requests to access data, the CAM can match the address of the data requested by the user with the address information stored in the CAM; if it matches, it means that the data has been stored in the first memory 121. It should be understood that the above examples are for illustration, and the address memory 124 can also be implemented using other memories, which is not specifically limited in this application.
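The CAM-style lookup described above can be sketched in Python. This is an illustration only, not the patent's implementation; the class `AddressMemory` and its method names are hypothetical:

```python
class AddressMemory:
    """Sketch of the address memory 124: maps second-memory addresses
    (the search key, as a CAM would match it) to first-memory addresses."""

    def __init__(self):
        self.entries = {}  # second_address -> first_address

    def store(self, second_address, first_address):
        # Record that the data at second_address is cached at first_address.
        self.entries[second_address] = first_address

    def match(self, second_address):
        # A real CAM compares the input against every stored entry in
        # parallel; a hit means the data is already in the first memory 121.
        return self.entries.get(second_address)  # None on a miss

am = AddressMemory()
am.store("add3", 0x10)
assert am.match("add3") == 0x10   # hit: data cached in first memory
assert am.match("add7") is None   # miss: data only in second memory
```

A hardware CAM performs this comparison against all entries in a single cycle; the dictionary here only models the hit/miss behavior, not the parallelism.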
  • the first memory 121 in this application is implemented using the memory in the accelerator 120, and does not require additional cache deployment.
  • the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, and the software implementation only needs the online programming function of the FPGA or ASIC, which can solve the problem of the accelerator's read and write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
  • the storage controller 122 can obtain a data processing instruction, where the data processing instruction includes an address of the second memory 130, read the data from the first memory according to the address of the second memory 130, and then process the data according to the data processing instruction.
  • data processing instructions may be data processing instructions generated by the accelerator 120 during business processing, or may be data processing instructions sent by the processor 110 to the accelerator 120, which is not specifically limited in this application.
  • the storage controller 122 may store data in the second memory 130, and then store the data to be cached in the second memory 130 in the first memory 121.
  • the storage controller 122 can update the data to be cached to the first memory 121 in real time, or can update the data to be cached to the first memory 121 with a delay or according to a certain algorithm, which is not specifically limited in this application.
  • the data to be cached includes data historically accessed by the accelerator 120 . In this way, when the accelerator 120 accesses the data again, it can directly interact with the first memory 121 that has a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator 120 .
  • the data to be cached may include data whose historical access frequency is higher than the first threshold. That is to say, the data stored in the first memory 121 is the data with higher access frequency stored in the second memory 130. Since the first memory has faster reading and writing efficiency, the accelerator 120 can interact directly with the first memory 121 for the frequently accessed data, thereby improving the reading and writing efficiency of the accelerator 120.
  • the size of the first threshold can be determined according to specific application scenarios, and is not specifically limited in this application.
  • when the amount of data in the first memory 121 reaches the storage threshold, the data in the first memory 121 whose access frequency is not higher than the second threshold can be deleted, and then the data accessed by the accelerator 120 from the second memory 130 can continue to be stored in the first memory 121, so that the data stored in the first memory 121 is data with higher access frequency accessed by the accelerator 120.
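The frequency-based eviction policy described above can be sketched as follows; this is a minimal illustration, and the function name and data layout are assumptions, not the patent's design:

```python
def evict_cold_entries(access_counts, second_threshold):
    """Delete entries whose access frequency is not higher than the
    second threshold, keeping only frequently accessed data cached."""
    return {addr: count for addr, count in access_counts.items()
            if count > second_threshold}

# Hypothetical per-address access counters for data in the first memory.
cache_stats = {"add0": 12, "add1": 2, "add2": 7}
kept = evict_cold_entries(cache_stats, second_threshold=2)
assert kept == {"add0": 12, "add2": 7}  # add1 was accessed too rarely
```

In hardware this would run when the first memory reaches its storage threshold, freeing slots for new data fetched from the second memory.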
  • the first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
  • the data to be cached may also be data recently accessed by the accelerator 120, where "recently" may refer to data accessed by the accelerator 120 within a time range, and the time range here may be determined according to the storage capacity of the first memory 121. Specifically, when the amount of data in the first memory 121 reaches the storage threshold, the access times of the data in the first memory 121 are sorted, and the data whose access time is furthest from the current time is deleted, and so on, ensuring that the data stored in the first memory 121 is the most recently accessed data.
  • the data recently accessed by the accelerator 120 may be the data accessed by the accelerator 120 within a preset time range. That is to say, if the current time is T, the preset time range may be the period from time T-t to time T, where the size of t can be determined according to specific application scenarios and is not specifically limited in this application.
  • the possibility of recently accessed data in the second memory 130 being accessed again by the accelerator 120 is very high, so it is stored in the first memory 121; when the accelerator 120 accesses the data again, it can interact directly with the faster first memory 121, thereby improving the data reading and writing efficiency of the accelerator 120.
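The recency-based policy amounts to least-recently-used (LRU) eviction. A minimal sketch, assuming a software model of the first memory (the class name and capacity handling are illustrative):

```python
from collections import OrderedDict

class RecencyCache:
    """Keeps only the most recently accessed entries, evicting the entry
    whose access time is furthest from the present when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered oldest-access first

    def access(self, address, value):
        if address in self.entries:
            self.entries.move_to_end(address)  # refresh recency
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recent

c = RecencyCache(capacity=2)
c.access("add0", b"a")
c.access("add1", b"b")
c.access("add2", b"c")                 # exceeds capacity, add0 evicted
assert list(c.entries) == ["add1", "add2"]
```

Tracking order of access replaces the explicit sort over access timestamps described in the text, but yields the same eviction victim.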
  • the data to be cached may also include prefetched data.
  • the storage controller 122 may determine the data that the accelerator 120 may access through a prefetching algorithm, extract it from the second memory 130 in advance, and store it in the first memory 121. In this way, when the accelerator 120 requests to read the data, it can interact directly with the first memory 121, thereby improving the reading and writing efficiency of the accelerator 120.
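The patent does not specify the prefetching algorithm; as one hedged example, a simple sequential (stride-based) prefetcher could look like this, where all names and the stride policy are assumptions:

```python
def prefetch_addresses(last_address, stride, count):
    """Guess the next addresses the accelerator will touch, so they can
    be fetched into the first memory before they are requested."""
    return [last_address + stride * i for i in range(1, count + 1)]

def fill_cache(cache, second_memory, addresses):
    # Copy each predicted address from the second memory into the cache
    # (the model of the first memory 121) ahead of time.
    for addr in addresses:
        if addr in second_memory:
            cache[addr] = second_memory[addr]

second_memory = {0x100: "d0", 0x104: "d1", 0x108: "d2"}
cache = {}
fill_cache(cache, second_memory, prefetch_addresses(0x100, stride=4, count=2))
assert cache == {0x104: "d1", 0x108: "d2"}  # prefetched before any request
```

Any other predictor (history tables, access-pattern detection) could replace `prefetch_addresses` without changing the fill step.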
  • the data to be cached can also include more types of data, which can be specifically determined according to the application scenario of the accelerator 120, which is not specifically limited in this application.
  • before the storage controller 122 stores data in the first memory 121, the user can configure the storage controller 122.
  • the specific configuration content may include configuring the cache switch, configuring the address range, configuring the cache depth, etc.
  • when the cache switch is configured to be on it means that the first memory 121 is used for data storage, and when it is set to off, it means that the first memory 121 is not used for data storage.
  • the address range configured as the target address range means that the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the cache depth configuration of D means that the storage capacity of the first memory 121 is D.
  • the user can also perform other configurations on the storage controller 122, which can be determined based on the actual business environment, and is not specifically limited in this application.
  • for example, if the cache depth is configured as 2M, the cache switch is configured as on, and the address range is configured as add0 to add5, then the data that the accelerator 120 requests to write to add0 to add5 will first be cached in the first memory 121.
  • the data of add0 to add5 can also be written to the second memory 130 with a delay, while data requested to be written to other addresses in the second memory 130 can be directly written to the second memory 130 .
  • the cache switch can also be turned off. It should be understood that the above examples are for illustration and are not specifically limited in this application.
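The configuration items above (cache switch, address range, cache depth) can be modeled as a small record; this is a sketch under assumed field names, not the patent's register layout:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative model of the user configuration of the storage
    controller 122; all field names are hypothetical."""
    switch_on: bool
    addr_low: int
    addr_high: int   # inclusive target address range in the second memory
    depth: int       # storage capacity of the first memory 121

    def should_cache(self, address):
        # Data is cached only when the switch is on and the address falls
        # inside the configured range of the second memory 130.
        return self.switch_on and self.addr_low <= address <= self.addr_high

# Example matching the text: switch on, range add0..add5, depth 2M.
cfg = CacheConfig(switch_on=True, addr_low=0, addr_high=5, depth=2 * 1024 * 1024)
assert cfg.should_cache(3)        # within add0..add5: cached first
assert not cfg.should_cache(9)    # other addresses go straight to DDR
cfg.switch_on = False
assert not cfg.should_cache(3)    # switch off disables the cache entirely
```

Writes to addresses outside the range bypass the first memory, matching the behavior described for add0 to add5 versus other addresses.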
  • the data processing instructions obtained by the storage controller 122 may be data read instructions or data write instructions.
  • the data write instructions include the second storage address of the data.
  • the second storage address is the storage address of the second memory 130. The storage controller 122 can first determine whether the cache switch is on. If the cache switch is on, it determines whether the address carried by the data processing instruction is within the address range configured by the user. If it is within the configured address range, the storage controller 122 can store the data in the first memory 121; otherwise, the data can be stored in the second memory 130.
  • similarly, when the storage controller 122 obtains a data read instruction, if it determines that the cache switch is on and the read address is within the address range configured by the user, it reads the data from the first memory 121; otherwise, the data is read from the second memory 130, which will not be described again here.
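The routing decision just described, switch check, range check, then memory selection, can be condensed into one function. This is an interpretive sketch, not the controller's actual logic:

```python
def route_access(switch_on, in_range, hit_in_first_memory, is_read):
    """Decide which memory serves a data processing instruction,
    following the checks described in the text."""
    if not (switch_on and in_range):
        return "second_memory"      # cache bypassed entirely
    if is_read and not hit_in_first_memory:
        return "second_memory"      # read of data not yet cached
    return "first_memory"

assert route_access(True, True, True, is_read=True) == "first_memory"
assert route_access(True, True, False, is_read=True) == "second_memory"
assert route_access(False, True, True, is_read=True) == "second_memory"
assert route_access(True, True, False, is_read=False) == "first_memory"
```

Note that a write inside the configured range goes to the first memory even when the address is not yet cached, since the controller allocates a first-memory slot for it.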
  • the storage controller 122 may store the correspondence between the first storage address and the second storage address of the data in the address memory 124 .
  • the write address in a data write instruction is the second storage address of the data. If the address memory 124 already stores a correspondence between this second storage address and a first storage address, it means that a historical version of the data has been written to the first storage address of the first memory 121. At this time, the historical version of the data corresponding to the first storage address can be updated, and the data can be sent to the second memory 130 to request updating of the data corresponding to the second storage address.
  • otherwise, the storage controller 122 can first determine the first storage address corresponding to the second storage address, then store the data carried in the data write instruction at the first storage address of the first memory 121, and store the correspondence between the first storage address and the second storage address of the data in the address memory 124. It should be noted that when the storage controller 122 writes data for the first time, it can determine the first storage address of the data based on the current free addresses of the first memory 121; this application does not limit the address allocation strategy of the storage controller 122.
  • when the storage controller 122 obtains a data read instruction, the read address in the data read instruction is the second storage address of the data. At this time, the storage controller 122 can match the second storage address against the addresses in the address memory 124. If the address memory 124 includes the second storage address, the corresponding first storage address can be obtained according to the second storage address of the data, and the data can then be read from the first memory 121.
  • otherwise, the storage controller 122 can read the data from the second memory 130 according to the second storage address. At the same time, the storage controller 122 can also determine a first storage address for the data, store the data read from the second memory 130 into the first memory 121 so that the accelerator 120 can subsequently read the data directly from the first memory 121, and store the mapping relationship between the first storage address and the second storage address of the data in the address memory 124.
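The read path just described, hit returns the cached copy, miss installs the data and records the mapping, can be sketched as follows; the function signature and allocator are assumptions for illustration:

```python
def read(address_memory, first_memory, second_memory, second_address,
         allocate_first_address):
    """Read-path sketch: a hit is served from the first memory; a miss
    reads the second memory, installs the data in the first memory and
    records the address mapping so the next read is served fast."""
    first_address = address_memory.get(second_address)
    if first_address is not None:
        return first_memory[first_address]            # cache hit
    data = second_memory[second_address]              # cache miss
    first_address = allocate_first_address()          # pick a free slot
    first_memory[first_address] = data
    address_memory[second_address] = first_address    # record mapping
    return data

address_memory, first_memory = {}, {}
second_memory = {0x40: "payload"}
alloc = iter(range(100)).__next__   # trivial free-slot allocator
assert read(address_memory, first_memory, second_memory, 0x40, alloc) == "payload"
assert address_memory == {0x40: 0}  # mapping installed on the miss
assert read(address_memory, first_memory, second_memory, 0x40, alloc) == "payload"
```

The second call hits in `first_memory` without touching `second_memory`, which is the efficiency gain the text describes.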
  • the accelerator 120 includes a status information memory 123.
  • the status information memory 123 is used to store the status information of the data.
  • when processing a data write instruction, the storage controller 122 may first determine the status information of the data, and write the data according to the status information.
  • the status information includes a first state and a second state. The first state is used to indicate that the data is in a modification state and is being written to the first memory 121 or the second memory 130. At this time, the data cannot be read from the first memory 121 or the second memory 130.
  • the second state is used to indicate that the data is not in a modified state, and the data can be read from the first memory 121 at this time.
  • when the accelerator 120 processes a data write instruction, if the status information of the data is the first state, that is, the data is currently being modified, the accelerator can first set the status information of the data to the second state, then write the data to the first storage address, and then write the data into the second memory 130 for data update.
  • if the status information of the data is the second state, the data in the data processing instruction can be written to the first memory 121 for data update, and the data is also written into the second memory 130 for data update.
  • when the accelerator 120 processes a data read instruction, if the address memory does not include the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the status information of the data to the first state and reads the data from the second memory 130. After retrieving the data, if the status information of the data is still the first state, the accelerator 120 sets the status information of the data to the second state and stores the data at the first storage address of the first memory 121. If the status information has already changed to the second state, it means that the storage controller 122 has written a newer version of the data, and the retrieved data should no longer be stored in the first memory 121. It should be understood that if a new version of the data is written into the first memory 121 and the second memory 130 during the reading process, the status information ensures that the cached data is the latest version.
  • if the address memory 124 includes the read address and the status information is the first state, that is, a previous version of the data is being operated on, the latest version of the data can be read from the second memory 130. If the address memory 124 includes the read address and the status information is the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory 121.
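The state-dependent read selection can be sketched as follows. This is an illustrative reading of the rule above ("1" for the first state, "0" for the second, as the text later suggests), not the controller's circuit:

```python
MODIFYING, STABLE = "1", "0"   # first state / second state

def read_with_state(in_address_memory, state, first_copy, second_copy):
    """If the entry exists but is in the first (modifying) state, the
    first-memory copy may be stale, so serve the second memory; in the
    second (stable) state the first-memory copy is the latest version."""
    if in_address_memory and state == STABLE:
        return first_copy       # cached copy is current
    return second_copy          # being modified, or not cached at all

assert read_with_state(True, STABLE, "v2", "v2") == "v2"     # first memory
assert read_with_state(True, MODIFYING, "v1", "v2") == "v2"  # second memory
assert read_with_state(False, STABLE, None, "v2") == "v2"    # not cached yet
```

The state bit thus acts as a one-bit consistency guard between the two copies of each cached datum.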
  • the above state information can be represented by binary characters; for example, the first state is represented by the character "1" and the second state by the character "0". The first state and the second state can also be distinguished by other identifiers, which this application does not specifically limit.
  • the above-mentioned accelerator 120 can also be a CPU
  • the first memory can be a memory in the CPU, such as an SRAM in the CPU
  • the storage controller 122 can cache data in combination with the multi-level cache architecture of the CPU, for example as the third-level or fourth-level cache of the CPU, enabling more levels of CPU cache and reducing the implementation cost of multi-level caching while reducing hardware complexity.
  • the data processing instructions not only include the above-mentioned data read instructions and data write instructions, but also include other instructions for performing business processing after reading the data. For example, after the accelerator 120 reads the data from the first memory, the data can be updated, deleted, merged, etc., and multiple pieces of read data can also be calculated and processed, for example with matrix multiplication or convolution operations. The details can be determined according to the processing business of the accelerator 120; this application does not specifically limit the process of subsequent processing of the data.
  • FIG 4 is a schematic structural diagram of a data processing device provided by this application.
  • the data processing device 400 can be the accelerator 120 in Figure 3.
  • the data processing device 400 may include a configuration module 1221, an acquisition module 1225 and a processing module 1226, wherein the processing module 1226 may include a search and write data update module 1222, a read data return processing module 1223, and a read data update module 1224. It should be understood that each of the above modules may correspond to a circuit module in an ASIC or FPGA.
  • the functions of the acquisition module 1225, the configuration module 1221 and the processing module 1226 in Figure 4 can be implemented by the storage controller 122 in Figure 3. The data processing device 400 shown in Figure 4 corresponds to the division method of the application scenario shown in Figure 1, that is, the application scenario in which the second memory 130 is deployed outside the accelerator. It should be understood that in the application scenario shown in Figure 2, the second memory 130 is deployed inside the data processing device 400, which will not be described again here.
  • the acquisition module 1225 is used to acquire data processing instructions generated by the accelerator 120, which may include data writing instructions and data reading instructions.
  • the configuration module 1221 is used to receive configuration information input by the user.
  • the configuration information may include information configuring the cache switch, information configuring the address range, and information configuring the cache depth. When the information configuring the cache switch is on, it means the first memory 121 is used for data storage; when it is off, it means the first memory 121 is not used for data storage.
  • the information configuring the address range includes the target address range, that is, the data stored in the first memory 121 is the cache data of the data stored in the target address range of the second memory 130 .
  • the information configuring the cache depth includes the cache depth D, that is, the storage capacity of the first memory 121 is D.
  • the search and write data update module 1222 is used to obtain the data write instructions generated by the accelerator 120 and process them. Specifically, the search and write data update module 1222 may first query whether the second storage address exists in the address memory 124 according to the second storage address of the data carried in the data write instruction. If the second storage address exists in the address memory 124, it then queries whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the first state of the data to the second state, reads the data from the first memory 121 according to the first storage address and updates it, rewrites the updated data into the first memory 121, and then writes the updated data into the second memory 130. If it is the second state, the data is written directly into the first memory 121 and then into the second memory 130.
  • if the second storage address does not exist in the address memory 124, the first storage address corresponding to the second storage address is determined according to the current storage capacity of the first memory 121, the correspondence between the first storage address and the second storage address is stored in the address memory 124, and the data is stored in the first memory 121 and the second memory 130.
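The write-path behavior of the search and write data update module can be sketched in software. A minimal sketch under assumed names, with "1"/"0" for the first/second state as in the text:

```python
def handle_write(second_address, data, address_memory, state_memory,
                 first_memory, second_memory, allocate):
    """Write-path sketch: known addresses are updated in place (clearing
    the modifying state first), new addresses get a slot and a mapping;
    the second memory is always updated as well."""
    if second_address in address_memory:
        if state_memory.get(second_address) == "1":  # first state
            state_memory[second_address] = "0"       # back to second state
        first_memory[address_memory[second_address]] = data
    else:
        first_address = allocate()                   # pick a free slot
        address_memory[second_address] = first_address
        first_memory[first_address] = data
    second_memory[second_address] = data             # keep DDR in sync

address_memory, state_memory, first_memory, second_memory = {}, {}, {}, {}
alloc = iter(range(100)).__next__
handle_write(0x20, "v1", address_memory, state_memory,
             first_memory, second_memory, alloc)     # first write: allocate
handle_write(0x20, "v2", address_memory, state_memory,
             first_memory, second_memory, alloc)     # second write: update
assert first_memory == {0: "v2"} and second_memory == {0x20: "v2"}
```

Both copies end at the newest version, which is the invariant the state bit is protecting in the hardware flow.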
  • the read data return processing module 1223 and the read data update module 1224 are used to obtain the data read instructions generated by the accelerator 120 and process the data read instructions. Among them, the read data return processing module 1223 is mainly used to read data, and the read data update module 1224 is mainly used to update data.
  • the read data return processing module 1223 can first query whether the second storage address exists in the address memory 124 according to the second storage address of the data carried in the data read instruction. If the second storage address exists in the address memory 124, it then queries whether the status information of the data in the status information memory 123 is the first state. If it is the first state, the data is being modified, so the data can be read from the second memory 130; if it is the second state, the data is not being modified, so the data can be read from the first memory 121.
  • if the second storage address does not exist in the address memory 124, the read data update module 1224 can first set the status information of the data in the status information memory 123 to the first state, then determine the first storage address corresponding to the second storage address according to the storage capacity of the first memory 121, and store the correspondence between the first storage address and the second storage address in the address memory 124.
  • after the read data return processing module 1223 reads the data from the second memory, the read data update module 1224 determines whether the state information of the data in the state information memory 123 is the first state. If it is the first state, it modifies the state information to the second state and then updates the data to the first memory; if it is the second state, it does not update the data to the first memory. It should be understood that if it is the second state, it means that the search and write data update module 1222 has updated the data in the first memory during this period, so there is no need to update the data to the first memory.
  • Figure 4 is an exemplary division method.
  • the memory controller 122 provided in this application can also be divided into more modules.
  • for example, the step of searching the address memory 124 for the first storage address corresponding to a data processing instruction is implemented by the search and write data update module 1222.
  • the search and write data update module 1222 can also be further divided into a search module and a write data update module, which is not specifically limited in this application.
  • the accelerator provided by the present application stores the data to be cached in the second memory in the first memory, where the read and write efficiency of the first memory is greater than that of the second memory, so that when reading and writing data, the accelerator can interact directly with the first memory, which has high reading and writing efficiency, thereby improving the data reading and writing efficiency of the accelerator.
  • the first memory is implemented by the memory in the accelerator, such as SRAM, registers, CAM, etc., without the need to deploy an additional cache, realizing the cache function of accelerators such as FPGAs and ASICs at very low hardware cost, which can solve the problem of the accelerator's read and write efficiency being limited by the memory bandwidth bottleneck in a simple, efficient and low-cost way.
  • Figure 5 is a schematic flowchart of the steps of the data processing method provided by this application in an application scenario.
  • Figure 6 is a schematic flowchart of the steps of processing data reading instructions in a data processing method provided by this application.
  • Figure 7 is a schematic flowchart of the steps of processing data write instructions in a data processing method provided by this application. Simply put, Figure 6 is used to describe step 7 in Figure 5, and Figure 7 is used to describe step 8 in Figure 5. As shown in Figures 5 to 7, the method may include the following steps:
  • Step 1 Configure the storage controller 122. This step can be implemented by the configuration module 1221 in Figure 4.
  • the configuration content includes one or more of cache switch status, cache address range, and cache capacity, where the cache switch status is used to indicate whether the accelerator uses the first memory 121, and the cache address range is used to indicate that the accelerator will store Data whose address is within the cache address range is stored in the first memory 121 , and the cache capacity is used to indicate the capacity of the first memory 121 .
  • Step 2 The storage controller 122 obtains data processing instructions.
  • the data processing instruction may be a data read instruction or a data write instruction. This step can be implemented by the acquisition module 1225 in Figure 4.
  • Step 3 The storage controller 122 determines whether the cache switch is on. If the cache switch is on, step 4 is performed. If the cache switch is off, step 9 is performed. Wherein, when the cache switch is configured to be on, it means that the first memory 121 is used for data storage, and when it is set to off, it means that the first memory 121 is not used for data storage. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 4 The storage controller 122 determines whether the address range is set. If the address range is set, step 5 is performed. If the address range is not set, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
  • Optionally, when the address range is not set, since the cache switch is already on, that is, the user wants to use the cache function of the first memory 121, the user can be prompted to set the cache address range; alternatively, the processing flow of step 5 can be skipped and step 6 executed directly.
  • Step 5 The storage controller 122 determines whether the address is within the range, where the address is the address carried in the data access request received in step 2, and the range is the cache address range configured in step 1. If the address is within the configured cache address range, go to step 6; if it is not, go to step 9. This step can be implemented by the configuration module 1221 in Figure 4.
  • Step 6 The storage controller 122 determines whether the data processing instruction is a data read instruction. If it is a data read instruction, step 7 is performed. If it is not a data read instruction, step 8 is performed. This step can be implemented by the configuration module 1221.
  • Step 7 The storage controller 122 processes the data read instruction.
  • the process of processing the data read instruction in step 7 will be described in detail in the embodiment of Figure 6. This step can be implemented by the read data return processing module 1223 and the read data update module 1224 in Figure 4.
  • Step 8 The storage controller 122 processes the data write instruction. Among them, the process of processing the data writing instruction in step 8 will be described in detail in the embodiment of FIG. 7 . This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 9 The storage controller 122 issues a read request or write request to the second memory.
  • For example, if the second memory is DDR and the data access request is a data read instruction, the storage controller 122 can issue a DDR read request; if the data access request is a data write instruction, the storage controller 122 can issue a DDR write request.
  • The above example is for illustration and is not specifically limited in this application.
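  • Steps 3 to 6 above amount to a routing decision for each instruction: the cached path is taken only when the cache switch is on, an address range is configured, and the address falls inside it; everything else goes to the second memory (step 9). A minimal sketch of that decision, with illustrative names:

```python
def dispatch(switch_on, addr_range, addr, is_read):
    """Route a data processing instruction as in steps 3-9 (names are assumptions).

    Returns "step7" (cached read path), "step8" (cached write path),
    or "step9" (forward to the second memory, e.g. a DDR read/write request).
    """
    if not switch_on:                        # step 3: cache switch is off
        return "step9"
    if addr_range is None:                   # step 4: no cache address range set
        return "step9"
    low, high = addr_range
    if not (low <= addr <= high):            # step 5: address outside the cached range
        return "step9"
    return "step7" if is_read else "step8"   # step 6: read vs. write instruction

print(dispatch(True, (0x1000, 0x1FFF), 0x1234, is_read=True))    # step7
print(dispatch(True, (0x1000, 0x1FFF), 0x1234, is_read=False))   # step8
print(dispatch(False, (0x1000, 0x1FFF), 0x1234, is_read=True))   # step9
```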
  • Figure 6 is a flow chart of the processing steps of the data read instruction provided by this application. As shown in Figure 6, step 7 may include the following steps:
  • Step 71 Determine whether the same address is stored in the address memory 124, where the data read instruction carries the read address of the data, and that read address is an address of the second memory 130. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the address memory 124 may be a CAM, that is, a memory addressed by content.
  • The working mechanism of a CAM is to compare an input data item with all data items stored in the CAM to determine whether the input matches any stored item.
  • Therefore, the address memory 124, which stores the address information of the data, can be implemented using a CAM in the accelerator 120. In this way, when the user requests to read data, the CAM can match the read address against the address information it stores; if they match, the data has already been stored in the first memory 121.
  • Optionally, the address memory 124 can also be implemented using other memories, in which case the memory controller 122 implements the function of obtaining addresses from the address memory 124 and matching them against the address carried in the data read instruction. This application does not specifically limit this.
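  • The CAM behaviour described above, comparing an input address against every stored entry and reporting a match, can be modelled in software by an associative lookup. The sketch below is a behavioural model only (a dictionary standing in for the CAM hardware); the class and method names are assumptions.

```python
class AddressMemory:
    """Behavioural model of the address memory 124: it stores the correspondence
    between addresses of the second memory 130 and storage slots of the first
    memory 121, and answers CAM-style match queries."""

    def __init__(self):
        self._entries = {}  # second-memory address -> first-memory address

    def match(self, second_addr):
        """Compare `second_addr` with all stored entries; return the matching
        first-memory address on a hit, or None on a miss."""
        return self._entries.get(second_addr)

    def store(self, second_addr, first_addr):
        """Record a new address correspondence (as in steps 75 and 84)."""
        self._entries[second_addr] = first_addr

am = AddressMemory()
am.store(0x1234, 7)       # data at second-memory address 0x1234 is cached in slot 7
print(am.match(0x1234))   # 7    -> hit: the data is already in the first memory
print(am.match(0x9999))   # None -> miss: the data is not cached
```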
  • If step 71 determines that the same address exists in the address memory 124, the latest or a historical version of the data has already been stored in the first memory 121, and step 72 can be executed. If it does not exist, no version of this data is stored in the first memory 121, and step 75 can be performed.
  • Step 72 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Optionally, the status information memory 123 may be a register, and step 72 may be implemented by the register itself determining whether the status information is the first state; alternatively, the status information memory 123 may implement only the storage function while the storage controller 122 implements the determination function, that is, the storage controller 122 obtains the status information of the data from the status information memory 123 and determines whether it is the first state. This application does not specifically limit this.
  • When the status information of the data is the first state (high), step 73 is executed; when it is the second state (low), step 74 is executed.
  • Step 73 Read data from the second memory 130.
  • Specifically, the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • It can be understood that when the status information of the data is the first state, referring to the foregoing content, the data may be in the process of being retrieved from the second memory 130; reading the data directly from the second memory 130 at this time can avoid read errors and improve the accuracy of data reading.
  • Step 74 Read data from the first memory. Specifically, the storage controller 122 may query the address memory 124 to determine the first storage address of the data in the first memory according to the read address carried in the data read instruction, and then read the data from the first storage address. This step can be implemented by the read data return processing module 1223 in Figure 4.
  • Step 75 Set the state information to the first state, and store the correspondence between the first storage address and the second storage address.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • the above corresponding relationship will be stored in the address memory 124.
  • This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 76 Read data from the second memory.
  • the storage controller 122 may issue a data read instruction to the second memory 130. If the second memory 130 is DDR, then the data read instruction may be a DDR read request. This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 77 Determine whether the status information is the first status (high bit). If it is the first state, steps 78 to 79 are executed. If it is the second state (low level), steps 78 to 79 are not executed. This step can be implemented by the read data update module 1224 in Figure 4.
  • It can be understood that if the status information has changed to the second state (low), referring to the foregoing content, during the data write process the storage controller 122 modifies the status information of the data from the first state (high) to the second state (low). This means the data has been modified between step 75 and step 79, and the storage controller 122 is writing or has written the latest version of the data to the first memory 121; therefore, the data read at this time should not be written into the first memory 121, so as to avoid overwriting the newer version.
  • If the status information has not changed to the second state (low), the data has not been modified between step 75 and step 79, so the data read from the second memory 130 at this time can be updated into the first memory 121.
  • Step 78 Modify the status information to the second status (low). This step can be implemented by the read data update module 1224 in Figure 4.
  • Step 79 Update the data to the first memory 121. This step can be implemented by the read data update module 1224 in Figure 4.
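  • Putting steps 71 to 79 together, the read path can be sketched as below. The two-state flag is what makes the miss path safe: step 75 raises it before fetching from the second memory, and step 77 installs the fetched data into the first memory only if no write lowered the flag in the meantime. All helper names are assumptions of this sketch.

```python
FIRST, SECOND = 1, 0  # status information: first state ("1"/high), second state ("0"/low)

def handle_read(addr, addr_mem, state_mem, first_mem, second_mem, alloc_slot):
    """Sketch of the read-instruction flow (steps 71-79)."""
    slot = addr_mem.get(addr)                 # step 71: CAM-style address match
    if slot is not None:
        if state_mem.get(addr) == FIRST:      # steps 72-73: a fetch is in flight,
            return second_mem[addr]           #   so read from the second memory
        return first_mem[slot]                # step 74: hit, read from the first memory
    slot = alloc_slot()                       # step 75: set the state and store the
    addr_mem[addr] = slot                     #   address correspondence
    state_mem[addr] = FIRST
    data = second_mem[addr]                   # step 76: read from the second memory
    if state_mem[addr] == FIRST:              # step 77: no write happened meanwhile
        state_mem[addr] = SECOND              # step 78
        first_mem[slot] = data                # step 79: update the first memory
    return data

addr_mem, state_mem, first_mem = {}, {}, {}
second_mem = {0x10: "v1"}
slots = iter(range(8))
read = lambda a: handle_read(a, addr_mem, state_mem, first_mem, second_mem, lambda: next(slots))
print(read(0x10))  # "v1": miss, fetched from the second memory and then cached
second_mem[0x10] = "changed-behind-cache"
print(read(0x10))  # "v1": hit, now served from the first memory
```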
  • Figure 7 is a flow chart of the processing steps of the data write instruction provided by this application. As shown in Figure 7, step 8 may include the following steps:
  • Step 81 Determine whether the same address exists.
  • Specifically, the data write instruction carries the write address of the data, and step 81 can determine whether the write address exists in the address memory 124.
  • this step can be implemented by the search and write data update module 1222 in Figure 4 .
  • If the same address exists, step 82 is executed; if the same address does not exist, step 84 is executed.
  • Step 82 Determine whether the status information is the first state.
  • the first state can also be represented by a high-order state or a "1" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • If the status information is the first state, step 83, step 85 and step 86 are executed.
  • If the status information is the second state, step 85 and step 86 are executed.
  • Step 83 Set the status information to the second state.
  • the second state can also be represented by the low-order state or the "0" state, which is not specifically limited in this application.
  • This step can be implemented by the search and write data update module 1222 in Figure 4.
  • It can be understood that when the status information is the first state, the data is undergoing steps 75 to 78. The data received in step 83 is the latest version, so setting the status information to the second state prevents the old version of the data from steps 75 to 78 from overwriting the current new version.
  • After step 83, step 85 and step 86 can be executed.
  • Step 84 Store the correspondence between the first storage address and the second storage address in the address memory 124. It can be understood that when the address carried in the data write instruction does not exist in the address memory 124, no historical version of the data has been written to the first memory 121. Therefore, a write address in the first memory 121, namely the first storage address, can be allocated for the data, and the first storage address is then stored in the address memory 124. In this way, when the storage controller 122 receives a read request for the data, the first storage address of the data can be obtained through the address memory and the data can then be read, achieving data caching. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • After step 84, step 85 and step 86 can be executed.
  • Step 85 Write data to the first memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
  • Step 86 Write data to the second memory. This step can be implemented by the search and write data update module 1222 in Figure 4.
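  • The write path of steps 81 to 86 can be sketched in the same style. The key detail is step 83: lowering the state flag while a read-miss fill is in flight prevents the older data fetched by that fill from overwriting the data written here. Helper names are assumptions of this sketch.

```python
FIRST, SECOND = 1, 0  # status information: first state ("1"/high), second state ("0"/low)

def handle_write(addr, data, addr_mem, state_mem, first_mem, second_mem, alloc_slot):
    """Sketch of the write-instruction flow (steps 81-86)."""
    slot = addr_mem.get(addr)             # step 81: does the same address exist?
    if slot is not None:
        if state_mem.get(addr) == FIRST:  # step 82: a read-miss fill is in flight
            state_mem[addr] = SECOND      # step 83: cancel it so stale data cannot
    else:                                 #   overwrite this newer version
        slot = alloc_slot()               # step 84: allocate a first storage address
        addr_mem[addr] = slot             #   and store the correspondence
    first_mem[slot] = data                # step 85: write to the first memory
    second_mem[addr] = data               # step 86: write to the second memory

addr_mem, state_mem, first_mem, second_mem = {}, {}, {}, {}
slots = iter(range(8))
alloc = lambda: next(slots)
handle_write(0x20, "new", addr_mem, state_mem, first_mem, second_mem, alloc)
print(first_mem[addr_mem[0x20]], second_mem[0x20])  # new new

# A write arriving while a read-miss fill is pending lowers the flag (step 83):
addr_mem[0x30], state_mem[0x30] = alloc(), FIRST
handle_write(0x30, "latest", addr_mem, state_mem, first_mem, second_mem, alloc)
print(state_mem[0x30] == SECOND)  # True
```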
  • In summary, the data processing method provided by this application stores the to-be-cached data of the second memory in the first memory, where the read-write efficiency of the first memory is greater than that of the second memory.
  • This allows the accelerator to interact directly with the first memory, which has higher read-write efficiency, when reading and writing data, thereby improving the data read-write efficiency of the accelerator.
  • Moreover, the first memory is implemented using memory already inside the accelerator, such as SRAM, registers, or CAM, without deploying an additional cache; the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, simply, efficiently and cost-effectively solving the problem of accelerator read-write efficiency being limited by the memory bandwidth bottleneck.
  • the embodiment of the present application provides an accelerator.
  • the accelerator includes a processor and a power supply circuit.
  • The power supply circuit is used to supply power to the processor, and the processor is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
  • Embodiments of the present application provide a computing device.
  • the computing device includes a CPU and an accelerator.
  • The CPU is used to run instructions to implement the business functions of the computing device, and the accelerator is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments are implemented in whole or in part in the form of a computer program product.
  • a computer program product includes at least one computer instruction.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, e.g., from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic cable, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • A computer-readable storage medium may be any medium that can be accessed by a computer, or a data storage node, such as a server or data center, containing at least one medium collection.
  • The medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., high-density digital video disc (DVD)), or a semiconductor medium.
  • The semiconductor medium may be a solid state drive (SSD).

Abstract

The present application provides a data processing method and apparatus and a related device. The method is applied to an accelerator. The accelerator comprises a first memory. The method comprises the following steps: the accelerator obtains a data processing instruction, wherein the data processing instruction comprises an address of data in a second memory, and the data read-write efficiency of the first memory is greater than that of the second memory; and the accelerator reads the data from the first memory according to the data processing instruction and processes the data, wherein the first memory is used for storing data to be cached of the second memory, and the data to be cached comprises data historically accessed by the accelerator, so that the accelerator can directly interact with the first memory having high read-write efficiency when reading and writing data. Therefore, the data read-write efficiency of the accelerator is improved, and the problem that the read-write efficiency of the accelerator is limited by the internal memory bandwidth bottleneck is solved.

Description

A data processing method, device and related equipment

This application claims priority to Chinese patent application No. 202210451716.8, entitled "A data processing method, device and related equipment" and filed with the China Patent Office on April 27, 2022, the entire contents of which are incorporated herein by reference.
Technical field

The present application relates to the field of computers, and in particular to a data processing method, device and related equipment.
Background

With the continuous development of science and technology, accelerators (also called acceleration processing units) such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) have taken on an increasingly important role in data processing, for example in matrix operations, image processing, and machine learning (ML). In these fields, accelerators have high requirements for the real-time reading and writing of data in memory. However, the frequency with which an accelerator accesses data in memory is currently unbalanced with the available bandwidth that the memory provides to the accelerator; when the accelerator needs to perform read and write operations on data in memory, it often has to wait an additional period of time, degrading the accelerator's processing performance.
Summary

This application provides a data processing method, device and related equipment, to solve the problem that the accelerator's frequency of access to data in memory is unbalanced with the available bandwidth that the memory provides to the accelerator, causing the accelerator's processing performance to degrade.
In a first aspect, a data processing method is provided. The method can be applied to an accelerator that includes a first memory, and may include the following steps: the accelerator obtains a data processing instruction, where the data processing instruction includes the address of data in a second memory, and the data read-write efficiency of the first memory is greater than that of the second memory; the accelerator reads the data from the first memory according to the data processing instruction and processes it, where the first memory is used to store the to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the accelerator.

By implementing the method described in the first aspect, the accelerator stores the to-be-cached data of the second memory in the first memory, whose read-write efficiency is greater than that of the second memory, so that when handling data processing instructions the accelerator can interact directly with the more efficient first memory, thereby improving the accelerator's data read-write efficiency.
In a possible implementation, the accelerator is implemented with application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes static random access memory (static RAM, SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), and so on, which is not specifically limited in this application.
In a specific implementation, the computing device where the accelerator is located may include a central processing unit (CPU) and the accelerator. The accelerator may be a system-on-chip implemented with FPGA, ASIC or similar technology; specifically, it may be a processing unit in the computing device 100 used to assist the CPU in handling special types of computing tasks, such as graphics processing, vector computation, or machine learning. The accelerator can be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), and so on. Optionally, the accelerator may also be a CPU; in other words, the computing device may include multiple processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator. It should be understood that the above examples are for illustration and are not specifically limited in this application.

In a specific implementation, the first memory and the second memory may both be original internal memories of the accelerator; alternatively, the first memory is an original internal memory of the accelerator and the second memory is an original memory outside the accelerator, for example a memory in the computing device where the accelerator is located. It should be understood that with the method provided by this application, a suitable memory inside the accelerator can be selected as the first memory according to the specific application scenario: if the accelerator contains multiple memories, the one with lower read-write efficiency can serve as the second memory and the one with higher read-write efficiency as the first memory; if the accelerator does not contain multiple memories, or its memories have the same read-write efficiency, the memory inside the accelerator can serve as the first memory, and a memory outside the accelerator with lower read-write efficiency but larger capacity can serve as the second memory.

In the above implementation, the original memory hardware inside the accelerator is used, and the data read-write efficiency of the accelerator 120 is improved algorithmically without deploying additional cache hardware, reducing the cost of implementing the solution; the solution is especially practicable for accelerators with small hardware form factors implemented with ASIC or FPGA technology.
In a possible implementation, the data to be cached includes data whose access frequency is higher than a first threshold.

In a specific implementation, the accelerator may store data in the second memory and then store the to-be-cached data of the second memory in the first memory. The accelerator may update the to-be-cached data to the first memory in real time, with a delay, or according to a certain algorithm, which is not specifically limited in this application. The data to be cached includes data historically accessed by the accelerator; in this way, when the accelerator accesses such data again, it can interact directly with the faster first memory, improving the accelerator's data read-write efficiency.

Optionally, the data to be cached may include data whose historical access frequency is higher than the first threshold. That is, the data stored in the first memory is the frequently accessed data of the second memory; since the first memory reads and writes faster, the accelerator can interact directly with the first memory when processing frequently accessed data, improving its read-write efficiency. The size of the first threshold can be determined by the specific application scenario and is not specifically limited in this application.

Optionally, when the data in the first memory reaches a storage threshold, data in the first memory whose access frequency is not higher than a second threshold can be deleted, after which data the accelerator accesses in the second memory continues to be stored in the first memory; each time the storage threshold is reached, such low-frequency data is deleted again, so that the data stored in the first memory is frequently accessed data that has been accessed by the accelerator. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.

Optionally, the data to be cached may also be data recently accessed by the accelerator, where "recently" refers to data accessed by the accelerator within a time range that can be determined by the storage capacity of the first memory. Specifically, when the amount of data in the first memory reaches the storage threshold, the access times of the entries in the first memory are sorted and the entry whose access time is furthest from the current time is deleted, and so on, ensuring that the data stored in the first memory was accessed most recently.
Optionally, the data recently accessed by the accelerator may be the data accessed by the accelerator within a preset time range; that is, if the current time is T, the preset time range may be the interval from time T-t to time T, where the size of t can be determined by the specific application scenario and is not specifically limited in this application.

Optionally, the data to be cached may also include prefetched data. Simply put, the storage controller can use a prefetch algorithm to determine data the accelerator is likely to access, extract it from the second memory in advance, and store it in the faster first memory; then, when the accelerator requests to read that data, it can interact directly with the first memory, improving the accelerator's read-write efficiency.

It should be understood that the data to be cached may also include more types of data, which can be determined by the accelerator's application scenario and is not specifically limited in this application.

In the above implementations, since frequently accessed, recently accessed, or prefetched data is very likely to be accessed again by the accelerator, storing it in the first memory allows the accelerator to interact directly with the faster first memory when it accesses that data again, improving the accelerator's data read-write efficiency.
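As an illustration of the eviction described above (when the first memory reaches its storage threshold, the entry whose access time is furthest from the present is deleted), the behaviour can be sketched with a small least-recently-used cache. This is a software model only, and all names are assumptions.

```python
from collections import OrderedDict

class LRUFirstMemory:
    """Keeps at most `depth` cached entries; on overflow, the entry accessed
    longest ago is deleted, as in the access-time sorting described above."""

    def __init__(self, depth):
        self.depth = depth
        self._data = OrderedDict()  # second-memory address -> cached data

    def access(self, addr, fetch_from_second_memory):
        if addr in self._data:
            self._data.move_to_end(addr)        # refresh this entry's access time
            return self._data[addr]
        value = fetch_from_second_memory(addr)  # miss: read from the second memory
        self._data[addr] = value
        if len(self._data) > self.depth:        # storage threshold reached:
            self._data.popitem(last=False)      # evict the least recently accessed
        return value

mem = LRUFirstMemory(depth=2)
second_memory = {1: "a", 2: "b", 3: "c"}
for addr in (1, 2, 1, 3):  # address 2 becomes the least recently accessed
    mem.access(addr, second_memory.__getitem__)
print(list(mem._data))  # [1, 3]: entry 2 was evicted
```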
In a possible implementation, the accelerator configures the first memory and obtains configuration information of the first memory. The configuration information includes one or more of the cache switch state, the cache address range, and the cache capacity, where the cache switch state indicates whether the accelerator uses the first memory, the cache address range indicates that the accelerator stores data whose storage address falls within the range in the first memory, and the cache capacity indicates the capacity of the first memory.

In a specific implementation, configuring the cache switch as on means the first memory is used for data storage, and off means it is not. Configuring the address range as a target address range means that the data stored in the first memory is a cache of the data stored in the target address range of the second memory. Configuring the cache depth as D means the storage capacity of the first memory is D.

In the above implementation, by configuring the first memory, the user can choose whether to enable the cache function of the storage controller and can set the address space and capacity of the cache according to business needs, making the solution of this application applicable to more application scenarios with better flexibility.
在一可能的实现方式中,加速器包括地址存储器,地址存储器用于存储数据在第一存储器中的地址与数据在第二存储器中的地址之间的对应关系。In a possible implementation, the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
可选地,数据处理指令可以是加速器生成的指令,也可以是加速器所在的计算设备中的CPU向其发送的指令,数据处理指令可以是数据写入指令或者数据读取指令,还可包括将数据读取出来后进行业务处理的其他指令,比如加速器将数据从第一存储器中读取出来后,可以对数据进行更新、删除、合并等操作,还可以对读取到的多个数据进行计算处理,比如矩阵乘法、卷积操作等等,具体可根据加速器的处理业务决定,本申请对数据后续进行处理的具体流程具体限定。Optionally, the data processing instructions may be instructions generated by the accelerator, or instructions sent to it by the CPU in the computing device where the accelerator is located. The data processing instructions may be data writing instructions or data reading instructions, and may also include Other instructions for business processing after the data is read out. For example, after the accelerator reads the data from the first memory, it can update, delete, merge and other operations on the data, and can also perform calculations on the multiple data read. Processing, such as matrix multiplication, convolution operations, etc., can be determined according to the processing business of the accelerator. This application specifically limits the specific process of subsequent data processing.
In a specific implementation, when the accelerator obtains a data write instruction, the write address in the data write instruction is the second storage address of the data. If the second storage address is already stored in the address memory in correspondence with a first storage address, a historical version of the data has already been written to the first storage address of the first memory. In this case, the historical version of the data corresponding to the first storage address may simply be updated, and the data is sent to the second memory to request an update of the data corresponding to the second storage address.
If the second storage address is not in the address memory, the data is being written to the first memory for the first time. The accelerator may first determine the first storage address corresponding to the second storage address, then store the data carried in the data write instruction at the first storage address of the first memory, and store the correspondence between the first storage address and the second storage address of the data in the address memory. It should be noted that, when writing data for the first time, the storage controller may determine the first storage address of the data based on the currently free addresses of the first memory; this application does not limit the address allocation policy of the storage controller.
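The write path described above can be sketched as follows. This is a simplified illustrative model rather than the patented implementation: the address memory is modeled as a dictionary, the two memories as dictionaries keyed by address, and the free-address allocator as a counter; all class and attribute names are assumptions.

```python
class WriteThroughCache:
    """Minimal model of the accelerator's write path: the address memory
    maps a second-memory address to a first-memory address, and a write
    updates both memories (write-through)."""

    def __init__(self):
        self.first_mem = {}    # fast memory, keyed by first storage address
        self.second_mem = {}   # slow memory, keyed by second storage address
        self.addr_map = {}     # second address -> first address ("address memory")
        self.next_free = 0     # trivial free-address allocator for first_mem

    def write(self, second_addr, data):
        if second_addr in self.addr_map:
            # Hit: a historical version already lives in the first memory.
            first_addr = self.addr_map[second_addr]
        else:
            # Miss: first write of this data; allocate a first storage
            # address and record the correspondence in the address memory.
            first_addr = self.next_free
            self.next_free += 1
            self.addr_map[second_addr] = first_addr
        self.first_mem[first_addr] = data
        # Also send the data to the second memory to keep it up to date.
        self.second_mem[second_addr] = data
```

For example, two writes to the same second storage address reuse the same first-memory slot instead of allocating a new one.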
Similarly, when the storage controller obtains a data read instruction, the read address in the data read instruction is the second storage address of the data. The storage controller may match the second storage address against the addresses in the address memory. If the address memory includes the second storage address, the corresponding first storage address may be obtained based on the second storage address of the data, and the data may then be read from the first memory.
If the second storage address is not stored in the address memory, the data is not stored in the first memory, and the accelerator may read the data from the second memory based on the second storage address. At the same time, the accelerator may also determine a first storage address for the data, store the data read from the second memory into the first memory so that subsequent accesses can be served directly from the first memory with its higher read/write efficiency, and store the mapping between the first storage address and the second storage address of the data in the address memory.
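The read path can be sketched in the same style. Again this is an illustrative model only: the backing second memory is supplied as a dictionary, and the structure and method names are assumptions.

```python
class ReadPathCache:
    """Minimal model of the accelerator's read path: a hit in the address
    memory is served from the fast first memory; a miss fetches the data
    from the second memory and installs it for later accesses."""

    def __init__(self, second_mem):
        self.second_mem = second_mem   # backing (slow) memory
        self.first_mem = {}
        self.addr_map = {}             # second address -> first address
        self.next_free = 0

    def read(self, second_addr):
        if second_addr in self.addr_map:
            # Address memory hit: read directly from the first memory.
            return self.first_mem[self.addr_map[second_addr]]
        # Miss: read from the second memory, then cache the data so the
        # next access hits the faster first memory.
        data = self.second_mem[second_addr]
        first_addr = self.next_free
        self.next_free += 1
        self.first_mem[first_addr] = data
        self.addr_map[second_addr] = first_addr
        return data
```

The first read of an address is a miss that populates the mapping; every later read of the same address is served from the first memory.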
In the foregoing implementation, the address memory records the correspondence between the first storage address and the second storage address, thereby enabling addressing of the data to be cached. In this way, when a data processing instruction carrying the second storage address is received, the data can be read from the first memory based on the addresses recorded in the address memory, so that the accelerator can interact directly with the faster first memory, improving the data read/write efficiency of the accelerator.
In a possible implementation, the accelerator includes a state information memory, and the state information memory is used to store state information of the data. The state information includes a first state and a second state. The first state indicates that the data is in a modified state, that is, it is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data might be read. The second state indicates that the data is not in a modified state, and the data may then be read from the first memory.
In a specific implementation, when the accelerator processes a data write instruction and the state information of the data is the first state, that is, the data is currently being modified, the accelerator may first set the state information of the data to the second state, then write the data to the first storage address, and then write the data to the second memory to update it.
Optionally, if the state information of the data is the second state, that is, no operation is currently being performed on the data, the data in the data processing instruction may be written to the first memory to update it, and the data may also be written to the second memory to update it.
In a specific implementation, when the accelerator processes a data read instruction and the address memory does not include the read address, that is, the data is not in the first memory, the accelerator sets the state information of the data to the first state and reads the data from the second memory. After the data is retrieved, if the state information of the data is still the first state, the accelerator sets the state information to the second state and stores the data at the first storage address of the first memory. If the state information has instead changed to the second state, the storage controller has already updated the data to its latest version, and the fetched data need not be stored in the first memory. It should be understood that if a new version of the data is written to the first memory and the second memory while the read is in progress, the state information ensures that the data obtained is the latest version.
Optionally, if the address memory includes the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data may be read from the second memory. If the address memory includes the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data may be read from the first memory.
In a specific implementation, the state information may be represented by binary characters, for example, the first state by the character "1" and the second state by the character "0"; the first state and the second state may also be distinguished by other identifiers, which is not specifically limited in this application.
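The state-flag protocol above, in which a write that lands while a miss fetch is in flight causes the stale fetched copy to be discarded, can be sketched for a single data entry as follows. This is an illustrative model under simplifying assumptions (a single entry, no real concurrency); the constants and method names are not the patent's interface.

```python
MODIFYING = 1   # first state: a write or miss fetch is in progress
STABLE = 0      # second state: the data may be read from the first memory

class StateTrackedEntry:
    """Model of the per-data state flag: a concurrent write flips the
    state to STABLE, and an in-flight miss fetch then discards its
    (older) result instead of overwriting the newer version."""

    def __init__(self):
        self.state = STABLE
        self.first_mem_value = None   # the copy held in the first memory

    def begin_miss_fetch(self):
        # Before fetching from the second memory, mark the data as modifying.
        self.state = MODIFYING

    def complete_miss_fetch(self, fetched_value):
        if self.state == MODIFYING:
            # No concurrent write happened: install the fetched copy.
            self.state = STABLE
            self.first_mem_value = fetched_value
            return True
        # A concurrent write already stored a newer version; drop the fetch.
        return False

    def write(self, value):
        # A write leaves the newest version in the first memory and the
        # entry in the stable state, signalling any in-flight fetch.
        self.first_mem_value = value
        self.state = STABLE
```

If a write occurs between `begin_miss_fetch` and `complete_miss_fetch`, the fetched value is discarded and the first memory keeps the newer version.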
In the foregoing implementation, when the state information of the data is the first state, the data is being written to the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version might be read. When the state information of the data is the second state, the data is not in a modified state and may be read from the first memory. Recording the state information of the data in the state information memory ensures that the user does not read a wrong version of the data and that the data recorded in the first memory is the latest version.
According to a second aspect, a data processing apparatus is provided. The data processing apparatus includes a first memory, and further includes: an obtaining unit, configured to obtain a data processing instruction, where the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than that of the second memory; and a processing unit, configured to read the data from the first memory according to the data processing instruction and process the data, where the first memory is used to store data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing apparatus.
With the method described in the second aspect, the accelerator stores the data to be cached from the second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions, the accelerator can interact directly with the first memory with its higher read/write efficiency, thereby improving the data read/write efficiency of the accelerator.
In a possible implementation, the data processing apparatus includes a state information memory, and the state information memory is used to store state information of the data. The state information includes a first state and a second state, where the first state indicates that the data is in a modified state and the second state indicates that the data is not in a modified state.
In a possible implementation, the data processing apparatus includes an address memory, and the address memory is used to store a correspondence between the address of the data in the first memory and the address of the data in the second memory.
In a possible implementation, the data processing instruction includes a data write instruction; and the processing unit is configured to: when the state information of the data is the first state and the data processing instruction is a data write instruction, set the state information of the data to the second state, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
In a possible implementation, the data processing instruction includes a data read instruction; the processing unit is configured to determine, before the data processing apparatus obtains the data processing instruction, that the address memory does not include the address in the data processing instruction; the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and the processing unit is configured to: when the state information of the data is the first state, set the state information of the data to the second state and store the data at the first storage address of the first memory.
In a possible implementation, the apparatus further includes a configuration unit, configured to configure the first memory and obtain configuration information of the first memory. The configuration information includes one or more of a cache switch state, a cache address range, and a cache capacity, where the cache switch state indicates whether the data processing apparatus uses the first memory, the cache address range instructs the data processing apparatus to store data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
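The configuration information described above can be sketched as a simple record. The field names, types, and the `should_cache` helper are illustrative assumptions, not the patent's interface.

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative configuration record for the first memory."""
    cache_enabled: bool       # cache switch state: whether the first memory is used
    cache_addr_range: range   # only addresses in this range are cached
    cache_capacity: int       # capacity of the first memory, in bytes

    def should_cache(self, second_addr: int) -> bool:
        # Data is cached only when the switch is on and the storage
        # address falls inside the configured cache address range.
        return self.cache_enabled and second_addr in self.cache_addr_range
```

For example, with the switch off, no address is cached regardless of the configured range.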
In a possible implementation, the data processing apparatus is implemented using ASIC or FPGA technology, and the first memory includes one or more of an SRAM, a register, an SCM, and a CAM.
In a possible implementation, the data to be cached includes data whose access frequency is higher than a first threshold.
According to a third aspect, an accelerator is provided. The accelerator includes a processor and a power supply circuit, where the power supply circuit is configured to supply power to the processor, and the processor is configured to implement the functions of the operation steps performed by the accelerator described in the first aspect.
According to a fourth aspect, a computing device is provided. The computing device includes a CPU and an accelerator, where the CPU is configured to run instructions to implement service functions of the computing device, and the accelerator is configured to implement the functions of the operation steps performed by the accelerator described in the first aspect.
Based on the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a computing device provided by this application;
Figure 2 is a schematic structural diagram of another computing device provided by this application;
Figure 3 is a schematic structural diagram of an accelerator provided by this application;
Figure 4 is a schematic structural diagram of a data processing apparatus provided by this application;
Figure 5 is a schematic flowchart of the steps of a data processing method provided by this application in an application scenario;
Figure 6 is a schematic flowchart of the steps of processing a data read instruction in a data processing method provided by this application;
Figure 7 is a schematic flowchart of the steps of processing a data write instruction in a data processing method provided by this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. Clearly, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
First, the application scenarios involved in this application are described.
When an accelerator reads and writes data in memory, it must wait an additional period of time, so the processing performance of the accelerator is limited by the memory bandwidth. To solve this problem, this application provides a computing device in which the accelerator includes a first memory. The accelerator stores the data to be cached from a second memory in the first memory, where the read/write efficiency of the first memory is greater than that of the second memory, so that when processing data processing instructions, the accelerator can interact directly with the first memory with its higher read/write efficiency, thereby improving the data read/write efficiency of the accelerator.
The computing device provided by this application is described in detail below with reference to the accompanying drawings. Figure 1 is a schematic structural diagram of a computing device provided by this application. As shown in Figure 1, the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140, and a storage medium 150, among which communication connections may be established through a bus. The number of each of the processor 110, the accelerator 120, the second memory 130, the communication interface 140, and the storage medium 150 may be one or more, which is not specifically limited in this application.
The processor 110 and the accelerator 120 may each be a hardware accelerator or a combination of hardware accelerators. The hardware accelerator may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 110 is configured to execute instructions in the storage medium 150 to implement the service functions of the computing device 100.
Specifically, the processor 110 may be a central processing unit (CPU), and the accelerator 120 (which may also be called an accelerated processing unit (APU)) may be a system-on-chip implemented using FPGA, ASIC, or similar technology. The accelerator 120 is a processing unit in the computing device 100 that assists the CPU in processing special types of computing tasks, such as graphics processing, vector computation, or machine learning. The accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), or the like.
Optionally, the accelerator 120 may also be a CPU. In other words, the computing device 100 may include multiple processors, for example CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120. It should be understood that the foregoing example is for illustration and is not specifically limited in this application.
The storage medium 150 is a carrier for storing data, such as a hard disk, a USB flash drive, flash memory, a secure digital memory card (SD card), or a memory stick. The hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, or the like, which is not specifically limited in this application. The storage medium 150 may include the second memory; in a specific implementation, the second memory may be DDR.
The communication interface 140 may be a wired interface (for example, an Ethernet interface), an internal interface (for example, a Peripheral Component Interconnect Express (PCIe) bus interface), or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or units.
The bus 160 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus 160 may be divided into an address bus, a data bus, a control bus, and so on. For clarity of description, the various buses are all marked as the bus 160 in the figure.
The second memory 130 includes a volatile memory or a non-volatile memory, or includes both a volatile memory and a non-volatile memory. The volatile memory may be a random access memory (RAM), for example, a double data rate synchronous dynamic random access memory (DDR), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), or a direct rambus random access memory (DR RAM).
Further, the accelerator 120 may also include a first memory 121, where the data read/write efficiency of the first memory 121 is greater than that of the second memory 130. In a specific implementation, the first memory 121 may include a static random access memory (SRAM), a storage class memory (SCM), a register, a content-addressable memory (CAM), or the like, which is not specifically limited in this application.
In this embodiment of this application, the data to be cached that is stored in the second memory 130 may be cached in the first memory 121. In this way, when the accelerator 120 needs to access data in the second memory 130, it can read the data from the first memory 121 with its higher read/write efficiency, thereby compensating for the difference in processing speed between the accelerator 120 and the lower-performance memory and improving the data read/write efficiency of the accelerator 120.
It should be noted that the second memory 130 in Figure 1 may be the original memory inside the computing device 100, and the first memory 121 may be the original memory inside the accelerator 120. By using the existing hardware storage inside the computing device 100 and the accelerator 120, this application improves the data read/write efficiency of the accelerator 120 through an algorithm, without deploying additional cache hardware, which reduces the implementation cost of the solution. Especially for an accelerator 120 with a small hardware footprint, the solution is more practical to implement.
It should be understood that Figure 1 shows an exemplary division. Optionally, as shown in Figure 2, the second memory 130 may also be deployed inside the accelerator 120. In other words, the accelerator 120 itself includes at least two memories, one of which has a higher read/write efficiency than the other; in that case, the memory inside the accelerator 120 with the lower read/write efficiency may serve as the second memory 130, and the memory with the higher read/write efficiency may serve as the first memory 121.
It should be noted that when the second memory 130 in Figure 1 is a memory external to the accelerator 120, the communication between the first memory 121 and the second memory 130 may be off-chip communication, and the bus between them may be an off-chip bus. An off-chip bus here broadly refers to a public information channel between the CPU and external devices, such as the aforementioned PCIe bus, EISA bus, UB bus, CXL bus, CCIX bus, or GenZ bus, which is not specifically limited in this application.
When the second memory 130 in Figure 2 is a memory inside the accelerator 120, the communication between the first memory 121 and the second memory 130 may be in-band communication, and the bus between them may be an on-chip bus, such as an advanced eXtensible interface (AXI) bus or an advanced microcontroller bus architecture (AMBA) bus, which is not specifically limited in this application.
Similarly, in the scenario shown in Figure 2, the data to be cached that is stored in the second memory 130 may be cached in the first memory 121. In this way, when the accelerator 120 needs to access data in the second memory 130, it can read the data from the first memory 121 with its higher read/write efficiency, thereby compensating for the difference in processing speed between the accelerator 120 and the lower-performance memory and improving the data read/write efficiency of the accelerator 120.
It can be understood that the first memory 121 and the second memory 130 in Figure 2 are the original memories inside the accelerator 120. By using the existing hardware memories inside the accelerator 120, the data read/write efficiency of the accelerator 120 is improved through an algorithm, without deploying additional cache hardware, which reduces the implementation cost of the solution. Especially for an accelerator 120 with a small hardware footprint, the solution is more practical to implement.
Further, the accelerator 120 may be further divided into multiple unit modules. Figure 3 is a schematic structural diagram of an accelerator provided by this application and shows an exemplary division. As shown in Figure 3, the accelerator 120 may include a storage controller 122, the first memory 121, a state information memory 123, and an address memory 124, among which communication connections are established through an internal bus. For the internal bus, reference may be made to the description of the bus 160, which is not repeated here.
In addition, the accelerator 120 may further include a power supply circuit, and the power supply circuit may supply power to the storage controller 122. The storage controller 122 may be implemented by a hardware logic circuit, for example, an application-specific integrated circuit (ASIC) implementing the various functions of the accelerator 120. The power supply circuit may be located in the same accelerator as the storage controller 122, or in another accelerator other than the one where the storage controller 122 is located. The power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management accelerator, a power management processor, or a power management control circuit. Optionally, the accelerator 120 is an independent accelerator.
The first memory 121 is used to store data, the state information memory 123 is used to store the state information of the data, and the address memory 124 is used to store the address information of the data. The first memory 121 may be a memory whose read/write efficiency is greater than that of the second memory 130, such as an SRAM. The state information requires little storage space but must stay synchronized with the state of the data, so the state information memory 123 may be a register. The address memory 124 may be a CAM. It should be understood that a CAM is a memory addressed by content: its working mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input data item matches any stored data item. The address memory 124 used to store the address information of the data can therefore be implemented using a CAM inside the accelerator 120. In this way, when a user requests access to data, the CAM can match the address of the requested data against the address information stored in the CAM; a match indicates that the data is already stored in the first memory 121. It should be understood that the foregoing example is for illustration, and the address memory 124 may also be implemented using other memories, which is not specifically limited in this application.
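The CAM matching mechanism described above can be sketched as follows. In hardware, a CAM compares the input against all entries in parallel in a single cycle; the loop below is only a behavioral model, and the class and method names are illustrative.

```python
class SimpleCAM:
    """Toy behavioral model of a content-addressable memory: a lookup
    compares the input key against every stored entry and returns the
    associated value (here, a first-memory address) on a match."""

    def __init__(self, capacity):
        self.entries = []       # list of (key, value) pairs
        self.capacity = capacity

    def insert(self, key, value):
        if len(self.entries) >= self.capacity:
            raise RuntimeError("CAM full")
        self.entries.append((key, value))

    def lookup(self, key):
        # Compare the input against all stored entries; a match means the
        # data is already held in the first memory.
        for stored_key, value in self.entries:
            if stored_key == key:
                return value
        return None             # miss: the data is not in the first memory
```

A hit returns the first-memory address recorded for the requested second-memory address; a miss returns `None`.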
It can be understood that the first memory 121 in this application is implemented by memory inside the accelerator 120, so no additional cache needs to be deployed: the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, and the software side can be implemented simply through the online programming capability of the FPGA or ASIC. This solves, in a simple, efficient, and low-cost way, the problem that the read/write efficiency of an accelerator is limited by the memory bandwidth bottleneck.
In this embodiment of the present application, the memory controller 122 may obtain a data processing instruction, where the data processing instruction includes an address of the second memory 130, read the data from the first memory according to the address of the second memory 130, and then process the data according to the data processing instruction.
It should be noted that the above data processing instruction may be generated by the accelerator 120 in the course of processing a service, or may be sent by the processor 110 to the accelerator 120, which is not specifically limited in this application.
In a specific implementation, the memory controller 122 may store data in the second memory 130 and then store the to-be-cached data from the second memory 130 in the first memory 121. The memory controller 122 may update the to-be-cached data to the first memory 121 in real time, with a delay, or according to a certain algorithm, which is not specifically limited in this application. The to-be-cached data includes data historically accessed by the accelerator 120. In this way, when the accelerator 120 accesses the data again, it can interact directly with the first memory 121, which has a higher read/write speed, thereby improving the data read/write efficiency of the accelerator 120.
Optionally, the to-be-cached data may include data whose historical access frequency is higher than a first threshold. That is, the data stored in the first memory 121 is the more frequently accessed portion of the data stored in the second memory 130. Since the first memory has higher read/write efficiency, the accelerator 120 can interact directly with the first memory 121 when processing frequently accessed data, improving its read/write efficiency. The value of the first threshold may be determined according to the specific application scenario and is not specifically limited in this application.
Optionally, when the data in the first memory 121 reaches a storage threshold, data in the first memory 121 whose access frequency is not higher than a second threshold may be deleted, and data accessed by the accelerator 120 in the second memory 130 then continues to be stored in the first memory 121. Each time the data in the first memory 121 again reaches the storage threshold, the data whose access frequency is not higher than the second threshold is deleted, so that the data kept in the first memory 121 is data that is accessed frequently by the accelerator 120. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold, which is not specifically limited in this application.
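The frequency-threshold eviction described above can be sketched as follows. The function name, the per-address access counts, and the concrete threshold value are assumptions made for the example; the patent leaves the thresholds to the application scenario.

```python
def evict_cold(access_counts, second_threshold):
    """Keep only entries whose access count exceeds the second threshold,
    modeling the deletion step when the storage threshold is reached."""
    return {addr: count for addr, count in access_counts.items()
            if count > second_threshold}

# Hypothetical access counts for three cached addresses.
counts = {"add0": 9, "add1": 2, "add2": 5}
print(evict_cold(counts, second_threshold=2))  # {'add0': 9, 'add2': 5}
```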
Optionally, the to-be-cached data may also be the data most recently accessed by the accelerator 120, where "recently" refers to data accessed by the accelerator 120 within a time range. That time range may be determined according to the storage capacity of the first memory 121. Specifically, when the amount of data in the first memory 121 reaches the storage threshold, the access times of the data items in the first memory 121 are sorted, the data item whose access time is furthest from the current time is deleted, and so on, thereby ensuring that the data stored in the first memory 121 is the most recently accessed data.
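The sort-and-delete policy above is, in effect, least-recently-used replacement. A minimal software sketch, with illustrative names and an assumed capacity of two entries, could look like this:

```python
from collections import OrderedDict

class RecentCache:
    """Keeps only the most recently accessed entries: when the storage
    threshold (capacity) is exceeded, the entry whose access time is
    furthest in the past is deleted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered from oldest to newest access

    def access(self, addr, value):
        if addr in self.data:
            self.data.move_to_end(addr)    # refresh the access time
        self.data[addr] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently accessed

cache = RecentCache(capacity=2)
cache.access("add0", "a")
cache.access("add1", "b")
cache.access("add0", "a")  # add0 becomes the most recent entry
cache.access("add2", "c")  # threshold exceeded: add1 is evicted
print(sorted(cache.data))  # ['add0', 'add2']
```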
Optionally, the data recently accessed by the accelerator 120 may be data accessed by the accelerator 120 within a preset time range. That is, if the current time is T, the preset time range may be the period from time T-t to time T, where the value of t may be determined according to the specific application scenario and is not specifically limited in this application.
It can be understood that recently accessed data in the second memory 130 is very likely to be accessed again by the accelerator 120, so it is stored in the first memory 121. When the accelerator 120 accesses the data again, it can interact directly with the faster first memory 121, improving the data read/write efficiency of the accelerator 120.
Optionally, the to-be-cached data may also include prefetched data. In short, the memory controller 122 may use a prefetching algorithm to determine data that the accelerator 120 is likely to access, fetch it from the second memory 130 in advance, and store it in the first memory 121, which has higher read/write efficiency. In this way, when the accelerator 120 requests the data, it can interact directly with the first memory 121, improving the accelerator's read/write efficiency.
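The patent does not fix a particular prefetching algorithm, so as one hedged illustration, a simple sequential (next-address) predictor could be sketched as follows; the function name and parameters are assumptions for the example.

```python
def prefetch_candidates(accessed_addr, degree=2, stride=1):
    """Predict the next `degree` addresses after a demand access,
    assuming a sequential access pattern with a fixed stride."""
    return [accessed_addr + stride * i for i in range(1, degree + 1)]

# After the accelerator touches address 100, fetch 101 and 102 ahead
# of time into the first memory.
print(prefetch_candidates(100))          # [101, 102]
print(prefetch_candidates(0, degree=3))  # [1, 2, 3]
```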
It should be understood that the to-be-cached data may also include more types of data, which can be determined according to the application scenario of the accelerator 120 and is not specifically limited in this application.
In one embodiment, before the memory controller 122 stores data in the first memory 121, the user may configure the memory controller 122. The configuration may include a cache switch, an address range, a cache depth, and so on. Setting the cache switch to on means that the first memory 121 is used for data storage; setting it to off means that the first memory 121 is not used for data storage. Configuring the address range as a target address range means that the data stored in the first memory 121 is a cached copy of the data stored in the target address range of the second memory 130. Configuring the cache depth as D means that the storage capacity of the first memory 121 is D. Of course, the user may also apply other configurations to the memory controller 122, which may be determined according to the actual service environment and are not specifically limited in this application.
For example, suppose the capacity of the first memory is 2.5M, the cache depth is configured as 2M, the cache switch is configured as on, and the address range is configured as add0~add5. Then data that the accelerator 120 requests to write to add0~add5 will first be cached in the first memory 121; optionally, the data of add0~add5 may also be written to the second memory 130 with a delay, while data requested to be written to other addresses of the second memory 130 may be written directly to the second memory 130. If the accelerator 120 does not need to cache data, the cache switch can be turned off. It should be understood that the above example is for illustration and is not specifically limited in this application.
It can be understood that by configuring the memory controller 122, whether to enable its cache function can be chosen according to service requirements, and the address space and capacity of the cache can be set, so that the solution of this application is applicable to more application scenarios and offers better flexibility.
Further, the data processing instruction obtained by the memory controller 122 may be a data read instruction or a data write instruction. When the memory controller 122 obtains a data write instruction, the instruction includes a second storage address of the data, where the second storage address is a storage address in the second memory 130. The controller may first determine whether the cache switch is on; if it is, the controller determines whether the address carried in the data processing instruction falls within the user-configured address range. If it does, the memory controller 122 may store the data in the first memory 121; otherwise, it stores the data in the second memory 130.
Similarly, when the memory controller 122 obtains a data read instruction, it reads the data from the first memory 121 only after determining that the cache switch is on and that the read address falls within the user-configured address range; otherwise, it reads the data from the second memory 130, which will not be repeated here.
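The two routing checks above (cache switch on, and address within the configured range) can be sketched as a single decision function; the function and field names here are illustrative, not terms from the patent.

```python
def route_access(cache_on, addr_range, second_addr):
    """Return which memory serves an access: the first memory only if
    the cache switch is on and the address is in the configured range."""
    lo, hi = addr_range
    if cache_on and lo <= second_addr <= hi:
        return "first_memory"
    return "second_memory"

print(route_access(True, (0, 5), 3))   # first_memory: cached range
print(route_access(True, (0, 5), 9))   # second_memory: outside range
print(route_access(False, (0, 5), 3))  # second_memory: switch is off
```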
In one embodiment, the memory controller 122 may store the correspondence between the first storage address and the second storage address of the data in the address memory 124.
In a specific implementation, when the memory controller 122 obtains a data write instruction, the write address in the instruction is the second storage address of the data. If the address memory 124 already stores this second storage address in correspondence with a first storage address, a historical version of the data has already been written to the first storage address in the first memory 121. In this case, the historical version at the first storage address is simply updated, and the data is sent to the second memory 130 to request an update of the data at the second storage address.
If the address memory 124 does not contain the second storage address, the data is being written to the first memory 121 for the first time. The memory controller 122 may first determine the first storage address corresponding to the second storage address, then store the data carried in the write instruction at that first storage address in the first memory 121, and store the correspondence between the first storage address and the second storage address in the address memory 124. It should be noted that when writing data for the first time, the memory controller 122 may determine the first storage address according to the currently free addresses of the first memory 121; this application does not limit the address allocation strategy of the memory controller 122.
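A minimal sketch of this write path, combining the update-in-place case and the first-write allocation case, is shown below. The naive free-slot allocation and write-through to the second memory are assumptions for the example; the patent leaves the allocation strategy open.

```python
def write(second_addr, data, addr_map, first_mem, second_mem):
    """Write path: update the cached copy if mapped, otherwise allocate
    a first-memory slot and record the address correspondence; in both
    cases the second memory is also updated."""
    if second_addr not in addr_map:
        addr_map[second_addr] = len(first_mem)  # first write: allocate slot
        first_mem.append(None)
    first_mem[addr_map[second_addr]] = data     # update cached copy
    second_mem[second_addr] = data              # update the second memory

addr_map, first_mem, second_mem = {}, [], {}
write(0x10, "v1", addr_map, first_mem, second_mem)  # first write: allocates
write(0x10, "v2", addr_map, first_mem, second_mem)  # hit: updates in place
print(first_mem, second_mem)  # ['v2'] {16: 'v2'}
```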
Similarly, when the memory controller 122 obtains a data read instruction, the read address in the instruction is the second storage address of the data. The memory controller 122 may match this second storage address against the addresses in the address memory 124; if the address memory 124 contains the second storage address, the corresponding first storage address is obtained according to the second storage address, and the data is read from the first memory 121.
It should be noted that if the second storage address is not stored in the address memory 124, the data is not stored in the first memory 121, so the memory controller 122 may read the data from the second memory 130 according to the second storage address. At the same time, the memory controller 122 may also determine a first storage address for the data and store the data read from the second memory 130 in the first memory 121, so that the accelerator 120 can subsequently read it directly from the more efficient first memory 121, and store the mapping between the first storage address and the second storage address in the address memory 124.
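The read path of the two paragraphs above (hit: serve from the first memory; miss: fetch from the second memory, fill the cache, record the mapping) can be sketched as follows, again with illustrative structures and a naive free-slot allocation.

```python
def read(second_addr, addr_map, first_mem, second_mem):
    """Read path: hit in the address memory serves from the first
    memory; a miss fetches from the second memory and fills the cache."""
    if second_addr in addr_map:                  # hit
        return first_mem[addr_map[second_addr]]
    data = second_mem[second_addr]               # miss: go to second memory
    first_addr = len(first_mem)                  # naive free-slot allocation
    first_mem.append(data)                       # fill the first memory
    addr_map[second_addr] = first_addr           # record the mapping
    return data

second_mem = {0x10: "A", 0x20: "B"}
first_mem, addr_map = [], {}
print(read(0x10, addr_map, first_mem, second_mem))  # miss, returns "A"
print(read(0x10, addr_map, first_mem, second_mem))  # hit, returns "A"
print(addr_map)  # {16: 0}
```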
In one embodiment, the accelerator 120 includes a state information memory 123 used to store the state information of the data. In a specific implementation, the memory controller 122 may first determine the state information of the data and, according to that state information, write the data to the first storage address of the first memory 121. The state information includes a first state and a second state. The first state indicates that the data is in a modification state, that is, it is being written to the first memory 121 or the second memory 130; at this time the data must not be read from the first memory 121, otherwise a wrong version of the data might be read. The second state indicates that the data is not in a modification state, in which case the data can be read from the first memory 121.
In a specific implementation, when the accelerator 120 processes a data write instruction and the state information of the data is the first state, that is, the data is currently being modified, the accelerator may first set the state information of the data to the second state, then write the data to the first storage address, and then write the data to the second memory 130 to update it.
Optionally, if the state information of the data is the second state, that is, the data is not currently being operated on, the data in the data processing instruction can be written to the first memory 121 to update it, and the data is also written to the second memory 130 for updating.
In a specific implementation, when the accelerator 120 processes a data read instruction and the address memory does not contain the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the state information of the data to the first state and reads the data from the second memory 130. After the data is retrieved, if the state information of the data is still the first state, the accelerator 120 sets the state information to the second state and stores the data at the first storage address of the first memory 121. If the state information has already changed to the second state, the memory controller 122 has updated the data to its latest version in the meantime, and the fetched data need not be stored in the first memory 121. It should be understood that if a new version of the data is written to the first memory 121 and the second memory 130 while the read is in progress, the state information ensures that the cached data is the latest version.
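The race check above, where the miss-fill is discarded if a concurrent write already installed newer data, can be sketched as follows. The state encoding and function name are illustrative; only the first-state/second-state logic comes from the description.

```python
FIRST_STATE, SECOND_STATE = 1, 0  # 1 = being modified / fetch in flight

def finish_miss_fill(state, fetched, first_mem, first_addr):
    """Called when the second-memory fetch returns. Fill the first
    memory only if no concurrent write flipped the state meanwhile."""
    if state[first_addr] == FIRST_STATE:
        state[first_addr] = SECOND_STATE
        first_mem[first_addr] = fetched  # safe: no newer write arrived
    # else: a write already installed the latest version; keep it

# No race: the fetch completes while the state is still FIRST_STATE.
state, first_mem = {0: FIRST_STATE}, {0: None}
finish_miss_fill(state, "old", first_mem, 0)
print(first_mem[0])  # old

# Race: a concurrent write updated slot 0 and set SECOND_STATE.
state, first_mem = {0: SECOND_STATE}, {0: "new"}
finish_miss_fill(state, "old", first_mem, 0)
print(first_mem[0])  # new (the stale fetch is discarded)
```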
Optionally, if the address memory 124 contains the read address and the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data may be read from the second memory 130. If the address memory 124 contains the read address and the state information is the second state, that is, the data is not being operated on, the latest version of the data may be read from the first memory 121.
In a specific implementation, the above state information may be represented by binary characters; for example, the first state may be represented by the character "1" and the second state by the character "0". The first state and the second state may also be distinguished by other identifiers, which is not specifically limited in this application.
Optionally, the above accelerator 120 may also be a CPU, and the first memory may be a memory inside the CPU, such as the CPU's SRAM. The memory controller 122 may cache data in combination with the CPU's multi-level cache architecture, for example as a fourth-level cache of the CPU, thereby implementing more cache levels for the CPU while reducing hardware complexity and lowering the implementation cost of multi-level caching.
It should be noted that the data processing instructions include not only the above data read and write instructions, but may also include other instructions for service processing after the data has been read out. For example, after the accelerator 120 reads data from the first memory, it may update, delete, or merge the data, or perform computations on multiple read data items, such as matrix multiplication or convolution operations. The specifics may be determined by the service processed by the accelerator 120; this application does not specifically limit the subsequent processing flow of the data.
FIG. 4 is a schematic structural diagram of a data processing apparatus provided by this application. The data processing apparatus 400 may be the accelerator 120 in FIG. 3. As shown in FIG. 4, the data processing apparatus 400 may include a configuration module 1221, an obtaining module 1225, and a processing module 1226, where the processing module 1226 may include a lookup-and-write-data-update module 1222, a read-data-return processing module 1223, and a read-data-update module 1224. It should be understood that each of the above modules may correspond to a circuit module in an ASIC or FPGA.
It should be noted that the functions of the obtaining module 1225, the configuration module 1221, and the processing module 1226 in FIG. 4 may be implemented by the memory controller 122 in FIG. 3. In addition, the data processing apparatus 400 shown in FIG. 4 reflects the module division under the application scenario of FIG. 1, that is, the scenario in which the second memory 130 is deployed outside the accelerator. It should be understood that in the application scenario shown in FIG. 2, the second memory 130 is deployed inside the data processing apparatus 400, which is not repeated here.
The obtaining module 1225 is used to obtain the data processing instructions generated by the accelerator 120, which may include data write instructions and data read instructions.
The configuration module 1221 is used to receive configuration information input by the user. The configuration information may include information for configuring the cache switch, information for configuring the address range, and information for configuring the cache depth. The cache switch information being "on" means that the first memory 121 is used for data storage; "off" means that the first memory 121 is not used for data storage. The address range information includes the target address range, that is, the data stored in the first memory 121 is a cached copy of the data stored in the target address range of the second memory 130. The cache depth information includes the cache depth D, that is, the storage capacity of the first memory 121 is D.
The lookup-and-write-data-update module 1222 is used to obtain the data write instruction generated by the accelerator 120 and process it. Specifically, the module 1222 may first query, according to the second storage address of the data carried in the write instruction, whether the second storage address exists in the address memory 124. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state. If it is the first state, the module changes the first state of the data to the second state, reads the data from the first memory 121 according to the first storage address and updates it, writes the updated data back to the first memory 121, and then writes the updated data to the second memory 130. If it is the second state, the module writes the data directly to the first memory 121 and then writes the data to the second memory 130.
If the second storage address does not exist in the address memory 124, the module determines, according to the current storage capacity of the first memory 121, the first storage address corresponding to the second storage address, stores the correspondence between the first storage address and the second storage address in the address memory 124, and then stores the data in the first memory 121 and the second memory 130.
The read-data-return processing module 1223 and the read-data-update module 1224 are used to obtain the data read instruction generated by the accelerator 120 and process it. The read-data-return processing module 1223 is mainly used to read the data, and the read-data-update module 1224 is mainly used to update the data.
Specifically, the read-data-return processing module 1223 may first query, according to the second storage address of the data carried in the read instruction, whether the second storage address exists in the address memory 124. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state. The first state means that the data is being modified, so the data can be read from the second memory 130; the second state means that the data has not been modified, so the data can be read from the first memory 121.
If the second storage address does not exist in the address memory 124, the read-data-update module 1224 may first set the state information of the data in the state information memory 123 to the first state, then determine the first storage address corresponding to the second storage address according to the storage capacity of the first memory 121 and store the correspondence between the first storage address and the second storage address in the address memory 124. The read-data-return processing module 1223 reads the data from the second memory. The read-data-update module 1224 then checks whether the state information of the data in the state information memory 123 is still the first state. If it is the first state, the module changes the state information to the second state and updates the data to the first memory; if it is the second state, it does not update the data to the first memory. It should be understood that the second state indicates that the lookup-and-write-data-update module 1222 has updated the data in the first memory in the meantime, so the fetched data no longer needs to be written to the first memory.
It should be noted that FIG. 4 shows an exemplary division. The memory controller 122 provided in this application may also be divided into more modules. For example, in FIG. 4, the step of looking up the second storage address of the data processing instruction in the address memory 124 is performed by the lookup-and-write-data-update module 1222; in a specific implementation, this module may be further divided into a lookup module and a write-data-update module, which is not specifically limited in this application.
In summary, the accelerator provided by this application stores the to-be-cached data of the second memory in the first memory, where the read/write efficiency of the first memory is higher than that of the second memory, so that the accelerator can interact directly with the more efficient first memory when reading and writing data, improving its data read/write efficiency. At the same time, the first memory is implemented by memory inside the accelerator, such as SRAM, registers, or CAM, so no additional cache needs to be deployed; the cache function of accelerators such as FPGAs and ASICs is realized at very low hardware cost, solving, in a simple, efficient, and low-cost way, the problem that the read/write efficiency of an accelerator is limited by the memory bandwidth bottleneck.
The data processing method provided by this application is explained below with reference to FIG. 5 to FIG. 7. The methods described in FIG. 5 to FIG. 7 may be applied to the accelerator 120 in FIG. 1 to FIG. 4. FIG. 5 is a schematic flowchart of the steps of the data processing method provided by this application in one application scenario, FIG. 6 is a schematic flowchart of the steps of processing a data read instruction in the data processing method provided by this application, and FIG. 7 is a schematic flowchart of the steps of processing a data write instruction in the data processing method provided by this application. In short, FIG. 6 describes step 7 in FIG. 5, and FIG. 7 describes step 8 in FIG. 5. As shown in FIG. 5 to FIG. 7, the method may include the following steps:
Step 1. Configure the memory controller 122. This step may be implemented by the configuration module 1221 in FIG. 4.
In a specific implementation, the configuration content includes one or more of a cache switch state, a cache address range, and a cache capacity. The cache switch state indicates whether the accelerator uses the first memory 121; the cache address range instructs the accelerator to store, in the first memory 121, data whose storage addresses fall within the cache address range; and the cache capacity indicates the capacity of the first memory 121. Configuring the cache depth as D means that the storage capacity of the first memory 121 is D. Of course, the user may also apply other configurations to the memory controller 122, which may be determined according to the actual service environment and are not specifically limited in this application.
可以理解的,通过对存储控制器122进行配置,可以根据业务需求选择是否开启存储控制器122的缓存功能,设置缓存的地址空间和容量,使得本申请的方案适用于更多的应用场景,方案灵活性更好。It can be understood that by configuring the storage controller 122, you can choose whether to enable the cache function of the storage controller 122 according to business needs, and set the address space and capacity of the cache, so that the solution of this application is suitable for more application scenarios. Flexibility is better.
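For illustration only, the configuration items described above (cache switch state, cache address range, cache capacity) can be sketched as a small software model. The class and field names below are assumptions chosen for readability and are not part of the disclosed hardware design:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CacheConfig:
    """Hypothetical software model of the storage controller 122 configuration."""
    cache_enabled: bool = False                   # cache switch state: use first memory 121 or not
    addr_range: Optional[Tuple[int, int]] = None  # cache address range (start, end), or None if unset
    depth: int = 0                                # cache capacity D of the first memory 121

    def covers(self, addr: int) -> bool:
        """True when a second-memory address falls within the configured cache range."""
        if self.addr_range is None:
            return False
        start, end = self.addr_range
        return start <= addr <= end

# Example: enable the cache for addresses 0x1000..0x1FFF with depth 256.
cfg = CacheConfig(cache_enabled=True, addr_range=(0x1000, 0x1FFF), depth=256)
```

A real controller would hold these values in configuration registers; the model only captures the decision logic they drive.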
Step 2. The storage controller 122 obtains a data processing instruction. In a specific implementation, the data processing instruction may be a data read instruction or a data write instruction. This step can be implemented by the acquisition module 1225 in Figure 4.
Step 3. The storage controller 122 determines whether the cache switch is on. If the cache switch is on, step 4 is performed; if the cache switch is off, step 9 is performed. Configuring the cache switch as on means that the first memory 121 is used for data storage, and configuring it as off means that the first memory 121 is not used for data storage. This step can be implemented by the configuration module 1221 in Figure 4.
Step 4. The storage controller 122 determines whether the address range is set. If the address range is set, step 5 is performed; if the address range is not set, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
Optionally, when the address range is not set, since the cache switch is already on, that is, the user wants to use the caching function of the first memory 121, the user may be prompted to set the cache address range, or the processing flow of step 5 may be skipped and step 6 executed directly.
Step 5. The storage controller 122 determines whether the address is within the range, where the address refers to the address of the data access request received in step 2, and the range refers to the cache address range configured in step 1. If the address is within the cache address range configured in step 1, step 6 is performed; if it is not within the cache address range, step 9 is performed. This step can be implemented by the configuration module 1221 in Figure 4.
Step 6. The storage controller 122 determines whether the data processing instruction is a data read instruction. If it is a data read instruction, step 7 is performed; if it is not a data read instruction, step 8 is performed. This step can be implemented by the configuration module 1221.
Step 7. The storage controller 122 processes the data read instruction. The process of processing the data read instruction in step 7 is described in detail in the embodiment of Figure 6. This step can be implemented by the read data return processing module 1223 and the read data update module 1224 in Figure 4.
Step 8. The storage controller 122 processes the data write instruction. The process of processing the data write instruction in step 8 is described in detail in the embodiment of Figure 7. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
Step 9. The storage controller 122 issues a read request or a write request to the second memory. In a specific implementation, if the data access request is a data read instruction and the second memory 130 is DDR, the storage controller 122 may issue a DDR read request; if the data access request is a data write instruction, the storage controller 122 may issue a DDR write request. The above example is for illustration and is not specifically limited in this application.
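Steps 3 to 9 above form a dispatch decision. A minimal sequential sketch of that decision, assuming a simplified request consisting of an operation type plus an address (the function name and its return labels are illustrative, not part of the disclosure), is:

```python
def dispatch(cache_enabled, addr_range, op, addr):
    """Steps 3-9: route a request to the cache path or straight to the second memory.

    The return values name the branch taken; a real storage controller 122
    would issue the corresponding DDR or first-memory access instead.
    """
    if not cache_enabled:                  # step 3: cache switch is off
        return "second_memory"             # step 9: issue a DDR read/write request
    if addr_range is None:                 # step 4: cache address range not set
        return "second_memory"
    lo, hi = addr_range
    if not (lo <= addr <= hi):             # step 5: address outside the cache range
        return "second_memory"
    # step 6: split by instruction type into step 7 (read) or step 8 (write)
    return "read_path" if op == "read" else "write_path"
```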
Next, with reference to Figure 6, how step 7 processes the data read instruction is described in detail. Figure 6 is a flowchart of the processing steps of the data read instruction provided by this application. As shown in Figure 6, step 7 may include the following steps:
Step 71. The address memory 124 determines whether the same address is stored in the address memory 124, where the data read instruction carries the read address of the data, and that address is an address of the second memory 130. This step can be implemented by the read data return processing module 1223 in Figure 4.
With reference to the foregoing, the address memory 124 may be a CAM, where a CAM is a memory addressed by content. Its working mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input data item matches any data item stored in the CAM. Therefore, the address memory 124 used to store the address information of the data can be implemented using a CAM inside the accelerator 120. In this way, when a user requests to read data, the CAM can match the read address against the address information stored in the CAM; if they match, the data is already stored in the first memory 121.
It should be understood that the above example is for illustration. The address memory 124 may also be implemented using other memories, with the storage controller 122 implementing the function of obtaining addresses from the address memory 124 and matching them against the address carried in the data read instruction. This application does not specifically limit this.
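As a rough software stand-in for the CAM behavior just described, a dictionary can model the match/no-match result, though not the parallel comparison a hardware CAM performs in a single cycle. The class name and method names are hypothetical:

```python
class CAMModel:
    """Software stand-in for the content-addressed address memory 124."""

    def __init__(self):
        self._entries = {}  # second-memory address -> first-memory address

    def store(self, second_addr, first_addr):
        """Record the correspondence between the two storage addresses."""
        self._entries[second_addr] = first_addr

    def match(self, second_addr):
        """Return the first-memory address on a hit, or None on a miss (step 71)."""
        return self._entries.get(second_addr)
```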
It should be noted that after step 71 determines whether the same address exists in the address memory 124: if it exists, the latest or a historical version of the data is already stored in the first memory 121, and step 72 can be performed; if it does not exist, no version of the data has been stored in the first memory 121, and step 75 can be performed.
Step 72. Determine whether the state information is the first state. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. This step can be implemented by the read data return processing module 1223 in Figure 4.
In a specific implementation, the state information memory 123 may be a register, and step 72 may be implemented by a register capable of determining whether the state information is the first state. Alternatively, the state information memory 123 may implement only the storage function while the storage controller 122 implements the determination function; for example, the storage controller 122 obtains the state information of the data from the state information memory 123 and determines whether it is in the first state. This application does not specifically limit this.
It should be noted that when the state information of the data is the first state (high), step 73 is performed; when it is the second state (low), step 74 is performed.
Step 73. Read the data from the second memory 130. Specifically, the storage controller 122 may issue a data read instruction to the second memory 130; if the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data return processing module 1223 in Figure 4.
It can be understood that if the state information of the data is the first state, with reference to the foregoing, the data may still be in the process of being retrieved from the second memory 130. Reading the data directly from the second memory 130 at this time avoids read errors and improves the accuracy of data reading.
Step 74. Read the data from the first memory. Specifically, the storage controller 122 may, according to the read address carried in the data read instruction, query the address memory 124 to determine the first storage address of the data in the first memory, and then read the data from the first storage address. This step can be implemented by the read data return processing module 1223 in Figure 4.
Step 75. Set the state information to the first state, and store the correspondence between the first storage address and the second storage address. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. The above correspondence is stored in the address memory 124. This step can be implemented by the read data update module 1224 in Figure 4.
Step 76. Read the data from the second memory. Specifically, the storage controller 122 may issue a data read instruction to the second memory 130; if the second memory 130 is DDR, the data read instruction may be a DDR read request. This step can be implemented by the read data update module 1224 in Figure 4.
Step 77. Determine whether the state information is the first state (high). If it is the first state, steps 78 to 79 are performed; if it is the second state (low), steps 78 to 79 are not performed. This step can be implemented by the read data update module 1224 in Figure 4.
It can be understood that if the state information has changed to the second state (low), with reference to the foregoing, during a data write process the storage controller 122 modifies the state information of the data from the first state (high) to the second state (low) when it finds the state information in the first state. This indicates that between step 75 and step 79 the data was modified, and the storage controller 122 is writing, or has already written, the latest version of the data to the first memory 121. Therefore, the data read at this time may be left unwritten to the first memory 121, so as not to overwrite the newer version of the data.
Similarly, if the state information has not changed to the second state (low), the data was not modified between step 75 and step 79, so the data read from the second memory 130 at this time can be updated into the first memory 121.
Step 78. Modify the state information to the second state (low). This step can be implemented by the read data update module 1224 in Figure 4.
Step 79. Update the data into the first memory 121. This step can be implemented by the read data update module 1224 in Figure 4.
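Steps 71 to 79 can be summarized in a sequential sketch. The model below is an assumption-laden simplification: plain dictionaries stand in for the CAM, the state information memory 123, and the two memories, a first-memory slot is allocated by counting entries, and everything runs sequentially, whereas the real design is concurrent hardware in which a write can intervene between steps 75 and 77:

```python
def handle_read(cam, state, first_mem, second_mem, addr):
    """Steps 71-79 of the read path (hypothetical software model).

    cam:        second-memory address -> first-memory slot (address memory 124)
    state:      address -> 1 (first state, high) or 0 (second state, low)
    first_mem:  slot -> value (first memory 121)
    second_mem: address -> value (second memory 130)
    """
    slot = cam.get(addr)
    if slot is not None:                  # step 71: address found in the CAM
        if state.get(addr) == 1:          # step 72: first state, data may be in flight
            return second_mem[addr]       # step 73: read the second memory directly
        return first_mem[slot]            # step 74: read the cached copy
    # Miss: steps 75-79.
    slot = len(first_mem)                 # allocate a first-memory slot (assumption)
    cam[addr] = slot                      # step 75: store the address correspondence
    state[addr] = 1                       #          and set the first state
    value = second_mem[addr]              # step 76: fetch from the second memory
    if state[addr] == 1:                  # step 77: still first state (always true in
        state[addr] = 0                   #          this sequential model); step 78
        first_mem[slot] = value           # step 79: update the first memory
    return value
```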
Next, with reference to Figure 7, how step 8 processes the data write instruction is described in detail. Figure 7 is a flowchart of the processing steps of the data write instruction provided by this application. As shown in Figure 7, step 8 may include the following steps:
Step 81. Determine whether the same address exists. The data write instruction carries the write address of the data, and step 81 can determine whether that write address exists in the address memory 124. For the specific implementation of this step, refer to the description of step 71 above, which is not repeated here. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
If the same address exists, step 82 is performed; if the same address does not exist, step 84 is performed.
Step 82. Determine whether the state information is the first state. The first state may also be represented by a high state, or by a "1" state, which is not specifically limited in this application. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
When the state information is the first state (high), step 83, step 85, and step 86 are performed; when the state information is the second state (low), step 85 and step 86 are performed.
Step 83. Set the state information to the second state. The second state may also be represented by a low state, or by a "0" state, which is not specifically limited in this application. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
It can be understood, with reference to step 75, that when the state information is the first state, the data is currently going through steps 75 to 78. The data received in step 83 is then the latest version of the data, so setting the state information to the second state prevents the older version of the data still in steps 75 to 78 from overwriting the current newer version.
It should be noted that after step 83 is performed, step 85 and step 86 can be performed.
Step 84. Store the correspondence between the first storage address and the second storage address in the address memory 124. It can be understood that when the address carried in the data write instruction does not exist in the address memory 124, no historical version of the data has been written to the first memory 121. At this time, a write address in the first memory 121, that is, the first storage address, can be allocated for the data, and the first storage address is then stored in the address memory 124. In this way, when the storage controller 122 receives a read request for the data, it can obtain the first storage address of the data through the address memory and then read the data, thereby implementing data caching. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
It should be noted that after step 84 is performed, step 85 and step 86 can be performed.
Step 85. Write the data to the first memory. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
Step 86. Write the data to the second memory. This step can be implemented by the lookup and write data update module 1222 in Figure 4.
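Steps 81 to 86 admit a similar sequential sketch, under the same assumptions as the read-path model (dictionaries for the CAM, state memory, and both memories; all names are illustrative rather than part of the disclosure):

```python
def handle_write(cam, state, first_mem, second_mem, addr, value):
    """Steps 81-86 of the write path (hypothetical software model).

    cam:        second-memory address -> first-memory slot (address memory 124)
    state:      address -> 1 (first state, high) or 0 (second state, low)
    first_mem:  slot -> value (first memory 121)
    second_mem: address -> value (second memory 130)
    """
    slot = cam.get(addr)
    if slot is not None:                 # step 81: address already cached
        if state.get(addr) == 1:         # step 82: a read refill is in flight
            state[addr] = 0              # step 83: drop to the second state so the
                                         #          stale refill cannot overwrite us
    else:                                # step 84: first write of this address
        slot = len(first_mem)            # allocate a first-memory slot (assumption)
        cam[addr] = slot                 # store the address correspondence
    first_mem[slot] = value              # step 85: write the first memory
    second_mem[addr] = value             # step 86: write through to the second memory
```

Writing both memories in steps 85 and 86 is a write-through policy, which keeps the second memory 130 consistent with the cached copy at all times.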
In summary, in the data processing method provided by this application, data is stored in the first memory, and the data of the first memory is stored in the second memory, where the read/write efficiency of the first memory is greater than that of the second memory. The accelerator can therefore interact directly with the more efficient first memory when reading and writing data, improving the accelerator's data read/write efficiency. At the same time, the first memory is implemented by memory inside the accelerator, such as SRAM, registers, or a CAM, so no additional cache needs to be deployed. The caching function of accelerators such as FPGAs and ASICs is thus realized at very low hardware cost, and the problem of accelerator read/write efficiency being limited by the memory bandwidth bottleneck can be solved simply, efficiently, and at low cost.
An embodiment of this application provides an accelerator. The accelerator includes a processor and a power supply circuit. The power supply circuit is used to supply power to the processor, and the processor is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
An embodiment of this application provides a computing device. The computing device includes a CPU and an accelerator. The CPU is used to run instructions to implement the service functions of the computing device, and the accelerator is used to implement the functions of the operation steps performed by the accelerator described in Figures 5 to 7 above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the above embodiments are implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage node, such as a server or a data center, containing at least one set of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present invention, and such modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

  1. A data processing method, wherein the method is applied to an accelerator, the accelerator includes a first memory, and the method includes:
    obtaining, by the accelerator, a data processing instruction, wherein the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than the data read/write efficiency of the second memory; and
    reading, by the accelerator, the data from the first memory according to the data processing instruction, and processing the data, wherein the first memory is used to store to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the accelerator.
  2. The method according to claim 1, wherein the accelerator includes a state information memory, the state information memory is used to store state information of the data, and the state information includes a first state and a second state, wherein the first state indicates that the data is in a modified state, and the second state indicates that the data is not in a modified state.
  3. The method according to claim 1 or 2, wherein the accelerator includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  4. The method according to any one of claims 1 to 3, wherein the data processing instruction includes a data write instruction, and the reading, by the accelerator, the data from the first memory according to the data processing instruction and processing the data includes:
    when the state information of the data is a first state and the data processing instruction is a data write instruction, setting, by the accelerator, the state information of the data to a second state, reading and updating the data from the first memory according to the data processing instruction, and storing the updated data in the first memory.
  5. The method according to claim 4, wherein the data processing instruction includes a data read instruction, and before the accelerator obtains the data processing instruction, the method further includes:
    determining, by the accelerator, that the address memory does not include the address in the data processing instruction;
    setting, by the accelerator, the state information of the data to the first state, and reading the data from the second memory; and
    when the state information of the data is the first state, setting, by the accelerator, the state information of the data to the second state, and storing the data in the first memory.
  6. The method according to any one of claims 1 to 5, wherein the method further includes: configuring, by the accelerator, the first memory to obtain configuration information of the first memory, the configuration information including one or more of a cache switch state, a cache address range, and a cache capacity, wherein the cache switch state indicates whether the accelerator uses the first memory, the cache address range indicates that the accelerator stores data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
  7. The method according to any one of claims 1 to 6, wherein the accelerator is implemented by application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes one or more of a static random access memory (SRAM), a storage-class memory (SCM), a register, and a content-addressable memory (CAM).
  8. The method according to any one of claims 1 to 7, wherein the to-be-cached data includes data whose access frequency is higher than a first threshold.
  9. A data processing apparatus, wherein the data processing apparatus includes a first memory, and the data processing apparatus includes:
    an acquisition unit, configured to obtain a data processing instruction, wherein the data processing instruction includes an address of data in a second memory, and the data read/write efficiency of the first memory is greater than the data read/write efficiency of the second memory; and
    a processing unit, configured to read the data from the first memory according to the data processing instruction and process the data, wherein the first memory is used to store to-be-cached data of the second memory, and the to-be-cached data includes data historically accessed by the data processing apparatus.
  10. The apparatus according to claim 9, wherein the data processing apparatus includes a state information memory, the state information memory is used to store state information of the data, and the state information includes a first state and a second state, wherein the first state indicates that the data is in a modified state, and the second state indicates that the data is not in a modified state.
  11. The apparatus according to claim 9 or 10, wherein the data processing apparatus includes an address memory, and the address memory is used to store a correspondence between an address of the data in the first memory and an address of the data in the second memory.
  12. The apparatus according to any one of claims 9 to 11, wherein the data processing instruction includes a data write instruction; and
    the processing unit is configured to: when the state information of the data is a first state and the data processing instruction is a data write instruction, set the state information of the data to a second state, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
  13. The apparatus according to claim 12, wherein the data processing instruction includes a data read instruction;
    the processing unit is configured to determine, before the data processing apparatus obtains the data processing instruction, that the address memory does not include the address in the data processing instruction;
    the processing unit is configured to set the state information of the data to the first state and read the data from the second memory; and
    the processing unit is configured to: when the state information of the data is the first state, set the state information of the data to the second state and store the data in the first memory.
  14. The apparatus according to any one of claims 9 to 13, wherein the apparatus further includes a configuration unit, the configuration unit being configured to configure the first memory to obtain configuration information of the first memory, the configuration information including one or more of a cache switch state, a cache address range, and a cache capacity, wherein the cache switch state indicates whether the data processing apparatus uses the first memory, the cache address range indicates that the data processing apparatus stores data whose storage address falls within the cache address range in the first memory, and the cache capacity indicates the capacity of the first memory.
  15. The apparatus according to any one of claims 9 to 14, wherein the data processing apparatus is implemented by application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes one or more of a static random access memory (SRAM), a storage-class memory (SCM), a register, and a content-addressable memory (CAM).
  16. The apparatus according to any one of claims 9 to 15, wherein the to-be-cached data includes data whose access frequency is higher than a first threshold.
  17. An accelerator, characterized in that the accelerator comprises a processor and a power supply circuit, the power supply circuit is configured to supply power to the processor, and the processor is configured to implement the functions of the operation steps performed by the accelerator according to any one of claims 1 to 8.
  18. A computing device, characterized in that the computing device comprises a central processing unit (CPU) and an accelerator, the CPU is configured to run instructions to implement service functions of the computing device, and the accelerator is configured to implement the functions of the operation steps performed by the accelerator according to any one of claims 1 to 8.
PCT/CN2023/091041 2022-04-27 2023-04-27 Data processing method and apparatus and related device WO2023208087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210451716.8A CN117008810A (en) 2022-04-27 2022-04-27 Data processing method and device and related equipment
CN202210451716.8 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023208087A1 true WO2023208087A1 (en) 2023-11-02

Family

ID=88517853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091041 WO2023208087A1 (en) 2022-04-27 2023-04-27 Data processing method and apparatus and related device

Country Status (2)

Country Link
CN (1) CN117008810A (en)
WO (1) WO2023208087A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566978A (en) * 2010-09-30 2012-07-11 NXP B.V. Memory accelerator buffer replacement method and system
CN105074677A (en) * 2013-03-12 2015-11-18 Empire Technology Development LLC Accelerator buffer access
CN106415485A (en) * 2014-01-23 2017-02-15 Qualcomm Incorporated Hardware acceleration for inline caches in dynamic languages
WO2018179873A1 (en) * 2017-03-28 2018-10-04 NEC Corporation Library for computer provided with accelerator, and accelerator
CN111752867A (en) * 2019-03-29 2020-10-09 Intel Corporation Shared accelerator memory system and method


Also Published As

Publication number Publication date
CN117008810A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
JP2019067417A (en) Final level cache system and corresponding method
US11650940B2 (en) Storage device including reconfigurable logic and method of operating the storage device
US9697111B2 (en) Method of managing dynamic memory reallocation and device performing the method
KR20180054394A (en) A solid state storage device comprising a Non-Volatile Memory Express (NVMe) controller for managing a Host Memory Buffer (HMB), a system comprising the same and method for managing the HMB of a host
US11675709B2 (en) Reading sequential data from memory using a pivot table
US11449230B2 (en) System and method for Input/Output (I/O) pattern prediction using recursive neural network and proaction for read/write optimization for sequential and random I/O
US7222217B2 (en) Cache residency test instruction
US8838873B2 (en) Methods and apparatus for data access by a reprogrammable circuit module
WO2023125524A1 (en) Data storage method and system, storage access configuration method and related device
CN110597742A (en) Improved storage model for computer system with persistent system memory
US20150121033A1 (en) Information processing apparatus and data transfer control method
EP4242819A1 (en) System and method for efficiently obtaining information stored in an address space
WO2023208087A1 (en) Data processing method and apparatus and related device
WO2023124304A1 (en) Chip cache system, data processing method, device, storage medium, and chip
US6324633B1 (en) Division of memory into non-binary sized cache and non-cache areas
KR20230117692A (en) Device, system and method for hybrid database scan acceleration
CN116541415A (en) Apparatus, system and method for acceleration
Awamoto et al. Designing a storage software stack for accelerators
US10216524B2 (en) System and method for providing fine-grained memory cacheability during a pre-OS operating environment
US10776011B2 (en) System and method for accessing a storage device
US20240061784A1 (en) System and method for performing caching in hashed storage
US20230384960A1 (en) Storage system and operation method therefor
US20230359389A1 (en) Operation method of host configured to communicate with storage devices and memory devices, and system including storage devices and memory devices
US20240078036A1 (en) Hybrid memory management systems and methods with in-storage processing and attribute data management
CN116738510A (en) System and method for efficiently obtaining information stored in address space

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795519

Country of ref document: EP

Kind code of ref document: A1