WO2023201987A1 - Request processing method, apparatus, device and medium - Google Patents

Request processing method, apparatus, device and medium

Info

Publication number
WO2023201987A1
WO2023201987A1 PCT/CN2022/121127 CN2022121127W
Authority
WO
WIPO (PCT)
Prior art keywords
address
request
current
processed
preset
Prior art date
Application number
PCT/CN2022/121127
Other languages
English (en)
French (fr)
Inventor
徐江波
母文道
崔健
吴睿振
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023201987A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of management software, and in particular to a request processing method, apparatus, device, and medium.
  • this application discloses a request processing method, which is applied to a current accelerator card and includes:
  • this application discloses a request processing device, which is applied to a current accelerator card and includes:
  • the request acquisition module is used to obtain the address of the current pending request from the local first-in-first-out queue, and obtain the current pending request from the preset request buffer based on the address of the current pending request;
  • the request parsing module is used to parse the current pending request, and obtain the corresponding current pending source data and the serial address item used to record the address of the next pending request based on the parsing results;
  • the processing module is used to process the current source data to be processed to obtain the current processed target data, and write the current processed target data into the preset data buffer; the judgment result processing module is used to notify the host to read all processed target data from the preset data buffer when the serial address item is empty.
  • the present application discloses an electronic device, including a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • when executed, the above-mentioned computer-readable instructions cause the above-mentioned one or more processors to execute the steps of the request processing method disclosed above.
  • the present application discloses one or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when the above-mentioned computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the request processing method disclosed above.
  • Figure 1 is a flow chart of a request processing method disclosed in this application according to one or more embodiments
  • Figure 2 is a flow chart of a specific request processing method disclosed in this application according to one or more embodiments
  • Figure 3 is a schematic diagram of a specific request linked list disclosed in this application according to one or more embodiments.
  • Figure 4 is a flow chart of a specific request processing method disclosed in this application according to one or more embodiments.
  • Figure 5 is a schematic diagram of a specific request processing method disclosed in this application according to one or more embodiments.
  • Figure 6 is a schematic structural diagram of a request processing device disclosed in this application according to one or more embodiments.
  • Figure 7 is a structural diagram of an electronic device disclosed according to one or more embodiments of this application.
  • this application accordingly provides a request processing solution, which can reduce the CPU burden and improve the efficiency of request processing.
  • the embodiment of the present application discloses a request processing solution, which is applied to a current accelerator card, including:
  • Step S11: Obtain the address of the current request to be processed from the local first-in-first-out queue, and obtain the current request to be processed from the preset request buffer based on the address of the current request to be processed.
  • the address of the current request to be processed is obtained from the local First Input First Output (FIFO) queue, and the current pending request corresponding to that address is obtained from the preset request buffer. It can be understood that the host writes the address of the current pending request into the first-in-first-out queue of the current accelerator card in advance.
  • Step S12: Parse the current pending request, and obtain the corresponding current pending source data and the serial address item used to record the address of the next pending request based on the parsing result.
  • the current request to be processed is parsed to obtain the corresponding parsing result, where the parsing result includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, the serial address item used to record the address of the next pending request, the parallel accelerator card address, and the parallel address item used to record the current pending parallel request corresponding to the current pending request.
  • Step S13: Process the current source data to be processed to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
  • the current source data to be processed is processed accordingly to obtain the current processed target data, and the current processed target data is written to the corresponding position in the preset data buffer based on the address used to store the current processed target data. Determining whether the serial address item is empty means determining whether there is a next pending request: it can be preset that an empty serial address item means there is no next pending request, or that a serial address of 0 means there is no next pending request. Before the current source data is processed, it is also necessary to determine whether the parallel address item used to record the current pending parallel request corresponding to the current pending request is empty; likewise, it can be preset that an empty parallel address item means there is no current pending parallel request, or that a parallel address of 0 means there is no current pending parallel request.
  • the process of processing the current source data to be processed to obtain the current processed target data and writing the current processed target data into the preset data buffer also includes: when an error occurs in processing the current source data to be processed, the corresponding preset error response operation is written to the response queue allocated in advance by the host for the current accelerator card, and the host is notified to perform the preset error response operation.
  • if an error occurs when the current accelerator card processes the current pending request, the current accelerator card writes the corresponding preset error response operation to the local response queue and notifies the host, through an MSI (Message Signaled Interrupts) interrupt, to perform the corresponding preset error response operation.
  • Step S14: If the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
  • if the serial address item is empty, the current accelerator card writes a preset read response to the local response queue, for example flipping the phase mark in the response queue from 0 to 1, and can then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer; when the host polls the response queue and finds a phase mark that is not 0, it considers that a new preset read response has been written into the response queue and reads all processed target data from the preset data buffer, after which the host sets the phase mark from 1 to 0, so that the next time a new preset read response is written into the response queue the phase mark again changes from 0 to 1.
  • if the serial address item is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage, so that the next accelerator card can obtain the address recorded in the serial address item from its own first-in-first-out queue and fetch the next pending request corresponding to that address from the preset request buffer.
  • it can be understood that a PCIe EP (endpoint; the endpoint device is the accelerator card in this application) sends the address recorded in the serial address item directly to another PCIe EP, i.e., the next accelerator card, so that the cooperation between the current accelerator card and the next accelerator card is completed peer-to-peer (P2P).
  • it can be seen that this application obtains the address of the current pending request from the local first-in-first-out queue and obtains the current pending request from the preset request buffer based on that address; parses the current pending request and obtains, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request; processes the current source data to obtain the current processed target data, writes it into the preset data buffer, and determines whether the serial address item is empty; if the serial address item is empty, the host is notified to read all processed target data from the preset data buffer, and if it is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage. Because the host does not need to send the address recorded in the serial address item to the next accelerator card's first-in-first-out queue each time, the burden on the CPU and the number of redundant steps are reduced, thereby improving the efficiency of processing requests.
  • this embodiment of the present application discloses a specific request processing solution, including:
  • Step S21: Obtain the address of the current request to be processed from the local first-in-first-out queue, and obtain the current request to be processed from the preset request buffer based on the address of the current request to be processed.
  • Step S22: Parse the current pending request to obtain the corresponding parsing result; the parsing result includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, and the serial address item used to record the address of the next pending request; based on the address of the processed target data corresponding to the previous processed request, obtain the corresponding data from the preset data buffer to obtain the current source data to be processed.
  • the process of parsing the current pending request also includes: extracting the serial accelerator card address carried in the current pending request; correspondingly, sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage includes: determining the next accelerator card based on the serial accelerator card address carried in the current pending request, and sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
  • the process of parsing the current request to be processed also includes: extracting the parallel accelerator card address and the parallel address item carried in the current request to be processed; the parallel address item is used to record the address of the current pending parallel request corresponding to the current pending request. Correspondingly, before the current source data is processed to obtain the current processed target data, the method also includes: judging whether the parallel address item is empty; if the parallel address item is not empty, determining the corresponding target parallel accelerator card based on the parallel accelerator card address carried in the current pending request, and sending the address recorded in the parallel address item to the target parallel accelerator card for storage, so that the target parallel accelerator card can use that address to obtain the current pending parallel request from the preset request buffer.
  • each pending request contains the specific data of the current pending request, the serial address item used to record the address of the next pending request, the serial accelerator card address, the parallel accelerator card address, the parallel address item used to record the current pending parallel request corresponding to the current pending request, and the request number of the request, where the specific data of the current pending request includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, and the relevant parameters of the current pending request. The parallel accelerator card address and the serial accelerator card address can be the BAR (Base Address Register) addresses of the corresponding accelerator cards, and since each request contains a serial address item and a parallel address item, a request linked list containing a parallel request chain and a serial request chain can be formed. In the parallel request chain, the current pending request and its corresponding current pending parallel request share the same address of the processed target data corresponding to the previous processed request, and all pending requests in the same request linked list share the same request number, so that when the current accelerator card in the request linked list determines that the serial address item is empty, it carries that request number when notifying the host to read all processed target data from the preset data buffer.
  • Step S23: Process the current source data to be processed to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
  • writing the current processed target data into the preset data buffer specifically includes: writing the current processed target data to the corresponding location in the preset data buffer based on the address of the current processed target data carried in the parsing result.
  • Step S24: If the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
  • the current accelerator card writes the parallel address item used to record the current pending parallel request corresponding to the current pending request to the accelerator card corresponding to the parallel accelerator card address, and sends the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card corresponding to the serial accelerator card address for storage, so requests can be processed cooperatively in parallel or serially to improve processing efficiency.
  • this embodiment of the present application discloses a specific request processing solution, including:
  • Step S31: Obtain the entry address of the FIFO queue and the current available capacity of the FIFO queue from the local base address register space.
  • before the entry address of the FIFO queue and the current available capacity of the FIFO queue are obtained from the local base address register space, the host performs system initialization and allocates a preset number of response queues to each accelerator card, so that the corresponding accelerator card can write the preset read response to its response queue and then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer, thereby avoiding data conflicts among multiple accelerator cards.
  • the host applies for request space for storing pending requests from the preset request buffer, and applies for data space for storing processed target data from the preset data buffer.
  • Step S32: When the current available capacity is not less than the preset capacity threshold, write the address of the current pending request into the first-in-first-out queue based on the entry address through the host or the previous accelerator card.
  • when the current available capacity is not less than the preset capacity threshold, after the address of the current pending request is written into the first-in-first-out queue based on the entry address through the host or the previous accelerator card, the method also includes: recalculating the current available capacity based on the size of the space occupied by the address of the current pending request to obtain the updated available capacity, and determining whether the updated available capacity is not less than the preset capacity threshold; if the updated available capacity is less than the preset capacity threshold, writing the addresses of new pending requests into the first-in-first-out queue is prohibited.
  • for example, the current available capacity is 1 and the preset capacity threshold is 1, i.e., the current available capacity is not less than the preset capacity threshold, and the space occupied by the address of the current pending request is 1; after the host writes the address of the current pending request into the first-in-first-out queue based on the entry address, the current available capacity is recalculated and the updated available capacity is 0, which is determined to be less than the preset capacity threshold, so writing the address of a new pending request into the FIFO queue is prohibited.
  • it can be understood that if the current available capacity is 1 and the addresses of two pending requests need to be written into the first-in-first-out queue, the arbitration mechanism of the PCIe bus (Peripheral Component Interconnect Express bus) is used: the first pending request is allocated a space of 1 by the first-in-first-out queue and the second pending request is allocated a space of 0, so the address of the first pending request is written into the first-in-first-out queue while the address of the second pending request cannot be written into it.
  • Step S33: Obtain the address of the current request to be processed from the local first-in-first-out queue, and obtain the current request to be processed from the preset request buffer based on the address of the current request to be processed.
  • Step S34: Parse the current pending request, and obtain the corresponding current pending source data and the serial address item used to record the address of the next pending request based on the parsing result.
  • Step S35: Process the current source data to be processed to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
  • Step S36: If the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
  • this application determines, based on the current available capacity of the local first-in-first-out queue, whether the host or the previous accelerator card can write the address of the current pending request into the first-in-first-out queue based on the entry address, which avoids FIFO queue overflow errors that occur when the current available capacity of the queue cannot accommodate the space occupied by the address of the current pending request; the current accelerator card sends the address of the next pending request to the FIFO queue of the next accelerator card, which reduces the burden on the host, thereby reducing the workload of the CPU and improving the efficiency of request processing.
  • the host performs system initialization and allocates a preset number of response queues to each accelerator card. For example, each accelerator card is assigned a ring response queue to avoid data conflicts between multiple accelerator cards.
  • the host prepares data to be processed based on actual needs.
  • each pending request contains the request number of the request, the specific data of the current pending request, the serial address item used to record the address of the next pending request, the serial accelerator card address, the parallel accelerator card address, and the parallel address item used to record the current pending parallel request corresponding to the current pending request, where the specific data of the current pending request includes the address of the processed target data corresponding to the previous processed request and the address of the current processed target data.
  • the host obtains the current available capacity of the initial accelerator card's first-in-first-out queue; if the current available capacity is not less than the preset capacity threshold, the address of the initial pending request is written into that queue based on the entry address (FIFO Entry) of the initial accelerator card's first-in-first-out queue. The preset capacity threshold can be set to the space occupied by the address of the initial pending request, and the preset capacity threshold can be changed based on the actual situation. If the addresses of two pending requests need to be written into the first-in-first-out queue, the arbitration mechanism of the PCIe bus (PCIe Bus) is used to determine the first pending request and the second pending request.
  • if the current accelerator card detects that the address of the current pending request has been written into the local first-in-first-out queue, it obtains the address of the current pending request from the first-in-first-out queue and, based on the size of the space occupied by the address of the current pending request, recalculates the current available capacity (FIFO Depth) to obtain the updated available capacity.
  • it can be understood that if the current accelerator card is the initial accelerator card, the address of the initial pending request is obtained.
  • the current source data to be processed is processed to obtain the current processed target data, and then the current processed target data is written into the corresponding position of the preset data buffer based on the address of the current processed target data.
  • the corresponding preset error response operation is written to the response queue allocated by the host for the current accelerator card in advance, and the host is notified to perform the preset error response operation.
  • it is determined whether the serial address item is empty; if the serial address item is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card corresponding to the serial accelerator card address for storage; if the serial address item is empty, the current accelerator card writes the preset read response to the local response queue, causing the phase mark in the response queue to change from 0 to 1, and can then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer; when the host polls the response queue and the phase mark is not 0, it considers that a new preset read response has been written into the response queue and reads all processed target data from the preset data buffer, after which the host sets the phase mark from 1 to 0, so that the next time a new preset read response is written into the response queue the phase mark changes from 0 to 1.
  • this embodiment of the present application discloses a request processing device, which is applied to a current accelerator card and includes:
  • the request acquisition module 11 is used to obtain the address of the current pending request from the local first-in-first-out queue, and obtain the current pending request from the preset request buffer based on the address of the current pending request;
  • the request parsing module 12 is used to parse the current request to be processed, and obtain the corresponding current source data to be processed and the serial address item used to record the address of the next request to be processed based on the parsing results;
  • the processing module 13 is used to process the current source data to be processed to obtain the current processed target data, and write the current processed target data into the preset data buffer;
  • the judgment module 14 is used to determine whether the serial address item is empty;
  • the judgment result processing module 15 is used to notify the host to read all processed target data from the preset data buffer if the serial address item is empty, and to send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage if the serial address item is not empty.
  • it can be seen that this application obtains the address of the current pending request from the local first-in-first-out queue and obtains the current pending request from the preset request buffer based on that address; parses the current pending request and obtains, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request; processes the current source data to obtain the current processed target data, writes it into the preset data buffer, and determines whether the serial address item is empty; if the serial address item is empty, the host is notified to read all processed target data from the preset data buffer, and if it is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage. Because the host does not need to send the address recorded in the serial address item to the next accelerator card's first-in-first-out queue each time, the burden on the CPU and the number of redundant steps are reduced, thereby improving the efficiency of processing requests.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. Specifically, it may include: at least one processor 21, at least one memory 22, power supply 23, communication interface 24, input and output interface 25 and communication bus 26.
  • the memory 22 is used to store computer readable instructions, which are loaded and executed by the processor 21 to implement the relevant steps in the request processing method executed by the computer device disclosed in any of the foregoing embodiments.
  • the power supply 23 is used to provide operating voltage for each hardware device on the computer device 20;
  • the communication interface 24 can create a data transmission channel between the computer device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, which is not specifically limited here;
  • the input and output interface 25 is used to obtain external input data or to output data to the outside world, and its specific interface type can be selected according to the specific application needs, which is not specifically limited here.
  • the processor 21 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor 21 can adopt at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
  • the processor 21 may also include a main processor and a co-processor.
  • the main processor is the processor used to process data in the awake state, also called the CPU (Central Processing Unit); the co-processor is a low-power processor used to process data in the standby state.
  • the processor 21 may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 21 may also include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • AI Artificial Intelligence, artificial intelligence
  • the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • the resources stored on it include the operating system 221, computer-readable instructions 222 and data 223, etc., and the storage method can be short-term storage or permanent storage.
  • the operating system 221 is used to manage and control each hardware device and the computer-readable instructions 222 on the computer device 20, so that the processor 21 can operate on and process the massive data 223 in the memory 22; it can be Windows, Unix, Linux, etc.
  • in addition to computer-readable instructions that can be used to complete the request processing method disclosed in any of the foregoing embodiments and executed by the computer device 20, the computer-readable instructions 222 may further include computer-readable instructions that can be used to complete other specific tasks.
  • the data 223 may include data received by the computer device from external devices, as well as data collected through its own input and output interface 25.
  • embodiments of the present application also disclose a non-volatile computer-readable storage medium, which stores computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the method steps executed during request processing disclosed in any of the above embodiments can be implemented.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This application discloses a request processing method, apparatus, device, and medium, relating to the field of management software. The method includes: obtaining the address of the current pending request from a local first-in-first-out queue, and obtaining the current pending request from a preset request buffer based on the address of the current pending request; parsing the current pending request, and obtaining, based on the parsing result, the corresponding current source data to be processed and a serial address item used to record the address of the next pending request; processing the current source data to obtain the current processed target data, and writing the current processed target data into a preset data buffer; and, when the serial address item is empty, notifying the host to read all processed target data from the preset data buffer.

Description

Request processing method, apparatus, device and medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 19, 2022, with application No. 202210407248.4 and titled "Request processing method, apparatus, device and medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of management software, and in particular to a request processing method, apparatus, device, and medium.
Background
In recent years, with the development of emerging technologies such as the mobile Internet and artificial intelligence, the scale of big data and of Internet applications has been growing exponentially. The workload that data centers must handle has grown just as fast, while the computing power of the general-purpose processors used in data centers has reached a bottleneck. To solve this problem, various PCIe (Peripheral Component Interconnect Express, the high-speed serial computer expansion bus standard) boards for dedicated computational acceleration based on FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits) have emerged, such as compression/decompression cards, RAID (Redundant Array of Independent Disks) storage cards, and the familiar GPGPUs (General-Purpose computing on Graphics Processing Units) used for neural network training and inference. The inventors realized, however, that because different accelerator cards use different driver interfaces, in the prior art each computation step of a workload that requires multiple kinds of acceleration must be scheduled by the host, which undeniably increases the burden on the CPU (Central Processing Unit) and reduces processing efficiency.
In summary, how to reduce the CPU burden and improve the efficiency of request processing is a problem to be solved in this field.
Summary
In a first aspect, this application discloses a request processing method, applied to a current accelerator card, including:
obtaining the address of the current pending request from a local first-in-first-out queue, and obtaining the current pending request from a preset request buffer based on the address of the current pending request;
parsing the current pending request, and obtaining, based on the parsing result, the corresponding current source data to be processed and a serial address item used to record the address of the next pending request;
processing the current source data to obtain the current processed target data, and writing the current processed target data into a preset data buffer; and
when the serial address item is empty, notifying the host to read all processed target data from the preset data buffer.
In a second aspect, this application discloses a request processing apparatus, applied to a current accelerator card, including:
a request acquisition module, configured to obtain the address of the current pending request from a local first-in-first-out queue, and obtain the current pending request from a preset request buffer based on the address of the current pending request;
a request parsing module, configured to parse the current pending request and obtain, based on the parsing result, the corresponding current source data to be processed and a serial address item used to record the address of the next pending request;
a processing module, configured to process the current source data to obtain the current processed target data and write the current processed target data into a preset data buffer; and a judgment result processing module, configured to notify the host to read all processed target data from the preset data buffer when the serial address item is empty.
In a third aspect, this application discloses an electronic device, including a memory and one or more processors, where the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the request processing method disclosed above.
In a fourth aspect, this application discloses one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the request processing method disclosed above.
The details of one or more embodiments of this application are set forth in the drawings and description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of this application; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Figure 1 is a flowchart of a request processing method disclosed in this application according to one or more embodiments;
Figure 2 is a flowchart of a specific request processing method disclosed in this application according to one or more embodiments;
Figure 3 is a schematic diagram of a specific request linked list disclosed in this application according to one or more embodiments;
Figure 4 is a flowchart of a specific request processing method disclosed in this application according to one or more embodiments;
Figure 5 is a schematic diagram of a specific request processing method disclosed in this application according to one or more embodiments;
Figure 6 is a schematic structural diagram of a request processing apparatus disclosed in this application according to one or more embodiments;
Figure 7 is a structural diagram of an electronic device disclosed according to one or more embodiments of this application.
Detailed description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.
Because different accelerator cards use different driver interfaces, in the prior art each computation step of a workload that requires multiple kinds of acceleration must be scheduled by the host, which increases the burden on the CPU (Central Processing Unit) and reduces processing efficiency.
This application accordingly provides a request processing solution that can reduce the CPU burden and improve the efficiency of request processing.
Referring to Figure 1, an embodiment of this application discloses a request processing solution, applied to a current accelerator card, including:
Step S11: obtain the address of the current pending request from a local first-in-first-out queue, and obtain the current pending request from a preset request buffer based on the address of the current pending request.
In this embodiment, the address of the current pending request is obtained from the local first-in-first-out (First Input First Output, FIFO) queue, and the current pending request corresponding to that address is obtained from the preset request buffer. It can be understood that the host writes the address of the current pending request into the first-in-first-out queue of the current accelerator card in advance.
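By way of illustration only, the following minimal C sketch shows this fetch step from the accelerator card's side; fifo_pop and dma_read are hypothetical helpers standing in for the card's real queue-read and DMA machinery, and the convention that an address of 0 means "queue empty" is an assumption made for the sketch, not something mandated by this application.

    #include <stdint.h>

    /* Hypothetical helpers: fifo_pop returns the next address written into the
     * card's local first-in-first-out queue (0 when the queue is empty);
     * dma_read copies len bytes from host memory at src_addr into dst. */
    uint64_t fifo_pop(void);
    void dma_read(void *dst, uint64_t src_addr, uint32_t len);

    /* Fetch the current pending request from the preset request buffer using
     * the address that the host previously wrote into the local FIFO queue. */
    int fetch_current_request(void *req_buf, uint32_t req_size)
    {
        uint64_t addr = fifo_pop();        /* address written earlier by the host */
        if (addr == 0)
            return -1;                     /* no pending request */
        dma_read(req_buf, addr, req_size); /* pull the request itself */
        return 0;
    }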
Step S12: parse the current pending request, and obtain, based on the parsing result, the corresponding current source data to be processed and a serial address item used to record the address of the next pending request.
In this embodiment, the current pending request is parsed to obtain a corresponding parsing result, where the parsing result includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, the serial address item used to record the address of the next pending request, a parallel accelerator card address, and a parallel address item used to record the current pending parallel request corresponding to the current pending request. It can be understood that the corresponding current source data to be processed is obtained from the preset data buffer based on the address of the processed target data corresponding to the previous processed request.
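To make the parsing result concrete, the following is a minimal C sketch of a request descriptor that combines the fields named here with those enumerated in the Figure 3 embodiment below; every field name, type width, the size of the params array, and the use of 0 as the "empty" marker are illustrative assumptions rather than a layout disclosed by this application.

    #include <stdint.h>

    /* Hypothetical request descriptor; names and widths are assumptions. */
    struct request_desc {
        uint32_t req_no;            /* request number shared by the whole request linked list */
        uint64_t prev_result_addr;  /* address of the processed target data of the previous processed request */
        uint64_t result_addr;       /* address where the current processed target data is written */
        uint64_t params[4];         /* relevant parameters of the request (count assumed) */
        uint64_t serial_addr;       /* serial address item: address of the next pending request, 0 = empty */
        uint64_t serial_card_bar;   /* serial accelerator card address (BAR of the next card) */
        uint64_t parallel_addr;     /* parallel address item: address of the parallel pending request, 0 = empty */
        uint64_t parallel_card_bar; /* parallel accelerator card address (BAR of the target parallel card) */
    };

Under these assumptions, the serial chain is walked by following serial_addr through the preset request buffer, and the parallel chain by following parallel_addr.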
Step S13: process the current source data to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
In this embodiment, the current source data is processed to obtain the current processed target data, and the current processed target data is written to the corresponding position in the preset data buffer based on the address used to store the current processed target data. Determining whether the serial address item is empty means determining whether there is a next pending request: it can be preset that an empty serial address item means there is no next pending request, or that a serial address of 0 means there is no next pending request. Before the current source data is processed, it is also necessary to determine whether the parallel address item used to record the current pending parallel request corresponding to the current pending request is empty; likewise, it can be preset that an empty parallel address item means there is no current pending parallel request, or that a parallel address of 0 means there is no current pending parallel request.
In this embodiment, the process of processing the current source data to obtain the current processed target data and writing it into the preset data buffer further includes: when an error occurs in processing the current source data, writing a corresponding preset error response operation into the response queue allocated in advance by the host for the current accelerator card, and notifying the host to perform the preset error response operation. If an error occurs while the current accelerator card processes the current pending request, the current accelerator card writes the corresponding preset error response operation into its local response queue and notifies the host, through an MSI (Message Signaled Interrupts) interrupt, to perform the corresponding preset error response operation.
Step S14: if the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
In this embodiment, if the serial address item is empty, it is determined that there is no next pending request, so the current accelerator card writes a preset read response into its local response queue, for example flipping the phase mark in the response queue from 0 to 1, and can then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer. When the host polls the response queue and finds a phase mark that is not 0, it considers that a new preset read response has been written into the response queue and reads all processed target data from the preset data buffer; the host then sets the phase mark back from 1 to 0, so that the next time a new preset read response is written into the response queue, the phase mark again changes from 0 to 1. If the serial address item is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage, so that the next accelerator card can obtain that address from its own first-in-first-out queue and fetch the next pending request corresponding to that address from the preset request buffer. It can be understood that a PCIe EP (endpoint; the endpoint device is the accelerator card in this application) sends the address recorded in the serial address item directly to another PCIe EP, i.e., the next accelerator card, so that the cooperation between the current accelerator card and the next accelerator card is completed peer-to-peer (P2P).
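The completion decision just described can be sketched as follows, reusing the hypothetical request_desc above; resp_entry, send_msi, and write_to_peer_fifo are likewise assumed stand-ins for the card's actual response-queue layout, MSI signaling, and PCIe peer-to-peer write, not interfaces disclosed by this application.

    /* Hypothetical response-queue entry carrying the phase mark described above. */
    struct resp_entry { uint32_t req_no; volatile uint32_t phase; };

    void send_msi(void);                                    /* assumed MSI trigger */
    void write_to_peer_fifo(uint64_t peer_bar, uint64_t a); /* assumed P2P write */

    void complete_request(const struct request_desc *req, struct resp_entry *slot)
    {
        if (req->serial_addr == 0) {   /* serial address item empty: chain finished */
            slot->req_no = req->req_no;
            slot->phase  = 1;          /* phase mark flips from 0 to 1 */
            send_msi();                /* notify the host to read the processed target data */
        } else {
            /* Peer-to-peer: push the next request's address straight into the
             * next accelerator card's FIFO through its BAR, without the host. */
            write_to_peer_fifo(req->serial_card_bar, req->serial_addr);
        }
    }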
It can be seen that this application obtains the address of the current pending request from the local first-in-first-out queue and obtains the current pending request from the preset request buffer based on that address; parses the current pending request and obtains, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request; processes the current source data to obtain the current processed target data, writes the current processed target data into the preset data buffer, and determines whether the serial address item is empty; if the serial address item is empty, the host is notified to read all processed target data from the preset data buffer, and if the serial address item is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage. Because the host does not need to send the address recorded in the serial address item to the next accelerator card's first-in-first-out queue each time, the burden on the CPU is reduced and redundant steps are eliminated, thereby improving the efficiency of request processing.
Referring to Figure 2, an embodiment of this application discloses a specific request processing solution, including:
Step S21: obtain the address of the current pending request from the local first-in-first-out queue, and obtain the current pending request from the preset request buffer based on the address of the current pending request.
Step S22: parse the current pending request to obtain a corresponding parsing result, where the parsing result includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, and the serial address item used to record the address of the next pending request; based on the address of the processed target data corresponding to the previous processed request, obtain the corresponding data from the preset data buffer as the current source data to be processed.
In this embodiment, the process of parsing the current pending request further includes: extracting the serial accelerator card address carried in the current pending request. Correspondingly, sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage includes: determining the next accelerator card based on the serial accelerator card address carried in the current pending request, and sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
In this embodiment, the process of parsing the current pending request further includes: extracting the parallel accelerator card address and the parallel address item carried in the current pending request, where the parallel address item is used to record the address of the current pending parallel request corresponding to the current pending request. Correspondingly, before the current source data is processed to obtain the current processed target data, the method further includes: determining whether the parallel address item is empty; if the parallel address item is not empty, determining the corresponding target parallel accelerator card based on the parallel accelerator card address carried in the current pending request, and sending the address recorded in the parallel address item to the target parallel accelerator card for storage, so that the target parallel accelerator card can use that address to obtain the current pending parallel request from the preset request buffer. For example, as shown in Figure 3, the host prepares the data to be processed according to actual needs and creates one or more corresponding pending requests. It can be understood that each pending request contains the specific data of the current pending request, a serial address item used to record the address of the next pending request, a serial accelerator card address, a parallel accelerator card address, a parallel address item used to record the current pending parallel request corresponding to the current pending request, and the request number of the request, where the specific data of the current pending request includes the address of the processed target data corresponding to the previous processed request, the address of the current processed target data, and the relevant parameters of the current pending request. The parallel accelerator card address and the serial accelerator card address can be the BAR (Base Address Register) addresses of the corresponding accelerator cards, and since each request contains a serial address item and a parallel address item, a request linked list containing a parallel request chain and a serial request chain as shown in Figure 3 can be formed. Note that in the parallel request chain, the current pending request and its corresponding current pending parallel request share the same address of the processed target data corresponding to the previous processed request, and all pending requests in the same request linked list share the same request number, so that when the current accelerator card in the request linked list determines that the serial address item is empty, it carries that request number when notifying the host to read all processed target data from the preset data buffer.
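Continuing the same sketch, the parallel branch of the request linked list could be dispatched as below before the card processes its own source data; request_desc and write_to_peer_fifo are the hypothetical definitions from the earlier sketches, not interfaces disclosed by this application.

    /* If the parallel address item is non-empty, hand the parallel request's
     * address to the target parallel accelerator card through its BAR, so that
     * that card can fetch the request from the preset request buffer itself. */
    void dispatch_parallel(const struct request_desc *req)
    {
        if (req->parallel_addr != 0)
            write_to_peer_fifo(req->parallel_card_bar, req->parallel_addr);
    }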
Step S23: process the current source data to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
In this embodiment, writing the current processed target data into the preset data buffer specifically includes: writing the current processed target data to the corresponding position in the preset data buffer based on the address of the current processed target data carried in the parsing result.
Step S24: if the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
For the more specific working processes of steps S21 and S24, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can thus be seen that in this application the host does not need to invoke a different accelerator card for every processing step, which reduces the burden on the CPU and removes redundant steps. The current accelerator card writes the parallel address item used to record the current pending parallel request corresponding to the current pending request into the accelerator card corresponding to the parallel accelerator card address, and sends the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card corresponding to the serial accelerator card address for storage; requests can therefore be processed cooperatively in parallel or serially, improving processing efficiency.
Referring to Figure 4, an embodiment of this application discloses a specific request processing solution, including:
Step S31: obtain the entry address of the first-in-first-out queue and the current available capacity of the first-in-first-out queue from the local base address register space.
In this embodiment, before the entry address and the current available capacity of the first-in-first-out queue are obtained from the local base address register space, the host performs system initialization and allocates a preset number of response queues to each accelerator card, so that the corresponding accelerator card can write a preset read response into its response queue and then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer, thereby avoiding data conflicts among multiple accelerator cards. The host requests, from the preset request buffer, request space for storing pending requests, and, from the preset data buffer, data space for storing processed target data.
Step S32: when the current available capacity is not less than a preset capacity threshold, write the address of the current pending request into the first-in-first-out queue, based on the entry address, through the host or the previous accelerator card.
In this embodiment, when the current available capacity is not less than the preset capacity threshold, after the address of the current pending request is written into the first-in-first-out queue based on the entry address through the host or the previous accelerator card, the method further includes: recalculating the current available capacity based on the size of the space occupied by the address of the current pending request, to obtain the updated available capacity, and determining whether the updated available capacity is not less than the preset capacity threshold; if the updated available capacity is less than the preset capacity threshold, writing the address of a new pending request into the first-in-first-out queue is prohibited. For example, if the current available capacity is 1 and the preset capacity threshold is 1, the current available capacity is not less than the preset capacity threshold, and the address of the current pending request occupies a space of 1; after the host writes the address of the current pending request into the first-in-first-out queue based on the entry address, the current available capacity is recalculated and the updated available capacity is 0, which is less than the preset capacity threshold, so writing the address of a new pending request into this first-in-first-out queue is prohibited. It can be understood that if the current available capacity is 1 and the addresses of two pending requests need to be written into the first-in-first-out queue, the arbitration mechanism of the PCIe bus (Peripheral Component Interconnect Express bus) is used: the first pending request is allocated a space of 1 by the first-in-first-out queue and the second pending request is allocated a space of 0, so the address of the first pending request is written into the first-in-first-out queue while the address of the second pending request cannot be.
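The capacity rule in this step can be sketched in C as follows; the register layout (a BAR-mapped entry address plus a depth register) and the threshold value are assumptions made for illustration, not the interface claimed by this application.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical BAR-mapped FIFO registers of an accelerator card. */
    struct fifo_regs {
        volatile uint64_t entry; /* FIFO Entry: write a request address here to enqueue it */
        volatile uint32_t depth; /* FIFO Depth: current available capacity */
    };

    #define CAPACITY_THRESHOLD 1U /* preset capacity threshold (assumed value) */

    /* Enqueue attempt by the host or the previous accelerator card; returns
     * false when writing a new pending request's address is prohibited. */
    bool try_enqueue(struct fifo_regs *fifo, uint64_t req_addr)
    {
        if (fifo->depth < CAPACITY_THRESHOLD)
            return false;       /* would overflow the FIFO queue */
        fifo->entry = req_addr; /* concurrent writers are serialized by PCIe bus arbitration */
        return true;            /* the card recalculates depth after consuming the entry */
    }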
Step S33: obtain the address of the current pending request from the local first-in-first-out queue, and obtain the current pending request from the preset request buffer based on the address of the current pending request.
Step S34: parse the current pending request, and obtain, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request.
Step S35: process the current source data to obtain the current processed target data, write the current processed target data into the preset data buffer, and determine whether the serial address item is empty.
Step S36: if the serial address item is empty, notify the host to read all processed target data from the preset data buffer; if the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
For the more specific working processes of steps S33, S34, S35, and S36, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can thus be seen that this application determines, based on the current available capacity of the local first-in-first-out queue, whether the host or the previous accelerator card can write the address of the current pending request into the first-in-first-out queue based on the entry address, which avoids first-in-first-out queue overflow errors that occur when the current available capacity of the queue cannot accommodate the space occupied by the address of the current pending request. Because the current accelerator card sends the address of the next pending request to the first-in-first-out queue of the next accelerator card, the burden on the host is reduced, which in turn reduces the workload of the CPU and improves the efficiency of request processing.
The technical solution of this application is described below using the specific request processing method shown in Figure 5 as an example. The host performs system initialization and allocates a preset number of response queues to each accelerator card, for example one ring response queue per accelerator card, to avoid data conflicts among multiple accelerator cards. Based on actual needs, the host prepares the data to be processed and creates one or more corresponding pending requests, each of which contains the request number of the request, the specific data of the current pending request, a serial address item used to record the address of the next pending request, a serial accelerator card address, a parallel accelerator card address, and a parallel address item used to record the current pending parallel request corresponding to the current pending request, where the specific data of the current pending request includes the address of the processed target data corresponding to the previous processed request and the address of the current processed target data. The host obtains the current available capacity of the initial accelerator card's first-in-first-out queue (FIFO); if the current available capacity is not less than the preset capacity threshold, the address of the initial pending request is written into that first-in-first-out queue based on the queue's entry address (FIFO Entry). The preset capacity threshold can be set to the space occupied by the address of the initial pending request and can be changed according to the actual situation. If the addresses of two pending requests need to be written into the first-in-first-out queue, the arbitration mechanism of the PCIe bus (PCIe Bus) is used to determine the first pending request and the second pending request.
If the current accelerator card detects that the address of a current pending request has been written into its local first-in-first-out queue, it obtains that address from the first-in-first-out queue and, based on the size of the space occupied by the address of the current pending request, recalculates the current available capacity (FIFO Depth) to obtain the updated available capacity. When the address of a new pending request needs to be written into this first-in-first-out queue, it must be determined whether the updated available capacity is less than the preset capacity threshold; if so, writing the address of the new pending request into the first-in-first-out queue is prohibited. It can be understood that if the current accelerator card is the initial accelerator card, the address of the initial pending request is obtained. Based on the address of the current pending request, the corresponding current pending request is obtained from the preset request buffer and parsed, yielding a parsing result that contains the request number of the request, the specific data of the current pending request, the serial address item used to record the address of the next pending request, the serial accelerator card address, the parallel accelerator card address, and the parallel address item used to record the current pending parallel request corresponding to the current pending request. It is then determined whether the address recorded in the parallel address item is empty; if not, the address recorded in the parallel address item is sent, for storage, to the first-in-first-out queue of the target parallel accelerator card corresponding to the parallel accelerator card address, so that the target parallel accelerator card can use that address to obtain the current pending parallel request from the preset request buffer. Using the address of the processed target data corresponding to the previous processed request, the corresponding data is obtained from the preset data buffer as the current source data to be processed; the current source data is processed to obtain the current processed target data, which is then written to the corresponding position in the preset data buffer based on the address of the current processed target data. When an error occurs in processing the current source data, a corresponding preset error response operation is written into the response queue allocated in advance by the host for the current accelerator card, and the host is notified to perform the preset error response operation.
It is determined whether the serial address item is empty. If the serial address item is not empty, the address recorded in the serial address item is sent, for storage, to the first-in-first-out queue of the next accelerator card corresponding to the serial accelerator card address. If the serial address item is empty, the current accelerator card writes a preset read response into its local response queue, causing the phase mark in the response queue to change from 0 to 1, and can then send an MSI interrupt to the host to notify it to read all processed target data from the preset data buffer.
When the host polls the response queue and finds a phase mark that is not 0, it considers that a new preset read response has been written into the response queue and reads all processed target data from the preset data buffer; the host then sets the phase mark back from 1 to 0, so that the next time a new preset read response is written into the response queue, the phase mark again changes from 0 to 1.
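The host-side polling just described can be sketched as a simple loop over one ring response queue; read_processed_data is an assumed helper that drains the preset data buffer for the finished request chain, and the ring layout is again an assumption for illustration.

    #include <stdint.h>

    struct resp_entry { uint32_t req_no; volatile uint32_t phase; };

    void read_processed_data(uint32_t req_no); /* assumed helper */

    /* Poll one ring response queue: a phase mark of 1 means a new preset read
     * response has been written; after draining the data, reset the mark to 0
     * so that the next response can flip it back to 1. */
    void poll_response_queue(struct resp_entry *ring, uint32_t len)
    {
        uint32_t head = 0;
        for (;;) {
            struct resp_entry *e = &ring[head];
            if (e->phase == 0)
                continue;                   /* nothing new yet */
            read_processed_data(e->req_no); /* read all processed target data */
            e->phase = 0;                   /* reset the phase mark from 1 to 0 */
            head = (head + 1) % len;        /* advance in the ring */
        }
    }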
Referring to Figure 6, an embodiment of this application discloses a request processing apparatus, applied to a current accelerator card, including:
a request acquisition module 11, configured to obtain the address of the current pending request from a local first-in-first-out queue, and obtain the current pending request from a preset request buffer based on the address of the current pending request;
a request parsing module 12, configured to parse the current pending request and obtain, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request;
a processing module 13, configured to process the current source data to obtain the current processed target data and write the current processed target data into a preset data buffer;
a judgment module 14, configured to determine whether the serial address item is empty; and
a judgment result processing module 15, configured to notify the host to read all processed target data from the preset data buffer if the serial address item is empty, and to send the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage if the serial address item is not empty.
It can be seen that this application obtains the address of the current pending request from the local first-in-first-out queue and obtains the current pending request from the preset request buffer based on that address; parses the current pending request and obtains, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request; processes the current source data to obtain the current processed target data, writes it into the preset data buffer, and determines whether the serial address item is empty; if the serial address item is empty, the host is notified to read all processed target data from the preset data buffer, and if it is not empty, the address recorded in the serial address item is sent to the first-in-first-out queue of the next accelerator card for storage. Because the host does not need to send the address recorded in the serial address item to the next accelerator card's first-in-first-out queue each time, the burden on the CPU is reduced and redundant steps are eliminated, thereby improving the efficiency of request processing.
Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of this application. Specifically, it may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used to store computer-readable instructions, which are loaded and executed by the processor 21 to implement the relevant steps of the request processing method executed by a computer device disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 provides the operating voltage for each hardware device on the computer device 20; the communication interface 24 can create a data transmission channel between the computer device 20 and external devices, following any communication protocol applicable to the technical solution of this application, which is not specifically limited here; the input/output interface 25 is used to obtain data input from the outside world or to output data to the outside world, and its specific interface type can be selected according to the specific application needs and is not specifically limited here.
The processor 21 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 21 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), is the processor that handles data in the awake state, while the coprocessor is a low-power processor that handles data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 21 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it include the operating system 221, computer-readable instructions 222, and data 223, and the storage may be transient or permanent.
The operating system 221 manages and controls each hardware device and the computer-readable instructions 222 on the computer device 20, enabling the processor 21 to operate on and process the massive data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. In addition to the computer-readable instructions that can be used to complete the request processing method executed by the computer device 20 disclosed in any of the foregoing embodiments, the computer-readable instructions 222 may further include computer-readable instructions that can be used to complete other specific tasks. The data 223 may include data received by the computer device from external devices as well as data collected through its own input/output interface 25.
Further, an embodiment of this application also discloses a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, can implement the method steps executed during request processing disclosed in any of the above embodiments.
Finally, it should also be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", or any other variant thereof, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
Those of ordinary skill in the art can understand that all or part of the flows in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware; the computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application; their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the scope of protection of this application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (20)

  1. A request processing method, applied to a current accelerator card, comprising:
    obtaining the address of a current pending request from a local first-in-first-out queue, and obtaining the current pending request from a preset request buffer based on the address of the current pending request;
    parsing the current pending request, and obtaining, based on a parsing result, corresponding current source data to be processed and a serial address item used to record the address of a next pending request;
    processing the current source data to obtain current processed target data, and writing the current processed target data into a preset data buffer; and
    when the serial address item is empty, notifying a host to read all of the processed target data from the preset data buffer.
  2. The request processing method according to claim 1, wherein the method further comprises:
    when the serial address item is not empty, sending the address recorded in the serial address item to the first-in-first-out queue of a next accelerator card for storage.
  3. The request processing method according to claim 1, wherein parsing the current pending request and obtaining, based on the parsing result, the corresponding current source data to be processed and the serial address item used to record the address of the next pending request comprises:
    parsing the current pending request to obtain a corresponding parsing result, the parsing result comprising the address of the processed target data corresponding to a previous processed request; and
    obtaining corresponding data from the preset data buffer based on the address of the processed target data corresponding to the previous processed request, to obtain the current source data to be processed.
  4. The request processing method according to claim 3, wherein the parsing result further comprises the address of the current processed target data, the serial address item used to record the address of the next pending request, a parallel accelerator card address, and a parallel address item used to record a current pending parallel request corresponding to the current pending request.
  5. The request processing method according to claim 3, wherein writing the current processed target data into the preset data buffer comprises:
    writing the current processed target data to a corresponding position in the preset data buffer based on the address of the current processed target data carried in the parsing result.
  6. The request processing method according to claim 3, wherein the process of parsing the current pending request further comprises:
    extracting a serial accelerator card address carried in the current pending request.
  7. The request processing method according to claim 6, wherein sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage comprises:
    determining the next accelerator card based on the serial accelerator card address carried in the current pending request, and sending the address recorded in the serial address item to the first-in-first-out queue of the next accelerator card for storage.
  8. The request processing method according to claim 6, wherein the process of parsing the current pending request further comprises:
    extracting a parallel accelerator card address and a parallel address item carried in the current pending request, the parallel address item being used to record the address of a current pending parallel request corresponding to the current pending request.
  9. The request processing method according to claim 8, wherein before processing the current source data to obtain the current processed target data, the method further comprises:
    when the parallel address item is not empty, determining a corresponding target parallel accelerator card based on the parallel accelerator card address carried in the current pending request, and sending the address recorded in the parallel address item to the target parallel accelerator card for storage, so that the target parallel accelerator card uses the address to obtain the current pending parallel request from the preset request buffer.
  10. The request processing method according to claim 1, wherein before obtaining the address of the current pending request from the local first-in-first-out queue, the method further comprises:
    obtaining an entry address of the first-in-first-out queue and a current available capacity of the first-in-first-out queue from a local base address register space; and
    when the current available capacity is not less than a preset capacity threshold, writing the address of the current pending request into the first-in-first-out queue, based on the entry address, through the host or a previous accelerator card.
  11. The request processing method according to claim 10, wherein, when the current available capacity is not less than the preset capacity threshold, after writing the address of the current pending request into the first-in-first-out queue based on the entry address through the host or the previous accelerator card, the method further comprises:
    recalculating the current available capacity based on the size of the space occupied by the address of the current pending request, to obtain an updated available capacity; and
    when the updated available capacity is less than the preset capacity threshold, prohibiting the address of a new pending request from being written into the first-in-first-out queue.
  12. The request processing method according to claim 1, wherein processing the current source data to obtain the current processed target data and writing the current processed target data into the preset data buffer further comprises:
    when an error occurs in processing the current source data, writing a corresponding preset error response operation into a response queue allocated in advance by the host for the current accelerator card, and notifying the host to perform the preset error response operation.
  13. The request processing method according to claim 1, wherein processing the current source data to obtain the current processed target data and writing the current processed target data into the preset data buffer further comprises:
    when an error occurs while the current accelerator card processes the current pending request, the current accelerator card writes a corresponding preset error response operation into a local response queue and notifies the host, through a Message Signaled Interrupts (MSI) interrupt, to perform the corresponding preset error response operation.
  14. The request processing method according to claim 1, wherein before obtaining the address of the current pending request from the local first-in-first-out queue, the method comprises:
    the host writing the address of the current pending request into the first-in-first-out queue of the current accelerator card in advance.
  15. The request processing method according to claim 1, wherein when the serial address item is empty, it is determined that there is no next pending request, and when the serial address item is not empty, it is determined that there is a next pending request.
  16. The request processing method according to any one of claims 1 to 15, wherein in the process of processing the current source data to obtain the current processed target data and writing the current processed target data into the preset data buffer, the method further comprises:
    when an error occurs in processing the current source data, writing a corresponding preset error response operation into a response queue allocated in advance by the host for the current accelerator card, and notifying the host to perform the preset error response operation.
  17. A request processing apparatus, applied to a current accelerator card, comprising:
    a request acquisition module, configured to obtain the address of a current pending request from a local first-in-first-out queue, and obtain the current pending request from a preset request buffer based on the address of the current pending request;
    a request parsing module, configured to parse the current pending request and obtain, based on a parsing result, corresponding current source data to be processed and a serial address item used to record the address of a next pending request;
    a processing module, configured to process the current source data to obtain current processed target data and write the current processed target data into a preset data buffer; and
    a judgment result processing module, configured to notify a host to read all of the processed target data from the preset data buffer when the serial address item is empty.
  18. The request processing apparatus according to claim 17, wherein the judgment result processing module is further configured to, when the serial address item is not empty, send the address recorded in the serial address item to the first-in-first-out queue of a next accelerator card for storage.
  19. An electronic device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the request processing method according to any one of claims 1 to 16.
  20. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the request processing method according to any one of claims 1 to 16.
PCT/CN2022/121127 2022-04-19 2022-09-23 Request processing method, apparatus, device and medium WO2023201987A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210407248.4 2022-04-19
CN202210407248.4A CN114513545B (zh) 2022-04-19 2022-04-19 Request processing method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2023201987A1 (zh)

Family

ID=81554829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121127 WO2023201987A1 (zh) 2022-04-19 2022-09-23 请求处理方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN114513545B (zh)
WO (1) WO2023201987A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513545B (zh) * 2022-04-19 2022-07-12 苏州浪潮智能科技有限公司 Request processing method, apparatus, device and medium
CN115037798B (zh) * 2022-08-11 2022-12-27 成都金诺信高科技有限公司 Time synchronization message data packet distribution method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558250A (zh) * 2018-11-02 2019-04-02 锐捷网络股份有限公司 FPGA-based communication method, device, host and heterogeneous acceleration system
CN113515320A (zh) * 2021-05-26 2021-10-19 新华三信息技术有限公司 Hardware acceleration processing method and apparatus, and server
CN114138481A (zh) * 2021-11-26 2022-03-04 浪潮电子信息产业股份有限公司 Data processing method, apparatus and medium
CN114244790A (zh) * 2022-02-24 2022-03-25 摩尔线程智能科技(北京)有限责任公司 Communication method, system and device between PCIe device and host device
CN114513545A (zh) * 2022-04-19 2022-05-17 苏州浪潮智能科技有限公司 Request processing method, apparatus, device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3195096B1 (en) * 2014-08-02 2020-08-12 Apple Inc. Context-specific user interfaces
US11194722B2 (en) * 2018-03-15 2021-12-07 Intel Corporation Apparatus and method for improved cache utilization and efficiency on a many core processor
CN109062826B (zh) * 2018-08-16 2022-04-15 北京算能科技有限公司 Data transmission method and system


Also Published As

Publication number Publication date
CN114513545B (zh) 2022-07-12
CN114513545A (zh) 2022-05-17

Similar Documents

Publication Publication Date Title
EP3754498B1 (en) Architecture for offload of linked work assignments
WO2023201987A1 (zh) Request processing method, apparatus, device and medium
US9448846B2 (en) Dynamically configurable hardware queues for dispatching jobs to a plurality of hardware acceleration engines
US10331595B2 (en) Collaborative hardware interaction by multiple entities using a shared queue
EP2849076B1 (en) Dma transmission method and system
US7200695B2 (en) Method, system, and program for processing packets utilizing descriptors
US8478926B1 (en) Co-processing acceleration method, apparatus, and system
CN107967225B Data transmission method and apparatus, computer-readable storage medium, and terminal device
US9086920B2 (en) Device for managing data buffers in a memory space divided into a plurality of memory elements
JP2000330806A Computer system
US20070162637A1 (en) Method, apparatus and program storage device for enabling multiple asynchronous direct memory access task executions
CN111309649B Data transmission and task processing method, apparatus and device
TW508522B (en) System input/output interface design for scaling
EP3770759A1 (en) Wake-up and scheduling of functions with context hints
US11995351B2 (en) DMA engines configured to perform first portion data transfer commands with a first DMA engine and second portion data transfer commands with second DMA engine
US7913059B2 (en) Information processing device, data transfer method, and information storage medium
US8972693B2 (en) Hardware managed allocation and deallocation evaluation circuit
US9405470B2 (en) Data processing system and data processing method
US20230393782A1 (en) Io request pipeline processing device, method and system, and storage medium
CN115525226A Hardware batch fingerprint computation method, apparatus and device
CN116107774A IO request processing method and apparatus, electronic device, and storage medium
CN117573041B Method for improving vhost-scsi to enhance virtualized storage performance
KR102536943B1 Data reduction apparatus, data reduction method, and system including data reduction apparatus
CN113094320B Parallel message arbitration apparatus and method
WO2023231937A1 Scheduling apparatus and method, and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22938210

Country of ref document: EP

Kind code of ref document: A1