CN113487033A - Inference method and device with graphics processor as execution core - Google Patents

Inference method and device with graphics processor as execution core

Info

Publication number
CN113487033A
Authority
CN
China
Prior art keywords
inference
network interface
interface controller
graphics processor
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110874284.7A
Other languages
Chinese (zh)
Other versions
CN113487033B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202110874284.7A priority Critical patent/CN113487033B/en
Publication of CN113487033A publication Critical patent/CN113487033A/en
Application granted granted Critical
Publication of CN113487033B publication Critical patent/CN113487033B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multi Processors (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The invention relates to an inference method and device using a graphics processor as an execution core, wherein the inference method comprises the following steps: the graphics processor calls and executes the code of a model from a non-volatile storage space according to an inference command provided by a network interface controller, so as to perform parallel computation on raw data provided by the network interface controller to generate computation results; integrates the computation results to generate an inference result; and replies the inference result to the network interface controller. Because the network interface controller is integrated with the graphics processor as the core to complete the inference operation, no central processor needs to be involved.

Description

Inference method and device with graphics processor as execution core
Technical Field
The present invention relates to graphics processors, and more particularly, to an inference method and apparatus using a graphics processor as an execution core.
Background
Traditional cloud computing needs to transmit data to a central data center for processing and then transmit the computation results back to the user equipment, so the data center requires ever greater computing capacity and ever higher network bandwidth. To address this problem, Edge Computing has been proposed to reduce the load on the central data center and achieve efficient inference. Edge computing is a network computing architecture in which computation takes place as close as possible to the user equipment that provides the raw data, in order to reduce latency and bandwidth usage. Edge computing can be applied to artificial intelligence inference. To enable such application scenarios, the invention provides an inference method and apparatus using a graphics processor as an execution core.
Disclosure of Invention
In view of this, how to implement artificial intelligence inference applications is an important issue in edge computing.
The invention provides an inference method using a graphics processor as an execution core, which comprises the following steps: the graphics processor calls and executes the code of a model from a non-volatile storage space according to an inference command provided by a network interface controller, so as to perform parallel computation on raw data provided by the network interface controller to generate computation results; integrates the computation results to generate an inference result; and replies the inference result to the network interface controller.
The invention further provides another inference method using a graphics processor as an execution core, which comprises the following steps: the network interface controller receives a data packet from user equipment via a network and parses an inference request and raw data from the data packet; sends an inference command to the graphics processor according to the inference request; provides the raw data to the graphics processor, so that the graphics processor calls and executes the code of a model according to the inference command, performing parallel computation on the raw data to generate computation results and integrating the computation results to generate an inference result; receives the inference result from the graphics processor; and transmits an inference reply containing the inference result to the user equipment via the network.
The invention also provides an inference apparatus, comprising a computation unit and a command processor. The command processor is coupled to the computation unit and is configured to call and execute the code of a model from a non-volatile storage space according to an inference command provided by a network interface controller, so as to perform parallel computation on raw data provided by the network interface controller through the computation unit to generate computation results; integrate the computation results to generate an inference result; and reply the inference result to the network interface controller.
One advantage of the above embodiments is that the network interface controller is integrated with the graphics processor as the core to complete the inference operation, so that no central processor needs to be involved.
Another advantage of the above embodiments is that the graphics processor makes full use of its built-in communication channels to access the non-volatile storage space.
Other advantages of the present invention will be explained in more detail in conjunction with the following description and the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.
FIG. 1 is a diagram of a distributed computing system according to an embodiment of the invention.
Fig. 2 is a block diagram of an edge node according to some embodiments.
FIG. 3 is a diagram of an edge computing system according to an embodiment of the invention.
Fig. 4 is a block diagram of an edge node in accordance with some embodiments of the present invention.
Fig. 5 is a block diagram of an edge node in accordance with further embodiments of the present invention.
FIG. 6 is a flowchart of an inference method using a graphics processor as a core according to an embodiment of the present invention.
Fig. 7 is a block diagram of an edge node in accordance with further embodiments of the present invention.
Wherein the symbols in the drawings are briefly described as follows:
10: distributed computing system; 110: cloud data center; 131, 133: edge nodes; 151 to 156: user equipment; 210: network interface controller; 220: central processing unit; 225, 235: peripheral component interconnect express interfaces; 230: graphics processor; 250: memory; 310: network interface controller; 315: microcontroller unit; 330, 410, 510: graphics processors; 335: built-in interface; 340: non-volatile storage space; 350: processing unit; 360: memory; 370: network interface controller; 380: network; 411, 412: PCIe root ports; 413, 513: command processors; 415: memory; 416, 516: compute units; 430: NVMe/NVRAM; 511: shared memory; 512: ONFI; 530: NAND flash memory device; S610 to S670: method steps; 730: non-volatile memory.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, the same reference numerals indicate the same or similar components or process flows.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of further features, integers, steps, operations, elements, components, and/or groups thereof.
The use of words such as "first," "second," "third," etc. in this disclosure is intended to modify a component in a claim and is not intended to imply a priority order, precedence relationship, or order between components or steps in a method.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is described as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between components may also be interpreted in a similar manner, e.g., "between" versus "directly between," or "adjacent" versus "directly adjacent," etc.
Refer to FIG. 1. The distributed computing system 10 may include a cloud data center 110, edge nodes 131 and 133, and user devices 151 to 156. The cloud data center 110 has strong computing power and may include a server cluster for training various models on large amounts of data and allowing the edge nodes 131 and 133 to download the trained models. These models can be used in application fields such as image classification, object detection, and speech synthesis. For example, the image classification model may be ResNet-50, MobileNet, etc., the object detection model may be a Single Shot MultiBox Detector (SSD), YOLOv3/v5, etc., and the speech synthesis model may be Tacotron 2, etc. Although the invention is described with reference to the above models, it is not limited to them, and those skilled in the art can have the edge nodes 131 and 133 obtain other trained models from the cloud data center 110. Any of the edge nodes 131 and 133 may be implemented using a mainframe, a workstation, an industrial computer, a personal computer, or the like. Each edge node is located as close as possible to the user devices that provide the raw data and can serve multiple user devices; for example, edge node 131 may provide inference services to user devices 151 to 153 according to the trained models, while edge node 133 may provide inference services to user devices 154 to 156. Any of the user devices 151 to 156 may issue an inference request and transmit raw data (e.g., an image, a piece of text, etc.) to the corresponding edge node. The inference request may contain the specified model, how to operate the model, and other necessary information, requesting the edge node to perform a specific operation on the raw data with the specified model to generate an inference result. The edge node then replies the inference result (e.g., a classification result, voice data, etc.) to the user device that issued the inference request.
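For illustration only, the following Python sketch models the kind of information an inference request and an inference reply might carry; the field names (model_name, operation, raw_data, result) are hypothetical and are not fixed by this description.
    from dataclasses import dataclass

    @dataclass
    class InferenceRequest:
        model_name: str      # model specified by the user device, e.g. "ResNet-50"
        operation: str       # how the model is to be operated, e.g. "classify"
        raw_data: bytes      # image, text, or other raw data to be inferred

    @dataclass
    class InferenceReply:
        model_name: str
        result: dict         # e.g. classification labels and scores

    # Example: a user device packages an image for classification.
    request = InferenceRequest("ResNet-50", "classify", b"\x89PNG...")
    print(request.model_name, request.operation)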
In some embodiments of an edge node, referring to FIG. 2, a central processing unit (CPU) 220 serves as the host processor that controls execution of the whole operation, and a graphics processing unit (GPU) 230 serves as an accelerator that performs calculations using the model specified in the inference request to generate computation results. In detail, a non-volatile random access memory (NVRAM) 240 stores a plurality of models acquired from the cloud data center 110. The central processor 220 receives the inference command through a network interface controller (NIC) 210, reads the specified model from the non-volatile random access memory 240 through a peripheral component interconnect express (PCIe) interface 225 according to information carried in the inference command, and offloads a series of instructions to the graphics processor 230. The graphics processor 230 performs calculations according to the offloaded instructions and stores the computation results in the memory 250. The central processor 220 reads the computation results from the memory 250 to generate the inference result, and replies the inference result through the network interface controller 210 to the user device that issued the inference request. However, although the graphics processor 230 is responsible for the bulk of the model computation, the central processor 220 still needs to spend time and resources tracking the progress of the graphics processor 230 and waiting for its computation results, which reduces the utilization of the central processor 220. In addition, the graphics processor 230 is also equipped with a PCIe interface 235 that goes unused, leaving hardware idle.
In view of the above, the present invention provides an embodiment of an edge computing system, referring to FIG. 3, including any one of the user devices 151 to 156 and the corresponding edge node 131 or 133. The user device may be an electronic product such as a personal computer, a laptop PC, a tablet computer, a mobile phone, a digital camera, or a digital video camera, and comprises at least a processing unit 350, a memory 360, and a network interface controller 370 (also referred to as a network card). The processing unit 350, when loading and executing appropriate program code, performs the operations of an artificial intelligence application, and the memory 360 stores the variables, data tables, and management information required by the artificial intelligence application. During operation of the artificial intelligence application, the processing unit 350 may drive the network interface controller 370 to issue a management request to the edge node via the network 380, for example to request the edge node to update a model (such as switching the image classification model from ResNet-50 to MobileNet) or to report the model information currently in use, and may then obtain the requested model information, the execution result of the request, etc., from the edge node. The processing unit 350 may also drive the network interface controller 370 to issue an inference request to the edge node via the network 380, requesting the edge node to operate on the uploaded raw data using the specified model and generate an inference result, which may then be obtained from the edge node. The user device may communicate with the edge node using a specific communication protocol, such as the HyperText Transfer Protocol (HTTP), HyperText Transfer Protocol Secure (HTTPS), Wireless Application Protocol (WAP), and the like.
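For illustration, a minimal client-side sketch in Python is given below, assuming a hypothetical HTTP endpoint and JSON payload; the description does not prescribe the endpoint, the message encoding, or the field names.
    import json
    import urllib.request

    EDGE_NODE_URL = "http://edge-node.example/infer"      # hypothetical endpoint

    def send_inference_request(model_name, raw_data):
        """Package the specified model and raw data into an HTTP POST, one
        possible realization of the inference request of FIG. 3."""
        body = json.dumps({
            "model": model_name,
            "data": raw_data.hex(),                        # raw bytes carried as hex text
        }).encode("utf-8")
        req = urllib.request.Request(EDGE_NODE_URL, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:          # needs a reachable edge node
            return json.loads(resp.read())

    # Example (requires a running edge node):
    # reply = send_inference_request("MobileNet", b"\x89PNG...")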
The edge node includes at least a network interface controller 310, a graphics processor 330, and a non-volatile storage space 340. Because the graphics processor 330 is well suited to the single instruction multiple data (SIMD) style of parallel operation used in artificial intelligence applications (for example, SIMD instructions, single instruction multiple thread (SIMT) techniques, and the like), the edge node treats the graphics processor 330 directly as the main processor rather than as an accelerator. The graphics processor 330 not only computes on the raw data using the model specified in the inference request to generate computation results, but also controls the execution flow of the entire operation. In some embodiments, no central processor of the edge node is involved in responding to inference requests. In detail, the network interface controller 310 includes a microcontroller unit (MCU) 315 that loads and executes appropriate program code to realize an offload engine. During operation of the offload engine, the microcontroller unit 315 may retrieve data packets received via the network 380 in accordance with a particular communication protocol, parse the information in the data packets to obtain the inference request and the raw data transmitted by the user device, and send an inference command and the raw data to the graphics processor 330, instructing the graphics processor 330 to compute on the raw data using a particular model and parameters to complete the specified inference operation. After the microcontroller unit 315 obtains the inference result from the graphics processor 330, the inference result is carried in an inference reply and the inference reply is transmitted to the user device that issued the inference request.
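As a non-authoritative illustration of the offload engine's control loop, the following Python simulation uses simple in-process stand-ins (FakeNetwork, FakeGPU) for the network and the graphics processor; the class names and message fields are assumptions made only for this sketch.
    class FakeNetwork:
        """Stand-in for network 380: hands out received packets, collects replies."""
        def __init__(self, packets):
            self.packets, self.replies = list(packets), []
        def receive(self):
            return self.packets.pop(0) if self.packets else None
        def send(self, reply):
            self.replies.append(reply)

    class FakeGPU:
        """Stand-in for graphics processor 330: pretends to run the whole inference."""
        def execute(self, command, raw_data):
            return {"model": command["model"], "result": f"processed {len(raw_data)} bytes"}

    def offload_engine(network, gpu):
        """Simplified control loop of microcontroller unit 315: parse each packet,
        hand the inference command and raw data to the GPU, and send the reply."""
        while True:
            packet = network.receive()
            if packet is None:
                break
            request, raw_data = packet["request"], packet["raw_data"]
            command = {"model": request["model"], "op": request.get("op", "infer")}
            result = gpu.execute(command, raw_data)   # the GPU performs the inference
            network.send({"request": request, "inference_result": result})

    net = FakeNetwork([{"request": {"model": "ResNet-50"}, "raw_data": b"\x00" * 1024}])
    offload_engine(net, FakeGPU())
    print(net.replies)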
The non-volatile storage space 340 may be implemented using a non-volatile flash memory device (NVMe), a non-volatile random access memory (NVRAM), NAND flash memory, or the like, and is used to store the code of the firmware or kernel binary that operates a model to complete inference, as well as the plurality of models acquired from the cloud data center 110. The edge node may be provided with a specially designed file system so that the graphics processor 330 can access data in the non-volatile storage space 340 directly through a built-in interface 335, where the built-in interface 335 may be a PCIe interface, an Open NAND Flash Interface (ONFI), or the like. The graphics processor 330 may load and execute the firmware or kernel binary code from the non-volatile storage space 340, which includes the control flow for performing inference operations. Under this control flow, the code of the specified model is loaded and executed from the non-volatile storage space 340 in response to an inference command received from the network interface controller 310, performing various parallel computations to generate the inference result. The graphics processor 330 then replies the inference result to the network interface controller 310. In addition to performing inference operations, the graphics processor 330 is also capable of performing other application tasks, including but not limited to linear and non-linear data transformations, database operations, big data operations, encoding and decoding of audio and video data, modeling operations, image rendering operations, and so on.
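The firmware control flow just described can be pictured with the minimal Python sketch below, in which MODEL_STORE stands in for the non-volatile storage space 340 and the per-chunk lambda stands in for real model code; both are placeholders chosen purely for illustration.
    MODEL_STORE = {
        # Placeholder "model code"; a real entry would be a trained network kernel.
        "ResNet-50": lambda chunk: sum(chunk) % 251,
    }

    def gpu_control_flow(inference_command, raw_data, reply):
        """Simplified firmware control flow of graphics processor 330: fetch the code
        of the specified model, run the (parallelizable) computation over the raw
        data, integrate the partial results, and reply to the network interface
        controller."""
        model = MODEL_STORE[inference_command["model"]]           # load model code
        chunks = [raw_data[i:i + 256] for i in range(0, len(raw_data), 256)]
        partial_results = [model(chunk) for chunk in chunks]      # per-chunk computation
        inference_result = {"score": sum(partial_results)}        # integrate results
        reply(inference_result)                                   # back to the NIC

    gpu_control_flow({"model": "ResNet-50"}, bytes(range(256)) * 4, print)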
FIG. 4 shows a block diagram of an edge node in accordance with some embodiments of the invention. The edge node contains the network interface controller 310 described above, a graphics processor 410, and an NVMe or NVRAM device 430. The NVMe/NVRAM 430 provides the non-volatile storage space and stores the code of the firmware, which includes the control flow for completing inference operations, and the code of the plurality of models obtained from the cloud data center 110.
The graphics processor 410 includes PCIe root ports (RP) 411 and 412, a command processor 413, a memory 415, and multiple compute units (CUs) 416. The command processor 413 contains a root complex compliant with the PCIe specification for connecting the network interface controller 310 and the NVMe/NVRAM 430 (which may be referred to as PCIe devices) through the PCIe root ports 411 and 412, respectively. The command processor 413 loads and executes the firmware code, including the control flow for completing inference operations, from the NVMe/NVRAM 430 through the PCIe root port 412 (which may be referred to as the second PCIe root port), and receives the inference command and the raw data from the network interface controller 310 through the PCIe root port 411 (which may be referred to as the first PCIe root port) during execution of the firmware code. The command processor 413 stores the raw data in the memory 415 and calls and executes the code of the specified model from the NVMe/NVRAM 430 through the PCIe root port 412 according to the inference command. During execution of the model code, the command processor 413 issues a plurality of computation codes and arguments to the CUs 416 to instruct the CUs 416 to complete specific parallel computations, the arguments including the source addresses of the raw data stored in the memory 415 and the destination addresses of the computation results to be stored in the memory 415. Each CU 416 reads raw data from the memory 415 according to the source address, performs the specified computation on the raw data according to the computation code, writes the computation result to the memory 415 according to the destination address, and informs the command processor 413 that the specific computation has been completed. The computations each CU 416 may perform include integer and floating-point addition and multiplication, compare operations, Boolean operations, bit shifts, algebraic functions (e.g., planar interpolation, trigonometric functions, exponential functions, logarithmic functions), and so forth. Under the management of the control flow for the inference operation, the command processor 413 integrates the computation results in the memory 415 to generate the inference result, and replies the inference result to the network interface controller 310 through the PCIe root port 411.
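A minimal sketch of this dispatch pattern is given below, using a Python thread pool to emulate the compute units and a bytearray to emulate memory 415; the buffer layout, chunk size, and placeholder kernel are assumptions made solely for the example.
    from concurrent.futures import ThreadPoolExecutor

    MEMORY = bytearray(4096)             # stands in for on-chip memory 415

    def compute_unit(kernel, src, length, dst):
        """One compute unit 416: read raw data at the source address, run the
        computation code, write the result at the destination address."""
        data = MEMORY[src:src + length]
        MEMORY[dst] = kernel(data) & 0xFF
        return dst                        # "computation completed" notification

    def command_processor(kernel, src_base, chunk, n_units, dst_base):
        """Command processor 413: issue computation code and arguments (source and
        destination addresses) to several compute units in parallel, then
        integrate their results into one inference result."""
        with ThreadPoolExecutor(max_workers=n_units) as units:
            jobs = [units.submit(compute_unit, kernel,
                                 src_base + i * chunk, chunk, dst_base + i)
                    for i in range(n_units)]
            done = [job.result() for job in jobs]     # wait for completion notices
        return sum(MEMORY[d] for d in done)           # integrated inference result

    MEMORY[0:1024] = bytes(range(256)) * 4            # raw data written beforehand
    print(command_processor(lambda d: sum(d) % 251, 0, 256, 4, 2048))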
In other embodiments of the edge node, those skilled in the art may modify the architecture of FIG. 4 to place a non-volatile memory inside the graphics processor 410, allowing the command processor 413 to access data in the non-volatile memory directly without going through the PCIe root port 412. Referring to FIG. 7, the graphics processor 410 includes a non-volatile memory 730 that stores the code of the firmware, including the control flow for completing inference operations, and the code of the plurality of models obtained from the cloud data center 110. The non-volatile memory 730 includes, but is not limited to, NVRAM. The command processor 413 loads and executes the firmware code, including the control flow for completing inference operations, directly from the non-volatile memory 730. For other technical details, refer to the corresponding descriptions of FIG. 4; they are not repeated here for brevity.
FIG. 5 shows a block diagram of an edge node according to further embodiments of the present invention. The edge node includes the network interface controller 310 described above, a graphics processor 510, and a NAND flash device 530. The NAND flash device 530 provides the non-volatile storage space and stores the code of the firmware and the code of the plurality of models obtained from the cloud data center 110. The network interface controller 310 may comprise a direct memory access (DMA) controller for storing inference commands, parameters, and raw data at specified addresses in the shared memory 511, and for reading inference results from a specified address in the shared memory 511.
The graphics processor 510 includes a shared memory 511, an ONFI 512, a command processor 513, and a plurality of compute units 516. The command processor 513 loads and executes the firmware code, including the control flow for completing inference operations, from the NAND flash device 530 through the ONFI 512, and receives the inference command and the raw data from specified addresses in the shared memory 511 during execution of the firmware code. The command processor 513 calls and executes the code of the specified model from the NAND flash device 530 through the ONFI 512 according to the inference command. During execution of the model code, the command processor 513 issues a plurality of computation codes and arguments to the CUs 516 to instruct the CUs 516 to complete specific parallel computations, the arguments including the source addresses of the raw data stored in the shared memory 511 and the destination addresses of the computation results to be stored in the shared memory 511. Each CU 516 reads raw data from the shared memory 511 according to the source address, performs the specified computation on the raw data according to the computation code, writes the computation result to the shared memory 511 according to the destination address, and notifies the command processor 513 that the specific computation has been completed. Each CU 516 may perform the computations described for the CUs 416. Under the management of the control flow for the inference operation, the command processor 513 integrates the computation results in the shared memory 511 to generate the inference result. The network interface controller 310 reads the inference result from the specified address in the shared memory 511.
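The shared-memory exchange between the network interface controller and the graphics processor can be sketched as a simple mailbox, as in the Python example below; the fixed offsets CMD_ADDR, DATA_ADDR, and RESULT_ADDR and the placeholder computation are assumed solely for illustration.
    # Fixed layout of shared memory 511 assumed for illustration only.
    CMD_ADDR, DATA_ADDR, RESULT_ADDR = 0, 64, 1024
    shared_memory = bytearray(2048)

    def nic_write(command: bytes, raw_data: bytes):
        """NIC-side DMA: place the inference command and the raw data at the
        addresses agreed with the graphics processor."""
        shared_memory[CMD_ADDR:CMD_ADDR + len(command)] = command
        shared_memory[DATA_ADDR:DATA_ADDR + len(raw_data)] = raw_data

    def gpu_serve():
        """GPU side: read the command and raw data from shared memory, compute,
        and write the inference result back at RESULT_ADDR."""
        command = bytes(shared_memory[CMD_ADDR:CMD_ADDR + 64]).rstrip(b"\x00")
        raw_data = shared_memory[DATA_ADDR:DATA_ADDR + 256]
        result = bytes([sum(raw_data) % 251])         # placeholder for the model named in command
        shared_memory[RESULT_ADDR:RESULT_ADDR + len(result)] = result

    def nic_read_result() -> bytes:
        """NIC side: fetch the inference result from the agreed address."""
        return bytes(shared_memory[RESULT_ADDR:RESULT_ADDR + 1])

    nic_write(b"infer:ResNet-50", bytes(range(256)))
    gpu_serve()
    print(nic_read_result()[0])                       # integrated result read by the NIC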
FIG. 6 is a flowchart of an inference method using a graphics processor as the execution core, implemented by the graphics processor 330, 410, or 510 in cooperation with the network interface controller 310, according to an embodiment of the invention. The detailed steps are as follows:
step S610: a command handler in the graphics processor 330 initializes a communication channel of the network interface controller 310 and a communication channel of the nonvolatile memory space 340. Referring also to the embodiment shown in FIG. 4, command processor 413 initializes PCIe root ports 411 and 412. Referring also to the embodiment shown in fig. 5, the command handler 513 initializes the ONFI 512 and configures space in the shared memory 511 for use by the network interface controller 310.
Step S620: the command processor in the graphics processor 330 loads and executes the code of the firmware from the non-volatile storage space 340. Referring also to the embodiment shown in FIG. 4, the command processor 413 loads and executes the code of the firmware from the NVMe/NVRAM 430 through the PCIe root port 412. Referring also to the embodiment shown in FIG. 5, the command processor 513 loads and executes the code of the firmware from the NAND flash device 530 through the ONFI 512.
Step S630: the microcontroller unit 315 in the network interface controller 310 loads and executes the code of the offload engine.
Step S640: the microcontroller unit 315, when executing the code of the offload engine, receives a data packet from the network 380, parses the inference request and the raw data from the data packet, and performs the corresponding offload processing. In the offload processing, and referring also to the embodiment shown in FIG. 4, the microcontroller unit 315 sends an inference command to the graphics processor 410 through the PCIe root port 411 upon receiving the inference request. In the offload processing, and referring also to the embodiment shown in FIG. 5, the microcontroller unit 315 stores the inference command at a specified address in the shared memory 511 upon receiving the inference request.
Step S650: the microcontroller unit 315, when executing the code of the offload engine, provides the raw data to the graphics processor 330. Referring also to the embodiment shown in FIG. 4, the microcontroller unit 315 provides the raw data to the graphics processor 410 through the PCIe root port 411. Referring also to the embodiment shown in FIG. 5, the microcontroller unit 315 stores the raw data at a specified address in the shared memory 511.
Step S660: the command processor in the graphics processor 330, when executing the code of the firmware, calls and executes the code of the specified model from the non-volatile storage space 340 according to the inference command provided by the network interface controller 310, performing parallel computation on the raw data provided by the network interface controller 310 to generate computation results, integrating the computation results to generate the inference result, and replying the inference result to the network interface controller 310. Referring also to the embodiment shown in FIG. 4, the command processor 413 calls and executes the code of the specified model from the NVMe/NVRAM 430 through the PCIe root port 412, performs parallel computation on the raw data through the CUs 416, and replies the inference result to the network interface controller 310 through the PCIe root port 411. Referring also to the embodiment shown in FIG. 5, the command processor 513 calls and executes the code of the specified model from the NAND flash device 530 through the ONFI 512, performs parallel computation on the raw data through the CUs 516, and the network interface controller 310 reads the inference result from the specified address in the shared memory 511.
Step S670: the microcontroller unit 315 transmits an inference reply to the user device that issued the inference request.
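Purely to make the ordering of steps S610 to S670 concrete, the following Python sketch runs stub functions in that order and records a trace; the step names and bodies are placeholders and do not reflect the actual hardware operations.
    # Hypothetical stubs mirroring steps S610 to S670; each body only records a
    # trace entry so that the ordering itself can be executed and inspected.
    trace = []

    def s610_init_channels():        trace.append("S610 init NIC and storage channels")
    def s620_load_gpu_firmware():    trace.append("S620 GPU loads firmware")
    def s630_load_offload_engine():  trace.append("S630 MCU loads offload engine")
    def s640_parse_and_command():    trace.append("S640 parse request, send inference command")
    def s650_provide_raw_data():     trace.append("S650 provide raw data to GPU")
    def s660_run_model_and_reply():  trace.append("S660 run model, integrate, reply result")
    def s670_send_inference_reply(): trace.append("S670 send inference reply to user device")

    for step in (s610_init_channels, s620_load_gpu_firmware, s630_load_offload_engine,
                 s640_parse_and_command, s650_provide_raw_data,
                 s660_run_model_and_reply, s670_send_inference_reply):
        step()
    print("\n".join(trace))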
One advantage of the above embodiments is that the network interface controller is integrated with the graphics processor as the core to complete the inference operation, so that no central processor needs to be involved.
Another advantage of the above embodiments is that the graphics processor makes full use of its built-in communication channels to access the non-volatile storage space.
All or part of the steps of the method of the present invention may be implemented by a computer program, such as a program kernel or a driver, and the other types of programs described above may be implemented in the same way. Those skilled in the art can write the methods of the embodiments of the present invention as program code, which is not described further for brevity. A computer program implemented according to the embodiments of the present invention can be stored in a suitable computer-readable storage medium, such as a DVD, a CD-ROM, a USB flash drive, or a hard disk, or can be deployed on a network server accessible via a network (e.g., the Internet or another suitable carrier).
Although the elements described above are included in FIGS. 3 to 5, it is not excluded that further elements may be used to achieve better technical results without departing from the spirit of the present invention. Further, although the flowchart of FIG. 6 is executed in the order specified, a person skilled in the art may modify the order of the steps to achieve the same effect without departing from the spirit of the invention, and therefore the invention is not limited to that order. In addition, a person skilled in the art may integrate several steps into one step, or perform additional steps before, after, or in parallel with these steps, and the present invention should not be limited thereby.
The above description presents only preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any person skilled in the art can make further modifications and variations without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention should be determined by the claims of the present application.

Claims (22)

1. An inference method using a graphics processor as an execution core, comprising:
the graphics processor calls and executes code of a model from a non-volatile storage space according to an inference command provided by a network interface controller, so as to perform parallel computation on raw data provided by the network interface controller to generate computation results;
the graphics processor integrates the computation results to generate an inference result; and
the graphics processor replies the inference result to the network interface controller.
2. The inference method of claim 1, wherein the non-volatile storage space comprises at least one of a non-volatile flash memory and a non-volatile random access memory, and the graphics processor calls and executes the code of the model from the non-volatile flash memory or the non-volatile random access memory through a built-in peripheral component interconnect express root port.
3. The inference method of claim 1, wherein the non-volatile storage space comprises a NAND flash device from which the graphics processor invokes and executes code of the model through a built-in open NAND flash interface.
4. The inference method of claim 1, wherein the non-volatile memory space comprises a non-volatile memory disposed inside the graphics processor.
5. The inference method of claim 1, comprising:
the graphics processor initializes a first communication channel corresponding to the network interface controller and a second communication channel corresponding to the non-volatile storage space before invoking and executing code of the model.
6. The inference method of claim 5, wherein the first communication channel is a first peripheral component interconnect express root port, and the second communication channel is a second peripheral component interconnect express root port different from the first peripheral component interconnect express root port.
7. The inference method of claim 5, wherein the first communication channel is a shared memory of the graphics processor, and the second communication channel is an open NAND flash interface.
8. The inference method of claim 1, comprising:
the network interface controller receiving a data packet from a user device via a network and parsing an inference request and the raw data from the data packet;
the network interface controller sends the inference command to the graphics processor according to the inference request;
the network interface controller provides the raw data to the graphics processor;
the network interface controller receiving the inference result from the graphics processor; and
the network interface controller transmits an inference reply containing the inference result to the user device via the network.
9. The inference method of claim 8, wherein no central processor is involved in the execution of the inference method by the graphics processor and the network interface controller.
10. An inference method using a graphics processor as an execution core, comprising:
the network interface controller receives a data packet from a user device via a network and parses an inference request and raw data from the data packet;
the network interface controller sends an inference command to the graphics processor according to the inference request;
the network interface controller provides the raw data to the graphics processor, so that the graphics processor calls and executes code of a model according to the inference command, performing parallel computation on the raw data to generate computation results and integrating the computation results to generate an inference result;
the network interface controller receiving the inference result from the graphics processor; and
the network interface controller transmits an inference reply containing the inference result to the user device via the network.
11. The inference method of claim 10, wherein the network interface controller transmits the inference command and the raw data to the graphics processor, and/or receives the inference result from the graphics processor, through a peripheral component interconnect express root port of the graphics processor.
12. The inference method of claim 10, wherein the network interface controller stores the inference command and the raw data at a specified address in a shared memory of the graphics processor, and/or reads the inference result from a specified address in the shared memory.
13. An inference apparatus, comprising:
a computation unit; and
a command processor, coupled to the computation unit, configured to call and execute code of a model from a non-volatile storage space according to an inference command provided by a network interface controller, so as to perform parallel computation on raw data provided by the network interface controller through the computation unit to generate computation results; integrate the computation results to generate an inference result; and reply the inference result to the network interface controller.
14. The inference apparatus of claim 13, comprising:
a second peripheral component interconnect express root port coupled to the command processor and the non-volatile storage space;
wherein the non-volatile storage space comprises at least one of a non-volatile flash memory and a non-volatile random access memory, and the command processor is configured to call the code of the model from the non-volatile flash memory or the non-volatile random access memory through the second peripheral component interconnect express root port.
15. The inference apparatus of claim 14, comprising:
a first peripheral component interconnect express root port coupled to the command processor and the network interface controller, the command processor being configured to receive the inference command and the raw data from the network interface controller through the first peripheral component interconnect express root port;
wherein the command processor is configured to initialize the first peripheral component interconnect express root port and the second peripheral component interconnect express root port before calling and executing the code of the model.
16. The inference apparatus of claim 15, wherein the computation unit, the command processor, the first peripheral component interconnect express root port, and the second peripheral component interconnect express root port constitute a graphics processor.
17. The inference apparatus of claim 13, comprising:
an open NAND flash interface coupled to the command processor and the non-volatile storage space;
wherein the non-volatile storage space comprises a NAND flash device, and the command processor is configured to call the code of the model from the NAND flash device through the open NAND flash interface.
18. The inference apparatus of claim 17, comprising:
a shared memory coupled to the command processor and the network interface controller;
wherein the command processor is configured to initialize the open NAND flash interface and to configure space in the shared memory for use by the network interface controller before calling and executing the code of the model.
19. The inference apparatus of claim 18, wherein the computation unit, the command processor, the open NAND flash interface, and the shared memory constitute a graphics processor.
20. The inference apparatus of claim 13, comprising:
the network interface controller, configured to receive a data packet from a user device via a network and parse an inference request and the raw data from the data packet; send the inference command to the command processor according to the inference request; provide the raw data to the command processor; receive the inference result from the command processor; and transmit an inference reply containing the inference result to the user device via the network.
21. The inference apparatus of claim 20, wherein the computation unit and the command processor constitute a graphics processor, and no central processor is involved in the inference apparatus.
22. The inference apparatus of claim 13, wherein the non-volatile storage space comprises a non-volatile memory disposed inside the inference apparatus.
CN202110874284.7A 2021-07-30 2021-07-30 Reasoning method and device using graphic processor as execution core Active CN113487033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110874284.7A CN113487033B (en) 2021-07-30 2021-07-30 Reasoning method and device using graphic processor as execution core

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110874284.7A CN113487033B (en) 2021-07-30 2021-07-30 Reasoning method and device using graphic processor as execution core

Publications (2)

Publication Number Publication Date
CN113487033A true CN113487033A (en) 2021-10-08
CN113487033B CN113487033B (en) 2023-05-23

Family

ID=77944870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110874284.7A Active CN113487033B (en) 2021-07-30 2021-07-30 Reasoning method and device using graphic processor as execution core

Country Status (1)

Country Link
CN (1) CN113487033B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110392092A (en) * 2018-04-17 2019-10-29 三星电子株式会社 The network storage equipment being connect with network structure
CN111800443A (en) * 2019-04-08 2020-10-20 阿里巴巴集团控股有限公司 Data processing system and method, device and electronic equipment
CN111800281A (en) * 2019-04-08 2020-10-20 阿里巴巴集团控股有限公司 Network system, management and control method, device and storage medium
CN110532098A (en) * 2019-08-30 2019-12-03 广东星舆科技有限公司 The GPU method and system of service are provided
CN111147603A (en) * 2019-09-30 2020-05-12 华为技术有限公司 Method and device for networking reasoning service
CN112650981A (en) * 2019-10-10 2021-04-13 百度(美国)有限责任公司 Data processing accelerator and computer-implemented method executed by the same
CN111404770A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Network device, data processing method, device, system and readable storage medium
CN111598137A (en) * 2020-04-24 2020-08-28 北京金山云网络技术有限公司 Method and device for providing reasoning service and electronic equipment
CN112671830A (en) * 2020-12-02 2021-04-16 武汉联影医疗科技有限公司 Resource scheduling method, system, device, computer equipment and storage medium
CN113112040A (en) * 2021-04-16 2021-07-13 广东美的厨房电器制造有限公司 Detection method, terminal, server, detection system and readable storage medium

Also Published As

Publication number Publication date
CN113487033B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
US8473715B2 (en) Dynamic accelerator reconfiguration via compiler-inserted initialization message and configuration address and size information
US20210096823A1 (en) Transpose operations using processing element array
US11003429B1 (en) Compile-time scheduling
US11809953B1 (en) Dynamic code loading for multiple executions on a sequential processor
JP6998991B2 (en) Information processing methods and equipment
US20210158131A1 (en) Hierarchical partitioning of operators
CN111194437A (en) Data processing offload using in-memory code execution
US11175919B1 (en) Synchronization of concurrent computation engines
CN110825435B (en) Method and apparatus for processing data
WO2021046102A1 (en) Flexible datapath offload chaining
US11562554B1 (en) Workload reduction for non-maximum suppression operation
US10922146B1 (en) Synchronization of concurrent computation engines
US20230039000A1 (en) Graph-based data multi-operation system
CN113487033B (en) Reasoning method and device using graphic processor as execution core
US11494326B1 (en) Programmable computations in direct memory access engine
WO2023056370A1 (en) Mixing sparsity compression
US11500802B1 (en) Data replication for accelerator
US20220318604A1 (en) Sparse machine learning acceleration
US11983128B1 (en) Multidimensional and multiblock tensorized direct memory access descriptors
US11748253B1 (en) Address generation for page collision prevention in memory regions
US11620120B1 (en) Configuration of secondary processors
US20240103813A1 (en) Compute engine with transpose circuitry
US11789859B1 (en) Address generation for page collision prevention
US11875247B1 (en) Input batching with serial dynamic memory access
US20240111528A1 (en) Programmable compute engine having transpose operations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Patentee after: Shanghai Bi Ren Technology Co.,Ltd.

Country or region after: China

Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Patentee before: Shanghai Bilin Intelligent Technology Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address