CN117251406A - Display card communication method and device, display card equipment, host equipment, system and medium - Google Patents

Info

Publication number
CN117251406A
CN117251406A CN202311279144.0A CN202311279144A CN117251406A CN 117251406 A CN117251406 A CN 117251406A CN 202311279144 A CN202311279144 A CN 202311279144A CN 117251406 A CN117251406 A CN 117251406A
Authority
CN
China
Prior art keywords
data transmission
display card
task
card device
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311279144.0A
Other languages
Chinese (zh)
Inventor
肖麟阁
郝锐
阚宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Smart Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Smart Computing Technology Co Ltd
Priority to CN202311279144.0A
Publication of CN117251406A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/17 Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282 Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/545 Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Digital Computer Display Output (AREA)

Abstract

The invention provides a display card communication method and device, display card equipment, host equipment, a system and a medium, and relates to the field of communication between display cards. The method is applied to a display card device and comprises: when a control instruction issued by a host device is received, reading a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device from a memory of the display card device; and when the data transmission task flag bit is in a valid state, determining the kernel task parameters as data transmission parameters, and performing data transmission with a target display card device through a communication module in the display card device by using either the data transmission parameters alone or the data transmission parameters together with the kernel task. Because the display card device can communicate with another target display card device through its own dedicated communication module, the redundancy of the communication link between display card devices is reduced, which addresses the technical problem of slow communication between display cards.

Description

Display card communication method and device, display card equipment, host equipment, system and medium
Technical Field
The present invention relates to the field of communication between graphics cards, and in particular, to a method and apparatus for communication between graphics cards, a graphics card device, a host device, a system, and a medium.
Background
A graphics card device (GPU, Graphics Processing Unit) is a general-purpose computing device on which a machine learning model can be deployed and run. In the related art, when multiple graphics card devices perform the inference computation of a machine learning model, the devices generally need to communicate with one another to exchange intermediate results. However, existing communication methods between graphics card devices are slow and route data over redundant paths, which seriously reduces the communication efficiency between the devices and, in turn, the execution efficiency of the inference computation.
Disclosure of Invention
The invention aims to provide a display card communication method, a device, display card equipment, host equipment, a system and a medium, wherein the display card equipment can communicate with another target display card equipment through a dedicated communication module in the equipment, so that the redundancy degree of a communication link between the display card equipment can be reduced, and the communication speed between the display card equipment can be improved.
In order to solve the technical problems, the invention provides a display card communication method, which is applied to display card equipment, and comprises the following steps:
When a control instruction issued by a host device is received, reading a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device from a memory of the display card device;
when the data transmission task flag bit is in a valid state, determining the kernel task parameter as a data transmission parameter, and performing data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter, or performing data transmission with the target display card device through the communication module in the display card device by using the data transmission parameter and the kernel task.
Optionally, the data transmission task flag bit is a data sending task flag bit, the data transmission parameter includes communication information, a source address and a data sending length, and the performing data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter includes:
integrating a preset destination address into the data transmission parameters, and writing the integrated data transmission parameters into a register group;
and controlling the communication module to read the data transmission parameters from the register group, so that the communication module reads, from the memory, the data to be sent that is located at the source address and whose length is the data sending length, and sends the data to be sent, according to the communication information, to the position corresponding to the preset destination address in the memory of the target display card device.
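The sending steps above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `SendParams`, `CommModule` and `send_task`, the use of a dictionary as the register group, and the use of byte arrays as stand-ins for device memory are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SendParams:
    comm_info: str      # identifies the target card (format is an assumption)
    src_addr: int       # source address in local memory
    length: int         # data sending length in bytes
    dst_addr: int = 0   # filled in from the preset destination address


class CommModule:
    """Hypothetical stand-in for the dedicated communication module."""

    def __init__(self, local_mem, peers):
        self.local_mem = local_mem
        self.peers = peers  # maps comm_info -> memory of the target card

    def send(self, regs):
        p = regs["params"]  # read the parameters from the register group
        data = self.local_mem[p.src_addr:p.src_addr + p.length]
        target_mem = self.peers[p.comm_info]
        target_mem[p.dst_addr:p.dst_addr + p.length] = data


def send_task(params, preset_dst, regs, comm):
    params.dst_addr = preset_dst   # integrate the preset destination address
    regs["params"] = params        # write the parameters to the register group
    comm.send(regs)                # module reads them and transmits the data


local = bytearray(b"....HELLO.......")
remote = bytearray(16)
comm = CommModule(local, {"card1": remote})
send_task(SendParams("card1", src_addr=4, length=5), preset_dst=8,
          regs={}, comm=comm)
```

In this model the compute side only prepares parameters; the copy itself is done by the communication module, which mirrors the division of labor described in the steps.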
Optionally, the register set is a control status register set.
Optionally, the writing the integrated data transmission parameters into the register set includes:
writing each data transmission parameter into a corresponding control status register in the control status register set by using a read-write control status register instruction;
the controlling the communication module to read the data transmission parameters from the register set includes:
controlling the communication module to read each data transmission parameter from each control status register of the control status register set by using a read-set control status register instruction.
Optionally, the read-write control status register instruction and the read-set control status register instruction are each compiled from a corresponding inline assembly function in a code file.
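As an illustration of the two register instructions, the sketch below models them in Python. It assumes, which the text does not state explicitly, that the "read-write" and "read-set" instructions correspond to the RISC-V `CSRRW` (read the old value, then write a new one) and `CSRRS` (read the old value, then set the bits given by a mask) instructions. The register addresses and the 32-bit width are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit control status registers

# Hypothetical register addresses, chosen only for this example.
CSR_SRC_ADDR = 0x7C0
CSR_LENGTH = 0x7C1
CSR_DST_ADDR = 0x7C2


class CsrFile:
    """Toy model of a control status register set."""

    def __init__(self):
        self.regs = {}

    def csrrw(self, addr, value):
        # Read-write: return the old value and replace it with `value`.
        old = self.regs.get(addr, 0)
        self.regs[addr] = value & MASK32
        return old

    def csrrs(self, addr, mask):
        # Read-set: return the old value and set the bits in `mask`.
        # With mask == 0 this is a plain read that changes nothing.
        old = self.regs.get(addr, 0)
        self.regs[addr] = (old | mask) & MASK32
        return old


csrs = CsrFile()
# Compute core writes each parameter to its own register.
csrs.csrrw(CSR_SRC_ADDR, 0x1000)
csrs.csrrw(CSR_LENGTH, 64)
csrs.csrrw(CSR_DST_ADDR, 0x8000)
# Reading back with a zero mask leaves the registers unchanged.
read_back = [csrs.csrrs(a, 0) for a in (CSR_SRC_ADDR, CSR_LENGTH, CSR_DST_ADDR)]
```

On real hardware these operations would be emitted from inline assembly, as the text describes; the Python model only shows the read/modify semantics.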
Optionally, the controlling the communication module to read the data transmission parameter from the register set includes:
determining a target control status register that stores a valid bit in the control status register set;
and modifying the valid bit in the target control status register to a valid state, so that the communication module reads the data transmission parameters from the control status register set when it detects that the valid bit is in the valid state.
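The valid-bit handshake can be sketched as follows. The register names, the bit position, and the choice to have the module clear the bit after latching the parameters are assumptions made for the example; the text only states that the module reads the parameters once it detects the valid bit.

```python
VALID_BIT = 0x1  # hypothetical bit position inside the target register


class CommModule:
    """Toy communication module that polls a valid bit."""

    def __init__(self, csrs):
        self.csrs = csrs
        self.latched = None

    def poll(self):
        # Read the parameters only when the valid bit is set.
        if self.csrs["ctrl"] & VALID_BIT:
            self.latched = (self.csrs["src"], self.csrs["len"], self.csrs["dst"])
            # Assumption: the module clears the bit once it has the parameters.
            self.csrs["ctrl"] &= ~VALID_BIT
            return True
        return False


csrs = {"ctrl": 0, "src": 0x1000, "len": 64, "dst": 0x8000}
module = CommModule(csrs)
before = module.poll()       # valid bit not yet set, nothing is read
csrs["ctrl"] |= VALID_BIT    # compute core marks the parameters as valid
after = module.poll()        # module now latches the parameters
```

The point of the handshake is that the parameters are written first and published second, so the module never observes a half-written set.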
Optionally, the integrating the preset destination address into the data transmission parameter and writing the integrated data transmission parameter into the register set includes:
integrating a preset destination address into the data transmission parameters by using a computing core with a core number being a designated number, and writing the integrated data transmission parameters into a register group;
the controlling the communication module to read the data transmission parameters from the register set includes:
and controlling the communication module to read the data transmission parameters from the register group by using the calculation core with the core number being the designated number.
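A minimal sketch of the designated-core variant is shown below. It assumes that every compute core runs the same kernel code and that the designated core number is 0; both are assumptions for illustration, and the text does not say why a single core is chosen (one plausible reading is that it avoids duplicate register writes).

```python
DESIGNATED_CORE = 0  # assumption: which core number is designated


def run_on_core(core_id, params, regs, log):
    # Every core executes this function, but only the designated core
    # programs the register group and triggers the communication module.
    if core_id != DESIGNATED_CORE:
        return
    regs.update(params)
    log.append(core_id)


regs, log = {}, []
for core_id in range(4):
    run_on_core(core_id, {"src": 0x100, "len": 32, "dst": 0x800}, regs, log)
```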
Optionally, after the data to be sent is sent to the designated position corresponding to the preset destination address in the memory of the target display card device according to the communication information, the method further includes:
when sending of the data to be sent is complete, controlling the communication module to send a data sending end flag bit in a valid state to the target display card device, so that the target display card device determines that the data sending is complete.
Optionally, after the data to be sent is sent to the designated position corresponding to the preset destination address in the memory of the target display card device according to the communication information, the method further includes:
when sending of the data to be sent is complete, adjusting the data transmission task flag bit to an invalid state.
Optionally, the data transmission task flag bit is a data receiving task flag bit, the data transmission parameter includes a data receiving address and a data receiving length, and the performing data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter and the kernel task includes:
executing the kernel task to move the received data, which is located at a designated position in the memory and whose length is the data receiving length, to the position corresponding to the data receiving address in the memory; wherein the received data is sent by the target display card device, received by the local communication module, and written to the designated position by the local communication module.
Optionally, before executing the kernel task, the method further includes:
reading a data sending end flag bit from the memory;
executing the kernel task when the data sending end flag bit is determined to be in a valid state;
and when the data sending end flag bit is determined to be in an invalid state, waiting for the target display card device to modify the data sending end flag bit to a valid state through the local communication module.
Optionally, the method further comprises:
and when the kernel task is completed, the data receiving task flag bit is adjusted to be in an invalid state.
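The receiving steps above can be sketched as a small simulation. The class `Card`, the flag names and the staging address are hypothetical. Where a real device would wait for the end flag, this sketch simply returns `False` so that the caller can retry.

```python
VALID, INVALID = 1, 0


class Card:
    """Toy receiving card; memory is modelled as a bytearray."""

    def __init__(self, size, staging_addr):
        self.mem = bytearray(size)
        self.staging_addr = staging_addr  # the "designated position"
        self.flags = {"recv_task": INVALID, "send_end": INVALID}

    def comm_module_receive(self, payload):
        # The local communication module writes incoming data to staging.
        s = self.staging_addr
        self.mem[s:s + len(payload)] = payload

    def comm_module_mark_end(self):
        # The sender signals completion through the local module.
        self.flags["send_end"] = VALID

    def receive_kernel(self, recv_addr, recv_len):
        if self.flags["send_end"] != VALID:
            return False  # data not complete yet; caller retries later
        s = self.staging_addr
        self.mem[recv_addr:recv_addr + recv_len] = self.mem[s:s + recv_len]
        self.flags["recv_task"] = INVALID  # kernel done, clear the task flag
        return True


card = Card(64, staging_addr=0)
card.flags["recv_task"] = VALID
first_try = card.receive_kernel(recv_addr=32, recv_len=4)
card.comm_module_receive(b"DATA")
card.comm_module_mark_end()
second_try = card.receive_kernel(recv_addr=32, recv_len=4)
```

Checking the end flag before copying keeps the kernel from moving a partially received buffer.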
Optionally, the reading, from the memory of the graphics card device, a kernel task issued by a host device includes:
reading an instruction file issued by the host device from a memory of the display card device; the instruction file is obtained by compiling the kernel task and the runtime library, and the instruction contained in the instruction file is used for executing the step of determining the kernel task parameter as the data transmission parameter when the data transmission task flag bit is determined to be in an effective state, and performing data transmission with the target display card device by using the data transmission parameter and through a communication module in the display card device, or performing data transmission with the target display card device by using the data transmission parameter, the kernel task and through a communication module in the display card device.
The invention also provides a display card communication method which is applied to the host equipment and comprises the following steps:
receiving an input kernel task;
writing the kernel task and the kernel task parameters into a memory of the display card device, and adjusting a data transmission task flag bit in the memory into an effective state when the kernel task name is determined to be a data transmission task name;
and controlling the display card device to execute the kernel task, so that the display card device, upon determining that the data transmission task flag bit is in a valid state, determines the kernel task parameter as a data transmission parameter and performs data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter, or by using the data transmission parameter and the kernel task.
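The host-side steps can be sketched as below. The task names in `DATA_TRANSFER_TASK_NAMES`, the dictionary used as device memory, and the `"control"` entry standing in for the control instruction are all hypothetical; the text only says that the flag bit is set to the valid state when the kernel task name is a data transmission task name.

```python
VALID, INVALID = 1, 0

# Hypothetical names that the host treats as data transmission tasks.
DATA_TRANSFER_TASK_NAMES = {"send_data", "recv_data"}


def issue_kernel(device_mem, name, code, params):
    # Write the kernel task and its parameters to device memory.
    device_mem["kernel"] = code
    device_mem["params"] = params
    # Set the flag only when the task name marks a data transmission task.
    if name in DATA_TRANSFER_TASK_NAMES:
        device_mem["transfer_flag"] = VALID
    # Stand-in for the control instruction that starts execution.
    device_mem["control"] = "execute"


card = {"transfer_flag": INVALID}
issue_kernel(card, "matmul", b"\x00", {"n": 4})
flag_after_compute = card["transfer_flag"]
issue_kernel(card, "send_data", b"\x01",
             {"comm_info": "card1", "src_addr": 0x100, "length": 64})
flag_after_send = card["transfer_flag"]
```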
Optionally, the kernel task is a data sending task, the data transmission parameter includes communication information, a source address and a data sending length, and the communication module in the display card device reads data to be sent from the memory of the display card device by using the source address and the data sending length, and sends the data to be sent to the target display card device by using the communication information.
Optionally, the kernel task is a data receiving task, the data transmission parameter includes a data receiving address and a data receiving length, and the display card device moves the received data, which is located at a designated position in the memory and whose length is the data receiving length, to the position corresponding to the data receiving address in the memory according to the data receiving address and the data receiving length; the received data is sent by the target display card device, received by the communication module of the display card device, and written to the designated position by that communication module.
Optionally, the controlling the graphics card device to execute the kernel task includes:
compiling a runtime library and the kernel task into an instruction file, and writing the instruction file into a memory of the display card device; the instructions in the instruction file are used for executing the steps of determining the kernel task parameter as a data transmission parameter when determining that the data transmission task flag bit is in an effective state, and performing data transmission with a target display card device by using the data transmission parameter and through the communication module, or performing data transmission with the target display card device by using the data transmission parameter, the kernel task and through the communication module in the display card device;
and controlling the display card equipment to run the instruction file.
Optionally, compiling the runtime library and the kernel task into an instruction file includes:
converting the kernel task into an intermediate representation using a portable computing language framework;
the intermediate representation and the runtime library are compiled into the instruction file using a RISC-V compiler.
The invention also provides a display card communication device, which is applied to display card equipment, and comprises:
the reading module is used for reading the kernel task, the kernel task parameters and the data transmission task flag bit issued by the host device from the memory of the display card device when receiving the control instruction issued by the host device;
and the transmission module is used for determining the kernel task parameter as the data transmission parameter when the data transmission task flag bit is in an effective state, and carrying out data transmission with the target display card equipment by using the data transmission parameter and through the communication module in the display card equipment or carrying out data transmission with the target display card equipment by using the data transmission parameter, the kernel task and through the communication module in the display card equipment.
The invention also provides a display card communication device, which is applied to host equipment, and comprises:
the receiving module is used for receiving an input kernel task;
the issuing module is used for writing the kernel task and the kernel task parameters into a memory of the display card device, and adjusting a data transmission task flag bit in the memory into an effective state when the kernel task name is determined to be the data transmission task name;
and the control module is used for controlling the display card device to execute the kernel task, so that the display card device determines the kernel task parameter as a data transmission parameter when determining that the data transmission task flag bit is in an effective state, and performs data transmission with the target display card device by using the data transmission parameter and through a communication module in the display card device, or performs data transmission with the target display card device by using the data transmission parameter, the kernel task and through a communication module in the display card device.
The present invention also provides a graphic card apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the graphics card communication method as described above when executing the computer program;
and the communication module is used for carrying out data transmission with the target display card equipment under the control of the processor or carrying out data transmission with the target display card equipment under the control of the target display card equipment.
The present invention also provides a host device including:
a memory for storing a computer program;
and the processor is used for realizing the display card communication method when executing the computer program.
The invention also provides a display card communication system, which comprises:
the display card device is used for executing a display card communication method applied to the display card device;
and the host equipment is used for executing the display card communication method applied to the host equipment.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are loaded and executed by a processor, the display card communication method applied to the display card device or the display card communication method applied to the host device is realized.
The invention provides a display card communication method, which is applied to a display card device and comprises: when a control instruction issued by a host device is received, reading a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device from a memory of the display card device; and when the data transmission task flag bit is in a valid state, determining the kernel task parameter as a data transmission parameter, and performing data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter, or by using the data transmission parameter and the kernel task.
The beneficial effects of the invention are as follows: the display card device is provided with a dedicated communication module; and when the display card device receives the kernel task, the kernel task parameters and the data transmission task flag bit issued by the host device and determines that the data transmission task flag bit is in a valid state, it can automatically perform data transmission with the target display card device through its dedicated communication module, using either the data transmission parameters alone or the data transmission parameters together with the kernel task. This reduces the redundancy of the communication link and the number of redundant paths between display card devices, and increases the communication rate between display cards. The invention also provides a display card communication device, display card equipment, host equipment, a system and a computer readable storage medium, which have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for communication of a graphics card according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a graphics card device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a read-after-set control status register instruction according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a read-write control status register instruction according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for communication of a graphics card according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data transmission flow provided in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data receiving process according to an embodiment of the present invention;
FIG. 8 is a block diagram of a display card communication device according to an embodiment of the present invention;
FIG. 9 is a block diagram of another display card communication device according to an embodiment of the present invention;
FIG. 10 is a block diagram of a display card device according to an embodiment of the present invention;
FIG. 11 is a block diagram illustrating a host device according to an embodiment of the present invention;
FIG. 12 is a block diagram of a graphics card communication system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Existing communication methods between graphics card devices suffer from low communication speed and redundant paths between the devices, which seriously reduces communication efficiency. For example, communication between graphics card devices depends heavily on the central processing unit of the host device in which each graphics card device is installed, so it can only be completed indirectly through communication between the host devices, which significantly lowers the communication speed. As another example, a graphics card device is connected to a PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) interface of the host device and shares a network card device (NIC, Network Interface Controller) via the PCIe chipset, so data in the graphics card device must pass through the PCIe chipset before it reaches the network card device for transmission, which significantly increases the redundancy of the communication link between graphics card devices. In view of this, the present invention provides a graphics card communication method in which a graphics card device is provided with a dedicated communication module and uses that module to communicate with another target graphics card device, thereby reducing the redundancy of the communication link between graphics card devices and increasing the communication rate between graphics cards.
It should be noted that the embodiment of the present invention does not limit which instruction set the graphics card device is based on. To improve the customizability of the graphics card device, the device may be based on the open-source RISC-V instruction set.
Referring to FIG. 1, FIG. 1 is a flowchart of a graphics card communication method according to an embodiment of the present invention. The method is applied to a graphics card device and may include:
S101, when a control instruction issued by the host device is received, reading a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device from a memory of the graphics card device.
To allow one graphics card device to exchange data with another, this embodiment extends the existing process by which a host device controls a graphics card device to execute a kernel task: the host device additionally writes a data transmission task flag bit into the memory of the graphics card device and controls the graphics card device to read it, and the flag bit triggers the data transmission flow added in this embodiment. The embodiment of the present invention does not limit where the data transmission task flag bit is located in the memory of the graphics card device; the location may be set according to actual application requirements, for example at a position that will not be overwritten by other data. The data transmission task flag bit has a valid state and an invalid state: the graphics card device triggers the data transmission flow when it determines that the flag bit is in the valid state, and does not trigger it otherwise. This embodiment does not limit the specific value that represents the valid state; it may be set according to actual application requirements and may be, for example, 1.
Further, the kernel task is a task to be executed by a computing engine in the graphics card device, and the kernel task parameters are the parameters carried by the kernel task. In the related art, to control a graphics card device to execute a kernel task, a host device issues the kernel task parameters and the function content carried by the kernel task to the memory (DDR) of the graphics card device and issues a control instruction to the graphics card device, so that the graphics card device executes the kernel task based on the data in the memory. Specifically, the host device may compile the kernel task into an instruction file and issue the instruction file to the graphics card device, so that the graphics card device completes the kernel task by executing the instructions in the file; these instructions may be standard RISC-V instructions. Unlike the related art, this embodiment uses the kernel task parameters to carry the data transmission parameters, that is, the parameters the graphics card device needs in order to transmit data; the kernel task itself may also carry data transmission logic, so that the graphics card device performs the corresponding data transmission steps while executing the kernel task. In other words, the kernel task in this embodiment is a data transmission task. This embodiment does not limit the type of the data transmission task; for example, it may be a data sending task or a data receiving task.
Also unlike the related art, because the graphics card device in the embodiment of the invention may be based on the open-source RISC-V instruction set, no dedicated hardware circuit needs to be built into the graphics card device for a given kernel task; the kernel task only needs to be compiled into an instruction file and issued to the graphics card device for execution, and building a hardware circuit is clearly less efficient than compiling an instruction file. In other words, by developing the graphics card device on the open-source RISC-V instruction set, the embodiment of the invention can significantly improve the flexibility with which kernel tasks are developed, issued and executed.
S102, when the data transmission task flag bit is in an effective state, determining the kernel task parameter as a data transmission parameter, and performing data transmission with the target display card device through a communication module in the display card device by utilizing the data transmission parameter, or performing data transmission with the target display card device through the communication module in the display card device by utilizing the data transmission parameter and the kernel task.
To reduce the redundancy of the communication link between display card devices, this embodiment provides a dedicated communication module in the display card device, giving the display card device a hardware basis for communicating directly with other display card devices. In other words, the computing engine and the communication module in the graphics card device are disposed on the same chip; for example, both may be disposed on the same FPGA (Field Programmable Gate Array) chip. In addition, to ensure that the display card device can use the communication module to communicate, this embodiment adds a data transmission flow to the display card device. The flow is executed when triggered, that is, when the data transmission task flag bit is determined to be in a valid state, so that the dedicated communication module is used to perform data transmission with other display card devices. It should be noted that the data transmission parameters required in this process can be transferred by using the kernel task, so that the display card device can use the data transmission parameters to perform data transmission with the target display card device through its communication module; in addition, part of the data transmission flow can be included in the kernel task, so that the display card device can use both the data transmission parameters and the kernel task to perform data transmission with the target display card device through its communication module.
It should be noted that the data transmission between the communication module in the graphics card device and the target graphics card device may be initiated by the processor in the graphics card device controlling the communication module to the target graphics card device, or may be initiated directly by the target graphics card device to the communication module. In other words, the communication module in the graphics card device may perform data transmission with the target graphics card device under the control of the processor, or may perform data transmission with the target graphics card device under the control of the target graphics card device.
Furthermore, to ensure that the display card device can execute the data transmission flow, the embodiment of the invention improves the runtime library of the display card device. The runtime library is a library on which the display card device depends while running, and the host device uses this library together with the kernel task to compile and generate the instruction file. Thus, when the host device compiles the improved runtime library and the kernel task into an instruction file, the instructions for the data transmission flow are added to the instruction file, and the display card device can execute all steps of the data transmission flow simply by executing all instructions in the instruction file.
Based on this, reading the kernel task issued by the host device from the memory of the display card device may include:
Step 11: reading an instruction file issued by the host device from a memory of the display card device; the instruction file is compiled by using a kernel task and a runtime library, and the instruction contained in the instruction file is used for executing the steps of determining the kernel task parameter as the data transmission parameter when determining that the data transmission task flag bit is in an effective state, and transmitting data with the target display card device by using the data transmission parameter through a communication module in the display card device or transmitting data with the target display card device by using the data transmission parameter, the kernel task and through the communication module in the display card device.
Specifically, the instruction file may be a binary file.
Based on the above embodiment, the display card device of the present invention is provided with a dedicated communication module. When the display card device receives the kernel task, the kernel task parameters and the data transmission task flag bit issued by the host device and determines that the flag bit is in a valid state, it can use the data transmission parameters to perform data transmission with the target display card device autonomously through its dedicated communication module, or use the data transmission parameters together with the kernel task to do so. This reduces the redundancy of the communication link between display card devices, reduces redundant paths between them, and improves the communication rate between display cards.
Based on the above embodiments, the specific procedure by which the graphics card device sends data is described in detail below. In one possible case, the data transmission task flag bit is a data sending task flag bit, and the data transmission parameters include communication information, a source address, and a data transmission length. The method may further comprise:
S201, when a control instruction issued by the host device is received, a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device are read from a memory of the display card device.
In this embodiment, when the data sending task flag bit (SEND_FLAG) is in a valid state, the current kernel task is a data sending task, and the graphics card device needs to execute the data sending flow. As described above, the flag bit is set by the host device. Specifically, a special name (for example, __send) may be assigned to the kernel task corresponding to the data sending flow: when the host device determines that the kernel task has this special name, it sets the data sending task flag bit to a valid state; otherwise, it sets the flag bit to an invalid state.
Further, it should be noted that, since this embodiment uses the kernel task only to transfer the data transmission parameters when data is sent, the kernel task need not include any actual content. Step S202 may be performed after the computing core completes the kernel task. The kernel task for data sending is as follows:
__kernel __send(sendID, rcvID, hostID, sendADDR, sendLength) // function header
{ // start of function body
} // end of function body
Here, __kernel identifies a kernel task, sendID represents the sender board ID, rcvID represents the receiver board ID, hostID represents the host ID, sendADDR represents the source address, and sendLength represents the data transmission length (in bytes). As can be seen, the kernel task need not include any actual content.
Regardless of the kernel task, the host device writes the kernel task parameters to the memory of the display card device before the kernel task starts to execute, and the parameters are always stored at sequentially increasing offsets from a base address. Assuming the base address is kernel_task_base_addr, sendID is located at kernel_task_base_addr, rcvID at kernel_task_base_addr+4, hostID at kernel_task_base_addr+8, and so on. The data transmission parameters therefore only need to be read from the memory in sequence.
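The sequential layout described above can be illustrated with a minimal C sketch. It models the device memory as a byte array and assumes that every parameter is a 32-bit word, so that sendADDR and sendLength follow at offsets +12 and +16; the text only states the first three offsets, so the remaining two, as well as the names send_params_t, load_u32 and read_send_params, are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the five kernel task parameters, assumed to be
   stored as consecutive 32-bit words starting at the base address. */
typedef struct {
    uint32_t send_id;     /* sender board ID,   offset +0            */
    uint32_t rcv_id;      /* receiver board ID, offset +4            */
    uint32_t host_id;     /* host ID,           offset +8            */
    uint32_t send_addr;   /* source address,    offset +12 (assumed) */
    uint32_t send_length; /* length in bytes,   offset +16 (assumed) */
} send_params_t;

/* Read one 32-bit word from the simulated device memory. */
uint32_t load_u32(const uint8_t *mem, uint32_t offset)
{
    uint32_t value;
    memcpy(&value, mem + offset, sizeof value);
    return value;
}

/* Read the parameters in order, stepping 4 bytes at a time from base. */
send_params_t read_send_params(const uint8_t *mem, uint32_t base)
{
    send_params_t p;
    p.send_id     = load_u32(mem, base + 0);
    p.rcv_id      = load_u32(mem, base + 4);
    p.host_id     = load_u32(mem, base + 8);
    p.send_addr   = load_u32(mem, base + 12);
    p.send_length = load_u32(mem, base + 16);
    return p;
}
```

Because the offsets are fixed, the reader needs no per-kernel metadata: knowing the base address is enough to recover every parameter.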
S202, when the data transmission task flag bit is in an effective state, integrating a preset destination address into data transmission parameters, and writing the integrated data transmission parameters into a register set.
Further, it should be noted that, in the data sending flow, the complete data transmission parameters include communication information (such as the sender board ID, the receiver board ID, and the host ID), a source address, a destination address, and a data transmission length. The source address indicates the location of the data to be transmitted in the memory of the display card device, the data transmission length indicates the length of the data to be transmitted, and the destination address is an address in the memory of the target display card device: the display card device needs to write the data to be transmitted to the position corresponding to the destination address in the memory of the target display card device. Among these parameters, the communication information, the source address and the data transmission length may be set by the host device according to the actual situation, whereas the destination address is a preset fixed value. An address corresponding to a location in the memory of the display card device that will not be overwritten by other data may be selected as the destination address; for example, the destination address may be located in the high part of the memory address space. This arrangement ensures that data is transmitted only to a fixed position in the memory of the target display card device, which both simplifies data migration on the target display card device and prevents the transmitted data from being overwritten by other data.
Further, to ensure that the communication module can obtain the parameters, this embodiment writes the data transmission parameters into a register set and controls the communication module to read them from the register set. The register set provides one register for each parameter type, so each data transmission parameter only needs to be written into the register corresponding to its type. It should be noted that the embodiment of the present invention does not limit how the register set is provided. For example, a new register type may be defined in the graphics card device and the register set built from it; alternatively, registers may be added to an existing register type of the display card device and the register set built from the newly added registers. With the latter approach, the newly added registers can be accessed in the same way as the existing register type, and only the register addresses need to change, which is convenient; the embodiment of the invention therefore provides the register set in this way. Specifically, the register set may be a set of control status registers (CSRs, Control and Status Registers) in the display card device. Table 1 below lists the parameter type corresponding to each control status register in the set.
For ease of understanding, the way the control status register set is provided is described here. In hardware, the graphics card device includes a plurality of computing cores (Cores), each of which includes a plurality of thread bundles (Warps), each of which in turn includes a plurality of threads (Threads). Since the graphics card device operates in single instruction, multiple threads (SIMT) mode, each thread has a relatively independent set of hardware resources. These hardware resources are implemented according to the RISC-V instruction set standard; for example, each thread has a dedicated multiplier, a set of general purpose registers (GPRs), a set of control status registers, and so on. To stay consistent with the design of the graphics card architecture and to simplify development, the newly added custom control status register set in this embodiment follows the same design concept: each thread has a dedicated custom control status register set, placed in the execution stage of the pipeline, as shown in fig. 2. During actual operation, the enhanced runtime library automatically controls the display card device so that only one thread of one thread bundle of one computing core is responsible for reading and writing the custom control status register set, which avoids the problems caused by multiple threads reading and writing simultaneously. For each computing core, only one thread bundle is allowed to run at a time, so a computing core having 4 thread bundles with 4 threads in each thread bundle requires only 4 actual physical threads. When switching to a thread bundle, the state and information of the 4 physical threads are switched to the context of that thread bundle.
In this embodiment, when the number of actual physical threads of the computing core is 4, there are 4 custom control status register sets in the computing core, one for each actual physical thread. Each custom control status register set is defined as follows:
Table 1: Custom control status register set definition
Name | Address (12 bits) | Function
--- | --- | ---
Sender board ID | 0xdc2 | Mainly used to look up the corresponding IP address in the node information table
Receiver board ID | 0xdc3 | Mainly used to look up the corresponding IP address in the node information table
Host ID | 0xdc4 | Mainly used to look up the corresponding IP address in the node information table
Source address, low 32 bits | 0xdc5 | Address of the data to be transmitted in the local DDR
Source address, high 32 bits | 0xdc6 | Defaults to 0; reserved, no action required
Destination address, low 32 bits | 0xdc7 | Address of the data to be transmitted in the peer DDR
Destination address, high 32 bits | 0xdc8 | Defaults to 0; reserved, no action required
Transmitted data length | 0xdc9 | Total length of the data to be transmitted this time
Valid | 0xdca | Indicates that the CSR information required for this data transmission is ready
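The register map in Table 1 can be captured as a small C model, which may help when reasoning about which address holds which parameter. The addresses come from the table; the software array that stands in for the hardware registers, and the names csr_set_t, csr_model_write and csr_model_read, are assumptions made purely for illustration.

```c
#include <stdint.h>

/* 12-bit addresses of the custom control status registers (Table 1). */
enum {
    CSR_SEND_ID = 0xdc2, /* sender board ID                 */
    CSR_RCV_ID  = 0xdc3, /* receiver board ID               */
    CSR_HOST_ID = 0xdc4, /* host ID                         */
    CSR_SRC_LO  = 0xdc5, /* source address, low 32 bits     */
    CSR_SRC_HI  = 0xdc6, /* source address, high (reserved) */
    CSR_DST_LO  = 0xdc7, /* destination address, low 32 bits*/
    CSR_DST_HI  = 0xdc8, /* destination address, high (rsvd)*/
    CSR_LENGTH  = 0xdc9, /* transmitted data length         */
    CSR_VALID   = 0xdca  /* valid bit                       */
};

#define CSR_COUNT (CSR_VALID - CSR_SEND_ID + 1)

/* Hypothetical software stand-in for one thread's custom register set. */
typedef struct {
    uint32_t regs[CSR_COUNT];
} csr_set_t;

/* Write a value; returns -1 if the address is outside the custom range. */
int csr_model_write(csr_set_t *set, uint32_t addr, uint32_t value)
{
    if (addr < (uint32_t)CSR_SEND_ID || addr > (uint32_t)CSR_VALID)
        return -1;
    set->regs[addr - (uint32_t)CSR_SEND_ID] = value;
    return 0;
}

/* Read a value; returns -1 if the address is outside the custom range. */
int csr_model_read(const csr_set_t *set, uint32_t addr, uint32_t *out)
{
    if (addr < (uint32_t)CSR_SEND_ID || addr > (uint32_t)CSR_VALID)
        return -1;
    *out = set->regs[addr - (uint32_t)CSR_SEND_ID];
    return 0;
}
```

The nine registers occupy a contiguous address range, which is why a single array indexed by the offset from 0xdc2 is sufficient in this model.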
S203, the communication module is controlled to read the data transmission parameters from the register group, so that the communication module reads the data to be transmitted, which is positioned at the source address and has the length of data transmission length, from the memory, and transmits the data to be transmitted to the corresponding position of the preset destination address in the memory of the target display card device according to the communication information.
As described above, the communication module needs to perform data transmission using the above data transmission parameters, so after the data transmission parameters are written into the control status register set, the communication module can be controlled to read them from the register set. Specifically, the communication module reads the data to be transmitted, which is located at the source address and whose length is the data transmission length, from the memory, and transmits it to the position corresponding to the preset destination address in the memory of the target display card device according to the communication information. It should be noted that this embodiment does not limit the specific hardware components of the communication module. For example, to implement the data sending function, it may include at least a control status register module, a DMA (Direct Memory Access) controller, and an iRDMA (Remote Direct Memory Access) module. The control status register module reads the data transmission parameters from the control status register set and uses them to control the DMA controller to perform data transmission; the DMA controller reads the data to be sent from the memory of the display card device and passes it to the iRDMA module; the iRDMA module sends the data to the target display card device according to the communication information, and the sending process passes through a link layer (MAC) module. In addition, the communication module may further include a node information table and a control status register control module. The node information table stores the IP addresses of the graphics card nodes, and the iRDMA module can look up the corresponding IP address in the node information table based on the identification (ID) of each graphics card node; the control status register control module performs external control of the control status register module. As also shown in fig. 2, reading and writing of the control status register module connected to the RISC-V based general-purpose graphics processor (GPGPU) can be realized by a single thread reading and writing the custom control status register set belonging to that thread, so that the information in the control status register set is passed to the control status register module promptly and correctly. Based on this, the control status register module can activate the DMA controller according to the information to perform direct kernel communication between the graphics card nodes.
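The node information table lookup described above can be sketched in C as a linear search from node ID to IP address. The entry layout (a 32-bit ID paired with a 32-bit IPv4 address) and the names node_entry_t and lookup_ip are assumptions; the text does not specify how the table is organized in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node information table entry: board or host ID -> IPv4. */
typedef struct {
    uint32_t node_id;
    uint32_t ipv4; /* address in host byte order, for illustration only */
} node_entry_t;

/* Look up the IP address for a node ID.
   Returns 0 and stores the address on success, -1 if the ID is unknown. */
int lookup_ip(const node_entry_t *table, size_t count,
              uint32_t node_id, uint32_t *ip_out)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].node_id == node_id) {
            *ip_out = table[i].ipv4;
            return 0;
        }
    }
    return -1;
}
```

In this model the sender board ID, receiver board ID and host ID read from the register set would each be resolved through the same function before the data is sent.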
Further, as for how each control status register is accessed, the RISC-V instruction set provides standard control status register privileged instructions whose formats are shown in fig. 3 and fig. 4: fig. 3 is a schematic diagram of the read-after-set control status register instruction provided in the embodiment of the invention, and fig. 4 is a schematic diagram of the read-after-write control status register instruction provided in the embodiment of the invention. To read or write a particular custom control status register, only the value of the 12-bit immediate field in the upper bits of the instruction (imm, i.e. csr in fig. 3 and fig. 4) needs to be changed to the corresponding address. For example, to write data into the control status register corresponding to the "sender board ID" (address 0xdc2), the csr field in the upper 12 bits of the 32-bit instruction shown in fig. 4 is set to 0xdc2, and the rs1 field gives the number of the general purpose register holding the 32-bit value to be written; the thread then reads the data from the general purpose register indicated by the rs1 field and writes it into the control status register indicated by the csr field. As another example, to read data from the control status register corresponding to the "sender board ID", the csr field in the upper 12 bits of the 32-bit instruction shown in fig. 3 is set to 0xdc2, the rs1 field is fixed at 0, and the rd field gives the number of the general purpose register in which the read value is to be stored. All of the above are standard RISC-V instruction formats, so this embodiment only needs to set the csr field to the address of the custom control status register to be accessed.
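The field layout described above can be made concrete with a short C sketch that assembles the 32-bit instruction words. It uses the standard RISC-V encoding (csr in bits 31:20, rs1 in bits 19:15, funct3 in bits 14:12, rd in bits 11:7, SYSTEM opcode 0x73), with funct3 001 for CSRRW and 010 for CSRRS; the function names are chosen here for illustration and are not taken from the source.

```c
#include <stdint.h>

#define OPCODE_SYSTEM 0x73u /* major opcode shared by the CSR instructions */
#define FUNCT3_CSRRW  0x1u  /* atomic read/write CSR                       */
#define FUNCT3_CSRRS  0x2u  /* atomic read and set bits in CSR             */

/* Assemble a CSR instruction from its fields. */
uint32_t encode_csr(uint32_t csr, uint32_t rs1, uint32_t funct3, uint32_t rd)
{
    return ((csr & 0xfffu) << 20) | ((rs1 & 0x1fu) << 15) |
           ((funct3 & 0x7u) << 12) | ((rd & 0x1fu) << 7) | OPCODE_SYSTEM;
}

/* Write: copy general purpose register rs1 into the CSR (rd = x0 discards
   the old value, which is what the csrw pseudo-instruction does). */
uint32_t encode_csr_write(uint32_t csr, uint32_t rs1)
{
    return encode_csr(csr, rs1, FUNCT3_CSRRW, 0);
}

/* Read: copy the CSR into general purpose register rd (rs1 = x0 sets no
   bits, which is what the csrr pseudo-instruction does). */
uint32_t encode_csr_read(uint32_t rd, uint32_t csr)
{
    return encode_csr(csr, 0, FUNCT3_CSRRS, rd);
}
```

For example, writing the sender board ID from register x10 yields `encode_csr_write(0xdc2, 10)`, and only the first argument changes when a different custom register is targeted.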
Based on this, writing the integrated data transfer parameters into the register set may include:
Step 21: writing each data transmission parameter into the corresponding control status register in the control status register set by using the read-write control status register instruction;
Controlling the communication module to read the data transmission parameters from the register set may include:
Step 22: controlling the communication module to read each data transmission parameter from each control status register of the control status register set by using the read-set control status register instruction.
Furthermore, in order to facilitate the developer to write the kernel task, the embodiment can use the inline assembly function to package the read-write operation for the custom control state register set, so that the call of the corresponding function can be realized without modifying a software tool chain such as a compiler, and the programming logic of upper-layer software is simplified. The following shows an inline assembly function for writing a sender card ID, and when a custom control status register corresponding to the sender card ID (sendID) needs to be written, the function is only required to be called, and a value needing to be written is transferred:
inline void csr_write_sendID(unsigned sendID) // function header
{ // start of function body
    asm volatile("csrw %0, %1" :: "i"(0xdc2), "r"(sendID)); // write sendID to CSR 0xdc2
} // end of function body
During compilation, the read-write control status register instruction and the read-set control status register instruction are generated from the corresponding inline assembly functions in the code file.
Further, to trigger the communication module to read the data transmission parameters from the control status register set, after the parameters are written, the graphics card device may determine the target control status register that stores the valid bit in the control status register set and modify the valid bit in that register to a valid state (for example, set it to 1), which triggers the communication module to read the data transmission parameters.
Based on this, controlling the communication module to read the data transmission parameters from the register set may include:
Step 31: determining the target control status register that stores the valid bit in the control status register set;
Step 32: modifying the valid bit in the target control status register to a valid state, so that the communication module reads the data transmission parameters from the control status register set when it detects that the valid bit is in the valid state.
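The valid-bit trigger in Steps 31 and 32 can be sketched as a producer and consumer around a shared register set. The model below is software only; the names xfer_params_t, processor_publish and comm_module_poll are hypothetical, and clearing the valid bit after the parameters are consumed is an assumption that the text does not state.

```c
#include <stdint.h>

/* Parameters carried by the custom control status registers. */
typedef struct {
    uint32_t send_id, rcv_id, host_id;
    uint32_t src_lo, dst_lo, length;
} xfer_params_t;

/* Hypothetical software stand-in for the register set plus valid bit. */
typedef struct {
    xfer_params_t params;
    uint32_t valid;
} csr_set_t;

/* Processor side: write every parameter first, then raise the valid bit
   so that the communication module never sees a half-written set. */
void processor_publish(csr_set_t *csr, const xfer_params_t *p)
{
    csr->params = *p;
    csr->valid = 1;
}

/* Communication module side: read the parameters only when valid is set.
   Returns 1 if parameters were consumed, 0 if nothing was ready. */
int comm_module_poll(csr_set_t *csr, xfer_params_t *out)
{
    if (!csr->valid)
        return 0;
    *out = csr->params;
    csr->valid = 0; /* assumption: the bit is cleared once consumed */
    return 1;
}
```

Writing the valid bit last is the key ordering point: it acts as the single signal that all other registers are ready.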
Further, to avoid conflicts, both the writing and reading of the register set may be performed by the designated compute core, while the other compute cores need only be on standby.
Based on this, integrating the preset destination address into the data transmission parameters, and writing the integrated data transmission parameters into the register set may include:
Step 41: integrating the preset destination address into the data transmission parameters by using the computing core whose core number is a designated number, and writing the integrated data transmission parameters into the register set;
Controlling the communication module to read the data transmission parameters from the register set may include:
Step 42: controlling the communication module, by using the computing core whose core number is the designated number, to read the data transmission parameters from the register set.
Further, to ensure that the target display card device can correctly receive the data, when the communication module in the local display card device completes data transmission, it may further send a data transmission end flag bit (SEND_FINISH_FLAG) in a valid state to the target display card device, so that the target display card device determines that the data transmission is complete. The data transmission end flag bit can be placed at a position in the memory of the display card device that will not be overwritten by other data.
Based on this, after sending the data to be sent to the specified position corresponding to the preset destination address in the memory of the target display card device according to the communication information, the method may further include:
Step 51: when the transmission of the data to be transmitted is completed, controlling the communication module to send a data transmission end flag bit in a valid state to the target display card device, so that the target display card device determines that the data transmission is complete.
Further, after the data transmission is completed, the display card device can autonomously adjust the flag bit of the data transmission task to be in an invalid state, so as to avoid interference to the next kernel task.
Based on this, after sending the data to be sent to the specified position corresponding to the preset destination address in the memory of the target display card device according to the communication information, the method may further include:
Step 61: when the transmission of the data to be transmitted is completed, adjusting the data transmission task flag bit to an invalid state.
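The end of the sending flow (Steps 51 and 61) can be sketched with two simulated cards. Each card is modelled as a byte array plus two flag fields; in the text the flags live in device memory, so representing them as struct fields, along with the 1 KiB memory size, the value of PRESET_DST_ADDR and the names card_t and send_data, are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE        1024u
#define PRESET_DST_ADDR 512u /* assumed fixed destination in the target */

/* Hypothetical model of one display card device. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t send_flag;        /* data sending task flag bit     */
    uint32_t send_finish_flag; /* data transmission end flag bit */
} card_t;

/* Copy len bytes from the local source address to the preset destination
   in the target, then signal completion on both sides.
   Returns 0 on success, -1 if no send task is pending or on bad bounds. */
int send_data(card_t *local, card_t *target, uint32_t src, uint32_t len)
{
    if (!local->send_flag)
        return -1;
    if (src > MEM_SIZE || len > MEM_SIZE - src)
        return -1;
    if (len > MEM_SIZE - PRESET_DST_ADDR)
        return -1;
    memcpy(target->mem + PRESET_DST_ADDR, local->mem + src, len);
    target->send_finish_flag = 1; /* Step 51: tell the receiver it is done */
    local->send_flag = 0;         /* Step 61: do not disturb the next task */
    return 0;
}
```

Clearing the local flag last means a second call without a new task from the host is rejected, which mirrors the stated goal of not interfering with the next kernel task.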
Based on the above embodiments, the specific procedure by which the graphics card device receives data is described in detail below. In one possible case, the data transmission task flag bit is a data receiving task flag bit, the data transmission parameters include a data receiving address and a data receiving length, and data transmission with the target display card device is performed through the communication module in the display card device by using the data transmission parameters and the kernel task. The method may further comprise:
S301, when a control instruction issued by the host device is received, reading a kernel task, kernel task parameters and a data receiving task flag bit issued by the host device from a memory of the display card device.
In this embodiment, when the data receiving task flag bit (RCV_FLAG) is in a valid state, the current kernel task is a data receiving task, and the graphics card device needs to execute the data receiving flow. As described above, the data receiving task flag bit is set by the host device. Specifically, a special name (for example, __rcv) may be assigned to the kernel task corresponding to the data receiving flow: when the host device determines that the kernel task has this special name, it sets the data receiving task flag bit to a valid state; otherwise, it sets the flag bit to an invalid state.
S302, executing a kernel task when the data receiving task flag bit is determined to be in a valid state, so as to transfer received data with the length being the data receiving length at a designated position of a memory to a corresponding position of a data receiving address in the memory; the received data is sent by the target display card device, received by the communication module of the local terminal and written into the appointed position by the communication module of the local terminal.
It should be noted that, when receiving data, since the sender can write the data into the specified location in the receiver memory, the receiver only needs to migrate the data of the specified location to the location specified by the user. In other words, the computing engine of the video card device only needs to migrate the received data to the designated location, and the data reception can be processed by the communication module of the video card device.
Further, the logic for data migration may be written into the kernel task, and the parameters of the kernel task may be a data receiving address and a data receiving length, where the data receiving address is the address of the user-specified location and the data receiving length is the length of the received data. When executing the kernel task, the computing engine can then migrate the received data, whose length is the data receiving length, from the specified location in the memory to the location in the memory corresponding to the data receiving address. In the kernel task for data reception, rcvADDR represents the data receiving address, rcvLength represents the data receiving length, and TMP_RCV_ADDR is a fixed value that represents the preset destination address.
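The migration step that such a receiving kernel task performs can be sketched in plain C. This is a sketch under stated assumptions rather than the kernel listing itself: device memory is modelled as a byte array, the value chosen for TMP_RCV_ADDR is arbitrary, and the function name rcv_migrate and its bounds checks are introduced here for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TMP_RCV_ADDR 512u /* assumed value of the preset destination address */

/* Move rcvLength bytes that the communication module wrote at TMP_RCV_ADDR
   to the user-specified address rcvADDR inside the same memory.
   Returns 0 on success, -1 if either range falls outside the memory. */
int rcv_migrate(uint8_t *mem, uint32_t mem_size,
                uint32_t rcvADDR, uint32_t rcvLength)
{
    if (TMP_RCV_ADDR > mem_size || rcvLength > mem_size - TMP_RCV_ADDR)
        return -1;
    if (rcvADDR > mem_size || rcvLength > mem_size - rcvADDR)
        return -1;
    /* memmove tolerates overlap between the staging area and the target. */
    memmove(mem + rcvADDR, mem + TMP_RCV_ADDR, rcvLength);
    return 0;
}
```

Because the sender always writes to the same staging address, the receiver needs only the two parameters rcvADDR and rcvLength to finish the transfer.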
Further, the data transmission is considered finished only when the data sender sets the data transmission end flag bit (SEND_FINISH_FLAG) to a valid state. Therefore, before executing the kernel task, the display card device can also read the data transmission end flag bit from the memory and judge whether it is in a valid state; if so, it executes the kernel task, and if not, it continues waiting.
Based on this, before executing the kernel task, it may further include:
Step 71: reading the data transmission end flag bit from the memory;
Step 72: executing the kernel task when the data transmission end flag bit is determined to be in a valid state;
Step 73: when the data transmission end flag bit is determined to be in an invalid state, waiting for the target display card device to modify the data transmission end flag bit to a valid state through the communication module of the local end.
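Steps 71 to 73 amount to polling a flag before running the kernel task, which can be sketched as follows. The text describes waiting without a limit; the max_polls bound is an assumption added so that the sketch always terminates, and the names wait_then_run, kernel_fn and count_kernel are hypothetical.

```c
#include <stdint.h>

/* Signature of the kernel task body to run once the data has arrived. */
typedef void (*kernel_fn)(void *ctx);

/* Poll the data transmission end flag up to max_polls times.
   Runs the kernel and returns 1 once the flag is valid; returns 0 if the
   flag stayed invalid for the whole polling budget. */
int wait_then_run(volatile const uint32_t *send_finish_flag,
                  uint32_t max_polls, kernel_fn kernel, void *ctx)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*send_finish_flag) { /* Step 72: flag valid, run the kernel */
            kernel(ctx);
            return 1;
        }
        /* Step 73: flag still invalid, keep waiting for the sender. */
    }
    return 0;
}

/* Trivial stand-in kernel that counts how many times it was executed. */
void count_kernel(void *ctx)
{
    (*(int *)ctx)++;
}
```

The flag is declared volatile because, in the described design, it is changed by the communication module rather than by the polling thread.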
Further, after the data reception is completed, the display card device can autonomously adjust the data receiving task flag bit to an invalid state, so as to avoid interfering with the next kernel task.
Based on this, the method may further include:
Step 81: when the kernel task is completed, adjusting the data receiving task flag bit to an invalid state.
Based on the above embodiments, the improvement of the host device side will be described in detail below. Referring to fig. 5, fig. 5 is a flowchart of another method for communication of a graphics card according to an embodiment of the present invention, where the method is applied to a host device and may include:
S501, receiving an input kernel task.
S502, writing the kernel task and the kernel task parameters into a memory of the display card device, and adjusting a data transmission task flag bit in the memory into an effective state when the kernel task name is determined to be the data transmission task name.
Unlike the related art, when the host device in this embodiment determines that the kernel task name is a data transmission task name (for example, __send or __rcv), it can determine that the corresponding data transmission task needs to be executed, and can therefore adjust the corresponding data transmission task flag bit in the memory of the display card device to a valid state, so as to trigger the display card device to execute the corresponding data transmission task.
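The host-side decision can be sketched as a name comparison that selects which flag bit to set. The special names __send and __rcv come from the text; the task_flags_t structure and the function flags_for_kernel are hypothetical, and in the described system the resulting flag values would be written into the memory of the display card device rather than returned.

```c
#include <string.h>

/* Hypothetical pair of flag values the host would write for one task. */
typedef struct {
    int send_flag; /* data sending task flag bit   */
    int rcv_flag;  /* data receiving task flag bit */
} task_flags_t;

/* Decide the flag values from the kernel task name: only the two special
   names mark the task as a data transmission task. */
task_flags_t flags_for_kernel(const char *kernel_name)
{
    task_flags_t flags = {0, 0};
    if (strcmp(kernel_name, "__send") == 0)
        flags.send_flag = 1;
    else if (strcmp(kernel_name, "__rcv") == 0)
        flags.rcv_flag = 1;
    return flags;
}
```

Any other kernel name leaves both flags invalid, so ordinary compute kernels run without entering the data transmission flow.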
S503, controlling the display card device to execute the kernel task, so that the display card device determines the kernel task parameter as the data transmission parameter when determining that the data transmission task flag bit is in an effective state, and performs data transmission with the target display card device through a communication module in the display card device by using the data transmission parameter, or performs data transmission with the target display card device through the communication module in the display card device by using the data transmission parameter and the kernel task.
Specifically, to ensure that the graphics card device can correctly execute the corresponding data transmission flow, the present embodiment may improve the runtime library of the graphics card device to add relevant content of the data transmission flow to the library. Thus, when the host device utilizes the improved runtime library and kernel task to compile and generate an instruction file, the related instructions of the data transmission flow can be added into the instruction file, and the display card device can execute all steps of the data transmission flow only by executing all instructions in the instruction file.
Based on this, controlling the graphics card device to execute the kernel task may include:
step 91: compiling the runtime library and the kernel task into instruction files, and writing the instruction files into a memory of the display card device; the instruction in the instruction file is used for executing the steps of determining the kernel task parameter as the data transmission parameter when the data transmission task flag bit is in the effective state, and carrying out data transmission with the target display card device by utilizing the data transmission parameter and through the communication module in the display card device, or carrying out data transmission with the target display card device by utilizing the data transmission parameter, the kernel task and through the communication module in the display card device;
step 92: and controlling the display card equipment to run the instruction file.
It should be noted that this embodiment does not limit the specific compiling manner, which depends on the language in which the kernel task is written. For example, when the kernel task is OpenCL (Open Computing Language) code written using the portable computing language framework (PoCL, Portable Computing Language), the kernel task may be converted into an intermediate representation using the portable computing language framework, and the intermediate representation and the runtime library may be compiled into an instruction file using a RISC-V compiler.
Based on this, compiling the runtime library and kernel tasks into instruction files may include:
step 1001: converting the kernel task into an intermediate representation using the portable computing language framework;
step 1002: the intermediate representation and the runtime library are compiled into an instruction file using a RISC-V compiler.
It is worth pointing out that, because this embodiment can be implemented on the basis of the open-source RISC-V specification, OpenCL and its PoCL implementation, it offers higher customizability: users can implement their own functions as required. The use of open-source resources also reduces cost and can improve the openness and iteration speed of the code.
Further, as described above, the kernel task may be a data sending task, and the data transmission parameters may include communication information, a source address, and a data sending length; the communication module in the display card device reads the data to be sent from the memory of the display card device using the source address and the data sending length, and sends the data to be sent to the target display card device using the communication information.
Further, as described above, the kernel task may be a data receiving task, the data transmission parameter may include a data receiving address and a data receiving length, and the display card device migrates the received data with a length of the data receiving length, which is located at the designated location in the memory, to a location in the memory corresponding to the data receiving address according to the data receiving address and the data receiving length; the received data is transmitted by the target display card device, received by the communication module of the display card device and written into the designated position by the communication module of the display card device.
Based on the above embodiments, the display card communication method is described in full below with reference to two specific schematic diagrams. Referring to fig. 6, fig. 6 is a schematic diagram of a data sending flow according to an embodiment of the present invention. When the allocation of all tasks and the thread scheduling calculation are completed, it is first judged whether the data sending task flag bit is 1. If not, the currently executed kernel is not "__send". If it is 1, each Core makes a further judgment according to its own condition: each Core enters a different branch and executes different instructions depending on whether its own ID is 0 (i.e., whether it is the first Core). The Core whose CoreID is 0 is responsible for using one thread to acquire and judge the value at the address corresponding to FLAG, so as to decide whether to transmit and receive data. For the sender, the specific process is as follows:
1) Read the value at the DDR address given by ADDR_SEND_FLAG (the data sending task flag bit address), and judge whether it is 1.
2) If the value at ADDR_SEND_FLAG is 1, sequentially read the values at KERNEL_BASE_ADDR (the kernel base address), KERNEL_BASE_ADDR+4, KERNEL_BASE_ADDR+8, KERNEL_BASE_ADDR+12 and KERNEL_BASE_ADDR+16, that is, obtain the values of the 5 parameters of the __send kernel. For the lower 32 bits of the destination address in the table provided in the above embodiment, a fixed address is agreed upon here, denoted TMP_RCV_ADDR (the destination address); it is the address that the receiver temporarily uses to save the data sent by the sender. This address is located in the high part of the address space, and the PoCL parsing engine ensures that other operations do not occupy this space.
3) Sequentially write the obtained 5 parameters into the corresponding custom CSRs and, in addition, write the fixed value TMP_RCV_ADDR as the lower 32 bits of the destination address; then write valid as 1 to generate an active-high pulse signal; finally write 0 to the data sending task flag bit address, to prevent the next operation from detecting a stale value.
4) After detecting the pulse signal generated by valid, the control module connected to the control status registers of the GPGPU automatically reads the set of written information. This process depends on the modules being located in the same chip; otherwise it cannot be effectively realized. Then, according to this information, kernel direct communication between GPU nodes is performed through modules such as the DMA controller and the idma. The principle and process of the communication operations performed by these modules are described in other patents and are mentioned here only as required for a complete description of the communication process.
5) In addition, the sender also writes 1 to the data sending end flag bit address of the receiver, indicating that the data sending is completed.
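Steps 1) to 3) of the sender flow can be modeled in a few lines. The sketch below is a software simulation under stated assumptions: DDR and the custom CSRs are plain dictionaries, the three address constants are placeholder values, and the CSR names `param0` to `param4`, `dst_lo` and `valid` are hypothetical, since the text does not name the individual registers.

```python
# Placeholder addresses; the real values are not given in the text.
ADDR_SEND_FLAG = 0x0000_0100    # data sending task flag bit address
KERNEL_BASE_ADDR = 0x0000_0200  # base address of the __send kernel parameters
TMP_RCV_ADDR = 0xFFFF_0000      # agreed fixed staging address on the receiver
NUM_PARAMS = 5


def send_flow(core_id, ddr, csrs):
    """Return True if this core handed a transfer to the communication module."""
    if core_id != 0:
        return False                      # only the core with ID 0 drives the transfer
    if ddr.get(ADDR_SEND_FLAG, 0) != 1:
        return False                      # the current kernel is not __send
    # Step 2: read the five 32-bit parameters at BASE, BASE+4, ..., BASE+16.
    params = [ddr[KERNEL_BASE_ADDR + 4 * i] for i in range(NUM_PARAMS)]
    # Step 3: write them to the custom CSRs, fix the destination low word,
    # raise valid, then clear the flag so the next launch sees no stale value.
    for i, value in enumerate(params):
        csrs[f"param{i}"] = value
    csrs["dst_lo"] = TMP_RCV_ADDR
    csrs["valid"] = 1
    ddr[ADDR_SEND_FLAG] = 0
    return True


ddr = {ADDR_SEND_FLAG: 1}
ddr.update({KERNEL_BASE_ADDR + 4 * i: 100 + i for i in range(NUM_PARAMS)})
csrs = {}
skipped = send_flow(1, ddr, csrs)   # a core with a non-zero ID takes the other branch
handled = send_flow(0, ddr, csrs)   # core 0 performs steps 1) to 3)
```

Steps 4) and 5) happen in the communication hardware and on the receiver side, so they are deliberately left out of this model.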
Referring to fig. 7, fig. 7 is a schematic diagram of a data receiving flow according to an embodiment of the invention. When receiving data, the PoCL parsing engine and the GPU runtime library are also required to work cooperatively. The receiver typically writes a kernel task named "__rcv" that contains only two parameters, rcvAddr and rcvLength, where rcvAddr represents the data receiving address and rcvLength represents the data receiving length. The kernel task copies data of length rcvLength from the agreed fixed address TMP_RCV_ADDR to the address rcvAddr. The specific process is as follows:
1) When detecting that the name of the kernel to be launched is __rcv, the PoCL parsing engine running on the host side writes 1 to a specific address (denoted RCV_FLAG, the data receiving task flag bit) in the memory of the display card device serving as the receiver.
2) The runtime library of the display card device detects whether the data receiving task flag bit is 1. If so, the currently executed kernel is __rcv and the flow jumps to step 3; otherwise, the workflow is the same as that for dispatching an ordinary kernel.
3) Wait for the sender to finish sending data, that is, wait for the data sending end flag bit to become 1.
4) Set the data sending end flag bit to 0, and perform thread scheduling on the __rcv kernel task to execute it.
5) After the task is executed, write 0 to the data receiving task flag bit and enter a standby state.
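The receiver steps can likewise be simulated in software. In the sketch below the memory is a `bytearray`, the flags are dictionary entries, and the key `SEND_END_FLAG`, the function name `receive_flow` and the bounded polling (`max_polls`) are assumptions made so that the example terminates; the text itself only says that the receiver waits.

```python
def receive_flow(mem, flags, rcv_addr, rcv_length, tmp_rcv_addr, max_polls=1000):
    """Return True once rcv_length bytes were moved out of the staging area."""
    if flags.get("RCV_FLAG", 0) != 1:
        return False                        # ordinary kernel: normal dispatch path
    polls = 0
    while flags.get("SEND_END_FLAG", 0) != 1:   # step 3: wait for the sender
        polls += 1
        if polls >= max_polls:
            return False                    # bounded wait is an assumption of this sketch
    flags["SEND_END_FLAG"] = 0              # step 4: consume the end-of-sending flag
    # The __rcv kernel itself: copy from the agreed staging address to rcv_addr.
    mem[rcv_addr:rcv_addr + rcv_length] = mem[tmp_rcv_addr:tmp_rcv_addr + rcv_length]
    flags["RCV_FLAG"] = 0                   # step 5: back to standby
    return True


mem = bytearray(64)
mem[32:36] = b"data"                        # bytes already staged by the communication module
flags = {"RCV_FLAG": 1, "SEND_END_FLAG": 1}
done = receive_flow(mem, flags, rcv_addr=0, rcv_length=4, tmp_rcv_addr=32)
```

Clearing both flags at the end mirrors steps 4) and 5), so a later launch cannot act on values left over from this transfer.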
The display card communication device, the display card device, the host device, the display card communication system and the computer readable storage medium provided in the embodiments of the present invention are described below, and the display card communication device, the display card device, the host device, the display card communication system and the computer readable storage medium described below and the display card communication method described above may be referred to correspondingly.
Referring to fig. 8, fig. 8 is a block diagram of a communication device of a graphics card according to an embodiment of the present invention, where the device is applied to a graphics card apparatus, and may include:
a reading module 801, configured to read, when a control instruction issued by a host device is received, a kernel task, kernel task parameters, and a data transmission task flag bit issued by the host device from a memory of the display card device;
and the transmission module 802 is configured to determine, when the data transmission task flag bit is in the valid state, that the kernel task parameter is a data transmission parameter, and perform data transmission with the target display card device through the communication module in the display card device by using the data transmission parameter, or perform data transmission with the target display card device through the communication module in the display card device by using the data transmission parameter and the kernel task.
Optionally, the data transmission task flag bit is a data sending task flag bit, the data transmission parameters include communication information, a source address, and a data sending length, and the transmission module 802 may include:
the register writing sub-module is used for integrating the preset destination address into the data transmission parameters and writing the integrated data transmission parameters into the register group;
the data sending sub-module is used for controlling the communication module to read the data transmission parameters from the register group, so that the communication module reads, from the memory, the data to be sent that is located at the source address and whose length is the data sending length, and sends the data to be sent, according to the communication information, to the location corresponding to the preset destination address in the memory of the target display card device.
Optionally, the register set is a control status register set.
Optionally, the register writing submodule is specifically configured to:
writing each data transmission parameter into a corresponding control state register in the control state register group by utilizing a read-write control state register instruction;
the data sending sub-module is specifically configured to:
controlling the communication module to read each data transmission parameter from each control state register of the control state register group by utilizing a read-set control state register instruction.
Optionally, the read-write control state register instruction and the read-set control state register instruction are compiled from corresponding inline assembly functions in the code file.
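To make the two instruction kinds concrete, the sketch below models them in software. It assumes that the "read-write" and "read-set" control state register instructions correspond to the standard RISC-V `csrrw` and `csrrs` instructions, which the text suggests but does not state outright. The CSR bank is a dictionary, and the CSR number 0x800 is a hypothetical choice for a custom register.

```python
MASK32 = 0xFFFF_FFFF


def csrrw(csrs, addr, value):
    """Read/write: store value in the CSR and return its previous content."""
    old = csrs.get(addr, 0)
    csrs[addr] = value & MASK32
    return old


def csrrs(csrs, addr, set_mask=0):
    """Read and set bits: OR set_mask into the CSR and return its previous content.

    With set_mask == 0 this is a pure read, which is how a CSR value is
    normally fetched with this instruction.
    """
    old = csrs.get(addr, 0)
    if set_mask:
        csrs[addr] = (old | set_mask) & MASK32
    return old


SRC_ADDR_CSR = 0x800                             # hypothetical custom CSR number
csrs = {}
previous = csrrw(csrs, SRC_ADDR_CSR, 0x1000)     # writer side: publish a parameter
read_back = csrrs(csrs, SRC_ADDR_CSR)            # reader side: fetch it unchanged
```

In generated code each of these would typically be a one-line inline assembly wrapper; the Python functions only mirror the register-level effect.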
Optionally, the data sending sub-module is specifically configured to:
determining a target control state register storing valid bits in the control state register set;
the valid bit in the target control state register is modified to a valid state such that the communication module reads the data transfer parameter from the control state register set when the valid bit is detected to be in the valid state.
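The valid-bit handshake can be pictured with a toy model of the communication module. The class name `CommModule`, the CSR names, and the choice to react only to a 0-to-1 transition of the valid bit are assumptions of this sketch; the text states only that the module reads the parameters when it detects the valid bit in the valid state.

```python
class CommModule:
    """Toy model of the module that watches the control state register group."""

    def __init__(self):
        self._last_valid = 0
        self.latched = None

    def poll(self, csrs):
        """Latch the parameters on a 0 -> 1 transition of the valid bit."""
        valid = csrs.get("valid", 0)
        rising = valid == 1 and self._last_valid == 0
        if rising:
            # Copy every parameter register except the valid bit itself.
            self.latched = {k: v for k, v in csrs.items() if k != "valid"}
        self._last_valid = valid
        return rising


csrs = {"src": 0x1000, "length": 64, "valid": 0}
module = CommModule()
before = module.poll(csrs)    # valid still 0: nothing is latched
csrs["valid"] = 1
first = module.poll(csrs)     # transition seen: parameters are latched
second = module.poll(csrs)    # valid still 1: not treated as a new request
```

Reacting to the transition rather than the level keeps one write of the valid bit from starting the same transfer twice.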
Optionally, the register writing submodule is specifically configured to:
integrating the preset destination address into the data transmission parameters by using a computing core with a core number being a designated number, and writing the integrated data transmission parameters into a register group;
the data sending sub-module is specifically configured to:
controlling the communication module, by using the computing core whose core number is the designated number, to read the data transmission parameters from the register group.
Optionally, the transmission module 802 may further include:
the data sending end flag bit sending sub-module is used for controlling the communication module to send a data sending end flag bit in a valid state to the target display card device when the sending of the data to be sent is completed, so that the target display card device determines that the data sending is completed.
Optionally, the transmission module 802 may further include:
the data sending task flag bit adjusting sub-module is used for adjusting the data sending task flag bit to an invalid state when the sending of the data to be sent is completed.
Optionally, the data transmission task flag is a data reception task flag, and the data transmission parameter includes a data reception address and a data reception length, and the transmission module 802 may include:
the kernel execution sub-module is used for executing the kernel task to migrate the received data, which is located at a designated location in the memory and whose length is the data receiving length, to the location corresponding to the data receiving address in the memory; the received data is sent by the target display card device, received by the communication module of the local end, and written into the designated location by the communication module of the local end.
Optionally, the transmission module 802 may further include:
the data sending end flag bit reading sub-module is used for reading the data sending end flag bit from the memory;
the waiting sub-module is used for waiting, when the data sending end flag bit is determined to be in an invalid state, for the target display card device to modify the data sending end flag bit into a valid state through the communication module of the local end;
the kernel execution sub-module is specifically configured to execute the kernel task when it is determined that the data sending end flag bit is in a valid state.
Optionally, the transmission module 802 may further include:
and the data receiving task flag bit adjusting sub-module is used for adjusting the data receiving task flag bit to an invalid state when the kernel task is completed.
Optionally, the reading module 801 is specifically configured to:
reading an instruction file issued by the host device from a memory of the display card device; the instruction file is compiled by using a kernel task and a runtime library, and the instruction contained in the instruction file is used for executing the steps of determining the kernel task parameter as the data transmission parameter when determining that the data transmission task flag bit is in an effective state, and transmitting data with the target display card device by using the data transmission parameter through a communication module in the display card device or transmitting data with the target display card device by using the data transmission parameter, the kernel task and through the communication module in the display card device.
Referring to fig. 9, fig. 9 is a block diagram of another display card communication apparatus according to an embodiment of the present invention, where the apparatus is applied to a host device, and may include:
the receiving module 901 is used for receiving an input kernel task;
The issuing module 902 is configured to write a kernel task and kernel task parameters into a memory of the display card device, and adjust a data transmission task flag bit in the memory to an effective state when determining that the kernel task name is a data transmission task name;
the control module 903 is configured to control the graphics card device to execute a kernel task, so that when the graphics card device determines that the flag bit of the data transmission task is in an effective state, determine a kernel task parameter as a data transmission parameter, and perform data transmission with the target graphics card device through a communication module in the graphics card device by using the data transmission parameter, or perform data transmission with the target graphics card device through the communication module in the graphics card device by using the data transmission parameter and the kernel task.
Optionally, the kernel task is a data sending task, the data transmission parameters include communication information, a source address and a data sending length, and the communication module in the display card device reads data to be sent from the memory of the display card device by using the source address and the data sending length, and sends the data to be sent to the target display card device by using the communication information.
Optionally, the kernel task is a data receiving task, the data transmission parameters include a data receiving address and a data receiving length, and the display card device migrates the received data with the length of the data receiving length at the designated position in the memory to a position corresponding to the data receiving address in the memory according to the data receiving address and the data receiving length; the received data is transmitted by the target display card device, received by the communication module of the display card device and written into the designated position by the communication module of the display card device.
Optionally, the control module 903 includes:
the compiling sub-module is used for compiling the runtime library and the kernel task into instruction files and writing the instruction files into a memory of the display card device; the instruction in the instruction file is used for executing the steps of determining the kernel task parameter as the data transmission parameter when the data transmission task flag bit is determined to be in an effective state, and performing data transmission with the target display card device through the communication module in the display card device by utilizing the data transmission parameter or performing data transmission with the target display card device through the communication module in the display card device by utilizing the data transmission parameter and the kernel task;
and the control sub-module is used for controlling the display card equipment to run the instruction file.
Optionally, the compiling sub-module may include:
the conversion unit is used for converting the kernel task into an intermediate representation by utilizing the portable computing language framework;
and the compiling unit is used for compiling the intermediate representation and the runtime library into an instruction file by using a RISC-V compiler.
Referring to fig. 10, fig. 10 is a block diagram of a graphics card device according to an embodiment of the present invention, where the device may include:
a memory 1001 for storing a computer program;
a processor 1002 for implementing the graphics card communication method as described above when executing the computer program;
And the communication module 1003 is used for carrying out data transmission with the target display card device under the control of the processor or carrying out data transmission with the target display card device under the control of the target display card device.
For the specific process of the above-mentioned video card communication method, reference may be made to the corresponding content provided in the foregoing embodiment, and no further description is given here.
Referring to fig. 11, fig. 11 is a block diagram illustrating a configuration of a host device according to an embodiment of the present invention, and the embodiment of the present invention provides a host device 110, including a processor 111 and a memory 112; wherein the memory 112 is used for storing a computer program; the processor 111 is configured to execute the graphics card communication method provided in the foregoing embodiment when executing the computer program.
For the specific process of the above-mentioned video card communication method, reference may be made to the corresponding content provided in the foregoing embodiment, and no further description is given here.
The memory 112 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the storage mode may be transient storage or permanent storage.
In addition, the host device 110 further includes a power supply 113, a communication interface 114, an input-output interface 115, and a communication bus 116; wherein, the power supply 113 is used for providing working voltage for each hardware device on the host device 110; the communication interface 114 can create a data transmission channel between the host device 110 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present invention, which is not specifically limited herein; the input/output interface 115 is used for obtaining external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
Referring to fig. 12, fig. 12 is a block diagram of a graphics card communication system according to an embodiment of the present invention, where the system may include:
a graphic card device 1201 for executing a graphic card communication method applied to the graphic card device;
host device 1202 for executing a graphics card communication method applied to the host device.
For the specific process of the above-mentioned video card communication method, reference may be made to the corresponding content provided in the foregoing embodiment, and no further description is given here.
The embodiment of the invention also provides a computer readable storage medium, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the display card communication method of any embodiment are realized.
Since the embodiments of the computer readable storage medium portion and the embodiments of the video card communication method portion correspond to each other, the embodiments of the storage medium portion are referred to the description of the embodiments of the video card communication method portion, and are not repeated herein.
In the description, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and for relevant details reference may be made to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The display card communication method, the device, the display card equipment, the host equipment, the system and the medium provided by the invention are described in detail. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (24)

1. A graphics card communication method, applied to a graphics card device, comprising:
when a control instruction issued by a host device is received, reading a kernel task, kernel task parameters and a data transmission task flag bit issued by the host device from a memory of the display card device;
when the data transmission task flag bit is in a valid state, determining the kernel task parameter as a data transmission parameter, and performing data transmission with the target display card device by using the data transmission parameter and through a communication module in the display card device, or performing data transmission with the target display card device by using the data transmission parameter, the kernel task and through the communication module in the display card device.
2. The display card communication method according to claim 1, wherein the data transmission task flag bit is a data sending task flag bit, the data transmission parameter includes communication information, a source address, and a data sending length, and the performing data transmission with a target display card device by using the data transmission parameter and through a communication module in the display card device includes:
integrating a preset destination address into the data transmission parameters, and writing the integrated data transmission parameters into a register group;
and controlling the communication module to read the data transmission parameters from the register group, so that the communication module reads, from the memory, the data to be sent that is located at the source address and whose length is the data sending length, and sends the data to be sent, according to the communication information, to the location corresponding to the preset destination address in the memory of the target display card device.
3. The graphics card communication method of claim 2, wherein the register set is a control status register set.
4. The display card communication method according to claim 3, wherein writing the integrated data transmission parameters into the register set includes:
writing each data transmission parameter into a corresponding control state register in the control state register group by utilizing a read-write control state register instruction;
the controlling the communication module to read the data transmission parameters from the register set includes:
and controlling the communication module to read each data transmission parameter from each control state register of the control state register set by utilizing a read set control state register instruction.
5. The graphics card communication method of claim 4, wherein the read-write control state register instruction and the read-set control state register instruction are compiled from corresponding inline assembler functions in a code file.
6. The graphics card communication method according to claim 4, wherein the controlling the communication module to read the data transmission parameters from the register set includes:
determining a target control state register storing valid bits in the control state register set;
and modifying the valid bit in the target control state register into a valid state, so that the communication module reads the data transmission parameter from the control state register set when detecting that the valid bit is in the valid state.
7. The method according to claim 2, wherein integrating the preset destination address into the data transmission parameters and writing the integrated data transmission parameters into the register set comprises:
integrating the preset destination address into the data transmission parameters by using a computing core with a core number being a designated number, and writing the integrated data transmission parameters into a register group;
The controlling the communication module to read the data transmission parameters from the register set includes:
and controlling the communication module to read the data transmission parameters from the register group by using the calculation core with the core number being the designated number.
8. The display card communication method according to claim 2, further comprising, after sending the data to be sent to the location corresponding to the preset destination address in the memory of the target display card device according to the communication information:
when the sending of the data to be sent is completed, controlling the communication module to send a data sending end flag bit in a valid state to the target display card device, so that the target display card device determines that the data sending is completed.
9. The display card communication method according to claim 2, further comprising, after sending the data to be sent to the location corresponding to the preset destination address in the memory of the target display card device according to the communication information:
when the sending of the data to be sent is completed, adjusting the data sending task flag bit to an invalid state.
10. The display card communication method according to claim 1, wherein the data transmission task flag bit is a data reception task flag bit, the data transmission parameter includes a data reception address and a data reception length, and the data transmission is performed with the target display card device by using the data transmission parameter, the kernel task and through a communication module in the display card device, comprising:
executing the kernel task to migrate the received data, which is located at a designated location in the memory and whose length is the data receiving length, to the location corresponding to the data receiving address in the memory; the received data is sent by the target display card device, received by the communication module of the local end, and written into the designated location by the communication module of the local end.
11. The graphics card communication method of claim 10, further comprising, prior to performing the kernel task:
reading a data sending end flag bit from the memory;
executing the kernel task when the data sending end flag bit is determined to be in a valid state;
and when the data sending end flag bit is determined to be in an invalid state, waiting for the target display card device to modify the data sending end flag bit into a valid state through the communication module of the local end.
12. The graphics card communication method of claim 10, further comprising:
and when the kernel task is completed, the data receiving task flag bit is adjusted to be in an invalid state.
13. The graphics card communication method according to any one of claims 1 to 12, wherein reading, from the memory of the display card device, the kernel task issued by the host device comprises:
reading, from the memory of the display card device, an instruction file issued by the host device; wherein the instruction file is obtained by compiling the kernel task together with a runtime library, and the instructions contained in the instruction file are used to perform the steps of determining the kernel task parameter as the data transmission parameter when the data transmission task flag bit is determined to be in a valid state, and performing data transmission with the target display card device either by using the data transmission parameter through the communication module, or by using the data transmission parameter and the kernel task through the communication module in the display card device.
14. A graphics card communication method, applied to a host device, the method comprising:
receiving an input kernel task;
writing the kernel task and a kernel task parameter into a memory of a display card device, and setting a data transmission task flag bit in the memory to a valid state when a name of the kernel task is determined to be a data transmission task name;
and controlling the display card device to execute the kernel task, so that the display card device, upon determining that the data transmission task flag bit is in a valid state, determines the kernel task parameter as a data transmission parameter and performs data transmission with a target display card device either by using the data transmission parameter through a communication module in the display card device, or by using the data transmission parameter and the kernel task through the communication module in the display card device.
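The host-side steps of claim 14 can be sketched in Python as follows. The `DeviceMemory` class, the task names `gpu_send` and `gpu_recv`, and the parameter tuples are hypothetical stand-ins; the claim does not say how task names are matched or how the memory regions are laid out.

```python
from dataclasses import dataclass

# Assumed names that mark a kernel task as a data transmission task.
DATA_TRANSMISSION_TASK_NAMES = {"gpu_send", "gpu_recv"}


@dataclass
class DeviceMemory:
    """Toy stand-in for the regions of display card memory the host writes to."""
    kernel_task: str = ""
    kernel_params: tuple = ()
    transmission_flag_valid: bool = False


def issue_kernel_task(mem: DeviceMemory, name: str, params: tuple) -> None:
    """Write the task and its parameters, then set the flag only for transmission tasks."""
    mem.kernel_task = name
    mem.kernel_params = params
    if name in DATA_TRANSMISSION_TASK_NAMES:
        mem.transmission_flag_valid = True


mem_send = DeviceMemory()
issue_kernel_task(mem_send, "gpu_send", ("target-gpu-1", 0x1000, 256))

mem_compute = DeviceMemory()
issue_kernel_task(mem_compute, "vector_add", (0x2000, 0x3000, 64))
```

Only the first call sets the flag, so an ordinary compute kernel such as the hypothetical `vector_add` leaves the flag untouched.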
15. The display card communication method according to claim 14, wherein the kernel task is a data transmission task, the data transmission parameter includes communication information, a source address and a data transmission length, and the communication module in the display card device reads the data to be transmitted from the memory of the display card device by using the source address and the data transmission length, and transmits the data to be transmitted to the target display card device by using the communication information.
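The send path of claim 15 might be modelled as below. The `SendParams` tuple, the `LoopbackCommModule` class, and the string used as communication information are assumptions for illustration; a real communication module would push the payload onto a network link rather than record it in a list.

```python
from typing import List, NamedTuple, Tuple


class SendParams(NamedTuple):
    comm_info: str  # assumed to identify the target device
    src_addr: int   # source address in local device memory
    length: int     # data transmission length in bytes


class LoopbackCommModule:
    """Hypothetical communication module that records what it would send."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.sent: List[Tuple[str, bytes]] = []

    def send(self, params: SendParams) -> None:
        end = params.src_addr + params.length
        if params.src_addr < 0 or params.length < 0 or end > len(self.memory):
            raise ValueError("source range outside device memory")
        # Read the data to be transmitted using the source address and length.
        payload = bytes(self.memory[params.src_addr:end])
        # Route it using the communication information.
        self.sent.append((params.comm_info, payload))


device_memory = bytearray(b"\x00" * 16 + b"payload!" + b"\x00" * 8)
comm = LoopbackCommModule(device_memory)
comm.send(SendParams(comm_info="target-gpu-1", src_addr=16, length=8))
```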
16. The display card communication method according to claim 14, wherein the kernel task is a data receiving task, the data transmission parameter includes a data receiving address and a data receiving length, and the display card device migrates, according to the data receiving address and the data receiving length, received data that has a length equal to the data receiving length and is located at a designated location in the memory to the location in the memory corresponding to the data receiving address; wherein the received data is sent by the target display card device, received by the communication module of the display card device, and written to the designated location by the communication module of the display card device.
17. The graphics card communication method according to any one of claims 14 to 16, wherein controlling the display card device to execute the kernel task comprises:
compiling a runtime library and the kernel task into an instruction file, and writing the instruction file into the memory of the display card device; wherein the instructions in the instruction file are used to perform the steps of determining the kernel task parameter as a data transmission parameter when the data transmission task flag bit is determined to be in a valid state, and performing data transmission with the target display card device either by using the data transmission parameter through the communication module in the display card device, or by using the data transmission parameter and the kernel task through the communication module in the display card device;
and controlling the display card device to run the instruction file.
18. The graphics card communication method according to claim 17, wherein compiling the runtime library and the kernel task into the instruction file comprises:
converting the kernel task into an intermediate representation using a portable computing language framework;
and compiling the intermediate representation and the runtime library into the instruction file using a RISC-V compiler.
19. A graphics card communication apparatus, applied to a display card device, the apparatus comprising:
a reading module, configured to read, from a memory of the display card device, a kernel task, a kernel task parameter and a data transmission task flag bit issued by a host device, upon receiving a control instruction issued by the host device;
and a transmission module, configured to determine the kernel task parameter as a data transmission parameter when the data transmission task flag bit is in a valid state, and to perform data transmission with a target display card device either by using the data transmission parameter through a communication module in the display card device, or by using the data transmission parameter and the kernel task through the communication module in the display card device.
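The flag-based routing performed by the transmission module can be sketched as a small dispatch function. The function name `run_task`, the callback interface, and the treatment of the invalid-flag case as an ordinary compute kernel are assumptions; the claim itself only describes the valid-flag branch.

```python
from typing import Callable, List, Tuple


def run_task(flag_valid: bool,
             params: tuple,
             transmit: Callable[[tuple], None],
             compute: Callable[[tuple], None]) -> str:
    """Route the kernel task parameters based on the data transmission task flag bit."""
    if flag_valid:
        # Flag valid: the kernel task parameters are used as data transmission parameters.
        transmit(params)
        return "transmit"
    # Assumed fallback: with the flag invalid, run the task as a normal kernel.
    compute(params)
    return "compute"


log: List[Tuple[str, tuple]] = []
first = run_task(True, ("target-gpu-1", 0x1000, 256),
                 lambda p: log.append(("tx", p)),
                 lambda p: log.append(("run", p)))
second = run_task(False, (0x2000, 64),
                  lambda p: log.append(("tx", p)),
                  lambda p: log.append(("run", p)))
```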
20. A graphics card communication apparatus, applied to a host device, comprising:
a receiving module, configured to receive an input kernel task;
an issuing module, configured to write the kernel task and a kernel task parameter into a memory of a display card device, and to set a data transmission task flag bit in the memory to a valid state when a name of the kernel task is determined to be a data transmission task name;
and a control module, configured to control the display card device to execute the kernel task, so that the display card device, upon determining that the data transmission task flag bit is in a valid state, determines the kernel task parameter as a data transmission parameter and performs data transmission with a target display card device either by using the data transmission parameter through a communication module in the display card device, or by using the data transmission parameter and the kernel task through the communication module in the display card device.
21. A display card device, comprising:
a memory for storing a computer program;
a processor for implementing the graphics card communication method of any one of claims 1 to 13 when executing the computer program;
and a communication module, configured to perform data transmission with a target display card device under the control of the processor, or to perform data transmission with the target display card device under the control of the target display card device.
22. A host device, comprising:
a memory for storing a computer program;
a processor for implementing the graphics card communication method of any one of claims 14 to 18 when executing the computer program.
23. A graphics card communication system, comprising:
a display card device for performing the graphics card communication method according to any one of claims 1 to 13;
and a host device for performing the graphics card communication method according to any one of claims 14 to 18.
24. A computer readable storage medium having stored therein computer executable instructions which, when loaded and executed by a processor, implement the graphics card communication method of any one of claims 1 to 13 or the graphics card communication method of any one of claims 14 to 18.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311279144.0A CN117251406A (en) 2023-09-28 2023-09-28 Display card communication method and device, display card equipment, host equipment, system and medium

Publications (1)

Publication Number Publication Date
CN117251406A (en) 2023-12-19

Family

ID=89131086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311279144.0A Pending CN117251406A (en) 2023-09-28 2023-09-28 Display card communication method and device, display card equipment, host equipment, system and medium

Country Status (1)

Country Link
CN (1) CN117251406A (en)

Similar Documents

Publication Publication Date Title
CN107341053B (en) Heterogeneous multi-core programmable system and memory configuration and programming method of computing unit thereof
CN100565472C (en) Debugging method applicable to multiprocessor-core system chips
CN101221541B (en) Programmable communication controller for SOC and its programming model
CN110427337B (en) Processor core based on field programmable gate array and operation method thereof
CN103714026B (en) A kind of memory access method supporting former address data exchange and device
KR20100053593A (en) Mechanism for broadcasting system management interrupts to other processors in a computer system
JPS62206658A (en) Memory controller
US11880684B2 (en) RISC-V-based artificial intelligence inference method and system
KR20010051991A (en) Flexible general-purpose input/output system
CN100573500C (en) Stream processor IP core based on the Avalon bus
CN102566655B (en) Dynamic bus frequency modulation method of off-chip memory and system thereof
GB2377138A (en) Ring Bus Structure For System On Chip Integrated Circuits
CN110737618B (en) Method, device and storage medium for embedded processor to carry out rapid data communication
CN117251406A (en) Display card communication method and device, display card equipment, host equipment, system and medium
WO2021179411A1 (en) Quantum computing-oriented data interaction device, method and apparatus and medium
CN112559403B (en) Processor and interrupt controller therein
EP1058189B1 (en) Microcomputer with debugging system
US7254667B2 (en) Data transfer between an external data source and a memory associated with a data processor
JP4008911B2 (en) Control device
CN113806282A (en) Heterogeneous control system and loading method thereof
CN101539849B (en) Processor and gating method of register
CN109918321B (en) PCIe bus-based online reconstruction method
CN112035394B (en) Storage device of multi-core processor for real-time processing and data processing method
CN117149680B (en) Main control board for uploading sub-module log of chip mounter and uploading method
CN113672554B (en) Processor core, processor, system on chip and debug system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination