WO2021082990A1 - Multi-chip interconnection system based on a PCIE bus - Google Patents

Multi-chip interconnection system based on a PCIE bus

Info

Publication number
WO2021082990A1
WO2021082990A1 · PCT/CN2020/122248 · CN2020122248W
Authority
WO
WIPO (PCT)
Prior art keywords
processor
accelerator
data
pcie bus
pcie
Prior art date
Application number
PCT/CN2020/122248
Other languages
English (en)
French (fr)
Inventor
郭述帆
吴永航
蔡坤炎
李仕峰
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to US17/771,549 priority Critical patent/US20220365898A1/en
Priority to EP20880510.1A priority patent/EP4053709A4/en
Publication of WO2021082990A1 publication Critical patent/WO2021082990A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/382Information transfer, e.g. on bus using universal interface adapter
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7864Architectures of general purpose stored program computers comprising a single central processing unit with memory on more than one IC chip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026PCI express

Definitions

  • the present disclosure relates to the field of hardware system solutions, and in particular to a multi-type chip interconnection system based on the PCIE bus.
  • NPU (Neural network Processing Unit)
  • the existing computing chips are all fixed-function computing devices, which cannot be flexibly combined or tailored.
  • FPGA (Field Programmable Gate Array)
  • PCIE (Peripheral Component Interconnect Express)
  • the processor is interconnected with various high-speed peripherals through a tree-shaped PCIE bus structure to achieve high-speed data transmission.
  • FIG. 1 is a schematic diagram of a tree-shaped PCIE bus between a processor and an accelerator in the prior art.
  • the embodiments of the present disclosure provide a PCIE bus-based multi-chip interconnection system and a method for data cooperative processing, so as to at least solve the problem that the data coordination between multiprocessors and accelerators cannot be efficiently realized in related technologies.
  • a PCIE bus-based multi-chip interconnection system is provided, including: N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each accelerator includes at least two endpoints, and each processor includes a root node, where one of the endpoints and one of the root nodes are connected by a PCIE bus, so that at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.
  • a method for data coordination processing is provided, including: a first processor initiates a read-write access request to an accelerator, where the root node in the first processor is connected to one endpoint of the accelerator through a first PCIE bus, the accelerator includes at least two endpoints and is connected to at least two processors through those endpoints, and the at least two processors include the first processor; after the root node in the first processor converts the read-write access request into a first PCIE bus-domain access address, it sends the request to the accelerator, so that the first processor performs data access to the accelerator.
  • a method for data coordination processing is provided, including: an accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes a root node, one of the endpoints and one of the root nodes being connected through a PCIE bus, M being a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include a first processor; the accelerator receives a first PCIE bus-domain access address, where the first PCIE bus-domain access address is the address into which the first processor converted a read-write access request initiated by the first processor, so that the accelerator can perform data access to the first processor.
  • a storage medium in which a computer program is stored, where the computer program is set to execute, when run, the steps in any one of the foregoing data coordination processing method embodiments.
  • an electronic device including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the foregoing data coordination processing method embodiments.
  • a PCIE bus-based multi-chip interconnection system includes: N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each accelerator includes at least two endpoints, and each processor includes a root node, where one endpoint and one root node are connected through a PCIE bus, so that at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.
  • Fig. 1 is a schematic diagram of a tree-shaped PCIE bus between a processor and an accelerator in the prior art
  • FIG. 2 is a schematic diagram of a multi-chip interconnection system based on PCIE bus according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of the connection between an accelerator and a processor in a system according to an embodiment of the present disclosure
  • Fig. 4 is a flowchart of a method for data coordination processing according to an embodiment of the present disclosure
  • Fig. 5 is a flowchart of another method for data coordination processing according to an embodiment of the present disclosure.
  • Fig. 6 is a schematic diagram of the connection between the processor and the accelerator according to a preferred embodiment of the present disclosure
  • Fig. 7 is a schematic diagram of processors accessing each other's data across PCIE according to a preferred embodiment of the present disclosure
  • Fig. 8 is a data processing flowchart based on the PCIE star bus structure according to a preferred embodiment of the present disclosure
  • Fig. 9 is a schematic diagram of a data flow based on a three-processor three-PCIE star structure according to a preferred embodiment of the present disclosure.
  • an embodiment of a multi-chip interconnection system based on a PCIE bus is provided.
  • a schematic diagram of a multi-chip interconnection system based on the PCIE bus is provided.
  • the system includes: N accelerators, M processors, and M PCIE buses, where N and M are both positive integers, and M is greater than N.
  • Each accelerator of the N accelerators includes at least two endpoints, and each of the M processors includes a root node, where one endpoint and one root node are connected through a PCIE bus, so that at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.
  • a communication mechanism may or may not be established between the N accelerators in the system.
  • communication can be established in any existing manner, which is not specifically limited here.
  • for example, when N is 2, the system contains two structures of the kind shown in FIG. 3, and communication may or may not be established between the two accelerators.
  • by analogy, when N is 3, communication can be established between two of the three accelerators, between all three, or between none of them.
  • for example, accelerator 1 and accelerator 2 can establish communication while accelerator 3 has none; accelerator 1 can establish communication with both accelerator 2 and accelerator 3; or accelerator 1 can communicate with accelerator 2 and accelerator 2 with accelerator 3, with no communication established between accelerator 1 and accelerator 3.
  • FIG. 3 is a schematic structural diagram of the connection between the accelerator and the processor in the system.
  • the structure may include: an accelerator 11, a processor 13, and a PCIE bus 15.
  • the accelerator 11 includes at least one endpoint, wherein the endpoint is configured to be connected to the processor 13.
  • the processor 13 includes a root node, where the root node is connected to an endpoint in the accelerator 11 through a PCIE bus 15.
  • when multiple processors are connected to the accelerator, each processor is connected to the accelerator through its own PCIE bus; for example, processor 1 is connected to the accelerator through PCIE bus 1, and processor 2 through PCIE bus 2.
  • the accelerator 11 includes at least two endpoints, where the endpoints are connected to processors 13; each processor 13 includes a root node, where the root node is connected through a PCIE bus 15 to an endpoint in the accelerator 11. Because the system is centered on the accelerator and connected with multiple processors to form a star-shaped PCIE computing structure, data coordination between the multiple processors and the accelerator can be completed without additional high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.
  • when N accelerators are included in the system, each of the N accelerators is connected to at least two processors.
  • communication between multiple accelerators can be achieved through Ethernet, or the accelerators can be connected through a PCIE bus, and so on.
  • data interaction between multiple accelerators and multiple processors can thus be realized, thereby increasing the overall data computation rate.
  • the foregoing system may further include: a processor, which is further configured to allocate PCIE bus domain access addresses to the endpoints of the accelerator after the multi-chip interconnection system is powered on.
  • an embodiment of a method for data coordination processing is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical sequence is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
  • Fig. 4 is a flowchart of a data coordination processing method according to an embodiment of the present disclosure. As shown in Fig. 4, the data coordination processing method includes the following steps:
  • in step S402, the first processor initiates a read-write access request to the accelerator, where the root node of the first processor is connected to one endpoint of the accelerator through the first PCIE bus, the accelerator includes at least two endpoints and is connected to at least two processors through them, and the at least two processors include the first processor.
  • in step S404, the first processor converts the read-write access request into the first PCIE bus-domain access address and sends it to the accelerator, so that the first processor performs data access to the accelerator.
  • the first processor's data access to the accelerator may include, but is not limited to, configuring and reading/writing data in the accelerator.
  • through the above steps, the first processor initiates a read-write access request to the accelerator, where the root node of the first processor is connected to one endpoint of the accelerator through the first PCIE bus, the accelerator includes at least two endpoints, and the accelerator is connected to at least two processors through those endpoints.
  • the at least two processors include the first processor; after the first processor converts the read-write access request into the first PCIE bus-domain access address, it sends the request to the accelerator so that the first processor performs data access to the accelerator. Data coordination between multiple processors and the accelerator can thus be completed without additional high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.
  • the above method may further include: the first processor converts the read-write access request into a first PCIE bus-domain access address; when the converted first PCIE bus-domain access address falls into the domain space of the second processor, the first PCIE bus-domain access address is converted into a second-processor domain-space access address, so that the first processor can access the data of the second processor.
  • the first processor accesses the data of the second processor.
  • the second processor can also access the data of the first processor.
  • the access by the first processor to the data of the second processor may include: the accelerator receives the first data sent by the first processor and processes the first data to obtain second data; the accelerator notifies the second processor of the result of processing the first data, where there are one or more second processors; the accelerator receives a second-data request sent by the second processor; the accelerator, in response to the second-data request, sends the second data to the second processor.
  • the first processor and one or more second processors can access each other across PCIE.
  • the accelerator sending the second data to the second processor in response to the second-data request may include: the accelerator receives the first data sent by the first processor through the first PCIE bus; after processing the first data, the accelerator obtains the second data and saves it; the accelerator notifies the second processor that the second data has been obtained; the accelerator sends the second data to the second processor through the second PCIE bus. Mutual access of data between the first processor and the second processor across PCIE buses is thereby realized.
  • FIG. 5 is a flowchart of a data coordination processing method according to an embodiment of the present disclosure. As shown in FIG. 5, the data coordination processing method includes the following steps:
  • Step 502 The accelerator establishes a connection with M processors, where the accelerator includes: at least M endpoints, and each processor includes: a root node, where one endpoint and one root node are connected through a PCIE bus, M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include: the first processor.
  • in step 504, the accelerator receives the first PCIE bus-domain access address, where the first PCIE bus-domain access address is the address into which the first processor converted the read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.
  • in other words, the first PCIE bus-domain access address received by the accelerator is the address into which the first processor converted the read-write access request it itself initiated.
  • before the accelerator receives the read-write access request sent by the first processor, the above method may further include: the accelerator sends a first PCIE bus access-domain address, and when the first PCIE bus access-domain address is converted into the domain-space access address of the first processor, the accelerator can access the first processor.
  • through the above steps, the accelerator establishes connections with the M processors, where the accelerator includes at least two endpoints and each processor includes a root node, one endpoint and one root node being connected through a PCIE bus, so that at least two endpoints of the accelerator are connected to at least two processors through different PCIE buses.
  • the at least two processors include the first processor.
  • the accelerator receives a first PCIE bus-domain access address, which is the address into which the root node in the first processor converted the read-write access request, so that the accelerator can access data of the first processor; data coordination between multiple processors and the accelerator can thus be completed without additional high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.
  • the present disclosure provides a preferred embodiment, and provides a method and corresponding device for star interconnection of multiple types of chips based on the PCIE bus.
  • the preferred embodiment adopts the following technical solution: the system is divided into two parts, a processor chip and an accelerator chip.
  • the core of the processor chip includes, but is not limited to, X86 and ARM processors; it must support the PCIE function and is used as the Root Complex (RC) of the PCIE bus.
  • the accelerator chip includes the accelerator modules and supporting memory units required by various computing scenarios, and is used as the End Point (EP) of the PCIE bus.
  • multiple RCs are all connected to the EP through the PCIE bus, forming a star topology with the accelerator as the center and the processors as radiating endpoints, in which the number of processor nodes can be greater than or equal to 2.
  • FIG. 6 is a schematic diagram of the connection between the processors and the accelerator according to a preferred embodiment of the present disclosure.
  • after the system is powered on, the RC and the EP perform their respective initialization operations; after the processor starts, it scans for PCIE devices and assigns PCIE-domain access window addresses to the EP in the memory domain.
  • the process of the processor accessing the accelerator domain space is as follows: the accelerator device is the EP end of the PCIE bus; the processor issues a read-write access to the accelerator space, which the RC converts into a PCIE bus-domain access that reaches the accelerator, realizing configuration and data read/write of the accelerator.
  • a PCIE bus-domain access issued by the accelerator can likewise be converted into the processor domain through the RC, realizing reads and writes of the processor domain space.
  • this conversion process is the reverse of the processor accessing the accelerator domain.
  • processors access each other across PCIE as follows: an access issued by processor #1 is first converted by RC#1 into a PCIE#1-domain access; if the converted address falls into the PCIE#2 domain space, RC#2 then converts it into a processor #2 memory-domain access, realizing processor #1's reads and writes of the processor #2 space.
  • the conversion for processor #2 accessing the processor #1 space is similar but in the opposite direction, as shown in FIG. 7, a schematic diagram of processors accessing each other's data across PCIE.
  • Step 1: plan the star topology connection structure of the processors and the accelerator;
  • Step 2: the system is powered on, each module starts and initializes, and the processors scan the PCIE peripherals and allocate access address space;
  • Step 3: the processors initialize the accelerator;
  • Step 4: each processor delivers data to the accelerator module; after processing, the accelerator returns the result and notifies the target processor; each processor obtains the final result by consuming the data produced by the accelerator.
  • FIG. 9 is a schematic diagram of a data flow based on a three-processor, three-PCIE star structure in a preferred embodiment of the present disclosure.
  • the processor #1 obtains the basic data and sends it to the accelerator #1 via PCIE #1 for calculation.
  • the accelerator sends the processed data to the accelerator memory space and informs the processors #2 and #3.
  • after processors #2 and #3 receive the message, they fetch the data through PCIE#2 and PCIE#3 and perform the next stage of analysis and processing.
  • finally, processor #2 returns its result to processor #1 through PCIE#2 and PCIE#1, and processor #3 through PCIE#3 and PCIE#1.
  • this structure is a concrete implementation of data production and consumption under a complete star-shaped PCIE structure.
  • the methods according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • the part of the technical solution of the present disclosure that is essential, or that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.
  • the embodiment of the present disclosure also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • the aforementioned storage medium may be configured to store a computer program for executing the following steps:
  • the accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes a root node, where an endpoint and a root node are connected through a PCIE bus, and M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include the first processor;
  • the accelerator receives the first PCIE bus-domain access address, where the first bus-domain access address is the first PCIE bus-domain access address converted by the first processor from the read-write access request initiated by the first processor, so that the accelerator can perform data access to the first processor.
  • the foregoing storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or other media that can store a computer program.
  • An embodiment of the present disclosure also provides an electronic device, including a memory and a processor, the memory is stored with a computer program, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.
  • the aforementioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the aforementioned processor, and the input-output device is connected to the aforementioned processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • the accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes a root node, where an endpoint and a root node are connected through a PCIE bus, and M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include the first processor;
  • the accelerator receives the first PCIE bus-domain access address, where the first PCIE bus-domain access address is the first PCIE bus-domain access address converted by the first processor from the read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.
  • modules or steps of the present disclosure can be implemented by a general computing device, and they can be concentrated on a single computing device or distributed in a network composed of multiple computing devices.
  • they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be executed in an order different from that described here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bus Control (AREA)

Abstract

A multi-chip interconnection system based on a PCIE bus, including: N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each accelerator includes at least two endpoints, and each processor includes one root node, where one endpoint and one root node are connected by one PCIE bus, so that the at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses. Because the system is centered on the accelerator and connected to multiple processors to form a star-shaped PCIE computing structure, data coordination between the multiple processors and the accelerator can be completed without adding extra high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.

Description

MULTI-CHIP INTERCONNECTION SYSTEM BASED ON A PCIE BUS

Technical Field

The present disclosure relates to the field of hardware system solutions, and in particular to a multi-type chip interconnection system based on the PCIE bus.
Background

Hardware performance is improving rapidly, and computing scenarios are becoming increasingly specialized. Artificial intelligence, autonomous driving, and 5G communication each place different demands on hardware: some emphasize the performance of the NPU (Neural network Processing Unit), some weigh sensor stability and the correctness of decision systems, and some focus on transmission bandwidth and latency.

Existing computing chips are fixed-function computing devices that cannot be flexibly combined or tailored. Although an FPGA (Field Programmable Gate Array) allows hardware programming, its price is a burden most equipment vendors can hardly bear.

PCIE (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard proposed by Intel; version 4.0 reaches a rate of 16 GT/s, meeting most high-speed data transmission needs. The processor interconnects with high-speed peripherals through a tree-shaped PCIE bus structure to achieve high-speed data transmission. FIG. 1 is a schematic diagram of a tree-shaped PCIE bus between a processor and an accelerator in the prior art.

In the traditional scheme in which multiple processors cooperatively use an accelerator to produce data, each processor subsystem interconnects with each accelerator through PCIE, and the processor subsystems exchange data over Ethernet. This scheme depends on high-speed network devices and has the drawbacks of high hardware cost and large data latency.

No effective solution to the above problem has yet been proposed.
Summary

The embodiments of the present disclosure provide a PCIE-bus-based multi-chip interconnection system and a data coordination processing method, so as to at least solve the problem in the related art that data coordination between multiple processors and an accelerator cannot be realized efficiently.

According to one embodiment of the present disclosure, a PCIE-bus-based multi-chip interconnection system is provided, including N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each accelerator includes at least two endpoints, and each processor includes one root node, where one endpoint and one root node are connected by one PCIE bus, so that the at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.

According to another embodiment of the present disclosure, a data coordination processing method is provided, including: a first processor initiates a read-write access request to an accelerator, where the root node in the first processor is connected to one endpoint of the accelerator through a first PCIE bus, the accelerator includes at least two endpoints and is connected to at least two processors through those endpoints, and the at least two processors include the first processor; after the root node in the first processor converts the read-write access request into a first PCIE bus-domain access address, the request is sent to the accelerator, so that the first processor performs data access to the accelerator.

According to another embodiment of the present disclosure, a data coordination processing method is provided, including: an accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes one root node, one endpoint and one root node being connected by one PCIE bus, M being a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include a first processor; the accelerator receives a first PCIE bus-domain access address, where the first PCIE bus-domain access address is converted by the first processor from a read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.

According to yet another embodiment of the present disclosure, a storage medium is also provided, in which a computer program is stored, where the computer program is set to execute, when run, the steps in any one of the above data coordination processing method embodiments.

According to yet another embodiment of the present disclosure, an electronic device is also provided, including a memory and a processor, where a computer program is stored in the memory and the processor is set to run the computer program to execute the steps in any one of the above data coordination processing method embodiments.

Through the present disclosure, a PCIE-bus-based multi-chip interconnection system includes N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each accelerator includes at least two endpoints, and each processor includes one root node, where one endpoint and one root node are connected by one PCIE bus, so that the at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.
Brief Description of the Drawings

The drawings described here are provided for a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and their description are used to explain the present disclosure and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of a tree-shaped PCIE bus between a processor and an accelerator in the prior art;

FIG. 2 is a schematic diagram of a PCIE-bus-based multi-chip interconnection system according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of the connection between an accelerator and a processor in the system according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a data coordination processing method according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of another data coordination processing method according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the connection between processors and an accelerator according to a preferred embodiment of the present disclosure;

FIG. 7 is a schematic diagram of processors accessing each other's data across PCIE according to a preferred embodiment of the present disclosure;

FIG. 8 is a data processing flowchart based on the PCIE star bus structure according to a preferred embodiment of the present disclosure;

FIG. 9 is a schematic diagram of a data flow based on a three-processor, three-PCIE star structure according to a preferred embodiment of the present disclosure.
Detailed Description

The present disclosure is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments can be combined with each other.

It should be noted that the terms "first", "second", and the like in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
According to an embodiment of the present disclosure, an embodiment of a PCIE-bus-based multi-chip interconnection system is provided; FIG. 2 is a schematic diagram of this system. The system includes N accelerators, M processors, and M PCIE buses, where N and M are both positive integers and M is greater than N; each of the N accelerators includes at least two endpoints, and each of the M processors includes one root node, where one endpoint and one root node are connected by one PCIE bus, so that the at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.

It should be noted that, when there are multiple accelerators, communication may or may not be established between the N accelerators in the system. Communication can be established in any existing manner, which is not specifically limited here. For example, when N is 2, the system contains two structures of the kind shown in FIG. 3 (one accelerator with at least two processors), and communication may or may not be established between the two accelerators. By analogy, when N is 3, the system contains three structures as shown in FIG. 3, and communication may be established between two of the three accelerators, between all three, or between none of them. For example, with accelerators 1, 2, and 3 in the system, accelerator 1 and accelerator 2 may establish communication while there is no communication with accelerator 3; accelerator 1 may establish communication with both accelerator 2 and accelerator 3; or accelerator 1 may communicate with accelerator 2 and accelerator 2 with accelerator 3, with no communication established between accelerator 1 and accelerator 3. A toy model of the basic star connection is sketched below.
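For illustration only, the basic star connection can be modeled in a few lines of C. This is a minimal sketch under assumed names (struct link, struct accelerator, and MAX_ENDPOINTS are inventions of this sketch, not part of the disclosure): one accelerator exposes one endpoint per processor, and each endpoint/root-node pair gets its own PCIE bus.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ENDPOINTS 8  /* arbitrary cap for this sketch */

/* One accelerator endpoint wired to one processor root node over one PCIE bus. */
struct link {
    int endpoint_id;   /* EP index inside the accelerator        */
    int processor_id;  /* processor owning the root node (RC)    */
    int pcie_bus_id;   /* dedicated PCIE bus for this EP/RC pair */
};

struct accelerator {
    int id;
    int endpoint_count;              /* at least two in this system */
    struct link links[MAX_ENDPOINTS];
};

int main(void) {
    /* One accelerator (N = 1) fanned out to three processors (M = 3). */
    struct accelerator acc = { .id = 1, .endpoint_count = 3 };
    for (int i = 0; i < acc.endpoint_count; i++) {
        acc.links[i] = (struct link){ .endpoint_id  = i,
                                      .processor_id = i + 1,
                                      .pcie_bus_id  = i + 1 };
    }
    assert(acc.endpoint_count >= 2);  /* each accelerator: >= 2 endpoints */
    for (int i = 0; i < acc.endpoint_count; i++)
        printf("EP%d <-> processor #%d via PCIE#%d\n",
               acc.links[i].endpoint_id, acc.links[i].processor_id,
               acc.links[i].pcie_bus_id);
    return 0;
}
```

Each link entry corresponds to one spoke of the star in FIG. 2; every endpoint having its own bus and processor is what makes M greater than N.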
As shown in FIG. 3, a schematic structural diagram of the connection between the accelerator and the processor in the system, the structure may include an accelerator 11, a processor 13, and a PCIE bus 15.

The accelerator 11 includes at least one endpoint, where the endpoint is set to be connected to the processor 13.

The processor 13 includes a root node, where the root node is connected to an endpoint in the accelerator 11 through the PCIE bus 15.

It should be noted that, when multiple processors are connected to the accelerator, each processor is connected to the accelerator through its own PCIE bus; for example, processor 1 is connected to the accelerator through PCIE bus 1, and processor 2 through PCIE bus 2.
Through the above system, the N accelerators and the M processors establish communication; the accelerator 11 includes at least two endpoints, where the endpoints are connected to processors 13, and each processor 13 includes a root node, where the root node is connected through a PCIE bus 15 to an endpoint in the accelerator 11. Because the system is centered on the accelerator and connected with multiple processors to form a star-shaped PCIE computing structure, data coordination between the multiple processors and the accelerator can be completed without adding extra high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.

It should be noted that when the system includes N accelerators, each of the N accelerators is connected to at least two processors.

It should also be noted that, in the embodiments of the present disclosure, when there are multiple accelerators, communication between the accelerators can be realized over Ethernet, or they can be connected through a PCIE bus, and so on. Data interaction between multiple accelerators and multiple processors can thus be realized, increasing the overall data computation rate.
As an optional embodiment, the above system may further include: the processor, further set to allocate PCIE bus-domain access addresses to the endpoints of the accelerator after the multi-chip interconnection system is powered on; a sketch of one possible allocation follows.
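As a rough sketch of this optional embodiment, the C fragment below shows one way a root node might hand out PCIE bus-domain access windows to the accelerator's endpoints after power-on. The fixed 1 MiB window size, the base address, and the name assign_windows are illustrative assumptions, not the patent's allocation scheme.

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW_SIZE 0x100000u  /* 1 MiB per endpoint, assumed for the sketch */

/* Pack one PCIE-domain access window per discovered endpoint,
 * one after another from a base address. */
static void assign_windows(uint64_t base, int endpoint_count,
                           uint64_t windows[]) {
    for (int ep = 0; ep < endpoint_count; ep++)
        windows[ep] = base + (uint64_t)ep * WINDOW_SIZE;
}

int main(void) {
    uint64_t windows[3];
    assign_windows(0x80000000u, 3, windows);  /* hypothetical PCIE base */
    for (int ep = 0; ep < 3; ep++)
        printf("EP%d window base: 0x%llx\n",
               ep, (unsigned long long)windows[ep]);
    return 0;
}
```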
According to the embodiments of the present disclosure, an embodiment of a data coordination processing method is also provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical sequence is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that here.

It should be noted that this data coordination processing method is implemented on the basis of the above PCIE-bus-based multi-chip interconnection system.

The data coordination processing method of the embodiments of the present disclosure is described in detail below.
FIG. 4 is a flowchart of the data coordination processing method according to an embodiment of the present disclosure. As shown in FIG. 4, the method includes the following steps:

Step S402: the first processor initiates a read-write access request to the accelerator, where the root node of the first processor is connected to one endpoint of the accelerator through a first PCIE bus, the accelerator includes at least two endpoints, the accelerator is connected to at least two processors through its at least two endpoints, and the at least two processors include the first processor.

Step S404: the first processor converts the read-write access request into a first PCIE bus-domain access address and sends it to the accelerator, so that the first processor performs data access to the accelerator.

The data access by the first processor to the accelerator may include, but is not limited to, configuring and reading/writing data in the accelerator.

Through the above steps, the first processor initiates a read-write access request to the accelerator and, after converting it into the first PCIE bus-domain access address, sends it to the accelerator, so that the first processor performs data access to the accelerator. Data coordination between multiple processors and the accelerator can thus be completed without adding extra high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment. The address conversion of step S404 is sketched below.
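As a rough illustration of step S404, the following C sketch models the root node converting a processor-domain address into a PCIE bus-domain address through a single linear window. The struct rc_window layout and the rc_translate name are hypothetical; a real RC performs this remap with BAR/window registers configured at enumeration.

```c
#include <stdint.h>
#include <stdio.h>

/* One RC window: a processor-domain range mapped onto a PCIE-domain base. */
struct rc_window {
    uint64_t cpu_base;   /* start in the processor memory domain */
    uint64_t pcie_base;  /* start in the PCIE bus domain         */
    uint64_t size;
};

/* Convert a processor-domain address into a PCIE bus-domain address.
 * Returns 0 on success, -1 if the address misses the window. */
static int rc_translate(const struct rc_window *w,
                        uint64_t cpu_addr, uint64_t *pcie_addr) {
    if (cpu_addr < w->cpu_base || cpu_addr >= w->cpu_base + w->size)
        return -1;
    *pcie_addr = w->pcie_base + (cpu_addr - w->cpu_base);
    return 0;
}

int main(void) {
    struct rc_window win = { 0x80000000u, 0x2000000000u, 0x100000u };
    uint64_t pcie;
    if (rc_translate(&win, 0x80000420u, &pcie) == 0)  /* read-write request */
        printf("CPU 0x80000420 -> PCIE 0x%llx\n",
               (unsigned long long)pcie);  /* the access then reaches the EP */
    return 0;
}
```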
As an optional embodiment, after the first processor initiates the read-write access request to the accelerator, the above method may further include: the first processor converts the read-write access request into a first PCIE bus-domain access address; when the converted first PCIE bus-domain access address falls into the domain space of a second processor, the first PCIE bus-domain access address is converted into a second-processor domain-space access address, so that the first processor accesses the data of the second processor.

It should be noted that, while the above describes the first processor accessing the second processor's data, the second processor can access the first processor's data in the same way.

As an optional embodiment, the first processor accessing the second processor's data may include: the accelerator receives first data sent by the first processor and processes it to obtain second data; the accelerator notifies the second processor of the result of processing the first data, where there are one or more second processors; the accelerator receives a second-data request sent by the second processor; and the accelerator, in response to the request, sends the second data to the second processor. The first processor and the one or more second processors can thus access each other across PCIE.

As an optional embodiment, the accelerator sending the second data to the second processor in response to the second-data request may include: the accelerator receives the first data sent by the first processor through the first PCIE bus; after processing the first data, the accelerator obtains and saves the second data; the accelerator notifies the second processor that the second data has been obtained; and the accelerator sends the second data to the second processor through the second PCIE bus. Mutual data access between the first and second processors across PCIE buses is thereby realized. A sketch of this produce-and-consume sequence appears below.
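The produce-and-consume exchange described above can be reduced to a toy C sketch; the mailbox structure and the produce/consume helpers are stand-ins invented for this illustration, with the PCIE transfers collapsed into plain memory operations.

```c
#include <stdio.h>

/* Accelerator-resident mailbox: first data in, second data out.
 * All names here are illustrative only. */
struct mailbox {
    char second_data[64];
    int  ready;  /* set when the accelerator finishes processing */
};

/* Accelerator side: turn first data into second data, then notify. */
static void produce(struct mailbox *mb, const char *first_data) {
    snprintf(mb->second_data, sizeof mb->second_data,
             "processed(%s)", first_data);  /* stand-in computation */
    mb->ready = 1;                          /* notification to consumers */
}

/* Second-processor side: fetch the second data once notified. */
static int consume(const struct mailbox *mb, char *out, int n) {
    if (!mb->ready)
        return -1;  /* nothing to fetch yet */
    snprintf(out, (size_t)n, "%s", mb->second_data);
    return 0;
}

int main(void) {
    struct mailbox mb = {0};
    char result[64];
    produce(&mb, "first-data-from-processor-1");   /* arrives via PCIE#1 */
    if (consume(&mb, result, sizeof result) == 0)  /* fetched via PCIE#2 */
        printf("second processor got: %s\n", result);
    return 0;
}
```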
The data coordination processing method of another embodiment of the present disclosure is described in detail below.

FIG. 5 is a flowchart of the data coordination processing method according to an embodiment of the present disclosure. As shown in FIG. 5, the method includes the following steps:

Step 502: the accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes one root node, one endpoint and one root node being connected by one PCIE bus, M being a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include a first processor.

Step 504: the accelerator receives a first PCIE bus-domain access address, where the first PCIE bus-domain access address is converted by the first processor from a read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.

That is, the first PCIE bus-domain access address received by the accelerator is the address into which the first processor converted the read-write access request it itself initiated.

Optionally, before the accelerator receives the read-write access request sent by the first processor, the above method may further include: the accelerator sends a first PCIE bus access-domain address, and when the first PCIE bus access-domain address is converted into a domain-space access address of the first processor, the accelerator can access the first processor.

Through the above steps, the accelerator establishes connections with the M processors, so that at least two endpoints of the accelerator are connected to at least two processors through different PCIE buses, the at least two processors including the first processor; the accelerator receives the first PCIE bus-domain access address, which is the address into which the root node in the first processor converted the read-write access request, so that the accelerator performs data access to the first processor. Data coordination between multiple processors and the accelerator can thus be completed without adding extra high-speed devices, improving data processing efficiency and reducing the addition and removal of equipment.
In combination with the above embodiments, the present disclosure provides a preferred embodiment: a method, and a corresponding device, for star interconnection of multiple types of chips based on the PCIE bus.

The preferred embodiment adopts the following technical solution: the system is divided into two parts, a processor chip and an accelerator chip.

The processor chip core includes, but is not limited to, X86 and ARM processors; it must support the PCIE function and is used as the Root Complex (RC) of the PCIE bus.

The accelerator chip includes the accelerator modules and supporting memory units required by various computing scenarios, and is used as the End Point (EP) of the PCIE bus.

Multiple RCs are all connected to the EP through PCIE buses, forming a star topology with the accelerator as the center and the processors as radiating endpoints, where the number of processor nodes can be greater than or equal to 2. FIG. 6 is a schematic diagram of the connection between the processors and the accelerator according to the preferred embodiment of the present disclosure.

After the system is powered on, the RC and the EP perform their respective initialization operations. After the processor starts, it scans for PCIE devices and allocates PCIE-domain access window addresses to the EP in the memory domain.

The process of the processor accessing the accelerator domain space is as follows: the accelerator device is the EP end of the PCIE bus; the processor issues a read-write access to the accelerator space, which the RC converts into a PCIE bus-domain access that reaches the accelerator, realizing configuration and data read/write of the accelerator.

The process of the accelerator accessing the processor address-domain space is the reverse: a PCIE bus-domain access issued by the accelerator can likewise be converted into the processor domain through the RC, realizing reads and writes of the processor domain space; the conversion flow is opposite to that of the processor accessing the accelerator domain.

Processors access each other across PCIE as follows: an access issued by processor #1 is first converted by RC#1 into a PCIE#1-domain access; if the converted address falls into the PCIE#2 domain space, RC#2 then converts it into a processor #2 memory-domain access, realizing processor #1's reads and writes of the processor #2 space. The conversion for processor #2 accessing the processor #1 space is similar but in the opposite direction. FIG. 7 is a schematic diagram of processors accessing each other's data across PCIE; a two-stage sketch of this path follows.
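A two-stage sketch of this cross-PCIE path, under the same caveats as the earlier fragments (the window bases, sizes, and the map helper are made-up values, not the disclosure's address layout): RC#1 maps the processor #1 memory domain into the PCIE domain, and RC#2 maps that PCIE address onward into the processor #2 memory domain.

```c
#include <stdint.h>
#include <stdio.h>

struct window { uint64_t src_base, dst_base, size; };

/* Map an address from one domain to another through a window;
 * returns 0 on a hit, -1 on a miss. */
static int map(const struct window *w, uint64_t in, uint64_t *out) {
    if (in < w->src_base || in >= w->src_base + w->size)
        return -1;
    *out = w->dst_base + (in - w->src_base);
    return 0;
}

int main(void) {
    /* Stage 1: RC#1, processor #1 memory domain -> PCIE domain.
     * Stage 2: RC#2, PCIE domain -> processor #2 memory domain.
     * The PCIE-domain address produced by RC#1 deliberately falls
     * inside RC#2's window, as in the scenario described above. */
    struct window rc1 = { 0x90000000u,   0x3000000000u, 0x100000u };
    struct window rc2 = { 0x3000000000u, 0xA0000000u,   0x100000u };
    uint64_t pcie_addr, p2_addr;
    if (map(&rc1, 0x90000040u, &pcie_addr) == 0 &&
        map(&rc2, pcie_addr, &p2_addr) == 0)
        printf("processor #1 0x90000040 -> processor #2 0x%llx\n",
               (unsigned long long)p2_addr);
    return 0;
}
```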
Through this preferred embodiment, a PCIE star bus structure scheme is provided, achieving the effects of lower hardware cost and reduced access latency.

FIG. 8 shows a data processing flowchart based on the PCIE star bus structure in the preferred embodiment of the present disclosure; the implementation of the technical solution is described in further detail below with reference to FIG. 8:

Step 1: plan the star topology connection structure of the processors and the accelerator;

Step 2: the system is powered on, each module starts and initializes, and the processors scan the PCIE peripherals and allocate access address space;

Step 3: the processors initialize the accelerator;

Step 4: each processor delivers data to the accelerator module; after processing, the accelerator returns the result and notifies the target processor; each processor obtains the final result by consuming the data produced by the accelerator.
For example, the handling of the data flow is illustrated by a concrete implementation under a star structure of three processors and three PCIE buses; the number of processors here merely serves to illustrate the data flow, and the actual number can be determined by the application scenario. FIG. 9 is a schematic diagram of the data flow based on the three-processor, three-PCIE star structure of the preferred embodiment of the present disclosure.

Processor #1 obtains the basic data and sends it via PCIE#1 into accelerator #1 for computation; the accelerator writes the processed data into the accelerator memory space and notifies processors #2 and #3. After receiving the message, processors #2 and #3 fetch the data through PCIE#2 and PCIE#3 and perform the next stage of analysis and processing; finally, processor #2 returns its result to processor #1 through PCIE#2 and PCIE#1, and processor #3 through PCIE#3 and PCIE#1. This structure is a concrete implementation of data production and consumption under a complete star-shaped PCIE structure; a toy rendering follows.
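Reduced to a toy C program (all names and values invented for illustration), the FIG. 9 flow is a fan-out/fan-in over the accelerator: processor #1 produces, the accelerator transforms, processors #2 and #3 consume and report back.

```c
#include <stdio.h>

/* Stand-in for the accelerator's computation. */
static int accelerate(int basic_data) {
    return basic_data * 2;
}

/* Stand-in for each processor's next-stage analysis. */
static int analyze(int data, int stage) {
    return data + stage;
}

int main(void) {
    int basic  = 21;                 /* processor #1, sent via PCIE#1   */
    int shared = accelerate(basic);  /* lands in the accelerator memory */
    int r2 = analyze(shared, 2);     /* processor #2 fetches via PCIE#2 */
    int r3 = analyze(shared, 3);     /* processor #3 fetches via PCIE#3 */
    printf("processor #1 collects: %d and %d\n", r2, r3);  /* via PCIE#1 */
    return 0;
}
```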
Through the description of the above implementations, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present disclosure that is essential, or that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of the present disclosure.

An embodiment of the present disclosure also provides a storage medium in which a computer program is stored, where the computer program is set to execute, when run, the steps in any one of the above method embodiments.

Optionally, in this embodiment, the above storage medium may be set to store a computer program for executing the following steps:

S1: the accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes one root node, one endpoint and one root node being connected through one PCIE bus, and M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include a first processor;

S2: the accelerator receives a first PCIE bus-domain access address, where the first bus-domain access address is converted by the first processor from a read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.

Optionally, in this embodiment, the above storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or other media that can store a computer program.

An embodiment of the present disclosure also provides an electronic device including a memory and a processor; a computer program is stored in the memory, and the processor is set to run the computer program to execute the steps in any one of the above method embodiments.

Optionally, the above electronic device may further include a transmission device and an input-output device, where the transmission device is connected to the above processor and the input-output device is connected to the above processor.

Optionally, in this embodiment, the above processor may be set to execute the following steps through the computer program:

S1: the accelerator establishes connections with M processors, where the accelerator includes at least M endpoints and each processor includes one root node, one endpoint and one root node being connected through one PCIE bus, and M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors include a first processor;

S2: the accelerator receives a first PCIE bus-domain access address, where the first PCIE bus-domain access address is converted by the first processor from a read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.

Optionally, for specific examples of this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.

Obviously, those skilled in the art should understand that the above modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described can be executed in an order different from that here, or they can be made into individual integrated-circuit modules, or multiple modules or steps among them can be made into a single integrated-circuit module. In this way, the present disclosure is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (10)

  1. A multi-chip interconnection system based on a Peripheral Component Interconnect Express (PCIE) bus, comprising:
    N accelerators, M processors, and M PCIE buses, wherein N and M are both positive integers, and M is greater than N;
    each accelerator comprises at least two endpoints, and each processor comprises one root node, wherein one of the endpoints and one of the root nodes are connected by one of the PCIE buses, so that the at least two endpoints of each accelerator are connected to at least two processors through different PCIE buses.
  2. The system according to claim 1, wherein the system further comprises:
    the processor, further configured to allocate PCIE bus-domain access addresses to the endpoints of the accelerator after the multi-chip interconnection system is powered on.
  3. A data coordination processing method, comprising:
    initiating, by a first processor, a read-write access request to an accelerator, wherein a root node of the first processor is connected to one endpoint of the accelerator through a first PCIE bus, the accelerator comprises at least two endpoints and is connected to at least two processors through the at least two endpoints of the accelerator, and the at least two processors comprise the first processor;
    converting, by the first processor, the read-write access request into a first PCIE bus-domain access address and then sending it to the accelerator, so that the first processor performs data access to the accelerator.
  4. The method according to claim 3, wherein after the first processor initiates the read-write access request to the accelerator, the method further comprises:
    converting, by the first processor, the read-write access request into the first PCIE bus-domain access address;
    in a case where the converted first PCIE bus-domain access address falls into a domain space of a second processor, converting the first PCIE bus-domain access address into a second-processor domain-space access address, so that the first processor performs data access to the second processor.
  5. The method according to claim 4, wherein the first processor accessing data of the second processor comprises:
    receiving, by the accelerator, first data sent by the first processor, and processing the first data to obtain second data;
    notifying, by the accelerator, the second processor of the result of processing the first data, wherein there are one or more second processors;
    receiving, by the accelerator, a second-data request sent by the second processor;
    sending, by the accelerator, the second data to the second processor in response to the second-data request.
  6. The method according to claim 5, wherein the accelerator sending the second data to the second processor in response to the second-data request comprises:
    receiving, by the accelerator, the first data sent by the first processor through the first PCIE bus;
    obtaining the second data after the accelerator processes the first data, and saving the second data;
    notifying, by the accelerator, the second processor that the second data has been obtained;
    sending, by the accelerator, the second data to the second processor through a second PCIE bus.
  7. A data coordination processing method, comprising:
    establishing, by an accelerator, connections with M processors, wherein the accelerator comprises at least M endpoints, each processor comprises one root node, one of the endpoints and one of the root nodes are connected through one PCIE bus, and M is a positive integer greater than 1, so that the M endpoints of the accelerator are connected to at least M processors through different PCIE buses, and the M processors comprise a first processor;
    receiving, by the accelerator, a first PCIE bus-domain access address, wherein the first PCIE bus-domain access address is converted by the first processor from a read-write access request initiated by the first processor, so that the accelerator performs data access to the first processor.
  8. The method according to claim 7, wherein before the accelerator receives the read-write access request sent by the first processor, the method further comprises:
    sending, by the accelerator, a first PCIE bus access-domain address;
    in a case where the first PCIE bus access-domain address is converted into a domain-space access address of the first processor, enabling the accelerator to access the first processor.
  9. A storage medium in which a computer program is stored, wherein the computer program is set to execute, when run, the method of any one of claims 3 to 6 or claims 7 to 8.
  10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is set to run the computer program to execute the method of any one of claims 3 to 6 or claims 7 to 8.
PCT/CN2020/122248 2019-10-31 2020-10-20 Multi-chip interconnection system based on a PCIE bus WO2021082990A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/771,549 US20220365898A1 (en) 2019-10-31 2020-10-20 Multi-chip interconnection system based on pcie buses
EP20880510.1A EP4053709A4 (en) 2019-10-31 2020-10-20 PCIE BUS-BASED MULTI-CHIP INTERCONNECT SYSTEM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911056191.2A CN112749121A (zh) 2019-10-31 2019-10-31 基于pcie总线的多芯片互联系统
CN201911056191.2 2019-10-31

Publications (1)

Publication Number Publication Date
WO2021082990A1 true WO2021082990A1 (zh) 2021-05-06

Family

ID=75644830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122248 WO2021082990A1 (zh) 2019-10-31 2020-10-20 基于pcie总线的多芯片互联系统

Country Status (4)

Country Link
US (1) US20220365898A1 (zh)
EP (1) EP4053709A4 (zh)
CN (1) CN112749121A (zh)
WO (1) WO2021082990A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610667B * 2022-05-10 2022-08-12 沐曦集成电路(上海)有限公司 Multiplexed data bus device and chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180024955A1 (en) * 2013-12-26 2018-01-25 Intel Corporation Computer architecture to provide flexibility and/or scalability
CN107690622A * 2016-08-26 2018-02-13 华为技术有限公司 Method, device and system for implementing hardware acceleration processing
CN109240980A * 2018-06-26 2019-01-18 深圳市安信智控科技有限公司 Memory-access-intensive algorithm acceleration chip with multiple high-speed serial memory-access channels
CN109739785A * 2018-09-20 2019-05-10 威盛电子股份有限公司 Interconnection structure of a multi-core system
CN110297802A * 2019-06-09 2019-10-01 苏州长江睿芯电子科技有限公司 Novel interconnection structure between processors

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996644B2 (en) * 2010-12-09 2015-03-31 Solarflare Communications, Inc. Encapsulated accelerator
US9405550B2 (en) * 2011-03-31 2016-08-02 International Business Machines Corporation Methods for the transmission of accelerator commands and corresponding command structure to remote hardware accelerator engines over an interconnect link
US9842075B1 (en) * 2014-09-12 2017-12-12 Amazon Technologies, Inc. Presenting multiple endpoints from an enhanced PCI express endpoint device
US10503922B2 (en) * 2017-05-04 2019-12-10 Dell Products L.P. Systems and methods for hardware-based security for inter-container communication
US20190068466A1 (en) * 2017-08-30 2019-02-28 Intel Corporation Technologies for auto-discovery of fault domains
US10585734B2 (en) * 2018-01-04 2020-03-10 Qualcomm Incorporated Fast invalidation in peripheral component interconnect (PCI) express (PCIe) address translation services (ATS)
US10769092B2 (en) * 2018-12-20 2020-09-08 Dell Products, L.P. Apparatus and method for reducing latency of input/output transactions in an information handling system using no-response commands


Also Published As

Publication number Publication date
US20220365898A1 (en) 2022-11-17
EP4053709A4 (en) 2022-12-21
EP4053709A1 (en) 2022-09-07
CN112749121A (zh) 2021-05-04

Similar Documents

Publication Publication Date Title
CN105516191B (zh) System of a 10-gigabit Ethernet TCP protocol offload engine (TOE) implemented on an FPGA
WO2023093043A1 (zh) Data processing method, apparatus, and medium
EP3850493A1 (en) Methods and apparatus for high-speed data bus connection and fabric management
CN109960671B (zh) Data transmission system and method, and computer device
CN105450588A (zh) RDMA-based data transmission method and RDMA network interface card
US10067900B2 (en) Virtualized I/O device sharing within a distributed processing node system
US7818509B2 (en) Combined response cancellation for load command
EP4124932A1 (en) System, apparatus and methods for power communications according to a cxl.power protocol
CN106844263B (zh) Configurable multi-processor computer system and implementation method
CN111488308B (zh) System and method supporting multi-processor expansion of different architectures
CN109426566B (zh) Connecting accelerator resources using a switch
WO2021082990A1 (zh) Multi-chip interconnection system based on a PCIE bus
WO2013097394A1 (zh) Multi-processor shared-memory method and system
US20230403232A1 (en) Data Transmission System and Method, and Related Device
WO2024037239A1 (zh) Accelerator scheduling method and related apparatus
WO2023186143A1 (zh) Data processing method, host, and related device
CN111190840A (zh) Multi-party CPU communication architecture based on field-programmable gate array control
US20210004658A1 (en) System and method for provisioning of artificial intelligence accelerator (AIA) resources
WO2022007644A1 (zh) Multi-processor system and method for configuring a multi-processor system
WO2022178675A1 (zh) Interconnection system, data transmission method, and chip
WO2022141322A1 (zh) System-on-chip and related method
US11061838B1 (en) System and method for graphics processing unit management infrastructure for real time data collection
US10938875B2 (en) Multi-processor/endpoint data duplicating system
TW202315360A (zh) Microservice allocation method, electronic device, and storage medium
CN114021715A (zh) Deep learning training method based on the TensorFlow framework

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20880510

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020880510

Country of ref document: EP

Effective date: 20220531