WO2020107460A1 - Arithmetic method, chip, system, readable storage medium, and computer program product - Google Patents

Arithmetic method, chip, system, readable storage medium, and computer program product

Info

Publication number
WO2020107460A1
WO2020107460A1 PCT/CN2018/118723 CN2018118723W WO2020107460A1 WO 2020107460 A1 WO2020107460 A1 WO 2020107460A1 CN 2018118723 W CN2018118723 W CN 2018118723W WO 2020107460 A1 WO2020107460 A1 WO 2020107460A1
Authority
WO
WIPO (PCT)
Prior art keywords
chip
arithmetic
computing
link
data
Prior art date
Application number
PCT/CN2018/118723
Other languages
English (en)
French (fr)
Inventor
范靖
侯洁
王虓
乔伟
Original Assignee
北京比特大陆科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京比特大陆科技有限公司
Priority to CN201880015943.8A priority Critical patent/CN110770712B/zh
Priority to PCT/CN2018/118723 priority patent/WO2020107460A1/zh
Publication of WO2020107460A1 publication Critical patent/WO2020107460A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7828 Architectures of general purpose stored program computers comprising a single central processing unit without memory
    • G06F15/7835 Architectures of general purpose stored program computers comprising a single central processing unit without memory on more than one IC chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of electronic technology, for example, to an arithmetic method, chip, system, readable storage medium, and computer program product.
  • A PCIe chip is an arithmetic chip that can provide high-speed computation for computer equipment or functional hardware equipment.
  • On devices that require high computing performance, a PCIe expansion converter is generally connected to a slot so that multiple PCIe chips can access the device motherboard through the converter.
  • In a computing system composed of multiple arithmetic chips in this way, the effective bandwidth of each PCIe chip is limited by the performance of the PCIe expansion converter, which places high demands on the converter's own performance and is inconvenient in use.
  • An embodiment of the present disclosure provides an operation method based on an operation system of a multi-operation chip.
  • the operation system includes: an operation chip link formed by a plurality of operation chips connected in series, and a motherboard electrically connected to the main control operation chip located at the source of the link; in the operation chip link, two adjacent operation chips are connected to the same independent clock signal source;
  • for any operation chip in the operation chip link, the operation method includes:
  • receiving a clock signal sent by an independent clock signal source, and processing or forwarding the received operation data to be processed according to the clock signal; the operation data to be processed is sent by the motherboard to the main control operation chip and forwarded in turn, via the operation chips connected in series in the operation chip link, to that operation chip.
  • An embodiment of the present disclosure provides an arithmetic chip, including: a memory, a processor connected to the memory, and a computer program stored on the memory and executable on the processor, characterized in that:
  • the processor executes the calculation method described above when running the computer program.
  • An embodiment of the present disclosure provides an arithmetic system, including: a motherboard and a plurality of the aforementioned arithmetic chips;
  • the plurality of arithmetic chips are connected in series to form an arithmetic chip link;
  • the link source of the arithmetic chip link includes a main control arithmetic chip, and the main control arithmetic chip is electrically connected to the main board;
  • in the arithmetic chip link, any two adjacent arithmetic chips are connected to the same independent clock signal source and receive the clock signal of that independent clock signal source.
  • An embodiment of the present disclosure provides a readable storage medium, including a program, which when executed on an arithmetic chip, causes the arithmetic chip to execute the foregoing arithmetic method.
  • An embodiment of the present disclosure provides a computer program product.
  • the computer program product includes a computer program stored on a readable storage medium.
  • the computer program includes program instructions that, when executed by a computer, cause the computer to perform the aforementioned arithmetic method.
  • FIG. 1 is a schematic structural diagram of an arithmetic system provided by an embodiment of the present disclosure;
  • FIG. 2 is a schematic flowchart of an arithmetic method provided by an embodiment of the present disclosure;
  • FIG. 3 is a schematic flowchart of another arithmetic method provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic flowchart of yet another arithmetic method provided by an embodiment of the present disclosure;
  • FIG. 5 is a schematic diagram of a hardware structure of an arithmetic chip provided by an embodiment of the present disclosure.
  • the present disclosure provides an arithmetic method, chip, system, readable storage medium, and computer program product that effectively increase the effective bandwidth of each PCIe chip without increasing cost, meeting the hardware requirements of high computing performance.
  • FIG. 1 is a schematic structural diagram of an operation system provided by an embodiment of the present disclosure. As shown in FIG. 1, the operation system provided by the embodiment of the present disclosure includes a motherboard and a plurality of operation chips.
  • a plurality of arithmetic chips are connected in series to form an arithmetic chip link.
  • the link source of the arithmetic chip link includes a main control arithmetic chip, which is electrically connected to the main board of the server or the host.
  • any two adjacent computing chips are connected to the same independent clock signal source and receive the clock signal of the independent clock signal source.
  • the computing chip in the computing system of the present disclosure may specifically be a PCIe chip or other types of chips, and the motherboard therein may specifically be a motherboard of a server host or a CPU motherboard of a desktop computer.
  • to increase the number of arithmetic chips, the prior art connects multiple arithmetic chips separately to a PCIe expansion converter, and the chips interact with the motherboard through that converter.
  • such a hardware connection limits the effective bandwidth of each arithmetic chip to the performance of the PCIe expansion converter; increasing the computing power of each arithmetic chip places higher demands on the converter's performance, which is inconvenient in use.
  • the computing system proposed in this disclosure first connects the arithmetic chips in series, stage by stage, to obtain an arithmetic chip link formed by multiple arithmetic chips.
  • take the arithmetic chip link composed of 4 arithmetic chips shown in FIG. 1 as an example.
  • each arithmetic chip includes a master port and a slave port.
  • connecting the master ports and slave ports of the arithmetic chips in series in order yields an arithmetic chip link composed of arithmetic chip 1, arithmetic chip 2, arithmetic chip 3, and arithmetic chip 4.
  • arithmetic chip 1, as the main control arithmetic chip at the source of the link, is electrically connected to the aforementioned motherboard to enable data exchange and signal transmission between the arithmetic chip link and the motherboard.
  • the computing system further includes a plurality of independent clock signal source crystal oscillators, each of which includes a clock signal source output terminal; the output terminal is connected to any two adjacent arithmetic chips in the arithmetic chip link to provide those two chips with an independent clock signal.
  • the slave port of the main arithmetic chip, namely arithmetic chip 1, is connected to the motherboard and receives the motherboard clock signal, which triggers the function of the entire arithmetic chip link.
  • any two adjacent arithmetic chips are also connected to the same independent clock signal source crystal oscillator.
  • for example, arithmetic chip 1 and arithmetic chip 2 are both connected to independent clock signal source crystal oscillator 1, and arithmetic chip 2 and arithmetic chip 3 are both connected to independent clock signal source crystal oscillator 2.
  • the arithmetic chip will receive the clock signal sent from the independent clock signal source crystal oscillator, and under the trigger of the clock signal, respond to requests or instructions for processing and forwarding data.
  • an arithmetic chip can receive independent signals sent from two different independent clock source crystal oscillators and be triggered by the two independent signals.
  • the aforementioned individual clock signal source crystal oscillators may specifically provide differential clock signals for each arithmetic chip, and the signal frequency may be 100 MHz.
  • the arithmetic method on which the above computing system is based may be implemented as described in the following embodiments; see the examples below.
  • FIG. 2 is a schematic flowchart of an operation method provided by an embodiment of the present disclosure.
  • the calculation method includes:
  • Step 101 Receive a clock signal sent by an independent clock signal source crystal oscillator.
  • Step 102 Process or forward the received operation data to be processed according to the clock signal; the operation data to be processed is sent by the motherboard to the main control arithmetic chip and forwarded in turn, via the arithmetic chips connected in series in the arithmetic chip link, to the arithmetic chip in question.
  • the arithmetic method provided in this embodiment is directed to the aforementioned computing system, which includes at least an arithmetic chip link formed by a plurality of arithmetic chips connected in series, and a motherboard electrically connected to the main control arithmetic chip located at the source of the link; in the arithmetic chip link, two adjacent arithmetic chips are connected to the same independent clock signal source crystal oscillator.
  • the hardware structure on which this embodiment is based is the aforementioned computing system.
  • because the arithmetic chips in this system are connected in series to form an arithmetic chip link, the hardware conditions for communication between arithmetic chips are satisfied.
  • to match the hardware architecture, any arithmetic chip in the link receives a clock signal sent by an independent clock signal source crystal oscillator and processes or forwards the received operation data to be processed according to that clock signal; the operation data to be processed is sent by the motherboard to the main control arithmetic chip and forwarded in turn, via the arithmetic chips connected in series in the link, to that arithmetic chip.
  • to locate the arithmetic chip that should receive the data accurately, each arithmetic chip stores the address logic of the arithmetic chip link, and the operation data to be processed includes at least the address of the target arithmetic chip that is to execute it and the corresponding operation data. Therefore, when any arithmetic chip receives the clock signal sent by the independent clock signal source crystal oscillator, it examines the operation data to be processed that was forwarded by its superior arithmetic chip in the link and decides, according to the result, whether to process or forward that data.
  • that is, the arithmetic chip determines, from the stored address logic and the target arithmetic chip address in the operation data to be processed, whether it is itself the chip that should execute the operation data; if so, it calls its arithmetic logic to process the operation data; if not, it forwards the operation data to be processed to the lower-level arithmetic chip in the link.
  • for example, arithmetic chip 1 receives the data to be processed sent by the motherboard and, once triggered by the clock signal, compares the target arithmetic chip address in the data with the address logic pre-stored in the chip to determine whether the target address matches its own. If it matches, arithmetic chip 1 calls the arithmetic logic in the chip body to process the corresponding operation data; if not, arithmetic chip 1 sends the operation data to be processed through its master port to the slave port of arithmetic chip 2, so that arithmetic chip 2 performs the same determination.
  • the data to be processed is generated by the motherboard according to the task to be processed, and the target computing chip address is determined by the motherboard according to the computing power of each computing chip and the position in the entire computing chip link.
  • the arithmetic chip located at the end of the arithmetic chip link will give priority to the processing of the arithmetic data to be processed.
  • that is, if a task to be processed is divided into 4 pieces of data to be processed, with each arithmetic chip processing one piece, the motherboard sets the target arithmetic chip address of the first piece to the chip address of arithmetic chip 4, sets the target arithmetic chip address of the second piece to the chip address of arithmetic chip 3, and so on, so as to further optimize the computing power and increase the computing speed of the entire arithmetic chip link.
  • the data exchange within the link of the calculation chip can be realized by the address logic between the calculation chips without passing through the host, which saves the calculation resources of the host.
  • because each arithmetic chip performs data processing based on the clock signal sent by an independent clock signal source crystal oscillator, the effective bandwidth of each arithmetic chip is improved, which in turn raises the computing power of the entire computing system and saves cost.
  • FIG. 3 is a schematic flowchart of another calculation method provided by the disclosure.
  • the calculation method includes:
  • Step 201 Receive a clock signal sent by an independent clock signal source crystal oscillator.
  • Step 202 Receive to-be-processed operation data forwarded by a superior operation chip in the operation chip link.
  • Step 203 According to the stored address logic and the target operation chip address in the operation data to be processed, determine whether the operation subject of the operation data is itself;
  • If yes, go to step 204; if no, go to step 206.
  • Step 204 Invoke arithmetic logic to process the operation data and generate processing result data.
  • Step 205 Forward the processing result data to the superior arithmetic chip in the arithmetic chip link, for the superior arithmetic chip to forward it in turn until the processing result data reaches the main control arithmetic chip, which stores it.
  • Step 206 Forward the to-be-processed operation data to a lower-level operation chip in the operation chip link.
  • Step 207 Receive the processing result data initiated by the lower-level arithmetic chip in the arithmetic chip link.
  • the processing result data includes at least: the address of the target operation chip that receives the processing result data, and the corresponding result data.
  • Step 208 Determine, according to the address logic, whether the target arithmetic chip address in the processing result data is the chip's own address; if yes, store the result data; if not, forward the result data to the target arithmetic chip in the arithmetic chip link.
  • steps 201 to 203 are similar to the embodiment shown in FIG. 2, and details are not described herein again.
  • unlike the foregoing embodiment, when the arithmetic chip determines that it is itself the chip that should execute the operation data, it calls the arithmetic logic of the chip body to process the operation data and generates processing result data. The arithmetic chip then forwards the processing result data to the superior arithmetic chip in the arithmetic chip link, and each superior chip forwards it in turn until the processing result data reaches the target arithmetic chip.
  • when the arithmetic chip determines that it is not the chip that should execute the operation data, it forwards the operation data to be processed to the lower-level arithmetic chip in the link, until the data reaches the arithmetic chip corresponding to the target arithmetic chip address.
  • once that chip has finished processing the operation data, the generated processing result data is passed back up the arithmetic chip link, chip by chip, until it reaches the arithmetic chip executing this method flow; in other words, the chip receives processing result data initiated by a lower-level arithmetic chip in the link, where the processing result data includes at least the address of the target arithmetic chip that is to receive it and the corresponding result data.
  • the arithmetic chip then determines, according to the address logic, whether the target arithmetic chip address in the processing result data is its own address; if it is, the chip stores the result data; if not, the chip forwards the result data to its superior arithmetic chip in the link, and so on until the target arithmetic chip is reached.
  • FIG. 4 is a schematic flowchart of still another arithmetic method provided by the present disclosure.
  • the calculation method includes:
  • Step 301 Receive a clock signal sent by an independent clock signal source crystal oscillator.
  • Step 302 Receive to-be-processed operation data forwarded by a superior operation chip in the operation chip link;
  • Step 303 According to the stored address logic and the target operation chip address in the operation data to be processed, determine whether the operation subject of the operation data is itself;
  • If yes, go to step 304; if no, go to step 307.
  • Step 304 Call operation logic to process the operation data
  • Step 305 Generate an interrupt request, and send the interrupt request to a superior computing chip in the computing chip link for the superior computing chip to process the interrupt request.
  • Step 306 Receive and execute an interrupt response sent by the superior arithmetic chip; the superior arithmetic chip generates and sends the interrupt response after receiving the interrupt request and calling arithmetic logic to process it.
  • Step 307 Forward the to-be-processed operation data to a lower-level operation chip in the operation chip link.
  • steps 301 to 303 are similar to the embodiment shown in FIG. 2, and details are not described herein again.
  • unlike the foregoing embodiment, when the arithmetic chip determines that it is itself the chip that should execute the operation data, it calls the arithmetic logic of the chip body to process the operation data and generates processing result data. After it has finished processing the operation data, the arithmetic chip generates an interrupt request and sends it to its superior arithmetic chip for processing; the interrupt request is used to make the arithmetic chip stop running and enter the standby state. The superior arithmetic chip then processes the interrupt request according to preset processing logic, generates an interrupt response, and returns it to the arithmetic chip. On receiving the interrupt response, the arithmetic chip executes it and enters the standby state or stops running, which reduces the power consumption of the entire computing system.
  • the arithmetic chip generates the interrupt request, and enters the standby state or stops running, only after it has finished processing the operation data and has determined that all lower-level arithmetic chips in its arithmetic chip link have also finished processing their operation data. The reason is that the present disclosure uses a hardware architecture in which multiple arithmetic chips are connected in series, so data and signals are passed along the link from chip to chip; once an arithmetic chip in the middle of the link enters the standby state or stops running, all arithmetic chips downstream of it can no longer receive data or signals. To keep the entire link operating normally, the arithmetic chip in this embodiment therefore generates the interrupt request and sends it to its superior arithmetic chip only after both conditions are met.
  • this embodiment may also include a process for generating operation result data; for details, see steps 204-208 in the foregoing example, which are not repeated here.
  • the data exchange within the link of the calculation chip can be realized by the address logic between the calculation chips without passing through the host, which saves the calculation resources of the host.
  • because each arithmetic chip performs data processing based on the clock signal sent by an independent clock signal source crystal oscillator, the effective bandwidth of each arithmetic chip is improved, which in turn raises the computing power of the entire computing system and saves cost.
  • FIG. 5 is a schematic diagram of the hardware structure of an arithmetic chip provided by the present disclosure. As shown in FIG. 5, the arithmetic chip includes: a memory 41, a processor 42 connected to the memory 41, and a computer program stored on the memory 41 and executable on the processor 42, characterized in that the processor 42 executes the foregoing arithmetic method when running the computer program.
  • An embodiment of the present disclosure also provides a readable storage medium that stores computer-executable instructions, and the computer-executable instructions are configured to perform the above calculation method.
  • An embodiment of the present disclosure also provides a computer program product.
  • the computer program product includes a computer program stored on a computer-readable storage medium.
  • the computer program includes program instructions that, when executed by a computer, cause the computer to execute the above arithmetic method.
  • the above-mentioned readable storage medium may be a transient computer-readable storage medium or a non-transitory computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes one or more instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure.
  • the aforementioned storage medium may be a non-transitory storage medium, including a U disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or another medium that can store program code; it may also be a transient storage medium.
  • although the terms first, second, etc. may be used in this application to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • the first element can be called the second element, and likewise the second element can be called the first element, as long as all occurrences of the "first element" are renamed consistently and all occurrences of the "second element" are renamed consistently.
  • the first element and the second element are both elements, but they may not be the same element.
  • the various aspects, implementations, implementations or features in the described embodiments can be used alone or in any combination.
  • Various aspects in the described embodiments may be implemented by software, hardware, or a combination of software and hardware.
  • the described embodiments may also be embodied by a computer-readable medium that stores computer-readable code including instructions executable by at least one computing device.
  • the computer-readable medium can be associated with any data storage device capable of storing data, which can be read by a computer system.
  • Computer-readable media used for examples may include read-only memory, random access memory, CD-ROM, HDD, DVD, magnetic tape, optical data storage devices, and the like.
  • the computer-readable medium may also be distributed in computer systems connected through a network, so that computer-readable codes can be stored and executed in a distributed manner.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

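The arithmetic method, chip, system, readable storage medium, and computer program product provided by the present disclosure enable arithmetic chips to exchange data within the arithmetic chip link by means of address logic, without going through the host, which saves the host's computing resources. In addition, because each arithmetic chip performs data processing based on the clock signal sent by an independent clock signal source crystal oscillator, the effective bandwidth of each arithmetic chip is improved, which in turn raises the computing power of the entire computing system and saves cost.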

Description

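Arithmetic method, chip, system, readable storage medium, and computer program product
Technical Field
This application relates to the field of electronic technology, for example to an arithmetic method, chip, system, readable storage medium, and computer program product.
Background
With the rapid development of information technology and the Internet, the demands placed on computer equipment and functional hardware equipment keep rising. A chip based on the high-speed serial computer expansion bus standard (peripheral component interconnect express, PCIe for short) is an arithmetic chip that can provide high-speed computation for computer equipment or functional hardware equipment.
An existing device motherboard is generally provided with 1-2 dedicated PCIe chip slots for connecting PCIe chips. On devices that require high computing performance, a PCIe expansion converter is generally connected to a slot so that multiple PCIe chips can access the device motherboard through the converter.
However, in a computing system composed of multiple arithmetic chips in this way, the effective bandwidth of each PCIe chip is limited by the performance of the PCIe expansion converter, which places high demands on the converter's own performance and is inconvenient in use.
The above background is provided only to aid understanding of this application and does not constitute an admission or acknowledgment that any of the content mentioned is part of the common general knowledge relative to this application.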
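Summary
An embodiment of the present disclosure provides an arithmetic method based on a computing system with multiple arithmetic chips. The computing system includes: an arithmetic chip link formed by a plurality of arithmetic chips connected in series, and a motherboard electrically connected to the main control arithmetic chip located at the source of the link; in the arithmetic chip link, two adjacent arithmetic chips are connected to the same independent clock signal source;
for any arithmetic chip in the arithmetic chip link, the arithmetic method includes:
receiving a clock signal sent by an independent clock signal source;
processing or forwarding the received operation data to be processed according to the clock signal; the operation data to be processed is sent by the motherboard to the main control arithmetic chip and forwarded in turn, via the arithmetic chips connected in series in the arithmetic chip link, to that arithmetic chip.
An embodiment of the present disclosure provides an arithmetic chip, including: a memory, a processor connected to the memory, and a computer program stored on the memory and executable on the processor, characterized in that
the processor executes the above arithmetic method when running the computer program.
An embodiment of the present disclosure provides a computing system, including: a motherboard and a plurality of the aforementioned arithmetic chips;
the plurality of arithmetic chips are connected in series to form an arithmetic chip link; the source of the arithmetic chip link includes a main control arithmetic chip, which is electrically connected to the motherboard; in the arithmetic chip link, any two adjacent arithmetic chips are connected to the same independent clock signal source and receive the clock signal of that independent clock signal source.
An embodiment of the present disclosure provides a readable storage medium including a program which, when run on an arithmetic chip, causes the arithmetic chip to execute the aforementioned arithmetic method.
An embodiment of the present disclosure provides a computer program product. The computer program product includes a computer program stored on a readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute the aforementioned arithmetic method.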
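Brief Description of the Drawings
To provide a more detailed understanding of the features and technical content of the embodiments of the present disclosure, their implementation is described in detail below with reference to the accompanying drawings, which are for reference and illustration only and are not intended to limit the embodiments of the present disclosure. In the following technical description, numerous details are given, for ease of explanation, to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may still be practiced without these details. In other cases, well-known structures and devices may be shown in simplified form to simplify the drawings.
FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an arithmetic method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another arithmetic method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of yet another arithmetic method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the hardware structure of an arithmetic chip provided by an embodiment of the present disclosure.
Detailed Description
To provide a more detailed understanding of the features and technical content of the embodiments of the present disclosure, their implementation is described in detail below with reference to the accompanying drawings, which are for reference and illustration only and are not intended to limit the embodiments of the present disclosure. In the following technical description, numerous details are given, for ease of explanation, to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may still be practiced without these details. In other cases, well-known structures and devices may be shown in simplified form to simplify the drawings.
As stated above, the present disclosure provides an arithmetic method, chip, system, readable storage medium, and computer program product that effectively increase the effective bandwidth of each PCIe chip without increasing cost, meeting the hardware requirements of high computing performance.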
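FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present disclosure. As shown in FIG. 1, the computing system provided by the embodiment includes a motherboard and a plurality of arithmetic chips.
The plurality of arithmetic chips are connected in series to form an arithmetic chip link. The source of the link includes a main control arithmetic chip, which is electrically connected to the motherboard of a server or host.
In the arithmetic chip link, any two adjacent arithmetic chips are connected to the same independent clock signal source and receive the clock signal of that independent clock signal source.
It should be noted that the arithmetic chips in the computing system of the present disclosure may be PCIe chips or chips of other types, and the motherboard may be the motherboard of a server host, the CPU motherboard of a desktop computer, or the like.
Generally, to increase the number of arithmetic chips, the prior art connects multiple arithmetic chips separately to a PCIe expansion converter, and the chips interact with the motherboard through that converter. However, such a hardware connection limits the effective bandwidth of each arithmetic chip to the performance of the PCIe expansion converter; increasing the computing power of each arithmetic chip places higher demands on the converter's performance, which is inconvenient in use.
Unlike the prior art, the computing system proposed in the present disclosure first connects the arithmetic chips in series, stage by stage, to obtain an arithmetic chip link formed by multiple arithmetic chips. Taking the link of 4 arithmetic chips shown in FIG. 1 as an example, each arithmetic chip includes a master port and a slave port; connecting the master ports and slave ports of the chips in series in order yields an arithmetic chip link composed of arithmetic chip 1, arithmetic chip 2, arithmetic chip 3, and arithmetic chip 4. Arithmetic chip 1, as the main control arithmetic chip at the source of the link, is electrically connected to the aforementioned motherboard to enable data exchange and signal transmission between the arithmetic chip link and the motherboard.
In an optional embodiment, the computing system further includes a plurality of independent clock signal source crystal oscillators, each of which includes a clock signal source output terminal; the output terminal is connected to any two adjacent arithmetic chips in the arithmetic chip link to provide those two chips with an independent clock signal.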
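In this computing system, the slave port of the main arithmetic chip, namely arithmetic chip 1, is connected to the motherboard and receives the motherboard clock signal, which triggers the function of the entire arithmetic chip link. Any two adjacent arithmetic chips are also connected to the same independent clock signal source crystal oscillator: for example, arithmetic chip 1 and arithmetic chip 2 are both connected to independent clock signal source crystal oscillator 1, and arithmetic chip 2 and arithmetic chip 3 are both connected to independent clock signal source crystal oscillator 2. Through this connection, an arithmetic chip receives the clock signal sent by the crystal oscillator and, triggered by that clock signal, responds to requests or instructions to process or forward data. It should be noted that a single arithmetic chip can receive independent signals from two different independent clock source crystal oscillators and be triggered by both. Each of the aforementioned crystal oscillators may provide a differential clock signal to the arithmetic chips, and the signal frequency may be 100 MHz. The arithmetic method on which the above computing system is based may be implemented as described in the following embodiments; see the examples below.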
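The clock wiring described above can be sketched as a small mapping. This is only an illustration: the function names `oscillator_pairs` and `oscillators_for_chip` and the 1-based numbering are assumptions made here, not part of the disclosure, and the sketch models only which chips share an oscillator, not the electrical signal.

```python
from typing import Dict, List, Tuple


def oscillator_pairs(n_chips: int) -> Dict[int, Tuple[int, int]]:
    """Map oscillator k to the adjacent chip pair (k, k + 1), numbered from 1.

    Hypothetical helper: one oscillator per pair of neighbouring chips.
    """
    if n_chips < 2:
        raise ValueError("a chip link needs at least two chips")
    return {k: (k, k + 1) for k in range(1, n_chips)}


def oscillators_for_chip(chip: int, n_chips: int) -> List[int]:
    """List the oscillators whose output reaches the given chip."""
    return [k for k, pair in oscillator_pairs(n_chips).items() if chip in pair]
```

For the 4-chip link of FIG. 1, this gives three oscillators, and a chip in the middle of the link (such as chip 2) is reached by two of them, which matches the remark that one chip can be triggered by two independent signals.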
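By adopting the hardware architecture of the computing system described above, the upper limit on the effective bandwidth of the arithmetic chips that the prior art incurs by using a PCIe expansion converter can be effectively overcome, which provides a hardware basis for higher bandwidth and higher computing power.
An embodiment of the present disclosure also provides an arithmetic method based on the aforementioned computing system. FIG. 2 is a schematic flowchart of an arithmetic method provided by an embodiment of the present disclosure.
As shown in FIG. 2, the arithmetic method includes:
Step 101: Receive a clock signal sent by an independent clock signal source crystal oscillator.
Step 102: Process or forward the received operation data to be processed according to the clock signal; the operation data to be processed is sent by the motherboard to the main control arithmetic chip and forwarded in turn, via the arithmetic chips connected in series in the arithmetic chip link, to the arithmetic chip in question.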
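Specifically, the arithmetic method provided in this embodiment is directed to the aforementioned computing system, which includes at least an arithmetic chip link formed by a plurality of arithmetic chips connected in series, and a motherboard electrically connected to the main control arithmetic chip located at the source of the link; in the arithmetic chip link, two adjacent arithmetic chips are connected to the same independent clock signal source crystal oscillator.
The prior art connects multiple arithmetic chips separately to a PCIe expansion converter and has them interact with the motherboard through that converter. The existing arithmetic method therefore requires the PCIe expansion converter to know the address of every connected arithmetic chip and to handle data forwarding and signal transmission between the motherboard and each chip. In other words, in the prior art the hardware architecture prevents the arithmetic chips from exchanging data or signals with one another.
The hardware structure on which this embodiment is based is the aforementioned computing system. Because the arithmetic chips in this system are connected in series to form an arithmetic chip link, the hardware conditions for communication between arithmetic chips are satisfied. In addition, to match the hardware architecture, any arithmetic chip in the link receives a clock signal sent by an independent clock signal source crystal oscillator and processes or forwards the received operation data to be processed according to that clock signal; the operation data to be processed is sent by the motherboard to the main control arithmetic chip and forwarded in turn, via the arithmetic chips connected in series in the link, to that arithmetic chip.
Specifically, to locate the arithmetic chip that should receive the data accurately, each arithmetic chip stores the address logic of the arithmetic chip link, and the operation data to be processed includes at least the address of the target arithmetic chip that is to execute it and the corresponding operation data. Therefore, when any arithmetic chip receives the clock signal sent by the independent clock signal source crystal oscillator, it examines the operation data to be processed that was forwarded by its superior arithmetic chip in the link and decides, according to the result, whether to process or forward that data. That is, the arithmetic chip determines, from the stored address logic and the target arithmetic chip address in the operation data to be processed, whether it is itself the chip that should execute the operation data; if so, it calls its arithmetic logic to process the operation data; if not, it forwards the operation data to be processed to the lower-level arithmetic chip in the link.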
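For example, arithmetic chip 1 receives the data to be processed sent by the motherboard and, once triggered by the clock signal, compares the target arithmetic chip address in the data with the address logic pre-stored in the chip to determine whether the target address matches its own. If it matches, arithmetic chip 1 calls the arithmetic logic in the chip body to process the corresponding operation data; if not, arithmetic chip 1 sends the operation data to be processed through its master port to the slave port of arithmetic chip 2, so that arithmetic chip 2 performs the same determination.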
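The process-or-forward decision in this example can be sketched as follows. The `Packet` type, the function names, and the use of plain integers as chip addresses are assumptions made for illustration; the disclosure does not specify a data format, and the clock trigger is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical layout of the data to be processed."""
    target: int      # address of the chip that should execute the data
    payload: bytes   # the operation data itself


def route(own_address: int, packet: Packet) -> str:
    """Return 'process' if this chip is the target, otherwise 'forward'."""
    return "process" if packet.target == own_address else "forward"


def deliver(chain: List[int], packet: Packet) -> List[Tuple[int, str]]:
    """Walk the link from the main control chip and record each decision.

    `chain` lists chip addresses in link order, main control chip first.
    If no chip matches the target, every chip simply forwards.
    """
    trace = []
    for address in chain:
        decision = route(address, packet)
        trace.append((address, decision))
        if decision == "process":
            break
    return trace
```

With a 4-chip link and a packet addressed to chip 3, chips 1 and 2 forward and chip 3 processes, so the packet never reaches chip 4.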
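It should be noted that the data to be processed is generated by the motherboard according to the task to be processed, and the target arithmetic chip address is determined by the motherboard according to the computing power of each arithmetic chip and its position in the arithmetic chip link. Generally, the arithmetic chip at the end of the link is the first to process operation data. That is, if a task to be processed is divided into 4 pieces of data to be processed, with each arithmetic chip processing one piece, the motherboard sets the target arithmetic chip address of the first piece to the chip address of arithmetic chip 4, sets the target arithmetic chip address of the second piece to the chip address of arithmetic chip 3, and so on, so as to further optimize the computing power and increase the computing speed of the entire arithmetic chip link.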
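The end-of-link-first assignment can be sketched as below. The function name `assign_targets` is hypothetical, and the sketch considers only chip position; the text also mentions each chip's computing power as an input, which is not modelled here. Limiting the task to at most one piece per chip is likewise an assumption taken from the 4-piece example.

```python
from typing import List


def assign_targets(n_pieces: int, chain: List[int]) -> List[int]:
    """Return the target chip address for each piece, last chip in the link first.

    `chain` lists chip addresses in link order, main control chip first.
    Assumption: at most one piece per chip, as in the 4-piece example.
    """
    if n_pieces > len(chain):
        raise ValueError("this sketch assigns at most one piece per chip")
    return list(reversed(chain))[:n_pieces]
```

For a 4-chip link and 4 pieces, the first piece goes to chip 4 and the last to chip 1, so the piece that has to travel furthest is sent first.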
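With the above arithmetic method, arithmetic chips can exchange data within the arithmetic chip link by means of address logic, without going through the host, which saves the host's computing resources. In addition, because each arithmetic chip performs data processing based on the clock signal sent by an independent clock signal source crystal oscillator, the effective bandwidth of each arithmetic chip is improved, which in turn raises the computing power of the entire computing system and saves cost.
To further describe the arithmetic method provided by the embodiments of the present disclosure, and building on the foregoing method, FIG. 3 is a schematic flowchart of another arithmetic method provided by the present disclosure.
As shown in FIG. 3, the arithmetic method includes:
Step 201: Receive a clock signal sent by an independent clock signal source crystal oscillator.
Step 202: Receive the operation data to be processed forwarded by the superior arithmetic chip in the arithmetic chip link.
Step 203: Determine, from the stored address logic and the target arithmetic chip address in the operation data to be processed, whether the chip itself is the one that should execute the operation data;
if yes, go to step 204; if no, go to step 206.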
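Step 204: Invoke arithmetic logic to process the operation data and generate processing result data.
Step 205: Forward the processing result data to the superior arithmetic chip in the arithmetic chip link, for the superior arithmetic chip to forward it in turn until the processing result data reaches the main control arithmetic chip, which stores it.
Step 206: Forward the operation data to be processed to the lower-level arithmetic chip in the arithmetic chip link.
Step 207: Receive processing result data initiated by a lower-level arithmetic chip in the arithmetic chip link.
The processing result data includes at least: the address of the target arithmetic chip that is to receive the processing result data, and the corresponding result data.
Step 208: Determine, according to the address logic, whether the target arithmetic chip address in the processing result data is the chip's own address; if yes, store the result data; if not, forward the result data to the target arithmetic chip in the arithmetic chip link.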
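In the embodiment shown in FIG. 3, steps 201 to 203 are implemented in a similar way to the embodiment shown in FIG. 2 and are not described again here.
Unlike the preceding embodiment, in the embodiment shown in FIG. 3, when the computing chip determines that it is itself the one that should perform the computation, it invokes the computing logic of its chip body to process the computation data and generates processing result data. It then forwards the processing result data to the upstream computing chip in the computing chip link, which forwards it onward in turn until it reaches the target computing chip.
When the computing chip determines that it is not the one that should perform the computation, it forwards the pending computation data to the downstream computing chip in the link, and the data continues downstream until it reaches the computing chip matching the target computing chip address. After that chip finishes processing the pending computation data, the processing result data it generates is passed back up the link chip by chip until it reaches the computing chip executing this method flow. That chip thus receives processing result data originated by a downstream computing chip, where the processing result data includes at least the address of the target computing chip that is to receive it and the corresponding result data. The computing chip then uses the address logic to determine whether the target computing chip address in the processing result data is its own address. If so, it stores the result data; if not, it forwards the result data to the upstream computing chip in the link, and so on until the target computing chip is reached.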
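The upstream return path of steps 207 and 208 can be illustrated with a short sketch. It assumes integer chip addresses and an equality check in place of the address logic; `ResultPacket`, `ResultChip`, `on_result`, and `build_result_link` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResultPacket:
    """Processing result data: receiving chip address plus the result."""
    target_addr: int
    result: int


@dataclass
class ResultChip:
    """A chip that stores or relays results toward the head of the link."""
    addr: int
    upstream: Optional["ResultChip"] = None
    stored: List[int] = field(default_factory=list)

    def on_result(self, packet: ResultPacket) -> Optional[int]:
        """Store the result if addressed to this chip, else relay upstream.

        Returns the address of the chip that stored the result, or None
        if the packet passed the head of the link without a match.
        """
        if packet.target_addr == self.addr:
            self.stored.append(packet.result)
            return self.addr
        if self.upstream is None:
            return None
        return self.upstream.on_result(packet)


def build_result_link(n: int) -> List[ResultChip]:
    """Connect chips 1..n in series; index 0 is the master chip."""
    chips = [ResultChip(addr=i) for i in range(1, n + 1)]
    for up, down in zip(chips, chips[1:]):
        down.upstream = up
    return chips
```

A result produced by chip 4 and addressed to chip 1 is relayed by chips 3 and 2 and stored only at chip 1.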
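In addition, to further describe the computing method provided by the embodiments of the present disclosure, and building on the method above, FIG. 4 is a schematic flowchart of yet another computing method provided by the present disclosure.
As shown in FIG. 4, the computing method includes:
Step 301: Receive the clock signal sent by the independent clock source crystal oscillator.
Step 302: Receive the pending computation data forwarded by the upstream computing chip in the computing chip link;
Step 303: Using the stored address logic and the target computing chip address in the pending computation data, determine whether the chip itself is the one that should perform the computation on the computation data;
If so, perform step 304; if not, perform step 307.
Step 304: Invoke the computing logic to process the computation data;
Step 305: Generate an interrupt request and send it to the upstream computing chip in the computing chip link, so that the upstream computing chip processes the interrupt request.
Step 306: Receive and execute the interrupt response sent by the upstream computing chip, where the interrupt response is generated and sent by the upstream computing chip after it receives the interrupt request and invokes its computing logic to process it.
Step 307: Forward the pending computation data to the downstream computing chip in the computing chip link.
In the embodiment shown in FIG. 4, steps 301 to 303 are implemented in a similar way to the embodiment shown in FIG. 2 and are not described again here.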
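Unlike the preceding embodiments, in the embodiment shown in FIG. 4, when the computing chip determines that it is itself the one that should perform the computation, it invokes the computing logic of its chip body to process the computation data and generates processing result data. After it has finished processing the computation data, the computing chip generates an interrupt request and sends it to its upstream computing chip for processing. The interrupt request is used to make the computing chip stop running and enter a standby state. The upstream computing chip then processes the interrupt request according to preset processing logic, generates an interrupt response, and returns it to the computing chip. On receiving the interrupt response, the computing chip executes it and enters the standby state or stops running, which reduces the power consumption of the whole computing system.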
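Note that, in the embodiments of the present disclosure, a computing chip generates the interrupt request, and then enters the standby state or stops running, only after it has finished processing its computation data and has determined that all of its downstream computing chips in the link have finished processing theirs. The reason is that the present disclosure uses a hardware architecture in which multiple computing chips are connected in series, so data and signals pass through the chips along the link one after another. If a chip in the middle of the link entered the standby state or stopped running, none of the chips downstream of it could receive further data or signals. Therefore, to keep the whole link operating normally, the computing chip in this embodiment generates the interrupt request and sends it to the upstream computing chip only once both conditions are met.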
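The standby condition can be sketched as below. The disclosure does not say how a chip determines that its downstream chips have finished, so this sketch simply walks the downstream links and reads a `done` flag; that mechanism, the `"ENTER_STANDBY"` response value, the choice to leave the master chip running, and all names (`LinkChip`, `try_standby`, `connect`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkChip:
    """A chip that may enter standby once it and all downstream chips finish."""
    addr: int
    upstream: Optional["LinkChip"] = field(default=None, repr=False, compare=False)
    downstream: Optional["LinkChip"] = field(default=None, repr=False, compare=False)
    done: bool = False
    standby: bool = False

    def downstream_all_done(self) -> bool:
        # Assumption: completion status is visible by walking the link.
        node = self.downstream
        while node is not None:
            if not node.done:
                return False
            node = node.downstream
        return True

    def handle_interrupt_request(self, requester: "LinkChip") -> str:
        # Stand-in for the upstream chip's preset processing logic;
        # this sketch always grants the request.
        return "ENTER_STANDBY"

    def try_standby(self) -> bool:
        """Send an interrupt request upstream only if both conditions hold."""
        if not (self.done and self.downstream_all_done()):
            return False
        if self.upstream is None:
            # Master chip has no upstream chip; the sketch leaves it running.
            return False
        response = self.upstream.handle_interrupt_request(self)
        if response == "ENTER_STANDBY":
            self.standby = True
        return self.standby


def connect(n: int) -> List[LinkChip]:
    """Connect chips 1..n in series in both directions."""
    chips = [LinkChip(addr=i) for i in range(1, n + 1)]
    for up, down in zip(chips, chips[1:]):
        up.downstream = down
        down.upstream = up
    return chips
```

In a three-chip link, chip 2 cannot enter standby while chip 3 is still working, which keeps the path to chip 3 open.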
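In addition, this embodiment may also include the flow for generating processing result data; see steps 204 to 208 in the preceding embodiment for details, which are not repeated here.
With the above computing method, computing chips can exchange data inside the computing chip link by means of the address logic, without going through the host, which saves the host's computing resources. In addition, because each computing chip processes data based on the clock signal sent by an independent clock source crystal oscillator, the effective bandwidth of every computing chip is improved, which in turn increases the computing power of the whole computing system and reduces cost.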
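FIG. 5 is a schematic diagram of the hardware structure of a computing chip provided by the present disclosure. As shown in FIG. 5, the computing chip includes a memory 41, a processor 42 connected to the memory 41, and a computer program stored in the memory 41 and executable on the processor 42; when the processor 42 runs the computer program, it performs the computing method described above.
An embodiment of the present disclosure further provides a readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above computing method.
An embodiment of the present disclosure further provides a computer program product, the computer program product including a computer program stored on a computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform the above computing method.
The above readable storage medium may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes one or more instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code, or it may be a transitory storage medium.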
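As used in this application, although the terms "first", "second", and so on may be used to describe various elements, these elements should not be limited by those terms. The terms are used only to distinguish one element from another. For example, without changing the meaning of the description, a first element could be called a second element and, likewise, a second element could be called a first element, provided that all occurrences of "first element" are renamed consistently and all occurrences of "second element" are renamed consistently. The first element and the second element are both elements, but they need not be the same element.
The wording used in this application serves only to describe the embodiments and is not intended to limit the claims. As used in the description of the embodiments and in the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, as used in this application, the term "comprise" and its variants "comprises" and/or "comprising" specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The aspects, implementations, realizations, or features of the described embodiments can be used separately or in any combination. The aspects of the described embodiments may be implemented by software, hardware, or a combination of software and hardware. The described embodiments may also be embodied as a computer-readable medium storing computer-readable code, the computer-readable code including instructions executable by at least one computing device. The computer-readable medium may be associated with any data storage device capable of storing data that can be read by a computer system. Examples of the computer-readable medium include read-only memory, random access memory, CD-ROM, HDD, DVD, magnetic tape, and optical data storage devices. The computer-readable medium may also be distributed across computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner.
The above technical description may refer to the accompanying drawings, which form a part of this application and which show, by way of illustration, implementations in accordance with the described embodiments. Although these embodiments are described in sufficient detail to enable those skilled in the art to practice them, they are non-limiting; other embodiments may be used, and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in a flowchart is non-limiting, so the order of two or more operations illustrated in and described with reference to the flowchart may be changed according to several embodiments. As another example, in several embodiments, one or more operations illustrated in and described with reference to the flowchart are optional or may be deleted. In addition, certain steps or functions may be added to the disclosed embodiments, or the order of two or more steps may be swapped. All such variations are considered to be included in the disclosed embodiments and the claims.
In addition, the above technical description uses terminology to provide a thorough understanding of the described embodiments. However, excessive detail is not required to practice the described embodiments. The above description of the embodiments is therefore presented for purposes of illustration and description. The embodiments presented in the above description, and the examples disclosed in accordance with them, are provided solely to add context and to aid understanding of the described embodiments. The above specification is not intended to be exhaustive or to limit the described embodiments to the precise forms of the present disclosure. Several modifications, alternative applications, and variations are possible in light of the above teachings. In some cases, well-known processing steps have not been described in detail to avoid unnecessarily obscuring the described embodiments.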

Claims (12)

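  1. A computing method, characterized in that the computing method is based on a computing system with multiple computing chips, the computing system comprising: a computing chip link formed by a plurality of computing chips connected in series, and a mainboard electrically connected to a master computing chip located at the head of the link; wherein, in the computing chip link, two adjacent computing chips are connected to the same independent clock source crystal oscillator;
    for any computing chip in the computing chip link, the computing method comprises:
    receiving a clock signal sent by the independent clock source crystal oscillator;
    processing or forwarding received pending computation data according to the clock signal; wherein the pending computation data is sent by the mainboard to the master computing chip and forwarded in turn, via the series-connected computing chips in the computing chip link, to said any computing chip.
  2. The computing method according to claim 1, characterized in that each computing chip stores address logic of the computing chip link; the pending computation data comprises at least an address of a target computing chip that is to execute the pending computation data and corresponding computation data;
    the processing or forwarding of the received pending computation data comprises:
    receiving the pending computation data forwarded by an upstream computing chip in the computing chip link, and determining, according to the stored address logic and the target computing chip address in the pending computation data, whether the chip itself is the one that should perform the computation on the computation data;
    if so, invoking computing logic to process the computation data;
    if not, forwarding the pending computation data to a downstream computing chip in the computing chip link.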
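  3. The computing method according to claim 2, characterized in that, after the invoking of the computing logic to process the computation data, the computing method further comprises:
    generating processing result data and forwarding the processing result data to the upstream computing chip in the computing chip link, so that the upstream computing chip forwards the processing result data onward until the processing result data has been forwarded in turn to a target computing chip.
  4. The computing method according to claim 2, characterized in that, after the forwarding of the pending computation data to the downstream computing chip in the computing chip link, the computing method further comprises:
    receiving processing result data originated by a downstream computing chip in the computing chip link; wherein the processing result data comprises at least: an address of a target computing chip that is to receive the processing result data, and corresponding result data;
    determining, according to the address logic, whether the target computing chip address in the processing result data is the chip's own address;
    if so, storing the result data;
    if not, forwarding the result data to the upstream computing chip in the computing chip link until the target computing chip is reached.
  5. The computing method according to claim 2, characterized in that, after the invoking of the computing logic to process the computation data, the computing method further comprises:
    generating an interrupt request and sending the interrupt request to the upstream computing chip in the computing chip link, so that the upstream computing chip processes the interrupt request.
  6. The computing method according to claim 5, characterized in that, after the sending of the interrupt request to the upstream computing chip in the computing chip link, the computing method further comprises:
    receiving and executing an interrupt response sent by the upstream computing chip, the interrupt response being generated and sent by the upstream computing chip after it receives the interrupt request and invokes computing logic to process the interrupt request.
  7. The computing method according to any one of claims 2 to 6, characterized in that the computing method further comprises:
    performing a computing chip initialization operation to obtain the address logic of the computing chip link and store it in every computing chip in the computing chip link.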
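  8. A computing chip, characterized by comprising: a memory, a processor connected to the memory, and a computer program stored in the memory and executable on the processor,
    wherein the processor, when running the computer program, performs the method according to any one of claims 1 to 7.
  9. A computing system, characterized by comprising: a mainboard and a plurality of computing chips according to claim 8;
    wherein the plurality of computing chips are connected in series to form a computing chip link; the head of the computing chip link comprises a master computing chip, and the master computing chip is electrically connected to the mainboard; in the computing chip link, any two adjacent computing chips are connected to the same independent clock source and receive the clock signal of that independent clock source.
  10. The computing system according to claim 9, characterized by further comprising:
    a plurality of independent clock source crystal oscillators, each independent clock source crystal oscillator comprising a clock source output terminal, the output terminal being connected to any two adjacent computing chips in the computing chip link so as to provide an independent clock signal to those two computing chips.
  11. A readable storage medium, characterized by storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 7.
  12. A computer program product, characterized in that the computer program product comprises a computer program stored on a readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.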
PCT/CN2018/118723 2018-11-30 2018-11-30 Computing method, chip, system, readable storage medium, and computer program product WO2020107460A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
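CN201880015943.8A CN110770712B (zh) 2018-11-30 2018-11-30 Computing method, chip, system, readable storage medium, and computer program product
PCT/CN2018/118723 WO2020107460A1 (zh) 2018-11-30 2018-11-30 Computing method, chip, system, readable storage medium, and computer program product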

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/118723 WO2020107460A1 (zh) 2018-11-30 2018-11-30 Computing method, chip, system, readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2020107460A1 (zh) 2020-06-04

Family

ID=69328658

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118723 WO2020107460A1 (zh) 2018-11-30 2018-11-30 Computing method, chip, system, readable storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN110770712B (zh)
WO (1) WO2020107460A1 (zh)

Also Published As

Publication number Publication date
CN110770712B (zh) 2023-08-18
CN110770712A (zh) 2020-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18941350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18941350

Country of ref document: EP

Kind code of ref document: A1