WO2024138796A1 - Data collection structure, method and system, and chip

Publication number
WO2024138796A1
Authority
WO
WIPO (PCT)
Prior art keywords
arbitrator
data collection
chain
computing unit
computing
Application number
PCT/CN2023/071362
Other languages
French (fr)
Chinese (zh)
Inventor
汪福全
刘明
Original Assignee
声龙(新加坡)私人有限公司
Application filed by 声龙(新加坡)私人有限公司
Publication of WO2024138796A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present disclosure relate to data collection technology, and in particular to a data collection structure, method, chip and system.
  • the current solution for collecting multiple calculation results is to add the calculation results to a preset buffer in sequence. When the buffer is full, if the control unit wants to receive all the calculation results submitted by all computing units, it needs to have back pressure on the computing units, which greatly increases the difficulty of designing the computing units. In addition, the current solution does not consider the spatial layout of all computing units.
  • the plurality of computing units are sequentially connected through the plurality of the first arbitrators to form a data collection chain, and the computing unit at the end of the chain is connected to the control unit;
  • the arbitrator is configured to receive the calculation result sent by the calculation unit and transmit it to the control unit along the data collection chain.
  • Each computing unit is connected to the next computing unit through a first arbitrator corresponding to the computing unit, so that the multiple computing units are connected in a chain to form the data collection chain.
  • each of the first arbitrators may include a first inlet, a second inlet, and an outlet;
  • the first inlet of each of the first arbitrators is connected to the output interface of the computing unit corresponding to the first arbitrator;
  • when there is a single data collection chain, connecting the computing unit at the end of the chain to the control unit may include:
  • the first arbitrator corresponding to the computing unit at the end of the chain is directly connected to the control unit through the first arbitrator outlet;
  • the multiple arbitrators may also include: a second arbitrator; each second arbitrator includes a first inlet, a second inlet and an outlet;
  • the outlets of the first arbitrators corresponding to the computing units at the tails of the multiple data collection chains are connected to the control unit through the second arbitrator.
  • the outlet of the first arbitrator corresponding to the computing unit at the end of the plurality of data collection chains is connected to the control unit through the second arbitrator, which may include:
  • the outlet of the first arbitrator corresponding to the computing unit at the end of each data collection chain is connected to the first inlet of the second arbitrator or the second inlet of the second arbitrator;
  • the outlet of the second arbitrator is connected to the input interface of the control unit, or is connected to the first inlet or the second inlet of the next second arbitrator.
  • a buffer may be provided at the first entrance of each of the first arbitrators
  • the calculation results are transmitted directly or selectively along a data collection chain until they are transmitted to the control unit.
  • When any one of the first entry and the second entry included in the arbitrator receives the calculation result, the arbitrator directly sends the calculation result to the exit of the arbitrator for output;
  • the arbitrator selects one of the two calculation results and sends it to the exit of the arbitrator for output according to a preset selection strategy.
  • the selection strategy includes:
  • the embodiment of the present disclosure also provides a chip which may include the data collection structure.
  • FIG1 is a schematic diagram of a data collection structure including a single data collection chain and a first arbitrator disposed inside a computing unit according to an embodiment of the present disclosure
  • FIG4 is a schematic diagram of a data collection structure including multiple data collection chains and a first arbitrator disposed outside a computing unit according to an embodiment of the present disclosure
  • FIG. 7 is a block diagram of the data collection system according to an embodiment of the present disclosure.
  • the specification may have presented the method and/or process as a specific sequence of steps. However, to the extent that the method or process does not rely on the specific order of the steps described herein, the method or process should not be limited to the steps in the specific order described. As will be appreciated by those of ordinary skill in the art, other orders of steps are also possible. Therefore, the specific order of the steps set forth in the specification should not be interpreted as a limitation to the claims. In addition, the claims for the method and/or process should not be limited to the steps performed in the order written, and those skilled in the art can easily understand that these orders can be changed and still remain within the spirit and scope of the disclosed embodiments.
  • the plurality of computing units 1 are connected in sequence through the plurality of the first arbitrators 21 to form a data collection chain, and the computing unit at the end of the chain is connected to the control unit;
  • a data collection chain structure without back pressure is provided to collect the calculation results of all computing units, which can simplify the design of the computing units, increase the operating frequency of the computing units, and arrange the computing units closely to make full use of the chip space.
  • each of the first arbitrators 21 may include a first inlet a1, a second inlet b1, and an outlet c1;
  • the first inlet a1 of each first arbitrator 21 is connected to the output interface of the computing unit 1 corresponding to the first arbitrator 21;
  • the first arbitrator 21 corresponding to the computing unit 1 at the end of the chain is directly connected to the control unit 3 through the exit c1.
  • the computing unit 1 at the end of the chain is connected to the control unit 3 and may include:
  • the outlet c1 of the first arbitrator 21 corresponding to the computing unit 1 at the end of the chain of the plurality of data collection chains is connected to the control unit 3 through the second arbitrator 22, which may include:
  • the outlet c2 of the second arbitrator 22 is connected to the input interface of the control unit 3 , or is connected to the first inlet a2 or the second inlet b2 of the next second arbitrator 22 .
  • the number of computing units 1 in FIG. 1 , FIG. 2 , FIG. 3 , and FIG. 4 is only an example, and may actually be any number, for example, generally 1 to 65536.
  • directly or selectively transmitting the calculation result along a data collection chain may include:
  • When any one of the first entry and the second entry included in the arbitrator 2 receives the calculation result, the arbitrator 2 directly sends the calculation result to the exit of the arbitrator for output;
  • the arbitrator selects one of the two calculation results and sends it to the output of the arbitrator for output according to a preset selection strategy.
  • the selection strategy may include:
  • No buffer is set at the first entrance of the first arbitrator, and one of the calculation results received from the first entrance and the second entrance is selected at random, and the selected calculation result is sent to the exit of the arbitrator.
  • the probability that the chain node connected to computing unit [2] has no data is $(1 - 1/2^{32})^{3}$,
  • the selection strategy may include:
  • the calculation result cached in the buffer is sent to the exit c1 of the first arbitrator 21 for output; wherein, if a new calculation result sent by the calculation unit 1 corresponding to the buffer is received before the calculation result cached in the buffer is sent to the exit c1 of the first arbitrator 21, the calculation result cached in the buffer is discarded, and the new calculation result is sent to the exit c1 of the first arbitrator 21 for output.
  • the method also includes: determining whether to set a buffer based on the size of the recovered computing power loss and the amount of resources that need to be increased; wherein, when the recovered computing power loss is greater than or equal to a preset computing power threshold, and the proportion of resources that need to be increased to the total chip resources is less than or equal to a preset proportion threshold, determining to set the buffer; when the recovered computing power loss is less than the preset computing power threshold, and/or the proportion of resources that need to be increased to the total chip resources is greater than the preset proportion threshold, determining not to set the buffer.
  • the embodiment of the present disclosure further provides a chip 10, as shown in FIG6, comprising the data collection structure A described above.
  • the embodiment of the present disclosure further provides a data collection system 1 , as shown in FIG. 7 , comprising a chip 10 , a processor 11 and a computer-readable storage medium 12 , wherein the processor 11 stores the calculation results in the chip 10 in the computer-readable storage medium 12 .
  • any of the aforementioned data collection structures and methods are applicable to the data collection system 1 embodiment, and will not be described one by one here.
  • the processor 11 and the computer-readable storage medium 12 may be implemented by the above-mentioned control unit.
  • any of the aforementioned data collection structures and methods are applicable to the computer-readable storage medium embodiments and will not be described in detail herein.
  • Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium).
  • a computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Bus Control (AREA)
  • Debugging And Monitoring (AREA)
  • Selective Calling Equipment (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed in the embodiments of the present disclosure are a data collection structure, method and system, and a chip. The data collection structure comprises: a plurality of computing units (1) and a plurality of arbiters (2), the plurality of arbiters (2) comprising first arbiters (21) each corresponding to a computing unit (1), and the plurality of computing units (1) being successively connected by means of the plurality of first arbiters (21) to form a data collection chain, and the computing unit (1) located at the chain end being connected to a control unit (3), wherein the arbiters (2) are configured to receive computing results sent by the computing units (1) and transmit along the data collection chain the computing results to the control unit (3).

Description

A data collection structure, method, chip and system
Technical Field
The embodiments of the present disclosure relate to data collection technology, and in particular to a data collection structure, method, chip and system.
Background
A computing power chip contains many computing units. During data processing, the calculation results of these computing units need to be collected and aggregated so that all of them end up in a single control unit. Current solutions add the calculation results to a preset buffer in sequence. When that buffer is full, the control unit can receive every calculation result submitted by every computing unit only if back pressure is applied to the computing units, which greatly increases the difficulty of designing them. In addition, current solutions do not consider the spatial layout of the computing units.
Summary of the Invention
The following is a summary of the subject matter described in detail herein; this summary is not intended to limit the scope of the claims.
Embodiments of the present disclosure provide a data collection structure, method, chip and system.
An embodiment of the present disclosure provides a data collection structure, which may include: a plurality of computing units and a plurality of arbitrators; the plurality of arbitrators include a first arbitrator corresponding to each computing unit;
the plurality of computing units are connected in sequence through the plurality of first arbitrators to form a data collection chain, and the computing unit at the end of the chain is connected to a control unit;
wherein the arbitrators are configured to receive the calculation results sent by the computing units and transmit them along the data collection chain to the control unit.
In an exemplary embodiment of the present disclosure, connecting the plurality of computing units in sequence through the plurality of first arbitrators to form a data collection chain may include:
each computing unit is connected to the next computing unit through the first arbitrator corresponding to that computing unit, so that the plurality of computing units are connected in a chain to form the data collection chain.
In an exemplary embodiment of the present disclosure, each first arbitrator may include a first inlet, a second inlet and an outlet;
connecting each computing unit to the next computing unit through the first arbitrator corresponding to that computing unit includes:
the first inlet of each first arbitrator is connected to the output interface of the computing unit corresponding to that first arbitrator;
the second inlet of each first arbitrator is connected to the outlet of the first arbitrator corresponding to the previous computing unit.
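The wiring rule above can be sketched as a small data structure. This is a minimal illustration only: the names `FirstArbitrator`, `build_chain` and the `"control_unit"` placeholder are hypothetical and do not come from the disclosure, which describes hardware connections rather than software objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(eq=False)
class FirstArbitrator:
    """Hypothetical model of one first arbitrator: two inlets, one outlet."""
    unit_id: int  # first inlet: output interface of the corresponding computing unit
    upstream: Optional["FirstArbitrator"] = None  # second inlet: outlet of the previous arbitrator
    downstream: Union["FirstArbitrator", str, None] = None  # outlet: next arbitrator or control unit


def build_chain(num_units: int, control_unit: str = "control_unit") -> List[FirstArbitrator]:
    """Wire num_units arbitrators into a single data collection chain."""
    if num_units < 1:
        raise ValueError("a chain needs at least one computing unit")
    chain = [FirstArbitrator(unit_id=i) for i in range(num_units)]
    for prev, cur in zip(chain, chain[1:]):
        cur.upstream = prev    # second inlet <- previous arbitrator's outlet
        prev.downstream = cur  # outlet -> next arbitrator's second inlet
    chain[-1].downstream = control_unit  # the chain tail feeds the control unit
    return chain


chain = build_chain(4)
```

The head of the chain has nothing on its second inlet, and only the tail's outlet reaches the control unit, so each computing unit needs a connection only to its immediate neighbour.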
In an exemplary embodiment of the present disclosure, when there is a single data collection chain, connecting the computing unit at the end of the chain to the control unit may include:
the first arbitrator corresponding to the computing unit at the end of the chain is connected directly to the control unit through the outlet of that first arbitrator;
when there are multiple data collection chains, the plurality of arbitrators may further include second arbitrators; each second arbitrator includes a first inlet, a second inlet and an outlet;
connecting the computing unit at the end of the chain to the control unit may include:
the outlets of the first arbitrators corresponding to the computing units at the ends of the multiple data collection chains are connected to the control unit through the second arbitrators.
In an exemplary embodiment of the present disclosure, connecting the outlets of the first arbitrators corresponding to the computing units at the ends of the multiple data collection chains to the control unit through the second arbitrators may include:
the outlet of the first arbitrator corresponding to the computing unit at the end of each data collection chain is connected to the first inlet or the second inlet of a second arbitrator;
the outlet of the second arbitrator is connected to the input interface of the control unit, or to the first inlet or the second inlet of the next second arbitrator.
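One way to picture the multi-chain case is to reduce the chain tails pairwise with two-inlet second arbitrators until a single outlet remains. The sketch below assumes a pairwise (tree-like) arrangement; the text equally allows a second arbitrator's outlet to feed the next second arbitrator in a cascade. `SecondArbitrator`, `merge_chains` and `count_second_arbitrators` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SecondArbitrator:
    """Hypothetical two-inlet merge node; its outlet is the node itself."""
    inlet_a: "Source"
    inlet_b: "Source"


# A source is either a chain tail (named by a string) or another second arbitrator.
Source = Union[str, SecondArbitrator]


def merge_chains(tails: List[str]) -> Source:
    """Pair up sources level by level until one outlet is left for the control unit."""
    if not tails:
        raise ValueError("at least one data collection chain is required")
    level: List[Source] = list(tails)
    while len(level) > 1:
        nxt: List[Source] = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(SecondArbitrator(level[i], level[i + 1]))
        if len(level) % 2:  # an unpaired source moves up to the next level unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def count_second_arbitrators(node: Source) -> int:
    """Count the merge nodes below (and including) this outlet."""
    if isinstance(node, SecondArbitrator):
        return 1 + count_second_arbitrators(node.inlet_a) + count_second_arbitrators(node.inlet_b)
    return 0


root = merge_chains(["chain0", "chain1", "chain2", "chain3"])
```

Because every second arbitrator merges two sources into one, k chains need k - 1 second arbitrators whichever arrangement is chosen; a single chain needs none and connects to the control unit directly.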
In an exemplary embodiment of the present disclosure, a buffer may be provided at the first inlet of each first arbitrator;
the buffer may be configured to cache the calculation results sent by the computing unit corresponding to that first arbitrator.
An embodiment of the present disclosure further provides a data collection method, based on the data collection structure described above and applied to the arbitrators in that structure; the arbitrators include first arbitrators, or include first arbitrators and second arbitrators; the method may include:
obtaining the calculation result submitted by the corresponding computing unit and/or transmitted by the upstream arbitrator;
transmitting the calculation result directly or selectively along the data collection chain until it reaches the control unit.
In an exemplary embodiment of the present disclosure, transmitting the calculation result directly or selectively along the data collection chain may include:
when only one of the first inlet and the second inlet of the arbitrator receives a calculation result, the arbitrator sends that calculation result directly to its outlet for output;
when both the first inlet and the second inlet of the arbitrator receive calculation results, the arbitrator selects one of the two calculation results according to a preset selection strategy and sends it to its outlet for output.
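The two cases above amount to a small decision function. The following is a sketch under stated assumptions: a calculation result is modelled as an `int`, an idle inlet as `None`, and the selection strategy as a callable. The names `arbitrate` and `prefer_second_inlet` are hypothetical, and the example strategy is just one possible choice.

```python
from typing import Callable, Optional

Result = int
Strategy = Callable[[Result, Result], Result]


def prefer_second_inlet(first: Result, second: Result) -> Result:
    """Example strategy (an assumption): the result already on the chain wins."""
    return second


def arbitrate(first_inlet: Optional[Result],
              second_inlet: Optional[Result],
              select: Strategy) -> Optional[Result]:
    """Return what the arbitrator drives onto its outlet in one step."""
    if first_inlet is None:
        return second_inlet  # at most one inlet is active: pass it through
    if second_inlet is None:
        return first_inlet
    return select(first_inlet, second_inlet)  # both active: apply the preset strategy


out = arbitrate(5, 7, prefer_second_inlet)
```

Only the contended case consults the strategy, so the common uncontended path stays a plain pass-through.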
In an exemplary embodiment of the present disclosure, when the number of computing units is less than a preset number threshold, the selection strategy may include:
no buffer is provided at the first inlet of the first arbitrator; one of the calculation results received at the first inlet and the second inlet is selected arbitrarily, and the selected calculation result is sent to the outlet of the arbitrator;
when the number of computing units is greater than or equal to the preset number threshold, the selection strategy includes:
a buffer is provided in advance at the first inlet of the first arbitrator, and the calculation result contained in the calculation result sending request received at the first inlet is cached in that buffer;
after the calculation result received at the first inlet has been sent, the calculation result cached in the buffer is sent, or the new calculation result received at the first inlet at that time is sent.
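The buffered strategy can be sketched as a one-entry buffer at the first inlet. Several details here are assumptions rather than statements from the text: the result arriving at the second inlet is forwarded first on a collision, the buffer holds a single result, and a newly arriving local result replaces a cached one that has not yet been sent. `BufferedFirstArbitrator`, `step` and `dropped` are hypothetical names.

```python
from typing import Optional


class BufferedFirstArbitrator:
    """Hypothetical first arbitrator with a one-entry buffer at its first inlet."""

    def __init__(self) -> None:
        self.buffer: Optional[int] = None  # cached result from the local computing unit
        self.dropped = 0                   # cached results overwritten before being sent

    def step(self, local: Optional[int], upstream: Optional[int]) -> Optional[int]:
        """Process one cycle and return the value driven onto the outlet."""
        if local is not None:
            if self.buffer is not None:
                self.dropped += 1  # stale cached result is discarded
            self.buffer = local    # the first inlet always lands in the buffer
        if upstream is not None:
            return upstream        # assumed priority: traffic on the chain goes first
        out, self.buffer = self.buffer, None  # outlet is free: drain the buffer
        return out


arb = BufferedFirstArbitrator()
trace = [arb.step(1, 10), arb.step(None, None), arb.step(2, 20),
         arb.step(3, 30), arb.step(None, None)]
```

In this trace the local results 1 and 3 are delivered one cycle late instead of being lost, while result 2 is overwritten by 3 before the outlet becomes free. The computing unit is never stalled, so a result can still be lost, but only when the outlet stays busy across two local submissions.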
An embodiment of the present disclosure further provides a chip, which may include the data collection structure described above.
An embodiment of the present disclosure further provides a data collection system, including the chip described above, a processor and a computer-readable storage medium, wherein the processor stores the calculation results from the chip in the computer-readable storage medium.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
The accompanying drawings provide an understanding of the technical solution of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they serve to explain the technical solution of the present disclosure and do not limit it.
FIG. 1 is a schematic diagram of a data collection structure with a single data collection chain and the first arbitrators disposed inside the computing units, according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a data collection structure with a single data collection chain and the first arbitrators disposed outside the computing units, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data collection structure with multiple data collection chains and the first arbitrators disposed inside the computing units, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a data collection structure with multiple data collection chains and the first arbitrators disposed outside the computing units, according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of a data collection method according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a data collection device according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a data collection system according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes multiple embodiments, but the description is exemplary rather than restrictive, and it will be obvious to those of ordinary skill in the art that more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically restricted, any feature or element of any embodiment may be used in combination with, or may replace, any other feature or element of any other embodiment.
The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed herein may also be combined with any conventional features or elements to form a unique inventive solution defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. It should therefore be understood that any feature shown and/or discussed in the present disclosure may be implemented individually or in any appropriate combination. Accordingly, the embodiments are not limited except by the appended claims and their equivalents. In addition, various modifications and changes may be made within the scope of protection of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, it should not be limited to that order. As those of ordinary skill in the art will appreciate, other sequences of steps are also possible. The particular order of steps set forth in the specification should therefore not be construed as a limitation on the claims. In addition, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art can readily appreciate that the order may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
An embodiment of the present disclosure provides a data collection structure A, as shown in FIG. 1, FIG. 2, FIG. 3 and FIG. 4, which may include: a plurality of computing units 1 and a plurality of arbitrators 2; the plurality of arbitrators 2 include a first arbitrator 21 corresponding to each computing unit;
the plurality of computing units 1 are connected in sequence through the plurality of first arbitrators 21 to form a data collection chain, and the computing unit at the end of the chain is connected to the control unit;
wherein the arbitrators 2 are configured to receive the calculation results sent by the computing units and transmit them along the data collection chain to the control unit 3.
In an exemplary embodiment of the present disclosure, a chained data collection structure without back pressure is provided to collect the calculation results of all computing units. This can simplify the design of the computing units, raise their operating frequency, and allow them to be packed closely so that chip area is fully used.
In an exemplary embodiment of the present disclosure, the first arbitrator 21 may be disposed inside its corresponding computing unit 1 (as shown in FIG. 1 and FIG. 3) or outside it (as shown in FIG. 2 and FIG. 4). When the first arbitrator 21 is designed inside the computing unit 1, the computing units 1 can be packed closely, making full use of chip area.
In an exemplary embodiment of the present disclosure, the multiple computing units 1 shown in FIG. 1, FIG. 2, FIG. 3 and FIG. 4 are distinct computing units 1 rather than the same computing unit 1; likewise, the multiple first arbitrators 21 are distinct first arbitrators 21, and the multiple second arbitrators 22 are distinct second arbitrators 22.
In an exemplary embodiment of the present disclosure, the plurality of computing units being connected in sequence through the plurality of first arbitrators to form a data collection chain may include:
Each computing unit 1 is connected to the next computing unit 1 through the first arbitrator 21 corresponding to that computing unit, so that the plurality of computing units 1 are connected in a chain and form the data collection chain.
In an exemplary embodiment of the present disclosure, each arbitrator may include two inlets and one outlet: one inlet is connected to the arbitrator's own computing unit, the other inlet is connected to the previous computing unit, and the outlet is connected to the next computing unit or to the control unit.
In an exemplary embodiment of the present disclosure, each first arbitrator 21 may include a first inlet a1, a second inlet b1 and an outlet c1;
Each computing unit 1 being connected to the next computing unit 1 through its corresponding first arbitrator 21 includes:
The first inlet a1 of each first arbitrator 21 is connected to the output interface of the computing unit 1 corresponding to that first arbitrator 21;
The second inlet b1 of each first arbitrator 21 is connected to the outlet c1 of the first arbitrator 21 corresponding to the previous computing unit 1.
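The wiring of a single data collection chain can be sketched as a small data model. This is an illustrative sketch only, not the disclosed circuit: the class name `FirstArbitrator`, the function `build_chain`, and the string labels such as `unit[0].out` and `control_unit.in` are hypothetical names introduced here. It also assumes that the second inlet b1 of the arbitrator at the head of the chain is left unconnected, which the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FirstArbitrator:
    index: int
    a1: str            # what drives the first inlet (own computing unit)
    b1: Optional[str]  # what drives the second inlet (previous arbitrator)
    c1: str            # what the outlet drives (next arbitrator or control unit)


def build_chain(n: int) -> List[FirstArbitrator]:
    """Wire n computing units into one data collection chain."""
    chain = []
    for i in range(n):
        a1 = f"unit[{i}].out"
        # Assumption: the head of the chain has no upstream arbitrator.
        b1 = f"arb[{i - 1}].c1" if i > 0 else None
        # The tail arbitrator feeds the control unit directly.
        c1 = f"arb[{i + 1}].b1" if i < n - 1 else "control_unit.in"
        chain.append(FirstArbitrator(i, a1, b1, c1))
    return chain


chain = build_chain(3)
for arb in chain:
    print(arb)
```

Each arbitrator only knows its own computing unit and its two neighbours, which is why the units can be tiled without any shared bus.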
In an exemplary embodiment of the present disclosure, there may be only one data collection chain, or multiple data collection chains may be arranged in parallel, with the tail of each data collection chain connected to the control unit 3.
In an exemplary embodiment of the present disclosure, when there is one data collection chain, the computing unit 1 at the tail of the chain being connected to the control unit 3 may include:
The first arbitrator 21 corresponding to the computing unit 1 at the tail of the chain is directly connected to the control unit 3 through the outlet c1.
In an exemplary embodiment of the present disclosure, as shown in FIG. 3 and FIG. 4, when there are multiple data collection chains, the multiple arbitrators 2 may further include a second arbitrator 22; each second arbitrator 22 may include a first inlet a2, a second inlet b2 and an outlet c2;
The computing unit 1 at the tail of the chain being connected to the control unit 3 may include:
The outlets c1 of the first arbitrators 21 corresponding to the computing units 1 at the tails of the multiple data collection chains are connected to the control unit 3 through the second arbitrator 22.
In an exemplary embodiment of the present disclosure, when there are one or more second arbitrators 22, each second arbitrator 22 also joins the data collection chain to which it is connected and forms part of that data collection chain.
In an exemplary embodiment of the present disclosure, the outlets c1 of the first arbitrators 21 corresponding to the computing units 1 at the tails of the multiple data collection chains being connected to the control unit 3 through the second arbitrator 22 may include:
The outlet c1 of the first arbitrator 21 corresponding to the computing unit 1 at the tail of each data collection chain is connected to the first inlet a2 or the second inlet b2 of the second arbitrator 22;
The outlet c2 of the second arbitrator 22 is connected to the input interface of the control unit 3, or to the first inlet a2 or the second inlet b2 of the next second arbitrator 22.
In an exemplary embodiment of the present disclosure, when there are two data collection chains, there may be one second arbitrator 22. The first inlet a2 of this second arbitrator 22 is connected to the outlet c1 of the first arbitrator 21 corresponding to the computing unit 1 at the tail of one data collection chain, and the second inlet b2 of this second arbitrator 22 is connected to the outlet c1 of the first arbitrator 21 corresponding to the computing unit 1 at the tail of the other data collection chain. The outlet c2 of this second arbitrator 22 is directly connected to the input interface of the control unit 3.
In an exemplary embodiment of the present disclosure, when there are more than two data collection chains, there may be multiple second arbitrators 22. The multiple second arbitrators 22 are likewise connected in a chain, referred to as the second arbitrator chain, which forms part of the data collection chains connected to it. At least one of the first inlet a2 and the second inlet b2 of each second arbitrator 22 is connected to the outlet c1 of the first arbitrator 21 corresponding to the last computing unit 1 of a data collection chain, and the outlet c2 of the second arbitrator 22 at the tail of the second arbitrator chain is directly connected to the input interface of the control unit 3.
In an exemplary embodiment of the present disclosure, as shown in FIG. 3 and FIG. 4, the detailed connection may be as follows. When m second arbitrators 22 are included (m is a positive integer), the first inlet a2 of the first second arbitrator 22 may be connected to the outlet c1 of the first arbitrator 21 corresponding to the last computing unit 1 of the first data collection chain, the second inlet b2 of the first second arbitrator 22 may be connected to the outlet c1 of the first arbitrator 21 corresponding to the last computing unit 1 of the second data collection chain, and the outlet c2 of the first second arbitrator 22 may be connected to the second inlet b2 of the next second arbitrator 22 (the second of the second arbitrators 22). The first inlets a2 of the 2nd to m-th second arbitrators 22 may be connected, in order, to the outlets c1 of the first arbitrators 21 corresponding to the last computing units 1 of the third through the last data collection chains, and the second inlets b2 of the 2nd to m-th second arbitrators 22 may each be connected to the outlet c2 of the preceding second arbitrator 22. The outlet c2 of the m-th second arbitrator 22 is directly connected to the input interface of the control unit 3.
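The detailed connection of the second arbitrator chain can be sketched as a wiring table. This is only an illustration of the connection rule described above: the function name `wire_second_arbitrators` and the string labels (`chain[k].tail.c1`, `second_arb[j]`, `control_unit.in`) are hypothetical, and chains and arbitrators are numbered from 0 here rather than from 1. With m second arbitrators, the rule joins m + 1 data collection chains.

```python
from typing import Dict, List


def wire_second_arbitrators(num_chains: int) -> List[Dict[str, str]]:
    """Return the a2/b2/c2 connections for num_chains - 1 second arbitrators."""
    if num_chains < 2:
        raise ValueError("second arbitrators are only needed for 2+ chains")
    m = num_chains - 1
    wiring = []
    for j in range(m):
        if j == 0:
            # The first second arbitrator merges the first two chains.
            a2 = "chain[0].tail.c1"
            b2 = "chain[1].tail.c1"
        else:
            # Later ones take one further chain on a2 and the previous
            # second arbitrator's outlet on b2.
            a2 = f"chain[{j + 1}].tail.c1"
            b2 = f"second_arb[{j - 1}].c2"
        # The last second arbitrator feeds the control unit directly.
        c2 = "control_unit.in" if j == m - 1 else f"second_arb[{j + 1}].b2"
        wiring.append({"a2": a2, "b2": b2, "c2": c2})
    return wiring


for j, w in enumerate(wire_second_arbitrators(4)):
    print(j, w)
```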
In an exemplary embodiment of the present disclosure, the number of computing units 1 shown in FIG. 1, FIG. 2, FIG. 3 and FIG. 4 is only an example; in practice there may be any number, for example, generally 1 to 65536.
An embodiment of the present disclosure further provides a data collection method, based on the data collection structure described above and applied to the arbitrators in the data collection structure. The arbitrators may include the first arbitrator, or the first arbitrator and the second arbitrator. As shown in FIG. 5, the method may include steps S101 and S102:
S101: obtaining a calculation result submitted by the corresponding computing unit 1 and/or transmitted by the upstream arbitrator 2;
S102: transmitting the calculation result, directly or selectively, along the data collection chain until it reaches the control unit 3.
In an exemplary embodiment of the present disclosure, transmitting the calculation result directly or selectively along the data collection chain may include:
When only one of the first inlet and the second inlet of the arbitrator 2 receives a calculation result, the arbitrator 2 sends that calculation result directly to its outlet for output;
When both the first inlet and the second inlet of the arbitrator 2 receive calculation results, the arbitrator selects one of the two calculation results according to a preset selection strategy and sends it to its outlet for output.
In an exemplary embodiment of the present disclosure, when the number of computing units 1 is less than a preset number threshold, the selection strategy may include:
No buffer is provided at the first inlet of the first arbitrator; one of the calculation results received at the first inlet and the second inlet is selected arbitrarily, and the selected calculation result is sent to the outlet of the arbitrator.
In an exemplary embodiment of the present disclosure, if the number of computing units 1 is small and both the first inlet and the second inlet receive calculation results, one calculation result may be chosen at random and discarded, and the other calculation result is passed downstream through the outlet.
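The bufferless selection strategy can be sketched as a single function. This is a minimal software illustration, not the disclosed hardware: the function name `arbitrate`, the use of `None` to mean "no calculation result on this inlet", and the pseudo-random choice via `random.Random` are assumptions made for the sketch.

```python
import random
from typing import Optional, TypeVar

T = TypeVar("T")


def arbitrate(a: Optional[T], b: Optional[T], rng: random.Random) -> Optional[T]:
    """Return the result forwarded to the outlet, or None if both inlets are idle."""
    if a is None:
        return b  # only the second inlet (or neither) has a result
    if b is None:
        return a  # only the first inlet has a result
    # Both inlets have a result: keep one at random, the other is discarded.
    return rng.choice((a, b))


rng = random.Random(0)
print(arbitrate("local", None, rng))               # forwarded unchanged
print(arbitrate("local", "upstream", rng))         # one of the two is kept
```

Because the arbitrator never asks either source to wait, no signal has to travel back toward the computing units.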
In an exemplary embodiment of the present disclosure, the control unit 3 would in theory need to receive every calculation result submitted by every computing unit 1. That would require back pressure on the computing units 1, which would greatly increase their design difficulty. The circuit through which the computing units 1 submit calculation results is therefore designed as a chain: any computing unit 1 can submit a calculation result to the chain through an arbitrator 2, and if both inlets of an arbitrator 2 carry a request at the same time, one of the two is discarded.
In an exemplary embodiment of the present disclosure, assume there are n computing units (n is a positive integer) and that each computing unit submits a calculation result with probability $1/2^{32}$, as independent events. Number the computing units from the head of the chain to the tail as computing unit [0] to computing unit [n-1]. Then:
The probability that the chain node connected to computing unit [0] carries no data is $(1 - 1/2^{32})^{1}$,
The probability that the chain node connected to computing unit [1] carries no data is $(1 - 1/2^{32})^{2}$,
The probability that the chain node connected to computing unit [2] carries no data is $(1 - 1/2^{32})^{3}$,
The probability that the chain node connected to computing unit [i] carries no data is $(1 - 1/2^{32})^{i+1}$, where i is an integer with $0 \le i \le n-1$,
The probability that the chain node connected to computing unit [n-1] carries no data is $(1 - 1/2^{32})^{n}$,
That is, in this data collection structure, the probability that the control unit 3 receives a calculation result is:
Figure PCTCN2023071362-appb-000001
Figure PCTCN2023071362-appb-000002
The probability that the entire chip produces a calculation result is:
Figure PCTCN2023071362-appb-000003
The percentage of computing power lost by this data collection structure is:
Figure PCTCN2023071362-appb-000004
When n = 256:
Figure PCTCN2023071362-appb-000005
Figure PCTCN2023071362-appb-000006
The above expression is approximately equal to 0, so the data collection structure can be regarded as having no loss of computing power.
As long as n is fixed, whether the computing units 1 are connected entirely in series or partly in parallel does not affect the above result.
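The equation images above are not reproduced in this text, so the following is only a sketch of one reading of the model, not the formula of the disclosure. It assumes that in each cycle the control unit receives a calculation result with probability $1 - (1 - p)^{n}$ (the complement of the "no data" probability stated for computing unit [n-1]), that the chip as a whole produces $n \cdot p$ calculation results per cycle on average, and that the lost fraction is one minus the ratio of the two. The function name `lost_fraction` is hypothetical. Exact rational arithmetic is used because the quantities differ from 1 by amounts close to floating-point precision.

```python
from fractions import Fraction


def lost_fraction(n: int, p: Fraction) -> Fraction:
    """Assumed model: 1 - P(control unit receives a result) / (n * p)."""
    no_data_at_tail = (1 - p) ** n   # stated probability for computing unit [n-1]
    received = 1 - no_data_at_tail   # assumption: at most one result arrives per cycle
    produced = n * p                 # expected number of results per cycle
    return 1 - received / produced


p = Fraction(1, 2**32)
loss = lost_fraction(256, p)
print(float(loss))  # on the order of 3e-08 under these assumptions
```

Under these assumptions the loss for n = 256 is of the same order as the 2.98e-08 figure quoted in this description, which is negligible relative to the chip's total computing power.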
In an exemplary embodiment of the present disclosure, the ability to discard calculation results as described above makes the design of the arbitrator 2 extremely simple and places no back pressure on the computing unit 1. The data flow through the whole computing unit 1 can therefore be unidirectional, which simplifies the structure of the computing unit 1 and helps raise its operating frequency.
In an exemplary embodiment of the present disclosure, when the number of computing units 1 is greater than or equal to the preset number threshold, the selection strategy may include:
Providing a buffer in advance at the first inlet of the first arbitrator, and storing the calculation result contained in the calculation-result sending request received at the first inlet in the buffer provided at that first inlet;
After sending the calculation result received by the first inlet, sending the calculation result held in the buffer, or sending a new calculation result received at the first inlet at that time.
In an exemplary embodiment of the present disclosure, a number of computing units greater than or equal to the preset number threshold indicates that the number of computing units is large. For this case, a buffer may be added in advance at the first inlet of each first arbitrator 21. The capacity of the buffer is less than a preset capacity threshold (for example, the buffer has a capacity of 1 and can hold only one calculation result), and the buffer is configured to store the calculation result sent by the computing unit 1 corresponding to that first arbitrator 21.
In an exemplary embodiment of the present disclosure, when both the first inlet a1 and the second inlet b1 of the first arbitrator 21 receive calculation results, the calculation result received at the first inlet a1 may first be stored in the corresponding buffer, and the calculation result received at the second inlet b1 is sent to the outlet c1 of the first arbitrator 21 for output.
In an exemplary embodiment of the present disclosure, after the calculation result received at the second inlet b1 has been sent to the outlet c1 for output, the calculation result held in the buffer is sent to the outlet c1 of the first arbitrator 21 for output. If a new calculation result from the computing unit 1 corresponding to the buffer is received before the buffered calculation result has been sent to the outlet c1 of the first arbitrator 21, the buffered calculation result is discarded and the new calculation result is sent to the outlet c1 of the first arbitrator 21 for output.
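The buffered behaviour of a first arbitrator can be sketched as a small cycle-by-cycle model. This is an illustration under stated assumptions, not the disclosed circuit: the class name `BufferedFirstArbitrator` and the `dropped` counter are hypothetical, `None` stands for "no calculation result on this inlet", and the buffer holds exactly one calculation result. The text does not say what happens when the second inlet b1 stays busy while a result is waiting in the buffer; the sketch assumes the upstream result always has priority and the buffered result keeps waiting.

```python
from typing import Optional


class BufferedFirstArbitrator:
    """One-entry buffer at the first inlet a1; no back pressure on either inlet."""

    def __init__(self) -> None:
        self.buffer: Optional[str] = None
        self.dropped = 0  # buffered results discarded in favour of newer ones

    def step(self, a1: Optional[str], b1: Optional[str]) -> Optional[str]:
        """Advance one cycle and return what appears on outlet c1 (or None)."""
        if a1 is not None and self.buffer is not None:
            # A new local result arrived before the buffered one was sent:
            # the buffered result is discarded.
            self.buffer = None
            self.dropped += 1
        if b1 is not None:
            # Assumption: the upstream result wins; a simultaneous local
            # result is parked in the buffer.
            if a1 is not None:
                self.buffer = a1
            return b1
        if a1 is not None:
            return a1
        # Outlet is free: drain the buffer if it holds anything.
        out, self.buffer = self.buffer, None
        return out


arb = BufferedFirstArbitrator()
print(arb.step("A0", "B0"))  # upstream result is forwarded, A0 is buffered
print(arb.step(None, None))  # buffered A0 is forwarded
```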
In an exemplary embodiment of the present disclosure, the method further includes determining whether to provide the buffer according to the amount of computing power loss recovered and the amount of additional resources required. The buffer is provided when the recovered computing power loss is greater than or equal to a preset computing power threshold and the proportion of the chip's total resources taken by the additional resources is less than or equal to a preset proportion threshold. The buffer is not provided when the recovered computing power loss is less than the preset computing power threshold and/or the proportion of the chip's total resources taken by the additional resources is greater than the preset proportion threshold.
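The decision rule above amounts to a two-condition test, sketched below. The function name `should_add_buffer`, its parameter names, and the threshold values in the usage line are hypothetical; the disclosure does not give concrete threshold values.

```python
def should_add_buffer(
    recovered_loss: float,
    extra_resource_ratio: float,
    loss_threshold: float,
    ratio_threshold: float,
) -> bool:
    """Provide the buffer only if enough loss is recovered at low enough cost."""
    return recovered_loss >= loss_threshold and extra_resource_ratio <= ratio_threshold


# Illustrative values only: very little loss to recover, so no buffer.
print(should_add_buffer(3e-8, 1e-3, loss_threshold=1e-4, ratio_threshold=1e-2))
```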
In an exemplary embodiment of the present disclosure, when n is large, the loss of computing power can be reduced by adding one buffer at the first inlet a1 of the first arbitrator 21 to hold the calculation result of the computing unit 1, and discarding the buffered calculation result when a newly sent calculation result arrives. This scheme may, however, increase the chip resources required. Whether to add a buffer at the first inlet a1 can therefore be decided by weighing the recovered computing power loss against the additional resources required. For example, when n is 256, at most 2.9802322387695312e-08 of the computing power can be recovered; if the buffer added at the first inlet a1 would occupy a larger share of the chip resources than this value, the buffer should not be added at the first inlet a1.
In an exemplary embodiment of the present disclosure, the solutions of the embodiments of the present disclosure have at least the following advantages:
1. The computing units can be arranged closely, making full use of the computing space;
2. The arbitrators place no back pressure on the computing units, which keeps the computing unit design simple and raises the operating frequency;
3. There is almost no impact on the computing power of the chip.
An embodiment of the present disclosure further provides a chip 10, as shown in FIG. 6, which includes the data collection structure A described above.
In an exemplary embodiment of the present disclosure, any of the foregoing embodiments of the data collection structure and method applies to this chip 10 embodiment and is not repeated here.
An embodiment of the present disclosure further provides a data collection system 1, as shown in FIG. 7, which includes the chip 10, a processor 11 and a computer-readable storage medium 12. The processor 11 stores the calculation results from the chip 10 in the computer-readable storage medium 12.
In an exemplary embodiment of the present disclosure, any of the foregoing embodiments of the data collection structure and method applies to this data collection system 1 embodiment and is not repeated here.
In an exemplary embodiment of the present disclosure, the processor 11 and the computer-readable storage medium 12 may be implemented by the control unit described above.
In an exemplary embodiment of the present disclosure, any of the foregoing embodiments of the data collection structure and method applies to this computer-readable storage medium embodiment and is not repeated here.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (10)

  1. A data collection structure, comprising: a plurality of computing units and a plurality of arbitrators; the plurality of arbitrators comprising a first arbitrator corresponding to each computing unit;
    the plurality of computing units being connected in sequence through the plurality of first arbitrators to form a data collection chain, with the computing unit at the tail of the chain connected to a control unit;
    wherein the arbitrators are configured to receive the calculation results sent by the computing units and transmit them along the data collection chain to the control unit.
  2. The data collection structure according to claim 1, wherein each first arbitrator comprises a first inlet, a second inlet and an outlet;
    each computing unit being connected to the next computing unit through the first arbitrator corresponding to that computing unit comprises:
    the first inlet of each first arbitrator is connected to the output interface of the computing unit corresponding to that first arbitrator;
    the second inlet of each first arbitrator is connected to the outlet of the first arbitrator corresponding to the previous computing unit.
  3. The data collection structure according to claim 1, wherein, when there is one data collection chain, the computing unit at the tail of the chain being connected to the control unit comprises:
    the first arbitrator corresponding to the computing unit at the tail of the chain is directly connected to the control unit through the outlet of that first arbitrator;
    when there are multiple data collection chains, the plurality of arbitrators further comprises a second arbitrator; each second arbitrator comprises a first inlet, a second inlet and an outlet;
    the computing unit at the tail of the chain being connected to the control unit comprises:
    the outlets of the first arbitrators corresponding to the computing units at the tails of the multiple data collection chains are connected to the control unit through the second arbitrator.
  4. The data collection structure according to claim 3, wherein the outlets of the first arbitrators corresponding to the computing units at the tails of the multiple data collection chains being connected to the control unit through the second arbitrator comprises:
    the outlet of the first arbitrator corresponding to the computing unit at the tail of each data collection chain is connected to the first inlet of the second arbitrator or the second inlet of the second arbitrator;
    the outlet of the second arbitrator is connected to the input interface of the control unit, or to the first inlet or the second inlet of the next second arbitrator.
  5. The data collection structure according to claim 2, wherein a buffer is provided at the first inlet of each first arbitrator;
    the buffer is configured to store the calculation result sent by the computing unit corresponding to that first arbitrator.
  6. A data collection method, characterized in that it is based on the data collection structure according to any one of claims 1 to 5 and is applied to the arbitrators in the data collection structure; the arbitrators comprise the first arbitrator, or the first arbitrator and a second arbitrator; the method comprises:
    obtaining a calculation result submitted by the corresponding computing unit and/or transmitted by the upstream arbitrator;
    transmitting the calculation result, directly or selectively, along the data collection chain until it reaches the control unit.
  7. The data collection method according to claim 6, wherein transmitting the calculation result directly or selectively along the data collection chain comprises:
    when only one of the first inlet and the second inlet of the arbitrator receives a calculation result, the arbitrator sends that calculation result directly to its outlet for output;
    when both the first inlet and the second inlet of the arbitrator receive calculation results, the arbitrator selects one of the two calculation results according to a preset selection strategy and sends it to its outlet for output.
  8. The data collection method according to claim 7, wherein, when the number of computing units is less than a preset number threshold, the selection strategy comprises:
    no buffer is provided at the first inlet of the first arbitrator; one of the calculation results received at the first inlet and the second inlet is selected arbitrarily, and the selected calculation result is sent to the outlet of the arbitrator;
    when the number of computing units is greater than or equal to the preset number threshold, the selection strategy comprises:
    providing a buffer in advance at the first inlet of the first arbitrator, and storing the calculation result contained in the calculation-result sending request received at the first inlet in the buffer provided at that first inlet;
    after sending the calculation result received by the first inlet, sending the calculation result held in the buffer, or sending a new calculation result received at the first inlet at that time.
  9. A chip, characterized in that it comprises the data collection structure according to any one of claims 1 to 5.
  10. A data collection system, characterized in that it comprises the chip according to claim 9, a processor and a computer-readable storage medium, wherein the processor stores the calculation results from the chip in the computer-readable storage medium.
PCT/CN2023/071362 2022-12-27 2023-01-09 Data collection structure, method and system, and chip WO2024138796A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211684700.8 2022-12-27
CN202211684700.8A CN115905088B (en) 2022-12-27 2022-12-27 Data collection structure, method, chip and system

Publications (1)

Publication Number Publication Date
WO2024138796A1 true WO2024138796A1 (en) 2024-07-04

Family

ID=86471046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071362 WO2024138796A1 (en) 2022-12-27 2023-01-09 Data collection structure, method and system, and chip

Country Status (2)

Country Link
CN (1) CN115905088B (en)
WO (1) WO2024138796A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7158510B1 (en) * 2002-02-14 2007-01-02 Alcatel Look-up table arbitration system and method for a fast switching element
CN109716311A (en) * 2016-09-29 2019-05-03 英特尔公司 System, apparatus and method for performing distributed arbitration
CN112130752A (en) * 2019-06-24 2020-12-25 英特尔公司 Shared local memory read merge and multicast return
CN114928578A (en) * 2022-07-19 2022-08-19 中科声龙科技发展(北京)有限公司 Chip structure
CN115002050A (en) * 2022-07-18 2022-09-02 中科声龙科技发展(北京)有限公司 Workload proving chip

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SU1424024A1 (en) * 1987-02-16 1988-09-15 Предприятие П/Я Г-4173 Data collection and processing system
US6434649B1 (en) * 1998-10-14 2002-08-13 Hitachi, Ltd. Data streamer
US11157287B2 (en) * 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system with variable latency memory access
JP7114515B2 (en) * 2019-03-14 2022-08-08 国立大学法人東海国立大学機構 Communication device, communication system and message arbitration method
CN111782580B (en) * 2020-06-30 2024-03-01 北京百度网讯科技有限公司 Complex computing device, complex computing method, artificial intelligent chip and electronic equipment
CN114138706B (en) * 2021-10-29 2022-07-08 北京中科昊芯科技有限公司 Multifunctional arbiter, arbitration method, chip and product
CN113722249B (en) * 2021-11-01 2022-02-08 中科声龙科技发展(北京)有限公司 Data processing apparatus and data processing method
CN114928577B (en) * 2022-07-19 2022-10-21 中科声龙科技发展(北京)有限公司 Workload proving chip and processing method thereof
CN114925004B (en) * 2022-07-19 2022-10-21 中科声龙科技发展(北京)有限公司 Polling arbitrator and its polling arbitrating method and chip
CN115328828B (en) * 2022-10-17 2023-01-24 中科声龙科技发展(北京)有限公司 Data storage system and data addressing and returning method of data storage structure of data storage system

Also Published As

Publication number Publication date
CN115905088B (en) 2023-07-14
CN115905088A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
CN109035016B (en) Multi-chain concurrent transaction method
CN110741342B (en) Blockchain transaction commit ordering
KR102448787B1 (en) System for providing retrieval service based on blockchain and method of the same
CN104462225B (en) The method, apparatus and system of a kind of digital independent
CN111339078A (en) Data real-time storage method, data query method, device, equipment and medium
JP4948276B2 (en) Database search apparatus and database search program
CN103164266B (en) The Dynamic Resource Allocation for Multimedia of the transactions requests sent for initiating equipment to receiving device
US10084710B2 (en) Data processing method of NOC without buffer and NOC electronic element
CN112015527B (en) Managing fetching and executing commands from a commit queue
JP2008225558A (en) Data-relay integrated circuit, data relay device, and data relay method
WO2024138796A1 (en) Data collection structure, method and system, and chip
CN118509399A (en) Message processing method and device, electronic equipment and storage medium
JP2008234059A (en) Data transfer device and information processing system
US20210357275A1 (en) Message stream processor microbatching
EP4099235A1 (en) Data pre-fetching method and apparatus, and storage device
WO2024148871A1 (en) Storage data processing method and apparatus, electronic device, and storage medium
US7886097B2 (en) Bus arbitration system, medium, and method
CN104778088B (en) A kind of Parallel I/O optimization methods and system based on reduction interprocess communication expense
US20120203881A1 (en) Computing system, configuration management device, and management
JP3317873B2 (en) Data transfer control device
CN109919768B (en) Block generation method, device, medium and computing equipment
US20160055111A1 (en) Using a credits available value in determining whether to issue a ppi allocation request to a packet engine
CN110677463B (en) Parallel data transmission method, device, medium and electronic equipment
JP2011035613A (en) Transmitter, transmission method, and control program
US20210042117A1 (en) Data Processing Apparatus and Method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 23908756
Country of ref document: EP
Kind code of ref document: A1