WO2019120294A1 - Data-processing apparatus, data transmission method, and computing system thereof - Google Patents

Info

Publication number
WO2019120294A1
Authority
WO
WIPO (PCT)
Prior art keywords
integrated circuit
data
stage
circuit chip
busy
Prior art date
Application number
PCT/CN2018/122789
Other languages
French (fr)
Inventor
Cunyong YANG
Guochen Sun
Ketuan ZHAN
Original Assignee
Bitmain Technologies Inc.
Priority date
Filing date
Publication date
Application filed by Bitmain Technologies Inc.
Publication of WO2019120294A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17325 Synchronisation; Hardware support therefor

Definitions

  • the present disclosure relates to the field of data processing technologies, and in particular, to a data-processing apparatus, a data transmission method, and a computing system thereof.
  • GPU processing chips have been widely used in data computing tasks in various applications because of their superior graphics processing and parallel computing capabilities compared to traditional CPUs.
  • the present disclosure provides a data-processing apparatus, a data transmission method, and a computing system thereof.
  • a data-processing apparatus includes a plurality of integrated circuit chips connected in series.
  • the plurality of integrated circuit chips include a first-stage integrated circuit chip and one or more non-first-stage integrated circuit chips.
  • Each non-first-stage integrated circuit chip has a corresponding preceding-stage integrated circuit chip in an upstream communication direction within the plurality of integrated circuit chips.
  • the first-stage integrated circuit chip is configured to receive a command signal from a control unit, transmit the command to the one or more non-first-stage integrated circuit chips, and return a computation result to the control unit.
  • Each of the plurality of integrated circuit chips includes a busy-signal input pin and a busy-signal output pin; and the busy-signal input pin of each non-first-stage integrated circuit chip corresponds to the busy-signal output pin of its corresponding preceding-stage integrated circuit chip.
  • the busy-signal output pin is configured to output a first logic voltage level indicating that the integrated circuit chip is available to forward data or a second logic voltage level indicating the integrated circuit chip is unavailable to forward data.
  • each non-first-stage integrated circuit chip is configured to output the second logic voltage level through its busy-signal output pin in response to detecting its busy-signal input pin having the second logic voltage level.
  • A present-stage integrated circuit chip includes a buffer queue. If the buffer queue stores queued data waiting to be transmitted, the present-stage integrated circuit chip is configured to: output the second logic voltage level through its busy-signal output pin; hold off transmitting the queued data in response to detecting that its busy-signal input pin has the second logic voltage level; start transmitting the queued data to the corresponding preceding-stage integrated circuit chip in response to detecting that its busy-signal input pin has the first logic voltage level; and output the first logic voltage level through its busy-signal output pin when the transmission of the queued data is completed.
  • Once the present-stage integrated circuit chip has started transmitting the queued data, it is configured to complete the transmission, even if its busy-signal input pin subsequently changes to the second logic voltage level.
  • After outputting the second logic voltage level through its busy-signal output pin, the present-stage integrated circuit chip is configured to wait for a guard interval period before starting to transmit the queued data to the corresponding preceding-stage integrated circuit chip.
  • The guard interval period is determined according to whether the plurality of integrated circuit chips communicate in a synchronous or an asynchronous mode.
  • The guard interval period is further determined according to a signal transmission delay and/or an operation speed of the plurality of integrated circuit chips.
  • each of the plurality of integrated circuit chips comprises an internal control unit and one or more arithmetic-unit groups.
  • Each arithmetic-unit group comprises a plurality of arithmetic units including a first-stage arithmetic unit and one or more non-first-stage arithmetic units.
  • Each non-first-stage arithmetic unit has a corresponding preceding-stage arithmetic unit in the plurality of the arithmetic units.
  • the first-stage arithmetic unit is connected to the internal control unit.
  • each of the plurality of arithmetic units comprises a computing component and a storage component.
  • the computing component and the storage component within each of the plurality of arithmetic units are connected to each other.
  • the computing component of each non-first-stage arithmetic unit is connected to the storage component of the corresponding preceding-stage arithmetic unit.
  • each of the plurality of integrated circuit chips includes a data input unit and a data output unit.
  • the data output unit of the first-stage integrated circuit chip is connected to the control unit to transmit the computation result to the control unit.
  • the data output unit of each non-first-stage integrated circuit chip is connected to the data input unit of the corresponding preceding-stage integrated circuit chip to transmit computation data.
  • The data input unit of each non-first-stage integrated circuit chip is connected to the data output unit of the corresponding preceding-stage integrated circuit chip to receive data signals or command signals.
  • the data-processing apparatus further includes a signal conversion unit that connects two of the plurality of integrated circuit chips for signal voltage adaptation.
  • the data-processing apparatus further includes a clock signal crystal oscillator.
  • a clock signal output interface of the clock signal crystal oscillator is connected to a clock signal input interface of one of the plurality of integrated circuit chips.
  • In another aspect, the present disclosure provides a data transmission method for a data-processing apparatus. The method includes: determining whether a busy-signal input pin of a present-stage integrated circuit chip of the data-processing apparatus is of a first logic voltage level or a second logic voltage level; holding off transmitting data in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level; and starting to transmit queued data in a buffer queue to a preceding-stage integrated circuit chip in response to data waiting to be transmitted in the buffer queue and the busy-signal input pin of the present-stage integrated circuit chip being of the first logic voltage level.
  • the data transmission method further includes: outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level.
  • the data transmission method further includes: outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip when there is data waiting to be transmitted in the buffer queue; and outputting the first logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip when the transmitting of the queued data is completed.
  • The data transmission method further includes: once transmission of the queued data has started, completing the transmission.
  • The data transmission method further includes: after outputting the second logic voltage level through the busy-signal output pin, waiting for a guard interval period before starting to transmit the queued data to the preceding-stage integrated circuit chip.
  • In another aspect of the present disclosure, a computing system includes an external control unit and the data-processing apparatus of any one of the foregoing embodiments.
  • Data transmission between the node IC chips connected in series is controlled by configuring the busy-signal input/output pins, thereby effectively avoiding data transmission conflicts between the node IC chips.
  • Computational task allocation among the plurality of node IC chips connected in series may be realized using a limited amount of command communication between the control unit and the node IC chips.
  • the computational capabilities of the node IC chips may be fully utilized, and a fast failure diagnosis of the chipset may be realized.
  • FIG. 1 is a diagram of a data-processing apparatus in the prior art.
  • FIG. 2 is a diagram of a data-processing apparatus according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram of a node IC chip according to an embodiment.
  • FIG. 4 is a block diagram of two arithmetic units connected in series according to an embodiment of the present disclosure.
  • FIG. 5 is a flowchart of a data transmission method according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a data transmission method according to another embodiment of the present disclosure.
  • FIG. 7 is a flowchart of a data transmission method according to another embodiment of the present disclosure.
  • FIG. 8 is a flowchart of a task allocation method according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of a task allocation method according to another embodiment of the present disclosure.
  • FIG. 10 is a flowchart of a task allocation method according to another embodiment of the present disclosure.
  • FIG. 11 is a flowchart illustrating a failure diagnosis method according to an embodiment of the present disclosure.
  • FIG. 12 is a flowchart of a failure diagnosis method according to another embodiment of the present disclosure.
  • FIG. 13 is a diagram of a computing system according to an embodiment of the invention.
  • Chinese Patent Application No. CN 201610312586.4 discloses a data-processing apparatus including a plurality of integrated circuit (IC) chips connected in series.
  • a data processing task is received via an external interface by a first-stage node IC chip.
  • the plurality of node IC chips connected in series perform arithmetic processing of the data processing task and return the processing results through the external interface of the first-stage node IC chip.
  • the number of node IC chips may be increased according to the computing requirements of the data processing task, and only one node IC chip needs to communicate with external devices without occupying the communication interfaces of the external devices.
  • the data-processing apparatus may provide scalable data processing capabilities.
  • the foregoing data-processing apparatus connects a plurality of node IC chips in series with each node IC chip responsible for a portion of the computation, improving the data processing speed.
  • data transmission between node IC chips may cause signal conflicts.
  • the data-processing apparatus needs to allocate data processing tasks among the node IC chips after receiving the data processing task transmitted by external devices.
  • How to allocate tasks among the plurality of node IC chips so as to reduce signal conflicts also needs to be resolved.
  • Because all node IC chips work on the same data processing task, a failure of any single node IC chip may prevent the entire node IC chipset from working normally. How to quickly diagnose the failed chip also needs to be addressed.
  • FIG. 2 is a diagram showing the structure of a data-processing apparatus 10 according to certain embodiments of the present disclosure. As shown in FIG. 2, the data-processing apparatus 10 includes a plurality of node IC chips 20 connected in series.
  • a first-stage node IC chip located in the downstream communication direction from an external control unit receives the command signal from the control unit through an external interface configured in the first-stage node IC chip.
  • the first-stage node IC chip transmits the command signal to one or more node IC chips for processing, and returns the calculation data to the control unit through the external interface.
  • For a given node IC chip, its next-stage node IC chip is the neighboring node IC chip connected in the downstream communication direction.
  • Likewise, its preceding-stage node IC chip is the neighboring node IC chip connected in the upstream communication direction.
  • Each node IC chip 20 has a busy-signal input pin BI and a busy-signal output pin BO.
  • The busy-signal output pin BO of each node IC chip 20 is coupled to the busy-signal input pin BI of its next-stage node IC chip 20, that is, the neighboring chip in the downstream communication direction.
  • the busy-signal output pin BO and the busy-signal input pin BI are used to control data transmission of the corresponding node IC chips in the upstream communication direction.
  • The node IC chip 20 may be implemented using an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • When the busy-signal output pin BO is of the first logic voltage level, it indicates that data returned by the next-stage node IC chip can be forwarded; when BO is of the second logic voltage level, it indicates that the present-stage node IC chip or the preceding-stage node IC chip is transmitting, or is about to transmit, data.
  • The first logic voltage level at a BO or BI pin may indicate a “not busy” state, and the second logic voltage level at a BO or BI pin may indicate a “busy” state.
  • In certain embodiments, the first logic voltage level is a logic LOW voltage level and the second logic voltage level is a logic HIGH voltage level.
  • Alternatively, the first logic voltage level may be a logic HIGH voltage level and the second logic voltage level a logic LOW voltage level.
  • When a node IC chip detects that its busy-signal input pin BI is of the second logic voltage level, its busy-signal output pin BO may also be correspondingly set to the second logic voltage level.
  • When there is data waiting to be transmitted in the first-in, first-out (FIFO) buffer queue of the present-stage node IC chip 20 and its busy-signal input pin BI is detected to be of the second logic voltage level, the chip must wait for BI to change to the first logic voltage level before transmitting the data.
  • When a node IC chip 20 is already transmitting data and its busy-signal input pin BI is then detected to be of the second logic voltage level, the data transmission in progress is not affected; that is, the chip continues transmitting until all the data in the FIFO buffer queue has been transmitted.
  • When there is data waiting to be transmitted in the FIFO buffer queue of the present-stage node IC chip 20, its busy-signal output pin BO outputs the second logic voltage level. When the data transmission is completed, BO is set to the first logic voltage level. In certain embodiments, when the node IC chip receives a reset signal, its busy-signal output pin BO may be changed to the first logic voltage level.
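The busy-output rules above (busy while data is queued or being sent, busy when BI is busy, released on completion or reset) can be captured as a small combinational function. This is a minimal Python sketch under stated assumptions: the names `NodePins`, `bo_is_busy`, and `to_voltage` are hypothetical, reset is assumed to take precedence over the other conditions, and the polarity flag reflects that either the HIGH or the LOW level may serve as the "busy" level.

```python
from dataclasses import dataclass


@dataclass
class NodePins:
    bi_busy: bool         # BI currently reads the second ("busy") level
    queue_len: int        # entries waiting in the FIFO buffer queue
    transmitting: bool    # a transmission is in progress
    reset: bool = False   # reset signal asserted


def bo_is_busy(p: NodePins) -> bool:
    """True when BO should drive the second ("busy") logic level."""
    if p.reset:
        return False      # assumed: reset overrides the other conditions
    # A busy BI is passed on; pending or in-flight data also asserts busy.
    return p.bi_busy or p.queue_len > 0 or p.transmitting


def to_voltage(busy: bool, busy_is_high: bool = True) -> str:
    """Map the abstract busy state onto a pin level for either polarity."""
    return "HIGH" if busy == busy_is_high else "LOW"
```

Separating the abstract busy state from the pin voltage keeps the protocol logic identical for both polarity options.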
  • After its busy-signal output pin BO is set to the second logic voltage level, the node IC chip may wait for a predetermined guard interval period GAP before starting data transmission.
  • the guard interval period GAP is configured to ensure that the next-stage node IC chip does not transmit data at the same time when the present-stage node IC chip needs to transmit data.
  • the next-stage node IC chip sends data in two possible scenarios: one is that the next-stage node IC chip is transmitting its own data, and the other is that the next-stage node IC chip is forwarding data transmitted by a further downstream node IC chip.
  • The guard interval period GAP may be at least the transmission time of 8 bits of data.
  • For example, the guard interval period GAP may be set to the transmission time of 16 bits of data.
  • the node IC chip at the present stage also waits for a guard interval period GAP to ensure that the data transmission of the next-stage node IC chip finishes before the present-stage node IC chip starts to transmit data.
  • When the first-stage node IC chip of the data-processing apparatus needs to transmit data, it outputs the second logic voltage level through its busy-signal output pin BO. By the time the last-stage node IC chip detects the second logic voltage level at its busy-signal input pin BI, the signal has already been delayed by N stages, where N is the total number of node IC chips in the data-processing apparatus. If the last-stage node IC chip had already started transmitting data in the upstream communication direction before detecting the second logic voltage level at its busy-signal input pin BI, its data signal also needs to pass through an N-stage delay to reach the first-stage node IC chip. Therefore, the guard interval period GAP needs to be configured to cover both N-stage delays, that is, the full round trip.
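The round-trip condition can be written as GAP ≥ 2 · N · t_stage, where t_stage is a per-stage propagation delay (a symbol introduced here for illustration; the disclosure does not define it). A minimal Python sketch, assuming a fixed per-stage delay and a fixed serial bit rate (both hypothetical parameters), converts this bound into bit times so it can be compared with a configured GAP such as 8 or 16 bit times:

```python
import math


def min_gap_bits(n_chips: int, stage_delay_s: float, bit_rate_hz: float) -> int:
    """Smallest GAP, in whole bit times, that covers two N-stage delays.

    The busy level needs N stages to reach the last chip, and data the last
    chip has already started sending needs N stages to come back upstream.
    """
    round_trip_s = 2 * n_chips * stage_delay_s
    return math.ceil(round_trip_s * bit_rate_hz)


def gap_is_sufficient(gap_bits: int, n_chips: int,
                      stage_delay_s: float, bit_rate_hz: float) -> bool:
    """Check a configured GAP (e.g. 8 or 16 bit times) against the bound."""
    return gap_bits >= min_gap_bits(n_chips, stage_delay_s, bit_rate_hz)
```

With made-up figures of 256 chips, 20 ns per stage, and 1 Mbit/s, the bound is 11 bit times, so a 16-bit GAP would suffice and an 8-bit one would not. Real values depend on the communication mode and on the chips' actual signal delays.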
  • the guard interval period GAP is configured according to different communication modes employed between the node IC chips. Taking a series connection of 256 node IC chips as an example, the following scenarios may be considered.
  • the guard interval period is configured according to the transmission delay of the signals or commands and/or the computing speed of the IC chips.
  • FIG. 3 is a diagram illustrating the architecture of a node IC chip according to certain embodiments.
  • the node IC chip may include an internal control unit 201, two or more groups of arithmetic units 202, and one or more input/output interfaces 203.
  • the internal control unit 201 is connected to the input/output interface 203 for data exchange with external components.
  • Each arithmetic-unit group includes two or more arithmetic units connected in series.
  • a node IC chip may have multiple arithmetic units.
  • the arithmetic units may be divided into two or more arithmetic-unit groups according to a variety of factors such as available space as well as the operating characteristics and functions of the arithmetic unit.
  • the arithmetic units in each group are connected in series.
  • a person skilled in the art may group the arithmetic units according to the application. The present disclosure does not specifically limit the specific grouping method.
  • A first-stage arithmetic unit in each arithmetic-unit group is coupled to the internal control unit. Since the arithmetic units in each arithmetic-unit group are connected to each other in series, only one arithmetic unit in each group needs to be connected to the internal control unit. In certain embodiments, the first-stage arithmetic unit, which is usually the arithmetic unit closest to the internal control unit, is the one connected to the internal control unit, further saving wiring space and reducing wiring complexity. A person skilled in the art can select the arithmetic unit connected to the internal control unit according to the application, which is not specifically limited in the present disclosure.
  • A node IC chip may comprise two input/output interfaces 203, one configured at each end of the node IC chip.
  • the two input/output interfaces are connected to the internal control unit, so that the internal control unit may exchange data with external components through the input/output interfaces.
  • the input/output interfaces 203 may be separated into a data input unit and a data output unit.
  • a person skilled in the art may select the number, position and configuration of the input/output interfaces according to the application, which is not specifically limited in the present disclosure.
  • FIG. 4 is a structural diagram of the arithmetic units 202 according to certain embodiments.
  • the arithmetic unit 202 includes one or more computing components 2021, one or more storage components 2022, and a clock input interface 2023.
  • The computing component 2021 is coupled to the storage component 2022 of the preceding-stage arithmetic unit for reading the data stored there to perform calculations.
  • the computing component 2021 is also coupled to the storage component 2022 of the same arithmetic unit for storing the calculated data.
  • the stored data may be read by the next-stage arithmetic unit.
  • the clock input interface 2023 may be connected to a clock output interface of the internal control unit.
  • By connecting the plurality of arithmetic units in series, each arithmetic unit may acquire the data it needs for computation from its preceding-stage arithmetic unit.
  • the serial connection may also save wiring space and reduce wiring complexity.
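The series connection of arithmetic units behaves like a pipeline: each computing component 2021 reads the storage component 2022 of its preceding stage and writes to its own. A minimal Python sketch, assuming one operation per clock edge and integer data; `ArithmeticUnit` and `clock_tick` are hypothetical names, and the per-stage operations are placeholders:

```python
from typing import Callable, List, Optional


class ArithmeticUnit:
    """Hypothetical model of an arithmetic unit 202: a computing component
    (`op`) plus a storage component (`storage`) readable by the next stage."""

    def __init__(self, op: Callable[[int], int]) -> None:
        self.op = op
        self.storage: Optional[int] = None


def clock_tick(units: List[ArithmeticUnit], new_input: Optional[int]) -> Optional[int]:
    """Advance one series-connected group by a single clock edge.

    Every computing component reads its predecessor's storage as it was
    before the edge, so all stages update in lockstep like a pipeline. The
    first-stage unit reads from the internal control unit (`new_input`).
    Returns the last stage's storage after the edge.
    """
    snapshot = [u.storage for u in units]
    for i, unit in enumerate(units):
        src = new_input if i == 0 else snapshot[i - 1]
        unit.storage = None if src is None else unit.op(src)
    return units[-1].storage
```

Feeding inputs 3, 5, and then nothing into a two-stage group that adds 1 and then doubles yields None, 8, 12 at the last stage: each result appears one clock edge after its input enters the first stage.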
  • The arithmetic unit may comprise a microelectronic circuit, and the microelectronic circuit may comprise CMOS transistors and/or NMOS transistors.
  • A data output unit of the first-stage node IC chip is connected to a data input unit of the control unit for returning the computation result of the data-processing apparatus to the control unit.
  • a data input unit of a node IC chip may be connected to the data output unit of its next-stage node IC chip to receive data processed by the next-stage node IC chip.
  • the one or more data input units of the first-stage node IC chip are connected to one or more data output units of the control unit to receive data input or command input from the control unit.
  • the one or more data output units of a node IC chip are connected to one or more data input units of its next-stage node IC chip for transmitting data input or command input to the next-stage node IC chip.
  • the data-processing apparatus may further include a signal conversion unit that connects two node IC chips for signal voltage adaptation.
  • the data-processing apparatus may further include one or more clock signal crystal oscillators. A clock signal output interface of the clock signal crystal oscillator may be coupled to a clock signal input interface of a node IC chip of the data-processing apparatus.
  • FIG. 5 is a flow chart showing an embodiment of a data transmission method based on the data-processing apparatus 10 according to certain embodiments. As shown in FIG. 5, the data transmission may include the following steps.
  • Step S1: The node IC chip at the present stage determines whether there is data waiting to be transmitted in its buffer queue.
  • Step S2: If there is data waiting to be transmitted in the buffer queue, the node IC chip detects whether its busy-signal input pin BI is of the second logic voltage level.
  • Step S3: If the busy-signal input pin BI is of the second logic voltage level, the node IC chip waits for BI to change from the second logic voltage level to the first logic voltage level, and then starts transmitting data from the buffer queue.
  • Step S4: If the busy-signal input pin BI is of the first logic voltage level, the node IC chip immediately starts transmitting the data from its buffer queue.
  • the data transmission method may further include: setting the busy-signal output pin BO to the second logic voltage level when the node IC chip at the present stage detects that the busy-signal input pin BI is of the second logic voltage level.
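Steps S1 to S4, together with the optional BO rule just described, can be sketched as one polling pass that a chip's control logic would repeat. This is a minimal Python sketch under assumptions: `Level`, `fig5_step`, and the `send` callback are hypothetical, and the "wait" in step S3 is modelled as returning and retrying on a later pass rather than blocking.

```python
from collections import deque
from enum import Enum
from typing import Any, Callable


class Level(Enum):
    FIRST = "not busy"   # first logic voltage level
    SECOND = "busy"      # second logic voltage level


def fig5_step(queue: deque, bi: Level, send: Callable[[Any], None]) -> Level:
    """One polling pass over steps S1-S4; returns the level to drive on BO."""
    # Optional rule from the text: mirror a busy BI onto BO.
    bo = Level.SECOND if bi is Level.SECOND else Level.FIRST
    if not queue:              # S1: nothing waiting in the buffer queue
        return bo
    if bi is Level.SECOND:     # S2/S3: hold off; the caller retries later
        return bo
    while queue:               # S4: BI is at the first level, send now
        send(queue.popleft())
    return bo
```

Because the queue is left untouched while BI is busy, a later pass with BI at the first level picks up exactly where the held-off data was left.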
  • FIG. 6 is a flow chart showing another embodiment of a data transmission method based on the data-processing apparatus 10. As shown in FIG. 6, the data transmission method includes the following steps.
  • Step S11: The present-stage node IC chip determines whether there is data waiting to be transmitted in its buffer queue.
  • Step S12: If there is data waiting to be transmitted in the buffer queue, the present-stage node IC chip outputs the second logic voltage level through its busy-signal output pin BO.
  • Step S13: The present-stage node IC chip detects whether its busy-signal input pin BI is of the second logic voltage level.
  • Step S14: If the busy-signal input pin BI is of the first logic voltage level, the present-stage node IC chip immediately transmits the data in the buffer queue.
  • Step S15: The present-stage node IC chip determines whether the transmission of the data in the buffer queue has finished.
  • Step S16: If the transmission of the data in the buffer queue has finished, the present-stage node IC chip outputs the first logic voltage level through its busy-signal output pin BO.
  • Step S17: If the busy-signal input pin BI is detected to be of the second logic voltage level before the transmission of the data in the buffer queue has finished, the chip continues transmitting until all data in the buffer queue has been transmitted.
  • FIG. 7 is a flow chart showing another embodiment of a data transmission method based on the data-processing apparatus 10. As shown in FIG. 7, on the basis of the embodiment shown in FIG. 6, the data transmission method further includes the following steps after step S12.
  • Step S18: After the busy-signal output pin BO of the present-stage node IC chip has been set to the second logic voltage level, the present-stage node IC chip waits for a predetermined guard interval period and then performs data transmission, ensuring that the next-stage node IC chip does not transmit data at the same time.
  • the guard interval period is configured depending on a synchronous or an asynchronous communication mode between the node IC chips.
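Steps S11 to S18 map naturally onto a small state machine. The sketch below is a minimal Python model under assumptions: time advances in discrete ticks, one queued item is sent per tick, the guard interval is counted in ticks, and the names `State` and `BusyPinNode` are hypothetical. Setting `gap_ticks=0` gives the FIG. 6 behavior; a positive value adds the FIG. 7 guard interval.

```python
from collections import deque
from enum import Enum, auto
from typing import Any, Optional


class State(Enum):
    IDLE = auto()
    GUARD = auto()     # step S18 (FIG. 7 only)
    WAIT_BI = auto()   # step S13
    SENDING = auto()   # steps S14-S17


class BusyPinNode:
    """Tick-level sketch of steps S11-S18; gap_ticks=0 reduces to FIG. 6."""

    def __init__(self, gap_ticks: int = 0) -> None:
        self.queue: deque = deque()
        self.gap_ticks = gap_ticks
        self.state = State.IDLE
        self.bo_busy = False
        self._countdown = 0

    def tick(self, bi_busy: bool) -> Optional[Any]:
        """Advance one tick and return the item sent on this tick, if any."""
        if self.state is State.IDLE:
            if not self.queue:                  # S11: nothing to send
                return None
            self.bo_busy = True                 # S12: assert busy on BO
            self._countdown = self.gap_ticks
            self.state = State.GUARD if self.gap_ticks > 0 else State.WAIT_BI
        if self.state is State.GUARD:           # S18: sit out the guard interval
            self._countdown -= 1
            if self._countdown == 0:
                self.state = State.WAIT_BI
            return None
        if self.state is State.WAIT_BI:         # S13: check the BI pin
            if bi_busy:
                return None                     # hold off while BI is busy
            self.state = State.SENDING          # S14: BI clear, start sending
        item = self.queue.popleft()             # S17: BI is ignored from here on
        if not self.queue:                      # S15/S16: done, release BO
            self.bo_busy = False
            self.state = State.IDLE
        return item
```

Once the node enters the sending state it no longer consults BI, reflecting step S17, and it releases BO only after the queue has drained, reflecting step S16.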
  • the data transmission between the serial node IC chips is controlled by configuring the busy-signal input/output pins, and in combination with configuring the guard interval period, data transmission conflicts between the node IC chips are effectively avoided.
  • FIG. 8 is a flow chart of an embodiment of a task allocation method based on the data-processing apparatus 10. As shown in FIG. 8, the task allocation method is applied to a control unit, and the method includes the following steps.
  • Step S21: The control unit sends a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter an inactive state.
  • Step S22: The control unit sends an address allocation command to a node IC chip of the data-processing apparatus and sequentially assigns a chip address to each node IC chip.
  • Step S23: The control unit assigns a computing task to each node IC chip according to the chip address of that node IC chip.
  • the step of sending a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter the inactive state includes: sending a command to each node IC chip of the data-processing apparatus, so that each node IC chip sequentially enters the inactive state.
  • the step of sending a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter the inactive state includes: sending one command to all of the node IC chips of the data-processing apparatus to cause each node IC chip to simultaneously enter the inactive state.
  • FIG. 9 is a flowchart of another embodiment of a data processing task allocation method based on the data-processing apparatus 10. As shown in FIG. 9, on the basis of the embodiment shown in FIG. 8, the task allocation method further includes the following steps after Step S23.
  • Step S24: The node IC chip performs one or more hash calculations according to the assigned computing task.
  • the hash calculation may include a cryptographic hash operation or a hash collision operation.
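The disclosure does not fix a particular hash function. Purely for illustration, the sketch below assumes a Bitcoin-style search: double SHA-256 over a header prefix followed by a 4-byte little-endian nonce, with the digest read as a little-endian integer and compared against a target. The header layout, the name `search_nonces`, and the target value are assumptions, not the disclosed method.

```python
import hashlib
import struct
from typing import List


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as in Bitcoin-style proof of work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonces(header_prefix: bytes, start: int, count: int, target: int) -> List[int]:
    """Return nonces in [start, start + count) whose hash is below `target`."""
    hits = []
    for nonce in range(start, start + count):
        # "<I" packs the nonce as a 4-byte little-endian unsigned integer.
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") < target:
            hits.append(nonce)
    return hits
```

A node IC chip would run such a loop only over the nonce range assigned to it, so the chips never repeat each other's work.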
  • FIG. 10 is a flow chart showing another embodiment of a task allocation method of the data-processing apparatus 10. As shown in FIG. 10, the task allocation method is applied to a node IC chip, and the method includes the following steps.
  • Step S31: The node IC chip receives an address allocation command transmitted by the control unit.
  • Step S32: The node IC chip determines whether it is currently in the inactive state.
  • Step S33: When the node IC chip is in the inactive state, it parses the address allocation command transmitted by the control unit, stores the allocated address in a register, and changes to the active state.
  • Step S34: When the node IC chip is in the active state, it does not parse the address allocation command; instead, it forwards the command directly to the next-stage node IC chip.
  • the task input command format used by the data-processing apparatus 10 may have an HCN field and a starting random number offset (SNO) field.
  • The HCN field is used to control the number of calculations performed by each node IC chip. For example, if the computing task requires $2^{32}$ calculations, that is, the random number nonce takes $2^{32}$ successive values starting from an initial value, then for 32 node IC chips connected in series, each node IC chip only needs to perform $2^{32}/32 = 2^{27}$ calculations.
  • the starting random number offset (SNO) field includes a number.
  • the data-processing apparatus 10 may use an address assignment command SetAddress.
  • the command format includes a chip address field ChipAddr, which specifies the chip address of a single node IC chip.
  • the computing task for each node IC chip is determined by the values in the SNO and ChipAddr fields.
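The disclosure states that each chip's task follows from the SNO and ChipAddr fields but does not give the formula. One plausible scheme, assumed here and not taken from the source, gives chip k the contiguous range starting at SNO + k × HCN. The Python sketch assumes chip addresses 0 to N-1; `nonce_range` and `plan` are hypothetical names.

```python
from typing import Dict


def nonce_range(sno: int, chip_addr: int, hcn: int) -> range:
    """Assumed split: chip k covers [SNO + k*HCN, SNO + (k+1)*HCN)."""
    start = sno + chip_addr * hcn
    return range(start, start + hcn)


def plan(total: int, n_chips: int, sno: int = 0) -> Dict[int, range]:
    """Give each of `n_chips` chips (addresses 0..n-1) an equal HCN share."""
    if total % n_chips:
        raise ValueError("total must divide evenly among the chips")
    hcn = total // n_chips
    return {addr: nonce_range(sno, addr, hcn) for addr in range(n_chips)}
```

For $2^{32}$ nonces over 32 chips, each range holds $2^{27}$ nonces and the ranges tile the full space without gaps or overlap.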
  • To allocate chip addresses, the control unit first issues a Chaininactive command to change a node IC chip to the inactive state.
  • all node IC chips may be set to the inactive state, or they may be set to the inactive state one by one.
  • the control unit then sends an address assignment command SetAddress to a node IC chip to assign an arbitrary address to the node IC chip.
  • a node IC chip in the inactive state may parse the chip address assignment command SetAddress and store its address in a register, after which the node IC chip may change to the active state. In certain embodiments, the chip may enter the active state according to a command of the control unit after being assigned a node address. Alternatively, it may automatically enter the active state after parsing an address assignment command.
  • a node IC chip in the active state may not parse the address allocation command SetAddress. Instead, the node IC chip may forward the address allocation command directly to the next-stage node IC chip.
  • for N node IC chips, the control unit needs to issue N SetAddress commands to assign an arbitrary address to each node IC chip one by one. For example, for a cryptocurrency mining machine having 256 node IC chips, the control CPU may need to send 256 SetAddress commands in sequence to configure the addresses of all the node IC chips.
  • FIG. 11 is a flowchart of a failure diagnosis method based on the data-processing apparatus 10 according to an embodiment. As shown in FIG. 11, the failure diagnosis method is applied to the control unit, and the method includes the following steps.
  • Step S41: the control unit sends a work status query command to a node IC chip of the data-processing apparatus.
  • Step S42: each node IC chip of the data-processing apparatus sequentially forwards the work status query command.
  • Step S43: each node IC chip determines whether its chip address matches the chip address specified in the work status query command.
  • Step S44: if the chip address of a node IC chip matches the chip address specified in the work status query command, the node IC chip returns the register data.
  • Step S45: the control unit determines the work status of each node IC chip according to the register data returned by the node IC chip.
  • the failure diagnosis method may further include: if the control unit does not receive register data from the node IC chip whose chip address matches the chip address specified in the work status query command, the control unit determines that a failure has occurred in that node IC chip.
  • FIG. 12 is a flow chart showing a failure diagnosis method of the data-processing apparatus 10 according to another embodiment. As shown in FIG. 12, the failure diagnosis method is applied to the control unit, and the method includes the following steps.
  • Step S51: the control unit sends a work status query command to a node IC chip of the data-processing apparatus.
  • Step S52: each node IC chip of the data-processing apparatus sequentially forwards the work status query command.
  • Step S53: the control unit determines whether the work status query command specifies querying the work status of all node IC chips.
  • Step S54: if yes, each node IC chip sequentially returns its register data to the control unit.
  • Step S55: the control unit determines the work status of the node IC chips according to the returned register data.
  • the failure diagnosis method further includes: when the work status query command specifies querying the work status of all the node IC chips, determining which node IC chips have failed based on the number of register data responses returned by the node IC chips.
  • when a node IC chip receives a work status query from the control unit, it first forwards the query to its next-stage node IC chip. The node IC chip whose chip address matches the address in the command returns the data in the corresponding register (for example, data being calculated) to the control unit through a universal asynchronous receiver-transmitter (UART) interface.
  • the control unit may determine the number of operational IC chips in the apparatus based on the number of node IC chip responses received. Therefore, failure diagnosis of the node IC chipset may be performed with a single command. If a node IC chip fails during operation, the failure may be detected from the hash rate of the data-processing apparatus, or by identifying node IC chips that have not responded with computation data for a long period of time.
  • FIG. 13 is a block diagram showing the structure of a computing system 40 according to an embodiment of the present disclosure.
  • the computing system 40 includes the data-processing apparatus 10 and a control unit 30, and the data-processing apparatus 10 communicates with the control unit 30 through an external interface.
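The HCN and SNO arithmetic in the definitions above can be illustrated with a short Python sketch. The source says only that each chip's share of the work is determined by the SNO and ChipAddr values; the specific mapping used here (the chip with address a searches HCN consecutive nonces starting at SNO + a × HCN) and the assumption that addresses run from 0 to N-1 are hypothetical, chosen only to reproduce the 2^32 / 32 = 2^27 example.

```python
"""Hypothetical split of a nonce search space across chained node IC chips.

Assumption (not from the source): the chip with address chip_addr searches
HCN consecutive nonces starting at SNO + chip_addr * HCN, and chip
addresses are numbered 0..N-1.
"""
from typing import Dict, Tuple

NONCE_SPACE = 2 ** 32  # total number of nonce values in the task


def hcn_for_chain(num_chips: int, space: int = NONCE_SPACE) -> int:
    """Per-chip calculation count (the HCN value) for an even split."""
    if num_chips <= 0 or space % num_chips != 0:
        raise ValueError("nonce space must divide evenly among the chips")
    return space // num_chips


def nonce_range(sno: int, chip_addr: int, hcn: int) -> Tuple[int, int]:
    """Half-open nonce interval [start, end) searched by one chip."""
    start = sno + chip_addr * hcn
    return start, start + hcn


def plan(num_chips: int, sno: int = 0) -> Dict[int, Tuple[int, int]]:
    """Map each (hypothetical) chip address to its nonce interval."""
    hcn = hcn_for_chain(num_chips)
    return {addr: nonce_range(sno, addr, hcn) for addr in range(num_chips)}


if __name__ == "__main__":
    ranges = plan(32)
    print(hcn_for_chain(32) == 2 ** 27)  # True
    print(ranges[0], ranges[31])  # (0, 134217728) (4160749568, 4294967296)
```

Under this assumed mapping the intervals are contiguous and non-overlapping, so no nonce is computed twice and the 32 chips together cover the full 2^32 range.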
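The address-assignment and work-status-query behaviour described in the definitions can be modelled as a toy daisy chain. This is a hypothetical simulation, not firmware: the `Chain` and `NodeChip` classes are invented for illustration, an inactive chip is assumed to consume a SetAddress command rather than forward it, and a failed chip is assumed to keep forwarding commands while never answering.

```python
"""Toy model of SetAddress assignment and work-status queries on a chain.

Assumptions for illustration only: an inactive chip consumes a SetAddress
command (it does not forward it), a failed chip still forwards commands
but never answers, and register contents are plain integers.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeChip:
    active: bool = True
    address: Optional[int] = None
    register: int = 0          # stand-in for the chip's register data
    failed: bool = False       # hypothetical fault flag for the demo


class Chain:
    def __init__(self, num_chips: int) -> None:
        self.chips = [NodeChip(register=i * 10) for i in range(num_chips)]

    def chain_inactive(self) -> None:
        """Chaininactive: put every chip into the inactive state."""
        for chip in self.chips:
            chip.active = False

    def set_address(self, address: int) -> None:
        """SetAddress: active chips forward; the first inactive chip takes it."""
        for chip in self.chips:          # walk downstream from the first stage
            if chip.active:
                continue                 # active chip forwards without parsing
            chip.address = address       # inactive chip parses and stores it
            chip.active = True           # then enters the active state
            return

    def query(self, address: Optional[int] = None) -> List[int]:
        """Work-status query; address=None means 'all chips'."""
        replies = []
        for chip in self.chips:          # every chip forwards the query
            if chip.failed:
                continue                 # a failed chip returns nothing
            if address is None or chip.address == address:
                replies.append(chip.register)
        return replies


if __name__ == "__main__":
    chain = Chain(4)
    chain.chain_inactive()
    for addr in range(4):                 # N chips need N SetAddress commands
        chain.set_address(addr)
    chain.chips[2].failed = True
    print([c.address for c in chain.chips])   # [0, 1, 2, 3]
    print(chain.query(1))                     # [10]
    print(chain.query(2))                     # [] -> chip 2 presumed failed
    print(len(chain.query()))                 # 3 of 4 chips answered
```

An empty reply to an addressed query corresponds to the FIG. 11 check, and the reply count for an "all chips" query corresponds to the FIG. 12 check.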

Abstract

A data-processing apparatus includes a plurality of integrated circuit chips connected in series. The plurality of integrated circuit chips include a first-stage integrated circuit chip and one or more non-first-stage integrated circuit chips. Each non-first-stage integrated circuit chip has a corresponding preceding-stage integrated circuit chip in an upstream communication direction within the plurality of integrated circuit chips. The first-stage integrated circuit chip is configured to receive a command signal from a control unit, transmit the command to the one or more non-first-stage integrated circuit chips, and return a computation result to the control unit. Each of the plurality of integrated circuit chips includes a busy-signal input pin and a busy-signal output pin; and the busy-signal input pin of each non-first-stage integrated circuit chip corresponds to the busy-signal output pin of its corresponding preceding-stage integrated circuit chip.

Description

DATA-PROCESSING APPARATUS, DATA TRANSMISSION METHOD, AND COMPUTING SYSTEM THEREOF
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 2017114024397, entitled “DATA-PROCESSING APPARATUS, DATA TRANSMISSION METHOD, AND COMPUTING SYSTEM THEREOF,” filed on December 21, 2017.
TECHNICAL FIELD
The present disclosure relates to the field of data processing technologies, and in particular, to a data-processing apparatus, a data transmission method, and a computing system thereof.
BACKGROUND
With the application and development of machine learning, especially deep learning technologies, higher requirements are placed on the data processing capabilities of computing systems in various fields. GPU processing chips have been widely used in data computing tasks in various applications because of their superior graphics processing and parallel computing capabilities compared to traditional CPUs.
However, the computing power of a single-GPU architecture is still limited, and may not meet the requirements of high-intensity data computing capabilities in tasks such as deep learning and cryptographic hashing.
SUMMARY
In order to address the above technical problems in the prior art, the present disclosure provides a data-processing apparatus, a data transmission method, and a computing system thereof.
In one aspect of the present disclosure, a data-processing apparatus is provided. The data-processing apparatus includes a plurality of integrated circuit chips connected in series. The plurality of integrated circuit chips include a first-stage integrated circuit chip and one or more non-first-stage integrated circuit chips. Each non-first-stage integrated circuit chip has a corresponding preceding-stage integrated circuit chip in an upstream communication direction within the plurality of integrated circuit chips. The first-stage integrated circuit chip is configured to receive a command signal from a control unit, transmit the command to the one or more non-first-stage integrated circuit chips, and return a computation result to the control unit. Each of the plurality of integrated circuit chips includes a busy-signal input pin and a busy-signal output pin; and the busy-signal input pin of each non-first-stage integrated circuit chip corresponds to the busy-signal output pin of its corresponding preceding-stage integrated circuit chip.
In certain embodiments of the data-processing apparatus, the busy-signal output pin is configured to output a first logic voltage level indicating that the integrated circuit chip is available to forward data or a second logic voltage level indicating the integrated circuit chip is unavailable to forward data.
In certain embodiments of the data-processing apparatus, each non-first-stage integrated circuit chip is configured to output the second logic voltage level through its busy-signal output pin in response to detecting its busy-signal input pin having the second logic voltage level.
In certain embodiments of the data-processing apparatus, a present-stage integrated circuit chip includes a buffer queue. If the buffer queue stores queued data waiting to be transmitted, the present-stage integrated circuit chip is configured to: output the second logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip; hold off transmitting the queued data in response to detecting the busy-signal input pin of the present-stage integrated circuit chip having the second logic voltage level; start transmitting the queued data to the corresponding preceding-stage integrated circuit chip in response to detecting the busy-signal input pin of the present-stage integrated circuit chip having the first logic voltage level; and output the first logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip when the transmitting of the queued data is completed.
In certain embodiments of the data-processing apparatus, once the present-stage integrated circuit chip has started transmitting the queued data, it is configured to complete the transmission.
In certain embodiments of the data-processing apparatus, after outputting the second logic voltage level through the busy-signal output pin, the present-stage integrated circuit chip is configured to wait for a guard interval period before starting to transmit the queued data to the corresponding preceding-stage integrated circuit chip.
In certain embodiments of the data-processing apparatus, the guard interval period is determined according to whether the plurality of integrated circuit chips communicate in a synchronous or an asynchronous mode.
In certain embodiments of the data-processing apparatus, the guard interval period is further determined according to a signal transmission delay and/or an operation speed of the plurality of integrated circuit chips.
In certain embodiments of the data-processing apparatus, each of the plurality of integrated circuit chips comprises an internal control unit and one or more arithmetic-unit groups. Each arithmetic-unit group comprises a plurality of arithmetic units including a first-stage arithmetic unit and one or more non-first-stage arithmetic units. Each non-first-stage arithmetic unit has a corresponding preceding-stage arithmetic unit in the plurality of the arithmetic units. The first-stage arithmetic unit is connected to the internal control unit.
In certain embodiments of the data-processing apparatus, each of the plurality of arithmetic units comprises a computing component and a storage component. The computing component and the storage component within each of the plurality of arithmetic units are connected to each other. And the computing component of each non-first-stage arithmetic unit is connected to the storage component of the corresponding preceding-stage arithmetic unit.
In certain embodiments of the data-processing apparatus, each of the plurality of integrated circuit chips includes a data input unit and a data output unit. The data output unit of the first-stage integrated circuit chip is connected to the control unit to transmit the computation result to the control unit. The data output unit of each non-first-stage integrated circuit chip is connected to the data input unit of the corresponding preceding-stage integrated circuit chip to transmit computation data. The data input unit of each non-first-stage integrated circuit chip is connected to the data output unit of the corresponding preceding-stage integrated circuit chip to receive data signals or command signals.
In certain embodiments, the data-processing apparatus further includes a signal conversion unit that connects two of the plurality of integrated circuit chips for signal voltage adaptation.
In certain embodiments, the data-processing apparatus further includes a clock signal crystal oscillator. A clock signal output interface of the clock signal crystal oscillator is connected to a clock signal input interface of one of the plurality of integrated circuit chips.
In another aspect of the present disclosure, a data transmission method is provided for a data-processing apparatus. The method includes: determining whether a busy-signal input pin of a present-stage integrated circuit chip of the data-processing apparatus is of a first logic voltage level or a second logic voltage level; holding off transmitting data in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level; and starting to transmit queued data in a buffer queue to a preceding-stage integrated circuit chip in response to data waiting to be transmitted in the buffer queue and the busy-signal input pin of the present-stage integrated circuit chip being of the first logic voltage level.
In certain embodiments, the data transmission method further includes: outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level.
In certain embodiments, the data transmission method further includes: outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip when there is data waiting to be transmitted in the buffer queue; and outputting the first logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip when the transmitting of the queued data is completed.
In certain embodiments, the data transmission method further includes: once transmission of the queued data has started, completing the transmission of the queued data.
In certain embodiments, the data transmission method further includes: after outputting the second logic voltage level through the busy-signal output pin, waiting for a guard interval period before starting to transmit the queued data to the preceding-stage integrated circuit chip.
In another aspect of the present disclosure, a computing system is provided. The computing system includes an external control unit and the data-processing apparatus in any one of the foregoing embodiments.
According to the embodiments of the present disclosure, data transmission between the node IC chips connected in series is controlled by configuring the busy-signal input/output pins, thereby effectively avoiding data transmission conflicts between the node IC chips. Computational tasks may be allocated among the plurality of node IC chips connected in series with a limited amount of command communication between the control unit and the node IC chips. Thus, the computational capabilities of the node IC chips may be fully utilized, and fast failure diagnosis of the chipset may be realized.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below.
FIG. 1 is a diagram of a data-processing apparatus in the prior art;
FIG. 2 is a diagram of a data-processing apparatus according to an embodiment of the present disclosure;
FIG. 3 is a diagram of a node IC chip according to an embodiment;
FIG. 4 is a block diagram of two arithmetic units connected in series according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a data transmission method according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a data transmission method according to another embodiment of the present disclosure;
FIG. 7 is a flowchart of a data transmission method according to another embodiment of the present disclosure;
FIG. 8 is a flowchart of a task allocation method according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a task allocation method according to another embodiment of the present disclosure;
FIG. 10 is a flowchart of a task allocation method according to another embodiment of the present disclosure;
FIG. 11 is a flowchart illustrating a failure diagnosis method according to an embodiment of the present disclosure;
FIG. 12 is a flowchart of a failure diagnosis method according to another embodiment of the present disclosure; and
FIG. 13 is a diagram of a computing system according to an embodiment of the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Technical solutions of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the described embodiments are some but not all of the embodiments of the present disclosure. Other embodiments derived by those of ordinary skill in the art from the described embodiments without inventive effort shall fall within the scope of the present disclosure.
Chinese Patent Application No. CN 201610312586.4 discloses a data-processing apparatus including a plurality of integrated circuit (IC) chips connected in series. A data processing task is received via an external interface by a first-stage node IC chip. As shown in FIG. 1, the plurality of node IC chips connected in series perform arithmetic processing of the data processing task and return the processing results through the external interface of the first-stage node IC chip. The number of node IC chips may be increased according to the computing requirements of the data processing task, and only one node IC chip needs to communicate with external devices without occupying the communication interfaces of the external devices. Thus, the data-processing apparatus may provide scalable data processing capabilities.
The foregoing data-processing apparatus connects a plurality of node IC chips in series, with each node IC chip responsible for a portion of the computation, which improves the data processing speed. However, data transmission between node IC chips may cause signal conflicts. Moreover, after receiving a data processing task from an external device, the data-processing apparatus needs to allocate the task among the node IC chips, and a way of allocating tasks among the plurality of node IC chips that reduces signal conflicts remains to be found. In addition, because every node IC chip processes the same data processing task, a failure in any single node IC chip may prevent the entire node IC chipset from working normally, so a way to quickly diagnose a failed chip is also needed.
FIG. 2 is a diagram showing the structure of a data-processing apparatus 10 according to certain embodiments of the present disclosure. As shown in FIG. 2, the data-processing apparatus 10 includes a plurality of node IC chips 20 connected in series.
A first-stage node IC chip located in the downstream communication direction from an external control unit receives the command signal from the control unit through an external interface configured in the first-stage node IC chip. The first-stage node IC chip transmits the command signal to one or more node IC chips for processing, and returns the calculation data to the control unit through the external interface. For a specific node IC chip in the plurality of node IC chips 20 connected in series, its next-stage node IC chip is a neighboring IC chip connected in the downstream communication direction, and its preceding-stage node IC chip is a neighboring IC chip connected in the upstream communication direction.
Each node IC chip 20 has a busy-signal input pin BI and a busy-signal output pin BO. In the downstream communication direction, the busy-signal output pin BO of each node IC chip is coupled to the busy-signal input pin BI of its next-stage node IC chip 20. The busy-signal output pin BO and the busy-signal input pin BI are used to control data transmission of the corresponding node IC chips in the upstream communication direction.
In certain embodiments, the node IC chip 20 may be implemented using an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
In certain embodiments, when the busy-signal output pin BO is of a first logic voltage level, it indicates that the data returned by the next-stage node IC chip can be forwarded; and when the busy-signal output pin BO is of a second logic voltage level, it indicates that the present-stage node IC chip or the preceding-stage node IC chip is or will be transmitting data.
In certain embodiments, the first logic voltage level at a BO or a BI pin may indicate a “not busy” state, and the second logic voltage level at a BO or a BI pin may indicate a “busy” state. In certain embodiments, the first logic voltage level may be a logic LOW voltage level, and the second logic voltage level may be a logic HIGH voltage level. Alternatively, in certain embodiments, the first logic voltage level may be a logic HIGH voltage level, and the second logic voltage level may be a logic LOW voltage level.
In certain embodiments, when the busy-signal input pin BI of a node IC chip 20 is of the second logic voltage level, the busy-signal output pin BO of the node IC chip may also be correspondingly set to the second logic voltage level.
In certain embodiments, when there is data waiting to be transmitted in the first-in, first-out (FIFO) buffer queue of the present-stage node IC chip 20 and its busy-signal input pin BI is detected to be of the second logic voltage level, the node IC chip must wait for the busy-signal input pin BI to turn to the first logic voltage level before transmitting the data. When its busy-signal input pin BI is detected to be of the first logic voltage level, the data may be transmitted immediately.
In certain embodiments, when a node IC chip 20 is transmitting data, if its busy-signal input pin BI is detected to be of the second logic voltage level, the data transmission in progress is not affected, that is, the data is continuously transmitted until all the data in the FIFO buffer queue is transmitted.
In certain embodiments, when there is data waiting to be transmitted in the FIFO buffer queue of a node IC chip 20 at the present stage, its busy-signal output pin BO outputs the second logic voltage level. When the data transmission is completed, its busy-signal output pin BO is set to the first logic voltage level. In certain embodiments, when the node IC chip receives a reset signal, its busy-signal output pin BO may be changed to the first logic voltage level.
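The busy-signal rules in the preceding paragraphs (BO follows a busy BI, and BO is busy while the FIFO holds data) can be written as a small function and chained to show how a busy level propagates downstream. This sketch abstracts the voltages into two enum values, since the polarity is implementation-specific, and treats the first-stage chip's BI as held at the not-busy level; both are assumptions rather than details from the source.

```python
"""Settled BO levels along a chain of node IC chips (illustrative only).

Assumptions: voltages are abstracted to two enum values because the
polarity is implementation-specific, and the first-stage chip's BI is
treated as held at the not-busy level.
"""
from enum import Enum
from typing import List


class Level(Enum):
    FIRST = "not busy"   # first logic voltage level
    SECOND = "busy"      # second logic voltage level


def bo_level(bi: Level, fifo_len: int) -> Level:
    """BO is busy if BI is busy or this chip has queued data."""
    if bi is Level.SECOND or fifo_len > 0:
        return Level.SECOND
    return Level.FIRST


def settle(fifo_lens: List[int]) -> List[Level]:
    """Return each chip's BO level; index 0 is the first-stage chip."""
    levels: List[Level] = []
    bi = Level.FIRST                 # assumed level at the first-stage BI
    for fifo_len in fifo_lens:
        bo = bo_level(bi, fifo_len)
        levels.append(bo)
        bi = bo                      # this BO drives the next-stage BI
    return levels


if __name__ == "__main__":
    # Chip 2 has three words queued; chips 3 and 4 sit downstream of it.
    print([lvl.name for lvl in settle([0, 0, 3, 0, 0])])
    # ['FIRST', 'FIRST', 'SECOND', 'SECOND', 'SECOND']
```

In the usage example only the chips downstream of the chip with queued data see a busy BI, so they hold off their own upstream transmissions, while chips 0 and 1 remain free to forward chip 2's data toward the control unit.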
In certain embodiments, after the busy-signal output pin BO of a node IC chip 20 at the present stage is set to the second logic voltage level, the node IC chip may wait for a predetermined guard interval period GAP before performing further data transmission.
In certain embodiments, the guard interval period GAP is configured to ensure that the next-stage node IC chip does not transmit data at the same time when the present-stage node IC chip needs to transmit data. The next-stage node IC chip sends data in two possible scenarios: one is that the next-stage node IC chip is transmitting its own data, and the other is that the next-stage node IC chip is forwarding data transmitted by a further downstream node IC chip.
In certain embodiments, if a node IC chip at the present stage needs to transmit data, it outputs the second logic voltage level through the busy-signal output pin BO. Even if the next-stage node IC chip has no data to forward at this time, the present-stage node IC chip still waits for a guard interval period GAP. This avoids the situation in which the next-stage node IC chip has already started transmitting data just as the busy-signal output pin BO of the present stage is set to the second logic voltage level. In certain exemplary embodiments, the guard interval period GAP may be at least 8 bits of data transmission time. For example, the guard interval period GAP may be set to the transmission time of 16 bits of data.
In certain embodiments, if a next-stage node IC chip is transmitting data, the node IC chip at the present stage also waits for a guard interval period GAP to ensure that the data transmission of the next-stage node IC chip finishes before the present-stage node IC chip starts to transmit data. When the present-stage node IC chip starts to transmit data, since its busy-signal output pin BO has already been set to the second logic voltage level, the next-stage node IC chip will not transmit more data during the process.
In certain scenarios, when the first-stage node IC chip of the data-processing apparatus needs to transmit data, it outputs the second logic voltage level through its busy-signal output pin BO. By the time the last-stage node IC chip detects the second logic voltage level at its busy-signal input pin BI, the signal has already been delayed by N stages, where N is the total number of node IC chips in the data-processing apparatus. If the last-stage node IC chip had already started transmitting data in the upstream communication direction before detecting the second logic voltage level at its busy-signal input pin BI, its data signal also needs to pass through an N-stage delay to reach the first-stage node IC chip. Therefore, the guard interval period GAP needs to be configured so that both N-stage delays fit within the guard interval period GAP.
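One way to state this timing constraint, using symbols introduced here for illustration (t_busy for the per-stage BO-to-BI propagation delay of the busy signal and t_fwd for the per-stage data forwarding delay):

```latex
T_{\mathrm{GAP}} \;\ge\; \underbrace{N\,t_{\mathrm{busy}}}_{\text{busy level reaches the last stage}} \;+\; \underbrace{N\,t_{\mathrm{fwd}}}_{\text{in-flight data reaches the first stage}} \;=\; N\,\bigl(t_{\mathrm{busy}} + t_{\mathrm{fwd}}\bigr)
```

If both per-stage delays are taken as one clock cycle (an assumption made here), N = 256 gives a bound of 512 clock cycles, which agrees with the synchronous-mode figure in Scenario 2.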
In certain embodiments, the guard interval period GAP is configured according to different communication modes employed between the node IC chips. Taking a series connection of 256 node IC chips as an example, the following scenarios may be considered.
Scenario 1. When the node IC chip adopts a universal asynchronous receiver/transmitter (UART) mode, considering the line delay inside each node IC chip and the PCB delay, a guard interval period of 16-bit UART transmission time may be adequate for the UART mode. 
Scenario 2. When the node IC chip adopts a synchronous serial communication mode, in the case of 256 node IC chips connected in series, the delay from each busy-signal output pin BO to the next busy-signal input pin BI is taken into account, and the IC chip at each stage needs one clock cycle to forward the data, so a one-way traversal of the chain takes 256 clock cycles. Because both the busy signal and any data already in flight must traverse the chain (the two N-stage delays described above), a waiting time of 512 clock cycles may be configured.
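The guard-interval figures in the two scenarios can be reproduced with a few lines of arithmetic. This sketch assumes the downstream busy path and the upstream data path cost the same number of cycles per stage; the baud rate and clock frequency in the usage lines are hypothetical placeholders, since the source gives only the bit and cycle counts.

```python
"""Guard-interval (GAP) arithmetic for the two scenarios (illustrative).

The baud rate and clock frequency below are hypothetical placeholders;
the source specifies only the bit and cycle counts.
"""


def sync_gap_cycles(num_chips: int, cycles_per_stage: int = 1) -> int:
    """Synchronous mode: two one-way traversals of the chain
    (busy signal downstream, in-flight data upstream)."""
    one_way = num_chips * cycles_per_stage
    return 2 * one_way


def uart_gap_seconds(gap_bits: int, baud: int) -> float:
    """UART mode: a GAP of gap_bits bit times at the given baud rate."""
    return gap_bits / baud


if __name__ == "__main__":
    print(sync_gap_cycles(256))                # 512
    clock_hz = 25_000_000                      # hypothetical 25 MHz clock
    print(sync_gap_cycles(256) / clock_hz)     # GAP in seconds
    print(uart_gap_seconds(16, 115_200))       # hypothetical 115200 baud
```

Expressing the synchronous GAP in cycles rather than seconds keeps it valid if the link clock is changed, which is one reason a designer might prefer that form.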
In certain embodiments, the guard interval period is configured according to the transmission delay of the signals or commands and/or the computing speed of the IC chips.
FIG. 3 is a diagram illustrating the architecture of a node IC chip according to certain embodiments. As shown in FIG. 3, in certain embodiments, the node IC chip may include an internal control unit 201, two or more groups of arithmetic units 202, and one or more input/output interfaces 203. The internal control unit 201 is connected to the input/output interface 203 for data exchange with external components.
Each arithmetic-unit group includes two or more arithmetic units connected in series. A node IC chip may have multiple arithmetic units. In order to save wiring space, reduce wiring complexity, and facilitate the control of the internal control unit, the arithmetic units may be divided into two or more arithmetic-unit groups according to a variety of factors such as available space as well as the operating characteristics and functions of the arithmetic unit. The arithmetic units in each group are connected in series. A person skilled in the art may group the arithmetic units according to the application. The present disclosure does not specifically limit the specific grouping method.
In certain embodiments, a first-stage arithmetic unit in each arithmetic-unit group is coupled to the internal control unit. Since the arithmetic units in each arithmetic-unit group are connected in series to each other, only one arithmetic unit in each arithmetic-unit group is connected to the internal control unit. In certain embodiments, the first-stage arithmetic unit in each arithmetic-unit group may be connected to the internal control unit, and the first-stage arithmetic unit is usually the arithmetic unit closest to the internal control unit, further saving wiring space and reducing wiring complexity. A person skilled in the art can select the arithmetic unit connected to the internal control unit according to the application, which is not specifically limited in the present disclosure.
In certain embodiments, a node IC chip may comprise two input/output interfaces 203 which are respectively configured at the two ends of the node IC chip. The two input/output interfaces are connected to the internal control unit, so that the internal control unit may exchange data with external components through the input/output interfaces. In certain embodiments, the input/output interfaces 203 may be separated into a data input unit and a data output unit. A person skilled in the art may select the number, position and configuration of the input/output interfaces according to the application, which is not specifically limited in the present disclosure.
FIG. 4 is a structural diagram of the arithmetic units 202 according to certain embodiments. As shown in FIG. 4, in certain embodiments, the arithmetic unit 202 includes one or more computing components 2021, one or more storage components 2022, and a clock input interface 2023.
The computing component 2021 is coupled to the storage component 2022 of the preceding-stage arithmetic unit for reading the data stored in that storage component 2022 to perform calculation. The computing component 2021 is also coupled to the storage component 2022 of the same arithmetic unit for storing the calculated data. The stored data may be read by the next-stage arithmetic unit. The clock input interface 2023 may be connected to a clock output interface of the internal control unit.
In certain embodiments, connecting the plurality of arithmetic units in series allows each arithmetic unit to acquire the data needed for its computation from the preceding-stage arithmetic unit. The serial connection may also save wiring space and reduce wiring complexity.
The arithmetic unit may comprise a microelectronic circuit, and the microelectronic circuit may comprise CMOS transistors and/or NMOS transistors. Those skilled in the art may select the architecture of the arithmetic units and storage components according to the application. The present disclosure does not specifically limit the selection in the architectures and models of the arithmetic units and the storage components.
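The serial arrangement of arithmetic units can be pictured as a clocked pipeline in which, on every clock edge, each unit computes on the value held in the preceding unit's storage. In this sketch the operations, the single-integer storage, and the `tick` function are hypothetical stand-ins for the computing component 2021, the storage component 2022, and the clock input interface 2023.

```python
"""Clocked pipeline of serially connected arithmetic units (illustrative).

Each unit's op stands in for computing component 2021, its storage for
storage component 2022, and tick() for one edge of clock input 2023.
"""
from typing import Callable, List


class ArithmeticUnit:
    def __init__(self, op: Callable[[int], int]) -> None:
        self.op = op        # placeholder computation
        self.storage = 0    # placeholder single-value storage


def tick(units: List[ArithmeticUnit], control_input: int) -> None:
    """Advance every unit by one clock edge."""
    # Snapshot upstream values first, so each unit sees pre-edge storage.
    upstream = [control_input] + [u.storage for u in units[:-1]]
    for unit, value in zip(units, upstream):
        unit.storage = unit.op(value)


if __name__ == "__main__":
    group = [
        ArithmeticUnit(lambda x: x + 1),
        ArithmeticUnit(lambda x: x * 2),
        ArithmeticUnit(lambda x: x - 3),
    ]
    for _ in range(3):                  # one clock per stage
        tick(group, 5)
    print(group[-1].storage)            # 9, i.e. (5 + 1) * 2 - 3
```

Because every unit reads its upstream storage before any storage is overwritten, a value needs one clock per stage to reach the last unit of the group.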
In certain embodiments, a data output unit of the first-stage node IC chip is connected to a data input unit of the control unit for returning the computation result of the data-processing apparatus to the control unit. A data input unit of a node IC chip may be connected to the data output unit of its next-stage node IC chip to receive data processed by the next-stage node IC chip. The one or more data input units of the first-stage node IC chip are connected to one or more data output units of the control unit to receive data input or command input from the control unit. The one or more data output units of a node IC chip are connected to one or more data input units of its next-stage node IC chip for transmitting data input or command input to the next-stage node IC chip.
In certain embodiments, the data-processing apparatus may further include a signal conversion unit that connects two node IC chips for signal voltage adaptation. In certain embodiments, the data-processing apparatus may further include one or more clock signal crystal oscillators. A clock signal output interface of the clock signal crystal oscillator may be coupled to a clock signal input interface of a node IC chip of the data-processing apparatus.
FIG. 5 is a flow chart of a data transmission method based on the data-processing apparatus 10 according to certain embodiments. As shown in FIG. 5, the data transmission method may include the following steps.
In Step S1, the node IC chip at the present stage determines whether there is data waiting to be transmitted in its buffer queue.
In Step S2, if there is data waiting to be transmitted in the buffer queue, the node IC chip detects whether the busy-signal input pin BI is of the second logic voltage level.
In Step S3, if the busy-signal input pin BI is of the second logic voltage level, the node IC chip waits for BI to change from the second logic voltage level to the first logic voltage level, and then starts transmitting data from the buffer queue.
In Step S4, if the busy-signal input pin BI is of the first logic voltage level, the node IC chip immediately starts transmitting the data from its buffer queue.
In certain embodiments, the data transmission method may further include: setting the busy-signal output pin BO to the second logic voltage level when the node IC chip at the present stage detects that the busy-signal input pin BI is of the second logic voltage level.
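Steps S1 to S4, together with the busy propagation just described, can be sketched as a polling loop over discrete BI samples. This is a minimal model under stated assumptions: the text does not say which logic level is high or low, so the `Level` encoding is assumed, and the function names are hypothetical.

```python
from enum import Enum


class Level(Enum):
    FIRST = 1   # first logic voltage level: chip available (assumed encoding)
    SECOND = 2  # second logic voltage level: chip busy (assumed encoding)


def fig5_step(queue_has_data, bi):
    """One decision of the FIG. 5 method for the present-stage chip.

    Returns (action, bo): action is "idle", "wait" or "transmit";
    bo is the level to drive on BO, or None to leave BO unchanged.
    """
    if not queue_has_data:             # S1: nothing waiting in the buffer queue
        return "idle", None
    if bi is Level.SECOND:             # S2/S3: upstream busy, so hold off and
        return "wait", Level.SECOND    # propagate the busy level on BO
    return "transmit", None            # S4: BI at first level, send at once


def start_tick(queue, bi_samples):
    """Return the index of the BI sample at which sending starts, or None."""
    for tick, bi in enumerate(bi_samples):
        action, _ = fig5_step(bool(queue), bi)
        if action != "wait":
            return tick if action == "transmit" else None
    return None  # BI never returned to the first level in the samples given
```

For example, with BI samples `[SECOND, SECOND, FIRST]` and a non-empty queue, transmission starts at the third sample.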
FIG. 6 is a flow chart showing another embodiment of a data transmission method based on the data-processing apparatus 10. As shown in FIG. 6, the data transmission method includes the following steps.
In Step S11, the present-stage node IC chip determines whether there is data waiting to be transmitted in its buffer queue.
In Step S12, if there is data waiting to be transmitted in the buffer queue, the present-stage node IC chip outputs the second logic voltage level through its busy-signal output pin BO.
In Step S13, the present-stage node IC chip detects whether the busy-signal input pin BI is of the second logic voltage level.
In Step S14, if the busy-signal input pin BI is of the first logic voltage level, the present-stage node IC chip immediately transmits the data in the buffer queue.
In Step S15, the present-stage node IC chip determines whether the transmission of the data in the buffer queue has finished.
In Step S16, if the transmission of the data in the buffer queue has finished, the present-stage node IC chip outputs the first logic voltage level through its busy-signal output pin BO.
In Step S17, if the busy-signal input pin BI changes to the second logic voltage level before the transmission of the data in the buffer queue has finished, the present-stage node IC chip continues transmitting until all data in the buffer queue has been transmitted.
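The FIG. 6 sequence can be simulated tick by tick. In this hypothetical sketch one queued item is sent per tick, BO is driven to the second logic voltage level as soon as data is queued (S12), transmission starts only when BI is at the first logic voltage level (S13, S14), and once started it continues even if BI later changes (S17). The level encoding is again an assumption.

```python
from enum import Enum


class Level(Enum):
    FIRST = 1   # first logic voltage level: available (assumed encoding)
    SECOND = 2  # second logic voltage level: busy (assumed encoding)


def fig6_transmit(queue, bi_samples):
    """Simulate the FIG. 6 method, sending one queued item per BI sample.

    Returns (sent, bo_trace), where bo_trace[t] is the BO level after tick t.
    """
    queue = list(queue)
    sent, bo_trace = [], []
    bo = Level.FIRST
    transmitting = False
    for bi in bi_samples:
        if queue and not transmitting:
            bo = Level.SECOND              # S11/S12: data waiting, drive BO busy
            if bi is Level.FIRST:          # S13/S14: BI free, start at once
                transmitting = True
        if transmitting:                   # S17: once started, BI is ignored
            sent.append(queue.pop(0))
            if not queue:                  # S15/S16: queue drained, BO free
                transmitting = False
                bo = Level.FIRST
        bo_trace.append(bo)
    return sent, bo_trace
```

With three queued items and BI samples `[SECOND, FIRST, SECOND, SECOND, FIRST]`, BI returns to the second level while two items are still queued, and transmission continues regardless, which is the behavior Step S17 describes.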
FIG. 7 is a flow chart showing another embodiment of a data transmission method based on the data-processing apparatus 10. As shown in FIG. 7, on the basis of the embodiment shown in FIG. 6, the data transmission method further includes the following steps after step S12.
In Step S18, after the busy-signal output pin BO of the present-stage node IC chip has been set to the second logic voltage level, the present-stage node IC chip waits for a predetermined guard interval period, and then performs data transmission to ensure that the next-stage node IC chip does not simultaneously transmit data.
In certain embodiments, the guard interval period is configured depending on a synchronous or an asynchronous communication mode between the node IC chips.
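The text does not give a formula for the guard interval. One plausible way to derive it from the factors named here and in claim 8 (communication mode, signal transmission delay, and operating speed) is sketched below. The two-stage flip-flop synchronizer for the asynchronous case and the single extra cycle for the synchronous case are assumptions, not values from the disclosure.

```python
import math


def guard_interval_cycles(mode, signal_delay_ns, clock_mhz, sync_stages=2):
    """Hypothetical guard-interval estimate, in receiver clock cycles.

    mode: "sync" or "async" inter-chip communication
    signal_delay_ns: BO-to-BI signal transmission delay
    clock_mhz: chip operating speed
    sync_stages: assumed synchronizer depth used in async mode
    """
    period_ns = 1000.0 / clock_mhz
    cycles = math.ceil(signal_delay_ns / period_ns)  # time for the BO edge to arrive
    if mode == "async":
        cycles += sync_stages   # asynchronous BI passes through a synchronizer
    elif mode == "sync":
        cycles += 1             # one cycle for the registered sample
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cycles
```

At 100 MHz with a 15 ns delay, this yields 3 cycles in synchronous mode and 4 in asynchronous mode; the chip would wait that long after driving BO before transmitting.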
As shown in the foregoing embodiments, data transmission between the serially connected node IC chips is controlled through the busy-signal input/output pins; combined with a configured guard interval period, this effectively avoids data transmission conflicts between the node IC chips.
FIG. 8 is a flow chart of an embodiment of a task allocation method based on the data-processing apparatus 10. As shown in FIG. 8, the task allocation method is applied to a control unit, and the method includes the following steps.
In Step S21, the control unit sends a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter an inactive state.
In Step S22, the control unit sends an address allocation command to a node IC chip of the data-processing apparatus, and sequentially assigns a chip address to each node IC chip.
In Step S23, the control unit assigns a computing task to each node IC chip according to the chip address of the IC chip.
In certain embodiments, the step of sending a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter the inactive state includes: sending a command to each node IC chip of the data-processing apparatus, so that each node IC chip sequentially enters the inactive state.
In certain embodiments, the step of sending a command to at least one node IC chip of the data-processing apparatus to cause the node IC chip to enter the inactive state includes: sending one command to all of the node IC chips of the data-processing apparatus to cause each node IC chip to simultaneously enter the inactive state.
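Steps S21 to S23 can be expressed as the command stream the control unit emits. In this sketch the command names Chaininactive and SetAddress are the ones used later in this description; the tuple encoding, the `"all"`/`"single"` markers and the `Task` command are hypothetical placeholders.

```python
def allocation_commands(num_chips, broadcast_inactive=True):
    """Hypothetical command stream for steps S21 to S23 of FIG. 8.

    Command names Chaininactive and SetAddress follow the description; the
    tuple encoding and the "Task" command are illustrative placeholders.
    """
    cmds = []
    # S21: put the chips into the inactive state, all at once or one by one
    if broadcast_inactive:
        cmds.append(("Chaininactive", "all"))
    else:
        cmds.extend(("Chaininactive", "single") for _ in range(num_chips))
    # S22: one SetAddress per chip, assigning addresses in sequence
    cmds.extend(("SetAddress", addr) for addr in range(num_chips))
    # S23: one computing task per chip, keyed by its chip address
    cmds.extend(("Task", addr) for addr in range(num_chips))
    return cmds
```

For a 256-chip chain the stream contains 256 SetAddress commands, matching the count given for the mining-machine example below.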
FIG. 9 is a flowchart of another embodiment of a data processing task allocation method based on the data-processing apparatus 10. As shown in FIG. 9, on the basis of the embodiment shown in FIG. 8, the task allocation method further includes the following steps after Step S23.
In Step S24, the node IC chip performs one or more hash calculations according to the assigned computing task. In particular, the hash calculation may include a cryptographic hash operation or a hash collision operation.
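As an illustration of Step S24, the sketch below scans a range of nonces with double SHA-256 and keeps those whose digest, read as a little-endian integer, does not exceed a target. Double SHA-256, the 32-bit little-endian nonce and the header layout are assumptions borrowed from Bitcoin-style proof of work; the disclosure states only that a cryptographic hash or hash collision operation may be performed.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice (assumed hash function)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonces(header_prefix: bytes, start: int, count: int, target: int):
    """Try `count` 32-bit nonces from `start`; return those with hash <= target."""
    hits = []
    for nonce in range(start, start + count):
        # Append the nonce as a 4-byte little-endian field (assumed layout)
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            hits.append(nonce)
    return hits
```

Each node IC chip would run such a loop only over its own assigned nonce range.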
FIG. 10 is a flow chart showing another embodiment of a task allocation method of the data-processing apparatus 10. As shown in FIG. 10, the task allocation method is applied to a node IC chip, and the method includes the following steps.
In Step S31, the node IC chip receives an address allocation command transmitted by the control unit.
In Step S32, the node IC chip determines whether the node IC chip is currently in the inactive state.
In Step S33, when the node IC chip is in an inactive state, the node IC chip parses the address allocation command transmitted by the control unit, stores the allocated address to a register, and changes to an active state.
In Step S34, when the node IC chip is in the active state, it does not parse the address allocation command transmitted by the control unit; instead, it forwards the address allocation command directly to the next-stage node IC chip.
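Steps S31 to S34 give each chip a simple two-state rule. Running that rule along the chain shows why N address allocation commands are needed for N chips: each command is forwarded past every chip that is already active and consumed by the first chip that is still inactive. The `Chip` class and `deliver` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chip:
    """Hypothetical node IC chip state for the FIG. 10 method."""
    active: bool = False            # starts inactive after the inactive command
    address: Optional[int] = None   # the register holding the chip address

    def on_set_address(self, addr: int) -> bool:
        """Handle one address allocation command (steps S31 to S34).

        Returns True if the command was consumed, False if it must be
        forwarded unchanged to the next-stage chip.
        """
        if not self.active:       # S32/S33: inactive, so parse, store, activate
            self.address = addr
            self.active = True
            return True
        return False              # S34: active, so forward without parsing


def deliver(chain, addr):
    """Pass one command down the chain until some chip consumes it."""
    for chip in chain:
        if chip.on_set_address(addr):
            return True
    return False  # fell off the end: every chip was already active
```

Sending the arbitrary addresses 7, 3, 9 and 1 to a four-chip chain leaves them stored in that order from the first stage onward; a fifth command is consumed by no chip.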
In certain embodiments, the task input command format used by the data-processing apparatus 10 may have an HCN field and a starting random number offset (SNO) field. The HCN field is used to control the number of calculations performed by each node IC chip. For example, if the computing task requires 2^32 calculations, that is, the random number (nonce) is incremented from an initial value through 2^32 values, then with 32 node IC chips connected in series, each node IC chip only needs to perform 2^27 calculations. The SNO field holds a numeric starting offset.
In certain embodiments, the data-processing apparatus 10 may use an address assignment command SetAddress. The command format includes a chip address field ChipAddr, which specifies the chip address of a single node IC chip. The computing task for each node IC chip is determined by the values in the SNO and ChipAddr fields.
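The HCN arithmetic above (2^32 / 32 = 2^27), and one possible way of combining SNO and ChipAddr into a per-chip nonce range, are sketched below. The linear layout `start = SNO + ChipAddr * HCN` is an assumption; the text states only that each chip's task is determined by the SNO and ChipAddr values.

```python
NONCE_SPACE = 2 ** 32  # total nonce values in the example from the text


def hcn_for_chain(total_steps: int, num_chips: int) -> int:
    """Per-chip calculation count (HCN) when the work is split evenly."""
    if total_steps % num_chips:
        raise ValueError("this sketch assumes the nonce space divides evenly")
    return total_steps // num_chips


def nonce_range(sno: int, chip_addr: int, hcn: int) -> range:
    """Hypothetical mapping from (SNO, ChipAddr, HCN) to a chip's nonces.

    Assumes chips take consecutive, non-overlapping blocks of HCN nonces.
    """
    start = sno + chip_addr * hcn
    return range(start, start + hcn)


hcn = hcn_for_chain(NONCE_SPACE, 32)
ranges = [nonce_range(0, addr, hcn) for addr in range(32)]
```

Under this layout the 32 ranges are contiguous and together cover the whole 2^32 nonce space exactly once.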
In certain embodiments, to allocate chip addresses, the control unit first issues a Chaininactive command to change node IC chips to the inactive state. All node IC chips may be set to the inactive state at once, or they may be set to the inactive state one by one. The control unit then sends an address assignment command SetAddress to assign an arbitrary address to a node IC chip.
In the inactive state, a node IC chip may parse the chip address assignment command SetAddress, store its address in a register, and then change to the active state. In certain embodiments, the chip may enter the active state according to a command of the control unit after being assigned a node address; alternatively, it may automatically enter the active state after parsing an address assignment command.
In the active state, the node IC chip may not parse the address allocation command SetAddress. Instead, the node IC chip may forward the address allocation command directly to the next-stage node IC chip. For a series connection of N node IC chips, the control unit needs to issue N SetAddress commands to assign an arbitrary address to each node IC chip one by one. For example, for a cryptocurrency mining machine having 256 node IC chips, the control CPU may need to sequentially send 256 SetAddress commands to configure the addresses of all the node IC chips.
FIG. 11 is a flowchart according to an embodiment of a failure diagnosis method based on the data-processing apparatus 10. As shown in FIG. 11, the failure diagnosis method is applied to the control unit, and the method includes the following steps.
In Step S41, the control unit sends a work status query command to a node IC chip of the data-processing apparatus.
In Step S42, each node IC chip of the data-processing apparatus sequentially forwards the work status query command.
In Step S43, each node IC chip determines whether its chip address matches the chip address specified in the work status query command.
In Step S44, if the chip address of a node IC chip matches the chip address specified in the work status query command, the node IC chip returns the register data.
In Step S45, the control unit determines the work status of each node IC chip according to the register data returned by the node IC chip.
In certain embodiments, the failure diagnosis method may further include: if the control unit does not receive register data from the node IC chip whose chip address matches the chip address specified in the work status query command, the control unit determines that a failure has occurred in that node IC chip.
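The addressed query of FIG. 11 can be modeled as a walk down the chain. In this hypothetical sketch each chip is a `(chip_addr, register_data)` pair and a failed chip is represented by `None`; in a real controller "not received" would presumably mean no reply within a timeout, which is an assumption here.

```python
def query_chip(chain, target_addr):
    """Hypothetical model of the FIG. 11 addressed status query.

    chain: list of (chip_addr, register_data) in chain order; register_data
    is None for a chip that has failed and stays silent.
    Returns ("ok", data) or ("failed", None).
    """
    for chip_addr, register_data in chain:     # S42: forwarded stage by stage
        if chip_addr == target_addr:           # S43: address comparison
            if register_data is None:          # matching chip never answers
                break
            return "ok", register_data         # S44: return register data
    return "failed", None                      # no reply: treat as failure
```

The control unit would then interpret the returned register data (Step S45), or mark the chip as failed when nothing comes back.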
FIG. 12 is a flow chart showing a failure diagnosis method of the data-processing apparatus 10 according to another embodiment. As shown in FIG. 12, the failure diagnosis method is applied to the control unit, and the method includes the following steps.
In Step S51, the control unit sends a work status query command to a node IC chip of the data-processing apparatus.
In Step S52, each node IC chip of the data-processing apparatus sequentially forwards the work status query command.
In Step S53, the control unit determines whether the work status query command specifies querying the work status of all node IC chips.
In Step S54, if the work status query command specifies all node IC chips, each node IC chip sequentially returns its register data to the control unit.
In Step S55, the control unit determines the work status of the node IC chips according to the returned register data.
In certain embodiments, the failure diagnosis method further includes: when the work status query command specifies querying the work status of all the node IC chips, determining which node IC chips have failed based on the number of register-data responses returned by the node IC chips.
When a node IC chip receives the work status query command from the control unit, it first forwards the command to its next-stage node IC chip. The node IC chip whose chip address matches the address in the command returns the data in the corresponding register (for example, data being calculated) to the control unit through a universal asynchronous receiver-transmitter (UART) interface. The control unit may determine the number of operational node IC chips in the apparatus from the number of responses received. Failure diagnosis of the node IC chips may therefore be performed with a single query command. A node IC chip that fails during operation may be identified from the hash rate of the data-processing apparatus, or by finding node IC chips that have not returned computation data for a long period of time.
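For the all-chips query of FIG. 12, the control unit can compare the set of responding chips against the addresses it assigned. The sketch assumes that each UART response carries its chip address, so silent chips can be named and not just counted; the text itself only mentions counting responses. `diagnose_all` is a hypothetical name.

```python
def diagnose_all(expected_addrs, responses):
    """Hypothetical FIG. 12 diagnosis from an all-chips status query.

    expected_addrs: chip addresses assigned during address allocation
    responses: (chip_addr, register_data) pairs received over UART
    Returns (operational_count, sorted list of silent chip addresses).
    """
    expected = set(expected_addrs)
    answered = {addr for addr, _ in responses}
    silent = sorted(expected - answered)
    return len(answered & expected), silent
```

For a five-chip chain in which only chip 2 stays silent, the result is four operational chips and `[2]` as the list of suspected failures.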
FIG. 13 is a block diagram showing the structure of a computing system 40 according to an embodiment of the present disclosure. As shown in FIG. 13, the computing system 40 includes the data-processing apparatus 10 and a control unit 30; the data-processing apparatus 10 communicates with the control unit 30 through an external interface.
The foregoing description and embodiments are exemplary descriptions of the technical solutions provided by the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art. The present disclosure is intended to cover any variations, modifications, equivalent substitutions, uses, or adaptations that are in accordance with the general principles of the present disclosure.

Claims (20)

  1. A data-processing apparatus, comprising a plurality of integrated circuit chips connected in series, wherein:
    the plurality of integrated circuit chips include a first-stage integrated circuit chip and one or more non-first-stage integrated circuit chips, each non-first-stage integrated circuit chip having a corresponding preceding-stage integrated circuit chip in an upstream communication direction within the plurality of integrated circuit chips;
    the first-stage integrated circuit chip is configured to receive a command signal from a control unit, transmit the command to the one or more non-first-stage integrated circuit chips, and return a computation result to the control unit;
    each of the plurality of integrated circuit chips includes a busy-signal input pin and a busy-signal output pin; and
    the busy-signal input pin of each non-first-stage integrated circuit chip corresponds to the busy-signal output pin of its corresponding preceding-stage integrated circuit chip.
  2. The data-processing apparatus according to claim 1, wherein for an integrated circuit chip in the plurality of integrated circuit chips:
    the busy-signal output pin is configured to output one of:
    a first logic voltage level indicating that the integrated circuit chip is available to forward data; and
    a second logic voltage level indicating the integrated circuit chip is unavailable to forward data.
  3. The data-processing apparatus according to claim 2, wherein:
    each non-first-stage integrated circuit chip is configured to output the second logic voltage level through its busy-signal output pin in response to detecting its busy-signal input pin having the second logic voltage level.
  4. The data-processing apparatus according to claim 2, wherein for a present-stage integrated circuit chip in the one or more non-first-stage integrated circuit chips:
    the present-stage integrated circuit chip includes a buffer queue; and
    if the buffer queue stores queued data waiting to be transmitted, the present-stage integrated circuit chip is configured to:
    output the second logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip;
    hold off transmitting the queued data in response to detecting the busy-signal input pin of the present-stage integrated circuit chip having the second logic voltage level;
    start transmitting the queued data to the corresponding preceding-stage integrated circuit chip in response to detecting the busy-signal input pin of the present-stage integrated circuit chip having the first logic voltage level; and
    output the first logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip when the transmitting of the queued data is completed.
  5. The data-processing apparatus according to claim 4, wherein:
    the present-stage integrated circuit chip is configured to complete transmitting the queued data after starting transmitting the queued data.
  6. The data-processing apparatus according to claim 4, wherein:
    the present-stage integrated circuit chip is configured to wait for a guard interval period of time before starting transmitting the queued data to the corresponding preceding-stage integrated circuit chip after outputting the second logic voltage level through the busy-signal output pin.
  7. The data-processing apparatus according to claim 6, wherein the guard interval period is determined according to a synchronous or asynchronous communication mode of the plurality of the integrated circuit chips respectively.
  8. The data-processing apparatus according to claim 6, wherein the guard interval period is determined further according to at least one of:
    a signal transmission delay; and
    an operation speed of the plurality of the integrated circuit chips.
  9. The data-processing apparatus according to claim 1, wherein:
    each of the plurality of integrated circuit chips comprises an internal control unit and one or more arithmetic-unit groups;
    each arithmetic-unit group comprises a plurality of arithmetic units including a first-stage arithmetic unit and one or more non-first-stage arithmetic units;
    each non-first-stage arithmetic unit has a corresponding preceding-stage arithmetic unit in the plurality of the arithmetic units; and
    the first-stage arithmetic unit is connected to the internal control unit.
  10. The data-processing apparatus according to claim 9, wherein:
    each of the plurality of arithmetic units comprises a computing component and a storage component;
    the computing component and the storage component within each of the plurality of arithmetic units are connected to each other; and
    the computing component of each non-first-stage arithmetic unit is connected to the storage component of the corresponding preceding-stage arithmetic unit.
  11. The data-processing apparatus according to claim 1, wherein:
    each of the plurality of integrated circuit chips includes a data input unit and a data output unit;
    the data output unit of the first-stage integrated circuit chip is connected to the control unit to transmit the computation result to the control unit;
    the data output unit of each non-first-stage integrated circuit chip is connected to the data input unit of the corresponding preceding-stage integrated circuit chip to transmit computation data; and
    the data input unit of each non-first-stage integrated circuit chip is connected to the data output unit of the corresponding preceding-stage integrated circuit chip to receive data signal or command signal.
  12. The data-processing apparatus according to claim 1, further comprising a signal conversion unit that connects two of the plurality of integrated circuit chips for signal voltage adaptation.
  13. The data-processing apparatus according to claim 1, further comprising a clock signal crystal oscillator, wherein a clock signal output interface of the clock signal crystal oscillator is connected to a clock signal input interface of one of the plurality of integrated circuit chips.
  14. A data transmission method for a data-processing apparatus, comprising:
    determining whether a busy-signal input pin of a present-stage integrated circuit chip of the data-processing apparatus is of a first logic voltage level or a second logic voltage level;
    holding off transmitting data in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level; and
    starting transmitting queued data in a buffer queue to a preceding-stage integrated circuit chip in response to data waiting to be transmitted in the buffer queue and the busy-signal input pin of the present-stage integrated circuit chip being of the first logic voltage level.
  15. The data transmission method according to claim 14, further comprising:
    outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip in response to the busy-signal input pin of the present-stage integrated circuit chip being of the second logic voltage level.
  16. The data transmission method according to claim 14, further comprising:
    outputting the second logic voltage level through a busy-signal output pin of the present-stage integrated circuit chip when there is data waiting to be transmitted in the buffer queue; and
    outputting the first logic voltage level through the busy-signal output pin of the present-stage integrated circuit chip when the transmitting of the queued data is completed.
  17. The data transmission method according to claim 14, further comprising:
    completing transmitting the queued data after starting transmitting the queued data.
  18. The data transmission method according to claim 14, further comprising:
    waiting for a guard interval period of time before starting transmitting the queued data to the preceding-stage integrated circuit chip after outputting the second logic voltage level through the busy-signal output pin.
  19. A computing system, comprising:
    an external control unit; and
    a data-processing apparatus, comprising a plurality of integrated circuit chips connected in series;
    wherein:
    the data-processing apparatus is connected to the external control unit through an external interface;
    the plurality of integrated circuit chips include a first-stage integrated circuit chip and one or more non-first-stage integrated circuit chips, each non-first-stage integrated circuit chip having a corresponding preceding-stage integrated circuit chip in an upstream communication direction within the plurality of integrated circuit chips;
    the first-stage integrated circuit chip is configured to receive a command signal from the external control unit, transmit the command to the one or more non-first-stage integrated circuit chips, and return a computation result to the external control unit;
    each of the plurality of integrated circuit chips includes a busy-signal input pin and a busy-signal output pin; and
    the busy-signal input pin of each non-first-stage integrated circuit chip is coupled to the busy-signal output pin of the corresponding preceding-stage integrated circuit chip.
  20. The computing system according to claim 19, wherein for an integrated circuit chip in the plurality of integrated circuit chips:
    the busy-signal output pin is configured to output one of:
    a first logic voltage level indicating that the integrated circuit chip is available to forward data; and
    a second logic voltage level indicating the integrated circuit chip is unavailable to forward data.
PCT/CN2018/122789 2017-12-21 2018-12-21 Data-processing apparatus, data transmission method, and computing system thereof WO2019120294A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711402439.7 2017-12-21
CN201711402439.7A CN109947555A (en) 2017-12-21 2017-12-21 Data processing equipment, data transmission method for uplink and calculating equipment

Publications (1)

Publication Number Publication Date
WO2019120294A1 true WO2019120294A1 (en) 2019-06-27

Family

ID=66993987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122789 WO2019120294A1 (en) 2017-12-21 2018-12-21 Data-processing apparatus, data transmission method, and computing system thereof

Country Status (2)

Country Link
CN (1) CN109947555A (en)
WO (1) WO2019120294A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114628957A (en) * 2022-03-31 2022-06-14 苏州浪潮智能科技有限公司 Fool-proof device applied to SOCKET and SOCKET

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354907A (en) * 2007-07-23 2009-01-28 三星电子株式会社 Multi-chip package reducing power-up peak current
TW200939056A (en) * 2008-03-11 2009-09-16 Rdc Semiconductor Co Ltd Method of determining 2-pin logic cell orientation
US8320149B2 (en) * 2010-02-04 2012-11-27 Richtek Technology Corporation, R.O.C. Multi-chip module with master-slave analog signal transmission function
CN105760324A (en) * 2016-05-11 2016-07-13 北京比特大陆科技有限公司 Data processing device and server

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163184A (en) * 2011-03-22 2011-08-24 中兴通讯股份有限公司 Master-slave transmission system and method based on special multi-chip serial interconnection interface
CN103793344B (en) * 2012-10-31 2017-06-27 华为技术有限公司 The method and system of processing data collision
JP6548459B2 (en) * 2015-05-29 2019-07-24 キヤノン株式会社 Information processing device
CN204883694U (en) * 2015-06-30 2015-12-16 深圳市茁迩科技发展有限公司 Cascade system
CN105528988B (en) * 2016-02-15 2018-09-11 京东方科技集团股份有限公司 A kind of gate driving circuit, touch-control display panel and display device
CN206178765U (en) * 2016-11-07 2017-05-17 北京得瑞领新科技有限公司 SSD unit and SSD equipment
CN107357205B (en) * 2017-07-18 2019-08-16 深圳市博昊电子有限公司 A kind of controlled device and chain type control system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114628957A (en) * 2022-03-31 2022-06-14 苏州浪潮智能科技有限公司 Fool-proof device applied to SOCKET and SOCKET
CN114628957B (en) * 2022-03-31 2023-07-14 苏州浪潮智能科技有限公司 Fool-proof device applied to SOCKET and SOCKET

Also Published As

Publication number Publication date
CN109947555A (en) 2019-06-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18890007

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18890007

Country of ref document: EP

Kind code of ref document: A1