CN109947605A - Method for diagnosing faults - Google Patents

Publication number
CN109947605A
Authority
CN
China
Prior art keywords
data
chip
node
node chip
working condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711402304.0A
Other languages
Chinese (zh)
Inventor
孙国臣
杨存永
詹克团
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitmain Technologies Inc
Beijing Bitmain Technology Co Ltd
Original Assignee
Beijing Bitmain Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bitmain Technology Co Ltd
Priority to CN201711402304.0A
Publication of CN109947605A
Legal status: Pending

Landscapes

  • Communication Control (AREA)

Abstract

The invention discloses a fault diagnosis method. The method includes: sending a working-state query command to the node chips of a data processing device; each node chip of the data processing device forwarding the working-state query command in turn; determining whether the chip address of each node chip matches the chip address specified in the working-state query command; if the chip address of a node chip matches the chip address specified in the working-state query command, returning register data; and determining the working state of the node chip according to the register data it returns. Embodiments of the invention enable rapid fault diagnosis of a group of series-connected node chips.

Description

Method for diagnosing faults
Technical field
The present invention relates to the technical field of data processing, and more particularly to a fault diagnosis method.
Background art
At present, with the application and development of machine learning, and deep learning in particular, across many fields, ever higher demands are placed on the data processing capability of computing devices. Because its graphics processing and parallel computing capabilities exceed those of a conventional CPU, the GPU has been widely used for data computation tasks in many fields and has become a general-purpose deep learning computing platform.
However, the computing capability of a single GPU architecture is still limited and cannot meet the intensive computing demands of deep learning, hash operations, and the like. To this end, Chinese invention patent application No. CN201610312586.4 proposes a scheme for expanding the computing capability of a data processing device, as shown in Fig. 1. That scheme proposes a data processing device formed by connecting multiple node chips in series. The device receives a data processing task via the external interface of the first node chip in the downlink communication direction, processes the task with the serially connected node chips at each stage, and returns the result through the external interface of the first node chip. The number of node chips can be extended according to the computing demand of the task, and only one node chip needs to be communicatively connected to the external device, so no additional communication interfaces of the external device are occupied. The scheme can therefore provide stronger data processing capability that is easy to extend.
Although the above prior art connects node chips in series so that each node chip handles part of the computation and data processing is accelerated, data transmissions between the node chips are prone to collisions. Moreover, when the data processing device receives a data processing task from the external device, the task must be distributed among the node chips, and how to distribute it while reducing signaling interaction also needs to be considered. In addition, when the series-connected node chips process the same data processing task, one node chip may crash, preventing the whole node chip group from working normally; how to quickly diagnose node chip faults is likewise a problem to be solved.
Summary of the invention
To solve the above problems, the present invention proposes a fault diagnosis method.
According to one aspect of the invention, a fault diagnosis method is proposed. The method is applied to a data processing device having a plurality of node chips connected in series, and includes the following steps:
Sending a working-state query command to the node chips of the data processing device;
Each node chip of the data processing device forwarding the working-state query command in turn;
Determining whether the chip address of each node chip matches the chip address specified in the working-state query command;
If the chip address of a node chip matches the chip address specified in the working-state query command, that node chip returning register data;
Determining the working state of the node chip according to the register data it returns.
Optionally, determining the working state of the node chip according to the register data returned by the node chip includes:
If no register data is received from the node chip whose address matches the chip address specified in the working-state query command, determining that the node chip has failed.
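The address-matched query described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the names `NodeChip`, `query_status`, and `is_faulty` are hypothetical, the register value is arbitrary, and the model assumes that a crashed chip neither answers nor forwards the command, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeChip:
    address: int
    register: int = 0xA5   # hypothetical status register value
    alive: bool = True     # False models a crashed chip


def query_status(chain: List[NodeChip], target: int) -> Optional[int]:
    """Forward a query along the chain; return the matching chip's register data."""
    for chip in chain:
        if not chip.alive:
            # Assumption: a crashed chip neither answers nor forwards the command.
            return None
        if chip.address == target:
            return chip.register
        # Address mismatch: the chip forwards the command to the next stage.
    return None


def is_faulty(chain: List[NodeChip], target: int) -> bool:
    """No register data received for the queried address means a fault is reported."""
    return query_status(chain, target) is None
```

Under this model a chip located after a crashed chip would also appear silent, so querying the addresses in chain order is one possible way to find the first failed stage.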
According to another aspect of the invention, a fault diagnosis method is proposed. The method is applied to a data processing device having a plurality of node chips connected in series, and includes the following steps:
Sending a working-state query command to the node chips of the data processing device;
Each node chip of the data processing device forwarding the working-state query command in turn;
Determining whether the working-state query command specifies that the working state of all node chips is to be queried;
If the working-state query command specifies that the working state of all node chips is to be queried, each node chip returning register data in turn;
Determining the working state of the node chips according to the register data they return.
Optionally, determining the working state of the node chips according to the register data returned by the node chips includes:
When the working-state query command specifies that the working state of all node chips is to be queried, identifying the failed node chip according to the number of register data items received from the node chips.
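One way to read the count-based diagnosis above is sketched below. It assumes, beyond what the text states, that a crashed chip stops the query from travelling further, so only the chips in front of it respond; `collect_responses` and `locate_fault` are hypothetical helper names.

```python
from typing import List, Optional


def collect_responses(alive: List[bool]) -> int:
    """Count the register data items returned for a query-all command."""
    count = 0
    for ok in alive:
        if not ok:
            break          # assumption: the query is not forwarded past a crashed chip
        count += 1
    return count


def locate_fault(responses: int, chain_length: int) -> Optional[int]:
    """Return the zero-based position of the first silent chip, or None if all answered."""
    if responses >= chain_length:
        return None
    return responses
```

For a chain of four chips in which the third has crashed, two responses arrive and the sketch reports position 2 as the failed stage.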
Optionally, the data processing device includes a plurality of node chips connected in series. The data output unit of the first-stage node chip is connected to the data input unit of an external control device, for returning the computation results of the data processing device to the external control device. The data input unit of an upper-stage node chip is connected to the data output unit of the lower-stage node chip, for receiving the data obtained by the lower-stage node chip's computation. One or more data input units of the first-stage node chip are connected to one or more data output units of the external control device, to receive data input or command input from the external control device. One or more data output units of an upper-stage node chip are connected to one or more data input units of the lower-stage node chip, for sending data input or command input to the lower-stage node chip.
Optionally, the node chip includes a control unit and a plurality of computing operators. The computing operators are divided into two or more groups, each group includes two or more computing operators connected in series, and the first-stage computing operator in each group is connected to the control unit.
Optionally, the computing operator includes an arithmetic unit and a storage unit, wherein:
The arithmetic unit is connected to the storage unit of the upper-stage computing operator, for reading the data stored in that storage unit and performing computation on it;
The arithmetic unit is connected to the storage unit of its own computing operator, for storing the computed data in that storage unit for the lower-stage computing operator to use.
Optionally, the data processing device further includes a signal conversion unit that connects two node chips and performs signal voltage adaptation.
Optionally, the data processing device further includes one or more clock crystals, and the clock signal output interface of a clock crystal is connected to the clock signal input interface of one node chip in the data processing device.
Optionally, the node chip is provided with a busy signal input pin and a busy signal output pin, which are used to control data sending by the respective node chip in the uplink communication direction.
Optionally, when the busy signal output pin is at the low/high level, it indicates that data returned by the next-stage node chip may be forwarded; when the busy signal output pin is at the high/low level, it indicates that the current-stage node chip or an upper-stage node chip is about to send or is sending data.
Optionally, when the busy signal input pin of a node chip is at the high/low level, the busy signal output pin of that node chip is also at the high/low level.
Optionally, when the current-stage node chip has data waiting to be sent and detects that its busy signal input pin is at the high/low level, it waits until the busy signal input pin switches to the low/high level before sending the data; when it detects that the busy signal input pin is at the low/high level, it sends the data immediately.
Optionally, when the current-stage node chip has data waiting to be sent, it drives its busy signal output pin to the high/low level, and drives the busy signal output pin to the low/high level after the data has been sent.
Optionally, if the current-stage node chip detects that its busy signal input pin is at the high/low level while it is sending data, it continues sending until all data in the buffer queue has been sent.
Optionally, after the busy signal output pin of the current-stage node chip has been driven to the high/low level, the node chip waits for a predetermined protection interval before sending data.
Optionally, the protection interval is set separately depending on whether the node chips communicate in synchronous or asynchronous mode.
Compared with the prior art, some embodiments of the invention control data sending between the series-connected node chips by configuring busy signal input and output pins, effectively avoiding data sending collisions between node chips; they allocate computing tasks to the series-connected node chips with little command interaction between the control unit and the node chips, making full use of the computing capability of the node chips; and they achieve rapid fault diagnosis of the series-connected node chip group.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of a data processing device in the prior art;
Fig. 2 is a structural schematic diagram of a data processing device according to an embodiment of the invention;
Fig. 3 is a structural block diagram of a node chip according to an embodiment of the disclosure;
Fig. 4 is a structural block diagram of a computing operator according to an embodiment of the disclosure;
Fig. 5 is a flowchart of a data sending method according to an embodiment of the invention;
Fig. 6 is a flowchart of a data sending method according to another embodiment of the invention;
Fig. 7 is a flowchart of a data sending method according to yet another embodiment of the invention;
Fig. 8 is a flowchart of a task allocation method according to an embodiment of the invention;
Fig. 9 is a flowchart of a task allocation method according to another embodiment of the invention;
Fig. 10 is a flowchart of a task allocation method according to yet another embodiment of the invention;
Fig. 11 is a flowchart of a fault diagnosis method according to an embodiment of the invention;
Fig. 12 is a flowchart of a fault diagnosis method according to another embodiment of the invention;
Fig. 13 is a structural schematic diagram of a computing device according to an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Fig. 2 is a structural schematic diagram of a data processing device 10 according to an embodiment of the invention. As shown in Fig. 2, the data processing device 10 includes a plurality of node chips 20 connected in series, wherein:
The first-stage node chip in the downlink communication direction receives command signals from an external control device through an external interface, passes them to one or more node chips for processing, and returns computed data to the external control device through the external interface;
Each node chip 20 is provided with a busy signal input pin BI and a busy signal output pin BO. The busy signal output pin BO of a node chip 20 in the downlink communication direction is coupled to the busy signal input pin BI of the next-stage node chip. The busy signal input and output pins are used to control data sending by the respective node chip 20 in the uplink communication direction.
In some embodiments, the node chip 20 may be implemented as an ASIC (application-specific integrated circuit), GPU, DSP, or FPGA chip.
In some embodiments, when the busy signal output pin BO is at the low/high level, it indicates that data returned by the next-stage node chip may be forwarded; when the busy signal output pin BO is at the high/low level, it indicates that the current-stage node chip or an upper-stage node chip is about to send or is sending data.
In some embodiments, when the busy signal input pin BI of a node chip 20 is at the high/low level, the busy signal output pin BO of that node chip is correspondingly also at the high/low level.
In some embodiments, when the buffer queue (FIFO) of the current-stage node chip 20 holds data waiting to be sent and the node chip detects that its own busy signal input pin BI is at the high/low level, it must wait until the busy signal input pin BI switches to the low/high level before it can send the data; when it detects that its own busy signal input pin BI is at the low/high level, it can send the data immediately.
In some embodiments, if the current-stage node chip 20 detects that its own busy signal input pin BI is at the high/low level while it is sending data, the sending is unaffected; that is, it continues sending until all data in the buffer queue (FIFO) has been sent.
In some embodiments, when the buffer queue (FIFO) of the current-stage node chip 20 holds data waiting to be sent, the node chip drives its own busy signal output pin BO to the high/low level, and drives it to the low/high level after the data has been sent. In embodiments of the invention, when a node chip receives a reset signal, its busy signal output pin BO goes to the low/high level.
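The way the busy indication travels along the chain can be illustrated with a short sketch. It assumes an active-high busy level (the text allows either polarity), assumes the first-stage chip's BI is tied to the idle level, and uses the hypothetical function name `propagate_busy`.

```python
from typing import List


def propagate_busy(pending: List[bool]) -> List[bool]:
    """Return the BI level seen by each stage (True = busy).

    pending[i] is True when stage i has data waiting in its buffer queue.
    """
    busy_in: List[bool] = []
    level = False  # assumption: the first-stage chip's BI is held at the idle level
    for has_data in pending:
        busy_in.append(level)
        # BO is busy if BI is busy or the chip itself has data to send.
        level = level or has_data
    return busy_in
```

With pending data only in the second stage, `propagate_busy([False, True, False, False])` marks the third and fourth stages as seeing a busy BI, so they hold back any new transmission.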
In some embodiments, after the busy signal output pin BO of the current-stage node chip 20 has been driven to the high/low level, the node chip may wait for a predetermined protection interval GAP before sending data.
The protection interval GAP is set to ensure that, when the current-stage node chip needs to send data, the next-stage node chip is not sending data at the same time. The next-stage node chip may be sending data in two situations: it is sending its own data, or it is forwarding data sent by a node chip further back in the chain.
When the current-stage node chip needs to send data and drives its busy signal output pin BO to the high/low level, it still has to wait for one protection interval GAP even if the next-stage node chip is not forwarding any data at that moment. This avoids the case in which the next-stage node chip has already begun sending data just as the busy signal output pin BO is set to the high/low level. In a specific implementation, the protection interval GAP should be at least the transmission time of 8 bits; for example, it may be set to the transmission time of 16 bits.
If the next-stage node chip is sending data, the current-stage node chip likewise waits for one protection interval GAP to ensure that the next-stage node chip's transmission has finished, and only then starts sending its own data. Because the busy signal output pin BO of the current-stage node chip was already driven to the high/low level beforehand, the next-stage node chip will not start any further transmission afterwards.
There is a further situation in practice. When the first-stage node chip of the data processing device needs to send data, it drives its busy signal output pin BO to the high/low level, and N stages of delay have elapsed by the time the last-stage node chip on the chain detects that its busy signal input pin BI is at the high/low level. If the last-stage node chip sends data in the uplink communication direction just before it detects the high/low level on its busy signal input pin BI, the data signal it sends needs another N stages of delay to reach the first-stage node chip. Therefore, when setting the protection interval GAP, both of these N-stage delays must fall within the protection interval GAP.
In some embodiments, the protection interval GAP also needs to be set according to the communication mode used between the node chips. Taking 256 node chips connected in series as an example:
1) When the node chips use asynchronous serial communication (UART), taking into account the routing delay inside the node chips and the PCB delay, waiting for the transmission time of 16 UART bits is sufficient as the protection interval in this asynchronous mode.
2) When the node chips use synchronous serial communication with 256 node chips cascaded, taking into account the delay from the busy signal output pin BO to the busy signal input pin BI, and given that forwarding data through each node chip stage takes one clock cycle, the total delay is 256 clock cycles. A wait of 512 clock cycles can therefore be set in an actual circuit.
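The two sizing rules above reduce to simple arithmetic, sketched here. The function names and the `cycles_per_stage` parameter are illustrative assumptions; the 16-bit UART wait, the 8-bit minimum, and the doubling of the N-stage delay come from the description.

```python
def gap_uart_seconds(baud_rate: int, bits: int = 16) -> float:
    """Protection interval for asynchronous (UART) mode, as a time in seconds."""
    if bits < 8:
        raise ValueError("the protection interval should cover at least 8 bit times")
    return bits / baud_rate


def gap_sync_cycles(num_chips: int, cycles_per_stage: int = 1) -> int:
    """Protection interval for synchronous mode, in clock cycles.

    The busy signal needs N stage delays to reach the last chip, and data already
    in flight needs another N stage delays to return, hence the factor of two.
    """
    return 2 * num_chips * cycles_per_stage
```

For the 256-chip example, `gap_sync_cycles(256)` gives the 512 clock cycles mentioned above; the baud rate passed to `gap_uart_seconds` is whatever the link actually uses.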
In some embodiments, the protection interval is set according to the transmission delay of signals or commands and/or the computing speed of the chip.
Fig. 3 is a structural block diagram of a node chip according to an embodiment of the disclosure. As shown in Fig. 3, in this embodiment the node chip includes a control unit 201, two or more groups of computing operators 202, and one or more input/output interfaces 203, wherein:
The control unit 201 is connected to the input/output interface 203, for exchanging data with the outside;
Each group of computing operators includes two or more computing operators connected in series.
A node chip usually needs to carry multiple computing operators. To save wiring space, reduce wiring complexity, and simplify control by the control unit, the computing operators may be divided into two or more groups according to the usable area of the node chip, the working characteristics, performance, or function of the computing operators, or other factors, with the computing operators within each group connected in series.
The above is merely illustrative. In practice, those skilled in the art may group the computing operators as the application requires, and the disclosure places no specific limit on the grouping method.
In an embodiment of the disclosure, the first-stage computing operator in each group is connected to the control unit.
Because the computing operators in each group are connected in series, it is sufficient for one computing operator in each group to be connected to the control unit. In an embodiment of the disclosure, the first-stage computing operator in each group may be connected to the control unit. The first-stage computing operator is usually the one closest to the control unit, which further saves wiring space and reduces wiring complexity.
The above is merely illustrative. In practice, those skilled in the art may select the computing operator connected to the control unit as the application requires, and the disclosure places no specific limit on it.
In an embodiment of the disclosure, there are two input/output interfaces 203, arranged at the two ends of the node chip. Both input/output interfaces are connected to the control unit, so that the control unit exchanges data with the outside through the input/output interfaces.
The above is merely illustrative. In practice, those skilled in the art may choose the position of the input/output interfaces as the application requires, and the disclosure places no specific limit on it.
Fig. 4 is a structural block diagram of a computing operator 202 according to an embodiment of the disclosure. As shown in Fig. 4, in an embodiment of the disclosure the computing operator 202 includes one or more arithmetic units 2021, one or more storage units 2022, and a clock input interface 2023, wherein:
The arithmetic unit 2021 is connected to the storage unit 2022 of the upper-stage computing operator, for reading the data stored in that storage unit 2022 and performing computation on it;
The arithmetic unit 2021 is connected to the storage unit 2022 of its own computing operator, for storing the computed data in the storage unit 2022 for the lower-stage computing operator to use;
The clock input interface 2023 is connected to the clock output interface of the control unit.
In this embodiment, data is passed stage by stage through the series-connected computing operators, so each computing operator can obtain the data it needs, and the series structure saves wiring space and reduces wiring complexity.
Further, the computing operator is formed of a microelectronic circuit, and the microelectronic circuit is composed of CMOS transistors and NMOS transistors.
In practical applications, those skilled in the art may select computing operators and storage units that suit the purpose of the computation, and the disclosure places no specific limit on the selection or model of the computing operators and storage units.
In some embodiments, the data output unit of the first-stage node chip is connected to the data input unit of the external control device, for returning the computation results of the data processing device to the external control device. The data input unit of an upper-stage node chip is connected to the data output unit of the lower-stage node chip, for receiving the data obtained by the lower-stage node chip's computation. One or more data input units of the first-stage node chip are connected to one or more data output units of the external control device, to receive data input or command input from the external control device. One or more data output units of an upper-stage node chip are connected to one or more data input units of the lower-stage node chip, for sending data input or command input to the lower-stage node chip.
In some embodiments, the data processing device further includes a signal conversion unit that connects two node chips and performs signal voltage adaptation.
In some embodiments, the data processing device further includes one or more clock crystals, and the clock signal output interface of a clock crystal is connected to the clock signal input interface of one node chip in the data processing device.
Fig. 5 is a flowchart of an embodiment of the data sending method based on the data processing device 10 of the present invention. As shown in Fig. 5, the data sending method based on the data processing device includes the following steps:
Step S1: the current-stage node chip determines whether its buffer queue holds data waiting to be sent;
Step S2: if so, it detects whether the busy signal input pin is at the high/low level;
Step S3: if the busy signal input pin is at the high/low level, it waits until the busy signal input pin changes from the high/low level to the low/high level and then starts sending the data in the buffer queue;
Step S4: if the busy signal input pin is at the low/high level, it sends the data in the buffer queue immediately.
In some embodiments, the data sending method further includes:
When the current-stage node chip detects that the busy signal input pin is at the high/low level, it also drives the busy signal output pin to the high/low level.
Fig. 6 is a flowchart of another embodiment of the data sending method based on the data processing device 10 of the present invention. As shown in Fig. 6, the data sending method based on the data processing device includes the following steps:
Step S11: the current-stage node chip determines whether its buffer queue holds data waiting to be sent;
Step S12: if so, it drives the busy signal output pin to the high/low level;
Step S13: it detects whether the busy signal input pin is at the high/low level;
Step S14: if the busy signal input pin is at the low/high level, it sends the data in the buffer queue immediately;
Step S15: it determines whether the data in the buffer queue has all been sent;
Step S16: if the data in the buffer queue has all been sent, it drives the busy signal output pin to the low/high level;
Step S17: if the busy signal input pin is detected at the high/low level before the data in the buffer queue has all been sent, it continues sending until all data in the buffer queue has been sent.
Fig. 7 is a flowchart of yet another embodiment of the data sending method based on the data processing device 10 of the present invention. As shown in Fig. 7, building on the embodiment of Fig. 6, the data sending method based on the data processing device further includes the following step after step S12:
Step S18: after the busy signal output pin of the current-stage node chip has been driven to the high/low level, the node chip waits for a predetermined protection interval before sending data, to ensure that the next-stage node chip does not send data at the same time.
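Steps S11 to S18 can be read as a small state machine, sketched below. This is an illustrative model rather than the patented circuit: it assumes an active-high busy level, advances one step per call to `tick`, sends one queue item per tick, and uses hypothetical names (`Sender`, `gap_ticks`, and the state labels).

```python
from collections import deque


class Sender:
    """Uplink sender of one node chip, stepped once per clock tick."""

    def __init__(self, gap_ticks: int):
        self.fifo = deque()      # buffer queue of data waiting to be sent
        self.sent = []           # data already put on the uplink
        self.state = "idle"
        self.gap_ticks = gap_ticks
        self.gap_left = 0
        self.busy_out = False    # level of the BO pin (True = busy)

    def tick(self, busy_in: bool) -> None:
        if self.state == "idle":
            if self.fifo:                         # S11: data is waiting
                self.state = "gap"                # S12: BO goes busy (set below)
                self.gap_left = self.gap_ticks
        elif self.state == "gap":                 # S18: wait the protection interval
            self.gap_left -= 1
            if self.gap_left <= 0:
                self.state = "wait"
        elif self.state == "wait":                # S13/S14: send only when BI is idle
            if not busy_in:
                self.state = "send"
        elif self.state == "send":                # S15/S17: BI is ignored while sending
            self.sent.append(self.fifo.popleft())
            if not self.fifo:
                self.state = "idle"               # S16: queue drained
        # BO is busy while this chip is busy or while BI is busy.
        self.busy_out = busy_in or self.state != "idle"
```

Once the sender has entered the sending state, a busy level on BI no longer interrupts it, which mirrors step S17.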
In some embodiments, the protection interval time either synchronously or asynchronously communicates mould according to taking between node chip Formula and be set separately.
Some embodiments of the present invention as previously shown control series connection node by configuring busy signal input and output pin Data between chip are sent, and the setting of join protection interval time, effectively prevent the data hair between node chip Send conflict.
Fig. 8 is the flow chart of an embodiment of the method for allocating tasks based on data processing equipment 10 of the present invention.Such as Shown in Fig. 8, the method for allocating tasks of the processing unit based on the data is suitable for control unit, and the method includes following Step:
The node chip of step S21, Xiang Suoshu data processing equipment sends order, so that node chip enters deactivation shape State;
The node chip of step S22, Xiang Suoshu data processing equipment sends address distribution order, is followed successively by each node chip Distribute chip address;
Step S23 distributes calculating task according to the chip address of each node chip for each node chip.
In some embodiments, described send to the node chip of the data processing equipment is ordered, so that node core Piece enters the step of deactivated state and includes:
Order is sent respectively to each node chip of the data processing equipment, so that each node chip sequentially enters deactivation State.
In some embodiments, described send to the node chip of the data processing equipment is ordered, so that node core Piece enters the step of deactivated state and includes:
A subcommand is sent to all node chips of the data processing equipment, is gone so that each node chip enters simultaneously State living.
Fig. 9 is the flow chart of the another embodiment of the method for allocating tasks based on data processing equipment 10 of the present invention. As shown in figure 9, on the basis of embodiment shown in Fig. 8, the method for allocating tasks of the processing unit based on the data, in step It is further comprising the steps of after S23:
Step S24, node chip execute one or many Hash operations according to the calculating task for its distribution.
Specifically, Hash operation may include cryptographic Hash operation or Hash collision operation.
Figure 10 is the flow chart of the another embodiment of the method for allocating tasks based on data processing equipment 10 of the present invention. As shown in Figure 10, the method for allocating tasks is suitable for node chip, the described method comprises the following steps:
Order is distributed in step S31, the address that reception control unit is sent;
Whether step S32 judges node chip currently in deactivated state;
Step S33, when node chip is in deactivated state, order is distributed in the address that parsing control unit is sent, and will be Its chip address distributed is saved to register, and is transferred to state of activation;
Step S34, when node chip is active, which does not parse the address point of control unit transmission With order, next stage node chip is directly forwarded it to.
The task input command format used by the data processing device 10 includes an HCN field and a starting nonce offset (SNO) field. The HCN field controls the number of calculations performed by each node chip. For example, suppose the computing task requires 2^32 calculations, that is, the random number (nonce) is incremented from its initial value through 2^32 values; if 32 node chips are connected in series, each node chip needs to perform only 2^32 / 32 = 2^27 calculations. The SNO field contains a single number.
The SetAddress address assignment command format used by the data processing device 10 includes a chip address (ChipAddr) field, which specifies the chip address of a single node chip. The computing task of each node chip is determined by the values in the SNO and ChipAddr fields.
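The arithmetic in the example above can be sketched as follows. This is only an illustration: the source does not give the formula that combines SNO and ChipAddr, so the function name `nonce_slice`, the 0-based chip index, and the contiguous, offset-shifted slices are all assumptions.

```python
NONCE_SPACE = 2 ** 32  # number of nonce values to traverse, as in the text


def nonce_slice(chip_index: int, num_chips: int, sno: int = 0) -> tuple:
    """Return (start_nonce, count) for one chip.

    Assumptions (not from the source): chips are indexed 0..num_chips-1,
    slices are contiguous, and SNO shifts every slice by the same offset.
    """
    if not 0 <= chip_index < num_chips:
        raise ValueError("chip_index out of range")
    if NONCE_SPACE % num_chips != 0:
        raise ValueError("num_chips must divide the nonce space evenly")
    count = NONCE_SPACE // num_chips  # 2^32 / 32 = 2^27 for 32 chips
    start = (sno + chip_index * count) % NONCE_SPACE
    return start, count
```

With 32 chips, every call returns a count of 2^27, and the 32 slices together cover all 2^32 nonce values.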
To assign chip addresses, the control unit first issues a ChainInactive command to put a node chip into the deactivated (Inactive) state. In a specific implementation, all node chips may enter the Inactive state together, or they may be set to the Inactive state one by one. The control unit then sends a SetAddress address assignment command to that node chip to assign it an arbitrary address.
In the Inactive state, a node chip parses the SetAddress address assignment command, saves its address in a register, and then switches to the activated (Active) state. In a specific implementation, a node chip that has been assigned an address may enter the Active state on a command from the control unit, or it may enter the Active state automatically after parsing the address assignment command.
In the Active state, a node chip does not parse the SetAddress address assignment command and forwards it directly to the next-stage node chip. For N node chips connected in series, the control unit must issue N SetAddress commands in succession to assign arbitrary addresses to the node chips one by one. For example, for a mining machine with 256 cascaded node chips, the control CPU must issue 256 consecutive SetAddress address assignment commands to set the addresses of all node chips.
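The address assignment sequence described above can be modelled with a short simulation. It is a sketch under stated assumptions, not the chip firmware: the class `NodeChip` and the helper functions are hypothetical, all chips are deactivated together, a chip activates automatically after parsing SetAddress, and a chip that consumes a SetAddress command does not forward it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeChip:
    """Hypothetical model of one node chip in the serial chain."""
    active: bool = True
    address: Optional[int] = None

    def on_chain_inactive(self) -> None:
        # ChainInactive: switch to the Inactive state.
        self.active = False

    def on_set_address(self, addr: int) -> bool:
        """Return True if this chip consumed the command, False if it forwards it."""
        if self.active:
            return False          # Active chips do not parse SetAddress
        self.address = addr       # save the address in the register
        self.active = True        # assumed: activate automatically
        return True


def send_set_address(chain: List[NodeChip], addr: int) -> Optional[int]:
    """Pass one SetAddress down the chain; return the index of the chip that took it."""
    for index, chip in enumerate(chain):
        if chip.on_set_address(addr):
            return index
    return None  # every chip was Active, so nobody consumed the command


def assign_addresses(chain: List[NodeChip], addresses: List[int]) -> None:
    """Deactivate all chips, then issue one SetAddress per chip (N commands)."""
    for chip in chain:
        chip.on_chain_inactive()
    for addr in addresses:
        send_set_address(chain, addr)
```

For a chain of four chips and four addresses, the first command is taken by the first chip, the second passes through it to the second chip, and so on, which is why N chips need N commands.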
Fig. 11 is a flowchart of an embodiment of the fault diagnosis method based on the data processing device 10 of the present invention. As shown in Fig. 11, the fault diagnosis method based on the data processing device is applicable to the control unit and includes the following steps:
Step S41: send a working status query command to the node chips of the data processing device;
Step S42: each node chip of the data processing device forwards the working status query command in turn;
Step S43: determine whether the chip address of each node chip matches the chip address specified in the working status query command;
Step S44: if the chip address of a node chip matches the chip address specified in the working status query command, the node chip returns register data;
Step S45: determine the working status of the node chip according to the register data it returns.
In some embodiments, the fault diagnosis method further includes:
If no register data is received from the node chip whose address matches the chip address specified in the working status query command, determining that the node chip has failed.
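Steps S41 to S45 and the no-reply rule can be sketched as a small simulation. The names `SimChip`, `query_chip`, and `find_faulty` are hypothetical, and a missing reply is modelled as `None`; on real hardware the control unit would presumably detect it through a timeout, which the source does not specify. The sketch also assumes that a chip which fails to reply still forwards the command.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimChip:
    """Hypothetical stand-in for one node chip in the serial chain."""
    address: int
    register_value: int
    responsive: bool = True  # False models a chip that never answers


def query_chip(chain: List[SimChip], target_addr: int) -> Optional[int]:
    """Forward the query chip by chip; only the matching chip replies.

    Returns the register data, or None to model "no reply received".
    """
    for chip in chain:
        # Each chip compares its own address with the one in the command (S43).
        if chip.address == target_addr:
            return chip.register_value if chip.responsive else None
    return None  # no chip holds this address


def find_faulty(chain: List[SimChip], addresses: List[int]) -> List[int]:
    """Return the addresses whose query produced no register data."""
    return [addr for addr in addresses if query_chip(chain, addr) is None]
```

Querying each assigned address in turn and collecting the ones that stay silent gives the control unit a list of suspect chips.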
Fig. 12 is a flowchart of another embodiment of the fault diagnosis method based on the data processing device 10 of the present invention. As shown in Fig. 12, the fault diagnosis method based on the data processing device is applicable to the control unit and includes the following steps:
Step S51: send a working status query command to the node chips of the data processing device;
Step S52: each node chip of the data processing device forwards the working status query command in turn;
Step S53: determine whether the working status query command specifies that the working status of all node chips is to be queried;
Step S54: if so, each node chip returns register data in turn;
Step S55: determine the working status of the node chips according to the register data they return.
In some embodiments, the fault diagnosis method further includes:
When the working status query command specifies that the working status of all node chips is to be queried, the control unit determines which node chip has failed according to the number of register data items returned by the node chips.
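The count-based diagnosis can be illustrated as follows. The source does not explain how a reply count maps to a specific chip, so this sketch makes an explicit assumption: a failed chip breaks the serial chain, so only the chips upstream of the first failure reply. The function names and the 0-based position are hypothetical.

```python
from typing import List, Optional, Tuple


def broadcast_query(chain_health: List[bool]) -> int:
    """Simulate a query-all command and return the number of replies.

    Assumed model: a failed chip neither replies nor forwards, so chips
    behind it are unreachable.
    """
    replies = 0
    for healthy in chain_health:
        if not healthy:
            break
        replies += 1
    return replies


def locate_fault(expected_chips: int, replies: int) -> Tuple[int, Optional[int]]:
    """Return (number of missing replies, 0-based position of the first suspect)."""
    if not 0 <= replies <= expected_chips:
        raise ValueError("reply count outside the expected range")
    missing = expected_chips - replies
    suspect = replies if missing > 0 else None
    return missing, suspect
```

For example, if a four-chip chain returns only two replies, two replies are missing and the chip at position 2 (the third chip) is the first suspect under this model.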
After a node chip receives the working status query command from the control unit, it first forwards the command to the next-stage node chip. A node chip whose chip address matches the address in the command returns the value of the corresponding register (for example, the value currently being computed) to the control unit through the UART interface. The control unit can determine the number of chips in the device from the number of node chip responses it receives. The command can therefore be used for fault diagnosis of the node chip group: if a node chip fails during operation, the failure can be roughly inferred from the hash rate of the data processing device and, in addition, the faulty chip can be identified by determining which node chip has not returned a computed value for a long time.
Fig. 13 is a schematic structural diagram of a computing device 40 according to an embodiment of the present invention. As shown in Fig. 13, the computing device 40 includes the aforementioned data processing device 10 and a control unit 30, and the data processing device 10 is communicatively connected to the control unit 30 through an external interface.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (17)

1. A fault diagnosis method applied to a data processing device having a plurality of node chips connected in series, characterized in that the method comprises the following steps:
sending a working status query command to the node chips of the data processing device;
each node chip of the data processing device forwarding the working status query command in turn;
determining whether the chip address of each node chip matches the chip address specified in the working status query command;
if the chip address of a node chip matches the chip address specified in the working status query command, returning register data;
determining the working status of the node chip according to the register data returned by the node chip.
2. The fault diagnosis method according to claim 1, characterized in that determining the working status of the node chip according to the register data returned by the node chip comprises:
if no register data is received from the node chip whose address matches the chip address specified in the working status query command, determining that the node chip has failed.
3. A fault diagnosis method applied to a data processing device having a plurality of node chips connected in series, characterized in that the method comprises the following steps:
sending a working status query command to the node chips of the data processing device;
each node chip of the data processing device forwarding the working status query command in turn;
determining whether the working status query command specifies that the working status of all node chips is to be queried;
if the working status query command specifies that the working status of all node chips is to be queried, each node chip returning register data in turn;
determining the working status of the node chips according to the register data returned by the node chips.
4. The fault diagnosis method according to claim 3, characterized in that determining the working status of the node chips according to the register data returned by the node chips comprises:
when the working status query command specifies that the working status of all node chips is to be queried, determining the failed node chip according to the number of register data items returned by the node chips.
5. The fault diagnosis method according to claim 1 or 3, characterized in that the data processing device comprises a plurality of node chips connected in series; the data output unit of the first-stage node chip is connected to the data input unit of an external control device, for returning the operation result of the data processing device to the external control device; the data input unit of an upper-stage node chip is connected to the data output unit of the lower-stage node chip, for receiving the data obtained by the operation of the lower-stage node chip; one or more data input units of the first-stage node chip are connected to one or more data output units of the external control device, to receive data input or command input from the external control device; and one or more data output units of an upper-stage node chip are connected to one or more data input units of the lower-stage node chip, for sending data input or command input to the lower-stage node chip.
6. The fault diagnosis method according to claim 1 or 3, characterized in that the node chip comprises a control unit and a plurality of operation operators; the operation operators are divided into two or more groups, each group comprising two or more operation operators connected in series, and the first-stage operation operator of each group is connected to the control unit.
7. The fault diagnosis method according to claim 1 or 3, characterized in that the operation operator comprises an arithmetic unit and a storage unit, wherein:
the arithmetic unit is connected to the storage unit of the upper-stage operation operator, for reading the data stored in the storage unit of the upper-stage operation operator and performing an operation on it;
the arithmetic unit is connected to the storage unit, for storing the data obtained by the operation in the storage unit for use by the lower-stage operation operator.
8. The fault diagnosis method according to claim 1 or 3, characterized in that the data processing device further comprises a signal conversion unit that connects two node chips and performs signal voltage adaptation.
9. The fault diagnosis method according to claim 1 or 3, characterized in that the data processing device further comprises one or more clock crystals, the clock signal output interface of each clock crystal being connected to the clock signal input interface of one node chip in the data processing device.
10. The fault diagnosis method according to claim 5, characterized in that
the node chip is provided with a busy signal input pin and a busy signal output pin, the busy signal input pin and the busy signal output pin being used to control the sending of data by the respective node chip in the uplink communication direction.
11. The fault diagnosis method according to claim 10, characterized in that when the busy signal output pin is at low/high level, it indicates that data returned by the next-stage node chip can be forwarded; when the busy signal output pin is at high/low level, it indicates that the current-stage node chip or an upper-stage node chip is about to send or is sending data.
12. The fault diagnosis method according to claim 10 or 11, characterized in that when the busy signal input pin of a node chip is at high/low level, the busy signal output pin of that node chip is also at high/low level.
13. The fault diagnosis method according to claim 11, characterized in that when the current-stage node chip has data waiting to be sent and detects that the busy signal input pin is at high/low level, it waits until the busy signal input pin switches to low/high level and then sends the data; when it detects that the busy signal input pin is at low/high level, it sends the data immediately.
14. The fault diagnosis method according to claim 11, characterized in that when the current-stage node chip has data waiting to be sent, it drives the busy signal output pin to high/low level, and after the data has been sent it drives the busy signal output pin to low/high level.
15. The fault diagnosis method according to claim 11, characterized in that while the current-stage node chip is sending data, if it detects that the busy signal input pin is at high/low level, it continues sending until all data in the buffer queue has been sent.
16. The fault diagnosis method according to claim 14, characterized in that after the busy signal output pin of the current-stage node chip outputs the high/low level, the node chip waits for a predetermined guard interval before sending data.
17. The fault diagnosis method according to claim 16, characterized in that the guard interval is set separately according to whether synchronous or asynchronous communication is used between the node chips.
CN201711402304.0A 2017-12-21 2017-12-21 Method for diagnosing faults Pending CN109947605A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711402304.0A CN109947605A (en) 2017-12-21 2017-12-21 Method for diagnosing faults

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711402304.0A CN109947605A (en) 2017-12-21 2017-12-21 Method for diagnosing faults

Publications (1)

Publication Number Publication Date
CN109947605A true CN109947605A (en) 2019-06-28

Family

ID=67006296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711402304.0A Pending CN109947605A (en) 2017-12-21 2017-12-21 Method for diagnosing faults

Country Status (1)

Country Link
CN (1) CN109947605A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324070A (en) * 2020-03-04 2020-06-23 明峰医疗系统股份有限公司 Debugging method of CT serial detector module cluster based on FPGA
CN112557882A (en) * 2021-02-19 2021-03-26 深圳市明微电子股份有限公司 Chip initial address self-adaptive detection method, device, equipment and storage medium
CN112732629A (en) * 2020-12-31 2021-04-30 明峰医疗系统股份有限公司 CT detector data transmission structure and data transmission method based on source synchronous LVDS-SERDES
CN112860622A (en) * 2021-02-08 2021-05-28 山东云海国创云计算装备产业创新中心有限公司 Processing system and system on chip
CN117093523A (en) * 2023-10-20 2023-11-21 合肥为国半导体有限公司 Chip array, fault positioning method thereof and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060031593A1 (en) * 2004-08-09 2006-02-09 Sinclair Alan W Ring bus structure and its use in flash memory systems
CN102163184A (en) * 2011-03-22 2011-08-24 中兴通讯股份有限公司 Master-slave transmission system and method based on special multi-chip serial interconnection interface
CN102981992A (en) * 2012-11-28 2013-03-20 中国人民解放军国防科学技术大学 On-chip communication method and device of integrated circuit based on asynchronous structure
CN105760324A (en) * 2016-05-11 2016-07-13 北京比特大陆科技有限公司 Data processing device and server
CN107037791A (en) * 2017-03-07 2017-08-11 佛山华数机器人有限公司 A kind of producing line device visualization method for diagnosing faults

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324070A (en) * 2020-03-04 2020-06-23 明峰医疗系统股份有限公司 Debugging method of CT serial detector module cluster based on FPGA
CN112732629A (en) * 2020-12-31 2021-04-30 明峰医疗系统股份有限公司 CT detector data transmission structure and data transmission method based on source synchronous LVDS-SERDES
CN112860622A (en) * 2021-02-08 2021-05-28 山东云海国创云计算装备产业创新中心有限公司 Processing system and system on chip
CN112557882A (en) * 2021-02-19 2021-03-26 深圳市明微电子股份有限公司 Chip initial address self-adaptive detection method, device, equipment and storage medium
CN112557882B (en) * 2021-02-19 2021-05-28 深圳市明微电子股份有限公司 Chip initial address self-adaptive detection method, device, equipment and storage medium
CN117093523A (en) * 2023-10-20 2023-11-21 合肥为国半导体有限公司 Chip array, fault positioning method thereof and electronic equipment
CN117093523B (en) * 2023-10-20 2024-01-26 合肥为国半导体有限公司 Chip array, fault positioning method thereof and electronic equipment

Similar Documents

Publication Publication Date Title
CN109947605A (en) Method for diagnosing faults
JP5793690B2 (en) Interface device and memory bus system
CN101383712B (en) Routing node microstructure for on-chip network
CN102970247B (en) Effective communication time scheduling method of time-triggered network
CN103595627A (en) NoC router based on multicast dimension order routing algorithm and routing algorithm thereof
CN110995598A (en) Variable-length message data processing method and scheduling device
CN103312614B (en) A kind of multicast message processing method, line card and communication equipment
CN109947555A (en) Data processing equipment, data transmission method for uplink and calculating equipment
CN104717152A (en) Method and device for achieving interface caching dynamic allocation
CN109194430A (en) A kind of C6678 distribution type system time synchronous method and system based on SRIO
CN105786734B (en) Data transmission method, expansion device, peripheral equipment and system
CN116150051A (en) Command processing method, device and system
CN110825210B (en) Method, apparatus, device and medium for designing clock tree structure of system on chip
CN109933433B (en) GPU resource scheduling system and scheduling method thereof
CN109947556A (en) Method for allocating tasks
CN109101451A (en) Chip-in series circuit calculates equipment and communication means
CN110519145B (en) Multi-master 485 route communication method and system based on bidirectional ring network
CN103152275A (en) Router suitable for network on chip and allowable for configuring switching mechanisms
CN105893321A (en) Path diversity-based crossbar switch fine-grit fault-tolerant module in network on chip and method
WO2010012172A1 (en) Data processing method, controller and system
US20160085706A1 (en) Methods And Systems For Controlling Ordered Write Transactions To Multiple Devices Using Switch Point Networks
CN115757251A (en) Data transmission system, method, device and medium
CN112818183B (en) Data synthesis method, device, computer equipment and storage medium
US20230305976A1 (en) Data flow-based neural network multi-engine synchronous calculation system
TW202147141A (en) A computing device and computing system for digital currency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190628