CN109358993A - Method and device for processing deep neural network accelerator faults - Google Patents

Method and device for processing deep neural network accelerator faults

Info

Publication number
CN109358993A
Authority
CN
China
Prior art keywords
accelerator
data
neural network
deep neural
test
Prior art date
2018-09-26
Legal status
Pending
Application number
CN201811122913.5A
Other languages
Chinese (zh)
Inventor
孔庆凯
Current Assignee
Zhongke Material (Beijing) Technology Co Ltd
Original Assignee
Zhongke Material (Beijing) Technology Co Ltd
Priority date
Filing date
2018-09-26
Publication date
2019-02-19
Application filed by Zhongke Material (Beijing) Technology Co Ltd
Priority to CN201811122913.5A
Publication of CN109358993A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F 11/2205: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing, using arrangements specific to the hardware being tested
    • G06F 11/2263: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing, using neural networks
    • G06F 11/2273: Test methods

Abstract

Embodiments of the present invention relate to a method and device for processing deep neural network accelerator faults. The method comprises: generating multiple groups of input data for testing a deep neural network, together with output results corresponding to the input data; switching the accelerator data path of the deep neural network to a test channel, and sending the input data to the accelerator through the test channel to complete a test run; reading the operation data that a plurality of processing elements have computed and stored in their corresponding registers; matching the operation data against the corresponding output results; and, if they do not match, determining that the processing element corresponding to the operation data is a faulty unit and closing the faulty unit. This gives the DNN accelerator a self-test capability, so that manufacturing faults can be tested and handled quickly at the production stage, reducing test cost and improving the yield of usable chips.

Description

Method and device for processing deep neural network accelerator faults
Technical field
Embodiments of the present invention relate to the technical field of deep neural networks, and in particular to a method and device for processing deep neural network accelerator faults.
Background art
With the breakthroughs of deep neural networks (Deep Neural Network, DNN) in speech recognition and image recognition tasks, more and more hardware accelerator cores dedicated to accelerating deep neural networks (Neural-Network Processing Unit, NPU) have appeared. At the same time, the computational load of DNNs has grown dramatically with the application models, operation models and data volumes, so the design scale of deep neural network accelerators keeps increasing and the number of compute cores keeps growing; in both hardware scale and importance they are now on a par with CPUs and GPUs that have a comparable number of compute cores. The corresponding test means, however, have not developed accordingly, and efficient, integrated, automatic test means are lacking.
The computation of a DNN accelerator has its own particularities: 1. Based on the DNN model, the operations are layered and follow a strongly consistent template. 2. Because of this consistent template the operations are packed together: a single step (a single instruction) is a whole set of operations, so once a hardware fault occurs it cannot be determined which component is in error. 3. To accelerate a set of operations, a DNN core usually contains a plurality of processing elements (PEs), which form compute and memory arrays of a certain scale as well as flexible data paths. 4. The DNN network structure is flexible and can adapt to changing hardware. With DNN accelerators occupying ever larger die areas, manufacturing faults become unavoidable, so newly manufactured chips must be tested quickly, the fault sites determined, and the corresponding fault sites handled by measures such as frequency reduction or masking.
Common chip manufacturing test methods include scan design, built-in self-test (BIST) design and boundary scan. Scan design was introduced to make sequential circuits testable: the sequential elements of the circuit are replaced with scannable cells, which are finally chained together and shifted out. This method is general, but its overhead is large; it exploits neither the data paths of the neural network itself nor the characteristics of the accelerator units. Built-in self-test generates the test vectors with the circuit itself and uses its own logic to judge whether the obtained test results are correct; it is widely applied in memory testing, but its use on an NPU still needs to be explored. Boundary scan, as a common debugging method, generally does not need to be used inside a DNN core.
Therefore, the prior art lacks a test scheme applicable to DNNs.
Summary of the invention
Embodiments of the present invention provide a method and device for processing deep neural network accelerator faults, which give the DNN accelerator a self-test capability so that manufacturing faults can be tested and handled quickly at the production stage, reducing test cost and improving the yield of usable chips.
In a first aspect, an embodiment of the present invention provides a method for processing deep neural network accelerator faults, comprising:
generating network input data for testing the deep neural network and output results corresponding to the input data;
switching the accelerator data path of the deep neural network to a test channel, and sending the input data to the accelerator through the test channel, so as to complete a test run;
reading the operation data that a plurality of processing elements have computed and stored in their corresponding registers;
matching the operation data against the corresponding output results;
if they do not match, determining that the processing element corresponding to the operation data is a faulty unit, and closing the faulty unit.
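As an illustration, the flow above may be sketched behaviourally as follows; the interface names (route_to_test_channel, read_pe_registers, disable_pe) are assumptions made for the sketch and do not appear in the patent, which realizes the flow in hardware (see the NBIST controller of Fig. 2).

    from typing import Dict, Iterable, List, Protocol, Set, Tuple

    class DnnAccelerator(Protocol):
        """Minimal accelerator interface assumed for this sketch (not defined in the patent)."""
        def route_to_test_channel(self) -> None: ...
        def run(self, inputs: List[int]) -> None: ...
        def read_pe_registers(self) -> Dict[int, int]: ...
        def disable_pe(self, pe_id: int) -> None: ...

    def self_test(acc: DnnAccelerator,
                  vectors: Iterable[Tuple[List[int], Dict[int, int]]]) -> Set[int]:
        """Each vector is (input data, expected result per PE); returns the faulty PE ids."""
        acc.route_to_test_channel()            # switch the accelerator data path to the test channel
        faulty: Set[int] = set()
        for inputs, expected in vectors:
            acc.run(inputs)                    # test run: each PE stores its result in its register
            for pe_id, value in acc.read_pe_registers().items():
                if value != expected[pe_id]:   # operation data does not match the expected output
                    faulty.add(pe_id)
        for pe_id in faulty:
            acc.disable_pe(pe_id)              # close the faulty unit
        return faulty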
In a possible embodiment, the method further comprises:
adding a bypass circuit at the input node and the output node of each processing element.
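One way to picture such a bypass (an editorial sketch, not circuitry taken from the patent) is as a multiplexer at the PE boundary: in normal mode data passes through the PE, and when the bypass is opened for a faulty PE the data is routed from the input node straight to the output node.

    class BypassedPE:
        """A processing element wrapped by an input/output bypass (illustrative model)."""

        def __init__(self, compute):
            self.compute = compute      # the PE's normal operation, e.g. a multiply-accumulate
            self.bypass = False         # bypass closed: data flows through the PE

        def forward(self, x):
            if self.bypass:             # bypass open: the PE is excluded from the data path
                return x                # the input node is connected directly to the output node
            return self.compute(x)      # normal mode

    pe = BypassedPE(compute=lambda x: 2 * x)
    print(pe.forward(3))    # 6: normal operation
    pe.bypass = True        # set by the controller once the PE is found faulty
    print(pe.forward(3))    # 3: the faulty PE no longer affects the data path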
In a possible embodiment, switching the accelerator data path of the deep neural network to the test channel and sending the input data to the accelerator through the test channel comprises:
switching the accelerator data path of the deep neural network to the test channel, closing the bypass circuits, and sending the input data to the accelerator through the test channel, so as to complete the test run.
In a possible embodiment, reading the operation data that the plurality of processing elements have computed and stored in their corresponding registers comprises:
opening the bypass circuits, and reading to a storage unit the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in their corresponding registers.
In a possible embodiment, the method further comprises: modifying the register circuit so that all registers under test in the accelerator are connected by a dedicated serial circuit, and, after the register-read state is enabled, outputting the values of all the registers bit by bit to the storage unit through the dedicated serial circuit.
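The bit-by-bit serial readout can be modelled as a shift chain; the sketch below is an editorial illustration that assumes 16-bit registers and shows how the chained register values are shifted out over a single serial line and reassembled at the storage unit.

    REG_WIDTH = 16  # assumed register width, for illustration only

    def shift_out(registers):
        """Accelerator side: emit the bits of all chained registers, LSB first."""
        for value in registers:                 # registers under test, connected in one chain
            for bit in range(REG_WIDTH):
                yield (value >> bit) & 1        # one bit per clock on the dedicated serial circuit

    def capture(bit_stream, num_registers):
        """Storage-unit side: reassemble the register values from the serial bit stream."""
        bits = list(bit_stream)
        values = []
        for i in range(num_registers):
            word = bits[i * REG_WIDTH:(i + 1) * REG_WIDTH]
            values.append(sum(b << k for k, b in enumerate(word)))
        return values

    pe_registers = [7, 42, 65535]               # example PE result registers
    print(capture(shift_out(pe_registers), 3))  # [7, 42, 65535]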
In a second aspect, an embodiment of the present invention provides a device for processing deep neural network accelerator faults, comprising:
a generation module, configured to generate network input data for testing the deep neural network and output results corresponding to the input data;
a test module, configured to switch the accelerator data path of the deep neural network to a test channel and send the input data to the accelerator through the test channel, so as to complete a test run;
a read module, configured to read the operation data that a plurality of processing elements have computed and stored in their corresponding registers;
a matching module, configured to match the operation data against the corresponding output results;
a control module, configured to, if a mismatch is found, determine that the processing element corresponding to the operation data is a faulty unit and close the faulty unit.
In a possible embodiment, the read module is specifically configured to add a bypass circuit at the input node and the output node of each processing element.
In a possible embodiment, the test module is specifically configured to switch the accelerator data path of the deep neural network to the test channel, close the bypass circuits, and send the input data to the accelerator through the test channel, so as to complete the test run.
In a possible embodiment, the read module is specifically configured to open the bypass circuits and read to a storage unit the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in their corresponding registers.
In another possible embodiment, the read module is further configured to output the values of all the registers bit by bit to the storage unit through a dedicated serial circuit after the register-read state is enabled, wherein the register circuit is modified so that all registers under test in the accelerator are connected by the dedicated serial circuit.
In another possible embodiment, the control module is further configured to exclude the erroneous unit by means of the bypass circuit after the fault site has been determined.
With the deep neural network accelerator fault processing scheme provided by the embodiments of the present invention, multiple groups of input data for testing the deep neural network and the output results corresponding to the input data are generated; the accelerator data path of the deep neural network is switched to the test channel, and the input data is sent to the accelerator through the test channel to complete a test run; the operation data that the plurality of processing elements have computed and stored in their corresponding registers is read; the operation data is matched against the corresponding output results; and if they do not match, the processing element corresponding to the operation data is determined to be a faulty unit and the faulty unit is closed. This gives the DNN accelerator a self-test capability, so that manufacturing faults can be tested and handled quickly at the production stage, which reduces test cost and improves the yield of usable chips. The consistency of the neural network model and the characteristics of the accelerator data path are exploited to simplify the test vector generation and acquisition circuits, and the accelerator's internal RAM is used directly as the storage space for the test vectors, which effectively reduces the hardware overhead of the detection circuit. The fault site can be determined, and the bypass function of the circuit can be used directly to exclude the faulty site.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for processing deep neural network accelerator faults according to an embodiment of the present invention;
Fig. 2 is a circuit diagram for deep neural network accelerator fault processing according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for processing deep neural network accelerator faults according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the hardware structure of a processing apparatus for deep neural network accelerator faults according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, they are further explained below with reference to the accompanying drawings and specific embodiments; the embodiments do not constitute a limitation on the embodiments of the present invention.
Fig. 1 is a schematic flowchart of a method for processing deep neural network accelerator faults according to an embodiment of the present invention. As shown in Fig. 1, the method specifically includes:
S101: generating multiple groups of input data for testing the deep neural network and output results corresponding to the input data.
In this embodiment, a network model that makes the DNN accelerator cycle at least once and covers all data paths of the accelerator's processing elements can be used to generate multiple groups of mutually orthogonal network input data for testing the DNN and the output results corresponding to the input data. The input data and output results can be stored, in the form of vectors, in the test vector generator module of the circuit shown in Fig. 2.
The network model used in this embodiment may be, but is not limited to, a network model such as an ANN, CNN, RNN or LSTM.
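As a minimal illustration of step S101 (an editorial sketch; the patent does not fix a particular vector construction), one-hot inputs are mutually orthogonal and, applied to a single fully-connected reference layer, each vector exercises one column of weights, so together they cover every weight path of the layer; the expected outputs computed on this golden model are stored with the inputs as the test vectors.

    def reference_layer(weights, x):
        """Golden model of one fully-connected layer: y = W @ x (pure Python)."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    def generate_test_vectors(weights):
        """One-hot inputs are mutually orthogonal; vector j exercises weight column j."""
        n_inputs = len(weights[0])
        vectors = []
        for j in range(n_inputs):
            x = [0] * n_inputs
            x[j] = 1                                             # one-hot, orthogonal input
            vectors.append((x, reference_layer(weights, x)))     # (input data, expected output)
        return vectors

    # Example: a 3x4 layer; four vectors cover every weight path once.
    W = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
    for x, y in generate_test_vectors(W):
        print(x, "->", y)   # e.g. [1, 0, 0, 0] -> [1, 5, 9]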
S102: switching the accelerator data path of the deep neural network to the test channel, and sending the input data to the accelerator through the test channel to complete a test run.
S103: reading the operation data that the plurality of processing elements have computed and stored in their corresponding registers.
Referring to Fig. 2, in this embodiment a bypass circuit may be added at the input node and the output node of each processing element in the accelerator, and the bypass circuits assist in completing the test of the processing elements of the DNN accelerator. The bypass circuits can be controlled by the NBIST controller in the circuit shown in Fig. 2; in this embodiment, besides controlling the bypass circuits, the NBIST controller can also control the processing elements of the DNN accelerator under test.
Specifically, under the control of the NBIST controller, the accelerator data path of the deep neural network is switched to the test channel and the bypass circuits are closed; the multiple groups of input data stored in the test vector generator module are sent to the accelerator through the test channel; the plurality of processing elements (PEs) of the accelerator operate on the input data to obtain the corresponding operation data, and the operation data is stored in the register corresponding to each PE, thereby completing the test run.
The NBIST controller then opens the bypass circuits, and the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in their corresponding registers is read out to the storage unit.
S104: matching the operation data against the corresponding output results.
S105: if they do not match, determining that the processing element corresponding to the operation data is a faulty unit, and closing the faulty unit.
Referring to Fig. 2, under the control of the NBIST controller the operation data in the storage unit is sent to the data comparator for comparison. The data comparator also reads the output results from the test vector generator module and compares the operation results against the output results. When the two do not match, the processing element corresponding to the unmatched operation result is determined to be a faulty unit, the information about the faulty unit is sent to the NBIST controller, and the NBIST controller latches the corresponding bypass open/close control, so that the DNN accelerator masks off the faulty processing element.
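The comparison and masking step can be summarized as follows (editorial sketch; the bitmask encoding of the bypass control is an assumption): the data comparator flags every PE whose captured result differs from the expected output, and the controller latches the corresponding bypass-enable bits so that the flagged PEs stay excluded.

    def compare(captured, expected):
        """Data comparator: indices of PEs whose captured results mismatch the expected outputs."""
        return {pe for pe, (got, want) in enumerate(zip(captured, expected)) if got != want}

    def bypass_mask(faulty_pes):
        """Controller side: encode the faulty PEs as a bypass-enable bitmask."""
        mask = 0
        for pe in faulty_pes:
            mask |= 1 << pe          # bit pe set: the bypass of PE pe is forced open
        return mask

    captured = [10, 99, 30, 40]      # operation data read back from the storage unit
    expected = [10, 20, 30, 40]      # output results from the test vector generator
    faulty = compare(captured, expected)
    print(faulty)                               # {1}
    print(format(bypass_mask(faulty), "04b"))   # 0010: PE 1 is masked off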
In another implementation of this embodiment, the register data may also be obtained by means of a serial circuit, which specifically includes: modifying the register circuit so that all registers under test in the accelerator are connected by a dedicated serial circuit; after the register-read state is enabled, the values of all the registers are output bit by bit to the storage unit through the dedicated serial circuit.
With the method for processing deep neural network accelerator faults provided by the embodiments of the present invention, multiple groups of input data for testing the deep neural network and the output results corresponding to the input data are generated; the accelerator data path of the deep neural network is switched to the test channel, and the input data is sent to the accelerator through the test channel to complete a test run; the operation data that the plurality of processing elements have computed and stored in their corresponding registers is read; the operation data is matched against the corresponding output results; and if they do not match, the processing element corresponding to the operation data is determined to be a faulty unit and the faulty unit is closed. This gives the DNN accelerator a self-test capability, so that manufacturing faults can be tested and handled quickly at the production stage, which reduces test cost and improves the yield of usable chips. The consistency of the neural network model and the characteristics of the accelerator data path are exploited to simplify the test vector generation and acquisition circuits, and the accelerator's internal memory is used directly as the storage space for the test vectors, which effectively reduces the hardware overhead of the detection circuit. The fault site can be determined, and the bypass function of the circuit can be used directly to exclude the faulty site.
Fig. 3 is a schematic structural diagram of a device for processing deep neural network accelerator faults according to an embodiment of the present invention. As shown in Fig. 3, the device specifically includes:
a generation module 301, configured to generate network input data for testing the deep neural network and output results corresponding to the input data;
a test module 302, configured to switch the accelerator data path of the deep neural network to a test channel and send the input data to the accelerator through the test channel, so as to complete a test run;
a read module 303, configured to read the operation data that a plurality of processing elements have computed and stored in their corresponding registers;
a matching module 304, configured to match the operation data against the corresponding output results;
a control module 305, configured to, if a mismatch is found, determine that the processing element corresponding to the operation data is a faulty unit and close the faulty unit.
Optionally, the read module 303 is specifically configured to add a bypass circuit at the input node and the output node of each processing element.
Optionally, the test module 302 is specifically configured to switch the accelerator data path of the deep neural network to the test channel, close the bypass circuits, and send the input data to the accelerator through the test channel, so as to complete the test run.
Optionally, the read module 303 is specifically configured to open the bypass circuits and read to a storage unit the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in their corresponding registers.
Optionally, the read module 303 is further configured to output the values of all the registers bit by bit to the storage unit through a dedicated serial circuit after the register-read state is enabled, wherein the register circuit is modified so that all registers under test in the accelerator are connected by the dedicated serial circuit.
Optionally, the control module is further configured to exclude the erroneous unit by means of the bypass circuit after the fault site has been determined.
The device for processing deep neural network accelerator faults provided in this embodiment may be the device shown in Fig. 3; it can execute all the steps of the method for processing deep neural network accelerator faults shown in Fig. 1 and thereby achieve the technical effects of the method shown in Fig. 1. For details, reference is made to the description related to Fig. 1; for brevity, it is not repeated here.
Fig. 4 is a schematic diagram of the hardware structure of a processing apparatus for deep neural network accelerator faults according to an embodiment of the present invention. As shown in Fig. 4, the processing apparatus specifically includes: a processor 410, a memory 420 and a transceiver 430.
The processor 410 may be a central processing unit (CPU) or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL) or any combination thereof.
The memory 420 is used to store various applications, an operating system and data, and can transfer the stored data to the processor 410. The memory 420 may include volatile memory, non-volatile random access memory (NVRAM), phase-change random access memory (PRAM), magnetoresistive random access memory (MRAM) and the like, for example at least one magnetic disk storage device, an electrically erasable programmable read-only memory (EEPROM), a flash memory device such as NOR flash memory or NAND flash memory, or a semiconductor device such as a solid state disk (SSD). The memory 420 may also include a combination of the above kinds of memory.
The transceiver 430 is used to send and/or receive data; the transceiver 430 may be an antenna or the like.
The working process of each component is as follows:
The processor 410 is configured to: generate multiple groups of input data for testing the deep neural network and output results corresponding to the input data; switch the accelerator data path of the deep neural network to a test channel, and send the input data to the accelerator through the test channel to complete a test run; read the operation data that a plurality of processing elements have computed and stored in their corresponding registers; match the operation data against the corresponding output results; and, if they do not match, determine that the processing element corresponding to the operation data is a faulty unit and close the faulty unit.
The processing apparatus for deep neural network accelerator faults provided in this embodiment may be the processing apparatus shown in Fig. 4; it can execute all the steps of the method for processing deep neural network accelerator faults shown in Fig. 1 and thereby achieve the technical effects of the method shown in Fig. 1. For details, reference is made to the description related to Fig. 1; for brevity, it is not repeated here.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further describe the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (11)

1. A method for processing deep neural network accelerator faults, characterized by comprising:
generating multiple groups of input data for testing a deep neural network and output results corresponding to the input data;
switching the accelerator data path of the deep neural network to a test channel, and sending the input data to the accelerator through the test channel, so as to complete a test run;
reading operation data that a plurality of processing elements have computed and stored in corresponding registers;
matching the operation data against the corresponding output results;
if they do not match, determining that the processing element corresponding to the operation data is a faulty unit, and closing the faulty unit.
2. The method according to claim 1, characterized in that the method further comprises:
adding a bypass circuit at the input node and the output node of each processing element.
3. The method according to claim 2, characterized in that switching the accelerator data path of the deep neural network to the test channel and sending the input data to the accelerator through the test channel comprises:
switching the accelerator data path of the deep neural network to the test channel, closing the bypass circuits, and sending the input data to the accelerator through the test channel, so as to complete the test run.
4. The method according to claim 3, characterized in that reading the operation data that the plurality of processing elements have computed and stored in the corresponding registers comprises:
opening the bypass circuits, and reading to a storage unit the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in the corresponding registers.
5. The method according to claim 1, characterized in that the method further comprises: modifying the register circuit so that all registers under test in the accelerator are connected by a dedicated serial circuit, and, after the register-read state is enabled, outputting the values of all the registers bit by bit to the storage unit through the dedicated serial circuit.
6. A device for processing deep neural network accelerator faults, characterized by comprising:
a generation module, configured to generate multiple groups of input data for testing a deep neural network and output results corresponding to the input data;
a test module, configured to switch the accelerator data path of the deep neural network to a test channel and send the input data to the accelerator through the test channel, so as to complete a test run;
a read module, configured to read operation data that a plurality of processing elements have computed and stored in corresponding registers;
a matching module, configured to match the operation data against the corresponding output results;
a control module, configured to, if a mismatch is found, determine that the processing element corresponding to the operation data is a faulty unit and close the faulty unit.
7. The device according to claim 6, characterized in that the read module is specifically configured to add a bypass circuit at the input node and the output node of each processing element.
8. The device according to claim 7, characterized in that the test module is specifically configured to switch the accelerator data path of the deep neural network to the test channel, close the bypass circuits, and send the input data to the accelerator through the test channel, so as to complete the test run.
9. The device according to claim 8, characterized in that the read module is specifically configured to open the bypass circuits and read to a storage unit the operation data that the plurality of processing elements have computed from the input data during the test operation and stored in the corresponding registers.
10. The device according to claim 6, characterized in that the read module is further configured to output the values of all the registers bit by bit to the storage unit through a dedicated serial circuit after the register-read state is enabled, wherein the register circuit is modified so that all registers under test in the accelerator are connected by the dedicated serial circuit.
11. The device according to any one of claims 6 to 10, characterized in that the control module is further configured to exclude the erroneous unit by means of the bypass circuit after the fault site has been determined.
CN201811122913.5A 2018-09-26 2018-09-26 Method and device for processing deep neural network accelerator faults Pending CN109358993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811122913.5A CN109358993A (en) Method and device for processing deep neural network accelerator faults

Publications (1)

Publication Number Publication Date
CN109358993A (en) 2019-02-19

Family

ID=65347637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811122913.5A Pending CN109358993A (en) Method and device for processing deep neural network accelerator faults

Country Status (1)

Country Link
CN (1) CN109358993A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588273A (en) * 2008-05-21 2009-11-25 中国科学院计算技术研究所 Virtual test bus circuit for network-on-chip system and test method thereof
CN105892989A (en) * 2016-03-28 2016-08-24 中国科学院计算技术研究所 Neural network accelerator and operational method thereof
US20180046903A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
CN107704916A (en) * 2016-08-12 2018-02-16 北京深鉴科技有限公司 A kind of hardware accelerator and method that RNN neutral nets are realized based on FPGA
CN106528312A (en) * 2016-09-29 2017-03-22 北京广利核系统工程有限公司 FPGA-based fault repairing method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948788A (en) * 2019-03-07 2019-06-28 清华大学 Neural network accelerator based on FPGA
CN109948788B (en) * 2019-03-07 2021-01-15 清华大学 Neural network accelerator based on FPGA
CN111656370A (en) * 2019-04-18 2020-09-11 深圳市大疆创新科技有限公司 Detection method and verification platform of accelerator
CN112116083A (en) * 2019-06-20 2020-12-22 地平线(上海)人工智能技术有限公司 Neural network accelerator and detection method and device thereof
CN112116083B (en) * 2019-06-20 2024-03-08 地平线(上海)人工智能技术有限公司 Neural network accelerator and detection method and device thereof
CN110377472A (en) * 2019-07-25 2019-10-25 北京中星微电子有限公司 The method and device of positioning chip run-time error
CN110377472B (en) * 2019-07-25 2021-05-18 重庆中星微人工智能芯片技术有限公司 Method and device for positioning operation error of chip
JP2021081413A (en) * 2019-12-31 2021-05-27 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッドBeijing Baidu Netcom Science Technology Co., Ltd. Artificial intelligence chip test method, device, apparatus, and storage medium
JP7029873B2 (en) 2019-12-31 2022-03-04 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Artificial intelligence chip test methods, equipment, equipment, and storage media
US11714128B2 (en) 2019-12-31 2023-08-01 Kunlunxin Technology (Beijing) Company Limited Method and apparatus for testing artificial intelligence chip, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-02-19