CN111027690B - Combined processing device, chip and method for performing deterministic reasoning - Google Patents

Combined processing device, chip and method for performing deterministic reasoning

Info

Publication number
CN111027690B
Authority
CN
China
Prior art keywords
instruction
module
vector
neural network
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911176119.3A
Other languages
Chinese (zh)
Other versions
CN111027690A (en)
Inventor
陈子祺
田甲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201911176119.3A priority Critical patent/CN111027690B/en
Publication of CN111027690A publication Critical patent/CN111027690A/en
Application granted granted Critical
Publication of CN111027690B publication Critical patent/CN111027690B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)

Abstract

The application relates to a combined processing device for executing deterministic reasoning, comprising an instruction fetch module, an instruction pre-decode module, an instruction execution module, a memory access module and a register write-back module. The instruction fetch module reads an instruction from the instruction memory and updates the program counter to point to the next instruction; the instruction pre-decode module decodes compressed instructions into native instructions, and the instruction decode module accesses the register file and determines branch control; the instruction execution module obtains a result by performing vector or scalar calculation for the instruction, accesses the data memory for Load/Store instructions, and calculates branches and jumps and checks them against the expected results; in the memory access module, the memory access of the pipeline is completed; the register write-back module writes the results of the execution stage to the register file. The application also relates to an arithmetic device, a computing chip and applications of a deterministic neural network algorithm comprising the combined processing device.

Description

Combined processing device, chip and method for performing deterministic reasoning
Technical Field
The application relates to a combined processing device, a chip and a method for executing deterministic reasoning, and belongs to the technical field of computers.
Background
An artificial neural network (neural network for short) is a mathematical or computational model in the fields of machine learning and cognitive science that imitates the structure and function of a biological neural network and is used to estimate or approximate functions. A neural network is made up of a large number of interconnected nodes (or neurons). Each connection between two nodes carries a weighted value, called a weight, for the signal passing through that connection; this corresponds to the memory of the artificial neural network. The output of the network differs according to its connection pattern, weight values and activation function. The network itself is usually an approximation of some algorithm or function in nature, and may also express a logical policy.
Research on artificial neural networks has been ongoing for decades. Neural networks have successfully solved many practical problems that are difficult for modern computers in fields such as pattern recognition, intelligent robotics, automatic control, predictive estimation, biology, medicine and economics, have shown good intelligent characteristics, and have driven the continuous development of information technology and artificial intelligence.
While neural networks have achieved extensive success in many areas, most neural network algorithms are designed without considering data security or computational verifiability. First, existing neural network algorithms do not guarantee the reproducibility and consistency of operations: results may differ across architectures, or even within the same computing environment. This uncertainty has multiple causes, including rounding errors of floating-point operations and contention in parallel computation. Second, existing algorithms provide no security protection for training data and reasoning results. These limitations greatly restrict the application of neural network algorithms in fields with high security requirements such as finance, trusted computing, blockchain and smart contracts. Realizing deterministic neural network computation and ensuring the security, reproducibility and credibility of the data computed by a neural network model has therefore become an urgent demand both inside and outside industry.
Disclosure of Invention
The invention aims to provide an arithmetic device and method for a computing chip capable of executing neural network calculations in a determined order, which eliminates factors such as rounding errors and parallel contention during neural network calculation, avoids inconsistent neural network results, and addresses the verifiability problem of deep neural network computation.
The combined processing device for executing deterministic reasoning comprises an instruction fetch module, an instruction pre-decode module, an instruction execution module, a memory access module and a register write-back module. The instruction fetch module reads an instruction from the instruction memory and updates the program counter to point to the next instruction; the instruction pre-decode module decodes compressed instructions into native instructions, and the instruction decode module accesses the register file and determines branch control; the instruction execution module obtains a result by performing vector or scalar calculation for the instruction, accesses the data memory for Load/Store instructions, and calculates branches and jumps and checks them against the expected results; in the memory access module, the memory access of the pipeline is completed; the register write-back module writes the results of the execution stage to the register file.
An arithmetic device of a calculation chip of a deterministic neural network algorithm according to the present application, comprising:
an integer vector operator for carrying out integer vector operations;
an execution pipeline for general purpose operations and reading, decoding and execution of instructions;
a storage unit for providing the execution pipeline access, and storing input values, output values, and temporary values for the integer vector operator to execute each instruction;
the execution pipeline is a combined processing apparatus as described above.
Preferably, the integer vector operator is provided with a temporary value storage unit for storing temporary values calculated according to instructions and for performing read and write operations on the storage unit. The integer vector operator and the storage unit are configured as random access memories, and the input values and temporary values in the integer vector operator and the storage unit are discarded after each operation instruction completes. The integer vector operator performs integer vector addition, integer vector multiply-add, and integer vector nonlinear function value calculation.
Preferably, the integer vector operator accesses the storage unit through an address index and reads and writes input values, output values and temporary values; it is invoked in the instruction execution stage of the execution pipeline, obtains the address indexes corresponding to the input, output and temporary values in the storage unit, and completes the read and write operations on the storage unit at those indexes through the address indexes allocated by the instruction execution pipeline. More preferably, an integer vector operation instruction reads the address index of each input vector in the memory from the register file and provides it, together with the address index of the output vector's placeholder, to the integer vector operator to complete the calculation.
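For illustration only, the following minimal Python sketch shows the address-indexed access pattern described above; the names (StorageUnit, vec_add) and the discard-on-completion step are simplified assumptions about the behaviour, not the claimed hardware interface.

```python
# Minimal sketch (assumed names): address-indexed storage shared by the
# execution pipeline and the integer vector operator.
class StorageUnit:
    def __init__(self, size):
        self.slots = [None] * size            # one vector per address index

    def read(self, idx):
        return self.slots[idx]

    def write(self, idx, vec):
        self.slots[idx] = vec

def vec_add(storage, in_idx_a, in_idx_b, out_idx):
    """Integer vector add: operands are fetched by address index and the
    result is written to the output placeholder's index."""
    a, b = storage.read(in_idx_a), storage.read(in_idx_b)
    storage.write(out_idx, [x + y for x, y in zip(a, b)])
    # input/temporary values are not kept once the instruction completes
    storage.write(in_idx_a, None)
    storage.write(in_idx_b, None)

storage = StorageUnit(8)
storage.write(0, [1, 2, 3])
storage.write(1, [10, 20, 30])
vec_add(storage, 0, 1, 2)                      # storage.read(2) -> [11, 22, 33]
```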
A deterministic deep neural network computing chip according to the present application, comprising: the system comprises an execution pipeline, an integer vector arithmetic unit, a data interface, an instruction interface, a branch prediction module, a data cache module and a trusted computing coprocessor;
the execution pipeline interacts with the instruction cache unit, the register file, and the control and status registers to complete instruction execution, and the branch prediction module, the data cache module and the trusted computing coprocessor serve as additional modules of the execution pipeline that provide branch prediction, secure-computing-related functions and instruction-cache-related functions; the execution pipeline is a combined processing apparatus as described above.
Preferably, the trusted computing coprocessor receives external instructions and communicates via a data interface and a communication bus, and the security algorithm is accelerated during the instruction execution phase by a security engine comprising a symmetric encryption algorithm, an asymmetric encryption algorithm, and a random number generator. Preferably, the trusted computing coprocessor comprises an encryption and decryption coprocessor, and the encryption and decryption coprocessor comprises an encryption operation unit, a decryption operation unit, a random number generator, an encryption and decryption module and a security protection module.
The application also relates to a method for calculating by using the deterministic deep neural network calculation chip, which comprises the following steps:
step 1: the neural network compiler converts the neural network computational graph into a vector operation expression list through a deterministic topological ordering algorithm; obtains, through a memory planning algorithm, the storage indexes of the placeholders corresponding to all vectors in the vector operation expression list; and stores all input vectors into the storage indexes of their corresponding placeholders for the neural network reasoning operation;
step 2: the instruction interface reads the instructions of the vector operation expression list from the instruction cache;
step 3: the controller decodes the instruction into a microinstruction for each functional component;
step 4: the operation unit obtains an input vector or scalar from the register file, determines according to the operation code and operand type whether the operation is to be completed by the integer vector operator, and writes the result back to the register file;
step 5: the integer vector operation instruction reads the address index of the input vector in the memory from the register file and provides it, together with the address index of the placeholder of the output vector, to the integer vector operator to finish the calculation;
step 6: executing memory access and register write-back according to the operation code;
and iteratively executing the steps until an operation result of the neural network calculation graph is obtained.
In step 3, the controller accesses the register file, calculates immediate values, sets branches, checks for illegal operation codes and operation code combinations, and splits vector instructions of low-frequency operators and vector instructions whose operands fall outside the vector operator's domain into a plurality of general scalar instructions through vector expansion before decoding them.
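As an illustration of the vector expansion mentioned in step 3, the sketch below (Python, with an assumed micro-instruction format) splits one vector instruction into per-element scalar instructions in a fixed lane order; it is a simplified model, not the actual decoder logic.

```python
# Minimal sketch (assumed instruction format): a vector instruction whose
# operator is low-frequency, or whose operands fall outside the vector
# operator's domain, is expanded element by element into scalar instructions.
def expand_vector_instruction(opcode, in_idx_a, in_idx_b, out_idx, length):
    """Split one vector instruction into `length` general scalar
    instructions, preserving the element order (and hence determinism)."""
    scalar_ops = []
    for lane in range(length):
        scalar_ops.append({
            "opcode": opcode,             # e.g. a scalar "div"
            "src_a": (in_idx_a, lane),    # (address index, element offset)
            "src_b": (in_idx_b, lane),
            "dst":   (out_idx, lane),
        })
    return scalar_ops

# A hypothetical vector division of length 4 becomes four scalar divisions,
# decoded and executed one by one through the pipeline.
micro_ops = expand_vector_instruction("div", 3, 4, 5, length=4)
```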
The application also relates to an electronic device comprising a deterministic deep neural network computing chip as described above.
The application also relates to a board card comprising a memory device, a communication device and a control device, and a deterministic deep neural network computing chip as described above; the computing chip is respectively connected with the storage device, the control device and the communication device.
Preferably, the board card further comprises a main board bus for connecting the memory device, the communication device and the control device, wherein the main board bus comprises a data bus, an address bus and a control bus;
the main control of the main board bus is the deterministic neural network computing chip, the computing chip sends control signals to all components through a control bus, the components needing to be accessed are designated through address signals of an address bus, and data information is transmitted through a data bus.
The deterministic deep neural network data processor, chip and electronic equipment provided by the application have the following beneficial technical effects:
(1) Through the scheduling of the execution pipeline, the instructions of the neural network computation graph are executed strictly in the topological order of the graph, eliminating the uncertainty caused by contention in parallel computation.
(2) The uncertainty caused by rounding errors of conventional floating-point vector operations is eliminated through the equivalent transformation into integer vector operators and nonlinear vector operations (a minimal quantization sketch follows this list).
(3) After the operation module generates the output value, the input and temporary values stored in the storage unit are discarded, which frees storage for subsequent calculations and improves the utilization of the storage unit.
(4) The computing device and the computing method of the application provide deterministic neural network computation.
(5) The trusted computing coprocessor ensures data security from the bottom layer, meets the requirements of high performance, high integration and miniaturization, and provides security protection for data transmission.
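A minimal sketch of the idea behind effect (2), under the assumption of a simple fixed-point scale: once operands are mapped to integers, the multiply-accumulate itself is exact integer arithmetic and therefore bit-reproducible. This is for illustration only and is not the patent's concrete equivalent transformation.

```python
# Minimal sketch, not the patent's concrete transformation: floating-point
# weights/activations are mapped to integers with a fixed scale, so that the
# multiply-accumulate is exact integer arithmetic and reproducible everywhere.
SCALE = 2 ** 8                                   # assumed fixed-point scale

def quantize(values, scale=SCALE):
    return [int(round(v * scale)) for v in values]

def int_dot(q_a, q_b):
    return sum(x * y for x, y in zip(q_a, q_b))  # exact integer accumulation

a = quantize([0.50, -1.25, 0.75])
b = quantize([1.00,  2.00, -0.50])
acc = int_dot(a, b)                              # identical on every platform
approx = acc / (SCALE * SCALE)                   # rescale only when needed
```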
Drawings
FIG. 1 illustrates one embodiment of a deterministic deep neural network computing chip of the present invention.
FIG. 2 illustrates one embodiment of a trusted computing co-processor of the present invention.
Fig. 3 depicts a computational flow of a neural network according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
An arithmetic device of a calculation chip of a deterministic neural network algorithm according to the present invention includes:
an integer vector operator for performing integer vector addition, subtraction, multiply-add and nonlinear evaluation operations;
an execution pipeline for general operations and for instruction reading, decoding and execution;
a storage unit for providing the execution pipeline with access and for storing the input values, output values and temporary values of each instruction executed by the integer vector operator;
the integer vector operator is provided with a temporary value storage unit for storing temporary values calculated according to instructions and for performing read and write operations on the storage unit.
Preferably, the integer vector operator and the storage unit are configured as random access memories; the integer vector operator accesses the storage unit through an address index, and reads and writes input values, output values and temporary values; the input and temporary values in the integer vector operator and the storage unit are discarded after each operation instruction completes.
Preferably, the operator is invoked in the instruction execution stage of the execution pipeline; the address indexes corresponding to the input, output and temporary values in the storage unit are obtained, and the read and write operations on the storage unit at the corresponding address indexes are completed through the address indexes allocated by the instruction execution pipeline.
FIG. 1 illustrates one embodiment of a deterministic deep neural network computing chip 100 with privacy protection in accordance with the present invention. As shown, the deterministic deep neural network computing chip 100 includes an execution pipeline 101, an integer vector operator 102, a data interface 103, an instruction interface 104, a branch prediction module 105, a data cache module 106 and a trusted computing coprocessor 200. The deterministic deep neural network computing chip 100 implements a Harvard architecture for accessing both instruction and data memory. The execution pipeline 101 is a combined processing device that performs deterministic artificial intelligence reasoning. The Harvard architecture is a memory architecture that separates program instruction storage from data storage. It is a parallel architecture whose main characteristic is that programs and data are stored in different memory spaces, i.e. the program memory and the data memory are two independent memories, each addressed and accessed independently.
The deterministic deep neural network computing chip 100 of the present application has an optimized, folded 6-stage pipeline that optimizes the overlap between execution and memory access, thereby reducing stalls and improving efficiency. Specifically, the deterministic deep neural network computing chip 100 has an optimized folded 6-stage execution pipeline 101, which includes an instruction fetch module 110, an instruction pre-decode module 111, an instruction decode module 112, an instruction execution module 113, a memory access module 114 and a register write-back module 115. By overlapping execution stages, the execution pipeline 101 is able to execute one instruction per clock cycle.
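The following toy Python model illustrates how the six stages listed above overlap so that, once the pipeline is full, one instruction retires per cycle; the timing model is a simplification for illustration, not the chip's actual pipeline control.

```python
# Toy model of the folded 6-stage pipeline (stage names from the description;
# the timing model is illustrative only and ignores stalls and flushes).
STAGES = ["fetch", "pre_decode", "decode", "execute", "memory", "write_back"]

def run(instructions):
    pipeline = [None] * len(STAGES)              # one slot per stage
    cycle, retired = 0, []
    while instructions or any(slot is not None for slot in pipeline):
        if pipeline[-1] is not None:             # instruction in the last
            retired.append(pipeline[-1])         # stage retires this cycle
        next_instr = instructions.pop(0) if instructions else None
        pipeline = [next_instr] + pipeline[:-1]  # everything moves one stage
        cycle += 1
    return cycle, retired

cycles, done = run(["addi", "vadd", "load", "beq"])
# once the pipeline is full, one instruction completes per clock cycle
```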
The execution pipeline 101 interacts with the instruction cache unit 120, the register file 121, and the control and status registers 122 to complete instruction execution, and the branch prediction module 105, the data cache module 106 and the trusted computing coprocessor 200 provide branch prediction, secure-computing-related functions and instruction-cache-related functions as additional modules of the execution pipeline 101.
The instruction cache unit 120 speeds up instruction fetching by caching recently fetched instructions. The instruction cache can fetch one packet per cycle on any 16-bit boundary, but cannot fetch across block boundaries. On a cache miss, the complete block is loaded from the instruction memory. The instruction cache can be configured according to user needs: the cache size, block length, associativity and replacement algorithm are configurable. The register file 121 may consist of 32 register units (X0-X31); register X0 is always zero. The register file has two read ports and one write port. The state of the deterministic deep neural network computing chip 100 is maintained by the control and status registers 122 (CSRs). The control and status registers 122 determine the set of functions, set interrupts and interrupt masks, and determine privilege levels. The CSRs map to an internal 12-bit address space and can be accessed using special commands.
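A minimal sketch of the register file behaviour described above (32 registers, X0 hardwired to zero, two read ports and one write port); the class and method names are illustrative assumptions, not the hardware interface.

```python
# Minimal sketch of the register file: 32 registers X0-X31, X0 always zero.
class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, idx_a, idx_b):                # two read ports
        return self.regs[idx_a], self.regs[idx_b]

    def write(self, idx, value):                 # one write port
        if idx != 0:                             # writes to X0 are ignored
            self.regs[idx] = value

rf = RegisterFile()
rf.write(0, 123)                                 # silently discarded
rf.write(5, 42)
assert rf.read(0, 5) == (0, 42)
```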
The deterministic deep neural network computing chip 100 of the present invention can execute one instruction per clock cycle, but due to the pipelined architecture each instruction takes several clock cycles to complete. When a branch instruction is decoded, its condition and outcome are unknown, and waiting for the branch outcome before fetching new instructions would cause excessive processor stalls and hurt performance. The present application therefore provides a branch prediction module 105, so that the processor can predict the outcome of a branch without waiting and continue fetching instructions from the predicted address. When a branch is mispredicted, the processor must flush its pipeline and resume fetching from the calculated branch address. The processor state is not affected, because the pipeline is flushed before any erroneously fetched instruction is actually executed. However, branch prediction may already have forced the instruction cache to load new instructions; the instruction cache state is not restored, meaning that the predicted instructions remain in the instruction cache.
The data cache module 106 is configured to accelerate data memory accesses by buffering recently accessed memory locations. The data cache can handle byte, half-word and word accesses as long as they are located on their natural boundaries. Accessing a memory location on an unnatural boundary (e.g., a word access at address 0x003) results in a data load trap. During a cache miss, the complete block is written back to memory if necessary, and the newly requested block is then loaded into the cache.
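The natural-boundary rule and the resulting data load trap can be illustrated with a small alignment check; this is illustrative only, as the trap mechanism itself is implemented in hardware.

```python
# Illustrative alignment check for the data cache: byte/half-word/word
# accesses must fall on their natural boundaries, otherwise a load trap is
# raised (e.g. the word access at address 0x003 mentioned above).
ACCESS_SIZE = {"byte": 1, "half_word": 2, "word": 4}

def check_aligned(addr, kind):
    size = ACCESS_SIZE[kind]
    if addr % size != 0:
        raise RuntimeError(f"data load trap: {kind} access at {hex(addr)}")

check_aligned(0x004, "word")       # fine, natural boundary
# check_aligned(0x003, "word")     # would raise the data load trap
```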
The trusted computing coprocessor 200 receives external instructions and communicates via the data interface 202 and the communication bus 203, and security algorithms are accelerated by the security engine 204 during the instruction execution phase. The security engine comprises common symmetric encryption algorithms, asymmetric encryption algorithms and a random number generator, specifically: a hardware random number generator 210, an AES/3DES algorithm operator 211, a HASH and HMAC algorithm operator 212, and a public/private key algorithm operator 213. The HASH and HMAC algorithm operator supports accelerated calculation of common hash function algorithms such as SHA1 and SHA256; the public/private key algorithm operator supports accelerated calculation of common public/private key algorithms such as RSA, Diffie-Hellman and ECC. Pairing the trusted computing coprocessor, based on trusted cryptography technology at the edge, with the neural network computing processor enables efficient, low-cost trusted edge computing and functionally supports digital signature and digital authentication technologies.
Preferably, the trusted computing coprocessor comprises an encryption and decryption coprocessor supporting cryptographic algorithms such as ECC, AES and SHA, and specifically comprises an encryption operation unit, a decryption operation unit, a data interface and a communication bus; the encryption and decryption coprocessor further comprises a random number generator, an encryption and decryption module and a security protection module. The encryption operation unit and the decryption operation unit encrypt and decrypt the target data received by the cryptographic chip according to the random numbers generated by the random number generator. The security protection module acquires real-time environment data of the cryptographic chip and performs security control over the cryptographic chip.
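For reference, the following uses Python's standard library to show, in software, two of the primitives the security engine is said to accelerate (SHA-256 hashing and HMAC); this is purely illustrative and is not the coprocessor's interface.

```python
# Software illustration (Python standard library) of primitives the security
# engine accelerates in hardware; this is not the coprocessor's API.
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)                    # random key material
message = b"inference result to be attested"

digest = hashlib.sha256(message).hexdigest()     # SHA-256 hash of the message
tag = hmac.new(key, message, hashlib.sha256).hexdigest()  # HMAC-SHA256 tag
```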
The instruction fetch module 110 reads an instruction from the instruction memory and updates the program counter to point to the next instruction. The instruction pre-decode module 111 decodes 16-bit packed instructions into native 32-bit instructions, and the instruction decode module 112 accesses the register file and determines branch control. The instruction execution module 113 performs vector or scalar calculations for ALU, MUL and DIV instructions, accesses the data memory for Load/Store instructions, and calculates branches and jumps and checks them against their expected results. In the memory access module 114, the memory access of the pipeline is completed; including this stage ensures high performance of the pipeline. The register write-back module 115 writes the results of the execution stage to the register file.
Wherein the instruction fetch unit in the instruction fetch module 110 loads a new package from the program memory. A package is a code field containing one or more instructions. The address of the package to be loaded is maintained by the program counter (PC), which is 32 or 64 bits wide. The program counter is updated as long as the instruction pipeline is not stalled. If the pipeline is flushed, the program counter restarts from the given address.
Wherein the pre-decode unit in the instruction pre-decode module 111 converts 16-bit compressed instructions into basic 32-bit instructions and then handles program-counter-modifying instructions such as "jump and link" and "branch". This avoids waiting for the execution stage to trigger an update and reduces the need for pipeline flushes. The target address of a branch is predicted from data provided by an optional branch prediction unit, or is statically determined from an offset.
Wherein the instruction decode unit in the instruction decode module 112 ensures that operands of the execution unit are available. It accesses the register file, calculates the immediate value, sets the branch, and checks for illegal opcodes and opcode combinations.
Wherein the instruction execution module 113 performs the required operations on the data provided by the instruction decode module 112. The execution stage has a plurality of execution units, each with a unique function. The integer vector operator 102 performs integer vector addition, integer vector multiply-add and integer vector nonlinear function value calculation. The ALU performs logical and arithmetic operations. The multiplier unit calculates signed/unsigned multiplications. The divider unit calculates signed/unsigned divisions and remainders. The load-store unit accesses the data memory. The branch unit calculates jump and branch addresses and validates predicted branches. Each operation can only be issued once per clock cycle. Most operations complete in one clock cycle, except for division instructions and integer vector operation instructions, which always require multiple clock cycles. The multiplier supports a configurable latency to improve performance.
In addition, the instruction execution module 113 may also perform the required operations on the data provided by the integer vector operator 102. An integer vector operation instruction reads the address index of each input vector in the memory from the register file and provides it, together with the address index of the output vector's placeholder, to the integer vector operator to complete the corresponding vector calculation.
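A minimal sketch of an integer vector multiply-add as performed by the integer vector operator: each lane is computed in a fixed order with exact integer arithmetic, so the result is deterministic. This is illustrative Python, not the hardware datapath.

```python
# Minimal sketch of integer vector multiply-add: out[i] = a[i]*b[i] + c[i],
# computed lane by lane in a fixed order with exact integer arithmetic.
def vec_mul_add(a, b, c):
    assert len(a) == len(b) == len(c)
    return [x * y + z for x, y, z in zip(a, b, c)]

print(vec_mul_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # [11, 18, 27]
```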
In the memory access module 114, the memory access stage waits for memory read accesses to complete. When the memory is accessed, the address, data and control signals are calculated during the execution stage. The memory latches these signals and then performs the actual access, which means that read data is not available until one clock cycle later. In the register write-back module 115, the write-back stage writes the results from the execution units and from memory read operations to the register file.
Preferably, the integer vector operator 102 provides an adder 130, a multiply-add operator 131 and a function value operator 132. The integer vector operator takes as input one-dimensional vectors of length at most V_LEN and outputs a one-dimensional vector holding the corresponding result, whose length also does not exceed V_LEN. In this embodiment V_LEN is 2^16, i.e. the input vector length of the integer vector operator does not exceed 2^16. The integer vector operator obtains a vector from its address index in the memory and writes the result back into the output placeholder vector at the corresponding address index. The integer vector operator performs integer numerical operations for the three operators addition, subtraction and multiply-add through different components; evaluates common nonlinear operators such as tanh, sigmoid and exp by table lookup, with a domain that does not exceed the range of 8-bit unsigned integers; computes transformation operators such as activation (ReLU), max pooling, transpose and matrix extremum (max, min) through a built-in hardware sorting network; and, for other low-frequency operators and vector calculations outside the operators' domain, computes each element of the vector one by one through the pipeline by vector expansion.
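The table-lookup evaluation of a nonlinear operator over an 8-bit unsigned domain can be sketched as follows; the input/output scale factors here are assumptions chosen for illustration, not values specified by the patent.

```python
# Minimal sketch of table-lookup evaluation of a nonlinear operator: the
# domain is restricted to 8-bit unsigned integers (0..255), so the whole
# function fits in a 256-entry table built once; the scales are assumptions.
import math

IN_SCALE, OUT_SCALE = 32.0, 255.0               # assumed fixed-point scales

SIGMOID_LUT = [
    int(round(OUT_SCALE / (1.0 + math.exp(-(i - 128) / IN_SCALE))))
    for i in range(256)
]                                                # precomputed, deterministic

def sigmoid_u8(x_u8):
    """Evaluate sigmoid on an 8-bit unsigned input via table lookup."""
    return SIGMOID_LUT[x_u8 & 0xFF]

print(sigmoid_u8(128), sigmoid_u8(255))          # same result on any platform
```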
The neural network computation graph is a directed graph: each node represents a layer of neurons in the neural network, and each edge represents a connection relationship between layers. Such graphs can generally be divided into directed acyclic graphs and directed cyclic graphs; the neural network computational graph of the present invention is specifically a directed acyclic graph. To perform neural network reasoning on the deterministic neural network computing chip 100 according to the present invention, a neural network computation graph in the agreed execution format and the input data corresponding to each placeholder in the graph should be provided. The neural network computation graph is ordered by a deterministic topological sort to determine the computation order of all vector operators, and all vector operation instructions are executed one by one in that topological order.
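The patent does not fix a particular topological ordering algorithm; the sketch below uses Kahn's algorithm with a sorted tie-break as one illustrative way to obtain a reproducible execution order for the vector operators.

```python
# Illustrative deterministic topological ordering of a computation graph
# (Kahn's algorithm with a sorted tie-break makes the visit order
# reproducible; the patent does not specify the algorithm).
from collections import defaultdict

def deterministic_topo_sort(edges, nodes):
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = sorted(n for n in nodes if indeg[n] == 0)    # fixed tie-break
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in sorted(succ[n]):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
        ready.sort()                                     # keep order canonical
    return order

# conv -> relu -> pool, plus a parallel bias edge into relu
print(deterministic_topo_sort([("conv", "relu"), ("bias", "relu"),
                               ("relu", "pool")],
                              ["conv", "bias", "relu", "pool"]))
```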
Fig. 3 depicts a computational flow of a neural network according to an embodiment of the present disclosure.
Step 1: the neural network compiler converts the neural network computational graph 300 into a vector operation expression list 301, for use by the computing chip of the present invention, through a deterministic topological ordering algorithm; obtains, through a memory planning algorithm, the storage indexes of the placeholders corresponding to all vectors in the vector operation expression list; and stores all input vectors into the storage indexes 302 of their corresponding placeholders for the neural network reasoning operation.
Step 2: the instruction interface 104 reads the instructions of the vector operation expression list 301 from the instruction cache, such as general vector instructions suitable for performing neural network operations (vector add instructions, vector multiply-add instructions, vector activation function instructions and the like) or general scalar instructions suitable for integer calculations.
Step 3: the controller decodes the instructions into micro-instructions for each functional component. It accesses the register file, calculates immediate values, sets branches, and checks for illegal opcodes and opcode combinations. Vector instructions for low-frequency operators and vector instructions whose operands fall outside the vector operator's domain are split into a plurality of general scalar instructions through vector expansion and then decoded.
Step 4: the arithmetic unit obtains the input vector/scalar from the register file and determines, based on the opcode and operand type, whether the operation is completed by the integer vector operator and writes the result back to the register file. The integer vector operation instruction reads the corresponding address index of the input vector in the memory from the register file, and provides the corresponding address index of the placeholder of the output vector to the integer vector operator to complete the related vector calculation. Integer scalar operation instructions, floating point scalar operation instructions, and vector operation instructions that have been split are executed by an operator built in an operation unit.
Step 5: memory access and register write back are performed according to the opcode.
The above five steps are executed iteratively until the operation result of the neural network computation graph is obtained.
The application also relates to a computing board card carrying the computing chip; the computing board card is further provided with a storage unit module using random access memory as its medium, a SATA bus module for connecting to external storage, an Ethernet bus module for external communication, and a main board bus connecting these modules.
The computing board card carrying the computing chip comprises a main board bus including a data bus, an address bus and a control bus. The master of the main board bus is the computing chip of the deterministic neural network algorithm: the computing chip sends control signals to each component through the control bus, specifies the components to be accessed through the address signals of the address bus, and transmits data information through the data bus.
The application also relates to an electronic device comprising a deterministic deep neural network computing chip as described above.
The application also relates to a board card comprising a memory device, a communication device and a control device, and a deterministic deep neural network computing chip as described above; the computing chip is respectively connected with the storage device, the control device and the communication device. The storage device is used for storing data, the communication device is used for realizing data transmission between the chip and external equipment, and the control device is used for monitoring the state of the chip.
Preferably, the board card further comprises a main board bus for connecting the memory device, the communication device and the control device, wherein the main board bus comprises a data bus, an address bus and a control bus; the main control of the main board bus is the deterministic neural network computing chip, the computing chip sends control signals to all components through a control bus, the components needing to be accessed are designated through address signals of an address bus, and data information is transmitted through a data bus.
Although the embodiments disclosed in the present application are described above, the description is merely provided to facilitate understanding of the present application and is not intended to limit it. Any person skilled in the art to which this application pertains may make modifications and variations in the form and details of implementation without departing from the spirit and scope of the disclosure; however, the scope of patent protection of this application shall be subject to the scope of the appended claims.

Claims (15)

1. A combined processing device for executing deterministic reasoning, characterized by comprising an instruction fetch module, an instruction pre-decode module, an instruction execution module, a memory access module and a register write-back module; the instruction fetch module reads an instruction from the instruction memory and updates the program counter to point to the next instruction; the instruction pre-decode module decodes compressed instructions into native instructions, an instruction decode unit in the instruction decode module ensures that the operands of the execution units are available, and the instruction decode module accesses the register file and determines branch control; the instruction execution module performs integer vector or scalar calculation for the instruction to obtain an integer result, accesses the data memory for Load/Store instructions, and calculates branches and jumps and checks them against the expected results; in the memory access module, the memory access of the pipeline is completed; the register write-back module writes the results of the execution stage to the register file.
2. An arithmetic device of a calculation chip of a deterministic neural network algorithm, comprising:
an integer vector operator for carrying out integer vector operations;
an execution pipeline for general purpose operations and reading, decoding and execution of instructions;
a storage unit for providing the execution pipeline access, and storing input values, output values, and temporary values for the integer vector operator to execute each instruction;
the execution pipeline is a combined processing apparatus according to claim 1.
3. The arithmetic device according to claim 2, wherein the integer vector operator is provided therein with a temporary value storage unit for storing temporary values calculated in accordance with instructions and performing read and write operations to the storage unit.
4. The arithmetic device according to claim 2, wherein the integer vector operator and the storage unit are configured as random access memories, and the input values and temporary values in the integer vector operator and the storage unit are discarded after each operation instruction is completed.
5. The arithmetic device according to claim 2 or 3 or 4, wherein the integer vector operator performs integer vector addition, integer vector multiply-add, and integer vector nonlinear function value calculation.
6. The arithmetic device according to claim 2, 3 or 4, wherein the integer vector operator accesses the storage unit through an address index and reads and writes input values, output values and temporary values; it is invoked and executed in the instruction execution stage of the execution pipeline, obtains the address indexes corresponding to the input, output and temporary values in the storage unit, and completes the read and write operations on the storage unit at the corresponding address indexes through the address indexes allocated by the instruction execution pipeline.
7. The computing device of claim 6, wherein the integer vector operation instruction reads a corresponding address index of the input vector from the register file and provides the address index of the placeholder of the output vector to the integer vector operator to perform the computation.
8. A deterministic deep neural network computing chip, comprising: the system comprises an execution pipeline, an integer vector arithmetic unit, a data interface, an instruction interface, a branch prediction module, a data cache module and a trusted computing coprocessor;
the execution pipeline interacts with the instruction cache unit, the register file, and the control and status registers to complete instruction execution, and the branch prediction module, the data cache module and the trusted computing coprocessor serve as additional modules of the execution pipeline that provide branch prediction, secure-computing-related functions and instruction-cache-related functions;
the execution pipeline is a combined processing apparatus according to claim 1.
9. The deterministic deep neural network computing chip according to claim 8, wherein the trusted computing coprocessor receives external instructions and communicates via a data interface and a communication bus, and the security algorithm is accelerated during instruction execution by a security engine comprising a symmetric encryption algorithm, an asymmetric encryption algorithm, and a random number generator.
10. The deterministic deep neural network computing chip according to claim 8 or 9, wherein the trusted computing co-processor comprises an encryption and decryption co-processor comprising an encryption operation unit, a decryption operation unit, a random number generator, an encryption and decryption module and a security protection module.
11. A method of computing using the deterministic deep neural network computing chip of any one of claims 8-10, characterized by: the method comprises the following steps:
step 1: the neural network compiler converts the neural network computational graph into a vector operation expression list through a deterministic topological ordering algorithm; obtains, through a memory planning algorithm, the storage indexes of the placeholders corresponding to all vectors in the vector operation expression list; and stores all input vectors into the storage indexes of their corresponding placeholders for the neural network reasoning operation;
step 2: the instruction interface reads the instructions of the vector operation expression list from the instruction cache;
step 3: the controller decodes the instruction into a microinstruction for each functional component;
step 4: the operation unit obtains an input vector or scalar from the register file, determines according to the operation code and operand type whether the operation is to be completed by the integer vector operator, and writes the result back to the register file;
step 5: the integer vector operation instruction reads the address index of the input vector in the memory from the register file and provides it, together with the address index of the placeholder of the output vector, to the integer vector operator to finish the calculation;
step 6: executing memory access and register write-back according to the operation code;
and iteratively executing the steps until an operation result of the neural network calculation graph is obtained.
12. The method according to claim 11, characterized in that: in step 3, the controller accesses the register file, calculates immediate values, sets branches, checks for illegal operation codes and operation code combinations, and splits vector instructions of low-frequency operators and vector instructions whose operands fall outside the vector operator's domain into a plurality of general scalar instructions through vector expansion before decoding them.
13. An electronic device, characterized in that it comprises a deterministic deep neural network computing chip according to any one of claims 8-10.
14. A board card, comprising a memory device, a communication device, a control device, and the deterministic deep neural network computing chip of any one of claims 8-10;
the computing chip is respectively connected with the storage device, the control device and the communication device.
15. The board card of claim 14, further comprising a motherboard bus for connecting the memory device, the communication device, and the control device, the motherboard bus including a data bus, an address bus, and a control bus;
the main control of the main board bus is the deterministic deep neural network computing chip, the computing chip sends control signals to all components through a control bus, the components needing to be accessed are specified through address signals of an address bus, and data information is transmitted through a data bus.
CN201911176119.3A 2019-11-26 2019-11-26 Combined processing device, chip and method for performing deterministic reasoning Active CN111027690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911176119.3A CN111027690B (en) 2019-11-26 2019-11-26 Combined processing device, chip and method for performing deterministic reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911176119.3A CN111027690B (en) 2019-11-26 2019-11-26 Combined processing device, chip and method for performing deterministic reasoning

Publications (2)

Publication Number Publication Date
CN111027690A CN111027690A (en) 2020-04-17
CN111027690B true CN111027690B (en) 2023-08-04

Family

ID=70202282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911176119.3A Active CN111027690B (en) 2019-11-26 2019-11-26 Combined processing device, chip and method for performing deterministic reasoning

Country Status (1)

Country Link
CN (1) CN111027690B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688623A (en) * 2020-07-01 2024-03-12 陈子祺 Trusted computing chip based on blockchain
CN112633505B (en) * 2020-12-24 2022-05-27 苏州浪潮智能科技有限公司 RISC-V based artificial intelligence reasoning method and system
CN113010213B (en) * 2021-04-15 2022-12-09 清华大学 Simplified instruction set storage and calculation integrated neural network coprocessor based on resistance change memristor
CN115421788B (en) * 2022-08-31 2024-05-03 苏州发芯微电子有限公司 Register file system, method and automobile control processor using register file
CN115686635B (en) * 2023-01-03 2023-04-18 杭州米芯微电子有限公司 MCU structure without clock circuit and corresponding electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529668A (en) * 2015-11-17 2017-03-22 中国科学院计算技术研究所 Operation device and method of accelerating chip which accelerates depth neural network algorithm
CN107329936A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing neural network computing and matrix/vector computing
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN109523020A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060259737A1 (en) * 2005-05-10 2006-11-16 Telairity Semiconductor, Inc. Vector processor with special purpose registers and high speed memory access

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529668A (en) * 2015-11-17 2017-03-22 中国科学院计算技术研究所 Operation device and method of accelerating chip which accelerates depth neural network algorithm
CN107329936A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing neural network computing and matrix/vector computing
CN109523020A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method
CN110084361A (en) * 2017-10-30 2019-08-02 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium

Also Published As

Publication number Publication date
CN111027690A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111027690B (en) Combined processing device, chip and method for performing deterministic reasoning
CN109661647B (en) Data processing apparatus and method
EP3391195B1 (en) Instructions and logic for lane-based strided store operations
US10031765B2 (en) Instruction and logic for programmable fabric hierarchy and cache
BR102020019657A2 (en) apparatus, methods and systems for instructions of a matrix operations accelerator
US20200210516A1 (en) Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions
US20170286122A1 (en) Instruction, Circuits, and Logic for Graph Analytics Acceleration
US10157059B2 (en) Instruction and logic for early underflow detection and rounder bypass
WO2017112176A1 (en) Instructions and logic for load-indices-and-prefetch-gathers operations
US20170177363A1 (en) Instructions and Logic for Load-Indices-and-Gather Operations
WO2017112177A1 (en) Instructions and logic for lane-based strided scatter operations
WO2017112193A1 (en) Instruction and logic for reoccurring adjacent gathers
US10061746B2 (en) Instruction and logic for a vector format for processing computations
US20170177353A1 (en) Instructions and Logic for Get-Multiple-Vector-Elements Operations
JP2019511056A (en) Complex multiplication instruction
WO2017112175A1 (en) Instructions and logic for load-indices-and-scatter operations
WO2017172172A1 (en) Instruction, circuits, and logic for piecewise linear approximation
TW201729077A (en) Instructions and logic for SET-multiple-vector-elements operations
TW201732556A (en) Hardware content-associative data structure for acceleration of set operations
WO2017105716A1 (en) Instructions and logic for even and odd vector get operations
US10467006B2 (en) Permutating vector data scattered in a temporary destination into elements of a destination register based on a permutation factor
JP2021507348A (en) Addition instruction with vector carry
US9588765B2 (en) Instruction and logic for multiplier selectors for merging math functions
Stepchenkov et al. Recurrent data-flow architecture: features and realization problems
Glossner et al. HSA-enabled DSPs and accelerators

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant