WO2020134824A1 - Brain-like computing system - Google Patents

Brain-like computing system

Info

Publication number
WO2020134824A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
brain
neural network
data
coprocessor
Prior art date
Application number
PCT/CN2019/121453
Other languages
French (fr)
Chinese (zh)
Inventor
施路平
王冠睿
裴京
吴臻志
赵琦
Original Assignee
北京灵汐科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2020134824A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Definitions

  • The invention relates to the field of artificial intelligence computing, and in particular to a brain-like computing system.
  • The 2011 International Technology Roadmap for Semiconductors identified brain-inspired computing technology, modeled on the human brain, as one of the effective strategies for meeting the above challenges.
  • The human brain, with on the order of 10^11 neurons and 10^15 plastic synaptic connections in a volume of only about 2 liters, exhibits parallel computation, strong robustness, plasticity, and fault tolerance unmatched by existing computer architectures, while consuming only on the order of 10 watts.
  • A neural network is composed of a large number of neurons. Although the structure and behavior of a single neuron are relatively simple, the network as a whole can exhibit rich processing functions through learning rules. This structure differs from the traditional computer processing model: through distributed storage and parallel collaborative processing of information, only basic learning rules need to be defined to simulate the brain's adaptive learning process, which gives it an advantage on problems that are difficult to formalize.
  • There are two main ways to implement brain-like computing technology: one is to simulate parallel, distributed brain-like neural networks in software on existing computer architectures; the other is to realize them with large-scale integrated analog, digital, or digital-analog hybrid circuits and supporting software systems.
  • A computer structure capable of performing artificial intelligence tasks can be built from a CPU plus a GPU. As shown in FIG. 1, the CPU, GPU, storage unit, and external interface are all connected to the bus. However, the GPU is expensive and consumes a lot of energy, and because it is not specifically optimized for neural network tasks, its computational efficiency when processing different tasks may be low, and the efficiency gap between tasks may be very large.
  • Artificial general intelligence, also known as strong artificial intelligence, is the ultimate goal of most areas of artificial intelligence research.
  • Researchers have been striving toward this goal through continuous exploration of software and hardware design.
  • Two different technical routes have gradually formed: the artificial neural network (ANN) approach and the spiking neural network (SNN) approach.
  • Artificial neural networks are inadequate at processing sequence information, low-power event-driven response, and real-time tasks; spiking neural networks are inadequate at precise calculation and large-scale data-intensive computation. In scenarios that require both precise numerical processing and fast response, no single computing system can meet the computing requirements.
  • To this end, the present invention proposes a brain-like computing system that combines an arithmetic/logic operation and control unit with a brain-like co-processing unit. Flexible programming and configuration of the brain-like co-processing unit by the arithmetic/logic operation and control unit enables low-latency continuous execution of computing tasks and real-time response to them; at the same time, by controlling the brain-like co-processing unit to divide ANN and SNN computations efficiently between specialized processors, the system achieves higher computing efficiency when processing the different tasks of general artificial intelligence computing.
  • the technical solutions adopted by the present invention include:
  • The present invention relates to a brain-like computing system characterized by comprising an arithmetic/logic operation and control unit, a brain-like co-processing unit, a storage unit, an external interface, and a bus connecting the units and the external interface. The arithmetic/logic operation and control unit programs and configures the brain-like co-processing unit, performs arithmetic or logical operations, and controls the operation and data exchange of the other units through the bus. The brain-like co-processing unit has both artificial neural network and spiking neural network processing functions; it performs ANN and SNN calculations according to the instructions of the arithmetic/logic operation and control unit and saves the results to the storage unit. The external interface provides the interactive information between the brain-like computing system and the external environment.
  • The brain-like computing system described in the present invention is intended for general artificial intelligence computation. The heterogeneous brain-like computing structure it constructs includes an arithmetic/logic operation and control unit, based on a traditional microprocessor and suited to arithmetic and logical computing tasks, which flexibly programs and configures the brain-like co-processing unit to achieve low-latency continuous execution of computing tasks and real-time response to them. It also includes a brain-like co-processing unit dedicated to artificial intelligence computing that supports efficient ANN and SNN calculation, forming a heterogeneous, fused brain-like computing structure that can divide ANN and SNN computations efficiently and thus handle the different tasks of general artificial intelligence computing with higher computing efficiency.
  • The brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module. The brain-like coprocessor component includes at least one artificial neural network coprocessor and at least one spiking neural network coprocessor; or it includes at least one hybrid coprocessor supporting both ANN and SNN computation; or it includes at least one ANN coprocessor, at least one SNN coprocessor, and at least one hybrid coprocessor supporting both ANN and SNN computation.
  • As long as the brain-like computing system includes coprocessors with both ANN and SNN processing functions, the present invention places no limitation on whether those functions reside in the same module, so the structure is flexible. In addition, based on the computing characteristics and data-access requirements of the brain-like co-processing unit, the present invention designs an interface module that supports its continuous high-speed execution, so that the brain-like co-processing unit can quickly, efficiently, and conveniently exchange data with the arithmetic/logic operation and control unit, the storage unit, the external interface, and other brain-like co-processing units.
  • Using the arithmetic/logic operation and control unit, composed of traditional microprocessors, to control the brain-like co-processing unit through its interface module satisfies the need for high-volume interactive data transmission between the brain-like co-processing unit and the other components, achieving low-latency, continuous, high-speed execution of tasks while reducing the operating power consumption of the entire computing system.
  • The arithmetic/logic operation and control unit is a CPU, GPU, DSP, and/or single-chip microcontroller. The external interface obtains information from the external environment according to instructions of the arithmetic/logic operation and control unit; or, when the external environment sends specific data, it triggers the brain-like computing system to perform the corresponding processing procedure; or it sends the operation results of the brain-like computing system to the external environment.
  • Each coprocessor has an extensible interface. Multiple coprocessors of the same type are connected to one another through their extensible interfaces for interactive data transmission, while coprocessors of different types exchange data through the interface module. That is, the extensible interfaces of the coprocessors form a routed interconnection network, while some of the SNN coprocessors and ANN coprocessors exchange data through the interface module.
  • The artificial neural network coprocessor includes a plurality of parallel ANN computing units, which are connected to one another through an internal bus for interactive data transmission. Each ANN computing unit includes a weight storage unit, a matrix calculation unit, a vector calculation unit, and an intermediate value storage unit connected in sequence, with the intermediate value storage unit also connected back to the matrix calculation unit.
  • The weight storage unit and the intermediate value storage unit are connected to the internal bus through the data bus to exchange data with other ANN computing units and to feed data to the matrix calculation unit for calculation.
  • After receiving the data, the matrix calculation unit performs its calculation according to the control signal and sends the result to the vector calculation unit; the vector calculation unit performs the corresponding calculation in combination with the control signal and finally writes the result to the intermediate value storage unit.
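As an illustrative sketch only (not part of the patent), the weight-store, matrix-unit, vector-unit, intermediate-store dataflow described above might be modeled in Python with NumPy as follows; all class and method names are invented for illustration, and the vector unit is assumed here to apply a bias and ReLU activation:

```python
import numpy as np

class ANNComputeUnit:
    """Sketch of one ANN computing unit: a weight store feeds a matrix
    unit, whose result passes through a vector unit (elementwise ops)
    into an intermediate-value store readable by the next layer."""

    def __init__(self, weights: np.ndarray, bias: np.ndarray):
        self.weight_store = weights          # weight storage unit
        self.bias = bias
        self.intermediate_store = None       # intermediate value storage unit

    def matrix_unit(self, x: np.ndarray) -> np.ndarray:
        # Matrix calculation unit: dense matrix-vector product.
        return self.weight_store @ x

    def vector_unit(self, v: np.ndarray) -> np.ndarray:
        # Vector calculation unit: bias add plus ReLU (assumed ops).
        return np.maximum(v + self.bias, 0.0)

    def step(self, x: np.ndarray) -> np.ndarray:
        # Dataflow from the text: matrix unit -> vector unit
        # -> intermediate value store.
        self.intermediate_store = self.vector_unit(self.matrix_unit(x))
        return self.intermediate_store

unit = ANNComputeUnit(np.array([[1.0, -1.0], [0.5, 0.5]]),
                      np.array([0.0, -1.0]))
out = unit.step(np.array([2.0, 1.0]))
```

In a real coprocessor the internal bus would move weights and intermediate values between such units; here that exchange is reduced to reading `intermediate_store`.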
  • The spiking neural network coprocessor includes a plurality of SNN computing units that compute in parallel and an equal number of routing communication units. Each SNN computing unit is connected to one routing communication unit, and the routing communication units are connected to one another to form an on-chip routing network for interactive data transmission.
  • The SNN computing unit includes an axon input unit, a synaptic weight storage unit, a control unit, a dendrite calculation unit, and a neuron calculation unit. The axon input unit, synaptic weight storage unit, control unit, and neuron calculation unit are all connected to the dendrite calculation unit, and the control unit is also connected to the axon input unit.
  • The dendrite calculation unit computes from the received axon input data and the data supplied by the synaptic weight storage unit and sends the result to the neuron calculation unit for further calculation; finally, the results are sent through the routing communication unit to other SNN computing units for data interaction.
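For illustration only, the dendrite-then-neuron computation above can be sketched in Python, assuming leaky-integrate-and-fire neuron dynamics (the patent does not fix a neuron model) and invented names throughout:

```python
import numpy as np

class SNNComputeUnit:
    """Sketch of one SNN computing unit. The axon input (a binary
    spike vector) and the synaptic weight store feed the dendrite
    unit; its weighted sums drive the neuron unit, whose output
    spikes would leave via the routing communication unit."""

    def __init__(self, weights, threshold=1.0, leak=0.9):
        self.weights = np.asarray(weights)        # synaptic weight storage
        self.v = np.zeros(self.weights.shape[0])  # membrane potentials
        self.threshold, self.leak = threshold, leak

    def dendrite_unit(self, axon_spikes):
        # Weighted accumulation of the incoming spike vector.
        return self.weights @ np.asarray(axon_spikes, dtype=float)

    def neuron_unit(self, dendrite_sum):
        # Assumed LIF dynamics: leaky integration, threshold, reset.
        self.v = self.leak * self.v + dendrite_sum
        spikes = self.v >= self.threshold
        self.v[spikes] = 0.0                 # reset fired neurons
        return spikes.astype(int)            # spikes for the router

unit = SNNComputeUnit([[0.6, 0.6], [0.2, 0.1]])
out1 = unit.neuron_unit(unit.dendrite_unit([1, 1]))  # sums [1.2, 0.3]
```

Neuron 0 crosses the threshold and fires; neuron 1 retains its sub-threshold potential for the next time step.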
  • Each coprocessor of the brain-like coprocessor component switches between a computing state and a low-power idle state according to the logic of the interface module and its own running state. In this way, the corresponding coprocessor can be woken for computation whenever a new task to be processed arrives.
  • When a coprocessor completes its current computing task and the next task has not yet been allocated, it stays in the low-power idle state, realizing event-driven operation and reducing the overall energy consumption of the computing system.
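The wake-on-arrival, idle-when-empty behavior just described can be sketched as a small state machine; this is an invented illustration, not the patent's control logic:

```python
from enum import Enum

class State(Enum):
    IDLE = "low-power idle"
    COMPUTE = "computing"

class Coprocessor:
    """Sketch of event-driven state switching: the coprocessor wakes
    to COMPUTE when the interface module delivers a task, and drops
    back to low-power IDLE when no next task has been allocated."""

    def __init__(self):
        self.state = State.IDLE
        self.pending = []

    def deliver(self, task):
        # Called by the interface module; arrival of work wakes the unit.
        self.pending.append(task)
        self.state = State.COMPUTE

    def run_one(self):
        if self.pending:
            self.pending.pop(0)              # ...perform the calculation...
        if not self.pending:
            self.state = State.IDLE          # no next task: power down

cp = Coprocessor()
cp.deliver("inference task")                 # wakes the coprocessor
cp.run_one()                                 # task done, queue empty
```

After `run_one()` the unit is back in the low-power idle state until the next task arrives.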
  • the interface module includes a data temporary storage unit, an instruction temporary storage unit, a data format conversion unit, and a coprocessor interface unit;
  • The data temporary storage unit includes several sets of storage intervals; the number of storage intervals equals the number of coprocessors connected to the interface module. They temporarily store the data exchanged between each coprocessor and the storage unit, between each coprocessor and the external interface, and between the coprocessors themselves.
  • the instruction temporary storage unit has a first-in first-out storage structure, which is used to temporarily store a plurality of instructions that need to be executed from the arithmetic/logic operation and control unit.
  • the storage interval includes a first input temporary storage, a second input temporary storage, and an output temporary storage.
  • The first input temporary store and the second input temporary store alternately perform the two tasks of receiving data from the bus and sending the temporarily stored data to the coprocessor; the output temporary store delivers the data processed by the coprocessor to a storage unit, an external interface, or another coprocessor. The data temporary storage unit therefore operates in ping-pong fashion.
  • The working states of the two input temporary stores are switched according to instructions from the arithmetic/logic operation and control unit or by the judgment logic of the brain-like co-processing unit itself, so that data is sent to the brain-like co-processing unit with low latency; this also ensures fast data acquisition when the neural network coprocessor needs to process data from several different time steps.
  • the data temporary storage unit in the interface module alternately uses two input temporary storages to form a ping-pong operation for data transmission of the brain-like coprocessor, which greatly improves the data processing efficiency of the brain-like coprocessor.
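As a minimal sketch of the ping-pong operation (invented names, not the patent's implementation), the two input temporary stores can be modeled as a double buffer whose banks swap roles on a switch signal:

```python
class PingPongBuffer:
    """Sketch of the two input temporary stores: while one bank
    receives data from the bus, the other drains to the coprocessor;
    switch() swaps their roles (ping-pong operation)."""

    def __init__(self):
        self.banks = [[], []]
        self.write_bank = 0                  # bank currently fed by the bus

    def write_from_bus(self, word):
        self.banks[self.write_bank].append(word)

    def switch(self):
        # Swap roles, per a CPU instruction or the unit's own logic.
        self.write_bank ^= 1

    def read_to_coprocessor(self):
        # Drain the non-writing bank while the other keeps filling.
        bank = self.banks[self.write_bank ^ 1]
        data = list(bank)
        bank.clear()
        return data

buf = PingPongBuffer()
buf.write_from_bus("a")
buf.write_from_bus("b")
buf.switch()                                 # bank 0 now drains
drained = buf.read_to_coprocessor()
```

The coprocessor can consume `drained` while new bus traffic fills the other bank, which is what hides the transfer latency.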
  • The coprocessor interface unit includes an address-event encoding/decoding unit connected to the spiking neural network coprocessor and a numerical input/output unit connected to the artificial neural network coprocessor; the address-event encoding/decoding unit and the numerical input/output unit are connected to each other through the data format conversion unit to transmit data.
  • The data format conversion unit converts between the numerical value format of artificial neurons and the event-packet format of spiking neurons.
  • The numerical input/output unit and the data format conversion unit are connected to the bus through the data temporary storage unit for data interaction, while the instruction temporary storage unit is connected directly to the bus for data interaction and sends control instructions to the SNN coprocessor and the ANN coprocessor.
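The format conversion between artificial-neuron values and spiking-neuron event packets might look like the following sketch; the packet fields, threshold encoding, and function names are all assumptions for illustration, not the patent's actual format:

```python
from dataclasses import dataclass

@dataclass
class AEREvent:
    """Pulse-neuron event packet: a spike carrying its target address
    (address-event representation). Fields are illustrative."""
    target_addr: int
    timestep: int

def decode_spikes(events, num_neurons, timestep):
    """SNN -> ANN direction: expand the sparse event packets of one
    time step into a dense activation vector."""
    vec = [0.0] * num_neurons
    for e in events:
        if e.timestep == timestep:
            vec[e.target_addr] = 1.0
    return vec

def encode_values(values, threshold, timestep):
    """ANN -> SNN direction: neurons whose value crosses the threshold
    emit an event packet; silent neurons emit nothing (event-driven)."""
    return [AEREvent(i, timestep)
            for i, v in enumerate(values) if v >= threshold]

events = encode_values([0.2, 0.9, 0.0, 1.3], threshold=0.5, timestep=0)
dense = decode_spikes(events, num_neurons=4, timestep=0)
```

Only the two above-threshold neurons produce packets, which matches the text's point that no packet at all is emitted when a neuron does not spike.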
  • The destination address of each brain-like co-processing unit is pre-allocated by the arithmetic/logic operation and control unit. When brain-like co-processing units need to exchange data, the unit assigned the first destination address sends data to the unit corresponding to the second destination address by identifying that second destination address.
  • Alternatively, the brain-like co-processing unit at the first destination address writes the data to the storage unit, and the arithmetic/logic operation and control unit selects a specific time to instruct the brain-like co-processing unit at the second destination address to read and process the data from the storage unit.
  • The brain-like co-processing unit responds to data from the external interface at the first priority, processes data from other brain-like co-processing units at the second priority, and processes data from the storage unit at the third priority.
  • While a high-priority input is writing data to the data temporary storage unit, a low-priority input waits until the high-priority write completes before continuing to write, so that the brain-like co-processing unit responds to and processes the received data in an orderly and efficient manner.
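The three-level write priority can be sketched as a simple arbiter in front of the data temporary storage unit; this is an invented illustration (the hardware would arbitrate per write transaction, not via a software queue):

```python
import heapq

# Priority of the three data sources, per the text:
# external interface > other co-processing units > storage unit.
PRIORITY = {"external": 0, "coprocessor": 1, "storage": 2}

class InputArbiter:
    """Sketch of the write arbiter: lower-priority writes wait until
    every pending higher-priority write has completed; writes of the
    same priority keep arrival (FIFO) order."""

    def __init__(self):
        self._queue = []
        self._seq = 0                        # FIFO tiebreaker

    def request_write(self, source, data):
        heapq.heappush(self._queue, (PRIORITY[source], self._seq, data))
        self._seq += 1

    def next_write(self):
        # The next write actually performed on the temporary store.
        return heapq.heappop(self._queue)[2] if self._queue else None

arb = InputArbiter()
arb.request_write("storage", "S0")
arb.request_write("external", "E0")
arb.request_write("coprocessor", "C0")
order = [arb.next_write() for _ in range(3)]
```

Even though the storage-unit write arrived first, it is served last, matching the stated priority ordering.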
  • The brain-like co-processing unit reads data or configuration data from the corresponding location of the storage unit according to the data-reading/configuration instruction issued by the arithmetic/logic operation and control unit. The instruction can be sent in broadcast mode to all brain-like co-processing units, in multicast mode to multiple designated brain-like co-processing units, or in single mode to one designated brain-like co-processing unit.
  • Broadcast mode: the storage unit sends data to the storage areas of all computing units in the ANN/SNN coprocessor. Multicast mode: the storage unit sends data to the storage areas of multiple designated computing units in the ANN/SNN coprocessor. Single mode: the storage unit sends data to the storage area of one designated computing unit in the ANN/SNN coprocessor.
  • Broadcast mode completes in a single configuration pass, while multicast mode and single mode decide, according to the needs of the computing task, whether to continue configuring other computing units in the brain-like co-processing unit. Combining broadcast, multicast, and single transmission modes enables efficient configuration management of multiple brain-like co-processing units.
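The three sending modes can be sketched as one dispatch routine; the function, its parameters, and the representation of each unit's storage area as a list are all invented for illustration:

```python
def dispatch_config(compute_units, payload, mode, targets=None):
    """Sketch of the three instruction-sending modes: broadcast writes
    every computing unit's storage area in one pass, multicast writes
    a designated subset, and single mode writes exactly one unit.
    `compute_units` maps unit id -> its local storage area (a list)."""
    if mode == "broadcast":
        selected = list(compute_units)       # one configuration pass
    elif mode == "multicast":
        selected = targets                   # designated subset
    elif mode == "single":
        selected = [targets]                 # one designated unit
    else:
        raise ValueError(f"unknown mode: {mode}")
    for uid in selected:
        compute_units[uid].append(payload)

units = {0: [], 1: [], 2: []}
dispatch_config(units, "weights_v1", "broadcast")
dispatch_config(units, "patch", "multicast", targets=[1, 2])
dispatch_config(units, "fix", "single", targets=0)
```

Broadcast reaches all three units at once; the later multicast and single writes then touch only the units the task requires.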
  • FIG. 1 is a schematic structural diagram of an existing computing system.
  • FIG. 2 is a schematic diagram of the first embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a second embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a third embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a fourth embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a preferred structure of the data temporary storage unit of the present invention.
  • FIG. 8 is a schematic diagram of a preferred structure of the interface module of the present invention.
  • FIG. 9 is a flowchart of a data reading/configuration instruction sending mode of the present invention.
  • FIG. 10 is a schematic diagram of a preferred structure of the artificial neural network coprocessor of the present invention.
  • FIG. 11 is a schematic diagram of a preferred structure of the pulse neural network coprocessor of the present invention.
  • FIG. 2 is a schematic diagram of a first embodiment of the present invention.
  • the system includes an arithmetic/logic operation and control unit, a brain-like co-processing unit, a storage unit, an external interface, and a bus connecting these units and the external interface .
  • The arithmetic/logic operation and control unit programs and configures the brain-like co-processing unit and performs general-purpose computation (preferably including logical operations and arithmetic operations such as selection, branching, and judgment), while controlling the operation and data exchange of the other units through the bus. The brain-like co-processing unit, with both ANN and SNN processing functions, performs ANN and/or SNN calculations according to the instructions of the arithmetic/logic operation and control unit; that is, it performs general neural network calculations (including ANN calculations such as MLP, CNN, and RNN, as well as SNN calculations), receiving data from the storage unit according to those instructions and saving the results back to the storage unit. The storage unit provides storage space for system and computation program data, neural network configuration parameters, intermediate exchange data, and the like. The external interface provides the interactive information between the brain-like computing system and the external environment.
  • The brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module; the brain-like coprocessor component may include at least one artificial neural network coprocessor and at least one spiking neural network coprocessor.
  • In the first embodiment, the computing system includes one brain-like co-processing unit whose coprocessor component combines an ANN coprocessor and an SNN coprocessor and connects to the bus through the interface module for interactive data transmission.
  • In the second embodiment, the brain-like co-processing unit likewise includes an interface module connected to the bus and a coprocessor component connected to the interface module, but the coprocessor component is a hybrid coprocessor supporting both ANN and SNN computation, connected to the bus through the interface module for interactive data transmission.
  • The brain-like coprocessor component may also include two or more hybrid coprocessors that simultaneously support ANN and SNN computation.
  • FIG. 4 is a schematic diagram of a third embodiment of the present invention.
  • In the third embodiment, the computing system includes multiple brain-like co-processing units, each connected separately to the bus for interactive data transmission.
  • The brain-like coprocessor component of each unit may be a combination of at least one ANN coprocessor and at least one SNN coprocessor as in the first embodiment, or at least one hybrid coprocessor supporting both ANN and SNN computation as in the second embodiment, or multiple ANN coprocessors, or multiple SNN coprocessors, or any combination of ANN or SNN coprocessors with at least one hybrid coprocessor supporting both ANN and SNN computation.
  • As long as the system includes coprocessors with both ANN and SNN processing functions, whether those functions reside in the same module is not limited.
  • Each coprocessor preferably has an extensible interface: multiple coprocessors of the same kind connect to one another through their extensible interfaces for interactive data transmission, while coprocessors of different types exchange data through the interface module.
  • FIG. 5 is a schematic diagram of a fourth embodiment of the present invention.
  • In the fourth embodiment, the computing system includes one brain-like co-processing unit whose coprocessor component contains multiple ANN coprocessors and multiple SNN coprocessors; the ANN and SNN coprocessors exchange data with each other through the interface module, while coprocessors of the same kind connect to one another through their own extensible interfaces for data exchange.
  • the interface module preferably includes a data temporary storage unit.
  • The data temporary storage unit includes several sets of storage intervals; their number equals the number of coprocessors connected to the interface module.
  • Through these storage intervals, the data temporary storage unit buffers the data exchanged between each coprocessor and the storage unit, between each coprocessor and the external interface, and between the coprocessors themselves.
  • The artificial neural network coprocessor and the spiking neural network coprocessor have parallel-computing characteristics: the calculations of many neurons are performed simultaneously in one operation, so a large amount of data must be input each time.
  • The data transfer from the storage unit to the interface module can be carried out in advance through direct memory access (DMA) to reduce the delay caused by data exchange while the brain-like co-processing unit is running.
  • The output and intermediate data of the ANN coprocessor and SNN coprocessor are likewise stored first in the data temporary storage unit and then exchanged with the storage unit through the bus.
  • the corresponding data will be directly sent to the interface module for temporary storage.
  • When the temporarily stored data reaches the preset amount, it triggers the arithmetic/logic operation and control unit to send an instruction, or the interface module's own logic, to activate the brain-like co-processing unit to process the data.
  • According to the destination-address information pre-configured by the arithmetic/logic operation and control unit, the data is sent to the data storage area of the corresponding brain-like co-processing unit to await processing.
  • When data destined for a brain-like co-processing unit must wait some time before being processed, the first unit sends its output data to the storage unit, after which the arithmetic/logic operation and control unit, at a specific time determined by calculation or by preset information, instructs the other brain-like co-processing unit to read the data from the storage unit for processing.
  • The response priority is: external interface input > other brain-like co-processing units > storage unit. That is, the brain-like co-processing unit responds to and processes data from the external interface at the first priority, data from other brain-like co-processing units at the second priority, and data from the storage unit at the third priority.
  • While a high-priority input is writing data to the data temporary storage unit, a low-priority input waits until the high-priority write completes before continuing to write.
  • The data temporary storage unit operates in ping-pong fashion: each brain-like coprocessor component (ANN coprocessor or SNN coprocessor) has a set of two storage intervals, and while one is receiving data from the bus, the other is sending its temporarily stored data to the brain-like co-processing unit for processing.
  • FIG. 6 is a schematic diagram of the data temporary storage unit.
  • The data temporary storage unit includes a first input temporary store, a second input temporary store, and an output temporary store. The first and second input temporary stores alternately receive data from the bus and send their temporarily stored data to the coprocessor.
  • The output temporary store delivers the data processed by the coprocessor to a storage unit, an external interface, or another coprocessor.
  • the working status of the two input temporary storages is switched according to the instructions of the arithmetic/logic operation and control unit, or the judgment logic of the brain-like co-processing unit itself, so that the data can be sent to the brain-like co-processing unit with low latency. It ensures that the neural network coprocessor can achieve fast data acquisition when it needs to process data in several different time steps.
  • The data temporary storage unit switches the ping-pong state to receive new data and checks whether the amount of received data has reached the set value. When the set value is reached, it checks whether each coprocessor has finished processing the previous data and is idle; if so, the data is sent to the coprocessor component for calculation according to the preset timing. After the data is sent, the ping-pong unit switches its read/write state, and the data temporary storage unit determines whether more data remains to be sent to the corresponding coprocessor for processing.
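The decision flow just described — wait for the fill threshold, wait for an idle coprocessor, then forward and swap banks — can be sketched as a single service routine. The hook functions passed in are assumptions standing in for the interface module's internal signals:

```python
def service_buffer(buffer_fill, setpoint,
                   coprocessor_idle, send, switch_pingpong):
    """Sketch of the dispatch decision in the text: once the temporary
    store holds the preset amount of data and the target coprocessor
    has finished its previous work, the data is forwarded and the
    ping-pong banks swap read/write roles.

    buffer_fill      -- words currently held in the receiving bank
    setpoint         -- preset amount that triggers dispatch
    coprocessor_idle -- callable, True if the unit finished prior work
    send             -- callable that forwards the buffered data
    switch_pingpong  -- callable that swaps the read/write banks
    """
    if buffer_fill < setpoint:
        return "waiting-for-data"
    if not coprocessor_idle():
        return "waiting-for-coprocessor"
    send()                                   # forward per preset timing
    switch_pingpong()                        # swap read/write banks
    return "dispatched"

state = service_buffer(4, 4, lambda: True,
                       lambda: None, lambda: None)
```

In hardware this check would run continuously; the return strings here just make the three outcomes visible.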
  • the corresponding coprocessor can be awakened for calculation every time a new task to be processed arrives.
  • When a coprocessor completes its current computing task and the next computing task has not yet been issued, the coprocessor stays in the low-power idle state, realizing event-driven operation and reducing the overall energy consumption of the computing system.
  • the interface module also includes an instruction temporary storage unit, a data format conversion unit, and a coprocessor interface unit, in which the instruction temporary storage unit has a FIFO (first-in, first-out) storage structure.
  • the coprocessor interface unit includes an address-event representation (AER) encoding/decoding unit connected to the pulse neural network coprocessor and a numerical input/output unit connected to the artificial neural network coprocessor; the AER encoding/decoding unit and the numerical input/output unit are connected to each other through the data format conversion unit to transmit data.
  • the numerical input/output unit and the data format conversion unit are connected to the bus through the data temporary storage unit for data interaction, while the instruction temporary storage unit is directly connected to the bus to exchange data and send control instructions to the pulse neural network coprocessor and the artificial neural network coprocessor.
  • the interface module communicates with the pulse neural network coprocessor through the AER encoding/decoding unit using AER encoding, and transmits the output pulses of the neurons in the pulse neural network coprocessor in the form of discrete event packets (i.e., pulse neuron event packets). Each pulse neuron event packet contains the target address of the pulse information: when the pulse neural network coprocessor outputs a pulse neuron event packet, it means that a pulse is transmitted to the destination address; if the pulse coprocessor generates no pulse in its calculation result at a certain moment, no pulse neuron event packet is output.
  • the AER encoding/decoding unit is used to parse the routing information in a pulse neuron event packet when receiving output from the pulse neural network coprocessor, and to package the routing information when sending input to the pulse neural network coprocessor.
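A minimal, hypothetical sketch of the AER packet scheme described above: only firing neurons generate packets, and each packet carries its address/routing information. The packet layout (address plus a time step) is an assumption; real AER formats vary.

```python
# Sketch of address-event representation (AER): a spike is transmitted as a
# small packet carrying the neuron address (here with a time step); silent
# neurons produce no packet at all, which is what makes AER efficient for
# sparse activity. Field layout is an illustrative assumption.

def aer_encode(spikes, time_step):
    """Turn a boolean spike vector into event packets (addr, t)."""
    return [(addr, time_step) for addr, fired in enumerate(spikes) if fired]

def aer_decode(packets, num_neurons):
    """Rebuild the spike vector for one time step from event packets."""
    spikes = [False] * num_neurons
    for addr, _t in packets:
        spikes[addr] = True
    return spikes

spikes = [False, True, False, True]
packets = aer_encode(spikes, time_step=7)   # only 2 packets for 4 neurons
recovered = aer_decode(packets, num_neurons=4)
```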
  • the interface module and the artificial neural network coprocessor directly and continuously transmit the numerical values of multiple artificial neurons in batches.
  • the numerical input/output unit is used to receive continuous numerical values from the artificial neural network and store them in the corresponding area of the data temporary storage unit, and, when sending data to the artificial neural network subsystem, to read the data from the corresponding position of the data temporary storage unit and send it.
  • the data format conversion unit is used for format conversion of input and output data of the artificial neural network coprocessor and the pulse neural network coprocessor.
  • when artificial neuron information is input into the pulse neural network coprocessor, the data format conversion unit converts artificial neuron numerical value information of a certain precision into pulse neuron event packet information; when pulse neuron information is input into the artificial neural network coprocessor, it converts pulse neuron event packets into artificial neuron numerical value information of a certain precision. That is, the data format conversion unit performs format conversion between artificial neuron numerical value information and pulse neuron event packet information.
  • the different interface encoding methods described above can use the same physical carrier and physical transmission protocol during transmission.
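The two-way conversion performed by the data format conversion unit might be sketched as follows. The thresholding rule (numeric value to events) and the count-scaling rule (events back to a numeric value) are illustrative assumptions only, since the text does not fix a particular conversion scheme.

```python
# Hedged sketch of the data format conversion unit: numeric artificial-
# neuron values become spike event packets (by simple thresholding, an
# assumption), and per-neuron spike counts become approximate numeric
# values (by scaling, also an assumption).

THRESHOLD = 0.5   # invented conversion parameter

def numeric_to_events(values):
    """ANN numeric outputs -> addresses of spike event packets for the SNN side."""
    return [addr for addr, v in enumerate(values) if v >= THRESHOLD]

def events_to_numeric(event_counts, max_count):
    """Per-neuron spike counts -> approximate numeric values for the ANN side."""
    return [c / max_count for c in event_counts]

events = numeric_to_events([0.9, 0.2, 0.7])       # neurons 0 and 2 spike
values = events_to_numeric([3, 0, 2], max_count=4)
```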
  • the arithmetic/logical operation and control unit of the brain-like computing system of the present invention is preferably a traditional microprocessor that executes general-purpose programs, including but not limited to a CPU, GPU, DSP, or single-chip microcomputer.
  • the storage unit is a computer-readable storage medium, which may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor (volatile or non-volatile) system, apparatus, or device, or any suitable combination of the foregoing.
  • a more specific (non-exhaustive) list of examples of computer-readable storage media includes: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), non-volatile memory (NVM) such as phase-change memory (PCM) and resistive memory (RRAM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the arithmetic/logic operation and control unit executes algorithms and functions other than neural networks in artificial general intelligence (such as data preprocessing, branch and loop logic control, and other necessary operations in machine learning algorithms), and is responsible for sending configuration instructions and other operation instructions to the artificial neural network.
  • the arithmetic/logic operation and control unit sends instructions to the brain-like co-processing unit
  • the arithmetic/logic operation and control unit (referred to as the control unit) executes algorithms and functions other than neural networks in artificial general intelligence (such as data preprocessing, branch and loop logic control, and other necessary operations in machine learning algorithms), and is responsible for sending instructions for configuring the artificial neural network and other operation instructions, including but not limited to updating the configuration of the brain-like co-processing unit, changing the operating state of the brain-like co-processing unit, and reading the operating state of the co-processing unit.
  • the instruction information sent by the control unit to the brain-like co-processing unit is stored in an instruction temporary storage unit with a FIFO storage structure, and is executed after the brain-like co-processor processes the previous instruction.
  • the arithmetic/logical operation and control unit updates configuration data to the brain-like co-processing unit
  • the configuration instruction is first sent to the brain-like co-processing unit to make it enter the corresponding configuration mode; the brain-like co-processing unit then performs data interaction with the storage unit, obtaining the corresponding configuration data from the storage unit, where the address of the configuration data in the storage unit is given by the configuration instruction.
  • the configuration mode is divided into broadcast mode, multicast mode and single mode.
  • FIG. 9 is a flow chart of the data reading/configuration instruction sending modes of the present invention, including a broadcast mode sent to all brain-like co-processing units, a multicast mode sent to multiple designated brain-like co-processing units, and a single mode sent to one designated brain-like co-processing unit.
  • in broadcast mode, the storage unit sends data to the storage areas of all computing units in the artificial neural network/pulse neural network coprocessor.
  • the control unit sends a broadcast transfer instruction to the brain-like co-processing unit, and the brain-like co-processing unit reads data from the storage unit once, and the data is sent to all computing units.
  • in multicast mode, the storage unit sends data to the storage areas of multiple specified computing units in the artificial neural network/pulse neural network coprocessor. The control unit sends a multicast transfer instruction to the brain-like co-processing unit; the brain-like co-processing unit reads data from the storage unit once, and the data is sent to the multiple corresponding computing units.
  • in single mode, the data transmitted by the storage unit is sent to the storage area of one designated computing unit in the artificial neural network/pulse neural network coprocessor. The control unit sends a single transfer instruction to the brain-like co-processing unit; the brain-like co-processing unit reads the data from the storage unit once and sends it to the corresponding computing unit.
  • the broadcast mode can be completed in one configuration, while in multicast mode and single mode, whether to continue configuring other computing units in the brain-like co-processing unit is determined by the needs of the computing task; when other computing units need to be configured, the flow returns to the control unit, which sends further data reading/configuration instructions to the brain-like co-processing unit.
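The three configuration modes above can be summarised in a small dispatch sketch. Unit storage is modelled as a dict and the instruction fields are invented for illustration; the common pattern is one read from the storage unit followed by delivery to all, several, or one computing unit.

```python
# Sketch of the broadcast / multicast / single configuration modes:
# one block of configuration data is read from storage once, then
# delivered to all, several, or exactly one computing unit.
# The dict model and field names are assumptions for the sketch.

def configure(units, mode, data, targets=None):
    """Dispatch one block of configuration data read from storage."""
    if mode == "broadcast":
        receivers = list(units)          # every computing unit
    elif mode == "multicast":
        receivers = targets              # several designated units
    elif mode == "single":
        receivers = targets[:1]          # exactly one designated unit
    else:
        raise ValueError(mode)
    for uid in receivers:
        units[uid] = data
    return receivers

units = {0: None, 1: None, 2: None}
configure(units, "broadcast", "cfgA")                 # all units get cfgA
configure(units, "multicast", "cfgB", targets=[1, 2]) # units 1, 2 get cfgB
configure(units, "single", "cfgC", targets=[2])       # only unit 2 gets cfgC
```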
  • the brain-like coprocessor component in the present invention preferably includes an artificial neural network coprocessor and a pulse neural network coprocessor, both of which are dedicated hardware circuit structures.
  • the artificial neural network coprocessor is used to transmit and process data of a certain precision (higher data precision than the pulse neural network coprocessor) in the artificial neural network, to achieve high-density parallel computing.
  • the artificial neural network coprocessor includes a plurality of parallel artificial neural network computing units, which are connected to each other through an internal bus for interactive data transmission. Each artificial neural network computing unit includes a weight storage unit, a matrix calculation unit, a vector calculation unit, and an intermediate value storage unit connected in sequence; the intermediate value storage unit is also connected to the matrix calculation unit. The weight storage unit and the intermediate value storage unit are connected through the data bus to the internal bus to exchange data with other artificial neural network computing units and to send data to the matrix calculation unit for calculation; after receiving the data, the matrix calculation unit performs its operation according to the control signal and sends the result to the vector calculation unit, which performs the corresponding calculation in combination with the control signal and finally transmits the result to the intermediate value storage unit.
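The pipeline of one artificial neural network computing unit (weight storage, matrix calculation, vector calculation, intermediate value storage) can be illustrated as below. The ReLU element-wise stage is an assumption for the sketch; the actual vector operation is selected by the control signal.

```python
# Illustrative model of one artificial neural network computing unit:
# the matrix unit multiplies the input vector by stored weights, the
# vector unit applies an element-wise operation (ReLU, chosen only for
# this sketch), and the result lands in the intermediate value store.

def matrix_unit(weights, inputs):
    """Weight matrix x input vector (the matrix calculation stage)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def vector_unit(vec):
    """Element-wise stage; ReLU is an assumed stand-in operation."""
    return [max(0.0, v) for v in vec]

weight_storage = [[1.0, -2.0],
                  [0.5,  0.5]]          # contents of the weight storage unit
inputs = [2.0, 1.0]                      # data arriving over the internal bus
intermediate_store = vector_unit(matrix_unit(weight_storage, inputs))
```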
  • the pulse neural network coprocessor is used to process input information having one or more of the following features: sparseness, dynamic data flow, rich timing information, and discrete pulse input.
  • the pulse neural network coprocessor includes multiple parallel pulse neural network computing units and an equal number of routing communication units. Each pulse neural network computing unit is connected to a routing communication unit, and the routing communication units are connected to each other to form an on-chip routing network for interactive data transmission; each pulse neural network computing unit includes an axon input unit, a synaptic weight storage unit, a control unit, a dendrite calculation unit, and a neuron calculation unit.
  • the axon input unit receives data from the routing communication unit and sends it to the dendrite calculation unit.
  • the axon input unit, synaptic weight storage unit, control unit, and neuron calculation unit are all connected to the dendrite calculation unit, and the control unit is also connected to the axon input unit and the neuron calculation unit; the dendrite calculation unit computes based on the received axon input data and the data transmitted by the synaptic weight storage unit and sends the result to the neuron calculation unit for further computation, and finally the result is sent through the routing communication unit to other pulse neural network computing units for data interaction.
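The dendrite/neuron calculation path can be illustrated with a leaky integrate-and-fire sketch: the dendrite stage accumulates weighted axon inputs, and the neuron stage integrates them and fires when a threshold is crossed. The leak factor, threshold, and weights are invented values, not taken from the source.

```python
# Minimal sketch of one pulse neural network computing unit: the dendrite
# stage sums synaptic weights of incoming spikes, and the neuron stage
# runs a leaky integrate-and-fire update. LEAK and V_THRESH are invented.

LEAK, V_THRESH = 0.9, 1.0

def dendrite_unit(axon_spikes, synapse_weights):
    """Weighted sum of the incoming spike vector (dendrite calculation)."""
    return sum(w for w, s in zip(synapse_weights, axon_spikes) if s)

def neuron_unit(v, dendrite_current):
    """Leaky integrate-and-fire: returns (new potential, fired?)."""
    v = v * LEAK + dendrite_current
    if v >= V_THRESH:
        return 0.0, True     # reset and emit a spike toward the router
    return v, False

v, fired = 0.0, False
for spikes in ([1, 0, 1], [0, 1, 0]):                # two input time steps
    current = dendrite_unit(spikes, synapse_weights=[0.4, 0.3, 0.4])
    v, fired = neuron_unit(v, current)
# step 1 integrates 0.8 without firing; step 2 crosses threshold and fires
```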
  • the destination address of each brain-like co-processing unit is pre-allocated by the arithmetic/logic operation and control unit.
  • the brain-like co-processing unit assigned to the first destination address sends data to the brain-like co-processing unit corresponding to the second destination address by identifying the second destination address.
  • alternatively, the brain-like co-processing unit at the first destination address sends the data to the storage unit, and the arithmetic/logic operation and control unit selects a specific time and instructs the brain-like co-processing unit at the second destination address to read and process the data from the storage unit.
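The two delivery paths between co-processing units (direct by destination address, or staged through the storage unit under control-unit scheduling) can be sketched as follows; all names and data values are assumptions for the sketch.

```python
# Sketch of the two data paths between brain-like co-processing units:
# a direct send addressed by destination, and an indirect path that
# parks data in the storage unit until the control unit schedules a read.

storage_unit = {}
inboxes = {"unit_a": [], "unit_b": []}   # per-unit receive queues (assumed)

def send_direct(dest_addr, data):
    """Direct path: deliver straight to the destination unit's inbox."""
    inboxes[dest_addr].append(data)

def send_via_storage(key, data):
    """Indirect path: park data until the control unit triggers a read."""
    storage_unit[key] = data

def control_unit_dispatch(key, dest_addr):
    """The control unit picks a moment and tells the destination to read."""
    inboxes[dest_addr].append(storage_unit.pop(key))

send_direct("unit_b", "spike_batch_1")
send_via_storage("buf0", "spike_batch_2")
control_unit_dispatch("buf0", "unit_b")
```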
  • the brain-like computing system of the present invention is essentially a heterogeneously combined brain-like computer structure: an arithmetic/logic operation and control unit composed of traditional microprocessors works together with brain-like co-processing units that support efficient artificial neural network and pulse neural network computing, to divide and perform tasks efficiently in general artificial intelligence computing.
  • the system facilitates the use of brain-like co-processing units in actual application scenarios.
  • the arithmetic/logic operation and control unit composed of traditional microprocessors enables flexible programming and configuration of the brain-like co-processors, so that the tasks handled by the brain-like coprocessor can be changed online in real time.
  • an interface module that can support the continuous high-speed execution of the brain-like co-processing unit is preferably designed; its logic switches its own operating state between the computing state and a low-power idle state, making it possible to quickly, efficiently, and conveniently implement data exchange between the brain-like co-processing unit and the arithmetic/logic operation and control unit, the storage unit, the external interface, and other brain-like co-processing units, while reducing the operating power consumption of the entire system.

Abstract

A brain-like computing system, comprising an arithmetic/logical operation and control unit, a brain-like coprocessing unit, a memory unit, an external port, and a bus connecting the various units and the external port. The arithmetic/logical operation and control unit is used to perform programming and configuration on the brain-like coprocessing unit, execute arithmetic operations or logical operations, and control the running and data exchange of the other units by means of the bus. The brain-like coprocessing unit is provided with an artificial neural network processing function and a pulsed neural network processing function, and is used to execute artificial neural network computation and pulsed neural network computation on the basis of instructions from the arithmetic/logical operation and control unit and store a computation result in the memory unit. The present system is able to achieve higher computational efficiency when processing different tasks in general artificial intelligence computation, and achieve real-time response to tasks by means of low-delay continuous execution of computing tasks, while also reducing the energy consumption of computation execution in the whole system.

Description

A brain-like computing system

Technical field
The invention relates to the field of artificial intelligence computing, and in particular to a brain-like computing system.
Background
Since von Neumann proposed the binary, stored-program computer architecture in the 1940s, computers have developed to this day by relying on the continuous improvement of electronic technology and the continuous miniaturization of Moore's law. By sequentially executing predefined code and continuously moving data between the memory and the processor via the bus, computers have powerful numerical processing capabilities. On this basis, large-scale software with complex functions has been developed and is widely used in military, economic, educational, and scientific research fields; the development and progress of science and technology in today's world are inseparable from computers.
The vigorous development of big data information networks and smart mobile devices has produced massive amounts of unstructured information, accompanied by a sharp increase in demand for high-efficiency processing of this information. However, the traditional von Neumann computer faces two major challenges in dealing with these problems. On the one hand, its processor and memory are separated; because of its bus communication and its synchronous, serial, centralized working mode, it not only has high energy consumption and low efficiency when dealing with large, complex problems, but its numerically oriented character also makes software programming for non-formalized problems highly complex or even infeasible. On the other hand, it mainly follows Moore's law of miniaturization to increase density, reduce cost, and improve performance; miniaturization is expected to reach its physical limit within the next 10 to 15 years, after which it will be difficult to further improve energy efficiency by physical scaling alone, so its development will be fundamentally limited.
Therefore, the 2011 International Semiconductor Technology Development Guide pointed out that one of the effective strategies to address these challenges is brain-like computing technology drawing on the human brain. With on the order of 10^11 neurons and 10^15 plastic synaptic connections in a volume of only about 2 liters, the human brain offers parallel computing, strong robustness, plasticity, and fault tolerance unmatched by existing computer architectures, while consuming energy only on the order of 10 watts. A neural network is composed of a large number of neurons; although the structure and behavior of a single neuron are relatively simple, rich network processing functions can emerge through suitable learning rules. This network structure differs from traditional computer processing: through distributed storage and parallel collaborative processing of information, the adaptive learning process of the brain can be simulated simply by defining basic learning rules, without explicit programming, which is advantageous for some non-formalized problems.
There are two main ways to implement brain-like computing technology: one is to use software algorithms to simulate parallel, distributed brain-like neural networks on the existing computer architecture; the other is to realize them with large-scale integrated analog, digital, or mixed-signal circuits and software systems. At present, computer structures that can perform artificial intelligence tasks are built on a CPU+GPU basis; as shown in FIG. 1, the CPU, GPU, storage unit, and external interface are all connected to the bus. The GPU is expensive and consumes a great deal of energy; because it is not specifically optimized for neural network tasks, its computing efficiency is not necessarily high for all tasks, and the efficiency gap across tasks can be very large. A large amount of raw computing power is therefore required to meet the needed capability, making equipment cost and system operating energy consumption very high. For biologically inspired artificial intelligence algorithms such as pulse neural networks, computing efficiency on a CPU+GPU structure is very low: the execution carrier of a brain-like computing model implemented in software is still a traditional computer, which cannot efficiently complete artificial general intelligence computing tasks, and its energy consumption remains far from the energy efficiency of the human brain. In contrast, the energy consumption of brain-like computing neural networks implemented with silicon-based neuromorphic devices is significantly better than current software implementations. Therefore, the most effective method at present is a brain-like computing scheme accelerated by corresponding hardware.
The ultimate goal of brain-like computing is artificial general intelligence, also known as strong artificial intelligence, which is the ultimate goal of most artificial intelligence research. For decades, researchers have been moving toward this goal through continuous exploration in software and hardware design. In these explorations, two different technical approaches have gradually formed: the artificial neural network method and the pulse neural network method. In either an artificial neural network computing system or a pulse neural network computing system, a single computing system is an optimization for a certain class of algorithms and problems, and a single neural network computing paradigm cannot cope with the complex task scenarios of artificial general intelligence. Artificial neural networks are inadequate at processing sequential information, low-power event-driven response, and real-time problems; pulse neural networks are inadequate at precise arithmetic and data-intensive computation on large volumes of data. In scenarios requiring both precise numerical processing and fast response, no single computing system can meet the computing requirements.
Summary of the invention
To overcome the inability of the existing technology to efficiently support complex-scenario computing tasks in artificial general intelligence, the present invention proposes a brain-like computing system that combines an arithmetic/logic operation and control unit with a brain-like co-processing unit. Flexible programming and configuration of the brain-like co-processing unit by the arithmetic/logic operation and control unit realizes low-latency continuous execution of computing tasks and real-time response to tasks; at the same time, by controlling the brain-like co-processing unit to efficiently divide the work of artificial neural network computation and pulse neural network computation, higher computing efficiency can be achieved when processing different tasks in general artificial intelligence computing.
To achieve the above objectives, the technical solution adopted by the present invention includes the following.
The present invention relates to a brain-like computing system, characterized by comprising an arithmetic/logic operation and control unit, a brain-like co-processing unit, a storage unit, an external interface, and a bus connecting each unit and the external interface. The arithmetic/logic operation and control unit is used to program and configure the brain-like co-processing unit, perform arithmetic or logical operations, and control the operation and data exchange of the other units through the bus. The brain-like co-processing unit has artificial neural network processing and pulse neural network processing functions, and is used to perform artificial neural network computation and pulse neural network computation according to the instructions of the arithmetic/logic operation and control unit and save the computation results to the storage unit. The external interface is used to provide information for interaction between the brain-like computing system and the external environment. The beneficial effects of this technical solution are as follows. When the brain-like computing system of the present invention is used for general artificial intelligence computing, the heterogeneous brain-like computer structure constructed includes an arithmetic/logic operation and control unit based on a traditional microprocessor suitable for arithmetic/logic computing tasks, whose flexible programming and configuration of the brain-like co-processing unit achieve low-latency continuous execution of computing tasks and real-time response to tasks. It also includes a brain-like co-processing unit dedicated to artificial intelligence computing, forming a heterogeneously fused brain-like computing structure that supports efficient artificial neural network and pulse neural network computation, so that artificial neural network and pulse neural network computations can be efficiently divided and performed, achieving higher computing efficiency when processing different tasks in general artificial intelligence computing.
Further, the brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module. The brain-like coprocessor component includes at least one artificial neural network coprocessor and at least one pulse neural network coprocessor; or it includes at least one hybrid coprocessor supporting both artificial neural network and pulse neural network computation; or it includes at least one artificial neural network coprocessor, at least one pulse neural network coprocessor, and at least one hybrid coprocessor supporting both. As long as the brain-like computing system includes coprocessors with both artificial neural network processing and pulse neural network processing functions, the present invention does not limit whether these functions reside in the same module, so the structure is flexible. In addition, based on the computing characteristics of the brain-like co-processing unit and its data access requirements, the present invention designs an interface module that supports the continuous, high-speed execution of the brain-like co-processing unit, so that data exchange between the brain-like co-processing unit and the arithmetic/logic operation and control unit, the storage unit, the external interface, and other brain-like co-processing units can be realized quickly, efficiently, and conveniently. Using an arithmetic/logic operation and control unit composed of traditional microprocessors to control the brain-like co-processing unit through the interface module within it can satisfy the need for a large amount of interactive data transmission between the brain-like co-processing unit and other components, realizing low-latency, continuous, high-speed task execution while reducing the operating power consumption of the entire computing system.
Further, the arithmetic/logic operation and control unit is a CPU, GPU, DSP, and/or single-chip microcomputer; the external interface obtains information from the external environment according to the instructions of the arithmetic/logic operation and control unit, or controls the brain-like computing system to execute the corresponding processing when the external environment sends specific data, or sends the operating results of the brain-like computing system to the external environment.
Further, when the brain-like coprocessor component includes multiple artificial neural network coprocessors, multiple pulse neural network coprocessors, or multiple hybrid coprocessors, each coprocessor has an extensible interface: multiple coprocessors of the same type are connected to each other through their extensible interfaces for interactive data transmission, while coprocessors of different types exchange data through the interface module. That is to say, the extensible interfaces form a routing interface communication network among the coprocessors, and some of the pulse neural network coprocessors exchange data with the artificial neural network coprocessors through the interface module.
Further, the artificial neural network coprocessor includes multiple parallel artificial neural network computing units interconnected through an internal bus for mutual data transfer. Each artificial neural network computing unit includes a weight storage unit, a matrix calculation unit, a vector calculation unit, and an intermediate value storage unit connected in sequence, with the intermediate value storage unit also connected to the matrix calculation unit. The weight storage unit and the intermediate value storage unit are each connected through a data bus to the internal bus to exchange data with the other artificial neural network computing units and to feed data to the matrix calculation unit. Upon receiving data, the matrix calculation unit performs its operation according to a control signal and sends the result to the vector calculation unit, which performs the corresponding calculation in combination with the control signal and finally writes the result to the intermediate value storage unit.

Further, the spiking neural network coprocessor includes multiple spiking neural network computing units operating in parallel and an equal number of routing communication units, each spiking neural network computing unit being connected to one routing communication unit; the routing communication units are interconnected to form an on-chip routing network for data transfer. Each spiking neural network computing unit includes an axon input unit, a synaptic weight storage unit, a control unit, a dendrite calculation unit, and a neuron calculation unit. The axon input unit, the synaptic weight storage unit, the control unit, and the neuron calculation unit are all connected to the dendrite calculation unit, and the control unit is additionally connected to the axon input unit and the neuron calculation unit. The dendrite calculation unit computes on the data received from the axon input unit and the data supplied by the synaptic weight storage unit, and sends the result to the neuron calculation unit for further processing; the final result is sent through the routing communication unit to other spiking neural network computing units for data exchange.
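Although the computing units described above are hardware pipelines, their dataflows can be illustrated by a minimal behavioral sketch. All class and method names below are hypothetical and are not part of the disclosure; the ReLU activation and the threshold/fire/reset dynamics are only example choices:

```python
# Minimal behavioral sketch of the two compute units described above.
# All names are illustrative; the actual units are hardware pipelines.

class ANNComputeUnit:
    """Weight storage -> matrix unit -> vector unit -> intermediate storage."""

    def __init__(self, weights):
        self.weights = weights            # weight storage unit
        self.intermediate = []            # intermediate value storage unit

    def step(self, inputs):
        # Matrix calculation unit: weighted sums for each neuron.
        sums = [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]
        # Vector calculation unit: element-wise activation (ReLU as an example).
        out = [s if s > 0 else 0 for s in sums]
        self.intermediate.append(out)     # result kept for later exchange
        return out

class SNNComputeUnit:
    """Axon input -> dendrite accumulation -> neuron fire/reset."""

    def __init__(self, weights, threshold=1.0):
        self.weights = weights            # synaptic weight storage unit
        self.threshold = threshold
        self.potential = [0.0] * len(weights)

    def step(self, spikes):
        # Dendrite unit: accumulate weighted input spikes into the potential.
        for i, row in enumerate(self.weights):
            self.potential[i] += sum(w for w, s in zip(row, spikes) if s)
        # Neuron unit: fire and reset when the threshold is crossed.
        fired = [p >= self.threshold for p in self.potential]
        self.potential = [0.0 if f else p for p, f in zip(self.potential, fired)]
        return fired
```

The sketch shows the key contrast: the artificial unit produces a numeric vector every step, while the spiking unit integrates state across steps and emits events only when the threshold is crossed.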
Further, each coprocessor of the brain-like coprocessor component switches between a computing state and a low-power idle state according to the logic of the interface module and its own running state. In this way, the corresponding coprocessor is woken for computation whenever a new task to be processed arrives; when the coprocessor has finished the current computing task and the next task has not yet been assigned, it remains in a low-power idle state. This realizes event-driven operation of the coprocessors and reduces the overall energy consumption of the computing system.
Further, the interface module includes a data temporary storage unit, an instruction temporary storage unit, a data format conversion unit, and a coprocessor interface unit. The data temporary storage unit includes several groups of storage regions, the number of groups matching the number of coprocessors connected to the interface module; it is used to temporarily store data exchanged between each coprocessor and the storage unit, between each coprocessor and the external interface, and among the coprocessors themselves. The instruction temporary storage unit has a first-in, first-out storage structure and is used to temporarily store multiple pending instructions sent from the arithmetic/logic operation and control unit.
Further, each storage region includes a first input buffer, a second input buffer, and an output buffer. The first and second input buffers alternate between two tasks, receiving data from the bus and sending buffered data to the coprocessor, while the output buffer delivers data processed by the coprocessor to the storage unit, the external interface, or another coprocessor. The data temporary storage unit thus operates in a ping-pong fashion: the working states of the two input buffers are switched according to instructions of the arithmetic/logic operation and control unit or the decision logic of the brain-like co-processing unit itself, so that data can be delivered to the brain-like co-processing unit with low latency, and the neural network coprocessor can fetch data quickly when it needs to process data over several different time steps. By alternating the two input buffers, the data temporary storage unit forms a ping-pong data path to the brain-like coprocessor, greatly improving its data-processing efficiency.
Further, when the brain-like coprocessor component includes both an artificial neural network coprocessor and a spiking neural network coprocessor, the coprocessor interface unit includes an address-event encoding/decoding unit connected to the spiking neural network coprocessor and a numeric input/output unit connected to the artificial neural network coprocessor. The address-event encoding/decoding unit and the numeric input/output unit are interconnected through the data format conversion unit to transfer data, and the data format conversion unit converts between artificial-neuron numeric value information and spiking-neuron event packet information in both directions.
Further, the numeric input/output unit and the data format conversion unit are connected to the bus through the data temporary storage unit for data exchange, while the instruction temporary storage unit is connected to the bus directly for data exchange and sends control instructions to the spiking neural network coprocessor and the artificial neural network coprocessor.
Further, when the computing system includes multiple brain-like co-processing units, the arithmetic/logic operation and control unit pre-assigns a destination address to each brain-like co-processing unit. When data must be exchanged between co-processing units, the unit assigned a first destination address sends data to the unit corresponding to a second destination address by identifying that second destination address.
Further, when the brain-like co-processing unit at the second destination address cannot process the data from the unit at the first destination address in time, the unit at the first destination address sends the data to the storage unit, and the arithmetic/logic operation and control unit selects a suitable moment to command the unit at the second destination address to read the data from the storage unit and process it.
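This address-based transfer with a storage fallback can be sketched in software as follows. The names and data structures are hypothetical stand-ins for hardware behavior, not an implementation prescribed by the disclosure:

```python
# Illustrative sketch of destination-address routing with storage fallback.

class Storage:
    """Stand-in for the storage unit, keyed by destination address."""

    def __init__(self):
        self.buffers = {}                 # dest address -> pending data items

    def write(self, dest, data):
        self.buffers.setdefault(dest, []).append(data)

    def read_all(self, dest):
        return self.buffers.pop(dest, [])

class Unit:
    """Stand-in for a brain-like co-processing unit."""

    def __init__(self, ready=True):
        self.ready = ready                # can it accept data right now?
        self.inbox = []

    def drain_storage(self, addr, storage):
        # Issued by the control unit at a chosen moment.
        self.inbox.extend(storage.read_all(addr))

def send(dest_addr, data, units, storage):
    """Send data to the unit at dest_addr, spilling to storage if it is busy."""
    dest = units[dest_addr]
    if dest.ready:
        dest.inbox.append(data)           # direct unit-to-unit transfer
        return "direct"
    # Destination busy: spill to the storage unit; the control unit will later
    # command the destination to read the data back.
    storage.write(dest_addr, data)
    return "deferred"
```

The key property is that the sender never blocks: data is either delivered directly or parked in storage under the receiver's address until the control unit schedules the read-back.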
Further, the brain-like co-processing unit responds to data from the external interface with a first priority, to data from other brain-like co-processing units with a second priority, and to data from the storage unit with a third priority. While a higher-priority input is writing data into the data temporary storage unit, a lower-priority input waits until that write completes before continuing, so that the brain-like co-processing unit can respond to and process received data in an orderly and efficient manner.
Further, the brain-like co-processing unit reads data or configuration data from the corresponding locations of the storage unit according to data read/configuration instructions issued by the arithmetic/logic operation and control unit. A data read/configuration instruction is sent in broadcast mode to all brain-like co-processing units, in multicast mode to multiple designated co-processing units, or in single mode to one designated co-processing unit. In broadcast mode, the storage unit sends the data to the storage areas of all computing units in the artificial/spiking neural network coprocessors; in multicast mode, to the storage areas of multiple designated computing units; in single mode, to the storage area of one designated computing unit. Broadcast mode completes the configuration in a single pass, whereas multicast and single modes decide, according to the needs of the computing task, whether to continue configuring further computing units in the brain-like co-processing unit. Using these broadcast, multicast, and single transmission modes enables efficient management and configuration of multiple brain-like co-processing units.
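As an illustration, the three sending modes can be modeled by a small dispatch routine. The unit identifiers and the function interface are hypothetical; only the selection logic mirrors the description above:

```python
# Sketch of the three dispatch modes for data/configuration instructions.

def dispatch(mode, payload, all_units, targets=None):
    """Deliver configuration data in broadcast, multicast, or single mode."""
    if mode == "broadcast":               # one pass configures every unit
        selected = list(all_units)
    elif mode == "multicast":             # several designated units
        selected = [u for u in all_units if u in targets]
    elif mode == "single":                # exactly one designated unit
        selected = [targets]
    else:
        raise ValueError("unknown mode: " + mode)
    return {unit: payload for unit in selected}
```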
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic structural diagram of an existing computing system.

FIG. 2 is a schematic diagram of a first embodiment of the present invention.

FIG. 3 is a schematic diagram of a second embodiment of the present invention.

FIG. 4 is a schematic diagram of a third embodiment of the present invention.

FIG. 5 is a schematic diagram of a fourth embodiment of the present invention.

FIG. 6 is a schematic diagram of a preferred structure of the data temporary storage unit of the present invention.

FIG. 7 is a flowchart of the event-driven operation of the coprocessors of the present invention.

FIG. 8 is a schematic diagram of a preferred structure of the interface module of the present invention.

FIG. 9 is a flowchart of the sending modes of data read/configuration instructions of the present invention.

FIG. 10 is a schematic diagram of a preferred structure of the artificial neural network coprocessor of the present invention.

FIG. 11 is a schematic diagram of a preferred structure of the spiking neural network coprocessor of the present invention.

DETAILED DESCRIPTION
For a clearer understanding of the present invention, it is described in detail below with reference to the drawings and embodiments.
The present invention relates to a brain-like computing system. FIG. 2 is a schematic diagram of a first embodiment of the invention. The system includes an arithmetic/logic operation and control unit, a brain-like co-processing unit, a storage unit, an external interface, and a bus connecting these units and the external interface. The arithmetic/logic operation and control unit programs and configures the brain-like co-processing unit, performs general-purpose computation (preferably including logical operations such as selection, branching, and decision, as well as arithmetic calculation), and controls the operation and data exchange of the other units over the bus. The brain-like co-processing unit provides both artificial neural network and spiking neural network processing functions and performs artificial and/or spiking neural network computation according to instructions from the arithmetic/logic operation and control unit; that is, it serves general neural network computation (including artificial neural networks such as MLP, CNN, and RNN, as well as spiking neural networks), receiving data from the storage unit according to such instructions, executing the neural network computation, and saving the results back to the storage unit. The storage unit provides storage space for system communication and computation program data, neural network configuration parameters, intermediate exchange data, and the like. The external interface provides interaction between the brain-like computing system and the external environment: it can obtain information from the environment according to instructions of the arithmetic/logic operation and control unit, raise an interrupt that puts the brain-like computing system into the corresponding processing routine when specific external data arrives, or deliver the operating results of the system to the environment in the form of video, images, audio, and the like.
Preferably, the brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module; the coprocessor component may include at least one artificial neural network coprocessor and at least one spiking neural network coprocessor. In this embodiment, the computing system contains one brain-like co-processing unit whose coprocessor component combines one artificial neural network coprocessor with one spiking neural network coprocessor and is connected to the bus through the interface module for data exchange.
FIG. 3 is a schematic diagram of a second embodiment of the present invention, whose basic structure is largely the same as that of the first embodiment: the brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module. In the second embodiment, however, the coprocessor component contains a hybrid coprocessor that supports both artificial neural network and spiking neural network computation and is connected to the bus through the interface module for data exchange. The coprocessor component may, of course, also include two or more such hybrid coprocessors.
FIG. 4 is a schematic diagram of a third embodiment of the present invention, in which the computing system contains multiple brain-like co-processing units, each connected to the bus for data exchange. The brain-like coprocessor component of each co-processing unit may combine at least one artificial neural network coprocessor with at least one spiking neural network coprocessor as in the first embodiment; may contain at least one hybrid coprocessor supporting both kinds of computation as in the second embodiment; or may include multiple artificial neural network coprocessors, multiple spiking neural network coprocessors, or any combination of an artificial or spiking neural network coprocessor with at least one hybrid coprocessor. As long as the system as a whole includes coprocessors providing both artificial neural network and spiking neural network processing functions, the present invention places no restriction on whether those functions reside in the same module.
When the brain-like coprocessor component includes multiple artificial neural network coprocessors, multiple spiking neural network coprocessors, or multiple hybrid coprocessors, each coprocessor preferably has an extensible interface: coprocessors of the same type are interconnected through their extensible interfaces for data exchange, while coprocessors of different types exchange data through the interface module. FIG. 5 is a schematic diagram of a fourth embodiment of the present invention, in which the computing system contains one brain-like co-processing unit whose coprocessor component includes multiple artificial neural network coprocessors and multiple spiking neural network coprocessors; the artificial and spiking neural network coprocessors exchange data with each other through the interface module, while coprocessors of the same type exchange data through their own extensible interfaces.
The interface module preferably includes a data temporary storage unit comprising several groups of storage regions, the number of groups matching the number of coprocessors connected to the interface module. Through these storage regions, the data temporary storage unit buffers data exchanged between each coprocessor and the storage unit, between each coprocessor and the external interface, and among the coprocessors. Specifically:
1) Buffering of data exchanged between the coprocessors of the brain-like co-processing unit and the storage unit:
The artificial neural network coprocessor and the spiking neural network coprocessor compute in parallel, executing the calculations of many neurons in a single operation, so a large amount of input data is needed each time. Through the interface module, data can be transferred in advance from the storage unit to the interface module by direct memory access (DMA), reducing the latency that data exchange would otherwise introduce while the brain-like co-processing unit is running. The output and intermediate data of the two coprocessors are likewise first stored in the data temporary storage unit and then exchanged with the storage unit over the bus.
2) Buffering of data exchanged between the coprocessors of the brain-like co-processing unit and the external interface:
When specific external data requiring processing by the brain-like co-processing unit arrives, it is sent directly to the interface module for buffering. When the buffered data reaches a preset amount, either the arithmetic/logic operation and control unit is triggered to issue an instruction, or the interface module's own logic activates the brain-like co-processing unit to process the data.
3) Buffering of data exchanged among the coprocessors of the brain-like co-processing unit:
Similarly, when one brain-like co-processing unit needs to send data immediately to another, the data is sent, according to the destination address information pre-configured in the co-processing units by the arithmetic/logic operation and control unit, to the data temporary storage unit of the corresponding co-processing unit to await processing.
When data from one brain-like co-processing unit must wait until another co-processing unit has been running for some time before it can be processed, the sending unit transfers its output data to the storage unit; the arithmetic/logic operation and control unit then, based on computed or preset information, instructs the other co-processing unit at a specific moment to read the data from the storage unit and process it.
When data from several different sources is sent to the data temporary storage unit at the same time, the response priority is: external interface input > other brain-like co-processing units > storage unit. That is, the brain-like co-processing unit responds to data from the external interface with the first priority, to data from other co-processing units with the second priority, and to data from the storage unit with the third priority. While a higher-priority input is writing data into the data temporary storage unit, a lower-priority input waits until that write completes before continuing.
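This fixed-priority arbitration can be summarized by a short sketch; the source labels and the function interface are illustrative only:

```python
# Sketch of the fixed-priority write arbitration into the data temporary
# storage unit: external interface > other co-processing units > storage unit.

PRIORITY = {"external": 0, "peer": 1, "storage": 2}  # lower value = higher priority

def arbitrate(pending):
    """Given pending writes keyed by source, return the order of service."""
    return sorted(pending, key=lambda src: PRIORITY[src])
```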
Further, the data temporary storage unit operates in a ping-pong fashion: for each brain-like coprocessor (artificial or spiking neural network coprocessor) there is a group of two storage regions, and while one is receiving data from the bus, the other is sending its buffered data to the brain-like co-processing unit for processing. FIG. 6 is a schematic diagram of the data temporary storage unit, which includes a first input buffer, a second input buffer, and an output buffer. The first and second input buffers alternate between receiving data from the bus and sending buffered data to the coprocessor: for example, at time t the first input buffer receives data from the bus while the second input buffer sends the data it received at time t-1 to the coprocessor; at time t+1 the first input buffer sends the data it received at time t to the coprocessor while the second input buffer receives data from the bus. The output buffer delivers data processed by the coprocessor to the storage unit, the external interface, or another coprocessor. The working states of the two input buffers are switched according to instructions of the arithmetic/logic operation and control unit or the decision logic of the brain-like co-processing unit itself, so that data can be delivered to the co-processing unit with low latency and the neural network coprocessor can fetch data quickly when it needs to process data over several different time steps.
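The alternation of the two input buffers can be modeled by a minimal ping-pong buffer sketch (illustrative names; the real buffers are hardware storage regions):

```python
# Behavioral sketch of ping-pong input staging: at any time step one bank
# receives from the bus while the other feeds the coprocessor, and the roles
# swap each step. Purely illustrative.

class PingPongBuffer:
    def __init__(self):
        self.banks = [[], []]
        self.write_bank = 0               # bank currently receiving from the bus

    def receive_from_bus(self, data):
        self.banks[self.write_bank].append(data)

    def swap_and_feed(self):
        """Swap roles; return the previously written bank for the coprocessor."""
        feed_bank = self.write_bank
        self.write_bank = 1 - self.write_bank
        out, self.banks[feed_bank] = self.banks[feed_bank], []
        return out
```

Because reception and feeding always use different banks, the coprocessor never waits for the bus and the bus never waits for the coprocessor.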
FIG. 7 is a flowchart of the event-driven operation of the coprocessors of the present invention. The data temporary storage unit switches its ping-pong state to receive new data and checks whether the amount of received data has reached the set value. When it has, the unit checks whether the corresponding coprocessor has finished processing its previous data and is idle; if so, the data is sent to the coprocessor component for computation according to the preset timing. After the data has been sent, the ping-pong buffers swap their read/write states, and the data temporary storage unit checks whether further data remains to be sent to the corresponding coprocessor for processing.
In this way, by combining the interface module's own decision logic with the running state of the coprocessor, the corresponding coprocessor is woken for computation whenever a new task to be processed arrives; when the coprocessor has finished the current task and the next task has not yet been assigned, it stays in a low-power idle state. This realizes event-driven operation of the coprocessors and reduces the overall energy consumption of the computing system.
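The wake/idle decision of FIG. 7 can be summarized by the following illustrative sketch, in which the staged-data list, batch size, and task launcher are hypothetical stand-ins for the hardware logic:

```python
# Sketch of the event-driven wake/idle flow: the staging unit accumulates
# data, and the coprocessor is woken only when a full batch is ready and the
# previous task has finished. Illustrative model only.

def event_driven_step(staged, batch_size, coproc_idle, run_task):
    """Return ("compute", batch) when a task is launched, else ("idle", None)."""
    if len(staged) >= batch_size and coproc_idle:
        batch, rest = staged[:batch_size], staged[batch_size:]
        run_task(batch)                   # wake the coprocessor for this batch
        staged[:] = rest
        return "compute", batch
    return "idle", None                   # stay in the low-power idle state
```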
FIG. 8 is a schematic diagram of a preferred structure of the interface module of the present invention. In addition to the data temporary storage unit shown in FIG. 6, the interface module includes an instruction temporary storage unit, a data format conversion unit, and a coprocessor interface unit. The instruction temporary storage unit has a FIFO (first-in, first-out) storage structure: when the arithmetic/logic operation and control unit sends multiple instructions to be executed consecutively, the instruction temporary storage unit buffers them so that, as soon as the corresponding coprocessor finishes one instruction, the next pending instruction can be executed immediately.
The coprocessor interface unit includes an address-event representation (AER) encoding/decoding unit connected to the spiking neural network coprocessor and a numeric input/output unit connected to the artificial neural network coprocessor. The AER encoding/decoding unit and the numeric input/output unit are interconnected through the data format conversion unit to transfer data; the numeric input/output unit and the data format conversion unit are connected to the bus through the data temporary storage unit for data exchange, while the instruction temporary storage unit is connected to the bus directly for data exchange and sends control instructions to both the spiking neural network coprocessor and the artificial neural network coprocessor.
The interface module communicates with the spiking neural network coprocessor through the AER encoding/decoding unit using AER-style encoding: the output spikes of the neurons in the spiking neural network coprocessor are conveyed as discrete event packets (spiking-neuron event packets), each containing the target address of the spike. When the spiking neural network coprocessor outputs a spiking-neuron event packet, it has delivered one spike to the destination address; if the coprocessor's computation at a given moment produces no spike, no event packet is output. The AER encoding/decoding unit parses the routing information in event packets when receiving the output of the spiking neural network coprocessor, and packs the routing information when sending input to it.
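The AER scheme can be illustrated by a small encode/decode sketch. The packing of a (core, neuron) address into a single event word is a hypothetical layout chosen for illustration, not the format of the actual hardware:

```python
# Sketch of AER-style spike event packets: a spike is represented only by the
# address it targets, and silence produces no packet at all.

def aer_encode(core, neuron, core_bits=8, neuron_bits=16):
    """Pack a target address into a single event word."""
    assert 0 <= core < (1 << core_bits) and 0 <= neuron < (1 << neuron_bits)
    return (core << neuron_bits) | neuron

def aer_decode(event, neuron_bits=16):
    """Unpack an event word back into (core, neuron)."""
    return event >> neuron_bits, event & ((1 << neuron_bits) - 1)

def encode_spikes(fired_addresses):
    """Only firing neurons emit packets; no spike means no event at all."""
    return [aer_encode(c, n) for c, n in fired_addresses]
```

This captures the event-driven economy of AER: bandwidth is consumed only by neurons that actually fire.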
Between the interface module and the artificial neural network coprocessor, multiple artificial-neuron numeric values are transferred directly in continuous batches. The numerical input/output unit receives continuous numeric values from the artificial neural network and stores them in the corresponding region of the data buffer unit; when sending data to the artificial neural network subsystem, it reads the data from the corresponding location of the data buffer unit and transmits it.
The data format conversion unit performs format conversion on the input and output data of the artificial neural network coprocessor and the spiking neural network coprocessor. When artificial-neuron information is fed into the spiking neural network coprocessor, it converts artificial-neuron numeric value information of a given precision into spike-neuron event packet information; when spike-neuron information is fed into the artificial neural network coprocessor, it converts spike-neuron event packets into artificial-neuron numeric value information of a given precision. In other words, the data format conversion unit converts between the artificial-neuron numeric value format and the spike-neuron event packet format.
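A minimal sketch of this bidirectional conversion, assuming a simple rate-coding scheme (the actual coding scheme and precision of the conversion unit are not specified in the description and may differ):

```python
# Illustrative rate-coding conversion between an artificial-neuron
# numeric value and spike events over a fixed time window. Values are
# assumed to lie in [0, 1]; window length and threshold are assumptions.

def value_to_events(value: float, neuron_id: int, window: int = 8):
    """Expand a numeric activation into (timestep, neuron_id) spike events."""
    threshold, acc, events = 1.0, 0.0, []
    for t in range(window):
        acc += value
        if acc >= threshold:       # fire and subtract (subtractive reset)
            acc -= threshold
            events.append((t, neuron_id))
    return events

def events_to_value(events, window: int = 8) -> float:
    """Collapse spike events back to an approximate numeric value."""
    return len(events) / window
```

The round trip is lossy in general: the recovered value is quantized to multiples of 1/window, which is one way the "given precision" in the text can be understood.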
The different interface encodings described above may share the same physical carrier and physical transmission protocol during transmission.
The arithmetic/logic operation and control unit of the brain-like computing system of the present invention is preferably a conventional microprocessor executing general-purpose programs, including but not limited to a CPU, GPU, DSP, or microcontroller. The storage unit is a computer-readable storage medium, which may be, for example but without limitation, an electronic, magnetic, optical, electromagnetic, infrared, or volatile/non-volatile semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), non-volatile memory (NVM) such as phase-change memory (PCM) and resistive random-access memory (RRAM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The arithmetic/logic operation and control unit executes the algorithms and functions of artificial general intelligence other than neural networks (for example, operations required by machine learning algorithms such as data preprocessing and branch/loop logic control), and is also responsible for issuing configuration instructions and other operation instructions to the artificial neural network.
1. The arithmetic/logic operation and control unit sends instructions to the brain-like co-processing unit
The arithmetic/logic operation and control unit (the control unit for short) executes the algorithms and functions of artificial general intelligence other than neural networks (for example, operations required by machine learning algorithms such as data preprocessing and branch/loop logic control), and is responsible for issuing configuration instructions and other operation instructions to the artificial neural network. These operation instructions include but are not limited to updating the configuration of the brain-like co-processing unit, changing the operating state of the co-processing unit, and reading the operating state of the co-processing unit. As described above, the instruction information sent by the control unit to the brain-like co-processing unit is stored in the instruction buffer unit, which has a FIFO storage structure, and is executed after the brain-like coprocessor has finished processing the preceding instructions.
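The first-in-first-out buffering behavior described above can be sketched as follows; the instruction names are illustrative placeholders, not the disclosed instruction set:

```python
# Sketch of the FIFO instruction buffer: the control unit enqueues
# instructions, and the brain-like co-processing unit dequeues and
# executes them strictly in arrival order.
from collections import deque

class InstructionFIFO:
    def __init__(self):
        self._queue = deque()

    def push(self, instruction):      # control-unit side
        self._queue.append(instruction)

    def pop(self):                    # co-processor side; None when empty
        return self._queue.popleft() if self._queue else None
```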
2. The arithmetic/logic operation and control unit updates configuration data to the brain-like co-processing unit
In particular, when the control unit configures data for the brain-like co-processing unit, it first sends a configuration instruction to put the brain-like co-processing unit into the corresponding configuration mode; the brain-like co-processing unit then exchanges data with the storage unit and retrieves the corresponding configuration data, whose address in the storage unit is given by the configuration instruction. When configuration parameters are transferred from the storage unit to the brain-like co-processing unit, the configuration mode is one of broadcast mode, multicast mode, and single mode. Fig. 9 is a flowchart of the data read/configuration instruction sending modes of the present invention: broadcast mode sends to all brain-like co-processing units, multicast mode sends to multiple designated brain-like co-processing units, and single mode sends to one designated brain-like co-processing unit.
Broadcast mode: the storage unit sends the data to the storage regions of all computing units in the artificial/spiking neural network coprocessor. As shown in Fig. 9, the control unit sends a broadcast transfer instruction to the brain-like co-processing unit, which reads the data once from the storage unit and sends it to all computing units.
Multicast mode: the storage unit sends the data to the storage regions of multiple designated computing units in the artificial/spiking neural network coprocessor. As shown in Fig. 9, the control unit sends a multicast transfer instruction to the brain-like co-processing unit, which reads the data once from the storage unit and sends it to the multiple corresponding computing units.
Single mode: the data transferred from the storage unit is sent to the storage region of one designated computing unit in the artificial/spiking neural network coprocessor. As shown in Fig. 9, the control unit sends a single transfer instruction to the brain-like co-processing unit, which reads the data once from the storage unit and sends it to the one corresponding computing unit.
Broadcast mode completes in a single configuration pass, whereas multicast and single modes determine, according to the needs of the computing task, whether to continue configuring other computing units in the brain-like co-processing unit; when further configuration is needed, the flow returns to the step in which the control unit sends a data read/configuration instruction to the brain-like co-processing unit.
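The three delivery modes can be summarized in the following sketch; the unit identifiers are hypothetical, and the function only models which computing units receive the configuration data, not the bus transactions themselves:

```python
# Illustrative dispatch of configuration data: broadcast reaches every
# computing unit with one read from the storage unit, multicast reaches
# a designated subset, and single mode reaches exactly one unit.

def dispatch_config(mode, data, all_units, targets=None):
    """Return a mapping of recipient computing unit -> configuration data."""
    if mode == "broadcast":
        recipients = set(all_units)
    elif mode == "multicast":
        recipients = set(targets) & set(all_units)
    elif mode == "single":
        (unit,) = targets           # exactly one designated unit
        recipients = {unit}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {u: data for u in recipients}
```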
The brain-like coprocessor components in the present invention preferably include an artificial neural network coprocessor and a spiking neural network coprocessor, both of which are dedicated hardware circuit structures.
The artificial neural network coprocessor transmits and processes artificial neural network data of a given precision (higher than the data precision of the spiking neural network coprocessor), achieving high-density parallel computing.
Fig. 10 is a schematic structural diagram of the artificial neural network coprocessor of the present invention. The artificial neural network coprocessor includes multiple artificial neural network computing units operating in parallel, interconnected through an internal bus for data exchange. Each artificial neural network computing unit includes a weight storage unit, a matrix computation unit, a vector computation unit, and an intermediate-value storage unit connected in sequence; the intermediate-value storage unit is also connected to the matrix computation unit. The weight storage unit and the intermediate-value storage unit connect to the internal bus via the data bus to exchange data with other artificial neural network computing units and to feed data to the matrix computation unit. After receiving data, the matrix computation unit performs operations according to the control signal and sends the result to the vector computation unit, which performs the corresponding computation in combination with the control signal and finally writes the result to the intermediate-value storage unit.
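The matrix-then-vector dataflow of one ANN computing unit can be sketched as follows, assuming the matrix unit performs a weight-matrix/input-vector product and the vector unit applies an elementwise activation; the dimensions and the ReLU activation are illustrative choices, not disclosed details:

```python
# Minimal sketch of one artificial neural network computing unit:
# weight storage -> matrix computation -> vector computation ->
# intermediate-value storage.

def ann_unit_step(weights, inputs):
    """weights: list of rows; inputs: vector. Returns the activated output."""
    # matrix computation unit: y = W x
    matvec = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    # vector computation unit: elementwise ReLU (assumed activation)
    return [max(0.0, y) for y in matvec]

# result written back to the intermediate-value storage unit
intermediate_store = ann_unit_step([[1.0, -2.0], [0.5, 0.5]], [2.0, 1.0])
```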
Fig. 11 is a schematic structural diagram of the spiking neural network coprocessor of the present invention. The spiking neural network coprocessor processes input information characterized by one or more of: sparsity, dynamic data flow, rich temporal information, and discrete spike input. It contains multiple spiking neural network computing units operating in parallel and an equal number of routing communication units; each spiking neural network computing unit is connected to one routing communication unit, and the routing communication units are interconnected to form an on-chip routing network for data exchange. Each spiking neural network computing unit includes an axon input unit, a synapse weight storage unit, a control unit, a dendrite computation unit, and a neuron computation unit. The axon input unit receives data from the routing communication unit and sends it to the dendrite computation unit. The axon input unit, synapse weight storage unit, control unit, and neuron computation unit are all connected to the dendrite computation unit, and the control unit is connected to the axon input unit and the neuron computation unit respectively. The dendrite computation unit computes on the received axon input data and the data transmitted by the synapse weight storage unit, and sends the result to the neuron computation unit for further computation; the final result is sent through the routing communication unit to other spiking neural network computing units for data exchange.
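A per-timestep sketch of one such computing unit, using a simple leaky integrate-and-fire neuron as a stand-in for the neuron computation unit; the leak factor, threshold, and reset rule are assumptions for the sketch:

```python
# Illustrative dendrite + neuron computation for one spiking neuron:
# the dendrite unit accumulates weighted axon inputs, and the neuron
# unit integrates, leaks, and fires when a threshold is crossed.

def snn_unit_step(v, axon_spikes, synapse_weights,
                  leak=0.9, threshold=1.0):
    """Return (new membrane potential, fired?) for one neuron."""
    # dendrite computation unit: weighted sum of incoming spike flags
    dendrite = sum(w for w, s in zip(synapse_weights, axon_spikes) if s)
    v = leak * v + dendrite          # neuron unit: leaky integration
    if v >= threshold:               # fire and reset to zero
        return 0.0, True
    return v, False
```

When `fired` is true, the spike would be handed to the routing communication unit as an AER event packet; when false, nothing is emitted, matching the event-driven behavior described earlier.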
When the computing system includes multiple brain-like co-processing units, the arithmetic/logic operation and control unit pre-assigns a destination address to each brain-like co-processing unit. When two or more brain-like co-processing units need to exchange data, the unit assigned the first destination address sends data to the unit corresponding to the second destination address by identifying that address. When the second-destination-address unit cannot process the data from the first-destination-address unit in time, the first unit sends the data to the storage unit, and the arithmetic/logic operation and control unit selects a particular moment to instruct the second unit to read the data from the storage unit and process it.
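This overflow path can be sketched as follows; the names and the busy-flag mechanism are hypothetical, and only the decision logic (deliver directly or park in shared storage for later reading) is modeled:

```python
# Illustrative inter-unit transfer with deferral through the storage
# unit when the destination co-processing unit cannot accept data.

def send(data, dest, busy, storage):
    """Deliver directly, or park the data in storage keyed by destination."""
    if busy.get(dest):
        storage.setdefault(dest, []).append(data)  # defer via storage unit
        return "deferred"
    return "delivered"

def drain(dest, storage):
    """The control unit later directs `dest` to read its deferred data."""
    return storage.pop(dest, [])
```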
The brain-like computing system of the present invention is in essence a heterogeneous brain-like computer architecture: an arithmetic/logic operation and control unit built from a conventional microprocessor cooperates with brain-like co-processing units that support efficient artificial neural network and spiking neural network computation, dividing up and efficiently executing the different tasks of general artificial intelligence computing. The system facilitates the use of brain-like co-processing units in practical application scenarios: through the microprocessor-based arithmetic/logic operation and control unit, the brain-like coprocessors can be flexibly programmed and configured, and the tasks they process can be changed online in real time. Furthermore, based on the computational characteristics and data-access requirements of the brain-like co-processing unit, an interface module is preferably designed that supports its continuous high-speed execution; each coprocessor of the brain-like coprocessor components switches between a computing state and a low-power idle state according to the interface module's logic and its own operating state. This enables fast, efficient, and convenient data exchange between the brain-like co-processing units and the arithmetic/logic operation and control unit, the storage unit, the external interface, and one another, reducing the operating power consumption of the whole system.
The foregoing are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions readily conceivable by those skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

  1. A brain-like computing system, characterized by comprising an arithmetic/logic operation and control unit, a brain-like co-processing unit, a storage unit, an external interface, and a bus connecting the units and the external interface; the arithmetic/logic operation and control unit is configured to program and configure the brain-like co-processing unit, perform arithmetic or logic operations, and control the operation and data exchange of the other units via the bus; the brain-like co-processing unit has artificial neural network processing and spiking neural network processing functions, and is configured to perform artificial neural network computation and spiking neural network computation according to instructions from the arithmetic/logic operation and control unit and save the computation results to the storage unit; the external interface is configured to provide interaction information between the brain-like computing system and the external environment.
  2. The computing system of claim 1, wherein the brain-like co-processing unit includes an interface module connected to the bus and a brain-like coprocessor component connected to the interface module,
    the brain-like coprocessor component including at least one artificial neural network coprocessor and at least one spiking neural network coprocessor;
    or the brain-like co-processing component including at least one hybrid coprocessor supporting both artificial neural network and spiking neural network computation;
    or the brain-like co-processing component including at least one artificial neural network coprocessor, at least one spiking neural network coprocessor, and at least one hybrid coprocessor supporting both artificial neural network and spiking neural network computation.
  3. The computing system of claim 1 or 2, wherein the arithmetic/logic operation and control unit is a CPU, GPU, DSP, and/or microcontroller;
    the external interface obtains information from the external environment according to instructions from the arithmetic/logic operation and control unit, or controls the brain-like computing system to execute a corresponding process when the external environment sends specific data, or sends the operation results of the brain-like computing system to the external environment.
  4. The computing system of claim 2 or 3, wherein when the brain-like coprocessor component includes multiple artificial neural network coprocessors, multiple spiking neural network coprocessors, or multiple hybrid coprocessors, each coprocessor has an extensible interface; multiple coprocessors of the same type are interconnected through their respective extensible interfaces for data exchange, and coprocessors of different types exchange data through the interface module.
  5. The computing system of any one of claims 2-4, wherein the artificial neural network coprocessor includes multiple parallel artificial neural network computing units interconnected through an internal bus for data exchange; each artificial neural network computing unit includes a weight storage unit, a matrix computation unit, a vector computation unit, and an intermediate-value storage unit connected in sequence, the intermediate-value storage unit being connected to the matrix computation unit.
  6. The computing system of any one of claims 2-5, wherein the spiking neural network coprocessor includes multiple spiking neural network computing units operating in parallel and an equal number of routing communication units; each spiking neural network computing unit is connected to one routing communication unit, and the routing communication units are interconnected to form an on-chip routing network for data exchange; each spiking neural network computing unit includes an axon input unit, a synapse weight storage unit, a control unit, a dendrite computation unit, and a neuron computation unit, wherein the axon input unit, synapse weight storage unit, control unit, and neuron computation unit are all connected to the dendrite computation unit, and the control unit is connected to the axon input unit and the neuron computation unit respectively.
  7. The computing system of any one of claims 2-6, wherein each coprocessor of the brain-like coprocessor component switches between a computing state and a low-power idle state according to the logic of the interface module and its own operating state.
  8. The computing system of any one of claims 2-7, wherein the interface module includes a data buffer unit, an instruction buffer unit, a data format conversion unit, and a coprocessor interface unit; the data buffer unit includes several groups of storage regions, the number of groups being equal to the number of coprocessors connected to the interface module, for buffering data exchanged between each coprocessor and the storage unit, between each coprocessor and the external interface, and among the coprocessors; the instruction buffer unit has a first-in first-out storage structure for buffering multiple pending instructions sent from the arithmetic/logic operation and control unit.
  9. The computing system of claim 8, wherein the storage region includes a first input buffer, a second input buffer, and an output buffer; the first and second input buffers alternately perform the two tasks of receiving data from the bus and sending buffered data to the coprocessor, and the output buffer outputs data processed by the coprocessor to the storage unit, the external interface, or another coprocessor.
  10. The computing system of claim 8, wherein when the brain-like coprocessor component includes an artificial neural network coprocessor and a spiking neural network coprocessor, the coprocessor interface unit includes an address-event encoding/decoding unit connected to the spiking neural network coprocessor and a numerical input/output unit connected to the artificial neural network coprocessor; the address-event encoding/decoding unit and the numerical input/output unit are interconnected through the data format conversion unit for data transfer, and the data format conversion unit converts between the artificial-neuron numeric value format and the spike-neuron event packet format.
  11. The computing system of claim 10, wherein the numerical input/output unit and the data format conversion unit connect to the bus through the data buffer unit for data exchange, and the instruction buffer unit connects directly to the bus for data exchange and sends control instructions to the spiking neural network coprocessor and the artificial neural network coprocessor.
  12. The computing system of claim 1, wherein when the computing system includes multiple brain-like co-processing units, the arithmetic/logic operation and control unit pre-assigns a destination address to each brain-like co-processing unit; when data exchange is required between brain-like co-processing units, the unit assigned the first destination address sends data to the unit corresponding to the second destination address by identifying that address.
  13. The computing system of claim 12, wherein when the second-destination-address brain-like co-processing unit cannot process data from the first-destination-address brain-like co-processing unit in time, the first unit sends the data to the storage unit, and the arithmetic/logic operation and control unit selects a particular moment to instruct the second unit to read the data from the storage unit and process it.
  14. The computing system of claim 12 or 13, wherein the brain-like co-processing unit responds to and processes data from the external interface with a first priority, data from other brain-like co-processing units with a second priority, and data from the storage unit with a third priority.
  15. The computing system of claim 14, wherein the brain-like co-processing unit reads data/configuration data from the corresponding location of the storage unit according to a data read/configuration instruction issued by the arithmetic/logic operation and control unit; the data read/configuration instruction is sent in a broadcast mode to all brain-like co-processing units, in a multicast mode to multiple designated brain-like co-processing units, or in a single mode to one designated brain-like co-processing unit.
PCT/CN2019/121453 2018-12-29 2019-11-28 Brain-like computing system WO2020134824A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811644637.9A CN109858620B (en) 2018-12-29 2018-12-29 Brain-like computing system
CN201811644637.9 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020134824A1 true WO2020134824A1 (en) 2020-07-02

Family

ID=66893383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121453 WO2020134824A1 (en) 2018-12-29 2019-11-28 Brain-like computing system

Country Status (2)

Country Link
CN (1) CN109858620B (en)
WO (1) WO2020134824A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858620B (en) * 2018-12-29 2021-08-20 北京灵汐科技有限公司 Brain-like computing system
US11443195B2 (en) 2019-02-19 2022-09-13 Volodymyr Bykov Domain-based dendral network
CN110991626B (en) * 2019-06-28 2023-04-28 广东工业大学 Multi-CPU brain simulation system
CN110322010B (en) * 2019-07-02 2021-06-25 深圳忆海原识科技有限公司 Pulse neural network operation system and method for brain-like intelligence and cognitive computation
CN110378475B (en) * 2019-07-08 2021-08-06 浙江大学 Multi-bit parallel binary synapse array-based neuromorphic computing circuit
CN111082949B (en) * 2019-10-29 2022-01-28 广东工业大学 Method for efficiently transmitting pulse data packets in brain-like computer
CN112905525B (en) * 2019-11-19 2024-04-05 中科寒武纪科技股份有限公司 Method and equipment for controlling computing device to perform computation
CN110990060B (en) * 2019-12-06 2022-03-22 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN111325321B (en) * 2020-02-13 2023-08-29 中国科学院自动化研究所 Brain-like computing system based on multi-neural network fusion and execution method of instruction set
CN112188093B (en) * 2020-09-24 2022-09-02 北京灵汐科技有限公司 Bimodal signal fusion system and method
CN112269606B (en) * 2020-11-12 2021-12-07 浙江大学 Application processing program dynamic loading method of brain-like computer operating system
CN112686381A (en) * 2020-12-30 2021-04-20 北京灵汐科技有限公司 Neural network model, method, electronic device, and readable medium
CN112966814B (en) * 2021-03-17 2023-05-05 上海新氦类脑智能科技有限公司 Information processing method of fusion impulse neural network and fusion impulse neural network
CN113222134B (en) * 2021-07-12 2021-10-26 深圳市永达电子信息股份有限公司 Brain-like computing system, method and computer readable storage medium
CN114399033B (en) * 2022-03-25 2022-07-19 浙江大学 Brain-like computing system and method based on neuron instruction coding
CN114781633B (en) * 2022-06-17 2022-10-14 电子科技大学 Processor fusing artificial neural network and impulse neural network
CN116155843B (en) * 2023-02-01 2024-04-16 北京大学 PYNQ-based pulse neural network chip data communication method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809498A (en) * 2014-01-24 2015-07-29 清华大学 Brain-like coprocessor based on neuromorphic circuit
CN104809501A (en) * 2014-01-24 2015-07-29 清华大学 Computer system based on brain-like coprocessor
CN105095961A (en) * 2015-07-16 2015-11-25 清华大学 Mixing system with artificial neural network and impulsive neural network
CN105095967A (en) * 2015-07-16 2015-11-25 清华大学 Multi-modal neuromorphic network core
US20180225565A1 (en) * 2017-02-06 2018-08-09 International Business Machines Corporation Voltage controlled highly linear resistive elements
CN109858620A (en) * 2018-12-29 2019-06-07 北京灵汐科技有限公司 Brain-like computing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8515885B2 (en) * 2010-10-29 2013-08-20 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
CN104732274A (en) * 2015-03-10 2015-06-24 华南理工大学 Intelligent computer
CN105095966B (en) * 2015-07-16 2018-08-21 北京灵汐科技有限公司 Hybrid system of artificial neural networks and spiking neural networks
US10878313B2 (en) * 2017-05-02 2020-12-29 Intel Corporation Post synaptic potential-based learning rule

Also Published As

Publication number Publication date
CN109858620A (en) 2019-06-07
CN109858620B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2020134824A1 (en) Brain-like computing system
CN109542830B (en) Data processing system and data processing method
TWI634489B (en) Multi-layer artificial neural network
CN112101517B (en) FPGA implementation method based on a piecewise-linear spiking neuron network
US11016810B1 (en) Tile subsystem and method for automated data flow and data processing within an integrated circuit architecture
EP1569167A2 (en) Neural processing element for use in a neural network
WO2020078470A1 (en) Network-on-chip data processing method and device
CN109472356A (en) Reconfigurable neural network algorithm accelerator and method
CN111105023B (en) Data stream reconstruction method and reconfigurable data stream processor
EP3144820A1 (en) Inter-cluster data communication network for a dynamic shared communication platform
CN117195989B (en) Vector processor, neural network accelerator, chip and electronic equipment
CN209231976U (en) Accelerator for a reconfigurable neural network algorithm
Huang et al. IECA: An in-execution configuration CNN accelerator with 30.55 GOPS/mm² area efficiency
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
CN114548390A (en) RISC-V and nerve morphology calculation-based heterogeneous architecture processing system
Fang et al. Spike trains encoding optimization for spiking neural networks implementation in fpga
CN114239806A (en) RISC-V structured multi-core neural network processor chip
US20190272460A1 (en) Configurable neural network processor for machine learning workloads
Yang et al. Unicorn: A multicore neuromorphic processor with flexible fan-in and unconstrained fan-out for neurons
Aung et al. Deepfire2: A convolutional spiking neural network accelerator on fpgas
CN109542513A (en) Convolutional neural network instruction data storage system and method
CN112180788B (en) Control platform architecture design method, storage medium and device of dynamic association context
CN114595813A (en) Heterogeneous acceleration processor and data calculation method
James et al. Design of low-cost, real-time simulation systems for large neural networks
WO2020051918A1 (en) Neuronal circuit, chip, system and method therefor, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19906568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19906568

Country of ref document: EP

Kind code of ref document: A1