CN112766475B - Processing component and artificial intelligence processor - Google Patents

Processing component and artificial intelligence processor

Info

Publication number
CN112766475B
CN112766475B (application CN202011565319.0A)
Authority
CN
China
Prior art keywords
unit
processing
data
processing result
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011565319.0A
Other languages
Chinese (zh)
Other versions
CN112766475A (en)
Inventor
裴京 (Pei Jing)
施路平 (Shi Luping)
王冠睿 (Wang Guanrui)
马骋 (Ma Cheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011565319.0A
Publication of CN112766475A
Application granted
Publication of CN112766475B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure relates to a processing component and an artificial intelligence processor. The processing component comprises a control unit, an axon unit, a dendrite unit, a cell body unit and a routing unit. The control unit generates control instructions according to preset control information to control the axon unit, the dendrite unit, the cell body unit and the routing unit. The axon unit reads the processing data stored in the storage component and obtains first data through data conversion. The dendrite unit obtains a first processing result according to the first data; the cell body unit reads the first processing result and/or the processing data to obtain a second processing result; and the routing unit can send the first processing result, the second processing result and/or the processing data. According to the processing component of the embodiments of the disclosure, the processing component and the storage component can be arranged within a computing core, so that no storage component outside the core needs to be read or written; the resource occupation of reading and writing data is reduced, and the operation requirements of both accurate computation on large amounts of data and fast response can be met.

Description

Processing component and artificial intelligence processor
Technical Field
The present disclosure relates to the field of computers, and more particularly, to a processing component and an artificial intelligence processor.
Background
In existing computer architectures, parallel computation in artificial intelligence is typically accelerated by a conventional CPU (Central Processing Unit) in combination with a GPU (Graphics Processing Unit).
In addition, existing neural network hardware acceleration computing systems mainly follow two schemes: systems that compute based on artificial neural networks and systems that compute based on spiking neural networks.
Computing systems based on artificial neural networks build on the existing computing architecture and accelerate a specific class of algorithms or scenarios by optimizing the computation and memory architecture, thereby optimizing computing speed, power consumption, cost and so on in that scenario. Specific implementations include conventional computing systems such as GPUs, FPGAs (Field Programmable Gate Arrays), DSPs (Digital Signal Processors) and many-core processors, as well as the design of application-specific integrated circuit (ASIC) chips. Algorithms executed by such systems include various algorithms based on artificial neural networks, such as fully connected networks (MLPs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for handling problems in computer vision, natural language processing and system control.
Computing systems based on spiking neural networks draw on the information structure and operating rules of human brain processing, and are designed differently from the traditional von Neumann architecture. They generally adopt the idea of localized integration of storage and computation, using an interconnection structure modeled on human brain neurons to store information in synaptic connections; meanwhile, the system adopts a decentralized design in which different neurons can compute in parallel; the neurons transmit information using spike sequences (with 0 and 1 respectively indicating the absence or presence of a spike). Specific implementations include analog, digital or mixed-signal VLSI (very large scale integrated circuits), and circuit systems that realize neuron or synapse models based on novel materials or novel electronic components; they are mainly used for low-power problems such as perception, real-time response, gesture recognition and optical-flow information processing.
However, current computer structures that can perform artificial intelligence tasks are built on a combination of a CPU and a GPU; the GPU is expensive and consumes a great deal of energy, and since it is not specifically optimized for neural network tasks, its computational efficiency is not necessarily high when processing different tasks. For biologically inspired artificial intelligence algorithms such as spiking neural networks, its computational efficiency is very low, so computing structures built from CPUs and GPUs cannot efficiently complete the computing tasks of artificial general intelligence.
In addition, whether in artificial neural network computing systems or spiking neural network computing systems, a single computing system is optimized for a single class of algorithms and problems, and a single neural network computing paradigm cannot cope with the complex task scenarios of artificial general intelligence. Artificial neural networks are weak at processing sequence information, low-power event-driven response and real-time problems; spiking neural networks are weak at accurate computation and at data-intensive computation over large volumes of data. A single computing system cannot meet the computational requirements of scenarios that demand both accurate numerical processing and fast response.
Disclosure of Invention
In view of this, the present disclosure proposes a processing component and an artificial intelligence processor.
According to an aspect of the present disclosure, there is provided a processing component. The processing component is applied to a computing core of an artificial intelligence processor, the artificial intelligence processor comprises a plurality of computing cores, and each computing core comprises a processing component and a storage component. The processing component comprises a control unit, an axon unit, a dendrite unit, a cell body unit and a routing unit. The control unit is used for generating a control instruction according to preset control information so as to control at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit. The axon unit is used for, under the condition that a control instruction of the control unit is received, reading the processing data and/or the weight data stored in the storage component and performing data conversion processing on the processing data and/or the weight data to obtain first data. The dendrite unit is used for reading the first data under the condition that a control instruction of the control unit is received, and obtaining a first processing result according to the first data. The cell body unit is used for reading the first processing result and/or the processing data under the condition that a control instruction of the control unit is received, obtaining a second processing result according to the first processing result and/or the processing data, and sending the first processing result, the processing data and/or the second processing result through the routing unit.
In one possible implementation, the obtaining, by the dendrite unit, of a first processing result according to the first data includes: performing multiply-add array processing on the first data to obtain the first processing result.
In one possible implementation, the axon unit is further configured to: read the first processing result; and write the first processing result into the storage component.
In one possible implementation manner, the storage component stores operation parameters, and the cell body unit obtaining a second processing result according to the first processing result and/or the processing data includes: reading the processing data and/or the first processing result and the operation parameters stored in the storage component; performing activation processing on the processing data and/or the first processing result according to the operation parameters to obtain the second processing result; and writing the second processing result into the storage component.
In a possible implementation, the storage component includes an output buffer space, and the cell body unit is further configured to: read the processing data, the first processing result and/or the second processing result stored in the data storage space under the condition that a control instruction of the control unit is received; and write the processing data, the first processing result and/or the second processing result into the output buffer space.
In a possible implementation, the storage component stores operation parameters including a routing table, and the routing unit is further configured to: under the condition that a control instruction of the control unit is received, read the routing table stored in the data storage space and the processing data, the first processing result and/or the second processing result stored in the output buffer space; and send the processing data, the first processing result and/or the second processing result according to the routing table.
In a possible implementation, the routing unit is further configured to: receiving communication data, the communication data comprising data from other computing cores; and writing the communication data into the storage component.
In a possible implementation manner, the control instruction includes instruction set address information and instruction index information, where the instruction index information is used to indicate whether the instruction corresponding to an index is executed, and the control unit is further configured to: send the instruction set address information and the instruction index information to at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit, so that the at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit accesses the instruction set based on the instruction set address information, acquires the instruction to be executed, and executes the instruction to be executed according to the instruction index information.
According to another aspect of the present disclosure, an artificial intelligence processor is provided that includes a plurality of computing cores, each of which includes a processing component and a storage component.
According to the processing component of the embodiments of the disclosure, the processing component and the storage component can be arranged within the computing core, so that the storage component directly serves the read and write accesses of the processing component. No storage component outside the core needs to be read or written, which reduces the resource occupation and power consumption of reading and writing data and improves processing efficiency; the design suits an artificial intelligence processor with a many-core architecture and can meet the operation requirements of accurate computation on large amounts of data and fast response. The axon unit, the dendrite unit, the cell body unit and the routing unit take on different functions such as state control, data storage, matrix processing, multiply-add calculation and nonlinear operation; data can be transferred through the axon unit, tasks can be executed in parallel in pipeline form, and the computing tasks of artificial general intelligence can be completed efficiently.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a schematic diagram of a computing core according to an embodiment of the present disclosure.
Fig. 2 illustrates an application schematic of a computing core according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of a processing device according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. The same reference numbers in the drawings indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
FIG. 1 shows a schematic diagram of a computing core according to an embodiment of the present disclosure. Processing elements according to embodiments of the present disclosure are applied to a compute core of an artificial intelligence processor that includes a plurality of compute cores.
As shown in Fig. 1, each computing core includes a processing component and a storage component, the processing component including: a control unit 11, an axon unit 12, a dendrite unit 13, a cell body unit 14 and a routing unit 15,
the control unit 11 is configured to:
generating a control instruction according to preset control information to control at least one of an axon unit, a dendrite unit, a cell body unit and a routing unit,
the axon unit 12 is for:
under the condition that a control instruction of the control unit is received, reading processing data and/or weight data stored in the storage component, and performing data conversion processing on the processing data and/or weight data to obtain first data;
the dendrite unit 13 is used for:
under the condition that a control instruction of the control unit is received, reading the first data, and obtaining a first processing result according to the first data;
the cell body unit 14 is used for:
reading the first processing result and/or the processing data under the condition that a control instruction of the control unit is received;
obtaining a second processing result according to the first processing result and/or the processing data;
and sending the first processing result, the processing data and/or the second processing result via the routing unit 15.
According to the processing component of the embodiments of the disclosure, the processing component and the storage component can be arranged within the computing core, so that the storage component directly serves the read and write accesses of the processing component. No storage component outside the core needs to be read or written, which reduces the resource occupation and power consumption of reading and writing data and improves processing efficiency; the design suits an artificial intelligence processor with a many-core architecture and can meet the operation requirements of accurate computation on large amounts of data and fast response. The axon unit, the dendrite unit, the cell body unit and the routing unit take on different functions such as state control, data storage, matrix processing, multiply-add calculation and nonlinear operation, and can complete the computing tasks of artificial general intelligence efficiently.
In one possible implementation, the processing component is applied to a computing core of an artificial intelligence processor. The artificial intelligence processor may be a brain-inspired computing chip, that is, a chip that takes the brain's processing mode as a reference and simulates the transmission and processing of information by neurons in the brain, thereby improving processing efficiency and reducing power consumption. The artificial intelligence processor may include a plurality of computing cores, and the computing cores can independently process different tasks or process the same task in parallel to improve processing efficiency. Inter-core information transfer between computing cores may be performed through the routing units within the computing cores. Within a computing core, a processing component and a storage component may be provided; the processing component can simulate the way brain neurons process information and may be divided into a control unit, an axon unit, a dendrite unit, a cell body unit and so on, where these units each access the storage component to exchange data with the in-core storage component, and each may undertake its own data processing tasks and/or data transmission tasks to obtain a data processing result or to communicate with other computing cores. The present disclosure does not limit the application field of the storage component.
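For orientation, the following is a minimal structural sketch of this core layout in Python. It is an illustration under stated assumptions only: the pairing of one storage component with one processing component per core comes from the text above, while every class and field name is invented for the sketch.

    from dataclasses import dataclass, field

    @dataclass
    class StorageComponent:
        data: dict = field(default_factory=dict)            # processing data and weights
        params: dict = field(default_factory=dict)          # activation / LUT / routing-table parameters
        output_buffer: list = field(default_factory=list)   # results awaiting the routing unit

    @dataclass
    class ComputeCore:
        # the five units of the processing component all read and write
        # `memory` directly, so no off-core storage access is needed
        memory: StorageComponent = field(default_factory=StorageComponent)

    cores = {cid: ComputeCore() for cid in range(4)}        # a tiny many-core array
    cores[0].memory.data["weights"] = [0.5, -0.25]          # weights live inside the core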
Fig. 2 illustrates a schematic diagram of a parallel computing method. As shown in Fig. 2, in the related art, parallel computing in artificial intelligence is typically accelerated by a CPU and a GPU in combination; for example, the CPU and the GPU may access data in external memory units through a bus, and may communicate with other devices through the bus and external interfaces. However, this approach is generally limited by the data transmission speed, so that data reading and writing are slow, which limits processing efficiency; it also has higher energy consumption and lower computational efficiency.
In one possible implementation, to address the above problems, the artificial intelligence processor of the present disclosure may include a plurality of computing cores, and each computing core may include a processing component and a storage component; that is, the processing component can access the storage component within the computing core, which increases the read-write speed, reduces the limitation from data transmission, and improves processing efficiency.
In one possible implementation, the computing core may include a control unit, an axon unit, a dendrite unit, a cell body unit and a routing unit. The control unit may control the axon unit, the dendrite unit, the cell body unit and the routing unit through control instructions; for example, it may control the axon unit to read data from the storage component and transmit the data to the dendrite unit for matrix operations. The control instruction includes instruction set address information and instruction index information, where the instruction index information is used to indicate whether the instruction corresponding to an index is executed. The control unit is further configured to: send the instruction set address information and the instruction index information to at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit, so that the at least one of these units accesses the instruction set based on the instruction set address information, acquires the instruction to be executed, and executes it according to the instruction index information.
In an example, the control unit may be an instruction processor. The instruction processor may call the axon unit, the dendrite unit, the cell body unit and the routing unit according to an instruction (e.g., a static primitive group): it may send a control instruction to the above units, where the control instruction includes instruction set address information (e.g., an address pointer) and instruction index information. Together these two pieces of information control which units execute the instruction; a unit that needs to execute accesses the instruction set according to the instruction set address information, obtains the full primitive code, and executes it.
In an example, the control unit may send control instructions to the above units, which may fetch instructions according to the instruction set address information. For example, an instruction may be a primitive: a unit can obtain the full primitive code it needs to execute according to the instruction set address information, and determine whether to execute it based on the instruction index information. In this way the control unit can allocate tasks and execute them in parallel across the spatial and temporal dimensions, and can determine what each unit executes according to the available processing resources, balancing computation cost and improving operating efficiency.
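To make the dispatch mechanism concrete, here is a small Python sketch. It assumes, purely for illustration, that the instruction index information is a bitmap with one bit per unit and that the instruction set is a table keyed by address; neither encoding is specified by the patent.

    from dataclasses import dataclass

    UNITS = ("axon", "dendrite", "soma", "router")

    @dataclass
    class ControlInstruction:
        inst_set_addr: int   # where each unit fetches its full primitive code
        index_bits: int      # bit i set -> UNITS[i] executes this time

    def dispatch(ci, instruction_set):
        # each selected unit checks its index bit, fetches its primitive, and executes it
        for i, name in enumerate(UNITS):
            if ci.index_bits & (1 << i):
                primitive = instruction_set[ci.inst_set_addr][name]
                print(f"{name} executes: {primitive}")

    instruction_set = {0x10: {"axon": "load+convert", "dendrite": "mac_array",
                              "soma": "lut_activate", "router": "send"}}
    dispatch(ControlInstruction(0x10, 0b0011), instruction_set)  # axon and dendrite only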
For example, the control unit may send control instructions to the axon unit, the dendrite unit, the cell body unit and the routing unit respectively. The instruction index information determines which units need to execute instructions. For example, if the axon unit and the dendrite unit need to execute instructions, they can access the instruction set based on the instruction set address information, acquire the instructions to be executed (i.e., the full primitive code), execute them, and obtain an execution result. In the next operation, the control unit may again send control instructions to the above units; according to the instruction index information, the cell body unit may execute an instruction that performs a further operation on the execution result of the previous operation, so the cell body unit can find the instruction based on the instruction set address information and process the previous execution result to obtain a processing result. At the same time, the axon unit and the dendrite unit can execute the same instructions as in the previous operation on other data to obtain the execution result of the current operation. The operation may then continue: in the third operation, the routing unit may execute an instruction according to the instruction index information, for example an instruction to forward the processing result produced by the cell body unit in the previous operation; meanwhile, the cell body unit further processes the execution results of the axon unit and the dendrite unit from the previous operation, and the axon unit and the dendrite unit again execute the same instructions on other data, and so on.
In an example, the control unit may control the axon unit to execute a computation primitive (e.g., a convolution primitive) that outputs one or more rows of results per execution (e.g., one or more rows of pixels of the output feature map), and may control the cell body unit to execute a processing primitive (e.g., a pooling primitive) on those results. Meanwhile, the axon unit can continue to compute the pixels of other rows of the feature map; in this way the units execute in parallel in pipeline form, and the processed feature map is sent out through the routing unit.
In an example, the control unit may control the axon unit to execute a computation primitive (e.g., a vector accumulation primitive); each time a certain amount of output has been accumulated (e.g., enough for one input of the cell body unit), the cell body unit may perform a carrying function, such as carrying the output to the output buffer space, and the routing unit may send the data in the output buffer space. Meanwhile, the axon unit can execute the next accumulation processing; that is, the units can execute in parallel in pipeline form.
In an example, the control unit may control the axon unit, the dendrite unit, the cell body unit and the routing unit simultaneously. For example, when the processing of the units has no data dependence (i.e., no unit needs to wait for the data processing results of other units), the control unit may simultaneously control the axon unit to perform data reading and conversion, the dendrite unit to perform matrix operations, the cell body unit to perform activation processing, and the routing unit to perform data communication. The present disclosure does not limit the manner in which the control unit controls the axon unit, the dendrite unit, the cell body unit and the routing unit.
In one possible implementation, the axon unit may be used to interact with the storage component, e.g., may read processing data and/or weight data in the storage component, or write processing results to the storage component.
In an example, the storage component may include a plurality of storage units, where a first storage unit may be used to store the processing data and/or the weight data. For example, the first storage unit may include one or more SRAMs, where part of the SRAMs may be used to store processing data (e.g., matrix and vector data awaiting processing) and part may be used to store the weight data of the neural network.
In an example, the artificial intelligence processor may perform the processing tasks of a neural network, and the axon unit may read the weight data and the processing data stored in the first storage unit and perform data conversion processing to obtain the first data. For example, the weight data may be converted into vector or matrix form so that the dendrite unit can perform multiply-add operations on matrices or vectors.
In one possible implementation manner, the dendrite unit may be configured to perform matrix operations on the read processing data and/or weight data, where the dendrite unit obtaining a first processing result according to the first data includes: performing multiply-add array processing on the first data to obtain the first processing result.
In an example, the dendrite unit may have a parallel processing mechanism and may perform multiply-add array processing on the first data. For example, after the weight data is converted into matrix or vector form, the dendrite unit may perform a vector multiply-add or matrix multiply-add on the converted weight data and the processing data to obtain the first processing result.
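The axon-to-dendrite data path described above can be sketched as follows, assuming for illustration that the stored weights are a flat integer array, that data conversion is a reshape into matrix form, and that the multiply-add array behaves like a matrix-vector product; real shapes and memory layouts are not specified by the patent.

    import numpy as np

    def axon_convert(flat_weights, rows, cols):
        # data conversion: flat storage words -> matrix form (the "first data")
        return np.asarray(flat_weights, dtype=np.int32).reshape(rows, cols)

    def dendrite_mac(weight_matrix, input_vector, accumulator=None):
        # multiply-add array modeled as one matrix-vector product,
        # optionally accumulated onto a previous partial sum
        result = weight_matrix @ np.asarray(input_vector, dtype=np.int32)
        return result if accumulator is None else accumulator + result

    W = axon_convert(range(12), rows=3, cols=4)
    first_result = dendrite_mac(W, [1, 0, 2, 1])   # the "first processing result"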
In one possible implementation, after the first processing result is obtained, the axon unit may read the first processing result and write it into the storage component. That is, the axon unit is also used for: reading the first processing result; and writing the first processing result into the storage component.
In an example, the axon unit and the storage component may be connected by two sets of data buses, used for read-only access and read-write access to the storage component respectively. For example, if the processing data or the weight data does not need to be modified, read-only access to the storage component can be made via the data bus for read-only access. If the weight data needs to be modified or the first processing result needs to be written into the storage component, read-write access can be made via the data bus for read-write access.
In one possible implementation, the weight data may also be changed through the data bus for read-write access. For example, in a neural network training task, each training step may change the weight data, and the changed weight data may be written into the storage component through the read-write data bus to replace the original weight data, making the weights of the neural network more accurate.
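A small sketch of this two-bus pattern follows, assuming one read-only port for fetching unmodified data and one read-write port for weight updates and write-backs; the port names and the dict-backed memory are inventions of the sketch, not part of the patent.

    class StoragePorts:
        def __init__(self, memory):
            self._mem = memory          # the in-core storage component

        def ro_read(self, key):
            # read-only bus: fetch processing data or weights without modification
            return self._mem[key]

        def rw_write(self, key, value):
            # read-write bus: update weights or write back processing results
            self._mem[key] = value

    mem = {"weights": [0.1, 0.2]}
    ports = StoragePorts(mem)
    w = ports.ro_read("weights")
    ports.rw_write("weights", [wi * 0.99 for wi in w])   # e.g. a training-step update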
In this way, data interaction can be performed between the axon unit and the storage component within the computing core, and parallel multiply-add array processing can be performed by the dendrite unit, which improves data processing efficiency and reduces the resource occupation of data interaction.
In one possible implementation, the cell body unit may be used to implement data transfer and nonlinear operations within a computing core. Data transfer includes moving data between different storage units of the storage component or between multiple units of the processing component. The nonlinear operations may include tensor comparison, LUT (look-up table) activation functions, LIF (leaky integrate-and-fire) neurons, and other nonlinear operations.
In a possible implementation manner, the storage component further stores operation parameters, and the cell body unit obtaining a second processing result according to the first processing result and/or the processing data includes: reading the processing data and/or the first processing result and the operation parameters stored in the storage component; performing activation processing on the processing data and/or the first processing result according to the operation parameters to obtain the second processing result; and writing the second processing result into the storage component.
In one possible implementation, the storage component may include a second storage unit, which may be used to store operation parameters, e.g., parameters of activation functions, look-up tables and routing tables; the cell body unit can perform nonlinear operations based on these parameters. For example, the first processing result may be an intermediate result of the neural network processing that still requires activation processing to obtain a final result (for example, the second processing result). The cell body unit may read the first processing result stored in the first storage unit of the storage component and the operation parameters (for example, parameters of an activation function) stored in the second storage unit, perform activation processing on the first processing result based on the operation parameters to obtain the second processing result, and write the second processing result into the storage component. As another example, in some processing the processing data must first pass through an activation function before further processing: the cell body unit may read the processing data from the first storage unit and the operation parameters from the second storage unit, and perform activation processing on the processing data based on the operation parameters to obtain a second processing result; the second processing result may then undergo further operations, for example the cell body unit may write it into the first storage unit, and the axon unit may read it so that the dendrite unit performs matrix operations and other processing.
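The two soma-side nonlinear operations named above can be sketched as below. This assumes a quantized look-up-table activation and a standard discrete-time LIF update; the decay, threshold, and reset values are illustrative parameters, not values from the patent.

    import numpy as np

    def lut_activate(x, lut):
        # look-up-table activation: clamp the index into the table, then read out
        idx = np.clip(x, 0, len(lut) - 1)
        return lut[idx]

    def lif_step(v, current, decay=0.9, threshold=1.0, v_reset=0.0):
        # leaky integrate-and-fire: leak, integrate, fire on threshold crossing
        v = decay * v + current
        spikes = v >= threshold
        v = np.where(spikes, v_reset, v)   # reset membrane potential where a spike fired
        return v, spikes.astype(np.int8)

    lut = np.array([0, 1, 3, 7])                            # toy activation table
    second_result = lut_activate(np.array([0, 2, 9]), lut)  # indices clamped into range
    v, s = lif_step(np.zeros(3), np.array([0.5, 1.2, 0.3]))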
In a possible implementation, the storage component includes an output buffer space, and the cell body unit is further configured to: read the processing data, the first processing result and/or the second processing result stored in the data storage space under the condition that a control instruction of the control unit is received; and write the processing data, the first processing result and/or the second processing result into the output buffer space.
In an example, a third storage unit may include the output buffer space for buffering data to be sent to other computing cores. The cell body unit may read data from other storage units of the storage component and write it into the third storage unit, thereby transferring data between different storage units of the storage component; the routing unit may then read the data in the third storage unit and send it to other computing cores. For example, the cell body unit may read the processing data, the first processing result and/or the second processing result in the first storage unit and write them into the third storage unit; when the routing unit performs a communication task, it may read the data stored in the third storage unit and send it to other computing cores.
In this way, data can be transferred between different storage units and processing units through the cell body unit, which improves data transfer efficiency; nonlinear processing such as activation can also be performed by the cell body unit, which improves processing efficiency.
In one possible implementation, the routing unit may be used for communication between computing cores. As described above, the cell body unit may write the processing data, the first processing result and/or the second processing result to be sent into the output buffer space, and the routing unit may read the data in the output buffer space and send it to other computing cores, where the destination address may be determined according to the routing table in the operation parameters. The storage component stores operation parameters including a routing table, and the routing unit is further configured to: under the condition that a control instruction of the control unit is received, read the routing table stored in the data storage space and the processing data, the first processing result and/or the second processing result stored in the output buffer space; and send the processing data, the first processing result and/or the second processing result according to the routing table. The output buffer space may also be set in the routing unit; the present disclosure does not limit where the output buffer space is arranged.
In an example, the routing unit may read the data to be sent, read an operation parameter (e.g., the routing table) from the second storage unit of the storage component, determine the communication address based on the routing table, and then send the data to be sent based on the communication address.
In one possible implementation, the routing unit may further receive data sent by other computing cores; the routing unit is further configured to: receive communication data, the communication data comprising data from other computing cores, and write the communication data into the storage component. In an example, one computing core may send data to another computing core through its routing unit, and the other computing core may receive the data through its own routing unit. For example, the second storage unit of the storage component may include a cache space into which the routing unit may write communication data received from other computing cores for further processing. For example, the axon unit may read the communication data in the cache space so that the dendrite unit performs matrix operation processing, or the cell body unit may read the communication data in the cache space and perform activation processing; the present disclosure does not limit how the communication data is processed.
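The send and receive sides of the routing unit can be sketched together as follows, assuming the routing table maps a local output tag to destination core IDs and modeling inter-core transfer as a queue append; the table format and queue model are assumptions of the sketch.

    from collections import deque

    class Core:
        def __init__(self):
            self.rx_queue = deque()     # communication data arriving from other cores

    def route_send(routing_table, output_buffer, cores):
        # look up each buffered result in the routing table and forward it
        for tag, payload in output_buffer:
            for dest in routing_table.get(tag, ()):
                cores[dest].rx_queue.append(payload)

    def route_receive(core, storage):
        # write received communication data into the storage component's cache space
        while core.rx_queue:
            storage.setdefault("comm_cache", []).append(core.rx_queue.popleft())

    cores = {0: Core(), 1: Core()}
    route_send({"result_1": [1]}, [("result_1", [3, 1, 4])], cores)
    storage = {}
    route_receive(cores[1], storage)    # storage["comm_cache"] now holds [3, 1, 4]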
In one possible implementation, the multiple units of the processing component may process data in parallel in pipeline form, reducing latency and improving computing efficiency. In an example, the control unit may control the axon unit, the dendrite unit, the cell body unit and the routing unit to perform tasks in sequence. For example, the axon unit may read processing data 1 and the weight data and, after data conversion, the dendrite unit performs multiply-add array processing to obtain first processing result 1, which is written into the storage component. The cell body unit can then read first processing result 1, perform activation processing to obtain second processing result 1, and write it into the output buffer space; meanwhile, the axon unit can read processing data 2 and the weight data, and the dendrite unit obtains first processing result 2. The routing unit can then read second processing result 1 and send it based on the routing table; meanwhile, the cell body unit reads first processing result 2, performs activation processing and writes the resulting second processing result 2 into the output buffer space, while the axon unit reads processing data 3, and so on. In this way each unit executes its processing in parallel, improving processing efficiency.
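The staggered schedule just described can be shown with a toy three-stage pipeline (axon/dendrite, then cell body, then router); the cycle granularity and stage grouping are purely illustrative.

    STAGES = ("axon/dendrite", "cell body", "router")

    def pipeline(n_items):
        # item i enters stage s at cycle i + s, so different units
        # work on different data items in the same cycle
        for cycle in range(n_items + len(STAGES) - 1):
            active = [f"{STAGES[s]}: data {cycle - s + 1}"
                      for s in range(len(STAGES)) if 0 <= cycle - s < n_items]
            print(f"cycle {cycle}: " + "; ".join(active))

    pipeline(3)
    # cycle 0: axon/dendrite: data 1
    # cycle 1: axon/dendrite: data 2; cell body: data 1
    # cycle 2: axon/dendrite: data 3; cell body: data 2; router: data 1
    # cycle 3: cell body: data 3; router: data 2
    # cycle 4: router: data 3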
According to the processing component of the embodiments of the disclosure, the processing component and the storage component can be arranged within the computing core, so that the storage component directly serves the read and write accesses of the processing component. No storage component outside the core needs to be read or written, which reduces the resource occupation and power consumption of reading and writing data and improves processing efficiency; the design suits an artificial intelligence processor with a many-core architecture and can meet the operation requirements of accurate computation on large amounts of data and fast response. The axon unit, the dendrite unit, the cell body unit and the routing unit take on different functions such as state control, data storage, matrix processing, multiply-add calculation and nonlinear operation; data can be transferred through the axon unit, tasks can be executed in parallel in pipeline form, and the computing tasks of artificial general intelligence can be completed efficiently.
In one possible implementation, the operation of the processing component is explained below through a specific example. The processing component may include a control unit, an axon unit, a dendrite unit, a cell body unit and a routing unit, where the control unit may control the other units to interact with the storage component, process the data stored there, and perform processing such as external communication on the processing results.
In one possible implementation, the control unit may control the axon unit to read the processing data and the weight data in the storage component; the axon unit may perform data conversion processing on the read data, for example converting it into matrix or vector form, and transmit it to the dendrite unit to perform a matrix or vector operation, that is, a multiply-add array operation, obtaining a first processing result, which the axon unit may write into the storage component.
In one possible implementation manner, the control unit may control the cell body unit to read the first processing result and the parameters of the activation function stored in the storage component, perform activation processing on the first processing result to obtain a second processing result, and transfer the second processing result to the output buffer space in the storage component.
In one possible implementation manner, the control unit may control the routing unit to read the second processing result buffered in the output buffer space and the routing table stored in the storage component, determine the communication address, and send the second processing result to other computing cores according to the communication address.
In one possible implementation, the control unit may control the routing unit to receive communication data sent by other computing cores and write the communication data into the storage component for further processing.
In one possible implementation, the present disclosure also provides an artificial intelligence processor including a plurality of computing cores including the processing component described above and the storage component described above.
Fig. 3 is a block diagram illustrating a combination processing apparatus 1200 according to an embodiment of the present disclosure. The combination processing device 1200 includes a computing processing device 1202 (e.g., an artificial intelligence processor including a plurality of computing cores as described above), an interface device 1204, other processing devices 1206, and a storage device 1208. Depending on the application scenario, one or more computing devices 1210 (e.g., computing cores) may be included in the computing processing device.
In one possible implementation, the computing processing device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or as a multi-core artificial intelligence processor. Similarly, one or more computing devices included within the computing processing device may be implemented as an artificial intelligence processor core or as part of the hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or as parts of their hardware structures, the computing processing device of the present disclosure may be regarded as having a single-core structure or a homogeneous multi-core structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through the interface device to jointly complete user-specified operations. Depending on the implementation, the other processing devices of the present disclosure may include one or more types of general-purpose and/or special-purpose processors, such as central processing units (CPUs), graphics processors (GPUs) and artificial intelligence processors. These processors may include, but are not limited to, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices and discrete hardware components, and their number may be determined according to actual needs. As mentioned above, considered on its own, the computing processing device of the present disclosure may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing processing device and the other processing devices are considered together, they may be regarded as forming a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices may serve as an interface between the computing processing device of the present disclosure (which may be embodied as a computing device related to artificial intelligence, such as neural network operations) and external data and control, performing basic controls including, but not limited to, data handling and the starting and/or stopping of the computing device. In other embodiments, the other processing devices may also cooperate with the computing processing device to jointly complete computational tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing device may obtain input data from other processing devices via the interface device, and write the input data to a storage device (or memory) on the computing device. Further, the computing processing device may obtain control instructions from other processing devices via the interface device, and write the control instructions into a control cache on the computing processing device chip. Alternatively or in addition, the interface device may also read data in a memory device of the computing processing device and transmit it to the other processing device.
Additionally or alternatively, the combined processing apparatus of the present disclosure may further comprise a storage device. As shown in the figure, the storage means are connected to the computing processing means and the other processing means, respectively. In one or more embodiments, a storage device may be used to store data for the computing processing device and/or the other processing devices. For example, the data may be data that cannot be stored entirely in an internal or on-chip memory device of a computing processing device or other processing device.
According to different application scenarios, the artificial intelligence chip of the present disclosure may be used in a server, a cloud server, a server cluster, a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autopilot terminal, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an aircraft, a ship and/or a vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
Fig. 4 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 4, electronic device 1900 includes a processing component 1922 (e.g., an artificial intelligence processor including a plurality of computing cores) that further includes one or more computing cores, and memory resources represented by memory 1932 for storing instructions, such as applications, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In the present disclosure, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, some or all of the units may be selected to achieve the purposes of the solution described in the embodiments of the disclosure. Additionally, in some scenarios, multiple units in embodiments of the disclosure may be integrated into one unit or each unit may physically reside separately.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but as long as a combination of technical features involves no contradiction, it should be regarded as within the scope of this description.
The electronic device or processor of the present disclosure may also be applied to the internet, the internet of things, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medicine and other fields. Further, the electronic device or processor of the present disclosure may be used in application scenarios related to artificial intelligence, big data and/or cloud computing, such as the cloud, the edge and terminals. In one or more embodiments, a high-power electronic device or processor according to the present disclosure may be applied to cloud devices (e.g., cloud servers), while a low-power electronic device or processor may be applied to terminal devices and/or edge devices (e.g., smartphones or cameras). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or edge device are compatible with each other, so that, according to the hardware information of the terminal device and/or edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or edge device, completing unified management, scheduling and collaborative work across the device-cloud or cloud-edge-device integration.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (4)

1. A processing component for application to a computing core of an artificial intelligence processor, the artificial intelligence processor comprising a plurality of computing cores, each computing core comprising a processing component and a storage component, the processing component comprising: a control unit, an axon unit, a dendrite unit, a cell body unit and a routing unit,
the control unit is used for:
generating a control instruction according to preset control information to control at least one of an axon unit, a dendrite unit, a cell body unit and a routing unit;
the axon unit is used for:
under the condition that a control instruction of the control unit is received, reading processing data and/or weight data stored in the storage component, and performing data conversion processing on the processing data and/or weight data to obtain first data;
the dendritic unit is used for:
under the condition that a control instruction of the control unit is received, reading the first data, and obtaining a first processing result according to the first data;
the cell body unit is used for:
reading the first processing result and/or the processing data under the condition that a control instruction of the control unit is received;
obtaining a second processing result according to the first processing result and/or the processing data;
transmitting the first processing result, processing data and/or the second processing result via the routing unit;
the axon unit is also for:
reading the first processing result;
writing the first processing result into the storage component;
the storage component stores operation parameters, and the cell body unit obtaining a second processing result according to the first processing result and/or the processing data comprises:
reading the processing data and/or the first processing result and the operation parameters stored in the storage component;
performing activation processing on the processing data and/or the first processing result according to the operation parameters to obtain the second processing result;
writing the second processing result into the storage component;
the storage component stores operation parameters including a routing table, and the routing unit is further configured to:
under the condition that a control instruction of the control unit is received, reading the routing table stored in the storage component and the processing data, the first processing result and/or the second processing result stored in the output buffer space;
transmitting the processing data, the first processing result and/or the second processing result according to the routing table;
the routing unit is further configured to:
receiving communication data, the communication data comprising data from other computing cores;
writing the communication data into the storage component;
the control instruction comprises instruction set address information and instruction index information, wherein the instruction index information is used for indicating whether an instruction corresponding to an index is executed or not,
the control unit is further configured to:
transmitting the instruction set address information and the instruction index information to at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit, so that the at least one of the axon unit, the dendrite unit, the cell body unit and the routing unit accesses the instruction set based on the instruction set address information, acquires an instruction to be executed, and executes the instruction to be executed according to the instruction index information;
each unit of the processing component processes data in parallel in pipeline form;
the units processing data in parallel in pipeline form comprises: the control unit controls the axon unit to execute a computation primitive, and when a certain amount of output has been obtained through accumulation, the cell body unit performs a carrying function while the axon unit executes the next accumulation processing.
2. The processing component of claim 1, wherein the dendrite unit obtaining a first processing result according to the first data comprises:
and performing multiply-add array processing on the first data to obtain the first processing result.
3. The processing component of claim 1, wherein the storage component comprises an output buffer space, and the cell body unit is further configured to:
reading the processing data, the first processing result and/or the second processing result stored in the storage component under the condition that a control instruction of the control unit is received;
and writing the processing data, the first processing result and/or the second processing result into the output cache space.
4. An artificial intelligence processor, characterized in that it comprises a plurality of computing cores, each comprising a processing component according to any one of claims 1 to 3 and a storage component.
CN202011565319.0A 2020-12-25 2020-12-25 Processing component and artificial intelligence processor Active CN112766475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565319.0A CN112766475B (en) 2020-12-25 2020-12-25 Processing component and artificial intelligence processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011565319.0A CN112766475B (en) 2020-12-25 2020-12-25 Processing component and artificial intelligence processor

Publications (2)

Publication Number Publication Date
CN112766475A CN112766475A (en) 2021-05-07
CN112766475B (en) 2023-04-28

Family

ID=75694465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011565319.0A Active CN112766475B (en) 2020-12-25 2020-12-25 Processing component and artificial intelligence processor

Country Status (1)

Country Link
CN (1) CN112766475B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086232B (en) * 2022-06-13 2023-07-21 清华大学 Task processing and data stream generating method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201651A (en) * 2016-06-27 2016-12-07 鄞州浙江清华长三角研究院创新中心 The simulator of neuromorphic chip
US10824937B2 (en) * 2016-12-20 2020-11-03 Intel Corporation Scalable neuromorphic core with shared synaptic memory and variable precision synaptic memory
CN108334942B (en) * 2017-12-22 2020-08-04 清华大学 Data processing method, device, chip and storage medium of neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Feng et al., "A two-step mapping method for the design of long short-term memory (LSTM) neuromorphic chips," Integrated Circuit Applications (集成电路应用), 2018, full text. *

Also Published As

Publication number Publication date
CN112766475A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN109284823B (en) Arithmetic device and related product
CN110298443B (en) Neural network operation device and method
EP3660628B1 (en) Dynamic voltage frequency scaling device and method
CN111047022A (en) Computing device and related product
CN110909870B (en) Training device and method
CN109711540B (en) Computing device and board card
CN112766475B (en) Processing component and artificial intelligence processor
CN111079909B (en) Operation method, system and related product
CN116797464A (en) Computing method, computing device, computer apparatus, and storage medium
CN111078291B (en) Operation method, system and related product
CN111079925B (en) Operation method, device and related product
CN111078284A (en) Operation method, system and related product
CN111078125B (en) Operation method, device and related product
CN111079914B (en) Operation method, system and related product
CN111078283B (en) Operation method, device and related product
CN111079915B (en) Operation method, device and related product
CN111079924B (en) Operation method, system and related product
CN111079907B (en) Operation method, device and related product
CN117520254A (en) Processor, chip, board card and method
CN117093263A (en) Processor, chip, board card and method
CN112394990A (en) Floating point to half precision floating point instruction processing device and method and related products
CN112394993A (en) Half-precision floating point to short shaping instruction processing device and method and related product
CN112394986A (en) Device and method for processing half-precision floating point to floating point instruction and related products
CN116882512A (en) Data processing method, training method of model and related equipment
CN117648091A (en) Compiling method of calculation graph and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant