Summary of the invention
In view of this, the present disclosure proposes an operation method, the method comprising:
obtaining a duplication operator and a multiplication operator from an artificial intelligence operator library, wherein the duplication operator is used to duplicate input data to obtain duplicated input data, and the multiplication operator is used to perform a multiplication operation on input data;
splicing the duplication operator and the multiplication operator to form a splicing operator,
wherein the splicing operator is used to perform a corresponding splicing operation on input data in an artificial intelligence processor, so as to execute an artificial intelligence operation.
In a possible implementation, splicing the duplication operator and the multiplication operator to form the splicing operator comprises:
using the duplication operator as the preceding-stage operator of the multiplication operator.
In a possible implementation, the splicing operation comprises:
when input data is obtained, duplicating the input data with the duplication operator to obtain duplicated input data;
performing a multiplication operation on the input data and the duplicated input data with the multiplication operator to obtain a multiplication result.
In a possible implementation, the splicing operator is applied at the application layer of a software call hierarchy, the artificial intelligence operator library is located at the operator library layer of the software call hierarchy, and the artificial intelligence processor is located at the chip layer of the software call hierarchy.
According to another aspect of the present disclosure, an operation device is proposed, the device comprising:
an acquisition module, configured to obtain a duplication operator and a multiplication operator from an artificial intelligence operator library, wherein the duplication operator is used to duplicate input data to obtain duplicated input data, and the multiplication operator is used to perform a multiplication operation on input data;
a computing module, connected to the acquisition module and configured to splice the duplication operator and the multiplication operator to form a splicing operator,
wherein the splicing operator is used to perform a corresponding splicing operation on input data in an artificial intelligence processor, so as to support an artificial intelligence operation.
In a possible implementation, the computing module comprises:
a first operation submodule, configured to use the duplication operator as the preceding-stage operator of the multiplication operator.
In a possible implementation, the splicing operation comprises:
when input data is obtained, duplicating the input data with the duplication operator to obtain duplicated input data;
performing a multiplication operation on the input data and the duplicated input data with the multiplication operator to obtain a multiplication result.
According to another aspect of the present disclosure, an artificial intelligence processing device is proposed, the device comprising:
a primary processor, configured to execute the method described above to obtain a splicing operator, the splicing operator being used to perform a corresponding operation on the input data;
an artificial intelligence processor, electrically connected to the primary processor;
wherein the primary processor is further configured to send input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the primary processor;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result;
send the operation result to the primary processor.
In a possible implementation, the primary processor further comprises a primary processor storage space for storing the splicing operator, wherein
the primary processor is further configured to provide input data and the splicing operator stored in the primary processor storage space.
In a possible implementation, the artificial intelligence processor passes the operation result to the primary processor through an I/O interface;
when the device comprises multiple artificial intelligence processors, the multiple artificial intelligence processors may be connected to one another through a specific structure and transmit data;
wherein the multiple artificial intelligence processors are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus, so as to support larger-scale artificial intelligence operations; the multiple artificial intelligence processors share one control system or have their own control systems; the multiple artificial intelligence processors share memory or have their own memories; and the interconnection mode of the multiple artificial intelligence processors is any interconnection topology.
In a possible implementation, the device further comprises a storage device, which is connected to the artificial intelligence processor and to the primary processor respectively and is used to store data of the artificial intelligence processor and of the primary processor.
According to another aspect of the present disclosure, an artificial intelligence chip is proposed, the artificial intelligence chip comprising the artificial intelligence processing device described above.
According to another aspect of the present disclosure, an electronic device is proposed, the electronic device comprising the artificial intelligence chip described above.
According to another aspect of the present disclosure, a board card is proposed, the board card comprising a memory device, an interface device, a control device, and the artificial intelligence chip described above;
wherein the artificial intelligence chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is used to store data;
the interface device is used to implement data transmission between the chip and external equipment;
the control device is used to monitor the state of the chip.
In a possible implementation, the memory device comprises multiple groups of storage units, each group of storage units being connected to the chip through a bus, the storage units being DDR SDRAM;
the chip comprises a DDR controller for controlling data transmission to and data storage in each storage unit;
the interface device is a standard PCIE interface.
According to another aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions implement the above method when executed by a processor.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements that are functionally identical or similar. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or advantageous compared with other embodiments.
In addition, numerous specific details are given in the detailed description below in order to better illustrate the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
Referring to Fig. 1, Fig. 1 shows a flowchart of the operation method according to an embodiment of the present disclosure.
As shown in Fig. 1, the method comprises:
Step S110: obtaining a duplication operator and a multiplication operator from an artificial intelligence operator library, wherein the duplication operator is used to duplicate input data to obtain duplicated input data, and the multiplication operator is used to perform a multiplication operation on input data;
Step S120: splicing the duplication operator and the multiplication operator to form a splicing operator,
wherein the splicing operator is used to perform a corresponding splicing operation on input data in an artificial intelligence processor, so as to execute an artificial intelligence operation.
With the above method, the present disclosure can obtain the duplication operator and the multiplication operator from the artificial intelligence operator library and splice them to form a splicing operator. The resulting splicing operator can be used to support a new artificial intelligence processor, so as to improve the operation efficiency of the new artificial intelligence processor when it performs the operations of a neural network model.
The splicing operator formed by the above method can serve as part of an artificial intelligence operation. When the splicing operator is used in an artificial intelligence processor to perform artificial intelligence operations, applications including but not limited to speech recognition and image recognition can be realized. Forming the splicing operator by combining a deformation operator and a basic operator (here, the duplication operator and the multiplication operator) allows the artificial intelligence processor to carry out artificial intelligence operations more effectively.
In a possible implementation, an operator may be an algorithm commonly used in artificial intelligence, and is also referred to as a layer, an operation, or a node. Each neural network corresponds to a network structure graph, and the nodes in the graph are operators. The artificial intelligence operator library may be preset and may include multiple basic operators (for example, a convolution operator, a fully connected operator, a pooling operator, an activation operator, and so on). Each basic operator can be called by a processor, including but not limited to a central processing unit (CPU) or a graphics processing unit (GPU), to realize the corresponding basic function.
In a possible implementation, the input data may have 4 dimensions. When the input data is image data, its dimensions may represent the number of pictures, the number of picture channels, the picture height, and the picture width. In other embodiments, when the input data is image data but has fewer than 4 dimensions (for example, 3), its dimensions may represent any 3 of the number of pictures, the number of picture channels, the picture height, and the picture width.
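As a small illustration of the four-dimensional layout described above, the following sketch names the dimensions of image input data and counts the values it holds. The `ImageShape` type and its field names are hypothetical and are not part of the disclosure.

```python
from typing import NamedTuple


class ImageShape(NamedTuple):
    """Hypothetical descriptor for 4-dimensional image input data."""
    num_pictures: int  # number of pictures
    channels: int      # number of picture channels
    height: int        # picture height
    width: int         # picture width

    def element_count(self) -> int:
        # Total number of scalar values held by input data of this shape.
        return self.num_pictures * self.channels * self.height * self.width


# Two three-channel pictures of 4 x 5 pixels.
shape = ImageShape(num_pictures=2, channels=3, height=4, width=5)
total = shape.element_count()
```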
In a possible implementation, the duplication operator, when executed, copies the content of a piece of input data into another memory space. Although the input data and the duplicated input data reside in different memory spaces, their contents are identical.
In a possible implementation, the multiplication operator, when executed, performs a multiplication operation on its two pieces of input data to obtain the multiplication result.
In a possible implementation, step S120, splicing the duplication operator and the multiplication operator to form the splicing operator, may include:
using the duplication operator as the preceding-stage operator of the multiplication operator.
In this way, the present disclosure can duplicate the input data so as to satisfy the operating condition of the multiplication operator. Once the multiplication operator has obtained two pieces of input data (the input data and the duplicated input data), it can multiply them together, thereby realizing the square operation on the input data.
In a possible implementation, the splicing operation includes:
when input data is obtained, duplicating the input data with the duplication operator to obtain duplicated input data;
performing a multiplication operation on the input data and the duplicated input data with the multiplication operator to obtain a multiplication result.
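The two steps above can be sketched as follows. This is a minimal illustration, assuming the input data is a flat list of numbers and the multiplication is element-wise; the function names `duplication_operator`, `multiplication_operator`, and `splice` are hypothetical and do not come from any real operator library.

```python
import copy


def duplication_operator(data):
    """Copy the content of `data` into a separate object (a separate memory space)."""
    return copy.deepcopy(data)


def multiplication_operator(a, b):
    """Element-wise product of two flat lists of equal length (an assumed semantics)."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same length")
    return [x * y for x, y in zip(a, b)]


def splice(duplicate, multiply):
    """Form a splicing operator in which `duplicate` is the preceding stage of `multiply`."""
    def splicing_operator(data):
        duplicated = duplicate(data)       # step 1: obtain duplicated input data
        return multiply(data, duplicated)  # step 2: multiply input by its duplicate
    return splicing_operator


square = splice(duplication_operator, multiplication_operator)
source = [1.0, -2.0, 3.0]
result = square(source)
```

Because the duplicate has the same content as the input, the product equals the element-wise square of the input, while the input itself is left unchanged.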
In a possible implementation, the splicing operator is applied at the application layer of a software call hierarchy, the artificial intelligence operator library is located at the operator library layer of the software call hierarchy, and the artificial intelligence processor is located at the chip layer of the software call hierarchy.
Referring to Fig. 2, Fig. 2 shows a schematic diagram of the software call hierarchy according to an embodiment of the present disclosure.
As shown in Fig. 2, the software call hierarchy comprises, from top to bottom, an application layer, a framework layer, an operator library layer, a driver layer, and a chip layer. The splicing operator obtained by the foregoing operation method can be applied at the application layer, the artificial intelligence operator library can be located in the operator library layer, the artificial intelligence processor can be located in the chip layer, and the driver layer may include the driver that drives the chip layer.
As can be seen from the above, once a splicing operator has been formed from the deformation operator and the basic operator in the operator library layer (here, the duplication operator and the multiplication operator), it can be called directly by the application layer and applied there to realize the corresponding function in an artificial intelligence operation. This avoids having to fetch the deformation operator and the basic operator from the operator library layer every time the application layer performs an artificial intelligence operation, and thus improves the execution of artificial intelligence operations.
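The benefit described above can be illustrated with a rough sketch in which the operator library layer is modeled as a name-to-function registry and every fetch is counted. The registry, its keys, and the `fetch` helper are assumptions made for illustration only.

```python
# Hypothetical operator library layer: a registry mapping names to operators.
OPERATOR_LIBRARY = {
    "duplication": lambda data: list(data),
    "multiplication": lambda a, b: [x * y for x, y in zip(a, b)],
}

lookups = {"count": 0}


def fetch(name):
    """Fetch one operator from the operator library layer, counting each fetch."""
    lookups["count"] += 1
    return OPERATOR_LIBRARY[name]


def build_square_operator():
    """Form the splicing operator once from the two library operators."""
    duplicate = fetch("duplication")
    multiply = fetch("multiplication")
    return lambda data: multiply(data, duplicate(data))


# Application layer: build the splicing operator once, then call it directly.
square = build_square_operator()
outputs = [square(batch) for batch in ([1, 2], [3, 4], [5, 6])]
```

In this sketch the library is consulted twice in total, regardless of how many batches the application layer processes afterwards.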
In an application example, when artificial intelligence operations are needed for speech recognition or image processing, the square splicing operator of an embodiment of the present disclosure (duplication operator + multiplication operator) can be used to perform a square operation. When a square operation on input data is required, the square splicing operator duplicates the input data to obtain duplicated input data, and then multiplies the input data by the duplicated input data to realize the square operation. With the square splicing operator described in the present disclosure, artificial intelligence operations can be executed more advantageously to realize applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
With the above method, the present disclosure can obtain a splicing operator from the duplication operator and the multiplication operator. When a square operation on input data is required, this splicing operator duplicates the input data to obtain duplicated input data and multiplies the input data by the duplicated input data, thereby realizing the square operation on the input data.
Referring to Fig. 3, Fig. 3 shows a schematic diagram of the splicing operator according to an embodiment of the present disclosure.
As shown in Fig. 3, the splicing operator comprises:
a duplication operator 10, used to duplicate input data to obtain duplicated input data;
a multiplication operator 20, connected to the duplication operator 10 and used, after receiving the input data and the duplicated input data, to perform a multiplication operation on them and output the operation result.
With the above splicing operator, the present disclosure can use the duplication operator to duplicate the input data to obtain duplicated input data, and use the multiplication operator, after it has received the input data and the duplicated input data, to multiply them and output the operation result, thereby realizing the square operation on the input data.
Referring to Fig. 4, Fig. 4 shows a block diagram of the operation device according to an embodiment of the present disclosure.
As shown in Fig. 4, the device comprises:
an acquisition module 80, configured to obtain a duplication operator and a multiplication operator from an artificial intelligence operator library, wherein the duplication operator is used to duplicate input data to obtain duplicated input data, and the multiplication operator is used to perform a multiplication operation on input data;
a computing module 90, connected to the acquisition module 80 and configured to splice the duplication operator and the multiplication operator to form a splicing operator,
wherein the splicing operator is used to perform a corresponding splicing operation on input data in an artificial intelligence processor, so as to support an artificial intelligence operation.
Referring to Fig. 5, Fig. 5 shows a block diagram of the operation device according to an embodiment of the present disclosure.
As shown in Fig. 5, the computing module 90 comprises:
a first operation submodule 910, configured to use the duplication operator as the preceding-stage operator of the multiplication operator.
In a possible implementation, the splicing operation includes:
when input data is obtained, duplicating the input data with the duplication operator to obtain duplicated input data;
performing a multiplication operation on the input data and the duplicated input data with the multiplication operator to obtain a multiplication result.
Referring to Fig. 6, Fig. 6 shows a block diagram of the artificial intelligence processing device according to an embodiment of the present disclosure.
In a possible implementation, as shown in Fig. 6, the artificial intelligence processing device comprises:
a primary processor 50, configured to execute the method described above to obtain a splicing operator, the splicing operator being used to perform a corresponding operation on the input data;
an artificial intelligence processor 60, electrically connected to the primary processor 50;
wherein the primary processor 50 is further configured to send input data and the splicing operator to the artificial intelligence processor 60, and the artificial intelligence processor 60 is configured to:
receive the input data and the splicing operator sent by the primary processor 50;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result;
send the operation result to the primary processor 50.
In a possible implementation, the primary processor 50 may include a primary processor storage space for storing the splicing operator obtained when the primary processor 50 executes the operation method, wherein
the primary processor 50 is further configured to provide input data and the splicing operator stored in the primary processor storage space.
It should be understood that the primary processor 50 may execute the operation method after obtaining data, obtain the splicing operator, and send the obtained splicing operator together with the data to the artificial intelligence processor 60 for processing. The primary processor 50 may also send a stored splicing operator to the artificial intelligence processor 60, so that a pre-stored splicing operator is delivered to the artificial intelligence processor 60, which then performs the artificial intelligence operation according to the received splicing operator and input data. Of these two modes, the former can be regarded as online, immediate processing and the latter as offline processing.
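The two dispatch modes described above can be sketched as follows. The class, its method names, and the use of a dictionary as the primary processor storage space are assumptions for illustration; the artificial intelligence processor is modeled as a plain function that applies the operator it receives.

```python
def ai_processor(splicing_operator, data):
    """Stand-in for the artificial intelligence processor: apply the received operator."""
    return splicing_operator(data)


class PrimaryProcessor:
    """Hypothetical model of the primary processor and its storage space."""

    def __init__(self):
        self.storage = {}  # primary processor storage space

    @staticmethod
    def build_splicing_operator():
        duplicate = lambda data: list(data)
        multiply = lambda a, b: [x * y for x, y in zip(a, b)]
        return lambda data: multiply(data, duplicate(data))

    def dispatch_online(self, data):
        # Online mode: build the splicing operator after the data arrives.
        return ai_processor(self.build_splicing_operator(), data)

    def store(self, name):
        # Offline preparation: build the splicing operator ahead of time and store it.
        self.storage[name] = self.build_splicing_operator()

    def dispatch_offline(self, name, data):
        # Offline mode: send the pre-stored splicing operator with the data.
        return ai_processor(self.storage[name], data)


host = PrimaryProcessor()
online_result = host.dispatch_online([2, 3])
host.store("square")
offline_result = host.dispatch_offline("square", [2, 3])
```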
In a possible implementation, the devices shown in Fig. 4 and Fig. 5 can be implemented in the primary processor 50.
In a possible implementation, the primary processor 50 may be a central processing unit (CPU) or another type of processor, such as a graphics processing unit (GPU). It should be understood that the splicing operator is the splicing operator obtained by the foregoing operation method; for details, refer to the earlier description of the splicing operator, which is not repeated here.
In a possible implementation, the artificial intelligence processing device may be formed by multiple identical processors, for example multiple processors (XPU) forming an architecture similar to primary processor 50 + artificial intelligence processor 60. It may also be formed by a single processor, in which case the processor can both execute the foregoing operation method to obtain the splicing operator and perform the artificial intelligence operation on input data using the splicing operator to obtain an output result. In this embodiment, the processor may be of an existing type or of a newly proposed type; the present disclosure imposes no limitation in this respect.
In a possible implementation, the primary processor 50 may serve as the interface between the artificial intelligence processing device and external data and control, which includes moving data and performing basic control of the artificial intelligence processing device such as starting and stopping; other processing devices may also cooperate with the artificial intelligence processing device to complete computation tasks jointly.
In a possible implementation, the artificial intelligence processing device may include more than one artificial intelligence processor. The artificial intelligence processors may be linked to one another through a specific structure and transmit data, for example interconnected through a PCIE bus, so as to support larger-scale machine learning operations. In this case, they may share one control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection mode may be any interconnection topology.
The artificial intelligence processing device has high compatibility and can be connected to various types of servers through a PCIE interface.
Referring to Fig. 7, Fig. 7 shows a block diagram of the artificial intelligence processing device according to an embodiment of the present disclosure.
In a possible implementation, as shown in Fig. 7, the primary processor 50 and the artificial intelligence processor 60 may be connected through a general interconnection interface (such as an I/O interface), which is used to transmit data and control instructions between the primary processor 50 and the artificial intelligence processor 60. The artificial intelligence processor 60 obtains the required input data (including the splicing operator) from the primary processor 50 and writes it to the on-chip storage device of the artificial intelligence processor; it may obtain control instructions from the primary processor 50 and write them to the on-chip control cache of the artificial intelligence processor; it may also read data from the memory module of the artificial intelligence processor 60 and transmit the data to other processing devices.
In a possible implementation, the artificial intelligence processing device may further include a storage device, which is connected to the artificial intelligence processor and to the other processing devices respectively. The storage device is used to store data of the artificial intelligence processor and of the other processing devices, and is particularly suitable for data required by an operation that cannot be held entirely in the internal storage of the artificial intelligence processor or of the other processing devices.
The combined processing device can serve as the system on chip (SOC) of equipment such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control part, increasing processing speed, and lowering overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a wifi interface.
With the above artificial intelligence processing device, the present disclosure can have the primary processor transmit input data and the splicing operator to the artificial intelligence processor; the artificial intelligence processor performs the artificial intelligence operation on the input data using the splicing operator to obtain an operation result, and sends the operation result to the primary processor.
It should be understood that the artificial intelligence processor 60 may be a single processor usable for artificial intelligence operations, or a combination of several different processors. The artificial intelligence processor is applied to artificial intelligence operations, which include machine learning operations, brain-like operations, and so on. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on. The artificial intelligence processor 60 may specifically include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a field-programmable gate array (FPGA) chip.
In a possible implementation, the artificial intelligence processor 60 is as shown in Fig. 8. Referring to Fig. 8, Fig. 8 shows a block diagram of the artificial intelligence processor according to an embodiment of the present disclosure.
As shown in Fig. 8, the artificial intelligence processor 30 includes a control module 32, a computing module 33, and a memory module 31. The computing module 33 includes a main processing circuit 331 and multiple slave processing circuits 332 (the number of slave processing circuits in the figure is exemplary).
The control module 32 is configured to obtain input data and a computation instruction;
the control module 32 is further configured to parse the computation instruction to obtain multiple operation instructions, and to send the multiple operation instructions and the input data to the main processing circuit 331;
the main processing circuit 331 is configured to perform preceding processing on the input data and to transmit data and operation instructions to and from the multiple slave processing circuits;
the multiple slave processing circuits 332 are configured to execute intermediate operations in parallel according to the data and operation instructions transmitted from the main processing circuit 331 to obtain multiple intermediate results, and to transmit the multiple intermediate results to the main processing circuit 331;
the main processing circuit 331 is configured to perform subsequent processing on the multiple intermediate results to obtain the calculation result of the computation instruction.
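The division of labor among the circuits above can be sketched in software as follows. This is only an analogy under stated assumptions: threads stand in for the slave processing circuits, the preceding processing is a conversion to floating point, the intermediate operation is a partial sum of squares, and the subsequent processing is an accumulation. None of these specific choices is prescribed by the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def preceding_processing(data):
    """Main circuit, preceding processing: convert the input to floating point (assumed)."""
    return [float(x) for x in data]


def intermediate_operation(chunk):
    """Slave circuit: compute a partial sum of squares over its share of the data (assumed)."""
    return sum(x * x for x in chunk)


def subsequent_processing(intermediate_results):
    """Main circuit, subsequent processing: accumulate the intermediate results."""
    return sum(intermediate_results)


def run(data, num_slaves=4):
    prepared = preceding_processing(data)
    # Distribute the prepared data across the slave circuits in round-robin fashion.
    chunks = [prepared[i::num_slaves] for i in range(num_slaves)]
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate_results = list(pool.map(intermediate_operation, chunks))
    return subsequent_processing(intermediate_results)


total = run([1, 2, 3, 4, 5])
```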
After receiving input data and a computation instruction, the artificial intelligence processor 30 described in the present disclosure performs the corresponding operation on the input data to obtain the calculation result.
The artificial intelligence processor described in the present disclosure can support machine learning as well as some non-machine-learning artificial intelligence algorithms.
The above computation instruction includes but is not limited to a forward operation instruction or a reverse training instruction; the specific embodiments of the present application do not limit the specific form of the computation instruction.
In a possible implementation, after the artificial intelligence processor 30 obtains the calculation result, it may send the calculation result to other processors such as a central processing unit (CPU) or a graphics processing unit (GPU).
The operation instruction is the executable code that the artificial intelligence processor 30 obtains according to the splicing operator. The executable code includes but is not limited to a forward operation instruction, a reverse training instruction, or other neural network operation instructions; the specific embodiments of the present application do not limit the specific form of the computation instruction.
In a possible implementation, the artificial intelligence processor 30 may obtain the input data and the computation instruction through a data transmission module 360, which may specifically be one or more data I/O interfaces or I/O pins.
The main processing circuit 331 is configured to perform preceding processing on the operation data to obtain processed operation data, and to transmit at least one of the operation data, intermediate results, and operation instructions to and from the multiple slave processing circuits.
Referring also to Fig. 9, Fig. 9 shows a block diagram of the main processing circuit 331 according to an embodiment of the present disclosure.
As shown in Fig. 9, the main processing circuit 331 may include one or any combination of a conversion processing circuit 113, an activation processing circuit 111, and an addition processing circuit 112.
The conversion processing circuit 113 is configured to perform the preceding processing on the data. The preceding processing may be: converting the data or intermediate results received by the main processing circuit 331 between a first data structure and a second data structure (for example, conversion between continuous data and discrete data); or converting the data or intermediate results received by the main processing circuit 331 between a first data type and a second data type (for example, conversion between a fixed-point type and a floating-point type).
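As one concrete instance of the data type conversion mentioned above, the sketch below converts between floating-point values and a signed fixed-point representation. The choice of 8 fractional bits and the function names are assumptions made for illustration; the disclosure does not specify a fixed-point format.

```python
def float_to_fixed(value: float, frac_bits: int = 8) -> int:
    """Quantize a float to a fixed-point integer with `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def fixed_to_float(value: int, frac_bits: int = 8) -> float:
    """Recover the floating-point value represented by a fixed-point integer."""
    return value / (1 << frac_bits)


fixed = float_to_fixed(1.5)
restored = fixed_to_float(fixed)
```

Values that are not exact multiples of 2^-8 are rounded during the first conversion, so the round trip is exact only for representable values such as 1.5.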
The activation processing circuit 111 is configured to perform the subsequent processing, specifically an activation operation on data in the main processing circuit 331;
the addition processing circuit 112 is configured to perform the subsequent processing, specifically an addition operation or an accumulation operation.
Each slave processing circuit 332 is configured to execute an intermediate operation according to the operation data and operation instruction transmitted by the main processing circuit 331 to obtain an intermediate result, and to transmit the intermediate result to the main processing circuit 331;
the main processing circuit 331 is configured to perform subsequent processing on the multiple intermediate results to obtain the final calculation result of the operation instruction.
The control module 32 is further configured to generate a debugging result according to the state information, and to output the debugging result to the state information acquisition device 40.
The memory module 31 is used to store, according to the operation instruction, state information of the calculation process, wherein the state information includes at least one of: state information of the preceding processing of the main processing circuit 331, state information of the intermediate operations of the multiple slave processing circuits 332, and state information of the subsequent processing of the main processing circuit 331. The memory module may include an on-chip storage submodule 310, and the on-chip storage submodule 310 may include a scratchpad memory.
The memory module 31 may further include one or any combination of a register and a cache. Specifically, the cache is used to store the computation instruction; the register is used to store the neural network model, the data, and scalars; and the cache is a scratchpad cache.
In a possible implementation, the control module 32 may include an instruction cache submodule 320, an instruction processing submodule 321, and a storage queue submodule 323;
the instruction cache submodule 320 is used to store the computation instructions associated with the neural network model;
the instruction processing submodule 321 is used to parse the computation instruction to obtain multiple operation instructions;
the storage queue submodule 323 is used to store an instruction queue, the instruction queue including multiple operation instructions or computation instructions to be executed in the order of the queue.
For example, in a possible embodiment, the main processing circuit 331 may also include a control module 32, and this control module 32 may include a master instruction processing submodule, specifically configured to decode instructions into microinstructions. Of course, in a possible embodiment, the slave processing circuit 332 may also include another control module 32, and this other control module 32 includes a slave instruction processing submodule, specifically configured to receive and process microinstructions. A microinstruction may be the next-level instruction of an instruction: it can be obtained by splitting or decoding the instruction, and can be further decoded into control signals for each component, each module, or each processing circuit.
In an optional solution, the structure of the computation instruction may be as shown in Table 1.
Table 1
Operation code | Register or immediate | Register/immediate | ...
The ellipsis in the above table indicates that multiple registers or immediates may be included.
In another optional solution, the computation instruction may include one or more operation domains and one operation code. The computation instruction may include a neural network operation instruction. Taking the neural network operation instruction as an example, as shown in Table 1, register number 0, register number 1, register number 2, register number 3, and register number 4 may be operation domains. Each of register number 0, register number 1, register number 2, register number 3, and register number 4 may be the number of one or more registers, for example as shown in Table 2.
Table 2
The above register may be an off-chip memory; of course, in practical applications it may also be an on-chip memory used to store data. The data may specifically be t-dimensional data, where t is an integer greater than or equal to 1: for example, when t = 1 the data is 1-dimensional, i.e., a vector; when t = 2 it is 2-dimensional, i.e., a matrix; and when t = 3 or more it is a multidimensional tensor.
Optionally, the control module 32 may further include:
A dependency processing submodule 322, configured to, when there are multiple operation instructions, determine whether a first operation instruction has an association with a zeroth operation instruction preceding the first operation instruction; if the first operation instruction has an association with the zeroth operation instruction, the first operation instruction is cached in the instruction cache submodule, and after the zeroth operation instruction has finished executing, the first operation instruction is extracted from the instruction cache submodule and transmitted to the computing module;
Determining whether the first operation instruction has an association with the zeroth operation instruction preceding the first operation instruction includes:
Extracting, according to the first operation instruction, a first storage address interval of the data (for example, a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, a zeroth storage address interval of the matrix required by the zeroth operation instruction; if the first storage address interval overlaps the zeroth storage address interval, it is determined that the first operation instruction has an association with the zeroth operation instruction; if the first storage address interval does not overlap the zeroth storage address interval, it is determined that the first operation instruction has no association with the zeroth operation instruction.
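As a non-limiting illustration, the association check described above can be sketched as an overlap test on two storage address intervals. The dictionary layout, the field name `addr`, and the use of closed intervals `(start, end)` are assumptions made for this sketch only; the disclosure does not specify how the intervals are encoded.

```python
def intervals_overlap(first, zeroth):
    """Return True if two closed address intervals (start, end) share an address."""
    first_start, first_end = first
    zeroth_start, zeroth_end = zeroth
    return first_start <= zeroth_end and zeroth_start <= first_end


def has_dependency(first_instr, zeroth_instr):
    """Hypothetical check: the first instruction is associated with the zeroth
    instruction when their required storage address intervals overlap."""
    return intervals_overlap(first_instr["addr"], zeroth_instr["addr"])


# Illustrative instructions (names and addresses are made up for the example).
zeroth = {"name": "op0", "addr": (0x1000, 0x1FFF)}
first_dependent = {"name": "op1", "addr": (0x1800, 0x27FF)}    # overlaps op0
first_independent = {"name": "op2", "addr": (0x3000, 0x3FFF)}  # disjoint from op0

# A dependent instruction would stay cached until op0 has finished executing;
# an independent one could be transmitted to the computing module directly.
dependent = has_dependency(first_dependent, zeroth)
independent = has_dependency(first_independent, zeroth)
```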
Referring to Fig. 10, Fig. 10 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
In a possible embodiment, the computing module 33 may include a branch processing circuit 333 as shown in Fig. 10; the specific connection structure is as shown in Fig. 10, wherein
The main processing circuit 331 is connected to the branch processing circuit 333, and the branch processing circuit 333 is connected to the multiple slave processing circuits 332;
The branch processing circuit 333 is configured to forward data or instructions between the main processing circuit 331 and the slave processing circuits 332.
In a possible embodiment, taking the fully connected operation in a neural network operation as an example, the process may be y = f(wx + b), where x is the input neuron matrix, w is the weight matrix, b is the bias scalar, and f is the activation function, which may specifically be any one of the sigmoid, tanh, relu, and softmax functions. Assuming a binary tree structure with 8 slave processing circuits, the method may be implemented as follows:
The control module obtains the input neuron matrix x, the weight matrix w, and the fully connected operation instruction from the memory module 31, and transmits the input neuron matrix x, the weight matrix w, and the fully connected operation instruction to the main processing circuit;
The main processing circuit splits the input neuron matrix x into 8 submatrices, distributes the 8 submatrices to the 8 slave processing circuits through the tree module, and broadcasts the weight matrix w to the 8 slave processing circuits;
The slave processing circuits execute the multiplication and accumulation operations of the 8 submatrices with the weight matrix w in parallel to obtain 8 intermediate results, and send the 8 intermediate results to the main processing circuit;
The main processing circuit sorts the 8 intermediate results to obtain the operation result of wx, executes the bias b operation on that operation result, then executes the activation operation to obtain the final result y, and sends the final result y to the control module, which outputs the final result y or stores it in the memory module 31.
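As a non-limiting illustration, the fully connected flow above can be sketched in plain Python. The sketch assumes that x is split by columns into equal-width submatrices, that the slave processing circuits are simulated sequentially, and that relu is chosen as f; the helper names `matmul`, `split_columns`, and `fully_connected` are hypothetical and do not appear in the disclosure.

```python
def matmul(a, b):
    """Multiply matrix a (r x k) by matrix b (k x c); matrices are lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def split_columns(x, parts):
    """Split matrix x into `parts` column blocks of equal width (assumed divisible)."""
    cols = len(x[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    width = cols // parts
    return [[row[p * width:(p + 1) * width] for row in x] for p in range(parts)]


def relu(v):
    return v if v > 0 else 0


def fully_connected(w, x, b, num_slaves=8, f=relu):
    blocks = split_columns(x, num_slaves)            # main circuit: split x
    partials = [matmul(w, blk) for blk in blocks]    # slave circuits: w times each block
    # main circuit: put the intermediate results back in block order to form wx
    wx = [sum((p[i] for p in partials), []) for i in range(len(w))]
    return [[f(v + b) for v in row] for row in wx]   # add scalar bias b, then activate


w = [[1, -1], [2, 0]]
x = [[1, 2, 3, 4, 5, 6, 7, 8],
     [8, 7, 6, 5, 4, 3, 2, 1]]
y = fully_connected(w, x, b=1)
```

Because each slave circuit receives the whole weight matrix and only one block of x, the intermediate results are independent and only need to be reassembled in order before the bias and activation steps.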
The method by which the neural network computing device shown in Fig. 10 executes a neural network forward operation instruction may specifically be as follows:
The control module 32 extracts from the memory module 31 the operation data (for example, a neural network forward operation instruction), the operation domain corresponding to the neural network operation instruction, and at least one operation code; the control module 32 transmits the operation domain to the data access module and sends the at least one operation code to the computing module.
The control module 32 extracts from the memory module 31 the weight w and the bias b corresponding to the operation domain (when b is 0, the bias b need not be extracted), and transmits the weight w and the bias b to the main processing circuit of the computing module; the control module extracts the input data Xi from the memory module 31 and sends the input data Xi to the main processing circuit.
The main processing circuit splits the input data Xi into n data blocks;
The instruction processing submodule 321 of the control module 32 determines a multiplication instruction, a bias instruction, and an accumulation instruction according to the at least one operation code, and sends the multiplication instruction, the bias instruction, and the accumulation instruction to the main processing circuit. The main processing circuit broadcasts the multiplication instruction and the weight w to the multiple slave processing circuits, and distributes the n data blocks to the multiple slave processing circuits (for example, with n slave processing circuits, each slave processing circuit is sent one data block). The multiple slave processing circuits are configured to execute the multiplication operation on the weight w and the received data block according to the multiplication instruction to obtain intermediate results, and to send the intermediate results to the main processing circuit. The main processing circuit executes the accumulation operation on the intermediate results sent by the multiple slave processing circuits according to the accumulation instruction to obtain an accumulation result, adds the bias b to the accumulation result according to the bias instruction to obtain the final result, and sends the final result to the control module.
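As a non-limiting illustration, the multiply, accumulate, and bias sequence above can be sketched for the simple case of a weight vector and an input vector. The sketch assumes that the input length is divisible by n and that each slave processing circuit uses the slice of the broadcast weight that matches its data block; the function names are hypothetical.

```python
def split_blocks(values, n):
    """Split a list into n contiguous blocks of equal size (assumed divisible)."""
    if len(values) % n != 0:
        raise ValueError("length must be divisible by n")
    size = len(values) // n
    return [values[i * size:(i + 1) * size] for i in range(n)]


def slave_multiply(w_block, x_block):
    """One slave circuit: multiply its data block by the matching weight slice."""
    return sum(wi * xi for wi, xi in zip(w_block, x_block))


def forward(w, x, b, n):
    x_blocks = split_blocks(x, n)   # main circuit: split input data into n blocks
    w_blocks = split_blocks(w, n)   # assumption: each slave picks its slice of w
    intermediates = [slave_multiply(wb, xb) for wb, xb in zip(w_blocks, x_blocks)]
    accumulated = sum(intermediates)  # main circuit: accumulation instruction
    return accumulated + b            # main circuit: bias instruction


result = forward(w=[1, 2, 3, 4], x=[5, 6, 7, 8], b=10, n=2)
```

With these values the two intermediate results are 17 and 53, the accumulation result is 70, and the final result after adding the bias is 80.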
In addition, the order of the addition operation and the multiplication operation may be exchanged.
In the technical solution provided by the present application, a single instruction, namely the neural network operation instruction, implements both the multiplication operation and the bias operation of the neural network. The intermediate results of the neural network computation need not be stored or extracted, which reduces the storage and extraction of intermediate data; the solution therefore has the advantages of reducing the corresponding operation steps and improving the computing effect of the neural network.
Referring to Fig. 11, Fig. 11 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
In a possible embodiment, the computing module 33 may include one main processing circuit 331 and multiple slave processing circuits 332, as shown in Fig. 11.
In a possible embodiment, as shown in Fig. 11, the multiple slave processing circuits are arranged in an array; each slave processing circuit is connected to the other adjacent slave processing circuits, and the main processing circuit is connected to k slave processing circuits among the multiple slave processing circuits, the k slave processing circuits being: the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column. It should be noted that the k slave processing circuits shown in Fig. 11 include only the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column; that is, the k slave processing circuits are those slave processing circuits that are directly connected to the main processing circuit.
The k slave processing circuits are configured to forward data and instructions between the main processing circuit and the multiple slave processing circuits.
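As a non-limiting illustration, the set of slave processing circuits directly connected to the main processing circuit in an m-row, n-column array can be enumerated as follows. The 1-based `(row, column)` coordinates and the decision to count the two corner circuits shared by a row and the 1st column only once are assumptions of this sketch; the disclosure does not state how k is counted.

```python
def directly_connected(m, n):
    """Return the sorted 1-based (row, col) positions of the slave circuits
    wired to the main circuit: row 1, row m, and column 1 of an m x n array."""
    positions = set()
    for col in range(1, n + 1):
        positions.add((1, col))   # the n slave circuits in the 1st row
        positions.add((m, col))   # the n slave circuits in the m-th row
    for row in range(1, m + 1):
        positions.add((row, 1))   # the m slave circuits in the 1st column
    return sorted(positions)


connected = directly_connected(4, 4)
```

All remaining slave processing circuits would exchange data and instructions with the main processing circuit only through these forwarding circuits.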
In some embodiments, the present application also claims a chip that includes the above artificial intelligence processor.
In some embodiments, the present application also claims a chip package structure that includes the above chip.
In some embodiments, the present application also claims a board card that includes the above chip package structure.
Referring to Fig. 12, Fig. 12 shows a board card according to an embodiment of the present disclosure. In addition to the above chip 389, the board card may include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392;
The memory device 390 is connected by a bus to the chip in the chip package structure and is used to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip by a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so its speed is twice that of standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
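As a non-limiting illustration, the 25600 MB/s figure follows from the DDR4-3200 transfer rate (3200 million transfers per second) and the 64 data bits of each controller, with the 8 ECC bits carrying no payload. The function and parameter names are hypothetical.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, data_bits):
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


controller_bits = 72
ecc_bits = 8
data_bits = controller_bits - ecc_bits   # 64 bits carry data

bandwidth = ddr_bandwidth_mb_per_s(3200, data_bits)   # DDR4-3200
```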
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip package structure. The interface device is used to implement data transmission between the chip and an external device (for example, a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: data to be processed is transferred by the server to the chip through the standard PCIE interface, thereby implementing the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is likewise sent back to the external device (for example, a server) by the interface device.
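As a non-limiting illustration, the 16000 MB/s figure corresponds to the raw signalling rate of a PCIE 3.0 x16 link (8 GT/s per lane across 16 lanes). The sketch also shows the slightly lower payload rate once the 128b/130b line encoding of PCIE 3.0 is taken into account; that second figure is not stated in the text above and is included only for context. The function and parameter names are hypothetical.

```python
def pcie_bandwidth_mb_per_s(giga_transfers_per_s, lanes,
                            payload_bits=128, encoded_bits=130):
    """Return (raw, effective) one-direction bandwidth in MB/s for a PCIe link."""
    raw = giga_transfers_per_s * 1000 * lanes / 8     # one bit per transfer per lane
    effective = raw * payload_bits / encoded_bits     # 128b/130b line encoding
    return raw, effective


raw_bw, effective_bw = pcie_bandwidth_mb_per_s(8, 16)   # PCIE 3.0 x16
```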
The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). Since the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads. The chip may therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, the present application also claims an electronic device that includes the above board card.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, earphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an aircraft, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical or take other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically alone, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software program module.
If the integrated module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present disclosure have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.