CN109657782A - Operation method, device, and related product - Google Patents

Info

Publication number
CN109657782A
CN109657782A (application CN201811536153.2A)
Authority
CN
China
Prior art keywords
operator
cutting
input data
artificial intelligence
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811536153.2A
Other languages
Chinese (zh)
Other versions
CN109657782B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811536153.2A
Publication of CN109657782A
Application granted
Publication of CN109657782B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology

Abstract

This disclosure relates to an operation method, a device, and a related product. The product includes a control module, and the control module includes an instruction cache submodule, an instruction processing submodule, and a storage queue submodule. The instruction cache submodule is used for storing computation instructions associated with an artificial neural network operation; the instruction processing submodule parses a computation instruction to obtain multiple operation instructions; the storage queue submodule is used for storing an instruction queue, which includes multiple operation instructions or computation instructions to be executed in the order of the queue. Through the above method, the present disclosure can improve the operating efficiency of the related product when performing neural network model operations.

Description

Operation method, device, and related product
Technical field
This disclosure relates to the field of machine learning technology, and more particularly to an operation method, a device, and a related product.
Background technique
Neural network algorithms are a recently popular class of machine learning algorithms that have achieved very good results in various fields, such as image recognition, speech recognition, and natural language processing. With the development of neural network algorithms, their complexity has kept growing, and model scale has kept increasing in order to improve accuracy. Processing these large-scale models with GPUs and CPUs takes a large amount of computing time and consumes a great deal of power. Against this background, new artificial intelligence processors have been proposed to increase the operating speed of neural network models, save operation time, and reduce power consumption. However, current algorithmic support for these new artificial intelligence processors is far from sufficient.
Summary of the invention
In view of this, the present disclosure proposes an operation method. The method comprises:
obtaining a cutting operator and a basic operator from an artificial intelligence operator library, wherein the cutting operator is used to perform cutting processing on input data along a specified dimension, and the basic operator is used to perform a corresponding arithmetic operation on the input data; and
splicing the cutting operator with the basic operator to form a splicing operator,
wherein the splicing operator is used to perform, on an artificial intelligence processor, a corresponding splicing arithmetic operation on the input data, so as to perform an artificial intelligence operation.
In a possible embodiment, splicing the cutting operator with the basic operator to form the splicing operator comprises:
using the cutting operator as the preceding operator of the basic operator.
In a possible embodiment, the basic operator comprises a connection operator, which is used to perform connection processing on input data along a specified dimension, wherein the splicing arithmetic operation comprises:
performing cutting processing on the input data along a specified dimension using the cutting operator, so as to cut the input data into M sub-data along the specified dimension, M being an integer greater than 1; and
performing connection processing on N of the M sub-data using the connection operator to obtain a processing result, N being an integer greater than 1, and N ≤ M.
In a possible embodiment, the basic operator comprises a dimension-order transposition operator, which is used to rearrange the dimension order of input data, wherein splicing the cutting operator with the basic operator to form the splicing operator comprises:
using the dimension-order transposition operator as the preceding operator of the cutting operator.
In a possible embodiment, the splicing arithmetic operation comprises:
rearranging the dimension order of the input data using the dimension-order transposition operator, so as to move the specified dimension of the input data to the last dimension of the input data; and
cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, P being an integer greater than 1.
In a possible embodiment, cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, comprises:
when the quantity of data in the last dimension of the rearranged input data is greater than the single-pass maximum cutting quantity of the cutting operator, cutting the input data multiple times, until the input data is cut into P sub-data.
In a possible embodiment, the splicing operator is applied at the application layer of the software invocation hierarchy, the artificial intelligence operator library is located at the operator library layer of the software invocation hierarchy, and the artificial intelligence processor is located at the chip layer of the software invocation hierarchy.
According to another aspect of the present disclosure, an arithmetic device is proposed. The device comprises:
an acquisition module, configured to obtain a cutting operator and a basic operator from an artificial intelligence operator library, wherein the cutting operator is used to perform cutting processing on input data along a specified dimension, and the basic operator is used to perform a corresponding arithmetic operation on the input data; and
an operation module, connected to the acquisition module and configured to splice the cutting operator with the basic operator to form a splicing operator,
wherein the splicing operator is used to perform, on an artificial intelligence processor, a corresponding splicing arithmetic operation on the input data, so as to perform an artificial intelligence operation.
In a possible embodiment, the operation module comprises:
a first operation submodule, configured to use the cutting operator as the preceding operator of the basic operator.
In a possible embodiment, the basic operator comprises a connection operator, which is used to perform connection processing on input data along a specified dimension, wherein the splicing arithmetic operation comprises:
performing cutting processing on the input data along a specified dimension using the cutting operator, so as to cut the input data into M sub-data along the specified dimension, M being an integer greater than 1; and
performing connection processing on N of the M sub-data using the connection operator to obtain a processing result, N being an integer greater than 1, and N ≤ M.
In a possible embodiment, the basic operator comprises a dimension-order transposition operator, which is used to rearrange the dimension order of input data, wherein the operation module further comprises:
a second operation submodule, configured to use the dimension-order transposition operator as the preceding operator of the cutting operator.
In a possible embodiment, the splicing arithmetic operation comprises:
rearranging the dimension order of the input data using the dimension-order transposition operator, so as to move the specified dimension of the input data to the last dimension of the input data; and
cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, P being an integer greater than 1.
In a possible embodiment, cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, comprises:
when the quantity of data in the last dimension of the rearranged input data is greater than the single-pass maximum cutting quantity of the cutting operator, cutting the input data multiple times, until the input data is cut into P sub-data.
According to another aspect of the present disclosure, an artificial intelligence processing device is proposed. The device comprises:
a main processor, configured to execute the above method to obtain a splicing operator, wherein the splicing operator is used to perform a corresponding arithmetic operation on input data; and
an artificial intelligence processor, electrically connected to the main processor;
wherein the main processor is further configured to send the input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the main processor;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result; and
send the operation result to the main processor.
In a possible embodiment, the main processor further comprises a main-processor storage space for storing the splicing operator, wherein
the main processor is further configured to provide the input data and the splicing operator stored in the main-processor storage space.
In a possible embodiment, the artificial intelligence processor transmits the operation result to the main processor through an I/O interface;
when the device comprises multiple artificial intelligence processors, the multiple artificial intelligence processors can be connected through a specific structure and transmit data among themselves;
wherein the multiple artificial intelligence processors are interconnected through a PCIe (Peripheral Component Interconnect Express) bus and transmit data, so as to support larger-scale artificial intelligence operations; the multiple artificial intelligence processors may share a single control system or each have its own control system; they may share memory or each have its own memory; and the interconnection among the multiple artificial intelligence processors may be any interconnection topology.
In a possible embodiment, the device further comprises a storage device, connected to the artificial intelligence processor and the main processor respectively, for saving the data of the artificial intelligence processor and the main processor.
According to another aspect of the present disclosure, an artificial intelligence chip is proposed, which comprises the above artificial intelligence processing device.
According to another aspect of the present disclosure, an electronic device is proposed, which comprises the above artificial intelligence chip.
According to another aspect of the present disclosure, a board card is proposed. The board card comprises a memory device, an interface device, a control device, and the above artificial intelligence chip;
wherein the artificial intelligence chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment; and
the control device is used for monitoring the state of the chip.
In a possible embodiment, the memory device comprises multiple groups of storage units, each group being connected to the chip through a bus, the storage units being DDR SDRAM;
the chip comprises a DDR controller, used to control the data transmission to and data storage of each storage unit; and
the interface device is a standard PCIe interface.
According to another aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the above method.
Through the above method, the present disclosure can obtain a cutting operator and a basic operator from an artificial intelligence operator library and splice the cutting operator with the basic operator to form a splicing operator. The formed splicing operator can be used to support new artificial intelligence processors, thereby improving the operating efficiency of a new artificial intelligence processor when performing neural network model operations.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.
Fig. 1 shows the flow chart of the operation method according to one embodiment of the disclosure.
Fig. 2 shows a schematic diagram of the software invocation hierarchy according to an embodiment of the present disclosure.
Fig. 3 shows the schematic diagram of the splicing operator according to one embodiment of the disclosure.
Fig. 4 a and Fig. 4 b show the schematic diagram of the splicing operator according to one embodiment of the disclosure.
Fig. 5 shows the block diagram of the arithmetic unit according to one embodiment of the disclosure.
Fig. 6 shows the block diagram of the arithmetic unit according to one embodiment of the disclosure.
Fig. 7 shows the block diagram of the artificial intelligence process device according to one embodiment of the disclosure.
Fig. 8 shows the block diagram of the artificial intelligence process device according to one embodiment of the disclosure.
Fig. 9 shows the block diagram of the artificial intelligence process device according to one embodiment of the disclosure.
Figure 10 shows a block diagram of the main processing circuit 331 according to an embodiment of the present disclosure.
Figure 11 shows the schematic diagram of the artificial intelligence process device according to one embodiment of the disclosure.
Figure 12 shows the schematic diagram of the artificial intelligence process device according to one embodiment of the disclosure.
Figure 13 shows a kind of board according to one embodiment of the disclosure.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. Identical reference numbers in the drawings indicate functionally identical or similar elements. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" herein means "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" should not be construed as preferred over or superior to other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art will understand that the present disclosure can be implemented without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present disclosure.
Referring to Fig. 1, Fig. 1 shows the flow chart of the operation method according to one embodiment of the disclosure.
The method can be applied in a server or a terminal. As shown in Figure 1, the method comprises:
Step S110: obtain a cutting operator (split) and a basic operator from an artificial intelligence operator library, wherein the cutting operator is used to perform cutting processing on input data along a specified dimension, and the basic operator is used to perform a corresponding arithmetic operation on the input data;
Step S120: splice the cutting operator with the basic operator to form a splicing operator,
wherein the splicing operator is used to perform, on an artificial intelligence processor, a corresponding splicing arithmetic operation on the input data, so as to perform an artificial intelligence operation.
Through the above method, the present disclosure can obtain a cutting operator and a basic operator from an artificial intelligence operator library and splice the cutting operator with the basic operator to form a splicing operator. The formed splicing operator can be used to support new artificial intelligence processors, thereby improving the operating efficiency of a new artificial intelligence processor when performing neural network model operations.
The splicing operator formed by the above method can be used as a part of an artificial intelligence operation. When the splicing operator runs on an artificial intelligence processor to perform an artificial intelligence operation, applications including but not limited to speech recognition and image recognition can be realized. By combining the cutting operator with a basic operator to form a splicing operator, the artificial intelligence processor can better realize artificial intelligence operations.
In a possible embodiment, an operator can be a common algorithm in artificial intelligence, also referred to as a layer, an operation, or a node. Each neural network corresponds to a graph structure, in which the nodes are operators. An artificial intelligence operator library can be provided in advance; it may include multiple basic operators (such as a convolution operator, a fully connected operator, a pooling operator, an activation operator, etc.), and each basic operator can be called by processors including but not limited to a central processing unit (CPU) and a graphics processing unit (GPU) to realize its corresponding basic function.
In a possible embodiment, the first input data can have 4 dimensions. When the first input data is picture data, the dimensions of the first input data can represent the number of pictures, the number of picture channels, the picture height, and the picture width. In other embodiments, when the first input data is image data but has fewer than 4 dimensions (for example, 3), the dimensions of the first input data can represent any 3 of: the number of pictures, the number of picture channels, the picture height, and the picture width.
When cutting input data, the number of pieces the cutting operator cuts the input data into can be set by a slice-count parameter (num_out), and the specified dimension along which to cut can also be set by a corresponding parameter. The present disclosure does not limit the specific number of sub-data after cutting or the dimension along which the cut is made.
In one example, the split operator cuts one-dimensional data of length 4 into two pieces: the first two elements form one sub-data, and the last two elements form another sub-data.
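The split behavior in the example above can be sketched in Python with NumPy. This is a hypothetical illustration only — the disclosure does not specify the operator library's API, and the function name and `num_out` parameter follow the naming used in this description:

```python
import numpy as np

def split(data, num_out, axis=0):
    """Sketch of the cutting operator: cut `data` into `num_out`
    equal sub-data along `axis` (num_out is the slice-count parameter)."""
    return np.split(data, num_out, axis=axis)

# One-dimensional data of length 4 cut into two pieces:
pieces = split(np.array([1, 2, 3, 4]), num_out=2)
# pieces[0] is [1, 2] and pieces[1] is [3, 4]
```

The first two elements form one sub-data and the last two form another, matching the example in the text.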
In a possible embodiment, step S120 of splicing the cutting operator with the basic operator to form a splicing operator comprises:
using the cutting operator as the preceding operator of the basic operator.
In a possible embodiment, the basic operator comprises a connection operator (concat), which is used to perform connection processing on input data along a specified dimension, wherein the splicing arithmetic operation comprises:
performing cutting processing on the input data along a specified dimension using the cutting operator, so as to cut the input data into M sub-data along the specified dimension, M being an integer greater than 1; and
performing connection processing on N of the M sub-data using the connection operator to obtain a processing result, N being an integer greater than 1, and N ≤ M.
In a possible embodiment, the specified dimension can be set according to the demands of the artificial intelligence operation and other requirements, and a relevant parameter can be set for the input data to specify the specified dimension.
In one example, if the input data is two-dimensional data of length 4 and width 2, the split operator slicing along the length can cut the input data into four sub-data, each being two-dimensional data of length 1 and width 2. When the connection operator is specified to connect the second and third sub-data, the concat operator splices the second and third sub-data to form two-dimensional data of length 2 and width 2.
It should be understood that the N sub-data can be specified by parameters of the input data, for example by a start-data parameter (start) and an end-data parameter (end). When the concat operator operates as determined by the start and end parameters, the N sub-data can be the data between start and end (including both start and end), or the data between start and end−1 (including start and end−1).
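The split-then-concat splicing operator described above can be sketched as follows, again as a hypothetical NumPy illustration (the function name, parameter names, and the inclusive start/end convention are assumptions for the sketch; the disclosure notes either an inclusive or exclusive end convention is possible):

```python
import numpy as np

def splice_split_concat(data, num_out, start, end, axis=0):
    """Sketch of the splicing operator formed by using split as the
    preceding operator of concat: cut `data` into `num_out` sub-data
    along `axis`, then reconnect sub-data start..end (inclusive)."""
    pieces = np.split(data, num_out, axis=axis)              # cutting operator
    return np.concatenate(pieces[start:end + 1], axis=axis)  # connection operator

# 2-D input of length 4 and width 2, cut along the length into four
# sub-data, then connect the second and third sub-data (indices 1 and 2):
x = np.arange(8).reshape(4, 2)
out = splice_split_concat(x, num_out=4, start=1, end=2, axis=0)
# out has shape (2, 2): it consists of rows 1 and 2 of x
```

The result is two-dimensional data of length 2 and width 2, as in the example in the text.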
In an application example, when artificial intelligence operations are needed for speech recognition or image processing, the splicing operator (cutting operator + connection operator) of an embodiment of the present disclosure can be used. Using the splicing operator described in the present disclosure, artificial intelligence operations can be performed more advantageously, realizing applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
Through the above method, the present disclosure can obtain a splicing operator from a cutting operator and a connection operator. The splicing operator can first cut the input data into a preset number of sub-data, and then connect specified sub-data using the connection operator, so as to obtain the output data.
In a possible embodiment, the basic operator comprises a dimension-order transposition operator (transpose_pro), which is used to rearrange the dimension order of input data, wherein step S120 of splicing the cutting operator with the basic operator to form a splicing operator comprises:
using the dimension-order transposition operator as the preceding operator of the cutting operator.
In a possible embodiment, the splicing arithmetic operation comprises:
rearranging the dimension order of the input data using the dimension-order transposition operator, so as to move the specified dimension of the input data to the last dimension of the input data; and
cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, P being an integer greater than 1.
In a possible embodiment, cutting the data of the last dimension of the rearranged input data using the cutting operator, so as to cut the input data into P sub-data, comprises:
when the quantity of data in the last dimension of the rearranged input data is greater than the single-pass maximum cutting quantity of the cutting operator, cutting the input data multiple times, until the input data is cut into P sub-data.
In a possible embodiment, the splicing operator can operate according to the instruction of a dimension-removal parameter (squeeze_). The dimension-removal parameter indicates whether to remove the specified dimension of the input data. For example, when the dimension-removal parameter indicates that the specified dimension of the input data needs to be removed, the splicing operator rearranges the dimension order of the input data, moving the specified dimension to the last dimension of the input data, and then cuts the data of the last dimension of the rearranged input data, so as to cut the input data into P sub-data, P being an integer greater than 1.
In a possible embodiment, the dimension-removal parameter (squeeze_) is used to decide whether to remove, from the output data (the data obtained after cutting the input data), the dimension whose size is 1. For example, suppose two-dimensional input data has two dimensions, height and width, with sizes 1 and 4 respectively, and the width is cut into two pieces: the two dimensions of the resulting data, height and width, become 1 and 2. Since the slicing is performed on the width, the height dimension cannot be removed even though its size is 1, so squeeze_ can only be set to false in this case. Only when the width is cut into 4 pieces can squeeze_ be set to true; the resulting data then no longer has the width dimension but still retains the height dimension, thereby achieving the purpose of dimensionality reduction.
Specifically, the transpose_pro operator rearranges the original dimension order of the input data according to a given order to complete the transposition. The input data generally has no more than four dimensions, and the output data has the same number of dimensions as the input data. For example, transposing the two-dimensional data [[1, 2], [3, 4]], with original dimension order 1, 2 and rearranged order 2, 1, gives the output data [[1, 3], [2, 4]].
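The transpose_pro behavior just described can be sketched with NumPy's transpose. This is an illustrative sketch under the assumption that the given order is 1-based, as in the example above; the real operator's interface is not specified in the disclosure:

```python
import numpy as np

def transpose_pro(data, order):
    """Sketch of the dimension-order transposition operator: rearrange
    the dimensions of `data` according to `order` (1-based indices)."""
    return np.transpose(data, axes=[d - 1 for d in order])

x = np.array([[1, 2], [3, 4]])
y = transpose_pro(x, order=[2, 1])  # original order 1,2 -> rearranged 2,1
# y is [[1, 3], [2, 4]]
```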
In a possible embodiment, the specified dimension can be configured in advance according to the needs of the artificial intelligence operation; the present disclosure does not limit this.
It should be understood that, when the specified dimension of the data needs to be removed, the dimension to be sliced is usually moved to the last dimension, with the relative order of the other dimensions unchanged.
For example, suppose the single-pass maximum cutting quantity of the cutting operator is 55 (the present disclosure does not limit this; in other embodiments it may be a different value), and consider four-dimensional data of size 3*3*3*182. To cut the fourth dimension of this data into 182 sub-data, the split operator cuts along the fourth dimension. Since 182 pieces cannot be obtained in a single pass, the data is first cut into four parts: 3*3*3*55, 3*3*3*55, 3*3*3*55, and 3*3*3*17. The split operator then cuts each of the four parts again into 3*3*3*1, 3*3*3*1, ..., 3*3*3*1, finally achieving the purpose of cutting the data into 182 equal pieces of 3*3*3*1.
In this example, after the cutting of the four-dimensional data is completed, since the squeeze_ parameter indicates that the dimension needs to be removed, the 182 pieces of 3*3*3*1 data are reduced from four dimensions to three.
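The two-pass cutting scheme and the squeeze_ behavior above can be sketched as follows. This is a hypothetical NumPy illustration of the scheme only; the chunking strategy (cut at multiples of the single-pass limit) and the function name are assumptions, not the disclosure's actual implementation:

```python
import numpy as np

MAX_SINGLE_SPLIT = 55  # single-pass maximum cutting quantity (example value)

def multi_pass_split(data, squeeze=False):
    """Cut the last dimension of `data` into size-1 slices when the slice
    count exceeds the single-pass limit: first cut into chunks of at most
    MAX_SINGLE_SPLIT, then cut each chunk into size-1 slices. When
    `squeeze` is True (squeeze_ = true), drop the now size-1 last dim."""
    n = data.shape[-1]
    # First pass: chunk boundaries at multiples of MAX_SINGLE_SPLIT,
    # e.g. 182 -> chunks of 55, 55, 55, 17.
    bounds = list(range(MAX_SINGLE_SPLIT, n, MAX_SINGLE_SPLIT))
    chunks = np.split(data, bounds, axis=-1)
    # Second pass: each chunk into size-1 slices along the last dimension.
    slices = [s for c in chunks for s in np.split(c, c.shape[-1], axis=-1)]
    if squeeze:
        slices = [s.squeeze(axis=-1) for s in slices]
    return slices

x = np.zeros((3, 3, 3, 182))
parts = multi_pass_split(x, squeeze=True)
# 182 pieces, each reduced from 3*3*3*1 to 3*3*3
```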
In an application example, when artificial intelligence operations are needed for speech recognition or image processing, the splicing operator (dimension-order transposition operator (transpose_pro) + cutting operator (split)) of an embodiment of the present disclosure can be used to perform the cutting operation, so that the input data is cut when it meets certain conditions. Using the splicing operator described in the present disclosure, artificial intelligence operations can be performed more advantageously, realizing applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
Referring to Fig. 2, Fig. 2 shows a schematic diagram of the software invocation hierarchy according to an embodiment of the present disclosure.
As shown in Fig. 2, the software invocation hierarchy comprises, from top to bottom, an application layer, a framework layer, an operator library layer, a driver layer, and a chip layer, wherein the splicing operator obtained by the foregoing operation method can be applied at the application layer, the artificial intelligence operator library can be at the operator library layer, the artificial intelligence processor can be located at the chip layer, and the driver layer may include a driver for driving the chip layer.
From the above description, after the cutting operator and the basic operator in the operator library layer are combined to form a splicing operator, the splicing operator can be called directly by the application layer and applied there, so as to realize the corresponding function in the artificial intelligence operation. This avoids having to fetch the cutting operator and the basic operator from the operator library layer every time the application layer performs an artificial intelligence operation, thereby improving the execution of artificial intelligence operations.
Referring to Fig. 3, Fig. 3 shows a schematic diagram of a splicing operator according to an embodiment of the present disclosure.
As shown in Fig. 3, the splicing operator comprises:
a cutting operator 10, used to perform cutting processing on input data along a specified dimension; and
a basic operator 20, used to perform a corresponding arithmetic operation on the input data,
wherein the cutting operator 10 and the basic operator 20 come from an artificial intelligence operator library.
Through the above splicing operator, the present disclosure can cut input data using the cutting operator and perform a corresponding arithmetic operation on the input data using the basic operator; through the cutting operator and the basic operator, the splicing operator can perform the corresponding splicing arithmetic operation.
Referring to Fig. 4a and Fig. 4b, Fig. 4a and Fig. 4b show schematic diagrams of splicing operators according to an embodiment of the present disclosure.
As shown in Fig. 4a, in a possible implementation, the basic operator 20 is a concatenation operator 21, where the concatenation operator 21 is configured to concatenate input data in a specified dimension, and the splicing arithmetic operation includes:
cutting the input data in the specified dimension using the cutting operator, thereby cutting the input data into M pieces of sub-data in the specified dimension, where M is an integer greater than 1;
concatenating N of the M pieces of sub-data using the concatenation operator to obtain a processing result, where N is an integer greater than 1 and N ≤ M.
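As an illustrative sketch of the split-then-concatenate splicing operation described above, the following uses NumPy's `split` and `concatenate` as stand-ins for the cutting operator and the concatenation operator; the function name, shapes, and selected indices are hypothetical, not taken from the disclosure:

```python
import numpy as np

def split_concat(x, dim, m, indices):
    """Cut x into m pieces of sub-data along `dim`, then concatenate
    the selected pieces along the same dimension."""
    parts = np.split(x, m, axis=dim)          # cut into M pieces of sub-data
    chosen = [parts[i] for i in indices]      # pick N of the M pieces
    return np.concatenate(chosen, axis=dim)   # concatenate the N pieces

x = np.arange(24).reshape(2, 12)
out = split_concat(x, dim=1, m=4, indices=[0, 2])   # N = 2 of M = 4
# out keeps columns 0-2 and 6-8 of x, so its shape is (2, 6)
```

Here M = 4 and N = 2, satisfying the constraint N ≤ M stated above.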
As shown in Fig. 4b, in a possible implementation, the basic operator 20 is a dimension-order transposition operator 22, where the dimension-order transposition operator 22 is configured to rearrange the dimension order of the input data, and splicing the cutting operator 10 with the basic operator 20 to form the splicing operator includes:
using the dimension-order transposition operator 22 as a predecessor operator of the cutting operator 10.
In a possible implementation, the splicing arithmetic operation includes:
rearranging the dimension order of the input data using the dimension-order transposition operator, so that the specified dimension of the input data is moved to the last dimension of the input data;
cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, where P is an integer greater than 1.
In a possible implementation, cutting the data of the last dimension of the rearranged input data using the cutting operator to cut the input data into P pieces of sub-data includes:
when the amount of data in the last dimension of the rearranged input data exceeds the maximum amount the cutting operator can cut in a single pass, cutting the input data multiple times until the input data has been cut into P pieces of sub-data.
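A minimal sketch of the transpose-then-cut behavior just described, assuming NumPy's `transpose` and `array_split` as stand-ins for the dimension-order transposition operator and the cutting operator, and assuming a per-call limit `max_single_split` on how many pieces one cut may produce (all names are illustrative):

```python
import numpy as np

def transpose_then_split(x, dim, p, max_single_split):
    """Move `dim` to the last axis, then cut the last axis into p pieces,
    cutting repeatedly when p exceeds the single-pass limit."""
    order = [d for d in range(x.ndim) if d != dim] + [dim]
    x = np.transpose(x, order)        # specified dim becomes the last one
    pieces = [x]
    while len(pieces) < p:
        head = pieces.pop(0)          # re-cut the first remaining piece
        k = min(max_single_split, p - len(pieces))
        pieces = np.array_split(head, k, axis=-1) + pieces
    return pieces

x = np.arange(24).reshape(2, 3, 4)
parts = transpose_then_split(x, dim=1, p=3, max_single_split=2)
# three pieces of shape (2, 4, 1); concatenating them along the last
# axis restores the transposed input
```

With `max_single_split = 2` and `p = 3`, two passes of cutting are needed, mirroring the "cutting multiple times" case described above.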
Referring to Fig. 5, Fig. 5 shows a block diagram of an arithmetic apparatus according to an embodiment of the present disclosure.
As shown in Fig. 5, the apparatus includes:
an obtaining module 80, configured to obtain a cutting operator and a basic operator from the artificial intelligence operator library, where the cutting operator is configured to perform cutting processing on input data in a specified dimension, and the basic operator is configured to perform a corresponding arithmetic operation on the input data;
a computing module 90, connected to the obtaining module 80 and configured to splice the cutting operator with the basic operator to form a splicing operator,
where the splicing operator is used in an artificial intelligence processor to execute the corresponding splicing arithmetic operation on the input data, so as to execute the artificial intelligence operation.
Through the above apparatus, the present disclosure can obtain the cutting operator and the basic operator from the artificial intelligence operator library and splice them to form a splicing operator. The formed splicing operator can be used to support a new artificial intelligence processor, thereby improving the operation efficiency of the new artificial intelligence processor when performing neural network model operations.
Referring to Fig. 6, Fig. 6 shows a block diagram of an arithmetic apparatus according to an embodiment of the present disclosure.
In a possible implementation, as shown in Fig. 6, the computing module 90 includes:
a first operation submodule 910, configured to use the cutting operator as a predecessor operator of the basic operator.
In a possible implementation, the basic operator includes a concatenation operator, where the concatenation operator is configured to concatenate input data in a specified dimension, and the splicing arithmetic operation includes:
cutting the input data in the specified dimension using the cutting operator, thereby cutting the input data into M pieces of sub-data in the specified dimension, where M is an integer greater than 1;
concatenating N of the M pieces of sub-data using the concatenation operator to obtain a processing result, where N is an integer greater than 1 and N ≤ M.
In a possible implementation, the basic operator includes a dimension-order transposition operator, where the dimension-order transposition operator is configured to rearrange the dimension order of the input data, and the computing module 90 further includes:
a second operation submodule 920, configured to use the dimension-order transposition operator as a predecessor operator of the cutting operator.
In a possible implementation, the splicing arithmetic operation includes:
rearranging the dimension order of the input data using the dimension-order transposition operator, so that the specified dimension of the input data is moved to the last dimension of the input data;
cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, where P is an integer greater than 1.
In a possible implementation, cutting the data of the last dimension of the rearranged input data using the cutting operator to cut the input data into P pieces of sub-data includes:
when the amount of data in the last dimension of the rearranged input data exceeds the maximum amount the cutting operator can cut in a single pass, cutting the input data multiple times until the input data has been cut into P pieces of sub-data.
Referring to Fig. 7, Fig. 7 shows a block diagram of an artificial intelligence processing apparatus according to an embodiment of the present disclosure.
In a possible implementation, as shown in Fig. 7, the apparatus includes:
a main processor 50, configured to execute the above method to obtain a splicing operator, where the splicing operator is used to execute the corresponding arithmetic operation on the input data;
an artificial intelligence processor 60, electrically connected to the main processor 50.
The main processor 50 is further configured to send the input data and the splicing operator to the artificial intelligence processor 60, and the artificial intelligence processor 60 is configured to:
receive the input data and the splicing operator sent by the main processor 50;
perform the artificial intelligence operation on the input data using the splicing operator to obtain an operation result;
send the operation result to the main processor 50.
In a possible implementation, the main processor 50 may include a main processor storage space for storing the splicing operator obtained when the main processor 50 executes the operation method, where
the main processor 50 is further configured to provide the input data and the splicing operator stored in the main processor storage space.
It should be understood that the main processor 50 can execute the operation method after obtaining the data, obtain the splicing operator, and send the obtained splicing operator to the artificial intelligence processor 60 for processing. The main processor 50 can also send a stored splicing operator to the artificial intelligence processor 60, so that a pre-stored splicing operator is delivered to the artificial intelligence processor 60, which then performs the artificial intelligence operation according to the received splicing operator and input data. Of the above two approaches, the former can be regarded as an online (immediate) processing mode and the latter as an offline processing mode.
In a possible implementation, the apparatus shown in Fig. 5 and Fig. 6 can be implemented in the main processor 50.
In a possible implementation, the main processor 50 may be a central processing unit (CPU), or another type of processor such as a graphics processor (GPU). It should be understood that the splicing operator here is the splicing operator obtained by the foregoing operation method; for its specific description, refer to the earlier introduction of the splicing operator, which is not repeated here.
In a possible implementation, the artificial intelligence processing apparatus may be formed by multiple identical processors, for example multiple processors (XPU) arranged in an architecture similar to that of the main processor 50 plus the artificial intelligence processor 60. It may also be formed by a single processor, in which case that processor can both execute the aforementioned operation method to obtain the splicing operator and perform the artificial intelligence operation on the input data by means of the splicing operator to obtain the output result. In this implementation, the processor may be of an existing type or a newly proposed type, and the present disclosure does not limit this.
In a possible implementation, the main processor 50 can serve as the interface between the artificial intelligence processor and external data and control, carrying out basic control such as data transfer and starting or stopping the artificial intelligence processing apparatus; other processing devices can also cooperate with the artificial intelligence processor to jointly complete computing tasks.
In a possible implementation, the artificial intelligence processing apparatus may include more than one artificial intelligence processor. The artificial intelligence processors can be linked through a specific structure to transmit data, for example interconnected through a PCIe bus, so as to support larger-scale machine learning operations. In this case, the processors may share one control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection may use any interconnection topology.
The artificial intelligence processing apparatus has high compatibility and can be connected to various types of servers through a PCIe interface.
Referring to Fig. 8, Fig. 8 shows a block diagram of an artificial intelligence processing apparatus according to an embodiment of the present disclosure.
In a possible implementation, as shown in Fig. 8, the main processor 50 and the artificial intelligence processor 60 can be connected through a general interconnection interface (such as an I/O interface) for transmitting data and control instructions between the main processor 50 and the artificial intelligence processor 60. The artificial intelligence processor 60 obtains the required input data (including the splicing operator) from the main processor 50 and writes it into an on-chip storage device of the artificial intelligence processor; it can obtain control instructions from the main processor 50 and write them into an on-chip control cache of the artificial intelligence processor; it can also read the data in the memory module of the artificial intelligence processor 60 and transmit it to other processing devices.
In a possible implementation, the artificial intelligence processing apparatus may further include a storage device, which is connected to the artificial intelligence processor and to the other processing devices respectively. The storage device stores the data of the artificial intelligence processing apparatus and of the other processing devices, and is particularly suitable for data whose required operations cannot be entirely held in the internal storage of the artificial intelligence processing apparatus or the other processing devices.
The combined processing apparatus can serve as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnection interface of the combined processing apparatus is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. Through the above artificial intelligence processing apparatus, the present disclosure can transmit the input data and the splicing operator from the main processor to the artificial intelligence processor; the artificial intelligence processor executes the artificial intelligence arithmetic operation on the input data using the splicing operator to obtain an operation result, and sends the operation result back to the main processor.
It should be understood that the artificial intelligence processor 60 can be a single processor capable of artificial intelligence operations, or a combination of several different processors. The artificial intelligence processor is applied to artificial intelligence operations, which include machine learning operations, brain-like operations, and so on; machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor 60 may specifically include one of, or a combination of, a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing) chip, and an FPGA (Field-Programmable Gate Array) chip.
In a possible implementation, the artificial intelligence processor 60 is as shown in Fig. 9. Referring to Fig. 9, Fig. 9 shows a block diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
As shown in Fig. 9, the artificial intelligence processor 30 includes a control module 32, a computing module 33, and a memory module 31. The computing module 33 includes a main processing circuit 331 and multiple slave processing circuits 332 (the number of slave processing circuits in the figure is merely illustrative).
The control module 32 is configured to obtain input data and a computation instruction.
The control module 32 is further configured to parse the computation instruction to obtain multiple operation instructions, and send the multiple operation instructions and the input data to the main processing circuit 331.
The main processing circuit 331 is configured to perform preamble processing on the input data and to transmit data and operation instructions between itself and the multiple slave processing circuits.
The multiple slave processing circuits 332 are configured to execute intermediate operations in parallel according to the data and operation instructions transmitted from the main processing circuit 331 to obtain multiple intermediate results, and transmit the multiple intermediate results to the main processing circuit 331.
The main processing circuit 331 is configured to perform subsequent processing on the multiple intermediate results to obtain the calculation result of the computation instruction.
After receiving the input data and the computation instruction, the artificial intelligence processor 30 described in the present disclosure performs the corresponding arithmetic operation on the input data to obtain the calculation result.
The artificial intelligence processor described in the present disclosure can support machine learning algorithms as well as some non-machine-learning artificial intelligence algorithms.
The above computation instructions include, but are not limited to, forward operation instructions or reverse training instructions; the specific embodiments of the present application do not limit the specific form of the above computation instructions.
In a possible implementation, after the artificial intelligence processor 30 obtains the calculation result, it can send the calculation result to another processor such as a central processing unit (CPU) or a graphics processor (GPU).
The operation instruction is run code obtained by the artificial intelligence processor 30 according to the splicing operator. The run code includes, but is not limited to, forward operation instructions, reverse training instructions, or other neural network operation instructions; the specific embodiments of the present application do not limit the specific form of the above computation instructions.
In a possible implementation, the artificial intelligence processor 30 can obtain the above data through a data transmission module 360, which may specifically be one or more data I/O interfaces or I/O pins.
The main processing circuit 331 is configured to perform preamble processing on the operational data to obtain processed operational data, and to transmit at least one of the operational data, the intermediate results, and the operation instructions between itself and the multiple slave processing circuits.
Referring also to Figure 10, Figure 10 shows a block diagram of the main processing circuit 331 according to an embodiment of the present disclosure.
As shown in Figure 10, the main processing circuit 331 may include one of, or any combination of, a conversion processing circuit 113, an activation processing circuit 111, and an addition processing circuit 112.
The conversion processing circuit 113 is configured to perform the preamble processing on the data. The preamble processing may be: performing, on the data or intermediate results received by the main processing circuit 331, an exchange between a first data structure and a second data structure (for example, conversion between continuous data and discrete data); or performing, on the data or intermediate results received by the main processing circuit 331, an exchange between a first data type and a second data type (for example, conversion between a fixed-point type and a floating-point type).
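The fixed-point/floating-point conversion mentioned as an example of the preamble processing can be sketched roughly as follows; the symmetric int8 scheme and the `scale` value are assumptions for illustration, not the circuit's actual format:

```python
import numpy as np

# A toy symmetric fixed-point scheme: value ≈ q * scale, with q an int8.
def to_fixed(x, scale):
    """Floating-point -> fixed-point (quantize)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def to_float(q, scale):
    """Fixed-point -> floating-point (dequantize)."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.25, 3.0], dtype=np.float32)
scale = 0.05
q = to_fixed(x, scale)        # int8 representation of the data
back = to_float(q, scale)     # round-trips closely for multiples of scale
```

The round trip loses at most half a quantization step, which is why such a conversion is acceptable as a preamble step before fixed-point arithmetic.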
The activation processing circuit 111 is configured to perform the subsequent processing, specifically the activation operation on the data in the main processing circuit 331.
The addition processing circuit 112 is configured to perform the subsequent processing, specifically addition or accumulation operations.
Each slave processing circuit 332 is configured to execute intermediate operations according to the operational data and operation instructions transmitted by the main processing circuit 331 to obtain an intermediate result, and transmit the intermediate result to the main processing circuit 331.
The main processing circuit 331 is configured to perform subsequent processing on the multiple intermediate results to obtain the final calculation result of the operation instruction.
The control module 32 is further configured to generate a debugging result according to the state information and to output the debugging result to the state information acquisition device 40.
The memory module 31 is configured to store, according to the operation instructions, the state information generated during the computation, where the state information includes at least one of: state information in the preamble processing of the main processing circuit 331, state information in the intermediate computation of the multiple slave processing circuits 332, and state information in the subsequent processing of the main processing circuit 331. The memory module may include an on-chip storage submodule 310, and the on-chip storage submodule 310 may include a scratchpad memory.
The memory module 31 may also include one of, or any combination of, a register and a cache. Specifically, the cache is configured to store the computation instruction; the register is configured to store the neural network model, the data, and scalars; and the cache is a scratchpad cache.
In a possible implementation, the control module 32 may include an instruction cache submodule 320, an instruction processing submodule 321, and a storage queue submodule 323:
the instruction cache submodule 320 is configured to store the computation instructions associated with the neural network model;
the instruction processing submodule 321 is configured to parse the computation instruction to obtain multiple operation instructions;
the storage queue submodule 323 is configured to store an instruction queue, where the instruction queue includes multiple operation instructions or computation instructions to be executed in the order of the queue.
For example, in a possible implementation, the main processing circuit 331 may also include a control module 32, and this control module may include a master instruction processing submodule specifically configured to decode instructions into microinstructions. In another possible implementation, the slave processing circuit 332 may also include another control module 32, which includes a slave instruction processing submodule specifically configured to receive and process microinstructions. A microinstruction may be the next-level instruction below an instruction: it can be obtained by splitting or decoding an instruction, and can be further decoded into control signals for the components, modules, or processing circuits.
In an optional scheme, the structure of the computation instruction may be as shown in Table 1 below.
Table 1
Operation code | Register or immediate | Register/immediate | ...
The ellipsis in the above table indicates that the instruction may include multiple registers or immediates.
In another optional scheme, the computation instruction may include one or more operation domains and one operation code, and the computation instruction may include a neural network operation instruction. Taking a neural network operation instruction as an example, as shown in Table 1, register number 0, register number 1, register number 2, register number 3, and register number 4 can each be an operation domain, and each register number can be the number of one or more registers, for example as shown in Table 2 below.
Table 2
The above registers may be off-chip memory or, in practical applications, on-chip memory, for storing data. The data may specifically be t-dimensional data, where t is an integer greater than or equal to 1; for example, when t = 1 the data is 1-dimensional, i.e. a vector; when t = 2 it is 2-dimensional, i.e. a matrix; and when t = 3 or more it is a multidimensional tensor.
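The opcode-plus-operation-domain layout sketched in Table 1 can be modeled, purely for illustration, as a small data structure; the opcode string and operand values below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeInstruction:
    """One operation code plus a list of operation domains
    (register numbers or immediates), mirroring Table 1."""
    opcode: str
    operands: List[int] = field(default_factory=list)

# Hypothetical neural network operation instruction whose five operation
# domains are register numbers 0 through 4, as in the example above.
instr = ComputeInstruction("NN_FORWARD", [0, 1, 2, 3, 4])
```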
Optionally, the control module may further include:
a dependency processing submodule 322, configured to, when there are multiple operation instructions, determine whether a first operation instruction has a dependency relationship with a zeroth operation instruction that precedes the first operation instruction. If the first operation instruction has a dependency relationship with the zeroth operation instruction, the first operation instruction is cached in the instruction cache submodule, and after the zeroth operation instruction finishes executing, the first operation instruction is extracted from the instruction cache submodule and transmitted to the computing module.
Determining whether the first operation instruction has a dependency relationship with the zeroth operation instruction preceding the first operation instruction includes:
extracting, according to the first operation instruction, a first storage address interval of the data (for example, a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, a zeroth storage address interval of the matrix required by the zeroth operation instruction. If the first storage address interval overlaps the zeroth storage address interval, it is determined that the first operation instruction has a dependency relationship with the zeroth operation instruction; if the first storage address interval does not overlap the zeroth storage address interval, it is determined that the first operation instruction has no dependency relationship with the zeroth operation instruction.
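The overlap test used by the dependency processing submodule can be sketched as a simple half-open interval intersection check (the address values below are made up for illustration):

```python
def has_dependency(first_interval, zeroth_interval):
    """True when the two [start, end) storage address intervals overlap,
    i.e. when the first instruction depends on the zeroth one."""
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end

# Overlapping intervals: the first instruction must wait in the
# instruction cache submodule until the zeroth one finishes.
dependent = has_dependency((0x100, 0x200), (0x180, 0x280))        # True
# Disjoint intervals: the two instructions can proceed independently.
independent = not has_dependency((0x100, 0x200), (0x200, 0x300))  # True
```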
Referring to Figure 11, Figure 11 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
In a possible implementation, the computing module 33 may include a branch processing circuit 333 as shown in Figure 11; its specific connection structure is shown in Figure 11, wherein
the main processing circuit 331 is connected to the branch processing circuit 333, and the branch processing circuit 333 is connected to the multiple slave processing circuits 332;
the branch processing circuit 333 is configured to forward data or instructions between the main processing circuit 331 and the slave processing circuits 332.
In a possible implementation, taking the fully connected operation in a neural network operation as an example, the process may be y = f(wx + b), where x is the input neuron matrix, w is the weight matrix, b is the bias scalar, and f is the activation function, which may specifically be any one of the sigmoid, tanh, relu, and softmax functions. Assuming a binary tree structure with 8 slave processing circuits, the method can be implemented as follows:
the control module obtains the input neuron matrix x, the weight matrix w, and the fully connected operation instruction from the memory module 31, and transmits the input neuron matrix x, the weight matrix w, and the fully connected operation instruction to the main processing circuit;
the main processing circuit splits the input neuron matrix x into 8 sub-matrices, distributes the 8 sub-matrices to the 8 slave processing circuits through a tree module, and broadcasts the weight matrix w to the 8 slave processing circuits;
the slave processing circuits execute, in parallel, the multiplication and accumulation operations of the 8 sub-matrices with the weight matrix w to obtain 8 intermediate results, and send the 8 intermediate results to the main processing circuit;
the main processing circuit sorts the 8 intermediate results to obtain the result of wx, performs the bias-b operation on this result, then performs the activation operation to obtain the final result y, and sends the final result y to the control module, which outputs or stores the final result y into the memory module 31.
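The 8-way fully connected example above can be mimicked in NumPy, with the column blocks of x standing in for the 8 sub-matrices distributed to the slave processing circuits (the shapes and the choice of tanh as the activation are assumptions for illustration):

```python
import numpy as np

def fully_connected_8way(x, w, b, f=np.tanh):
    """Sketch of y = f(w @ x + b): split x into 8 column blocks, apply
    the broadcast weight w to each block 'in parallel', then let the
    main circuit reassemble the partial results, add b, and activate."""
    blocks = np.array_split(x, 8, axis=1)     # distribute 8 sub-matrices
    partials = [w @ blk for blk in blocks]    # one multiply per slave circuit
    wx = np.concatenate(partials, axis=1)     # sort/merge intermediate results
    return f(wx + b)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # input neuron matrix, 16 samples
w = rng.standard_normal((3, 4))    # weight matrix
b = 0.1                            # bias scalar, as in the example
y = fully_connected_8way(x, w, b)  # equals np.tanh(w @ x + b)
```

Splitting x by columns makes each slave's product an independent column block of wx, which is why the main circuit only needs to reorder and concatenate the 8 intermediate results.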
The method by which the neural network computing device shown in Figure 11 executes a neural network forward operation instruction may specifically be as follows.
The control module 32 extracts the operational data (for example, a neural network forward operation instruction or a neural network operation instruction), the corresponding operation domain, and at least one operation code from the memory module 31; the control module 32 transmits the operation domain to the data access module and sends the at least one operation code to the computing module.
The control module 32 extracts the weight w and the bias b corresponding to the operation domain from the memory module 31 (when b is 0, the bias b does not need to be extracted), transmits the weight w and the bias b to the main processing circuit of the computing module, extracts the input data Xi from the memory module 31, and sends the input data Xi to the main processing circuit.
The main processing circuit splits the input data Xi into n data blocks.
The instruction processing submodule 321 of the control module 32 determines a multiplication instruction, a bias instruction, and an accumulation instruction according to the at least one operation code, and sends the multiplication instruction, the bias instruction, and the accumulation instruction to the main processing circuit. The main processing circuit broadcasts the multiplication instruction and the weight w to the multiple slave processing circuits and distributes the n data blocks to the multiple slave processing circuits (for example, with n slave processing circuits, each slave processing circuit is sent one data block). The multiple slave processing circuits execute the multiplication of the weight w with the received data block according to the multiplication instruction to obtain intermediate results and send the intermediate results to the main processing circuit. The main processing circuit accumulates the multiple intermediate results sent by the slave processing circuits according to the accumulation instruction to obtain an accumulation result, performs the bias-b operation on the accumulation result according to the bias instruction to obtain the final result, and sends the final result to the control module.
In addition, the order of the addition and multiplication operations can be exchanged.
In the technical solution provided by the present application, a single instruction, the neural network operation instruction, realizes the multiplication and bias operations of the neural network. The intermediate results of the neural network computation need not be stored or extracted, which reduces the storage and extraction of intermediate data; the solution therefore reduces the corresponding operation steps and improves the computational performance of the neural network.
Referring to Figure 12, Figure 12 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
In a possible implementation, the computing module 33 may include a main processing circuit 331 and multiple slave processing circuits 332 as shown in Figure 12.
In a possible implementation, as shown in Figure 12, the multiple slave processing circuits are distributed in an array. Each slave processing circuit is connected to its adjacent slave processing circuits, and the main processing circuit is connected to k slave processing circuits among the multiple slave processing circuits, the k slave processing circuits being: the n slave processing circuits of the 1st row, the n slave processing circuits of the m-th row, and the m slave processing circuits of the 1st column. It should be noted that, as shown in Figure 12, the k slave processing circuits include only the n slave processing circuits of the 1st row, the n slave processing circuits of the m-th row, and the m slave processing circuits of the 1st column; that is, the k slave processing circuits are the slave processing circuits, among the multiple slave processing circuits, that are directly connected to the main processing circuit.
The k slave processing circuits are configured to forward data and instructions between the main processing circuit and the multiple slave processing circuits.
In some embodiments, the present application further provides a chip, which includes the above artificial intelligence processing apparatus.
In some embodiments, the present application further provides a chip packaging structure, which includes the above chip.
In some embodiments, the present application further provides a board card, which includes the above chip packaging structure.
Referring to Figure 13, Figure 13 shows a board card according to an embodiment of the present disclosure. In addition to the above chip 389, the board card may also include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip packaging structure through a bus and is used for storing data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units can be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency: DDR reads data on both the rising edge and the falling edge of the clock pulse, so the speed of DDR is twice that of standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group of storage units may include multiple DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
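The 25600 MB/s figure follows directly from the double-data-rate arithmetic described above: DDR4-3200 performs 3200 million transfers per second (a 1600 MHz clock driving data on both edges) across the 64-bit data portion of the bus:

```python
# DDR4-3200 theoretical bandwidth on a 64-bit data bus.
clock_mhz = 1600
transfers_per_s_millions = clock_mhz * 2   # double data rate: both edges
bus_bytes = 64 // 8                        # 64 data bits = 8 bytes
bandwidth_mb_s = transfers_per_s_millions * bus_bytes
# 3200 * 8 = 25600 MB/s, matching the figure quoted in the text
```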
In one embodiment, each group of storage units includes multiple double-data-rate synchronous dynamic random-access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip, and is used for controlling the data transmission to, and data storage of, each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface, and the data to be processed are transferred from the server to the chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are transmitted back to the external device (such as a server) by the interface device.
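The 16000 MB/s PCIe 3.0 x16 figure above can likewise be sanity-checked (illustrative only; it assumes 8 GT/s per lane and 16 lanes, and ignores the 128b/130b line-coding overhead, so it reproduces the raw, pre-encoding rate the text quotes):

```python
# Raw theoretical bandwidth of a PCIe 3.0 x16 link, as quoted in the text.
# Assumption: 8 GT/s per lane, 16 lanes, 128b/130b encoding overhead ignored.
gigatransfers_per_lane = 8          # PCIe 3.0 signaling rate per lane
lanes = 16

raw_gb_s = gigatransfers_per_lane * lanes / 8   # bits -> bytes
print(raw_gb_s * 1000)  # 16000.0 (MB/s)
```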
The control device is electrically connected to the chip and is used for monitoring the state of the chip. Specifically, the chip may be electrically connected to the control device through an SPI interface. The control device may include a microcontroller (Micro Controller Unit, MCU). Since the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads and may therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is also disclosed, which includes the above board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an airplane, a ship, and/or a car; the household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device may include a nuclear magnetic resonance instrument, a B-ultrasound instrument, and/or an electrocardiograph.
It should be noted that, for the sake of brevity, the foregoing method embodiments are all described as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, since according to the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into modules is only a division by logical function, and there may be other division manners in actual implementation, such as combining multiple modules or components, integrating them into another system, or omitting or not executing some features. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or modules, and may be electrical or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place, or they may be distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in each embodiment of the present application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software program module.
When the integrated module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will appreciate that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present disclosure have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (21)

1. An operation method, characterized in that the method comprises:
obtaining a cutting operator and a basic operator from an artificial intelligence operator library, wherein the cutting operator is used for performing cutting processing on input data in a specified dimension, and the basic operator is used for performing a corresponding arithmetic operation on input data;
splicing the cutting operator and the basic operator to form a splicing operator,
wherein the splicing operator is used for performing a corresponding splicing arithmetic operation on input data in an artificial intelligence processor, so as to perform an artificial intelligence operation.
2. The method according to claim 1, characterized in that the splicing the cutting operator and the basic operator to form a splicing operator comprises:
using the cutting operator as a preceding operator of the basic operator.
3. The method according to claim 2, characterized in that the basic operator comprises a connection operator, the connection operator being used for performing connection processing on input data in a specified dimension, wherein the splicing arithmetic operation comprises:
performing cutting processing on the input data in a specified dimension using the cutting operator, thereby cutting the input data into M pieces of sub-data in the specified dimension, M being an integer greater than 1;
performing connection processing on N pieces of sub-data among the M pieces of sub-data using the connection operator to obtain a processing result, N being an integer greater than 1 and N ≤ M.
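Purely as an illustration (not part of the claims), the cut-then-connect operation of claim 3 can be sketched with NumPy. The names `cut` and `connect` are hypothetical stand-ins for the cutting and connection operators of the operator library, and the shapes are arbitrary:

```python
import numpy as np

# Sketch of claim 3: cut the input into M sub-data along a specified
# dimension, then connect N of them (N <= M) along the same dimension.
def cut(data, m, dim):
    """Hypothetical cutting operator: split into m equal pieces along dim."""
    return np.split(data, m, axis=dim)

def connect(sub_data, dim):
    """Hypothetical connection operator: concatenate pieces along dim."""
    return np.concatenate(sub_data, axis=dim)

x = np.arange(24).reshape(4, 6)   # input data
subs = cut(x, m=3, dim=1)         # M = 3 pieces, each of shape (4, 2)
out = connect(subs[:2], dim=1)    # connect N = 2 of the M pieces
print(out.shape)                  # (4, 4)
```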
4. The method according to claim 1, characterized in that the basic operator comprises a dimension-order transposition operator, the dimension-order transposition operator being used for rearranging the dimension order of input data, wherein the splicing the cutting operator and the basic operator to form a splicing operator comprises:
using the dimension-order transposition operator as a preceding operator of the cutting operator.
5. The method according to claim 4, characterized in that the splicing arithmetic operation comprises:
rearranging the dimension order of the input data using the dimension-order transposition operator, so as to transpose a specified dimension of the input data to the last dimension of the input data;
cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, P being an integer greater than 1.
6. The method according to claim 5, characterized in that the cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, comprises:
when the quantity of data in the last dimension of the rearranged input data is greater than the maximum quantity the cutting operator can cut in a single pass, cutting the input data multiple times, until the input data is cut into P pieces of sub-data.
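Again purely as an illustration (not part of the claims), the transpose-then-cut flow of claims 5 and 6 can be sketched in NumPy; `max_cut`, standing in for the cutting operator's assumed per-pass limit, and the function name are hypothetical:

```python
import numpy as np

# Sketch of claims 5-6: transpose the specified dimension to the last
# position, then cut the last dimension, making multiple cutting passes
# when its size exceeds the assumed per-pass limit `max_cut`.
def transpose_then_cut(data, dim, max_cut):
    order = [d for d in range(data.ndim) if d != dim] + [dim]
    data = np.transpose(data, order)            # specified dim -> last
    pieces, start = [], 0
    while start < data.shape[-1]:               # multiple cutting passes
        stop = min(start + max_cut, data.shape[-1])
        pieces.append(data[..., start:stop])
        start = stop
    return pieces                               # P pieces of sub-data

x = np.arange(2 * 3 * 10).reshape(2, 3, 10)
subs = transpose_then_cut(x, dim=1, max_cut=2)  # dim 1 (size 3) -> last
print([p.shape for p in subs])                  # [(2, 10, 2), (2, 10, 1)]
```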
7. The method according to claim 1, characterized in that the splicing operator is applied at an application-program layer of a software calling hierarchy, the operator library is located at an operator-library layer of the software calling hierarchy, and the artificial intelligence processor is located at a chip layer of the software calling hierarchy.
8. An operation apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a cutting operator and a basic operator from an artificial intelligence operator library, wherein the cutting operator is used for performing cutting processing on input data in a specified dimension, and the basic operator is used for performing a corresponding arithmetic operation on input data;
an operation module, connected to the obtaining module and configured to splice the cutting operator and the basic operator to form a splicing operator,
wherein the splicing operator is used for performing a corresponding splicing arithmetic operation on input data in an artificial intelligence processor, so as to perform an artificial intelligence operation.
9. The apparatus according to claim 8, characterized in that the operation module comprises:
a first operation submodule, configured to use the cutting operator as a preceding operator of the basic operator.
10. The apparatus according to claim 9, characterized in that the basic operator comprises a connection operator, the connection operator being used for performing connection processing on input data in a specified dimension, wherein the splicing arithmetic operation comprises:
performing cutting processing on the input data in a specified dimension using the cutting operator, thereby cutting the input data into M pieces of sub-data in the specified dimension, M being an integer greater than 1;
performing connection processing on N pieces of sub-data among the M pieces of sub-data using the connection operator to obtain a processing result, N being an integer greater than 1 and N ≤ M.
11. The apparatus according to claim 8, characterized in that the basic operator comprises a dimension-order transposition operator, the dimension-order transposition operator being used for rearranging the dimension order of input data, wherein the operation module further comprises:
a second operation submodule, configured to use the dimension-order transposition operator as a preceding operator of the cutting operator.
12. The apparatus according to claim 11, characterized in that the splicing arithmetic operation comprises:
rearranging the dimension order of the input data using the dimension-order transposition operator, so as to transpose a specified dimension of the input data to the last dimension of the input data;
cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, P being an integer greater than 1.
13. The apparatus according to claim 12, characterized in that the cutting the data of the last dimension of the rearranged input data using the cutting operator, thereby cutting the input data into P pieces of sub-data, comprises:
when the quantity of data in the last dimension of the rearranged input data is greater than the maximum quantity the cutting operator can cut in a single pass, cutting the input data multiple times, until the input data is cut into P pieces of sub-data.
14. An artificial intelligence processing apparatus, characterized in that the apparatus comprises:
a main processor, configured to execute the method according to any one of claims 1 to 7 to obtain a splicing operator, the splicing operator being used for performing a corresponding arithmetic operation on input data; and
an artificial intelligence processor, electrically connected to the main processor,
wherein the main processor is further configured to send input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the main processor;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result; and
send the operation result to the main processor.
15. The apparatus according to claim 14, characterized in that the main processor further includes a main-processor storage space for storing the splicing operator, wherein
the main processor is further configured to provide the input data and the splicing operator stored in the main-processor storage space.
16. The apparatus according to claim 14, characterized in that the artificial intelligence processor transmits the operation result to the main processor through an I/O interface;
when the apparatus includes multiple artificial intelligence processors, the multiple artificial intelligence processors can be connected to, and transmit data to, one another through a specific structure;
wherein the multiple artificial intelligence processors are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus, so as to support larger-scale artificial intelligence operations; the multiple artificial intelligence processors share one control system or have their own control systems; the multiple artificial intelligence processors share memory or have their own memories; and the interconnection manner of the multiple artificial intelligence processors is any interconnection topology.
17. The apparatus according to claim 14, characterized by further comprising: a storage device, the storage device being connected to the artificial intelligence processor and the main processor respectively, and being used for saving the data of the artificial intelligence processor and the main processor.
18. An artificial intelligence chip, characterized in that the artificial intelligence chip includes the artificial intelligence processing apparatus according to any one of claims 14 to 17.
19. An electronic device, characterized in that the electronic device includes the chip according to claim 18.
20. A board card, characterized in that the board card includes: a memory device, an interface device, a control device, and the artificial intelligence chip according to claim 18;
wherein the artificial intelligence chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is used for storing data;
the interface device is used for implementing data transmission between the chip and an external device; and
the control device is used for monitoring the state of the chip.
21. The board card according to claim 20, characterized in that
the memory device includes multiple groups of storage units, each group of storage units being connected to the chip via a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller for controlling the data transmission to, and data storage of, each storage unit; and
the interface device is a standard PCIe interface.
CN201811536153.2A 2018-12-14 2018-12-14 Operation method, device and related product Active CN109657782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811536153.2A CN109657782B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811536153.2A CN109657782B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Publications (2)

Publication Number Publication Date
CN109657782A true CN109657782A (en) 2019-04-19
CN109657782B CN109657782B (en) 2020-10-27

Family

ID=66113207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811536153.2A Active CN109657782B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN109657782B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458286A (en) * 2019-08-14 2019-11-15 北京中科寒武纪科技有限公司 Data processing method, device, computer equipment and storage medium
CN110555522A (en) * 2019-09-23 2019-12-10 北京中科寒武纪科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110633153A (en) * 2019-09-24 2019-12-31 上海寒武纪信息科技有限公司 Method for realizing neural network model splitting by using multi-core processor and related product
CN110689121A (en) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Method for realizing neural network model splitting by using multi-core processor and related product
CN111857829A (en) * 2019-04-25 2020-10-30 安徽寒武纪信息科技有限公司 Processor operation method and device and related product
CN111860796A (en) * 2019-04-30 2020-10-30 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111966398A (en) * 2019-05-20 2020-11-20 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112131243A (en) * 2020-08-13 2020-12-25 成都量子象云计算科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112232517A (en) * 2020-09-24 2021-01-15 苏州浪潮智能科技有限公司 Artificial intelligence accelerates engine and artificial intelligence treater
CN112306949A (en) * 2019-07-31 2021-02-02 中科寒武纪科技股份有限公司 Data processing method and device and related product
CN112396186A (en) * 2019-08-12 2021-02-23 上海寒武纪信息科技有限公司 Execution method, device and related product
CN112765541A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN113011585A (en) * 2021-03-19 2021-06-22 上海西井信息科技有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113837923A (en) * 2021-09-26 2021-12-24 安徽寒武纪信息科技有限公司 Data processing device, data processing method and related product
CN113837921A (en) * 2021-09-26 2021-12-24 安徽寒武纪信息科技有限公司 Data processing device, data processing method and related product
CN115762515A (en) * 2022-11-08 2023-03-07 北京百度网讯科技有限公司 Processing and application method, device and equipment of neural network for voice recognition
WO2024131170A1 (en) * 2022-12-24 2024-06-27 华为技术有限公司 Operator processing method and apparatus, and chip, computing device and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0586999A2 (en) * 1992-09-10 1994-03-16 Deere & Company Neural network based controller for a machine in particular for a combine
CN103235974A (en) * 2013-04-25 2013-08-07 中国科学院地理科学与资源研究所 Method for improving processing efficiency of massive spatial data
CN107122825A (en) * 2017-03-09 2017-09-01 华南理工大学 A kind of activation primitive generation method of neural network model
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN107621934A (en) * 2017-07-28 2018-01-23 中国人民解放军国防信息学院 Based on modularization, the evaluation index computational methods of graphical operator and device
CN107621932A (en) * 2017-09-25 2018-01-23 威创集团股份有限公司 The local amplification method and device of display image
CN107729523A (en) * 2017-10-27 2018-02-23 平安科技(深圳)有限公司 Data service method, electronic installation and storage medium
CN107967135A (en) * 2017-10-31 2018-04-27 平安科技(深圳)有限公司 Computing engines implementation method, electronic device and storage medium
CN108280515A (en) * 2018-02-12 2018-07-13 华夏芯(北京)通用处理器技术有限公司 A kind of method and apparatus that instruction delay executes and instructs stipulations
US20180357552A1 (en) * 2016-01-27 2018-12-13 Bonsai AI, Inc. Artificial Intelligence Engine Having Various Algorithms to Build Different Concepts Contained Within a Same AI Model

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0586999A2 (en) * 1992-09-10 1994-03-16 Deere & Company Neural network based controller for a machine in particular for a combine
CN103235974A (en) * 2013-04-25 2013-08-07 中国科学院地理科学与资源研究所 Method for improving processing efficiency of massive spatial data
US20180357552A1 (en) * 2016-01-27 2018-12-13 Bonsai AI, Inc. Artificial Intelligence Engine Having Various Algorithms to Build Different Concepts Contained Within a Same AI Model
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN107122825A (en) * 2017-03-09 2017-09-01 华南理工大学 A kind of activation primitive generation method of neural network model
CN107621934A (en) * 2017-07-28 2018-01-23 中国人民解放军国防信息学院 Based on modularization, the evaluation index computational methods of graphical operator and device
CN107621932A (en) * 2017-09-25 2018-01-23 威创集团股份有限公司 The local amplification method and device of display image
CN107729523A (en) * 2017-10-27 2018-02-23 平安科技(深圳)有限公司 Data service method, electronic installation and storage medium
CN107967135A (en) * 2017-10-31 2018-04-27 平安科技(深圳)有限公司 Computing engines implementation method, electronic device and storage medium
CN108280515A (en) * 2018-02-12 2018-07-13 华夏芯(北京)通用处理器技术有限公司 A kind of method and apparatus that instruction delay executes and instructs stipulations

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857829A (en) * 2019-04-25 2020-10-30 安徽寒武纪信息科技有限公司 Processor operation method and device and related product
CN111860796B (en) * 2019-04-30 2023-10-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111860796A (en) * 2019-04-30 2020-10-30 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111966398B (en) * 2019-05-20 2024-06-07 上海寒武纪信息科技有限公司 Instruction processing method and device and related products
CN111966398A (en) * 2019-05-20 2020-11-20 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112306949A (en) * 2019-07-31 2021-02-02 中科寒武纪科技股份有限公司 Data processing method and device and related product
CN112306949B (en) * 2019-07-31 2022-11-01 中科寒武纪科技股份有限公司 Data processing method and device and related product
CN112396186A (en) * 2019-08-12 2021-02-23 上海寒武纪信息科技有限公司 Execution method, device and related product
CN112396186B (en) * 2019-08-12 2024-05-03 上海寒武纪信息科技有限公司 Execution method, execution device and related product
CN110458286A (en) * 2019-08-14 2019-11-15 北京中科寒武纪科技有限公司 Data processing method, device, computer equipment and storage medium
CN113435591A (en) * 2019-08-14 2021-09-24 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110458286B (en) * 2019-08-14 2022-02-08 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN113435591B (en) * 2019-08-14 2024-04-05 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN110555522A (en) * 2019-09-23 2019-12-10 北京中科寒武纪科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110689121A (en) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Method for realizing neural network model splitting by using multi-core processor and related product
WO2021057722A1 (en) * 2019-09-24 2021-04-01 安徽寒武纪信息科技有限公司 Method of performing splitting in neural network model by means of multi-core processor, and related product
WO2021057713A1 (en) * 2019-09-24 2021-04-01 安徽寒武纪信息科技有限公司 Method for splitting neural network model by using multi-core processor, and related product
CN110633153A (en) * 2019-09-24 2019-12-31 上海寒武纪信息科技有限公司 Method for realizing neural network model splitting by using multi-core processor and related product
CN112765541A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112765541B (en) * 2019-11-01 2024-02-23 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN112131243A (en) * 2020-08-13 2020-12-25 成都量子象云计算科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112232517A (en) * 2020-09-24 2021-01-15 苏州浪潮智能科技有限公司 Artificial intelligence accelerates engine and artificial intelligence treater
CN113011585A (en) * 2021-03-19 2021-06-22 上海西井信息科技有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113011585B (en) * 2021-03-19 2023-09-26 上海西井科技股份有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113837921A (en) * 2021-09-26 2021-12-24 安徽寒武纪信息科技有限公司 Data processing device, data processing method and related product
CN113837923A (en) * 2021-09-26 2021-12-24 安徽寒武纪信息科技有限公司 Data processing device, data processing method and related product
CN115762515B (en) * 2022-11-08 2023-12-01 北京百度网讯科技有限公司 Processing and application method, device and equipment for neural network for voice recognition
CN115762515A (en) * 2022-11-08 2023-03-07 北京百度网讯科技有限公司 Processing and application method, device and equipment of neural network for voice recognition
WO2024131170A1 (en) * 2022-12-24 2024-06-27 华为技术有限公司 Operator processing method and apparatus, and chip, computing device and storage medium

Also Published As

Publication number Publication date
CN109657782B (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN109657782A (en) Operation method, device and Related product
CN109685201A (en) Operation method, device and Related product
CN109522052A (en) A kind of computing device and board
CN109543832A (en) A kind of computing device and board
CN109726822A (en) Operation method, device and Related product
CN109740739A (en) Neural computing device, neural computing method and Related product
CN109189473A (en) Processing with Neural Network device and its method for executing vector exchange instruction
CN106156851B (en) Accelerator and method towards deep learning business
CN109740754A (en) Neural computing device, neural computing method and Related product
CN110059797A (en) A kind of computing device and Related product
CN109543825A (en) Neural network model algorithm Compilation Method, device and Related product
CN109670581A (en) A kind of computing device and board
CN110147249A (en) A kind of calculation method and device of network model
CN109740729A (en) Operation method, device and Related product
CN109739703A (en) Adjust wrong method and Related product
CN109993301A (en) Neural metwork training device and Related product
CN109753319A (en) A kind of device and Related product of release dynamics chained library
CN110163349A (en) A kind of calculation method and device of network model
CN110059809A (en) A kind of computing device and Related product
CN109711538A (en) Operation method, device and Related product
CN109740730A (en) Operation method, device and Related product
CN109711540A (en) A kind of computing device and board
CN109726800A (en) Operation method, device and Related product
CN110472734A (en) A kind of computing device and Related product
CN108108189A (en) A kind of computational methods and Related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200910

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambrian Information Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Zhongke Cambrian Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant