CN109740729A - Operation method, device and related product - Google Patents

Operation method, device and related product

Info

Publication number
CN109740729A
Authority
CN
China
Prior art keywords
operator
input data
artificial intelligence
splicing
normalization
Prior art date
Legal status
Granted
Application number
CN201811534505.0A
Other languages
Chinese (zh)
Other versions
CN109740729B (en)
Inventor
Inventor not disclosed
Current Assignee
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811534505.0A
Publication of CN109740729A
Application granted
Publication of CN109740729B
Legal status: Active
Anticipated expiration


Abstract

The present disclosure relates to an operation method, a device, and related products. The product includes a control module, and the control module includes an instruction storage submodule, an instruction processing submodule, and a storage queue submodule. The instruction storage submodule is configured to store computation instructions associated with an artificial neural network operation; the instruction processing submodule is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue submodule is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or computation instructions to be executed in the order of the queue. Through the above method, the present disclosure can improve the operation efficiency of the related products when performing neural network model operations.

Description

Operation method, device and related product
Technical field
The present disclosure relates to the field of machine learning technology, and more particularly to an operation method, a device, and related products.
Background
Neural network algorithms are a recently popular class of machine learning algorithms that have achieved very good results in many fields, such as image recognition, speech recognition, and natural language processing. As neural network algorithms have developed, their complexity has grown higher and higher, and model sizes have gradually increased in order to improve recognition accuracy. Processing these large-scale models with GPUs and CPUs takes a large amount of computation time and consumes a great deal of power. In this context, new artificial intelligence processors have been proposed to increase the operation speed of neural network models, save operation time, and reduce power consumption. However, current algorithm support for these new artificial intelligence processors is far from sufficient.
Summary of the invention
According to one aspect of the present disclosure, an operation method is proposed. The method includes:
obtaining basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data; and
forming a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so that the input data is normalized.
In one possible implementation, the basic operators include a first deformation operator, a second deformation operator, and a normalization exponential operator, where the first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation, wherein
forming the splicing operator using the basic operators includes:
using the first deformation operator as the preceding-stage operator of the normalization exponential operator; and
using the second deformation operator as the succeeding-stage operator of the normalization exponential operator.
In one possible implementation, the splicing arithmetic operation includes:
when the dimension of first input data of a first type is greater than 2, and a first parameter and a second parameter carried in the first input data satisfy a preset condition, converting the first input data to a second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type; and
converting the output data of the second type into output data of the first type using the second deformation operator.
In one possible implementation, the basic operators include a scaling operator and a normalization operator, where the scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data, wherein
forming the splicing operator using the basic operators includes:
using the normalization operator as the preceding-stage operator of the scaling operator.
In one possible implementation, the splicing arithmetic operation includes:
normalizing the input data using the normalization operator to obtain a normalization result; and
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
In one possible implementation, the basic operators include a square operator, a convolution operator, a reciprocal square root operator, and a multiplication operator, where the square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal square root operator is used to take the square root of the input data and compute its reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
forming the splicing operator using the basic operators includes:
splicing the square operator, the convolution operator, the reciprocal square root operator, and the multiplication operator in sequence to form the splicing operator.
In one possible implementation, the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing a square root operation and then a reciprocal operation on the summation operation result using the reciprocal square root operator to obtain a reciprocal operation result; and
performing a multiplication operation on the input data and the reciprocal operation result using the multiplication operator to obtain a normalization result.
In one possible implementation, the splicing operator is applied at the application layer in the software call hierarchy, the artificial intelligence operator library is located at the operator library layer in the software call hierarchy, and the artificial intelligence processor is located at the chip layer in the software call hierarchy.
According to another aspect of the present disclosure, an operation device is proposed. The device includes:
an obtaining module, configured to obtain basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data; and
an operation module, connected to the obtaining module, configured to form a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so that the input data is normalized.
In one possible implementation, the basic operators include a first deformation operator, a second deformation operator, and a normalization exponential operator, where the first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation, wherein
the operation module includes a first operation submodule, and the first operation submodule is configured to:
use the first deformation operator as the preceding-stage operator of the normalization exponential operator; and
use the second deformation operator as the succeeding-stage operator of the normalization exponential operator.
In one possible implementation, the splicing arithmetic operation includes:
when the dimension of the first input data of the first type is greater than 2, and the first parameter and the second parameter carried in the first input data satisfy the preset condition, converting the first input data to the second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type; and
converting the output data of the second type into output data of the first type using the second deformation operator.
In one possible implementation, the basic operators include a scaling operator and a normalization operator, where the scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data, wherein
the operation module includes a second operation submodule, and the second operation submodule is configured to:
use the normalization operator as the preceding-stage operator of the scaling operator.
In one possible implementation, the splicing arithmetic operation includes:
normalizing the input data using the normalization operator to obtain a normalization result; and
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
In one possible implementation, the basic operators include a square operator, a convolution operator, a reciprocal square root operator, and a multiplication operator, where the square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal square root operator is used to take the square root of the input data and compute its reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
the operation module includes a third operation submodule, and the third operation submodule is configured to:
splice the square operator, the convolution operator, the reciprocal square root operator, and the multiplication operator in sequence to form the splicing operator.
In one possible implementation, the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing a square root operation and then a reciprocal operation on the summation operation result using the reciprocal square root operator to obtain a reciprocal operation result; and
performing a multiplication operation on the input data and the reciprocal operation result using the multiplication operator to obtain a normalization result.
According to another aspect of the present disclosure, an artificial intelligence processing device is proposed. The device includes:
a main processor, configured to execute the above method to obtain a splicing operator, where the splicing operator is used to perform a corresponding arithmetic operation on the input data; and
an artificial intelligence processor, electrically connected to the main processor;
where the main processor is further configured to send the input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the main processor;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result; and
send the operation result to the main processor.
In one possible implementation, the main processor further includes a main processor storage space for storing the splicing operator, wherein
the main processor is further configured to send the input data and the splicing operator stored in the main processor storage space.
In one possible implementation, the artificial intelligence processor transfers the operation result to the main processor through an I/O interface;
when the device includes a plurality of artificial intelligence processors, the plurality of artificial intelligence processors may be connected and transmit data through a specific structure;
where the plurality of artificial intelligence processors are interconnected and transmit data through a PCIe (Peripheral Component Interconnect Express) bus to support larger-scale artificial intelligence operations; the plurality of artificial intelligence processors may share the same control system or have their own control systems; the plurality of artificial intelligence processors may share memory or have their own memories; and the interconnection manner of the plurality of artificial intelligence processors may be any interconnection topology.
In one possible implementation, the device further includes a storage device, where the storage device is connected to the artificial intelligence processor and the main processor respectively, and is used to store data of the artificial intelligence processor and the main processor.
According to another aspect of the present disclosure, an artificial intelligence chip is proposed. The artificial intelligence chip includes the above artificial intelligence processing device.
According to another aspect of the present disclosure, an electronic device is proposed. The electronic device includes the above artificial intelligence chip.
According to another aspect of the present disclosure, a board card is proposed. The board card includes a memory device, an interface device, a control device, and the above artificial intelligence chip;
where the artificial intelligence chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is used for storing data;
the interface device is used for implementing data transmission between the chip and an external device; and
the control device is used for monitoring a state of the chip.
In one possible implementation, the memory device includes a plurality of groups of storage units, where each group of storage units is connected to the chip through a bus, and the storage units are DDR SDRAM;
the chip includes a DDR controller for controlling data transmission to and data storage in each storage unit; and
the interface device is a standard PCIe interface.
According to another aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above method.
Through the above method, the present disclosure can obtain basic operators in the artificial intelligence operator library and use the basic operators to form a splicing operator. The resulting splicing operator can be used to normalize the input data and supports the new artificial intelligence processor, thereby improving the operation efficiency of the new artificial intelligence processor when performing neural network model operations.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are included in and constitute a part of the specification, together with the specification illustrate exemplary embodiments, features, and aspects of the present disclosure, and serve to explain the principles of the present disclosure.
Fig. 1 shows a flow chart of an operation method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a software call hierarchy according to an embodiment of the present disclosure.
Figs. 3a-3c show schematic diagrams of splicing operators according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an operation device according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an operation device according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
Fig. 9 shows a block diagram of a main processing circuit 331 according to an embodiment of the present disclosure.
Fig. 10 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
Fig. 11 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
Fig. 12 shows a board card according to an embodiment of the present disclosure.
Detailed description of embodiments
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically stated otherwise.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art will appreciate that the present disclosure can also be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail in order to highlight the gist of the present disclosure.
Referring to Fig. 1, Fig. 1 shows a flow chart of an operation method according to an embodiment of the present disclosure.
The method can be applied to a server or a terminal. As shown in Fig. 1, the method includes:
Step S110: obtaining basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data; and
Step S120: forming a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so that the input data is normalized.
Through the above method, the present disclosure can obtain basic operators in the artificial intelligence operator library and use the basic operators to form a splicing operator. The resulting splicing operator can be used to normalize the input data and supports the new artificial intelligence processor, thereby improving the operation efficiency of the new artificial intelligence processor when performing neural network model operations.
The splicing operator formed by the above method can be used as a part of an artificial intelligence operation. When the splicing operator is used in an artificial intelligence processor to perform artificial intelligence operations, applications including but not limited to speech recognition and image recognition can be implemented. Forming the splicing operator by combining deformation operators and basic operators allows the artificial intelligence processor to better implement artificial intelligence operations.
In one possible implementation, an operator may be a common algorithm in artificial intelligence and may also be referred to as a layer, an operation, or a node; each neural network corresponds to a network structure, and the nodes in the graph are operators. An artificial intelligence operator library may be provided in advance, and the artificial intelligence operator library may include a plurality of basic operators (such as a convolution operator, a fully connected operator, a pooling operator, an activation operator, and the like). Each basic operator can be called by a processor, including but not limited to a central processing unit (CPU) or a graphics processing unit (GPU), to implement the corresponding basic function.
In one possible implementation, the dimension of the first input data may be 4. When the first input data is picture data, the dimensions of the first input data may represent the number of pictures, the number of picture channels, the picture height, and the picture width. In other implementations, when the first input data is picture data but the dimension of the first input data is less than 4 (for example, 3), the dimensions of the first input data may represent any 3 of the number of pictures, the number of picture channels, the picture height, and the picture width.
In one possible implementation, the basic operators include a first deformation operator (Reshape), a second deformation operator (Reshape), and a normalization exponential operator (softmax). The first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation. For example, when the input data is multi-dimensional data (such as four-dimensional data), the normalization exponential operator can map all the data of the input data in a specified dimension to values between 0 and 1, and the sum of the mapped data in the specified dimension is 1.
In one example, when the input data is the one-dimensional vector [-3, 2, -1, 0], the normalization exponential operator can perform a normalization operation on the input data, so that the input data is normalized to [0.0057, 0.8390, 0.0418, 0.1135]. It can be seen that the sum of the normalized input data is 1.
In another example, when the input data is the 2×3 two-dimensional data [[1, 1, 1], [1, 1, 1]], a normalization arithmetic operation can be performed on the second dimension of the input data, so that the normalized input data is [[0.333, 0.333, 0.333], [0.333, 0.333, 0.333]]; alternatively, a normalization arithmetic operation can be performed on the first dimension of the input data, so that the normalized input data is [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]], where the first dimension may be the starting dimension of the input data.
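For illustration only (this sketch is not part of the original disclosure; the helper name softmax and the use of NumPy are assumptions), the following minimal code reproduces the two examples above and shows how the result depends on the dimension along which the normalization exponential operator is applied:

import numpy as np

def softmax(x, axis):
    # Exponentiate and normalize along one axis; subtract the max for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

print(softmax(np.array([-3.0, 2.0, -1.0, 0.0]), axis=0))
# -> [0.0057 0.8390 0.0418 0.1135], which sums to 1

x = np.ones((2, 3))
print(softmax(x, axis=1))  # normalize over the second dimension: every row is [0.333 0.333 0.333]
print(softmax(x, axis=0))  # normalize over the first dimension: every column is [0.5 0.5]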
In one possible implementation, in step S120, splicing the deformation operators and the basic operator to form the splicing operator may include:
using the first deformation operator as the preceding-stage operator of the normalization exponential operator; and
using the second deformation operator as the succeeding-stage operator of the normalization exponential operator;
where the first deformation operator is used to convert first input data of a first type into second input data of a second type, and the second deformation operator is used to convert output data of the second type output by the normalization exponential operator into output data of the first type.
When the input data satisfies a certain condition, the splicing operator formed in the above manner can convert the input data using the first deformation operator, then perform a normalization arithmetic operation on the converted input data in a specified dimension using the normalization exponential operator, and finally use the second deformation operator to undo the shape conversion on the result of the normalization operation, so that the result of the normalization operation is converted into data with the same shape as the input data.
In one possible implementation, when the splicing operator is composed of the first deformation operator, the normalization exponential operator, and the second deformation operator, the splicing arithmetic operation includes:
when the dimension of the first input data of the first type is greater than 2, and the first parameter and the second parameter carried in the first input data satisfy the preset condition, converting the first input data to the second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type; and
converting the output data of the second type into output data of the first type using the second deformation operator.
When the artificial intelligence processor performs operations related to a neural network, the first input data, the first parameter, the second parameter, and the like may be input when the normalization exponential operator in the artificial intelligence operator library is called.
In one possible implementation, the first parameter (preserve_shape) may indicate whether to preserve the shape of the input data, and the second parameter (multi_output) may indicate whether the shape of the input data is consistent with that of the label data.
In this implementation, the first parameter may be set to true when the shape of the input data is to be preserved, and set to false when the shape of the input data does not need to be preserved.
In this implementation, during training of a neural network model, the second parameter may be set to false when the shape of the input data is the same as the shape of the label data, and set to true in other cases. For example, during training of the neural network model, the second parameter may be set to true when the shape of the input data is not the same as the shape of the label data; and when forward operation (prediction) is performed using the neural network model, the second parameter may be set to false.
Of course, the above description is exemplary, and those skilled in the art may set the values of the first parameter and the second parameter as needed.
In one possible implementation, when the dimension of the first input data of the first type is greater than 2, and the first parameter and the second parameter carried in the first input data satisfy the preset condition, the first input data is converted to the second type using the first deformation operator, and the dimension of the first input data of the second type is 2, where:
the preset condition may be that the first parameter and the second parameter are both false.
After the normalization operation is performed by the normalization exponential operator on the first input data of the second type obtained from the conversion by the first deformation operator, a normalized result (output data of the second type) can be obtained. The data type of the normalized result is the same as that of the first input data of the second type; both are two-dimensional data.
In order to make the output data have the same shape as the input data, the normalized result can be converted using the second deformation operator to obtain the output data of the first type. After the conversion processing by the second deformation operator, the output data of the first type has the same shape as the input data of the first type; for example, the output data of the first type is four-dimensional data, and the order of its dimensions is the same as that of the input data of the first type.
In an application example, when artificial intelligence operations are required for speech recognition or image processing, the normalization exponential splicing operator in an embodiment of the present disclosure (the first deformation operator + the normalization exponential operator + the second deformation operator) can be used to perform the normalized exponential operation. When the input data satisfies a certain condition, the normalization exponential splicing operator converts the input data, converting input data with more than 2 dimensions into 2-dimensional data, performs a normalization operation on the converted 2-dimensional input data in the second dimension to obtain output data, and then converts the shape of the output data back so that it is the same as that of the input data. Using the normalization exponential splicing operator described in the present disclosure, artificial intelligence operations can be executed more conveniently to implement applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
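The following sketch is illustrative only and not part of the original disclosure; the function names, the NumPy implementation, and the way the higher-dimensional data is flattened to two dimensions are assumptions. It shows how such a normalization exponential splicing operator could be composed from the first deformation operator, the normalization exponential operator, and the second deformation operator when the first parameter and the second parameter are both false:

import numpy as np

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def softmax_splice(data, preserve_shape=False, multi_output=False):
    # Sketch of the reshape -> softmax -> reshape splicing operator.
    if data.ndim > 2 and not preserve_shape and not multi_output:
        reshaped = data.reshape(data.shape[0], -1)   # first deformation operator: N-D -> 2-D
        normalized = softmax(reshaped, axis=1)       # normalization over the second dimension
        return normalized.reshape(data.shape)        # second deformation operator: restore shape
    return softmax(data, axis=1)                     # otherwise apply the softmax directly

x = np.random.rand(2, 3, 4, 5)        # e.g. pictures x channels x height x width
y = softmax_splice(x)
print(y.shape)                        # (2, 3, 4, 5): same shape as the input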
In one possible implementation, the basic operators include a scaling operator and a normalization operator. The scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data.
In one possible implementation, the scaling operator can be implemented by the following formula:
Y = nX + m, where X represents the input data, n represents a scaling factor, m represents a shift, and Y represents the result of the scaling operation; n and m may be scalars or tensors. The computation process of the scaling operator obtained from this formula is similar to a vector multiplication operation.
In one possible implementation, the splicing arithmetic operation includes:
normalizing the input data using the normalization operator to obtain a normalization result; and
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
In one possible implementation, the normalization operator can be implemented by the following formula (formula 1):
out[:, i, ...] = (data[:, i, ...] - data_mean[i]) / sqrt(data_var[i] + ε)
where data[:, i, ...] is the input data, data_mean[i] is the mean (expected value), data_var[i] is the variance, and ε is a constant.
In one example, suppose the normalization result out[:, i, ...] to be obtained for the input data data[:, i, ...] is (formula 2):
out[:, i, ...] = gamma[i] · (data[:, i, ...] - data_mean[i]) / sqrt(data_var[i] + ε) + beta[i]
where gamma[i] is a scaling factor and beta[i] is a shift.
Then the normalization operator can first be used to normalize the input data data[:, i, ...] to obtain the normalization result shown in formula 1, and the scaling operator can then be used to scale the normalization result of the normalization operator, thereby obtaining the scaled normalization result shown in formula 2.
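A minimal sketch of this normalization-plus-scaling splicing operator is given below (illustrative only and not part of the original disclosure; the per-channel layout, the helper names, and the NumPy implementation are assumptions based on the reconstructed formulas above):

import numpy as np

def normalize_op(data, data_mean, data_var, eps=1e-5):
    # Normalization operator (formula 1), applied per channel (dimension 1).
    shape = (1, -1, 1, 1)
    return (data - data_mean.reshape(shape)) / np.sqrt(data_var.reshape(shape) + eps)

def scale_op(x, n, m):
    # Scaling operator: Y = nX + m, where n and m may be scalars or tensors.
    return n * x + m

def norm_scale_splice(data, data_mean, data_var, gamma, beta):
    normalized = normalize_op(data, data_mean, data_var)          # formula 1
    return scale_op(normalized,                                   # formula 2
                    gamma.reshape(1, -1, 1, 1), beta.reshape(1, -1, 1, 1))

x = np.random.rand(2, 3, 4, 4)        # pictures x channels x height x width
out = norm_scale_splice(x, x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3)),
                        np.ones(3), np.zeros(3))
print(out.shape)                      # (2, 3, 4, 4)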
In an application example, when artificial intelligence operations are required for speech recognition or image processing, the normalization splicing operator in an embodiment of the present disclosure (the normalization operator + the scaling operator) can be used to perform a normalization operation. When the input data needs to be normalized to obtain the normalization result shown in formula 2, the normalization splicing operator can first be used to normalize the input data to obtain the normalization result shown in formula 1, and then to scale the normalized input data to obtain the final operation result. Using the normalization splicing operator described in the present disclosure, artificial intelligence operations can be executed more conveniently to implement applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
In the above manner, the present disclosure can use the scaling operator and the normalization operator to form a splicing operator, and the splicing operator can normalize and scale the input data, thereby obtaining the normalization result of the input data.
In one possible implementation, the basic operators include a square operator, a convolution operator, a reciprocal square root operator, and a multiplication operator. The square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal square root operator is used to take the square root of the input data and compute its reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
in step S120, forming the splicing operator (L2_normalization) using the basic operators includes:
splicing the square operator, the convolution operator, the reciprocal square root operator, and the multiplication operator in sequence to form the splicing operator.
In one possible implementation, the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
For example, taking an RGB picture as the input data, the number of channels is 3. Performing a summation operation on the plurality of square operation results using the convolution operator means computing the sum of the squares of the corresponding pixels in the three channels, and then placing the result into each of the R, G, and B channels.
In one possible implementation, the number of convolution kernels is the same as the number of channels, the size of each convolution kernel is 1×1, and all weights are 1. In this way, the data of all input channels are added together, and the result of the addition is placed into each channel.
performing a square root operation and then a reciprocal operation on the summation operation result using the reciprocal square root operator to obtain a reciprocal operation result; and
performing a multiplication operation on the input data and the reciprocal operation result using the multiplication operator to obtain a normalization result.
In one possible implementation, the splicing operation of the above splicing operator can be expressed by the following formula:
out[:, i, ...] = data[:, i, ...] / sqrt(sum(data[:, i, ...] ** 2))
where i is an integer from 1 to N, out[:, i, ...] is the normalization result output by the splicing operator, data[:, i, ...] is the input data, data[:, i, ...] ** 2 is the square operation on the input data, sum(data[:, i, ...] ** 2) is the summation operation on the squared results, and the division by sqrt(sum(data[:, i, ...] ** 2)) corresponds to taking the square root of the summation result and computing its reciprocal.
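The following sketch (illustrative only and not part of the original disclosure; the helper names, the NumPy implementation, and the small constant added for numerical safety are assumptions) composes the square, summation, reciprocal square root, and multiplication steps into this L2_normalization splicing operator. The channel-wise summation here plays the role of the 1×1 all-ones convolution described above:

import numpy as np

def l2_normalization_splice(data, eps=1e-12):
    # Sketch of the square -> sum -> reciprocal square root -> multiply splicing operator.
    squared = data ** 2                                  # square operator
    summed = np.sum(squared, axis=1, keepdims=True)      # convolution operator: 1x1 all-ones
                                                         # kernels sum the channels per pixel
    summed = np.broadcast_to(summed, data.shape)         # place the sum into every channel
    rsqrt = 1.0 / np.sqrt(summed + eps)                  # reciprocal square root operator
    return data * rsqrt                                  # multiplication operator

x = np.random.rand(1, 3, 8, 8)                           # one RGB picture: N x C x H x W
out = l2_normalization_splice(x)
print(np.allclose(np.sum(out ** 2, axis=1), 1.0))        # True: unit L2 norm per pixel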
In an application example, the splicing operator can be applied to a Single Shot MultiBox Detector (SSD) neural network used for object detection.
In an application example, when artificial intelligence operations are required for speech recognition or image processing, the normalization splicing operator in an embodiment of the present disclosure (the square operator, the convolution operator, the reciprocal square root operator, and the multiplication operator) can be used to perform a normalization operation. When the input data needs to be normalized to obtain the normalization result shown in the above formula, the normalization splicing operator can be used to normalize the input data to obtain the normalization result. Using the normalization splicing operator described in the present disclosure, artificial intelligence operations can be executed more conveniently to implement applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
Referring to Fig. 2, Fig. 2 shows a schematic diagram of a software call hierarchy according to an embodiment of the present disclosure.
As shown in Fig. 2, the software call hierarchy includes, from top to bottom, an application layer, a framework layer, an operator library layer, a driver layer, and a chip layer. The splicing operator obtained by the foregoing operation method can be applied at the application layer, the artificial intelligence operator library can be located at the operator library layer, the artificial intelligence processor can be located at the chip layer, and the driver layer may include a driver for driving the chip layer to work.
As can be seen from the above description, after the deformation operators and the basic operators in the operator library layer are used to form a splicing operator, the splicing operator can be called directly by the application layer and applied at the application layer to implement the corresponding function in artificial intelligence operations. This avoids the situation in which the deformation operators and the basic operators have to be fetched from the operator library layer each time the application layer performs an artificial intelligence operation, thereby improving the execution process of artificial intelligence operations.
Please refer to Figs. 3a-3c, which show schematic diagrams of splicing operators according to an embodiment of the present disclosure.
As shown in Fig. 3a, the splicing operator includes:
a first deformation operator 11, configured to convert the first input data to the second type, where the dimension of the first input data of the second type is 2;
a normalization exponential operator 12, configured to perform a normalization operation on the first input data of the second type in the second dimension, so as to output output data of the second type; and
a second deformation operator 13, configured to convert the output data of the second type into output data of the first type.
Splicing the deformation operators with the basic operator to form the splicing operator includes:
using the first deformation operator 11 as the preceding-stage operator of the normalization exponential operator 12; and
using the second deformation operator 13 as the succeeding-stage operator of the normalization exponential operator 12.
In one possible implementation, the arithmetic operation includes:
when the first type indicates that the dimension of the first input data is greater than 2, and the first parameter and the second parameter carried in the first input data satisfy the preset condition, converting the first input data to the second type using the first deformation operator 11, where the second type indicates that the dimension of the converted first input data is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator 12, so as to output output data of the second type; and
converting the output data of the second type into output data of the first type using the second deformation operator 13.
Through the above splicing operator, when the first parameter and the second parameter carried in the input data satisfy the preset condition, the present disclosure can convert the first input data to the second type using the first deformation operator, perform a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator to output the output data of the second type, and convert the output data of the second type into the output data of the first type using the second deformation operator.
As shown in Fig. 3b, in one possible implementation, the basic operators include a normalization operator 21 and a scaling operator 22, where the normalization operator 21 is used to normalize the input data, and the splicing arithmetic operation includes:
normalizing the input data using the normalization operator to obtain a normalization result; and
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
As shown in Fig. 3c, in one possible implementation, the basic operators include a square operator 31, a convolution operator 32, a reciprocal square root operator 33, and a multiplication operator 34. The square operator 31 is used to perform a square operation on the input data, the convolution operator 32 is used to perform a summation operation on the input data, the reciprocal square root operator 33 is used to take the square root of the input data and compute its reciprocal, and the multiplication operator 34 is used to perform a multiplication operation on the input data, wherein
forming the splicing operator using the basic operators includes:
splicing the square operator 31, the convolution operator 32, the reciprocal square root operator 33, and the multiplication operator 34 in sequence to form the splicing operator.
In one possible implementation, the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing a square root operation and then a reciprocal operation on the summation operation result using the reciprocal square root operator to obtain a reciprocal operation result; and
performing a multiplication operation on the input data and the reciprocal operation result using the multiplication operator to obtain a normalization result.
Referring to Fig. 4, Fig. 4 shows a block diagram of an operation device according to an embodiment of the present disclosure.
As shown in Fig. 4, the device includes:
an obtaining module 80, configured to obtain basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data; and
an operation module 90, connected to the obtaining module 80, configured to form a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so that the input data is normalized.
Through the above device, the present disclosure can obtain basic operators in the artificial intelligence operator library and use the basic operators to form a splicing operator. The resulting splicing operator can be used to normalize the input data and supports the new artificial intelligence processor, thereby improving the operation efficiency of the new artificial intelligence processor when performing neural network model operations.
Referring to Fig. 5, Fig. 5 shows a block diagram of an operation device according to an embodiment of the present disclosure.
In one possible implementation, as shown in Fig. 5, the basic operators include a first deformation operator, a second deformation operator, and a normalization exponential operator, where the first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation, wherein
the operation module 90 includes a first operation submodule 910, and the first operation submodule 910 is configured to:
use the first deformation operator as the preceding-stage operator of the normalization exponential operator; and
use the second deformation operator as the succeeding-stage operator of the normalization exponential operator.
In one possible implementation, the splicing arithmetic operation includes:
when the dimension of the first input data of the first type is greater than 2, and the first parameter and the second parameter carried in the first input data satisfy the preset condition, converting the first input data to the second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type; and
converting the output data of the second type into output data of the first type using the second deformation operator.
In one possible implementation, the basic operators include a scaling operator and a normalization operator, where the scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data, wherein
the operation module 90 includes a second operation submodule 920, and the second operation submodule 920 is configured to:
use the normalization operator as the preceding-stage operator of the scaling operator.
In one possible implementation, the splicing arithmetic operation includes:
normalizing the input data using the normalization operator to obtain a normalization result; and
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
In one possible implementation, the basic operators include a square operator, a convolution operator, a reciprocal square root operator, and a multiplication operator, where the square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal square root operator is used to take the square root of the input data and compute its reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
the operation module 90 includes a third operation submodule 930, and the third operation submodule 930 is configured to:
splice the square operator, the convolution operator, the reciprocal square root operator, and the multiplication operator in sequence to form the splicing operator.
In one possible implementation, the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing a square root operation and then a reciprocal operation on the summation operation result using the reciprocal square root operator to obtain a reciprocal operation result; and
performing a multiplication operation on the input data and the reciprocal operation result using the multiplication operator to obtain a normalization result.
Referring to Fig. 6, Fig. 6 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
In one possible implementation, as shown in Fig. 6, the device includes:
a main processor 50, configured to execute the above method to obtain a splicing operator, where the splicing operator is used to perform a corresponding arithmetic operation on the input data; and
an artificial intelligence processor 60, electrically connected to the main processor 50;
where the main processor 50 is further configured to send the input data and the splicing operator to the artificial intelligence processor 60, and the artificial intelligence processor 60 is configured to:
receive the input data and the splicing operator sent by the main processor 50;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result; and
send the operation result to the main processor 50.
In one possible implementation, the main processor 50 may include a main processor storage space for storing the splicing operator obtained by the main processor 50 executing the operation method, wherein
the main processor 50 is further configured to send the input data and the splicing operator stored in the main processor storage space.
It should be understood that the main processor 50 may execute the operation method after obtaining the data, obtain the splicing operator, and send the obtained splicing operator to the artificial intelligence processor 60 for processing. The main processor 50 may also send a stored splicing operator to the artificial intelligence processor 60, so that a pre-stored splicing operator is sent to the artificial intelligence processor 60, and the artificial intelligence processor 60 performs artificial intelligence operations according to the received splicing operator and input data. Of the above two manners, the former may be regarded as an online (real-time) processing manner, and the latter may be regarded as an offline processing manner.
In one possible implementation, the devices shown in Fig. 4 and Fig. 5 may be implemented in the main processor 50.
In one possible implementation, the main processor 50 may be a central processing unit (CPU), or may be another type of processor, such as a graphics processing unit (GPU). It should be understood that the splicing operator is the splicing operator obtained by the foregoing operation method; for details, please refer to the description of the splicing operator introduced above, which is not repeated here.
In one possible implementation, the artificial intelligence processing device may be formed by a plurality of identical processors; for example, a plurality of processors (XPUs) may form an architecture similar to that of the main processor 50 plus the artificial intelligence processor 60. It may also be formed by a single processor; in this case, the processor can both execute the foregoing operation method to obtain the splicing operator and perform artificial intelligence operations on the input data through the splicing operator to obtain an output result. In this implementation, the type of the processor may be an existing type or a newly proposed new type of processor, which is not limited by the present disclosure.
In one possible implementation, the main processor 50 may serve as an interface between the artificial intelligence processing device and external data and control, including carrying data and completing basic control of the artificial intelligence processing device such as starting and stopping; other processing devices may also cooperate with the artificial intelligence processor to jointly complete computation tasks.
In one possible implementation, the artificial intelligence processing device may include more than one artificial intelligence processor, and the artificial intelligence processors may be linked and transmit data through a specific structure, for example, interconnected and transmitting data through a PCIe bus, to support larger-scale machine learning operations. In this case, the processors may share the same control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection manner may be any interconnection topology.
The artificial intelligence processing device has high compatibility and can be connected to various types of servers through a PCIe interface.
Referring to Fig. 7, Fig. 7 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
In one possible implementation, as shown in Fig. 7, the main processor 50 and the artificial intelligence processor 60 may be connected through a general interconnection interface (such as an I/O interface) for transmitting data and control instructions between the main processor 50 and the artificial intelligence processor 60. The artificial intelligence processor 60 obtains the required input data (including the splicing operator) from the main processor 50 and writes it into an on-chip storage device of the artificial intelligence processor; it may obtain control instructions from the main processor 50 and write them into an on-chip control cache of the artificial intelligence processor; and it may also read data in a storage module of the artificial intelligence processor 60 and transfer the data to other processing devices.
In one possible implementation, the artificial intelligence processing device may further include a storage device, where the storage device is connected to the artificial intelligence processor and the other processing devices respectively. The storage device is used to store data of the artificial intelligence processing device and the other processing devices, and is particularly suitable for data required for operations that cannot be entirely held in the internal storage of the artificial intelligence processing device or the other processing devices.
The combined processing device can be used as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, improving processing speed, and reducing overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network interface card, or a Wi-Fi interface. Through the above artificial intelligence processing device, the present disclosure can transfer the input data and the splicing operator from the main processor to the artificial intelligence processor; the artificial intelligence processor performs an artificial intelligence operation on the input data using the splicing operator to obtain an operation result, and sends the operation result to the main processor.
It should be understood that the artificial intelligence processor 60 may be a single processor that can be used for artificial intelligence operations, or a combination of a plurality of different processors. The artificial intelligence processor is applied to artificial intelligence operations, which include machine learning operations, brain-like operations, and the like. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor 60 may specifically include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field-Programmable Gate Array) chip.
In one possible implementation, the artificial intelligence processor 60 is as shown in Fig. 8. Referring to Fig. 8, Fig. 8 shows a block diagram of an artificial intelligence processor according to an embodiment of the present disclosure.
As shown in Fig. 8, the artificial intelligence processor 30 includes a control module 32, an operation module 33, and a storage module 31. The operation module 33 includes a main processing circuit 331 and a plurality of slave processing circuits 332 (the number of slave processing circuits shown in the figure is exemplary).
The control module 32 is configured to obtain input data and a computation instruction;
the control module 32 is further configured to parse the computation instruction to obtain a plurality of operation instructions, and send the plurality of operation instructions and the input data to the main processing circuit 331;
the main processing circuit 331 is configured to perform preamble processing on the input data, and to transmit data and operation instructions between the main processing circuit and the plurality of slave processing circuits;
the plurality of slave processing circuits 332 are configured to perform intermediate operations in parallel according to the data and the operation instructions transmitted from the main processing circuit 331 to obtain a plurality of intermediate results, and to transfer the plurality of intermediate results to the main processing circuit 331; and
the main processing circuit 331 is configured to perform subsequent processing on the plurality of intermediate results to obtain a computation result of the computation instruction.
After receiving the input data and the computation instruction, the artificial intelligence processor 30 described in the present disclosure performs the corresponding arithmetic operation on the input data to obtain the calculation result.
The artificial intelligence processor described in the present disclosure can support machine learning algorithms as well as certain non-machine-learning artificial intelligence algorithms.
The above computation instruction includes, but is not limited to, a forward operation instruction or a backward training instruction; the specific embodiments of the present application do not limit the specific form of the above computation instruction.
In one possible embodiment, after the artificial intelligence processor 30 obtains the calculation result, it may send the calculation result to another processor such as a central processing unit (CPU) or a graphics processor (GPU).
The operation instruction is run code obtained by the artificial intelligence processor 30 according to the splicing operator; the above run code includes, but is not limited to, a forward operation instruction, a backward training instruction, or another neural network operation instruction, and the specific embodiments of the present application do not limit the specific form of the above computation instruction.
In one possible embodiment, the artificial intelligence processor 30 may obtain the data through a data transmission module 360, which may specifically be one or more data I/O interfaces or I/O pins.
The main processing circuit 331 is configured to perform pre-processing on the operation data to obtain processed operation data, and to transmit at least one of the operation data, the intermediate results, and the operation instructions to and from the plurality of slave processing circuits.
Referring also to Fig. 9, Fig. 9 shows a block diagram of the main processing circuit 331 according to an embodiment of the present disclosure.
As shown in Fig. 9, the main processing circuit 331 may include one of, or any combination of, a conversion processing circuit 113, an activation processing circuit 111, and an addition processing circuit 112.
The conversion processing circuit 113 is configured to perform the pre-processing on the data. The pre-processing may be: converting the data or intermediate results received by the main processing circuit 331 between a first data structure and a second data structure (for example, conversion between continuous data and discrete data); or converting the data or intermediate results received by the main processing circuit 331 between a first data type and a second data type (for example, conversion between a fixed-point type and a floating-point type); an illustrative sketch of such a conversion is given after the descriptions of these circuits below.
The activation processing circuit 111 is configured to perform the post-processing, specifically to perform an activation operation on data in the main processing circuit 331;
The addition processing circuit 112 is configured to perform the post-processing, specifically an addition operation or an accumulation operation.
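The following is a minimal illustrative sketch, in Python with NumPy, of the kind of fixed-point/floating-point conversion the conversion processing circuit 113 may perform as pre-processing. The Q-format with 8 fractional bits and the int16 container are assumptions made only for this example and are not specified by the present disclosure.

import numpy as np

# Sketch only: convert between floating-point data and a simple fixed-point
# representation, as the conversion processing circuit might do during pre-processing.
# frac_bits and the int16 container are illustrative assumptions.
def float_to_fixed(x, frac_bits=8, dtype=np.int16):
    scale = 1 << frac_bits
    info = np.iinfo(dtype)
    return np.clip(np.round(np.asarray(x) * scale), info.min, info.max).astype(dtype)

def fixed_to_float(q, frac_bits=8):
    return q.astype(np.float32) / (1 << frac_bits)

x = np.array([0.5, -1.25, 3.1416], dtype=np.float32)
q = float_to_fixed(x)                      # fixed-point data
print(q, fixed_to_float(q))                # [ 128 -320  804] [ 0.5 -1.25  3.140625]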
Each slave processing circuit 332 is configured to perform intermediate operations according to the operation data and operation instructions transmitted by the main processing circuit 331 to obtain an intermediate result, and to transfer the intermediate result to the main processing circuit 331;
The main processing circuit 331 is configured to perform post-processing on the plurality of intermediate results to obtain the final calculation result of the operation instruction.
The control module 32 is further configured to generate a debugging result according to the status information, and to output the debugging result to a status information acquisition device 40.
The memory module 31 is configured to store status information generated during computation according to the operation instructions, where the status information includes at least one of: status information in the pre-processing of the main processing circuit 331, status information in the intermediate computation of the plurality of slave processing circuits 332, and status information in the post-processing of the main processing circuit 331. The memory module may include an on-chip storage submodule 310, and the on-chip storage submodule 310 may include a scratchpad memory.
The memory module 31 may further include one of, or any combination of, a register and a cache. Specifically, the cache is configured to store the computation instruction; the register is configured to store the neural network model, the data, and scalars; and the cache is a scratchpad cache.
In one possible embodiment, the control module 32 may include an instruction cache submodule 320, an instruction processing submodule 321, and a storage queue submodule 323;
The instruction cache submodule 320 is configured to store computation instructions associated with the neural network model;
The instruction processing submodule 321 is configured to parse the computation instruction to obtain a plurality of operation instructions;
The storage queue submodule 323 is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or computation instructions to be executed in the order of the queue.
For example, in one possible embodiment the main processing circuit 331 may also include a control module 32, and this control module may include a master instruction processing submodule specifically configured to decode instructions into micro-instructions. Likewise, in one possible embodiment a slave processing circuit 332 may also include another control module 32, which includes a slave instruction processing submodule specifically configured to receive and process micro-instructions. The above micro-instruction may be a next-level instruction of an instruction; it can be obtained by splitting or decoding the instruction, and can be further decoded into control signals for each component, each module, or each processing circuit.
In an optional solution, the structure of the computation instruction may be as shown in Table 1 below.
Table 1
Operation code | Register or immediate | Register/immediate | ...
The ellipsis in the above table indicates that there may be multiple registers or immediates.
In an alternative solution, the computation instruction may include one or more operation fields and one operation code. The computation instruction may include a neural network operation instruction. Taking the neural network operation instruction as an example, as shown in Table 1, register number 0, register number 1, register number 2, register number 3, and register number 4 may be operation fields, where each of register number 0 through register number 4 may be the number of one or more registers, for example as shown in Table 2 below.
Table 2
The above register may be an off-chip memory; in practical applications it may also be an on-chip memory for storing data. The data may specifically be t-dimensional data, where t is an integer greater than or equal to 1; for example, when t = 1 the data is one-dimensional, i.e., a vector; when t = 2 it is two-dimensional, i.e., a matrix; and when t = 3 or more it is a multidimensional tensor.
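As an informal illustration of the opcode/operation-field structure described above, the following Python sketch models a computation instruction whose operation fields are register numbers that resolve to storage addresses. All names and values are hypothetical; the actual instruction encoding is given by the tables of the disclosure.

from dataclasses import dataclass, field
from typing import List

# Sketch only: a computation instruction as an operation code plus operation fields
# holding register numbers; each register number resolves to an address held in a
# register file. Field names and values are illustrative assumptions.
@dataclass
class ComputeInstruction:
    opcode: str
    register_fields: List[int] = field(default_factory=list)

    def operand_addresses(self, register_file: List[int]) -> List[int]:
        # Resolve each register-number operation field to the address stored in it.
        return [register_file[r] for r in self.register_fields]

register_file = [0x1000, 0x2000, 0x3000, 0x4000, 0x5000]   # hypothetical register contents
inst = ComputeInstruction(opcode="NN_FORWARD", register_fields=[0, 1, 2, 3, 4])
print(inst.operand_addresses(register_file))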
Optionally, the control module 32 may further include:
a dependency processing submodule 322, configured to, when there are multiple operation instructions, determine whether a first operation instruction has a dependency on a zeroth operation instruction preceding the first operation instruction; if the first operation instruction has a dependency on the zeroth operation instruction, the first operation instruction is cached in the instruction cache submodule, and after the zeroth operation instruction has finished executing, the first operation instruction is fetched from the instruction cache submodule and transmitted to the computing module;
where determining whether the first operation instruction has a dependency on the zeroth operation instruction preceding the first operation instruction includes:
extracting, according to the first operation instruction, a first storage address interval of the data (for example, a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, a zeroth storage address interval of the matrix required by the zeroth operation instruction; if the first storage address interval overlaps the zeroth storage address interval, determining that the first operation instruction and the zeroth operation instruction have a dependency; if the first storage address interval does not overlap the zeroth storage address interval, determining that the first operation instruction and the zeroth operation instruction have no dependency.
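A minimal sketch of the storage-address-interval overlap test described above, written in Python; representing an address interval as a (start, end) tuple is an assumption made only for illustration.

def has_dependency(first_interval, zeroth_interval):
    # Overlapping storage address intervals mean the first operation instruction
    # depends on the zeroth operation instruction.
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start <= z_end and z_start <= f_end

print(has_dependency((0x100, 0x1FF), (0x180, 0x27F)))   # True: buffer the first instruction
print(has_dependency((0x100, 0x1FF), (0x300, 0x3FF)))   # False: no dependency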
Referring to Fig. 10, Fig. 10 shows a schematic diagram of the artificial intelligence processor according to an embodiment of the present disclosure.
In one possible embodiment, the computing module 33 may include a branch processing circuit 333 as shown in Fig. 10; its specific connection structure is as shown in Fig. 10, wherein
the main processing circuit 331 is connected with the branch processing circuit 333, and the branch processing circuit 333 is connected with the plurality of slave processing circuits 332;
the branch processing circuit 333 is configured to forward data or instructions between the main processing circuit 331 and the slave processing circuits 332.
In one possible embodiment, taking the fully connected operation in a neural network computation as an example, the process may be: y = f(wx + b), where x is the input neuron matrix, w is the weight matrix, b is the bias scalar, and f is the activation function, which may specifically be any one of the sigmoid, tanh, relu, and softmax functions. Assuming a binary tree structure with 8 slave processing circuits, the implementation method may be:
the control module obtains the input neuron matrix x, the weight matrix w, and a fully connected operation instruction from the memory module 31, and transfers the input neuron matrix x, the weight matrix w, and the fully connected operation instruction to the main processing circuit;
the main processing circuit splits the input neuron matrix x into 8 sub-matrices, distributes the 8 sub-matrices to the 8 slave processing circuits through a tree module, and broadcasts the weight matrix w to the 8 slave processing circuits;
the slave processing circuits perform, in parallel, the multiplication and accumulation operations of the 8 sub-matrices with the weight matrix w to obtain 8 intermediate results, and send the 8 intermediate results to the main processing circuit;
the main processing circuit sorts the 8 intermediate results to obtain the operation result of wx, performs the bias-b operation on this operation result, then performs the activation operation to obtain the final result y, sends the final result y to the control module, and the control module outputs or stores the final result y to the memory module 31.
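The fully connected flow just described can be simulated in a few lines of Python with NumPy; splitting x column-wise into 8 sub-matrices stands in for distributing work to the 8 slave processing circuits, and relu is chosen here only as an example of f. The shapes used are illustrative assumptions.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fully_connected(x, w, b, num_slaves=8):
    subs = np.array_split(x, num_slaves, axis=1)      # main circuit: split x into 8 sub-matrices
    intermediates = [w @ s for s in subs]             # each slave: multiply the broadcast w with its sub-matrix
    wx = np.concatenate(intermediates, axis=1)        # main circuit: sort the 8 intermediate results into wx
    return relu(wx + b)                               # bias b, then activation f

x = np.random.randn(64, 16)    # input neuron matrix (64 features, 16 samples; assumed shapes)
w = np.random.randn(32, 64)    # weight matrix
y = fully_connected(x, w, b=0.1)
print(y.shape)                 # (32, 16)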
The method by which the neural network computing device shown in Fig. 10 executes a neural network forward operation instruction may specifically be as follows:
The control module 32 extracts from the memory module 31 the operation data (for example, a neural network forward operation instruction or a neural network operation instruction), the corresponding operation field, and at least one operation code; the control module 32 transmits the operation field to a data access module and sends the at least one operation code to the computing module.
The control module 32 extracts the weight w and the bias b corresponding to the operation field from the memory module 31 (when b is 0, the bias b does not need to be extracted), and transmits the weight w and the bias b to the main processing circuit of the computing module; the control module extracts the input data Xi from the memory module 31 and sends the input data Xi to the main processing circuit.
The main processing circuit splits the input data Xi into n data blocks;
The instruction processing submodule 321 of the control module 32 determines a multiplication instruction, a bias instruction, and an accumulation instruction according to the at least one operation code, and sends the multiplication instruction, the bias instruction, and the accumulation instruction to the main processing circuit. The main processing circuit broadcasts the multiplication instruction and the weight w to the plurality of slave processing circuits, and distributes the n data blocks to the plurality of slave processing circuits (for example, with n slave processing circuits, each slave processing circuit is sent one data block). The plurality of slave processing circuits are configured to perform multiplication operations on the weight w and the received data blocks according to the multiplication instruction to obtain intermediate results, and to send the intermediate results to the main processing circuit. The main processing circuit performs an accumulation operation on the intermediate results sent by the plurality of slave processing circuits according to the accumulation instruction to obtain an accumulation result, performs the bias-b operation on the accumulation result according to the bias instruction to obtain the final result, and sends the final result to the control module.
In addition, the order of the addition operation and the multiplication operation may be exchanged.
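For the single-instruction forward flow just described, the following Python sketch (for one output value) shows the multiply / accumulate / bias sequence; pairing each data block of Xi with the matching slice of the broadcast weight w is an assumption made only to keep the example self-contained.

import numpy as np

def forward_op(xi, w, b, n=4):
    x_blocks = np.array_split(xi, n)     # main circuit: split Xi into n data blocks
    w_blocks = np.array_split(w, n)      # assumed matching slices of the broadcast weight
    # slave circuits: multiplication instruction, one intermediate result per data block
    intermediates = [np.dot(wb, xb) for wb, xb in zip(w_blocks, x_blocks)]
    acc = sum(intermediates)             # main circuit: accumulation instruction
    return acc + b                       # main circuit: bias instruction

xi = np.arange(8, dtype=np.float32)
w = np.full(8, 0.5, dtype=np.float32)
print(forward_op(xi, w, b=1.0))          # 0.5 * (0 + 1 + ... + 7) + 1 = 15.0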
In the technical solution provided by the present application, the neural network operation instruction realizes the multiplication operation and the bias operation of the neural network with a single instruction; the intermediate results of the neural network computation do not need to be stored or fetched, which reduces the storage and fetching of intermediate data. This has the advantage of reducing the corresponding operation steps and improving the computational efficiency of the neural network.
Referring to Fig. 11, Fig. 11 shows a schematic diagram of the artificial intelligence processor according to an embodiment of the present disclosure.
In one possible embodiment, the computing module 33 may include one main processing circuit 331 and a plurality of slave processing circuits 332, as shown in Fig. 11.
In one possible embodiment, as shown in Fig. 11, the plurality of slave processing circuits are arranged in an array; each slave processing circuit is connected with its adjacent slave processing circuits, and the main processing circuit is connected with k of the plurality of slave processing circuits, where the k slave processing circuits are: the n slave processing circuits of the 1st row, the n slave processing circuits of the m-th row, and the m slave processing circuits of the 1st column. It should be noted that the k slave processing circuits shown in Fig. 11 include only the n slave processing circuits of the 1st row, the n slave processing circuits of the m-th row, and the m slave processing circuits of the 1st column; that is, the k slave processing circuits are those of the plurality of slave processing circuits that are directly connected with the main processing circuit.
The k slave processing circuits are configured to forward data and instructions between the main processing circuit and the plurality of slave processing circuits.
In some embodiments, the present application further provides a chip, which includes the above artificial intelligence processor.
In some embodiments, the present application further provides a chip package structure, which includes the above chip.
In some embodiments, the present application further provides a board card, which includes the above chip package structure.
Referring to Fig. 12, Fig. 12 shows a board card according to an embodiment of the present disclosure. In addition to the above chip 389, the board card may also include other supporting components, which include but are not limited to: a memory device 390, an interface device 391, and a control device 392;
The memory device 390 is connected with the chip in the chip package structure via a bus and is configured to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected with the chip via a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of the storage units. Each group of storage units may include multiple DDR4 granules (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be appreciated that when DDR4-3200 granules are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
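A quick check of the quoted DDR4-3200 figure; the per-controller calculation below is an illustration, not part of the disclosure.

# 3200 mega-transfers per second, 64 data bits (8 bytes) per transfer, ECC bits excluded
transfers_per_second = 3_200_000_000
data_bytes_per_transfer = 64 // 8
print(transfers_per_second * data_bytes_per_transfer / 1_000_000)   # 25600.0 MB/s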
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling DDR is provided in the chip, and is used to control data transmission and data storage for each storage unit.
The interface device is electrically connected with the chip in the chip package structure. The interface device is configured to realize data transmission between the chip and an external device (for example, a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface; for example, data to be processed are transferred from the server to the chip through the standard PCIE interface to realize data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface, and the present application does not limit the specific form of the other interface, as long as the interface unit can realize the transfer function. In addition, the calculation result of the chip is still sent back to the external device (for example, the server) by the interface device.
The control device is electrically connected with the chip. The control device is configured to monitor the state of the chip. Specifically, the chip may be electrically connected with the control device through an SPI interface. The control device may include a micro controller unit (MCU). Since the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads; therefore, the chip may be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, the present application further provides an electronic device, which includes the above board card.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, camera, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, a ship, and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combinations of actions; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the modules is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software program module.
If the integrated module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disc, or other media that can store program code.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments may be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable memory. The memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present disclosure have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (23)

1. An operation method, characterized in that the method includes:
obtaining basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data;
forming a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so as to normalize the input data.
2. The method according to claim 1, characterized in that the basic operators include a first deformation operator, a second deformation operator, and a normalization exponential operator, where the first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation, wherein
the forming a splicing operator using the basic operators includes:
using the first deformation operator as a preceding-stage operator of the normalization exponential operator;
using the second deformation operator as a following-stage operator of the normalization exponential operator.
3. The method according to claim 2, characterized in that the splicing arithmetic operation includes:
when the dimension of first input data of a first type is greater than 2, and a first parameter and a second parameter carried in the first input data satisfy a preset condition, converting the first input data to a second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type;
converting the output data of the second type to output data of the first type using the second deformation operator.
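As an informal illustration (not part of the claims), the splicing arithmetic operation of claims 2 and 3 can be sketched in Python with NumPy as a reshape, a softmax along the second dimension, and a reshape back; using a NumPy softmax as the normalization exponential operator and reshaping over the last axis are assumptions made only for this example.

import numpy as np

def softmax_2d(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))          # normalization exponential operator
    return e / e.sum(axis=1, keepdims=True)

def spliced_softmax(first_input):
    original_shape = first_input.shape                     # first type, dimension greater than 2
    flat = first_input.reshape(-1, original_shape[-1])     # first deformation operator: to dimension 2
    out_2d = softmax_2d(flat)                              # normalization in the second dimension
    return out_2d.reshape(original_shape)                  # second deformation operator: back to the first type

x = np.random.randn(2, 3, 5)
y = spliced_softmax(x)
print(y.shape, np.allclose(y.sum(axis=-1), 1.0))           # (2, 3, 5) True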
4. The method according to claim 1, characterized in that the basic operators include a scaling operator and a normalization operator, where the scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data, wherein
the forming a splicing operator using the basic operators includes:
using the normalization operator as a preceding-stage operator of the scaling operator.
5. The method according to claim 2, characterized in that the splicing arithmetic operation includes:
performing a normalization operation on the input data using the normalization operator to obtain a normalization result;
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
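An informal sketch (not part of the claims) of the splicing described in claims 4 and 5, with the normalization operator as the preceding stage of the scaling operator; the mean-variance normalization formula and the scale/shift values are assumptions for illustration only.

import numpy as np

def normalize(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)   # normalization operator

def scale(x, gamma=2.0, beta=0.5):
    return gamma * x + beta                          # scaling operator

x = np.random.randn(4, 8).astype(np.float32)
result = scale(normalize(x))                         # normalization result fed to the scaling operator
print(float(result.mean()), float(result.std()))     # approximately beta and gamma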
6. The method according to claim 1, characterized in that the basic operators include a square operator, a convolution operator, a reciprocal-square-root operator, and a multiplication operator, where the square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal-square-root operator is used to perform a square-root operation on the input data and take the reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
the forming a splicing operator using the basic operators includes:
splicing the square operator, the convolution operator, the reciprocal-square-root operator, and the multiplication operator in sequence to form the splicing operator.
7. The method according to claim 6, characterized in that the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing, in sequence, a square-root operation and a reciprocal operation on the summation operation result using the reciprocal-square-root operator to obtain a reciprocal result;
performing a multiplication operation on the input data and the reciprocal result using the multiplication operator to obtain a normalization result.
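An informal sketch (not part of the claims) of the splicing described in claims 6 and 7: square, summation, reciprocal square root, then multiplication, which together perform an L2-style normalization. Reducing over the last axis and replacing the convolution operator by a direct sum (equivalent to an all-ones kernel) are assumptions made only for this example.

import numpy as np

def spliced_l2_normalize(x, eps=1e-12):
    squared = x ** 2                                     # square operator
    summed = np.sum(squared, axis=-1, keepdims=True)     # convolution operator used as a summation
    rsqrt = 1.0 / np.sqrt(summed + eps)                  # reciprocal-square-root operator
    return x * rsqrt                                     # multiplication operator

x = np.random.randn(3, 4)
print(np.linalg.norm(spliced_l2_normalize(x), axis=-1))  # approximately [1. 1. 1.]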
8. The method according to claim 1, characterized in that the splicing operator is applied at an application program layer in a software call hierarchy, the deep learning operator library is located at an operator library layer in the software call hierarchy, and the artificial intelligence processor is located at a chip layer in the software call hierarchy.
9. An operation device, characterized in that the device includes:
an acquisition module, configured to obtain basic operators in an artificial intelligence operator library, where the basic operators are used to perform corresponding arithmetic operations on input data;
a computing module, connected to the acquisition module and configured to form a splicing operator using the basic operators, where the splicing operator is used to perform a splicing arithmetic operation on the input data in an artificial intelligence processor, so as to normalize the input data.
10. The device according to claim 9, characterized in that the basic operators include a first deformation operator, a second deformation operator, and a normalization exponential operator, where the first deformation operator and the second deformation operator are used to perform type conversion processing on the input data, and the normalization exponential operator is used to perform a normalization operation, wherein
the computing module includes a first operation submodule, and the first operation submodule is configured to:
use the first deformation operator as a preceding-stage operator of the normalization exponential operator;
use the second deformation operator as a following-stage operator of the normalization exponential operator.
11. The device according to claim 10, characterized in that the splicing arithmetic operation includes:
when the dimension of first input data of a first type is greater than 2, and a first parameter and a second parameter carried in the first input data satisfy a preset condition, converting the first input data to a second type using the first deformation operator, where the dimension of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type in the second dimension using the normalization exponential operator, so as to output output data of the second type;
converting the output data of the second type to output data of the first type using the second deformation operator.
12. The device according to claim 9, characterized in that the basic operators include a scaling operator and a normalization operator, where the scaling operator is used to perform a scaling operation on the input data, and the normalization operator is used to perform a normalization operation on the input data, wherein
the computing module includes a second operation submodule, and the second operation submodule is configured to:
use the normalization operator as a preceding-stage operator of the scaling operator.
13. The device according to claim 10, characterized in that the splicing arithmetic operation includes:
performing a normalization operation on the input data using the normalization operator to obtain a normalization result;
scaling the normalization result using the scaling operator to obtain a scaled normalization result.
14. The device according to claim 9, characterized in that the basic operators include a square operator, a convolution operator, a reciprocal-square-root operator, and a multiplication operator, where the square operator is used to perform a square operation on the input data, the convolution operator is used to perform a summation operation on the input data, the reciprocal-square-root operator is used to perform a square-root operation on the input data and take the reciprocal, and the multiplication operator is used to perform a multiplication operation on the input data, wherein
the computing module includes a third operation submodule, and the third operation submodule is configured to:
splice the square operator, the convolution operator, the reciprocal-square-root operator, and the multiplication operator in sequence to form the splicing operator.
15. The device according to claim 14, characterized in that the splicing arithmetic operation includes:
performing a square operation on the input data using the square operator to obtain square operation results;
performing a summation operation on the plurality of square operation results using the convolution operator to obtain a summation operation result;
performing, in sequence, a square-root operation and a reciprocal operation on the summation operation result using the reciprocal-square-root operator to obtain a reciprocal result;
performing a multiplication operation on the input data and the reciprocal result using the multiplication operator to obtain a normalization result.
16. An artificial intelligence processing device, characterized in that the device includes:
a primary processor, configured to execute the method according to claim 1 to obtain a splicing operator, where the splicing operator is used to perform a corresponding arithmetic operation on the input data;
an artificial intelligence processor, electrically connected to the primary processor;
the primary processor is further configured to send the input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the primary processor;
perform an artificial intelligence operation on the input data using the splicing operator to obtain an operation result;
send the operation result to the primary processor.
17. The device according to claim 16, characterized in that the primary processor further includes a primary processor storage space for storing the splicing operator, wherein
the primary processor is further configured to provide the input data and the splicing operator stored in the primary processor storage space.
18. The device according to claim 16, characterized in that the artificial intelligence processor transfers the operation result to the primary processor through an I/O interface;
when the device includes a plurality of artificial intelligence processors, the plurality of artificial intelligence processors may be connected and transmit data through a specific structure;
wherein the plurality of artificial intelligence processors are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale artificial intelligence operations; the plurality of artificial intelligence processors share the same control system or have their own control systems; the plurality of artificial intelligence processors share memory or have their own memories; and the interconnection mode of the plurality of artificial intelligence processors is any interconnection topology.
19. The device according to claim 16, characterized by further including a storage device, where the storage device is connected to the artificial intelligence processor and the primary processor respectively, and is configured to save data of the artificial intelligence processor and the primary processor.
20. An artificial intelligence chip, characterized in that the artificial intelligence chip includes the artificial intelligence processing device according to any one of claims 16 to 19.
21. An electronic device, characterized in that the electronic device includes the chip according to claim 20.
22. A board card, characterized in that the board card includes: a memory device, an interface device, a control device, and the artificial intelligence chip according to claim 20;
wherein the artificial intelligence chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is configured to store data;
the interface device is configured to realize data transmission between the chip and an external device;
the control device is configured to monitor the state of the chip.
23. The board card according to claim 22, characterized in that
the memory device includes multiple groups of storage units, each group of storage units is connected with the chip via a bus, and the storage units are DDR SDRAM;
the chip includes a DDR controller configured to control data transmission and data storage of each storage unit;
the interface device is a standard PCIE interface.
CN201811534505.0A 2018-12-14 2018-12-14 Operation method, device and related product Active CN109740729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811534505.0A CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811534505.0A CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Publications (2)

Publication Number Publication Date
CN109740729A true CN109740729A (en) 2019-05-10
CN109740729B CN109740729B (en) 2020-12-22

Family

ID=66359539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811534505.0A Active CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN109740729B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114625035A (en) * 2020-12-14 2022-06-14 北京晶视智能科技有限公司 Hybrid precision artificial intelligence processor and method of operation thereof
TWI800866B (en) * 2021-06-15 2023-05-01 瑞昱半導體股份有限公司 Method for improving convolutional neural network to perform computations

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140003804A1 (en) * 2012-06-27 2014-01-02 Centurylink Intellectual Property Llc Use of Dying Gasp to Locate Faults in Communications Networks
CN107621932A (en) * 2017-09-25 2018-01-23 威创集团股份有限公司 The local amplification method and device of display image
CN107909041A (en) * 2017-11-21 2018-04-13 清华大学 A kind of video frequency identifying method based on space-time pyramid network
CN107967135A (en) * 2017-10-31 2018-04-27 平安科技(深圳)有限公司 Computing engines implementation method, electronic device and storage medium
CN108446330A (en) * 2018-02-13 2018-08-24 北京数字新思科技有限公司 Promotion object processing method and device and computer-readable storage medium
CN108647828A (en) * 2018-05-15 2018-10-12 中山大学 A kind of Prediction of Stock Index method of combination news corpus and stock market's transaction data
CN108664894A (en) * 2018-04-10 2018-10-16 天津大学 The human action radar image sorting technique of neural network is fought based on depth convolution
CN108764005A (en) * 2018-01-31 2018-11-06 华侨大学 A kind of high-spectrum remote sensing atural object space Spectral Characteristic extracting method and system
CN108872091A (en) * 2018-03-20 2018-11-23 浙江理工大学 A kind of detection method of the vegetable pesticide residue concentration based on high light spectrum image-forming


Also Published As

Publication number Publication date
CN109740729B (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN109657782A (en) Operation method, device and Related product
CN109543832A (en) A kind of computing device and board
CN109685201A (en) Operation method, device and Related product
CN109522052A (en) A kind of computing device and board
CN109189473A (en) Processing with Neural Network device and its method for executing vector exchange instruction
CN109740739A (en) Neural computing device, neural computing method and Related product
CN109726822A (en) Operation method, device and Related product
CN109740754A (en) Neural computing device, neural computing method and Related product
CN109416756A (en) Acoustic convolver and its applied artificial intelligence process device
CN110163358A (en) A kind of computing device and method
CN110147249A (en) A kind of calculation method and device of network model
CN110059797A (en) A kind of computing device and Related product
CN109993301A (en) Neural metwork training device and Related product
CN110909870B (en) Training device and method
CN110163349A (en) A kind of calculation method and device of network model
CN109740729A (en) Operation method, device and Related product
CN109711538A (en) Operation method, device and Related product
CN110059809A (en) A kind of computing device and Related product
CN109740730A (en) Operation method, device and Related product
CN109670581A (en) A kind of computing device and board
CN109711540B (en) Computing device and board card
CN109753319A (en) A kind of device and Related product of release dynamics chained library
WO2021082725A1 (en) Winograd convolution operation method and related product
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
CN212302545U (en) Deep learning processor architecture for intelligent parking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

CB02 Change of applicant information
TA01 Transfer of patent application right

Effective date of registration: 20201126

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambrian Information Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Zhongke Cambrian Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant