WO2021082724A1 - Operation method and related product - Google Patents

Operation method and related product

Info

Publication number
WO2021082724A1
Authority
WO
WIPO (PCT)
Prior art keywords
result
transformation
sub
feature
tensor
Application number
PCT/CN2020/113166
Other languages
French (fr)
Chinese (zh)
Inventor
张英男
曾洪博
张尧
刘少礼
黄迪
周诗怡
张曦珊
刘畅
郭家明
高钰峰
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021082724A1

Classifications

    • G06F 17/15 — Complex mathematical operations: correlation function computation including computation of convolution operations
    • G06F 17/156 — Correlation function computation using a domain transform, e.g. Fourier transform, polynomial transform, number theoretic transform
    • G06F 17/16 — Complex mathematical operations: matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/04 — Computing arrangements based on biological models: neural networks; architecture, e.g. interconnection topology
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/063 — Neural networks: physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Definitions

  • This application relates to the field of deep learning technology, in particular to a neural network-based calculation method and related products.
  • the neural network model is an operation model in deep learning technology, which uses a multi-layer architecture to process input data and output corresponding operation results.
  • training a neural network model is a necessary step before calculations can be performed with it: the network to be trained must repeatedly perform iterative operations on massive amounts of training data to obtain the trained neural network model.
  • this application provides an operation method, including: obtaining the feature data output by the upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data; transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain a multiplication operation result; obtaining an inverse transformation matrix used to inversely transform the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network.
  • this application provides a computing device, including:
  • an acquisition module, used to acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data;
  • a feature transformation module, used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • a bit multiplication module, used to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result;
  • an inverse transformation module, used to obtain an inverse transformation matrix for inversely transforming the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • a transmission module, used to output the operation result to the lower-layer convolutional network.
  • this application provides an artificial intelligence chip, which includes the computing device described in any one of the preceding items.
  • the present application provides an electronic device including the artificial intelligence chip as described above.
  • the present application provides a board card, the board card includes: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the operation method and related products provided by this application include: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the forward transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to that summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix used to inversely transform the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to that summation operation; and outputting the operation result to the lower-layer convolutional network.
  • when performing convolution processing on feature data, the winograd algorithm is used. This algorithm converts multiplications into additions; at the same time, the data transformation operations in the algorithm are converted into summation operations, which further reduces the number of multiplications, thereby reducing the performance loss of the computer system and increasing the computing speed.
  • Fig. 1 is a structural diagram of a processing system shown in an exemplary embodiment
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a master-slave processing architecture shown in an exemplary embodiment of the present invention.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • Fig. 6 is a structural block diagram of a board according to an exemplary embodiment of the present invention.
  • the term "if" can be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context.
  • similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]", depending on the context.
  • a convolution operation refers to opening an active window of the same size as the template (the convolution kernel) starting from the upper left corner of the image.
  • the active window corresponds to a window image, i.e. the region of the image currently covered by the window.
  • the window image and the convolution kernel are multiplied element by element and the products are added, and the calculation result is used as the first pixel value of the new image after the convolution operation.
  • the active window then moves one column to the right, the new window image and the kernel are again multiplied element-wise and summed, and the result is used as the second pixel value of the new image, and so on (see the sketch below).
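  • as an illustration of the sliding-window description above, the following sketch (Python with NumPy; the function name and data sizes are illustrative assumptions, not part of this application) computes one pixel of the new image per window position:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Sliding-window convolution as described above: at each position of
    the active window, multiply the window image by the kernel element-wise
    and sum the products, giving one pixel of the new image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):        # the window moves down one row at a time
        for j in range(out_w):    # ... and right one column at a time
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out
```

  • with a 4×4 image and a 3×3 kernel this yields a 2×2 new image, the configuration used in the Winograd sketch below.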
  • the Winograd convolution operation is a convolution acceleration method based on a polynomial interpolation algorithm. It applies the Winograd forward transformation to the two inputs of the convolution operation, the first target matrix and the second target matrix, then performs element-wise multiplication of the two transformed matrices, and finally applies the Winograd inverse transformation to the element-wise product, obtaining a convolution result equivalent to that of the original convolution operation.
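  • a minimal sketch of the three Winograd stages for the standard F(2×2, 3×3) instance (a 4×4 input tile, a 3×3 kernel and a 2×2 output). This application does not print its transformation matrices, so the well-known B^T, G and A^T for this tile size are assumed here:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transformation matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """Y = A^T [ (G g G^T) .* (B^T d B) ] A, where .* is element-wise."""
    U = G @ g @ G.T         # forward transformation of the weights
    V = B_T @ d @ B_T.T     # forward transformation of the feature tile
    M = U * V               # element-wise multiplication
    return A_T @ M @ A_T.T  # inverse transformation -> 2x2 result

# For any 4x4 tile d and 3x3 kernel g this matches the sliding-window
# result conv2d_valid(d, g) from the previous sketch.
```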
  • a convolutional neural network model is a type of feedforward neural network model that includes convolution calculations and has a deep structure, and it is one of the representative models of deep learning.
  • in the convolutional layers, fully connected layers and other network layers of a convolutional neural network model, convolution operations must be performed between neurons and convolution kernels to obtain feature data; such models are widely used in image classification and image recognition.
  • the operation method according to the embodiment of the present disclosure can be applied to any one processor of a processing system (for example, an artificial intelligence chip) including multiple processors (multi-core).
  • the processor may be a general-purpose processor, such as a CPU (Central Processing Unit, central processing unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may include, for example, one or a combination of GPU (Graphics Processing Unit), NPU (Neural-network Processing Unit), DSP (Digital Signal Processing unit) and FPGA (Field-Programmable Gate Array) chips.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as convolution tasks, pooling tasks or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 1 shows a schematic diagram of a processing system of an arithmetic method according to an embodiment of the present disclosure.
  • the processing system 100 includes multiple processors 101 and a memory 102.
  • the multiple processors 101 are used to execute instruction sequences.
  • the memory 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processors 101 in the processing system 100 can not only share part of the storage space, for example, share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • the processing system can perform a convolution operation on the feature data input to a convolutional layer according to that layer's weights to obtain the convolution result, and then use the convolution result as the input of the next convolutional layer.
  • the next convolutional layer then uses its own weight data to perform convolution calculations on the input feature data, and so on.
  • the convolution method can extract the features in the original data, for example, extract the image features, so as to output the required results according to these features.
  • in this embodiment, when the convolution operation is performed on the input feature data according to the weight data of a convolutional layer, the winograd algorithm is used and its transformation operations are disassembled into summation operations, so as to reduce the amount of multiplication processing, thereby reducing the performance loss of the computing system and improving computing efficiency.
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 201 Obtain the feature data output by the upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data.
  • the computer system used to execute the solution of the embodiment may be connected to a terminal device.
  • the terminal device can send the original data to the computer system.
  • the computer system can use the method provided in this embodiment to process the original data, extract the features contained in it, and then feed back a recognition result determined from these features to the terminal device, for example information corresponding to the original data.
  • the original data may be picture data
  • the terminal device may upload the original picture to the computer system, the computer system extracts the features included in the picture, determines a recognition result based on the characteristics, and then feeds back the recognition result to the terminal device.
  • a layer of convolutional network can output the operation result obtained by convolution to the next layer of convolutional network, so that the convolutional network can obtain the feature data output by the upper layer of convolutional network.
  • This layer of convolutional network can perform convolution calculation on the feature data according to the weight of this layer, so as to obtain the calculation result.
  • the winograd algorithm may be used to transform the characteristic data.
  • the winograd domain refers to the representation of data after the winograd forward transformation has been applied.
  • the feature transformation matrices B and B^T used to perform the forward transformation on the feature data d can also be obtained.
  • in an ordinary convolution operation, the number of multiplication operations is relatively large.
  • by using the winograd algorithm for convolution processing, the number of multiplications can be reduced, thereby reducing the performance loss caused by the operation. In the winograd algorithm, the convolution result is computed as Y = A^T[(G g G^T) ⊙ (B^T d B)]A, where ⊙ denotes element-wise multiplication.
  • the method provided in this embodiment can obtain the feature transformation matrix for the forward transformation of the feature data.
  • A, A^T, B, B^T, G and G^T are fixed matrices.
  • the size of d can be determined from the size of the required output result Y, the weight data g, and the sliding stride of the convolution process (for an m×m output and an r×r kernel with stride 1, d is an (m+r−1)×(m+r−1) tile), and the corresponding A, A^T, B, B^T, G and G^T can then be determined.
  • Step 202 Transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
  • specifically, the characteristic data d can be transformed by the characteristic transformation matrices B and B^T to obtain the characteristic transformation result, that is, to determine B^T d B.
  • the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • the transformation operation of the feature data can be disassembled into multiple sub-transformation results, and then the sum result of the sub-transformation results can be determined as the feature transformation result.
  • the replacement matrix corresponding to each element in d can be preset; for example, d_00 corresponds to a matrix D_00, d_01 corresponds to D_01, and so on up to d_33, which corresponds to D_33.
  • the replacement matrix can be a matrix whose elements are only 0, 1 and -1.
  • during the transformation, the replacement matrices corresponding to d can be read directly, each element of d can be extracted, and each element can be multiplied by its corresponding replacement matrix; the products are then added to obtain the transformation result.
  • specifically, the replacement matrices can be determined in advance from the size of the characteristic data and the transformation matrices B and B^T, and stored, so that when the characteristic data is transformed the stored replacement matrices can be read directly.
  • replacing the matrix multiplication with multiplications of single elements by replacement matrices reduces the number of multiplications; especially when the replacement matrices are composed of 0, 1 and -1, the amount of calculation is greatly reduced.
  • for example, if the feature data is a 4×4 matrix, it includes 16 elements d_00, d_01 ... d_33 in total, and there are correspondingly 16 replacement matrices D_00, D_01 ... D_33.
  • in specific calculations, the multiplication of an element by a replacement-matrix entry of 1 reduces to directly writing the data.
  • for example, multiplying d_00 by a 1 in its replacement matrix simply writes d_00 into the corresponding result position. Therefore, based on the method provided in this embodiment, the transformation process in the winograd algorithm can be converted into additions, further reducing the computational complexity of the convolution process (see the check below).
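  • since d = d_00·E_00 + d_01·E_01 + … + d_33·E_33, where each E_ij holds a single 1, linearity gives B^T d B = Σ d_ij·(B^T E_ij B); each B^T E_ij B is a replacement matrix D_ij as described above. A small NumPy check of this identity (the 4×4 size and the B^T of the F(2×2, 3×3) instance are illustrative assumptions):

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

# Precompute the replacement matrix D_ij for every element position of d.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0                 # element sub-tensor with a single 1
        D[(i, j)] = B_T @ E @ B_T.T   # replacement matrix D_ij = B^T E_ij B

d = np.random.rand(4, 4)
# The transformation as a weighted sum of precomputed replacement matrices...
V_sum = sum(d[i, j] * D[(i, j)] for i in range(4) for j in range(4))
# ...equals the direct matrix transformation B^T d B.
assert np.allclose(V_sum, B_T @ d @ B_T.T)
```

  • because this B^T contains only 0, 1 and -1, every D_ij also contains only 0, 1 and -1, so the weighted sum needs no multiplications beyond scaling by the elements of d.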
  • Step 203 Obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • specifically, the weight data g of the current layer of the convolutional network and the weight transformation matrices G and G^T used to transform the weight data can be obtained; the weight transformation matrices are then used to transform the weight data to obtain the weight transformation result.
  • the replacement matrix corresponding to each element in g can be stored in advance, and the forward transformation operation of the weights can then be converted into a summation operation through these replacement matrices.
  • the weight transformation matrix corresponding to the weight data can be determined in advance, and the weight transformation result can be determined in advance from the weight data and the corresponding weight transformation matrix.
  • when needed, the predetermined weight transformation result can be directly read.
  • the predetermined weight transformation result can be stored in a storage unit and read directly when needed, thereby further reducing the performance loss caused by the forward transformation of the weight data.
  • the order in which the weight transformation result is obtained and the feature transformation result is determined is not limited.
  • after both results are available, they can be multiplied element-wise; that is, after obtaining B^T d B and G g G^T, the two matrices can be multiplied element by element to determine the result of (G g G^T) ⊙ (B^T d B).
  • the values at corresponding positions of the two transformation results are multiplied, yielding a new matrix as the multiplication operation result.
  • here, B^T d B is the result of the feature data transformation.
  • Step 204 Obtain an inverse transformation matrix for inversely transforming the result of the multiplication operation, and transform the result of the multiplication operation according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • specifically, the inverse transformation matrices A and A^T used for the inverse transformation of the multiplication operation result can also be obtained.
  • the inverse transformation matrix can be determined according to the size of the operation result.
  • the inverse transformation matrices can then be used to inversely transform the multiplication operation result p, that is, to determine A^T p A.
  • the replacement matrix corresponding to each element in the multiplication operation result can be determined in advance, so that the inverse transformation operation can be disassembled into a summation operation according to these replacement matrices, and the operation result can be determined according to the summation operation.
  • the specific disassembly method is similar to that used for the feature transformation operation, and the convolution operation result can thus be obtained with fewer multiplications.
  • Step 205 Output the calculation result to the lower layer convolutional network.
  • the convolutional network of this layer can output the determined operation result to the lower-layer convolutional network to serve as its input feature data, and the lower-layer convolutional network can perform convolution calculations on that input according to its own weight data.
  • when the lower-layer convolutional network performs its convolution, the above calculation method can also be used, that is, the winograd algorithm is adopted and the transformation operations in the algorithm are converted into summation operations.
  • the method provided in this embodiment is used to perform convolution operations, and is executed by a device provided with the method of this embodiment, which is usually implemented by means of hardware and/or software.
  • the operation method provided by this embodiment includes: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the forward transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to that summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix for the inverse transformation of the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to that summation operation; and outputting the operation result to the lower-layer convolutional network.
  • the computer system uses the winograd algorithm when performing convolution processing on the feature data, which converts multiplications into additions, and further converts the transformation processes in the algorithm into summation operations, thereby further reducing the multiplications in data processing, reducing the performance loss of the computer system and increasing the calculation speed.
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 301 Obtain the feature data output by the upper-layer convolutional network and a feature transformation matrix used for the forward transformation of the feature data.
  • step 301 and step 201 are similar, and will not be described again.
  • Step 302 Disassemble the feature data into multiple feature sub-tensors.
  • the forward transformation of the characteristic data can be disassembled into a summation operation, thereby reducing the number of multiplication operations.
  • the feature data can be disassembled into multiple feature sub-tensors.
  • the sum of the multiple feature sub-tensors is the feature data;
  • the number of feature sub-tensors is the same as the number of non-zero elements in the feature data;
  • each feature sub-tensor has a single non-zero element;
  • the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
  • for example, let the characteristic data d be a 4×4 matrix with elements d_00, d_01 ... d_33.
  • assuming that all elements of the feature data are non-zero, it can be disassembled into 16 feature sub-tensors, each of which retains exactly one element of d at its original position and is zero elsewhere (see the sketch below).
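  • a minimal sketch of this disassembly (Python/NumPy; the helper name is an illustrative assumption):

```python
import numpy as np

def split_into_subtensors(d):
    """Disassemble d into one sub-tensor per non-zero element; each
    sub-tensor keeps that element at its original position and is zero
    everywhere else."""
    subs = []
    for (i, j), v in np.ndenumerate(d):
        if v != 0:
            s = np.zeros_like(d)
            s[i, j] = v
            subs.append(s)
    return subs

d = np.random.rand(4, 4)          # all 16 elements non-zero in practice
subs = split_into_subtensors(d)
assert len(subs) == 16            # one sub-tensor per non-zero element
assert np.allclose(sum(subs), d)  # the sub-tensors sum back to d
```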
  • Step 303 Perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain a feature transformation result.
  • the feature transformation matrix can be used to transform each feature sub-tensor, and then the transformation results of the feature sub-tensors can be added to obtain the feature transformation result.
  • transforming the feature sub-tensors and adding the transformation results gives the same result as transforming the feature data directly.
  • for each feature sub-tensor, the above transformation can be performed, and the transformation results of all feature sub-tensors are then added to obtain the feature transformation result.
  • the feature element sub-tensor is a tensor in which the non-zero elements of the feature sub-tensor are set to 1;
  • the result of the feature transformation is obtained by summing the transformation results of the feature subtensors.
  • the non-zero element in a feature sub-tensor can be identified and the corresponding position set to 1 to obtain the feature element sub-tensor.
  • for example, for the feature sub-tensor that retains d_00, the corresponding feature element sub-tensor has a 1 at position (0, 0) and 0 elsewhere.
  • in this way, the corresponding feature element sub-tensor can be determined for every feature sub-tensor.
  • the transformation result of the feature sub-tensor can be determined according to the feature transformation matrix, the feature element sub-tensor and the corresponding non-zero elements.
  • the feature element sub-tensor can be multiplied on the left by the left-multiplication matrix of the feature transformation matrix (B^T) and on the right by the right-multiplication matrix (B), and the result can then be multiplied by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor; the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the feature sub-tensor.
  • B^T and B can be determined according to the size of the feature data, and the feature element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the feature data can be determined in advance from B^T, B and the feature element sub-tensors.
  • specifically, the replacement matrix for element position (i, j) is D_ij = B^T E_ij B, where E_ij is the feature element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the feature data.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the data size, and the feature transformation result can then be determined from that set, as in the sketch below.
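  • because the replacement matrices for this transformation contain only 0, 1 and -1, applying them degenerates into signed additions and direct data writes, which is the reduction in multiplications described above. A sketch under the same F(2×2, 3×3) assumptions as the earlier check:

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

# The replacement matrix set is determined once per data size, then reused.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        D[(i, j)] = B_T @ E @ B_T.T  # entries are only 0, 1 or -1 here

d = np.random.rand(4, 4)
V = np.zeros((4, 4))
for (i, j), Dij in D.items():
    for r, c in zip(*np.nonzero(Dij)):
        V[r, c] += Dij[r, c] * d[i, j]  # +1/-1 weight: a signed addition
assert np.allclose(V, B_T @ d @ B_T.T)
```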
  • Step 304 Obtain the weight data of the current layer of the convolutional network and the weight transformation matrices used for the forward transformation of the weight data.
  • Step 305 Transform the weight data according to the weight transformation matrix to obtain a weight transformation result; wherein, the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the weight data can be transformed according to the weight transformation matrix to obtain the weight transformation result.
  • in order to reduce multiplication operations, the transformation operation of the weight data can be disassembled into a summation operation, and the weight transformation result can be determined according to the summation operation.
  • the weight data can be transformed as G g G^T, where G and G^T are the weight transformation matrices and g is the weight data.
  • the weight data can be disassembled into multiple weight sub-tensors; then the multiple weight sub-tensors are transformed and summed according to the weight transformation matrix, Get the result of weight transformation.
  • the sum of multiple weight sub-tensors is weight data
  • the number of multiple weight sub-tensors is the same as the number of non-zero elements in the weight data
  • each weight sub-tensor has a single non-zero value.
  • the non-zero element in the weight sub-tensor is the same as the non-zero element in the corresponding position in the weight data.
  • in this example the weight data g is likewise a 4×4 matrix with elements g_00, g_01 ... g_33.
  • assuming all of its elements are non-zero, the weight data can also be split into 16 weight sub-tensors, each retaining a single element of g.
  • each weight sub-tensor can be transformed by left- and right-multiplying the corresponding weight element sub-tensor with the weight transformation matrices, in the same way as the feature sub-tensors.
  • for each weight sub-tensor, this transformation can be performed, and the transformation results of all weight sub-tensors are then added to obtain the weight transformation result.
  • for each weight sub-tensor, the corresponding weight element sub-tensor, i.e. the tensor in which its non-zero element is set to 1, can be determined.
  • the transformation result of the weight sub-tensor can be determined according to the weight transformation matrix, the weight element sub-tensor and the corresponding non-zero elements.
  • the weight element sub-tensor can be multiplied on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix, and the result can then be multiplied by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor.
  • G and G^T can be determined according to the size of the weight data, and the weight element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the weight data can be determined in advance from G, G^T and the weight element sub-tensors.
  • specifically, the replacement matrix for weight element position (i, j) is D'_ij = G E'_ij G^T, where E'_ij is the weight element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the weight data.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the data size, and the weight transformation result can then be determined from that set:
  • G g G^T = g_00 × D'_00 + g_01 × D'_01 + … + g_33 × D'_33
  • Step 306 Perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • the implementation of step 306 is similar in principle and method to the element-wise multiplication of the feature transformation result and the weight transformation result in step 203, and will not be repeated here.
  • Step 307 Decompose the result of the multiplication operation into multiple resultant sub-tensors.
  • Step 308 Perform a transformation operation on the multiple resultant sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
  • the multiplication operation result data can be transformed according to the inverse transformation matrix to obtain the operation result.
  • the transformation operation of the multiplication operation result can be disassembled into a summation operation, and the operation result can be determined according to the summation operation.
  • A^T and A are the inverse transformation matrices;
  • p is the result of the multiplication operation.
  • the sum of multiple resultant subtensors is the result of the multiplication operation
  • the number of multiple resultant subtensors is the same as the number of non-zero elements in the result of the multiplication operation
  • each resultant subtensor has a single non-zero element
  • the non-zero elements in the resulting sub-tensor are the same as the non-zero elements in the corresponding position in the result of the multiplication operation.
  • for example, the multiplication operation result p is likewise a 4×4 matrix.
  • assuming all of its elements are non-zero, the multiplication operation result can also be split into 16 result sub-tensors, each retaining a single element of p.
  • the inverse transformation matrix can be used to transform each resultant sub-tensor, and then the transformation results of the resultant sub-tensors can be added to obtain the result of the operation.
  • the above transformation can be performed, and then the transformation results of each result sub-tensor are added to obtain the operation result.
  • the non-zero element in a result sub-tensor can be identified and the corresponding position set to 1 to obtain the result element sub-tensor; in this way, the corresponding result element sub-tensor can be determined for every result sub-tensor.
  • the result of the operation can be determined according to the inverse transformation matrix, the resultant sub-tensor and its corresponding non-zero elements.
  • the result element sub-tensor can be multiplied on the left by the left-multiplication matrix of the inverse transformation matrix (A^T) and on the right by the right-multiplication matrix (A), and the result can then be multiplied by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor; the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the result sub-tensor.
  • A^T and A can be determined according to the size of the operation result, and the result element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the multiplication operation result can be determined in advance from A^T, A and the result element sub-tensors.
  • specifically, the replacement matrix for element position (i, j) of p is D''_ij = A^T E_ij A, where E_ij is the result element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the multiplication operation result.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the size of the multiplication operation result or of the final operation result, and the operation result can then be determined from that set.
  • Step 309 Output the calculation result to the lower layer convolutional network.
  • step 309 and step 205 are similar, and will not be repeated here.
  • Fig. 4 is a schematic diagram showing a master-slave processing architecture according to an exemplary embodiment of the present invention.
  • the solution of this embodiment also provides a master-slave processing architecture, which can be used to implement the calculation method provided in this embodiment.
  • the master-slave processing structure includes a master functional unit 41 and at least one slave functional unit 42.
  • the main functional unit 41 transforms the characteristic data according to the characteristic transformation matrix to obtain the characteristic transformation result; wherein, the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • a main storage unit (not shown in the figure) can also be provided, and the main storage unit can be connected to the main function unit 41.
  • a main control unit (not shown in the figure) can respectively send instructions to the main storage unit and the main function unit 41, so that the main storage unit can send characteristic data to the main function unit.
  • the slave functional unit 42 performs element-wise multiplication of the feature data transformation result and the weight transformation result to obtain the multiplication operation result.
  • specifically, the slave functional unit 42 may perform element-wise multiplication of the received feature data transformation result and the weight transformation result, so as to obtain the multiplication operation result.
  • the data processing process is similar to the foregoing embodiment, and will not be repeated.
  • one main functional unit 41 can be connected to multiple slave functional units 42, and allocation rules can be preset for allocating characteristic data transformation results to the slave functional units 42.
  • the operation of the main functional unit 41 and the slave functional units 42 proceeds in parallel: before the main functional unit 41 has computed the transformation result value for every element position of the feature data, a slave functional unit 42 can already perform, for each element position whose feature transformation value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the product for every element position has been computed, yielding the multiplication operation result.
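  • a toy sketch of this producer-consumer parallelism using Python threads (the queue, worker count and function names are illustrative scheduling assumptions, not this application's hardware design):

```python
import queue
import threading
import numpy as np

U = np.random.rand(4, 4)   # weight transformation result (precomputed)
V = np.random.rand(4, 4)   # feature transformation result
n_slaves = 2
task_q = queue.Queue()
products = {}              # element-wise products, keyed by position

def master():
    # Master functional unit: emit each transformed feature value as soon
    # as it is available, without waiting for the whole tile to finish.
    for pos, value in np.ndenumerate(V):
        task_q.put((pos, value))
    for _ in range(n_slaves):
        task_q.put(None)   # sentinel: no more element positions

def slave():
    # Slave functional unit: multiply each received feature value by the
    # weight transformation value at the same element position.
    while True:
        item = task_q.get()
        if item is None:
            break
        pos, value = item
        products[pos] = value * U[pos]

slaves = [threading.Thread(target=slave) for _ in range(n_slaves)]
for t in slaves:
    t.start()
master()
for t in slaves:
    t.join()

result = np.array([[products[(i, j)] for j in range(4)] for i in range(4)])
assert np.allclose(result, U * V)  # equals the element-wise multiplication
```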
  • although the steps in the flowcharts are displayed in sequence according to the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential: they may be executed in turn or alternately with at least part of other steps or of the sub-steps or stages of other steps.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • the computing device provided in this embodiment includes:
  • the obtaining module 51 is configured to obtain the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data;
  • the feature transformation module 52 is configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • the bit multiplication module 53 is configured to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result;
  • the inverse transformation module 54 is configured to obtain an inverse transformation matrix used to inversely transform the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module 55 is configured to output the operation result to the lower-layer convolutional network.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the sum of a plurality of the feature sub-tensors is the feature data; the sum of a plurality of the result sub-tensors is the result of the multiplication operation;
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the result of the multiplication operation, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the result of the multiplication operation.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module 52 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the inverse transform module 54 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the bit multiplication module 53 is specifically used for:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation;
  • the bit multiplication module 53 is specifically used for:
  • the bit multiplication module 53 is specifically used for:
  • the bit multiplication module 53 is specifically used for:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be implemented in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM) or a hybrid memory cube (HMC).
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, It includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
  • an artificial intelligence chip is also disclosed, which includes the aforementioned computing device.
  • a board card is provided, which includes a storage device, an interface device, a control device and the aforementioned artificial intelligence chip; wherein the artificial intelligence chip is connected to the storage device, the control device and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 6 is a structural block diagram of a board card shown in an exemplary embodiment of the present invention.
  • the board card may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to, a storage device 390, an interface device 391 and a control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393, each group being connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate synchronous dynamic random access memory).
  • the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips (dies).
  • the artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
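  • the quoted figure follows directly from the bus parameters, assuming DDR4-3200 (3200 mega-transfers per second) and the 64 data bits (8 bytes) moved per transfer: 3200 MT/s × 8 B per transfer = 25600 MB/s per controller.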
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip, which is used to control the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be another interface, and the present disclosure does not limit the specific form of the other interface, as long as the interface unit can realize the data transfer function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a microcontroller unit (MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores or multiple processing circuits, and can drive multiple loads; therefore, the artificial intelligence chip can be in different working states such as heavy-load and light-load.
  • the control device can regulate and control the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • the electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, servers, cloud servers, vehicles, household appliances and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance scanners, B-mode ultrasound scanners and/or electrocardiographs.
  • An operation method comprising:
  • the disassembling the transformation operation of the multiplication operation result into a summation operation and determining the operation result according to the summation operation includes:
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the result of the multiplication operation, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the result of the multiplication operation.
  • the performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result includes:
  • the performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result includes:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor and the corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the obtaining the weight transformation result of the current layer of the convolutional network after the positive transformation includes:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor and its corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the master-slave processing architecture includes: a master functional unit and at least one slave functional unit.
  • the slave functional unit obtains an inverse transformation matrix for inversely transforming the multiplication operation result, and transforms the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • the operation of the main functional unit and the slave functional units proceeds in parallel: before the main functional unit has computed the transformation result value for every element position of the feature data, a slave functional unit can already perform, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the product for every element position has been computed, obtaining the multiplication operation result.
  • An arithmetic device including:
  • An acquisition module used to acquire the feature data output by the upper layer convolutional network and the feature transformation matrix used to perform positive transformation on the feature data;
• the feature transformation module is used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
• the bitwise multiplication module is used to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result;
• the inverse transformation module is used to obtain an inverse transformation matrix for inversely transforming the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module is used to output the calculation result to the lower layer convolutional network.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
• the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor has a single non-zero element, and the non-zero element in the feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
• the number of the multiple result sub-tensors is the same as the number of non-zero elements in the multiplication operation result; each result sub-tensor has a single non-zero element, and the non-zero element in the result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module is specifically configured to:
• the left multiplication matrix and the right multiplication matrix for the feature element sub-tensor are both determined by the scale of the feature sub-tensor;
  • the inverse transform module is specifically used for:
• the left multiplication matrix and the right multiplication matrix for the result element sub-tensor are both determined by the scale of the result sub-tensor.
• the bitwise multiplication module is specifically configured to:
• the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation;
• the bitwise multiplication module is specifically configured to:
• the bitwise multiplication module is specifically configured to:
• the bitwise multiplication module is specifically configured to:
• the left multiplication matrix and the right multiplication matrix for the weight element sub-tensor are both determined by the scale of the weight sub-tensor.
  • An artificial intelligence chip comprising the computing device according to any one of clauses A13-A21.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A22;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
• the storage device is used to store data;
• the interface device is used to implement data transmission between the artificial intelligence chip and external equipment;
  • the control device is used to monitor the state of the artificial intelligence chip.
• the storage device includes multiple groups of storage units; each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
  • the chip includes: a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • the efficiency of neural network processing can be effectively improved.
  • the above solutions can be implemented based on various architectures, for example, through a master-slave architecture or a general architecture.
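The following is a minimal sketch of the master-slave pipelining described in the items above: the master streams out finished feature-transform positions while the slave immediately multiplies each one with the weight-transform value at the same position. All names, the queue-based handoff, and the use of Python threads are illustrative assumptions; the application describes hardware functional units, not a software implementation.

```python
import queue
import threading
import numpy as np

def master(V, q):
    """Stream out the feature-transform value of each element position as
    soon as it is finished (V is precomputed here for brevity; in the
    described hardware the positions would become ready one by one)."""
    for i in range(V.shape[0]):
        for j in range(V.shape[1]):
            q.put(((i, j), V[i, j]))
    q.put(None)  # all element positions have been produced

def slave(U, q, M):
    """For every element position the master has already produced,
    multiply the feature-transform value with the weight-transform
    value at the same position (the bitwise multiplication)."""
    while (item := q.get()) is not None:
        (i, j), v = item
        M[i, j] = v * U[i, j]

U = np.random.rand(4, 4)   # weight transformation result (illustrative)
V = np.random.rand(4, 4)   # feature transformation result (illustrative)
q, M = queue.Queue(), np.zeros((4, 4))
worker = threading.Thread(target=slave, args=(U, q, M))
worker.start()
master(V, q)
worker.join()
assert np.allclose(M, U * V)   # equals the elementwise product
```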


Abstract

An operation method and a related product, the method comprising: acquiring feature data outputted by an upper layer convolutional network and a feature transformation matrix used for performing forward transformation on the feature data (201); on the basis of the feature transformation matrix, transforming the feature data to obtain a feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined on the basis of the summation operation (202); acquiring the weight transformation result of the present layer convolutional network after forward transformation and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result (203); acquiring an inverse transformation matrix used for performing inverse transformation on the multiplication operation result and, on the basis of the inverse transformation matrix, performing transformation on the multiplication operation result to obtain an operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined on the basis of the summation operation (204); and outputting the operation result to a lower layer convolutional network (205).

Description

Operation method and related product
This application claims priority to the Chinese patent application No. 2019110611184, entitled "Operation Method and Related Products", filed with the Chinese Patent Office on November 1, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of deep learning technology, and in particular to a neural network-based operation method and related products.
Background
In recent years, deep learning technology has developed rapidly and has been widely applied in fields such as image recognition, speech recognition, natural language analysis, intelligent robotics, and big data analysis, becoming a focus of research.
A neural network model is an operation model in deep learning technology that uses a multi-layer architecture to process input data and output corresponding operation results. In the prior art, training a neural network model is a necessary step before using the model for operations; during training, the neural network to be trained must repeatedly perform iterative operations on massive training data to obtain a trained neural network model.
However, the traditional way of repeatedly iterating over massive training data occupies a large amount of computing resources, and because the data is processed inefficiently, training takes a long time and consumes considerable computing power.
Summary of the invention
Based on this, in view of the above technical problems, it is necessary to provide an operation method and related products that can improve the training efficiency of neural network models and reduce the computing resources consumed by training operations.
In a first aspect, this application provides an operation method, including:
acquiring feature data output by an upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
acquiring the forward-transformed weight transformation result of the current layer of the convolutional network, and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result;
acquiring an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
outputting the operation result to a lower-layer convolutional network.
In a second aspect, this application provides an operation device, including:
an acquisition module, used to acquire feature data output by an upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data;
a feature transformation module, used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
a bitwise multiplication module, used to acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result;
an inverse transformation module, used to acquire an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
a transmission module, used to output the operation result to a lower-layer convolutional network.
In a third aspect, this application provides an artificial intelligence chip, the chip including the operation device according to any one of the preceding items.
In a fourth aspect, this application provides an electronic device including the artificial intelligence chip as described above.
In a fifth aspect, this application provides a board card, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip as described above;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used to store data;
the interface device is used to implement data transmission between the artificial intelligence chip and external equipment;
the control device is used to monitor the state of the artificial intelligence chip.
The operation method and related products provided by this application acquire the feature data output by an upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data; transform the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result; acquire an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and output the operation result to a lower-layer convolutional network. When convolution processing is performed on the feature data, the winograd algorithm is used. This algorithm can convert multiplications into additions; in addition, converting the data transformation operations in the algorithm into summation operations can further reduce the number of multiplications, thereby reducing the performance loss of the computer system and increasing the operation speed.
Description of the drawings
The drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and are used to explain the principles of the present disclosure.
Fig. 1 is a structural diagram of a processing system shown in an exemplary embodiment;
Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention;
Fig. 4 is a schematic diagram of a master-slave processing architecture shown in an exemplary embodiment of the present invention;
Fig. 5 is a schematic diagram of an operation device shown in an exemplary embodiment of the present invention;
Fig. 6 is a structural block diagram of a board card according to an exemplary embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, rather than all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, rather than to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the existence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms. It should be further understood that the term "and/or" used in the specification and claims of this disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In order to clearly understand the technical solutions of the present application, the technical terms involved in the prior art and in the embodiments of the present application are explained below:
Convolution operation: a convolution operation starts from the upper left corner of the image by opening an active window of the same size as the template (the convolution kernel). The pixels covered by the active window are multiplied element by element with the corresponding elements of the kernel and then summed, and the result is used as the first pixel value of the new image after the convolution operation. The active window then moves one column to the right, the covered pixels are again multiplied element by element with the kernel and summed, and the result is used as the second pixel value of the new image. Proceeding in this way from left to right and from top to bottom yields the complete new image.
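The sliding-window procedure just described can be written as the following minimal sketch (unit stride, no padding; the function name and the use of numpy are illustrative choices, not part of this application):

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Slide the kernel over the image; each output pixel is the sum of
    the elementwise products inside the current active window."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out
```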
Winograd convolution operation: the Winograd convolution operation is a convolution acceleration implementation based on a polynomial interpolation algorithm. It performs a Winograd forward transformation on the two inputs of the convolution operation, the first target matrix and the second target matrix, respectively; then performs a bitwise multiplication on the forward-transformed first target matrix and second target matrix; and finally performs a Winograd inverse transformation on the bitwise multiplication result to obtain a convolution result equivalent to that of the original convolution operation.
Convolutional neural network model: a convolutional neural network model is a type of feedforward neural network model that includes convolution calculations and has a deep structure, and it is one of the representative models of deep learning. In the convolutional layers, fully connected layers, and other network layers of a convolutional neural network model, convolution operations must be performed on neurons and convolution kernels to obtain feature data; such models are widely used in image classification, image recognition, and the like.
The operation method according to the embodiments of the present disclosure can be applied to any processor of a processing system (for example, an artificial intelligence chip) that includes multiple processors (multiple cores). The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on, where machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may, for example, include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of processor. In addition, the types of the multiple processors in the processing system may be the same or different, which is not limited by the present disclosure.
In a possible implementation, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution operation tasks, pooling tasks, or fully connected tasks. The present disclosure does not limit the processing units or the tasks they run.
Fig. 1 shows a schematic diagram of a processing system for an operation method according to an embodiment of the present disclosure. As shown in Fig. 1, the processing system 100 includes multiple processors 101 and a memory 102. The multiple processors 101 are used to execute instruction sequences, and the memory 102 is used to store data and may include random access memory (RAM) and a register file. The multiple processors 101 in the processing system 100 can share part of the storage space, for example part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
When the processing system implements artificial intelligence functions based on a neural network, it can perform a convolution operation on the feature data input to a convolutional layer according to the weights of that layer to obtain a convolution result, and then use the convolution result as the input data of the next convolutional layer, which continues to perform convolution calculations on the input feature data using its own weight data. The convolution method can extract features from the original data, for example image features, so that the required results can be output according to these features.
In order to solve the aforementioned technical problems, in the operation method provided by this embodiment, when the convolution operation is performed on the input feature data according to the weight data of the convolutional layer, the winograd algorithm is used and the transformation operations are disassembled into summation operations, so as to reduce the amount of multiplication processing, thereby reducing the performance loss of the operation system and improving the operation efficiency.
Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
As shown in Fig. 2, the operation method provided by this embodiment includes:
Step 201: acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data.
The computer system used to execute the solution of this embodiment may be connected to a terminal device. The terminal device can send original data to the computer system; the computer system can use the method provided by this embodiment to process the original data and extract the features in it, and can then feed back a recognition result to the terminal device based on these features, for example, information corresponding to the original data.
Specifically, the original data may be, for example, picture data: the terminal device uploads an original picture to the computer system, the computer system extracts the features included in the picture, determines a recognition result based on those features, and then feeds the recognition result back to the terminal device.
A convolutional layer can output the operation result obtained by convolution to the next convolutional layer, so that each layer receives the feature data output by the layer above it.
This layer of the convolutional network can perform a convolution calculation on the feature data according to the weights of this layer to obtain the operation result. In the method provided by this embodiment, the winograd algorithm may be used to transform the feature data.
When the convolution operation is performed with the winograd algorithm, the following formula can be used:
Y = A^T[(GgG^T) ⊙ (B^T dB)]A
where Y represents the convolution matrix, that is, the result matrix obtained by performing the convolution operation on the feature data and the weight data; d represents the input feature data; g represents the weight data in the neural network; B represents the feature transformation matrix that transforms the feature data from the original domain to the winograd domain; B^T represents the feature inverse transformation matrix that transforms the feature data from the winograd domain back to the original domain; G represents the weight transformation matrix that transforms the weight data from the original domain to the winograd domain; G^T represents the weight inverse transformation matrix that transforms the weight data from the winograd domain back to the original domain; A represents the transformation matrix of the inverse transformation operation that converts the result of the bitwise multiplication from the original domain to the winograd domain; and A^T represents the inverse transformation matrix of the inverse transformation operation that converts the result of the bitwise multiplication from the winograd domain back to the original domain.
It should be noted that the original domain refers to a domain that has not undergone the winograd transformation, while the winograd domain refers to a domain that has undergone the winograd transformation.
Specifically, the feature transformation matrices B and B^T used to perform the forward transformation on the feature data d can also be acquired.
Further, a traditional convolution operation involves a large number of multiplications. By using the winograd algorithm for convolution processing, the number of multiplications can be reduced, thereby reducing the performance loss caused by the operation.
In practical applications, the winograd algorithm requires a forward transformation of the feature data; therefore, the method provided by this embodiment can acquire the feature transformation matrix used to perform the forward transformation on the feature data.
In the winograd algorithm, if the sizes of d and g are fixed, the matrices A, A^T, B, B^T, G, and G^T are also fixed. Specifically, the size of d can be determined according to the size of the required output result Y, the weight data g, and the sliding stride of the convolution process, and the corresponding A, A^T, B, B^T, G, and G^T can then be determined according to the sizes of these data.
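As a concrete instance of the formula above, the sketch below uses the standard F(2×2, 3×3) matrices from the winograd minimal-filtering literature (a 4×4 input tile d, a 3×3 kernel g, and a 2×2 output Y). These particular matrix values are an assumption taken from that literature, not values fixed by this application; conv2d_naive is the sliding-window sketch shown earlier.

```python
import numpy as np

# Standard F(2x2, 3x3) winograd matrices (from the minimal-filtering
# literature; this application does not fix particular values).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

d = np.random.rand(4, 4)        # feature data tile
g = np.random.rand(3, 3)        # weight data (3x3 kernel)

U = G @ g @ G.T                 # weight transform   G g G^T
V = B_T @ d @ B_T.T             # feature transform  B^T d B
Y = A_T @ (U * V) @ A_T.T       # inverse transform of the bitwise product

# Matches the direct convolution of the 4x4 tile (2x2 valid output).
assert np.allclose(Y, conv2d_naive(d, g))
```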
Step 202: transform the feature data according to the feature transformation matrix to obtain a feature transformation result; the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
The feature transformation matrices B and B^T can be used to transform the feature data d to obtain the feature transformation result, that is, to determine the result of B^T dB.
Further, in order to further reduce the number of multiplications and the resulting performance loss, in the method provided by this embodiment the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
In practical applications, the transformation operation of the feature data can be disassembled into multiple sub-transformation results, and the sum of the sub-transformation results is determined as the feature transformation result.
Taking B^T dB as an example, assuming that the feature data is a 4×4 matrix, a replacement matrix can be preset for each element of d; for example, d_00 corresponds to a matrix D_00, d_01 corresponds to a matrix D_01, ..., and d_33 corresponds to a matrix D_33. A replacement matrix may be a matrix whose entries are 0, 1, or -1.
When transforming d, the replacement matrices corresponding to d can be read directly; each element of d is extracted and multiplied by its replacement matrix, and the products are added to obtain the transformation result. Specifically, the replacement matrices can be determined according to the size of the feature data, the feature transformation matrix B, and the feature inverse transformation matrix B^T; when the feature data is transformed, the pre-stored replacement matrices can be read directly.
The operation of multiplying a single element by a replacement matrix reduces the number of multiplications, especially when the replacement matrix consists of 0, 1, and -1 entries, in which case the amount of calculation can be greatly reduced. For example, if the feature data is a 4×4 matrix, it contains the 16 elements d_00, d_01, ..., d_33, and there are 16 corresponding replacement matrices D_00, D_01, ..., D_33. The specific calculation is:
B^T dB = d_00 × D_00 + d_01 × D_01 + ... + d_33 × D_33
When a replacement matrix consists of 0, 1, and -1 entries, the multiplication of an element with the replacement matrix becomes a process of directly writing data; for example, multiplying a 1 in the replacement matrix by d_00 actually amounts to directly writing d_00. Therefore, based on the method provided by this embodiment, the transformation process in the winograd algorithm can be converted into an addition algorithm, which further reduces the amount of computation of the convolution process.
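Continuing the F(2×2, 3×3) sketch above, the replacement matrices can be precomputed as D_ij = B^T E_ij B, where E_ij holds a single 1 at position (i, j); with the B^T given there, every entry of every D_ij is 0, 1, or -1, so the weighted sum needs only signed additions. The symbol E_ij and the dictionary layout are our illustrative choices, not notation fixed by this application.

```python
# One replacement matrix per element position: D_ij = B^T E_ij B.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        D[(i, j)] = B_T @ E @ B_T.T

# For the F(2x2, 3x3) matrices every D_ij entry is 0, 1 or -1, so the
# transform reduces to signed additions of the d_ij values.
assert all(np.isin(m, (-1.0, 0.0, 1.0)).all() for m in D.values())

# The transform as a summation: B^T d B == sum_ij d_ij * D_ij.
V_sum = sum(d[i, j] * D[(i, j)] for i in range(4) for j in range(4))
assert np.allclose(V_sum, B_T @ d @ B_T.T)
```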
Step 203: acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result.
In one implementation, similarly to steps 201 and 202, the weight data g of the current layer of the convolutional network and the weight transformation matrices G^T and G used to transform the weight data can be acquired; the weight data is then transformed with the weight transformation matrices to obtain the weight transformation result,
that is, the result of G^T gG is determined. In the determination process, the transformation can likewise be disassembled into a summation operation in the manner described above, thereby reducing the performance loss of the operation process. For example, a replacement matrix corresponding to each element of g can be stored in advance, and the forward transformation operation of the weights can then be converted into a summation operation through these replacement matrices.
In another implementation, since the weight data in each convolutional layer is fixed when the neural network is used for data processing, the weight transformation matrices corresponding to the weight data can be determined in advance, and the weight transformation result can be predetermined according to the weight data and the corresponding weight transformation matrices. When a convolution calculation needs to be performed on the feature data, the predetermined weight transformation result can be read directly; for example, it can be stored in a storage unit and read whenever needed. This further reduces the performance loss caused by the forward transformation of the weight data.
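A minimal sketch of this second implementation, reusing G, B_T, and A_T from the F(2×2, 3×3) example above; the class and its method names are illustrative assumptions, not the fixed implementation of this application.

```python
class WinogradLayer:
    """Caches the forward-transformed weights of one convolutional layer,
    so that only the feature transform, the bitwise multiplication, and
    the inverse transform remain to be done per input tile."""

    def __init__(self, g):
        self.U = G @ g @ G.T          # weight transform, computed once

    def tile_forward(self, d):
        V = B_T @ d @ B_T.T           # feature transform, per tile
        return A_T @ (self.U * V) @ A_T.T

layer = WinogradLayer(np.random.rand(3, 3))
y = layer.tile_forward(np.random.rand(4, 4))   # 2x2 output tile
```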
Optionally, the order in which the weight transformation result is acquired and the feature transformation result is determined is not limited.
After the feature transformation result has been determined and the weight transformation result has been acquired, a bitwise multiplication operation can be performed on the two results. That is, after B^T dB and G^T gG have been obtained, the two matrices can be multiplied bitwise, i.e., the result of (GgG^T) ⊙ (B^T dB) is determined.
In practical applications, the values at the corresponding positions of the two transformation results are multiplied to obtain a new matrix as the multiplication operation result. For example, given a 4×4 feature transformation result and a 4×4 weight transformation result (the worked numeric matrices are shown as images in the original publication and are not reproduced here), the multiplication operation result is the matrix formed by their element-by-element products.
Step 204: acquire the inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
Specifically, the inverse transformation matrices A and A^T used to perform the inverse transformation on the multiplication operation result can also be acquired. As described above, the inverse transformation matrices can be determined according to the size of the operation result.
Further, the inverse transformation matrices can be used to inversely transform the multiplication operation result, that is, to determine
Y = A^T[(GgG^T) ⊙ (B^T dB)]A.
In practical applications, a replacement matrix corresponding to each element included in the multiplication operation result can be determined in advance, so that the inverse transformation operation can be disassembled into a summation operation according to these replacement matrices, and the operation result can be determined according to the summation operation.
The specific disassembly method is similar to that of the feature transformation operation, so the convolution operation result can be obtained with fewer multiplications.
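The same disassembly can be sketched for the inverse transform: one 2×2 replacement matrix per position of the 4×4 product, built from the A^T of the F(2×2, 3×3) example above. Again this is an illustration under those assumed matrix values, with R_ij as our shorthand.

```python
# Replacement matrices for the inverse transform: R_ij = A^T E_ij A.
R = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        R[(i, j)] = A_T @ E @ A_T.T

M = U * V   # the multiplication operation result from the earlier sketch
Y_sum = sum(M[i, j] * R[(i, j)] for i in range(4) for j in range(4))
assert np.allclose(Y_sum, A_T @ M @ A_T.T)   # equals A^T M A
```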
Step 205: output the operation result to the lower-layer convolutional network.
Specifically, in the method provided by this embodiment, the current layer of the convolutional network can output the determined operation result to the lower-layer convolutional network as its input feature data, and the lower-layer convolutional network performs a convolution calculation on the input data according to the weight data of that layer.
Further, when the lower-layer convolutional network performs its convolution operation, it can also adopt the calculation method described above, that is, use the winograd algorithm and convert the transformation operations in the algorithm into summation operations.
The method provided by this embodiment is used to perform convolution operations; it is executed by a device provided with the method of this embodiment, and the device is usually implemented in hardware and/or software.
The operation method provided by this embodiment includes: acquiring the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data; transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; acquiring the forward-transformed weight transformation result of the current layer of the convolutional network, and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result; acquiring an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain an operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network. In the method provided by this embodiment, when the computer system performs convolution processing on the feature data, the winograd algorithm is used; this algorithm converts multiplications into additions, and the transformation processes within the algorithm are converted into summation operations, further reducing the multiplication operations in data processing, which reduces the performance loss of the computer system and increases the operation speed.
Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
As shown in Fig. 3, the operation method provided by this embodiment includes:
Step 301: acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data.
The implementation principle and manner of step 301 are similar to those of step 201 and are not repeated here.
Step 302: disassemble the feature data into multiple feature sub-tensors.
In the method provided by this embodiment, the forward transformation of the feature data can be disassembled into a summation operation, thereby reducing the number of multiplication operations. In the disassembly process, the feature data is disassembled into multiple feature sub-tensors.
Specifically, the sum of the multiple feature sub-tensors equals the feature data; the number of feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor has a single non-zero element; and the non-zero element in a feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
Further, suppose for example that the feature data d is the 4×4 matrix
d = [[d_00, d_01, d_02, d_03],
     [d_10, d_11, d_12, d_13],
     [d_20, d_21, d_22, d_23],
     [d_30, d_31, d_32, d_33]].
According to the above rule, the feature data can be split into 16 feature sub-tensors (assuming all elements of the feature data are non-zero), each of which keeps exactly one element d_ij at position (i, j) and is zero everywhere else; for example, the first and last feature sub-tensors are
[[d_00, 0, 0, 0],          [[0, 0, 0, 0],
 [0,    0, 0, 0],           [0, 0, 0, 0],
 [0,    0, 0, 0],   ...     [0, 0, 0, 0],
 [0,    0, 0, 0]]           [0, 0, 0, d_33]].
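A short sketch of this splitting rule follows (the helper name is illustrative): each sub-tensor keeps one non-zero element in place, and the sub-tensors sum back to the original data.

```python
import numpy as np

def split_into_subtensors(t):
    """One sub-tensor per non-zero element of t: each sub-tensor keeps
    that single element at its original position and is zero elsewhere."""
    subs = []
    for idx in zip(*np.nonzero(t)):
        s = np.zeros_like(t)
        s[idx] = t[idx]
        subs.append(s)
    return subs

d = np.random.rand(4, 4)              # 4x4 feature data tile
subs = split_into_subtensors(d)       # 16 sub-tensors if all d_ij != 0
assert len(subs) == 16
assert np.allclose(sum(subs), d)      # their sum reproduces d
```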
Step 303: perform the transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum the results to obtain the feature transformation result.
In practical applications, after the feature data has been split into feature sub-tensors, the feature transformation matrices can be used to transform each feature sub-tensor, and the transformation results of the feature sub-tensors are then added to obtain the feature transformation result.
Since the sum of the feature sub-tensors is equal to the feature data, transforming the sub-tensors and adding the transformation results gives the same result as transforming the feature data itself.
For example, the feature sub-tensor containing d_00 is transformed by computing
B^T [[d_00, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B.
The same transformation can be performed for every feature sub-tensor, and the transformation results of all feature sub-tensors are then added to obtain the feature transformation result.
In order to further reduce the multiplication operations in the calculation process, when the feature sub-tensors are transformed and summed to obtain the feature transformation result, it is also possible to:
determine the corresponding feature element sub-tensor according to each feature sub-tensor, where the feature element sub-tensor is a tensor in which the non-zero element of the feature sub-tensor is set to 1;
determine the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result.
The non-zero element in a feature sub-tensor is identified, and the position corresponding to that non-zero element is set to 1 to obtain the feature element sub-tensor. For example, for the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]],
the corresponding feature element sub-tensor is
[[1, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0]].
For every feature sub-tensor, its corresponding feature element sub-tensor can be determined in this way.
When a feature sub-tensor is transformed, its transformation result can be determined according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element.
Specifically, the feature element sub-tensor is left-multiplied by the left multiplication matrix of the feature transformation and right-multiplied by the right multiplication matrix of the feature transformation, and the result is multiplied by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor; the left multiplication matrix and the right multiplication matrix are both determined by the scale of the feature sub-tensor.
For example, the transformation of the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]]
can be converted into
d_00 × (B^T [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B).
Since every element of the feature element sub-tensor is either 0 or 1, the performance loss caused by evaluating the above expression is small.
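In symbols, the scalar factors out of the left and right multiplications by linearity, which is exactly why the replacement matrix of each position can be precomputed. Here E_ij is our shorthand for the feature element sub-tensor with a single 1 at position (i, j), not notation used in the original text:

```latex
B^{T}\,(d_{ij}E_{ij})\,B \;=\; d_{ij}\,\bigl(B^{T}E_{ij}B\bigr) \;=\; d_{ij}\,D_{ij},
\qquad\text{and hence}\qquad
B^{T}dB \;=\; \sum_{i,j} d_{ij}\,D_{ij}.
```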
Further, since B^T and B can be determined according to the size of the feature data, and the feature element sub-tensors can also be determined in advance from the feature data, the replacement matrix corresponding to each element position in the feature data can be determined in advance from B^T, B, and the feature element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is
D_00 = B^T [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B.
Based on the above, the transformation result of the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]]
becomes
d_00 × D_00.
A corresponding replacement matrix can be determined for each element position in the feature data; when the feature data is transformed, the corresponding set of replacement matrices can be determined directly according to the data size, and the feature transformation result is then determined from the set of replacement matrices.
Based on the above, the feature transformation result is:
B^T dB = d_00 × D_00 + d_01 × D_01 + ... + d_33 × D_33
Step 304: acquire the weight data of the current layer of the convolutional network and the weight transformation matrices used to perform a forward transformation on the weight data.
Step 305: transform the weight data according to the weight transformation matrices to obtain the weight transformation result; the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
In the method provided by this embodiment, the weight data can be transformed according to the weight transformation matrices to obtain the weight transformation result.
Specifically, in the transformation process, in order to reduce the multiplication operations, the transformation operation of the weight data can be disassembled into a summation operation, and the weight transformation result can be determined according to the summation operation.
Further, the weight data can be transformed based on the following formula:
G^T gG
where G^T and G are the weight transformation matrices and g is the weight data. When this transformation process is disassembled into a summation operation, the weight data can be disassembled into multiple weight sub-tensors; the multiple weight sub-tensors are then transformed according to the weight transformation matrices and summed to obtain the weight transformation result.
Specifically, the sum of the multiple weight sub-tensors equals the weight data; the number of weight sub-tensors is the same as the number of non-zero elements in the weight data; each weight sub-tensor has a single non-zero element; and the non-zero element in a weight sub-tensor is the same as the non-zero element at the corresponding position in the weight data.
Further, suppose for example that the weight data g is the 4×4 matrix
g = [[g_00, g_01, g_02, g_03],
     [g_10, g_11, g_12, g_13],
     [g_20, g_21, g_22, g_23],
     [g_30, g_31, g_32, g_33]].
Similarly to the splitting of the feature data, the weight data can be split into 16 weight sub-tensors, each of which keeps exactly one element g_ij at position (i, j) and is zero everywhere else.
In practical applications, after the weight data has been split into weight sub-tensors, the weight transformation matrices can be used to transform each weight sub-tensor, and the transformation results of the weight sub-tensors are then added to obtain the weight transformation result.
For example, the weight sub-tensor containing g_00 is transformed by computing
G^T [[g_00, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] G.
The same transformation can be performed for every weight sub-tensor, and the transformation results of all weight sub-tensors are then added to obtain the weight transformation result.
To further reduce the number of multiplications, when the weight sub-tensors are transformed and summed to obtain the weight transformation result, it is also possible to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, where the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
Similar to the transformation of the feature data, the non-zero element in each weight sub-tensor can be identified and the corresponding position set to 1, yielding the weight element sub-tensor. For example, for the weight sub-tensor
[Formula image PCTCN2020113166-appb-000016: a weight sub-tensor with a single non-zero element]
the corresponding weight element sub-tensor is:
[Formula image PCTCN2020113166-appb-000017: the same tensor with the non-zero element replaced by 1]
A corresponding weight element sub-tensor can be determined for each weight sub-tensor.
When transforming a weight sub-tensor, its transformation result can be determined according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element.
Specifically, the weight element sub-tensor can be multiplied on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and the result multiplied by the non-zero element corresponding to the weight element sub-tensor, giving the transformation result of the weight sub-tensor; the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
For example, for
[Formula image PCTCN2020113166-appb-000018: a weight sub-tensor with a single non-zero element]
the transformation can be rewritten as
[Formula image PCTCN2020113166-appb-000019: the non-zero element multiplied by the transform of the corresponding weight element sub-tensor]
Further, since G^T and G can be determined from the size of the weight data, and the weight element sub-tensors can likewise be determined in advance from the weight data, a replacement matrix corresponding to each element position in the weight data can be determined in advance from G^T, G, and the weight element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is:
[Formula image PCTCN2020113166-appb-000020: the replacement matrix D′_00, obtained by multiplying the element sub-tensor for position (0, 0) on the left by G^T and on the right by G]
Based on the above, the transformation result of the weight sub-tensor
[Formula image PCTCN2020113166-appb-000021: the sub-tensor retaining only the element g_00]
becomes:
g_00 × D′_00
A corresponding replacement matrix can be determined for each element position in the weight data. When the weight data is transformed, the corresponding set of replacement matrices can be determined directly from the data size, and the weight transformation result then determined from that set of replacement matrices.
Based on the above, the weight transformation result is obtained as:
G^T g G = g_00 × D′_00 + g_01 × D′_01 + … + g_33 × D′_33
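As a sketch of this precomputation (again with the stand-in matrices from the previous sketch; D[i][j] plays the role of the replacement matrix D′_ij):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.integers(-3, 4, size=(4, 4)).astype(float)  # stand-in weight tile
G = rng.integers(-2, 3, size=(4, 4)).astype(float)  # stand-in transform

# Precompute one replacement matrix per element position: D[i][j] is the
# transform of the element sub-tensor that has a 1 at position (i, j).
n = g.shape[0]
D = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        e = np.zeros_like(g)
        e[i, j] = 1.0                 # weight element sub-tensor
        D[i][j] = G.T @ e @ G         # replacement matrix D'_ij

# At run time the transform reduces to scaling and summing the
# precomputed replacement matrices -- no matrix products remain.
result = sum(g[i, j] * D[i][j] for i in range(n) for j in range(n))
assert np.allclose(result, G.T @ g @ G)
```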
Step 306: Perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result.
The implementation principle and manner of the element-wise multiplication of the feature transformation result and the weight transformation result in step 306 are similar to those in step 203 and are not repeated here.
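For clarity, the element-wise (position-wise) product of the two transformed tiles is an ordinary Hadamard product; a minimal NumPy illustration, with U and V as stand-ins for the two transformation results:

```python
import numpy as np

U = np.arange(16.0).reshape(4, 4)   # stand-in weight transformation result
V = 2.0 * np.ones((4, 4))           # stand-in feature transformation result
p = U * V                           # multiplication result: per-position
                                    # products, not a matrix product
```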
Step 307: Decompose the multiplication result into multiple result sub-tensors.
Step 308: Transform the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
In the method provided by this embodiment, the multiplication result can be transformed according to the inverse transformation matrix to obtain the operation result.
Specifically, to reduce the number of multiplications during the transformation, the transformation operation on the multiplication result can be decomposed into summation operations, and the operation result determined from those summations.
Further, the multiplication result can be transformed based on the following formula:
A p A^T
where A^T and A are the inverse transformation matrices and p is the multiplication result. When this transformation is decomposed into summations, the multiplication result can be split into multiple result sub-tensors; the result sub-tensors are then transformed according to the inverse transformation matrix and summed to obtain the operation result.
Specifically, the sum of the multiple result sub-tensors equals the multiplication result; the number of result sub-tensors equals the number of non-zero elements in the multiplication result; each result sub-tensor contains a single non-zero element; and the non-zero element in each result sub-tensor equals the non-zero element at the corresponding position in the multiplication result.
Further, suppose for example that the multiplication result p is a 4×4 matrix:
[Formula image PCTCN2020113166-appb-000022: the 4×4 multiplication result p, with elements p_00 through p_33]
Similar to the splitting of the feature data, the multiplication result can likewise be split into 16 result sub-tensors:
[Formula images PCTCN2020113166-appb-000023 and PCTCN2020113166-appb-000024: the 16 result sub-tensors, each retaining a single element p_ij of p]
In practical applications, after the multiplication result has been split into result sub-tensors, each result sub-tensor can be transformed with the inverse transformation matrix, and the transformation results of all result sub-tensors added together to obtain the operation result.
For example, one of the result sub-tensors can be transformed based on the following formula:
[Formula image PCTCN2020113166-appb-000025: the inverse transformation applied to one of the result sub-tensors]
The above transformation can be performed for each result sub-tensor, and the transformation results of all result sub-tensors added together to obtain the operation result.
To further reduce the number of multiplications, when the result sub-tensors are transformed and summed to obtain the operation result, it is also possible to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, where the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
Similar to the transformation of the feature data, the non-zero element in each result sub-tensor can be identified and the corresponding position set to 1, yielding the result element sub-tensor. For example, for the result sub-tensor
[Formula image PCTCN2020113166-appb-000026: a result sub-tensor with a single non-zero element]
the corresponding result element sub-tensor is:
[Formula image PCTCN2020113166-appb-000027: the same tensor with the non-zero element replaced by 1]
A corresponding result element sub-tensor can be determined for each result sub-tensor.
When transforming a result sub-tensor, the operation result can be determined according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element.
Specifically, the result element sub-tensor can be multiplied on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and the result multiplied by the non-zero element corresponding to the result element sub-tensor, giving the transformation result of the result sub-tensor; the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
For example, for
[Formula image PCTCN2020113166-appb-000028: a result sub-tensor with a single non-zero element]
the transformation can be rewritten as
[Formula image PCTCN2020113166-appb-000029: the non-zero element multiplied by the transform of the corresponding result element sub-tensor]
Further, since A^T and A can be determined from the size of the operation result, and the result element sub-tensors can likewise be determined in advance from the size of the operation result, a replacement matrix corresponding to each element position in the multiplication result can be determined in advance from A^T, A, and the result element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is:
[Formula image PCTCN2020113166-appb-000030: the replacement matrix D″_00, obtained by applying the inverse transformation matrices to the element sub-tensor for position (0, 0)]
Based on the above, the transformation result of the result sub-tensor
[Formula image PCTCN2020113166-appb-000031: the sub-tensor retaining only the element p_00]
becomes:
p_00 × D″_00
A corresponding replacement matrix can be determined for each element position in the multiplication result. When the multiplication result is transformed, the corresponding set of replacement matrices can be determined directly from the size of that result (or of the final operation result), and the operation result then determined from that set of replacement matrices.
Based on the above, the operation result is obtained as:
A p A^T = p_00 × D″_00 + p_01 × D″_01 + … + p_33 × D″_33
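The same precomputation applies on the inverse side. The sketch below again uses stand-in matrices, with a non-square A to reflect that an inverse transform typically reduces the tile size (the exact A of the method is not specified here):

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.standard_normal((4, 4))                       # stand-in multiplication result
A = rng.integers(-1, 2, size=(2, 4)).astype(float)    # stand-in inverse transform

# Precompute the replacement matrices D''_ij = A E_ij A^T, one per
# element position of the multiplication result.
n = p.shape[0]
D2 = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        e = np.zeros_like(p)
        e[i, j] = 1.0               # result element sub-tensor
        D2[i][j] = A @ e @ A.T      # replacement matrix D''_ij

# The inverse transform reduces to scaling and summing the
# precomputed replacement matrices.
out = sum(p[i, j] * D2[i][j] for i in range(n) for j in range(n))
assert np.allclose(out, A @ p @ A.T)
```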
Step 309: Output the operation result to the lower-layer convolutional network.
The implementation principle and manner of step 309 are similar to those of step 205 and are not repeated here.
Fig. 4 is a schematic diagram of a master-slave processing architecture according to an exemplary embodiment of the present invention.
As shown in Fig. 4, the solution of this embodiment further provides a master-slave processing architecture that can be used to implement the operation method provided by this embodiment.
The master-slave processing architecture includes a master functional unit 41 and at least one slave functional unit 42.
The master functional unit 41 transforms the feature data according to the feature transformation matrix to obtain the feature transformation result; the transformation operation on the feature data is decomposed into summation operations, and the feature transformation result is determined from those summations.
Optionally, a master storage unit (not shown) may also be provided and connected to the master functional unit 41. A master control unit (not shown) may send instructions to the master storage unit and the master functional unit 41 respectively, so that the master storage unit can send the feature data to the master functional unit.
The slave functional unit obtains the forward-transformed weight transformation result of the current-layer convolutional network, and performs an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result.
The slave functional unit obtains the inverse transformation matrix used to inversely transform the multiplication result, and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result; the transformation operation on the multiplication result is decomposed into summation operations, and the operation result is determined from those summations.
That is, the slave functional unit 42 performs an element-wise multiplication on the received feature transformation result and weight transformation result, thereby obtaining the multiplication result.
In the method provided by this embodiment, the data processing is similar to that of the foregoing embodiments and is not repeated here.
Specifically, one master functional unit 41 can be connected to multiple slave functional units 42, and allocation rules can be preset for distributing feature data transformation results to the slave functional units 42.
Further, the master functional unit 41 and the slave functional units 42 operate in parallel: before the master functional unit 41 has finished computing the transformation result values for all element positions of the feature data, a slave functional unit 42 already performs, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the element-wise product has been computed for every element position, thereby obtaining the multiplication result.
In practical applications, sending the feature transformation results determined by the master functional unit 41 to the slave functional units 42, which execute the element-wise multiplications in parallel, can improve the operation efficiency of the system.
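A minimal software analogue of this pipelining, assuming a single slave unit and using a thread plus a queue as stand-ins for the hardware units (all names here are illustrative, not part of the disclosed architecture):

```python
import queue
import threading
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((4, 4))          # weight transformation result
q = queue.Queue()
products = {}

def slave_worker():
    # Consume per-position feature-transform values as soon as the
    # master produces them; multiply element-wise with the weights.
    while True:
        item = q.get()
        if item is None:                  # sentinel: master is done
            break
        (i, j), v = item
        products[(i, j)] = v * U[i, j]

t = threading.Thread(target=slave_worker)
t.start()

# "Master": computes one transformed value per element position and
# streams it to the slave without waiting for the full tile.
V = rng.standard_normal((4, 4))          # feature transformation result
for i in range(4):
    for j in range(4):
        q.put(((i, j), V[i, j]))
q.put(None)
t.join()

p = np.array([[products[(i, j)] for j in range(4)] for i in range(4)])
assert np.allclose(p, V * U)              # matches the element-wise product
```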
It should be noted that, for the sake of brevity, the foregoing method embodiments are all expressed as a series of action combinations. Those skilled in the art should understand, however, that the present disclosure is not limited by the described order of actions, since according to the present disclosure some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and that the actions and modules involved are not necessarily required by the present disclosure.
It should further be noted that although the steps in the flowcharts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Fig. 5 is a schematic diagram of a computing device according to an exemplary embodiment of the present invention.
As shown in Fig. 5, the computing device provided by this embodiment includes:
an acquisition module 51, configured to acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to forward-transform the feature data;
a feature transformation module 52, configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from those summations;
an element-wise multiplication module 53, configured to obtain the forward-transformed weight transformation result of the current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module 54, configured to obtain the inverse transformation matrix used to inversely transform the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, where the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from those summations;
a transmission module 55, configured to output the operation result to the lower-layer convolutional network.
The specific principles, implementation, and effects of the computing device provided by this embodiment are similar to those of the embodiment shown in Fig. 2 and are not repeated here.
On the basis of the computing device shown in Fig. 5, in the computing device provided by this embodiment, the feature transformation module 52 is specifically configured to:
decompose the feature data into multiple feature sub-tensors;
perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain the feature transformation result.
The inverse transformation module 54 is specifically configured to:
decompose the multiplication result into multiple result sub-tensors;
perform a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
The sum of the multiple feature sub-tensors is the feature data; the sum of the multiple result sub-tensors is the multiplication result.
The number of feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
The number of result sub-tensors is the same as the number of non-zero elements in the multiplication result; each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
The feature transformation module 52 is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, where the feature element sub-tensor is the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result.
The inverse transformation module 54 is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, where the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
The feature transformation module 52 is specifically configured to:
multiply the feature element sub-tensor on the left by the left-multiplication matrix of the feature transformation matrix and on the right by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor, to obtain the transformation result of the feature sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a feature element sub-tensor are both determined by the size of the feature sub-tensor.
The inverse transformation module 54 is specifically configured to:
multiply the result element sub-tensor on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor, to obtain the transformation result of the result sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
The element-wise multiplication module 53 is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix used to forward-transform the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, where the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined from those summations.
The element-wise multiplication module 53 is specifically configured to:
decompose the weight data into multiple weight sub-tensors;
perform a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and sum them to obtain the weight transformation result.
The element-wise multiplication module 53 is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, where the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
The element-wise multiplication module 53 is specifically configured to:
multiply the weight element sub-tensor on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor, to obtain the transformation result of the weight sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
The specific principles, implementation, and effects of the computing device provided by this embodiment are similar to those of the embodiments shown in Figs. 3 and 4 and are not repeated here.
It should be understood that the foregoing device embodiments are merely illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the above embodiments is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
In addition, unless otherwise specified, the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist physically on its own, or two or more units/modules may be integrated together. The integrated unit/module may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include but are not limited to transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above computing device.
In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip, where the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
Fig. 6 is a structural block diagram of a board card according to an exemplary embodiment of the present invention. Referring to Fig. 6, in addition to the above chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data. The storage device may include multiple groups of storage units 393, each group connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency, as it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include multiple DDR4 chips. In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface, realizing data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the calculation results of the artificial intelligence chip are still transmitted back to the external device (such as a server) by the interface device.
The control device is electrically connected to the artificial intelligence chip and is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and can drive multiple loads, it can be in different working states such as multi-load and light load. The control device can regulate the working states of the multiple processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The foregoing may be better understood in light of the following clauses:
A1. An operation method, the method comprising:
acquiring feature data output by an upper-layer convolutional network and a feature transformation matrix used to forward-transform the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
acquiring a forward-transformed weight transformation result of a current-layer convolutional network, and performing an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
acquiring an inverse transformation matrix used to inversely transform the multiplication result, and transforming the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations;
outputting the operation result to a lower-layer convolutional network.
A2. The method according to clause A1, wherein decomposing the transformation operation on the feature data into summation operations and determining the feature transformation result from the summation operations comprises:
decomposing the feature data into multiple feature sub-tensors;
performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result;
and wherein decomposing the transformation operation on the multiplication result into summation operations and determining the operation result from the summation operations comprises:
decomposing the multiplication result into multiple result sub-tensors;
performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result.
A3. The method according to clause A2, wherein the sum of the multiple feature sub-tensors is the feature data, and the sum of the multiple result sub-tensors is the multiplication result;
the number of feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors is the same as the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
A4. The method according to clause A2, wherein performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result comprises:
determining, for each feature sub-tensor, a corresponding feature element sub-tensor, wherein the feature element sub-tensor is the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determining the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result comprises:
determining, for each result sub-tensor, a corresponding result element sub-tensor, wherein the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determining the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the result sub-tensors to obtain the operation result.
A5. The method according to clause A4, wherein determining the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element comprises:
multiplying the feature element sub-tensor on the left by the left-multiplication matrix of the feature transformation matrix and on the right by the right-multiplication matrix of the feature transformation matrix, and multiplying the result by the non-zero element corresponding to the feature element sub-tensor, to obtain the transformation result of the feature sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a feature element sub-tensor are both determined by the size of the feature sub-tensor;
and wherein determining the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element comprises:
multiplying the result element sub-tensor on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and multiplying the result by the non-zero element corresponding to the result element sub-tensor, to obtain the transformation result of the result sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
A6. The method according to clause A1, wherein acquiring the forward-transformed weight transformation result of the current-layer convolutional network comprises:
acquiring weight data of the current-layer convolutional network and a weight transformation matrix used to forward-transform the weight data;
transforming the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined from the summation operations.
A7. The method according to clause A6, wherein decomposing the transformation operation on the weight data into summation operations and determining the weight transformation result from the summation operations comprises:
decomposing the weight data into multiple weight sub-tensors;
performing a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and summing them to obtain the weight transformation result.
A8. The method according to clause A7, wherein performing a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and summing them to obtain the weight transformation result comprises:
determining, for each weight sub-tensor, a corresponding weight element sub-tensor, wherein the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determining the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the weight sub-tensors to obtain the weight transformation result.
A9. The method according to clause A8, wherein determining the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element comprises:
multiplying the weight element sub-tensor on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and multiplying the result by the non-zero element corresponding to the weight element sub-tensor, to obtain the transformation result of the weight sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
A10. The method according to any one of clauses A1-A9, wherein the method is applied to a master-slave processing architecture, the master-slave processing architecture comprising a master functional unit and at least one slave functional unit.
A11. The method according to clause A10, wherein the master functional unit transforms the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
the slave functional unit acquires the forward-transformed weight transformation result of the current-layer convolutional network, and performs an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result;
the slave functional unit acquires the inverse transformation matrix used to inversely transform the multiplication result, and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations.
A12. The method according to clause A11, wherein the master functional unit and the slave functional unit operate in parallel; before the master functional unit has finished computing the transformation result values for all element positions of the feature data, the slave functional unit performs, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that element position, until the element-wise product has been computed for every element position, thereby obtaining the multiplication result.
A13. A computing device, comprising:
an acquisition module, configured to acquire feature data output by an upper-layer convolutional network and a feature transformation matrix used to forward-transform the feature data;
a feature transformation module, configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
an element-wise multiplication module, configured to acquire a forward-transformed weight transformation result of a current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module, configured to acquire an inverse transformation matrix used to inversely transform the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations;
a transmission module, configured to output the operation result to a lower-layer convolutional network.
A14. The device according to clause A13, wherein the feature transformation module is specifically configured to:
decompose the feature data into multiple feature sub-tensors;
perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
decompose the multiplication result into multiple result sub-tensors;
perform a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
A15. The device according to clause A14, wherein the sum of the multiple feature sub-tensors is the feature data, and the sum of the multiple result sub-tensors is the multiplication result;
the number of feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors is the same as the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
A16. The device of clause A14, wherein the feature transformation module is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein the inverse transformation module is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
A17. The device of clause A16, wherein the feature transformation module is specifically configured to:
left-multiply the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiply it by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and wherein the inverse transformation module is specifically configured to:
left-multiply the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiply it by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
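Clause A17 exploits linearity: transforming a tensor equals summing the scaled transforms of its element sub-tensors, and because each element sub-tensor's transform depends only on the position of its 1 and the tile size, those transforms can be tabulated once, reducing the per-tile transform to scalar multiplications and additions. A minimal sketch of the equivalence, with hypothetical names:

import numpy as np

def transform_by_element_subtensors(t, L, R):
    """Compute L @ t @ R by summing the transforms of t's element sub-tensors,
    each scaled by its non-zero value; equivalent to the direct product."""
    out = np.zeros((L.shape[0], R.shape[1]), dtype=t.dtype)
    for idx in zip(*np.nonzero(t)):
        e = np.zeros_like(t)
        e[idx] = 1.0                   # element sub-tensor (non-zero element set to 1)
        out += t[idx] * (L @ e @ R)    # scale its transform by the original value
    return out

# Sanity check against the direct matrix product:
t = np.array([[0.0, 2.0], [3.0, 0.0]])
L = np.array([[1.0, 1.0], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [1.0, 1.0]])
assert np.allclose(transform_by_element_subtensors(t, L, R), L @ t @ R)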
A18. The device of clause A13, wherein the element-wise multiplication module is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
A19. The device of clause A18, wherein the element-wise multiplication module is specifically configured to:
decompose the weight data into a plurality of weight sub-tensors;
perform transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and sum the results to obtain the weight transformation result.
A20. The device of clause A19, wherein the element-wise multiplication module is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
A21. The device of clause A20, wherein the element-wise multiplication module is specifically configured to:
left-multiply the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiply it by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
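Because a trained layer's weights are fixed, the weight transformation result of clauses A18-A21 can be computed once offline and reused for every input tile. Continuing the earlier F(2x2, 3x3) sketch (hypothetical names; G, B_T, and A_T as defined there):

# Transform each 3x3 kernel once, then reuse the cached result for all tiles.
def precompute_weight_transforms(kernels):
    return [G @ g @ G.T for g in kernels]           # weight transformation results

def run_layer(tiles, transformed_weights):
    outputs = []
    for d in tiles:                                 # d: one 4x4 input tile
        V = B_T @ d @ B_T.T                         # feature transformation result
        outputs.append([A_T @ (U * V) @ A_T.T for U in transformed_weights])
    return outputs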
A22. An artificial intelligence chip, comprising the computing device of any one of clauses A13 to A21.
A23. An electronic device, comprising the artificial intelligence chip of clause A22.
A24. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip of clause A22;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and external equipment;
and the control device is configured to monitor the state of the artificial intelligence chip.
A25. The board card of clause A24, wherein the storage device comprises multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
and the interface device is a standard PCIe interface.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to illustrate its principles and implementations; the description of these embodiments is intended only to aid understanding of the disclosed method and its core ideas. Changes or modifications made by those skilled in the art based on the ideas, specific embodiments, and scope of application of the present disclosure all fall within its protection scope. In summary, the content of this specification should not be construed as limiting the present disclosure.
The solutions provided by the foregoing embodiments can effectively improve the efficiency of neural network processing. In practical applications, these solutions can be implemented on various architectures, for example a master-slave architecture or a general-purpose architecture.

Claims (25)

1. An operation method, comprising:
obtaining feature data output by an upper-layer convolutional network and a feature transformation matrix for forward-transforming the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
obtaining the forward-transformed weight transformation result of the current-layer convolutional network, and performing an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
obtaining an inverse transformation matrix for inversely transforming the multiplication result, and transforming the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations;
outputting the operation result to a lower-layer convolutional network.
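Claim 1 recites the standard Winograd convolution pipeline. As one concrete, widely used instance (not mandated by the claim), the F(2x2, 3x3) algorithm computes each 2x2 output tile $Y$ from a 4x4 input tile $d$ and a 3x3 kernel $g$ as

$$ Y = A^{\top}\!\left[\left(G\,g\,G^{\top}\right) \odot \left(B^{\top} d\, B\right)\right] A $$

with the fixed matrices

$$ B^{\top} = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}, \qquad G = \begin{pmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \end{pmatrix}, \qquad A^{\top} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{pmatrix}, $$

where $\odot$ denotes element-wise multiplication. Since the entries of $B$ and $A$ are 0 and ±1 (and those of $G$ are 0, ±1, ±1/2), the forward and inverse transforms reduce to the additions that the claimed summation decomposition exploits, replacing most of the multiplications of direct convolution.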
2. The method of claim 1, wherein decomposing the transformation operation on the feature data into summation operations and determining the feature transformation result according to the summation operations comprises:
decomposing the feature data into a plurality of feature sub-tensors;
performing transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and summing the results to obtain the feature transformation result;
and wherein decomposing the transformation operation on the multiplication result into summation operations and determining the operation result according to the summation operations comprises:
decomposing the multiplication result into a plurality of result sub-tensors;
performing transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and summing the results to obtain the operation result.
3. The method of claim 2, wherein the sum of the plurality of feature sub-tensors equals the feature data, and the sum of the plurality of result sub-tensors equals the multiplication result;
the number of feature sub-tensors equals the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors equals the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result.
4. The method of claim 2, wherein performing transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and summing the results to obtain the feature transformation result comprises:
determining, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determining the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein performing transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and summing the results to obtain the operation result comprises:
determining, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the result sub-tensors to obtain the operation result.
5. The method of claim 4, wherein determining the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiplying it by the right-multiplication matrix of the feature transformation matrix, and multiplying the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and wherein determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiplying it by the right-multiplication matrix of the inverse transformation matrix, and multiplying the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
6. The method of claim 1, wherein obtaining the forward-transformed weight transformation result of the current-layer convolutional network comprises:
obtaining the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transforming the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
7. The method of claim 6, wherein decomposing the transformation operation on the weight data into summation operations and determining the weight transformation result according to the summation operations comprises:
decomposing the weight data into a plurality of weight sub-tensors;
performing transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and summing the results to obtain the weight transformation result.
8. The method of claim 7, wherein performing transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and summing the results to obtain the weight transformation result comprises:
determining, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the weight sub-tensors to obtain the weight transformation result.
9. The method of claim 8, wherein determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiplying it by the right-multiplication matrix of the weight transformation matrix, and multiplying the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
10. The method of any one of claims 1 to 9, wherein the method is applied to a master-slave processing architecture, the master-slave processing architecture comprising a master functional unit and at least one slave functional unit.
11. The method of claim 10, wherein the master functional unit transforms the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
the slave functional unit obtains the forward-transformed weight transformation result of the current-layer convolutional network and performs the element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result;
and the slave functional unit obtains the inverse transformation matrix for inversely transforming the multiplication result and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations.
12. The method of claim 11, wherein the master functional unit and the slave functional unit operate in parallel: before the master functional unit has finished computing the transformed values for all element positions of the feature data, the slave functional unit performs, for each element position whose feature transformation value has already been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the element-wise products for all element positions have been computed and the multiplication result is obtained.
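Claim 12 describes producer-consumer overlap: the slave unit starts multiplying at each element position as soon as the master has produced it. Below is a minimal software sketch of that pipelining, reusing B_T from the earlier F(2x2, 3x3) sketch; the names are hypothetical, and real hardware would stream each value as the summation at its position completes rather than after the full transform.

import queue, threading
import numpy as np

def master_unit(d, q):                  # master functional unit: forward transform
    V = B_T @ d @ B_T.T                 # feature transformation result
    for idx in np.ndindex(*V.shape):    # stream each element position to the slave
        q.put((idx, V[idx]))
    q.put(None)                         # end-of-stream marker

def slave_unit(U, q, M):                # slave functional unit: element-wise multiply
    while (item := q.get()) is not None:
        idx, v = item
        M[idx] = U[idx] * v             # multiply as soon as the position is ready

# Usage sketch:
# q = queue.Queue(); M = np.empty_like(U)
# threading.Thread(target=master_unit, args=(d, q)).start()
# slave_unit(U, q, M)                   # M now equals U * V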
13. A computing device, comprising:
an acquisition module configured to obtain feature data output by an upper-layer convolutional network and a feature transformation matrix for forward-transforming the feature data;
a feature transformation module configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
an element-wise multiplication module configured to obtain the forward-transformed weight transformation result of the current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module configured to obtain an inverse transformation matrix for inversely transforming the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations;
a transmission module configured to output the operation result to the lower-layer convolutional network.
14. The device of claim 13, wherein
the feature transformation module is specifically configured to:
decompose the feature data into a plurality of feature sub-tensors;
perform transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and sum the results to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
decompose the multiplication result into a plurality of result sub-tensors;
perform transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and sum the results to obtain the operation result.
15. The device of claim 14, wherein the sum of the plurality of feature sub-tensors equals the feature data, and the sum of the plurality of result sub-tensors equals the multiplication result;
the number of feature sub-tensors equals the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors equals the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result.
16. The device of claim 14, wherein the feature transformation module is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
17. The device of claim 16, wherein the feature transformation module is specifically configured to:
left-multiply the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiply it by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and the inverse transformation module is specifically configured to:
left-multiply the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiply it by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
18. The device of claim 13, wherein the element-wise multiplication module is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
19. The device of claim 18, wherein the element-wise multiplication module is specifically configured to:
decompose the weight data into a plurality of weight sub-tensors;
perform transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and sum the results to obtain the weight transformation result.
20. The device of claim 19, wherein the element-wise multiplication module is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
21. The device of claim 20, wherein the element-wise multiplication module is specifically configured to:
left-multiply the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiply it by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
22. An artificial intelligence chip, comprising the computing device of any one of claims 13 to 21.
23. An electronic device, comprising the artificial intelligence chip of claim 22.
24. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip of claim 22;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and external equipment;
and the control device is configured to monitor the state of the artificial intelligence chip.
25. The board card of claim 24, wherein
the storage device comprises multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
and the interface device is a standard PCIe interface.
PCT/CN2020/113166 2019-11-01 2020-09-03 Operation method and related product WO2021082724A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911061118.4 2019-11-01
CN201911061118.4A CN112784207B (en) 2019-11-01 2019-11-01 Operation method and related product

Publications (1)

Publication Number Publication Date
WO2021082724A1 true WO2021082724A1 (en) 2021-05-06

Family

ID=75715766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113166 WO2021082724A1 (en) 2019-11-01 2020-09-03 Operation method and related product

Country Status (2)

Country Link
CN (1) CN112784207B (en)
WO (1) WO2021082724A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160019456A1 (en) * 2014-07-16 2016-01-21 Qualcomm Incorporated Decomposing convolution operation in neural networks
CN108229654A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network convolution algorithm device and method
CN108229656A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network computing device and method
CN108549931A (en) * 2018-04-25 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of accelerator and method of convolutional neural networks
CN109523020A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method
CN109754064A (en) * 2017-11-07 2019-05-14 三星电子株式会社 The method and apparatus for executing the neural network of deconvolution

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
US10482155B2 (en) * 2016-12-30 2019-11-19 Intel Corporation Winograd algorithm on a matrix processing architecture
US10990648B2 (en) * 2017-08-07 2021-04-27 Intel Corporation System and method for an optimized winograd convolution accelerator
US10372787B2 (en) * 2017-12-12 2019-08-06 Facebook, Inc. Hardware accelerator pre-configured with coefficients for matrix-transform operations
CN108388446A (en) * 2018-02-05 2018-08-10 上海寒武纪信息科技有限公司 Computing module and method
CN109325591B (en) * 2018-09-26 2020-12-29 中国科学院计算技术研究所 Winograd convolution-oriented neural network processor
CN109685201B (en) * 2018-12-14 2020-10-30 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN110097172B (en) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 Convolutional neural network data processing method and device based on Winograd convolutional operation
CN110188869B (en) * 2019-05-05 2021-08-10 北京中科汇成科技有限公司 Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm


Also Published As

Publication number Publication date
CN112784207B (en) 2024-02-02
CN112784207A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
CN109522052B (en) Computing device and board card
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN109670581B (en) Computing device and board card
CN110059797B (en) Computing device and related product
WO2021082725A1 (en) Winograd convolution operation method and related product
WO2021083101A1 (en) Data processing method and apparatus, and related product
CN109711540B (en) Computing device and board card
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021082747A1 (en) Operational apparatus and related product
WO2021082746A1 (en) Operation apparatus and related product
CN109740730B (en) Operation method, device and related product
WO2021082723A1 (en) Operation apparatus
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN111143766A (en) Method and apparatus for processing two-dimensional complex matrix by artificial intelligence processor
CN111124995A (en) Method and apparatus for processing a one-dimensional complex array by an artificial intelligence processor
WO2021082724A1 (en) Operation method and related product
WO2021082721A1 (en) Winograd convolution operation method, apparatus, and device, and storage medium
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
WO2021223642A1 (en) Data processing method and apparatus, and related product
WO2021082722A1 (en) Computing device and method, and related product
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
WO2021223644A1 (en) Data processing method and device, and related product
WO2021223638A1 (en) Data processing method and device, and related product
WO2021169914A1 (en) Data quantification processing method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1