CN118052283A - Simplified reasoning method, system, medium and device of binary neural network

Simplified reasoning method, system, medium and device of binary neural network

Info

Publication number
CN118052283A
CN118052283A (Application CN202410170334.7A)
Authority
CN
China
Prior art keywords
matrix
weight
output
neural network
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410170334.7A
Other languages
Chinese (zh)
Inventor
李威君
李元振
尚德龙
周玉梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Intelligent Technology Research Institute
Original Assignee
Zhongke Nanjing Intelligent Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Intelligent Technology Research Institute filed Critical Zhongke Nanjing Intelligent Technology Research Institute
Priority to CN202410170334.7A priority Critical patent/CN118052283A/en
Publication of CN118052283A publication Critical patent/CN118052283A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a simplified reasoning method, system, medium and device of a binary neural network, comprising the following steps: reading the neural network calculation graph and the weight related data from the system configuration file of the neural network, and converting the input data and the corresponding weights into binary matrices; and controlling the inference calculation of the input matrix and the weight matrix in the neural network calculation graph according to the inference flow direction. In particular, in the reasoning calculation of the convolution calculation node, the multiplication operation is eliminated by the logical XNOR operation, and the addition operation is eliminated by the logical AND operation combined with the character statistics method. This enables efficient inference of the binary neural network model, improves the inference speed, occupies few computing resources, avoids computing-power conflicts on low-profile devices, and allows large-scale deployment on low-profile devices.

Description

Simplified reasoning method, system, medium and device of binary neural network
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a simplified reasoning method, system, medium, and apparatus for a binary neural network.
Background
In recent years, deep learning technology has penetrated into various industries and fields, and the demand of low-profile devices such as mobile devices and wearable devices for deep learning technology is also growing. Because deep learning models require a significant amount of computing and memory resources, applying them directly to low-profile devices creates computing-power conflicts; binary neural network models (BNN) are therefore often employed to address the challenge of applying deep learning models to low-profile devices.
However, the applicant found in research that although the binary neural network model represents the input tensor and the weights with single bits to achieve maximum compression of the model, the binary data still need to be expanded into integer or floating-point data for multiply-accumulate calculation when the network model performs inference, which weakens the reasoning speed of the binary neural network model.
Disclosure of Invention
The invention provides a simplified reasoning method, system, medium and device of a binary neural network, which aim to solve or partially solve the technical problem that the reasoning speed of the binary neural network is relatively low.
To solve the above technical problem, in a first aspect of the present invention, a simplified reasoning method of a binary neural network is disclosed, which is characterized in that the method includes:
reading a neural network calculation graph and weight related data from a system configuration file of the neural network; the neural network computation graph comprises a plurality of computation nodes and a reasoning flow direction among the nodes, and the weight related data comprises weight values and matrix quantization information;
Quantizing the weight into a binary weight matrix based on the matrix quantization information and loading the binary weight matrix into a corresponding computing node in the neural network computing graph;
quantizing input data into a binary input matrix based on the matrix quantization information;
controlling the reasoning calculation of the input matrix and the weight matrix in the neural network calculation graph according to the reasoning flow direction until all calculation nodes of the neural network calculation graph are traversed, and obtaining output data;
In the reasoning calculation of the convolution calculation nodes in the plurality of calculation nodes, performing logical exclusive nor operation on the sign bit of the node input matrix and the sign bit of the node weight matrix which belong to the convolution calculation nodes to obtain a first output matrix; performing logic AND operation circularly in the first output matrix according to a set circulation order to obtain a second output matrix; the second output matrix is converted into a node output matrix based on output dimensions belonging to the convolution computing node to serve as the node input matrix of the next computing node.
Optionally, the weights are four-dimensional tensors, which are respectively: the number, the number of channels, the height and the width; the quantizing the weight into a binary weight matrix based on the matrix quantizing information specifically includes:
Converting the weight from the four-dimensional tensor to a two-dimensional weight matrix;
and quantizing the two-dimensional weight matrix based on quantization zero points in the matrix quantization information to obtain the weight matrix.
Optionally, the dimensions of the input data are: IH×IW×IC; wherein IH represents the height of the input data, IW represents the width of the input data, and IC represents the number of channels of the input data; the method for quantizing the input data into a binary input matrix based on the matrix quantization information specifically comprises the following steps:
expanding the input data into a two-dimensional input matrix of (ih×iw) × (KH×KW×IC) according to the convolution calculation characteristics, where ih = ⌊(IH−KH)/S⌋ + 1 and iw = ⌊(IW−KW)/S⌋ + 1; KH denotes the height of the convolution kernel, KW denotes the width of the convolution kernel, S denotes the convolution stride, and ⌊·⌋ denotes rounding down;
And quantizing the two-dimensional input matrix based on quantization zero points in the matrix quantization information to obtain the input matrix.
Optionally, the performing a logical XNOR operation on the sign bit of the node input matrix and the sign bit of the node weight matrix belonging to the convolution computing node to obtain a first output matrix specifically includes:
Extracting the sign bit of the node input matrix to obtain an input sign bit matrix;
extracting the sign bit of the node weight matrix to obtain a weight sign bit matrix;
and executing a logical exclusive NOR operation on the corresponding positions of the input sign-bit matrix and the weight sign-bit matrix to obtain the first output matrix.
Optionally, the performing a logical and operation in the first output matrix according to a set cyclic order to obtain a second output matrix specifically includes:
Traversing the first output matrix row by row, dividing each row into N groups with K elements in each group, and cyclically performing a logical AND operation within the K elements of each group in the first output matrix according to the set cyclic order to obtain an intermediate output matrix; N is a positive integer not less than 2, and K is a positive integer satisfying 1 ≤ K < N;
and carrying out character statistics on the intermediate output matrix, and generating the second output matrix based on a character statistics result.
Optionally, the performing character statistics on the intermediate output matrix, and generating the second output matrix based on the result of the character statistics specifically includes:
and counting the number of characters '1' contained in the corresponding position of the intermediate output matrix, and filling the number into the corresponding position of the second output matrix.
Optionally, after the output data is obtained, the method further includes:
and converting the output data into a target output matrix according to the output dimension information read from the system configuration file.
In a second aspect of the present invention, a simplified reasoning system of a binary neural network is disclosed, characterized in that the system comprises:
The reading module is used for reading the neural network calculation graph and the weight related data from the system configuration file of the neural network; the neural network computation graph comprises a plurality of computation nodes and a reasoning flow direction among the nodes, and the weight related data comprises weight values and matrix quantization information;
The first quantization module is used for quantizing the weight into a binary weight matrix based on the matrix quantization information and loading the binary weight matrix into a corresponding computing node in the neural network computing graph;
A second quantization module for quantizing input data into a binary input matrix based on the matrix quantization information;
The inference calculation module is used for controlling the inference calculation of the input matrix and the weight matrix in the neural network calculation graph according to the inference flow direction until all calculation nodes of the neural network calculation graph are traversed, so as to obtain output data;
In the reasoning calculation of the convolution calculation nodes in the plurality of calculation nodes, performing logical exclusive nor operation on the sign bit of the node input matrix and the sign bit of the node weight matrix which belong to the convolution calculation nodes to obtain a first output matrix; performing logic AND operation circularly in the first output matrix according to a set circulation order to obtain a second output matrix; the second output matrix is converted into a node output matrix based on output dimensions belonging to the convolution computing node to serve as the node input matrix of the next computing node.
In a third aspect of the present invention, a computer-readable storage medium is disclosed, on which a computer program is stored which, when being executed by a processor, implements the steps of the above-described method.
In a fourth aspect of the invention, a computer device is disclosed comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the steps of the above method when executing said program.
Through one or more technical schemes of the invention, the invention has the following beneficial effects or advantages:
In the technical scheme of the invention, the neural network calculation graph and the weight related data are read from the system configuration file of the neural network, and the input data and the corresponding weights are converted into binary matrices. The inference calculation of the input matrix and the weight matrix in the neural network calculation graph is controlled according to the inference flow direction. In particular, in the reasoning calculation of the convolution calculation node, the multiplication operation is eliminated by the logical XNOR operation, and the addition operation is eliminated by the logical AND operation combined with the character statistics method.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the description, and in order to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures.
In the drawings:
FIG. 1 illustrates a simplified reasoning method flow diagram of a binary neural network according to one embodiment of the invention;
FIG. 2 illustrates a schematic diagram of an implementation of a logical XNOR operation in accordance with one embodiment of the invention;
FIG. 3A illustrates a schematic diagram of an implementation of a logical AND operation in accordance with one embodiment of the invention;
FIG. 3B illustrates a logic diagram of the logical AND operation of K elements according to one embodiment of the invention;
FIG. 3C illustrates a specific numerical calculation example diagram of a logical AND operation according to one embodiment of the invention;
Fig. 4 shows a schematic diagram of a simplified reasoning system of a binary neural network according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In a first aspect, as shown in fig. 1, a simplified reasoning method of a binary neural network provided in an embodiment of the present disclosure at least includes the following steps:
s101, reading a neural network calculation graph and weight related data from a system configuration file of the neural network.
The system configuration file of the neural network comprises system configuration information, input source setting information, a neural network calculation graph and weight related data. The system configuration information defines the basic configuration of the operating system to which the neural network belongs. The input source setting information designates the path of an input file or the I/O port of an acquisition device; input may be read from a file or from an acquisition device such as a sensor (e.g., a camera). The neural network calculation graph comprises input dimension information, output dimension information, a plurality of calculation nodes and an inter-node reasoning flow direction. The input dimension information defines the input dimension of the input data; the output dimension information defines the output dimension of the output data; a calculation node may be understood as a "layer", and the plurality of calculation nodes comprise convolution calculation nodes, which the present application mainly addresses, and other nodes. The inter-node reasoning flow direction defines the calculation order of the nodes, and the node output data of the previous calculation node serves as the node input data of the next calculation node. The weight related data comprises weight values and matrix quantization information, wherein the weight values involve related parameters such as data bit width, data type and tensor dimension. The data bit width refers to the number of bits used to represent integer data; the binary data bit width is 1, that is, one bit is used to represent a data value. The data type indicates whether the data is floating-point or integer and whether it is signed or unsigned; before quantization, the data type of the weight is a signed floating-point number (float32). The tensor dimension refers to the weight dimension information; for example, a convolution weight of dimension [16,3,3,3] represents 16 kernels, 3 channels, a height of 3 and a width of 3. Quantization is a process that does not change these dimensions. The matrix quantization information includes a quantization zero point and a binary quantization method. The quantization zero point is the reference for quantization; for example, taking 0 as the quantization zero point, the binary quantization method is: values greater than zero are quantized to 1, otherwise quantized to 0. Of course, the quantization zero point may be another value, such as 0.5, without limitation.
S102, based on the matrix quantization information, the weight is quantized into a binary weight matrix and is loaded to a corresponding computing node in the neural network computing graph.
Wherein the weight is a four-dimensional tensor whose dimensions are respectively: the number, the number of channels, the height and the width. In a specific quantization process, the weight is converted from the four-dimensional tensor into a two-dimensional weight matrix, for example using an Im2Col function; the conversion may be performed row-first or column-first. The two-dimensional weight matrix is then quantized based on the quantization zero point in the matrix quantization information to obtain the binary weight matrix. For convenience of explanation, taking a floating-point two-dimensional weight matrix and a quantization zero point of 0 as an example, after quantization with the matrix quantization information, each element greater than zero becomes 1 and the remaining elements become 0, yielding the binary weight matrix.
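For illustration only, the following is a minimal NumPy sketch of this weight binarization step; the function name, the reshape layout and the example shapes are illustrative assumptions and are not taken from the patent text.

import numpy as np

def binarize_weights(w4d, zero_point=0.0):
    # Flatten a [number, channels, height, width] float32 weight tensor into a
    # two-dimensional matrix with one row per kernel, then quantize: values
    # greater than the quantization zero point become 1, the rest become 0.
    num = w4d.shape[0]
    w2d = w4d.reshape(num, -1)                 # four-dimensional tensor -> two-dimensional matrix
    return (w2d > zero_point).astype(np.uint8)

# Example: 16 kernels of dimension [16, 3, 3, 3] give a 16 x 27 binary weight matrix.
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
bin_w = binarize_weights(w)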
S103, quantizing the input data into a binary input matrix based on the matrix quantization information.
First, preprocessing operations such as gamma conversion, normalization, and the like are performed on input data. The dimensions of the input data are: IH×IW×IC; wherein IH represents the height of the input data, IW represents the width of the input data, and IC represents the number of channels of the input data;
In the quantization process, the input data is expanded into a two-dimensional input matrix of (ih×iw) × (KH×KW×IC) according to the convolution calculation characteristics, where ih = ⌊(IH−KH)/S⌋ + 1 and iw = ⌊(IW−KW)/S⌋ + 1; KH denotes the height of the convolution kernel, KW denotes the width of the convolution kernel, S denotes the convolution stride, and ⌊·⌋ denotes rounding down. The specific values of the input data are filled in with reference to the Im2Col method of expanding each window into a column vector, but the expansion is not limited thereto. After the two-dimensional input matrix is obtained, it is quantized based on the quantization zero point in the matrix quantization information in the same manner as the weight, to obtain the binary input matrix. For convenience of explanation, taking a floating-point two-dimensional input matrix and a quantization zero point of 0 as an example, after quantization with the matrix quantization information, elements greater than zero become 1 and the remaining elements become 0, yielding the binary input matrix.
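As an illustrative aid, the following sketch unfolds an input in the Im2Col manner described above and binarizes it against the quantization zero point; the function name, the height-width-channel memory layout and the example sizes are assumptions, not part of the patent.

import numpy as np

def im2col_binarize(x, kh, kw, stride, zero_point=0.0):
    # Unfold an IH x IW x IC input into a two-dimensional matrix with one row per
    # convolution window and KH*KW*IC columns, then quantize (greater than the
    # zero point -> 1, otherwise -> 0).
    ih_in, iw_in, ic = x.shape
    oh = (ih_in - kh) // stride + 1            # floor((IH - KH) / S) + 1
    ow = (iw_in - kw) // stride + 1            # floor((IW - KW) / S) + 1
    cols = np.empty((oh * ow, kh * kw * ic), dtype=x.dtype)
    for r in range(oh):
        for c in range(ow):
            patch = x[r*stride:r*stride+kh, c*stride:c*stride+kw, :]
            cols[r * ow + c] = patch.reshape(-1)
    return (cols > zero_point).astype(np.uint8)

# Example: a 28 x 28 x 1 input, a 5 x 5 kernel and stride 2 give a 144 x 25 binary matrix.
x = np.random.randn(28, 28, 1).astype(np.float32)
bin_x = im2col_binarize(x, kh=5, kw=5, stride=2)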
S104, controlling the reasoning calculation of the input matrix and the weight matrix in the neural network calculation graph according to the reasoning flow direction until all calculation nodes of the neural network calculation graph are traversed, and obtaining output data.
If a calculation node of the neural network calculation graph is a convolution calculation node, the input data needs to be expanded into a binary input matrix as described above; if it is another type of calculation node, the input data is kept unchanged.
Further, in the reasoning calculation of the convolution calculation node among the plurality of calculation nodes, a logical XNOR operation is performed on the sign bits of the node input matrix and the sign bits of the node weight matrix belonging to the convolution calculation node to obtain a first output matrix; a logical AND operation is performed cyclically in the first output matrix according to the set cyclic order to obtain a second output matrix; and the second output matrix is converted into a node output matrix based on the output dimensions belonging to the convolution calculation node, to serve as the node input matrix of the next calculation node. It is noted that the sign bit of the node input matrix and the sign bit of the node weight matrix are both the first binary digit at the corresponding position. In actual operation, the node input matrix and the node weight matrix may also be used directly to perform the logical XNOR operation: since the node input matrix is binary, its corresponding input sign-bit matrix is identical to it, so the node input matrix and the node weight matrix can be used directly in the logical XNOR operation.
In the implementation process of executing the logical XNOR operation, the sign bits of the node input matrix are extracted to obtain an input sign-bit matrix; the sign bits of the node weight matrix are extracted to obtain a weight sign-bit matrix; and a first output matrix is initialized according to the dimension information of the input sign-bit matrix and the weight sign-bit matrix. The first output matrix is an all-zero matrix whose dimension is equal to the dimension obtained by multiplying the weight sign-bit matrix and the input sign-bit matrix. For example, if the weight sign-bit matrix is M×K and the input sign-bit matrix is K×N, the first output matrix is initialized to an all-zero M×N matrix. A logical XNOR operation is then performed on the corresponding positions of the input sign-bit matrix and the weight sign-bit matrix to obtain the first output matrix, the corresponding positions being determined according to the matrix multiplication rule.
For example, a11 ⊙ b11 is computed by the logical XNOR operation, and the other element pairs are handled similarly. Furthermore, since the input sign-bit matrix and the weight sign-bit matrix are matrices actually composed of the binary digits 0 and 1, the first output matrix is obtained according to the XNOR operation rule that identical bits give 1 and different bits give 0. The first output matrix is also a binary output matrix.
Referring to FIG. 2, a logic diagram for implementing the logical XNOR operation is shown. When the corresponding positions of the input sign-bit matrix and the weight sign-bit matrix are subjected to the logical XNOR operation, a first output matrix of M×K×N is obtained.
Taking specific binary values of "0" and "1" as an example, the weight sign-bit matrix is 4×3 and the input sign-bit matrix is 3×4, so a first output matrix of 4×4 is obtained by matrix multiplication. The values in brackets () need to be XNOR'ed, and the corresponding positions are filled after the logical XNOR operations are performed, so as to obtain the final 4×4 first output matrix.
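To make the XNOR stage concrete, the following sketch builds a first output matrix for the 4×3 weight and 3×4 input sign-bit matrices mentioned above; storing the result as an M×N×K array is an implementation assumption, and the example bit values are arbitrary rather than those of FIG. 2.

import numpy as np

def xnor_first_output(w_bits, x_bits):
    # w_bits: M x K weight sign-bit matrix; x_bits: K x N input sign-bit matrix.
    # For every output position (i, j) the K element-wise XNOR results are kept
    # instead of being multiply-accumulated; the result is an M x N x K array.
    m, k = w_bits.shape
    _, n = x_bits.shape
    first = np.zeros((m, n, k), dtype=np.uint8)
    for i in range(m):
        for j in range(n):
            # XNOR: 1 when the two bits are the same, 0 when they differ
            first[i, j] = 1 - (w_bits[i, :] ^ x_bits[:, j])
    return first

# Example with the 4 x 3 and 3 x 4 shapes used above (bit values chosen arbitrarily).
w_bits = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=np.uint8)
x_bits = np.array([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]], dtype=np.uint8)
first = xnor_first_output(w_bits, x_bits)      # dimension (4, 4, 3)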
Further, a logical AND operation is performed in the first output matrix. In the operation process, the first output matrix is traversed row by row, each row is divided into N groups with K elements in each group, and the logical AND operation is performed cyclically within the K elements of each group according to the set cyclic order to obtain an intermediate output matrix; N is a positive integer not less than 2 and is the number of groups in each row, and K is a positive integer satisfying 1 ≤ K < N. Character statistics are then performed on the intermediate output matrix, and the second output matrix is generated based on the character statistics result: the number of characters '1' contained at the corresponding position of the intermediate output matrix is counted and filled into the corresponding position of the second output matrix.
Referring to FIG. 3A, a logic diagram of the logical AND operation is shown. In FIG. 3A, the dimensions of the first output matrix are M×K×N; each row is divided into N groups, each group contains K elements, and the logical AND operation is executed within the K elements. After division, M pieces of data are obtained, each containing N groups of K elements. In the process of executing the logical AND operation, according to the rule that the result is 1 only when both inputs are 1 and is 0 otherwise, the elements are combined pairwise; the first execution of the logical AND operation is denoted as a 1-order operation, and the result at this point is N groups of K/2 elements each. Illustratively, if the K elements are c0, c1, c2, c3, performing the logical AND operation pairwise gives c_1 = c0 & c1 and c_2 = c2 & c3. Assuming the set cyclic order is 1, the elements of each group are filled into the corresponding positions of the intermediate output matrix, e.g. r1 = (c_1, c_2); the number of characters '1' contained at the corresponding position of the intermediate output matrix is counted and filled into the corresponding position of the second output matrix, i.e. the count of '1's in r1 = (c_1, c_2) is written into the second output matrix. If the set cyclic order is 2, the logical AND operation is performed again to obtain r1 = c_1 & c_2, which is filled into the corresponding position of the intermediate output matrix; the number of characters '1' contained at the corresponding position of the intermediate output matrix is then counted and filled into the corresponding position of the second output matrix.
It is noted that the number K of elements may be odd or even. In either case, the logical AND operation is first performed pairwise in groups of two; when the set cyclic order has not yet been reached, the remaining single element of the K elements that did not participate in the logical AND operation after grouping directly participates in the next round of the logical AND operation; when the set cyclic order has been reached, the remaining single element that did not participate in the logical AND operation after grouping is filled directly into the corresponding position of the intermediate output matrix. Referring to FIG. 3B, a logic schematic diagram of the logical AND operation on K elements is shown.
Referring to the specific numerical calculation in FIG. 3C, the dimensions of the first output matrix are 4×3×4; when grouping, each row is divided into 4 groups, and the 3 elements in each group perform the logical AND operation. If the cyclic order is set to 1, after the first logical AND operation is executed, the elements of each group are filled into the corresponding positions of the intermediate output matrix, the number of characters '1' contained at the corresponding positions of the intermediate output matrix is counted, and the count is filled into the corresponding positions of the second output matrix. If the cyclic order is set to 2, after the first logical AND operation is executed, a second logical AND operation is executed, the elements of each group are filled into the corresponding positions of the intermediate output matrix, the number of characters '1' contained at the corresponding positions is counted, and the count is filled into the corresponding positions of the second output matrix.
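The cyclic AND and character-statistics stage can be sketched as follows; this is an illustrative reading of the steps above, with the carry-over of an unpaired element handled as described in connection with FIG. 3B, and it reuses the 'first' array from the previous sketch.

import numpy as np

def and_reduce_popcount(first, order=1):
    # first: M x N x K binary array from the XNOR stage.  Within each group of K
    # elements, pairs are combined with a logical AND 'order' times; an unpaired
    # element is carried into the next round unchanged.  The number of characters
    # '1' that remain becomes the entry of the M x N second output matrix.
    m, n, _ = first.shape
    second = np.zeros((m, n), dtype=np.int32)
    for i in range(m):
        for j in range(n):
            group = list(first[i, j])
            for _ in range(order):
                reduced = [group[p] & group[p + 1] for p in range(0, len(group) - 1, 2)]
                if len(group) % 2 == 1:        # unpaired element joins the next step directly
                    reduced.append(group[-1])
                group = reduced
            second[i, j] = int(sum(group))     # count the characters '1'
    return second

second = and_reduce_popcount(first, order=1)   # 'first' comes from the XNOR sketch above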
The above is the processing logic of the convolution calculation node. After the neural network calculation graph has been traversed to obtain the output data, the output data is converted into a target output matrix according to the output dimension information read from the system configuration file. For example, if the output dimension information is [batch size B, channel number C, height H, width W], the output data is converted accordingly to obtain the final target output matrix. Taking the second output matrix in FIG. 3B as an example, it can be reshaped into a target output matrix of [1,4,2,2], and the reasoning result is obtained and output by looking up a table.
It is noted that the most common operation in neural networks is the multiply-accumulate operation, i.e. summing the products of elements. After the elements of the binary network are multiplied, the outputs are only 0 and 1, and summing these 0/1 results is equivalent to counting the number of characters '1'.
When classifying, what is output is the index of the extreme value (generally the maximum value) of the output vector rather than the specific value of the operation. Therefore, after the logical XNOR operation and the logical AND operation are executed, there is no need to convert back to floating-point numbers; it is only necessary to count the characters '1', reshape the target output matrix, and obtain the index of the corresponding statistical result by table lookup. The table lookup refers to querying, from the classification dictionary, the tag value corresponding to the data processing result obtained after counting the characters '1'.
Exemplarily, the table structure of the classification dictionary according to the present application is as follows:
{
1: tag value 1,
2: tag value 2,
……
}
Wherein the tag value (classification result) is related to, i.e. provided by, a specific dataset. For example, for MNIST, the table is:
{
0:0,
1:1,
……,
9:9,
}
The corresponding tag value is obtained by looking up the table with the index value.
For example: the output target output matrix is: [0,0,0,5,3,4,7,4,1,1] the maximum value is 7, the subscript value of 7 is 6, and the table look-up knows that its reasoning result is the number 6.
The method disclosed by the technical scheme can be applied to low-profile equipment for calculation, and can also be used as a verification method for verifying the calculation result of the binary neural network.
Further, in order to verify the feasibility of the present technical solution, an embodiment is provided: LeNet is modified to obtain a Bin_Mnist_Mixed_Big network for an MNIST (handwritten digit recognition) test.
The Bin_Mnist_Mixed_Big network has 47.746K parameters and 158.368K multiply-accumulate operations, and its structure is shown in Table 1. The network consists of 2 convolution units (Conv1, Conv2) with a convolution kernel size of 5×5 and a stride of 2, 1 fully connected layer and 1 binary fully connected unit; ReLU is used as the activation function.
Table 1 Bin_Mnist_Mixed_Big network structure table
The Bin_Mnist_Mixed_Big network is trained with mixed precision: the backbone of the network uses 32-bit single-precision floating-point weights, and the output layer (FC2 layer) uses 1-bit (0 and 1) weights.
During inference, the first 3 layers of the network, namely Conv1, Conv2 and FC1, are deployed with ONNX Runtime, and verification is carried out on the output layer. The selected test set comprises 10,000 handwritten digit images, the specific composition of which is as follows:
Table 2 MNIST official test set image distribution statistics table
The test results are shown in Table 3; the first-order simplified reasoning error is 0.65%, which is below 1%.
Table 3 simplified matrix multiplication test results table
Note that:
1) PyTorch refers to tests performed under the PyTorch framework;
2) NumPy refers to FC2 using NumPy operations;
3) MAC refers to FC2 operating using multiply-accumulate;
4) Simple_mac refers to FC2 using the proposed simplified binary matrix computation method.
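For context, a hedged sketch of how such a hybrid verification could be wired up is given below: the floating-point backbone (Conv1, Conv2, FC1) runs under ONNX Runtime while FC2 is replaced by the simplified binary computation, reusing the xnor_first_output and and_reduce_popcount sketches shown earlier. The model file name, weight file, input name and data layout are assumptions and are not taken from the patent.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("bin_mnist_mixed_big_backbone.onnx")   # hypothetical exported backbone
input_name = session.get_inputs()[0].name
fc2_bits = np.load("fc2_binary_weights.npy")            # hypothetical 10 x D binary FC2 weight matrix

def classify(image):
    features = session.run(None, {input_name: image})[0]        # Conv1 -> Conv2 -> FC1
    x_bits = (features.reshape(-1, 1) > 0).astype(np.uint8)     # binarized FC1 output, D x 1
    first = xnor_first_output(fc2_bits, x_bits)                 # sketch from the description above
    scores = and_reduce_popcount(first, order=1)                # 10 x 1 popcount scores
    return int(np.argmax(scores))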
By applying the technical scheme multiple times, the amount of addition operations can be further reduced. Table 4 shows the high-order test results of the proposed algorithm.
Table 4 high order simplified matrix multiplication test result table
In a second aspect, based on the same inventive concept as the simplified reasoning method of the binary neural network provided in the foregoing first aspect embodiment, the present specification embodiment further provides a simplified reasoning system of the binary neural network, referring to fig. 4, where the system includes:
A reading module 401, configured to read the neural network computation graph and the weight related data from the system configuration file of the neural network; the neural network computation graph comprises a plurality of computation nodes and a reasoning flow direction among the nodes, and the weight related data comprises weight values and matrix quantization information;
A first quantization module 402, configured to quantize the weights into a binary weight matrix based on the matrix quantization information and load the binary weight matrix into corresponding computing nodes in the neural network computation graph;
a second quantization module 403, configured to quantize the input data into a binary input matrix based on the matrix quantization information;
The inference calculation module 404 is configured to control the inference calculation of the input matrix and the weight matrix in the neural network calculation graph according to the inference flow direction until all calculation nodes of the neural network calculation graph are traversed, so as to obtain output data;
In the reasoning calculation of the convolution calculation nodes in the plurality of calculation nodes, performing logical exclusive nor operation on the sign bit of the node input matrix and the sign bit of the node weight matrix which belong to the convolution calculation nodes to obtain a first output matrix; performing logic AND operation circularly in the first output matrix according to a set circulation order to obtain a second output matrix; the second output matrix is converted into a node output matrix based on output dimensions belonging to the convolution computing node to serve as the node input matrix of the next computing node.
It should be noted that, in the simplified reasoning system of the binary neural network provided in the embodiment of the present specification, the specific manner in which each module performs the operation has been described in detail in the method embodiment provided in the first aspect, and the specific implementation process may refer to the method embodiment provided in the first aspect, which will not be described in detail herein.
In a third aspect, based on the same inventive concept as the simplified reasoning method of the binary neural network provided by the embodiment of the first aspect, the embodiment of the present invention further discloses a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of any of the methods described above.
In a fourth aspect, based on the same inventive concept as the simplified reasoning method of the binary neural network provided in the foregoing first aspect, an embodiment of the present invention further discloses a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the foregoing methods when executing the program.
Through one or more embodiments of the present invention, the present invention has the following benefits or advantages:
In the technical scheme of the invention, the neural network calculation graph and the weight related data are read from the system configuration file of the neural network, and the input data and the corresponding weights are converted into binary matrices. The inference calculation of the input matrix and the weight matrix in the neural network calculation graph is controlled according to the inference flow direction. In particular, in the reasoning calculation of the convolution calculation node, the multiplication operation is eliminated by the logical XNOR operation, and the addition operation is eliminated by the logical AND operation combined with the character statistics method.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in a gateway, proxy server, system according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.

Claims (10)

1. A simplified reasoning method of a binary neural network, the method comprising:
reading a neural network calculation graph and weight related data from a system configuration file of the neural network; the neural network computation graph comprises a plurality of computation nodes and a reasoning flow direction among the nodes, and the weight related data comprises weight values and matrix quantization information;
Quantizing the weight into a binary weight matrix based on the matrix quantization information and loading the binary weight matrix into a corresponding computing node in the neural network computing graph;
quantizing input data into a binary input matrix based on the matrix quantization information;
controlling the reasoning calculation of the input matrix and the weight matrix in the neural network calculation graph according to the reasoning flow direction until all calculation nodes of the neural network calculation graph are traversed, and obtaining output data;
In the reasoning calculation of the convolution calculation nodes in the plurality of calculation nodes, performing logical exclusive nor operation on the sign bit of the node input matrix and the sign bit of the node weight matrix which belong to the convolution calculation nodes to obtain a first output matrix; performing logic AND operation circularly in the first output matrix according to a set circulation order to obtain a second output matrix; the second output matrix is converted into a node output matrix based on output dimensions belonging to the convolution computing node to serve as the node input matrix of the next computing node.
2. The method of claim 1, wherein the weights are four-dimensional tensors of: the number, the number of channels, the height and the width; the quantizing the weight into a binary weight matrix based on the matrix quantizing information specifically includes:
Converting the weight from the four-dimensional tensor to a two-dimensional weight matrix;
and quantizing the two-dimensional weight matrix based on quantization zero points in the matrix quantization information to obtain the weight matrix.
3. The method of claim 1 or 2, wherein the dimensions of the input data are: IH×IW×IC; wherein IH represents the height of the input data, IW represents the width of the input data, and IC represents the number of channels of the input data; the method for quantizing the input data into a binary input matrix based on the matrix quantization information specifically comprises the following steps:
expanding the input data into a two-dimensional input matrix of (ih×iw) × (KH×KW×IC) according to the convolution calculation characteristics, where ih = ⌊(IH−KH)/S⌋ + 1 and iw = ⌊(IW−KW)/S⌋ + 1; KH denotes the height of the convolution kernel, KW denotes the width of the convolution kernel, S denotes the convolution stride, and ⌊·⌋ denotes rounding down;
And quantizing the two-dimensional input matrix based on quantization zero points in the matrix quantization information to obtain the input matrix.
4. The method of claim 1, wherein performing a logical XNOR operation on the sign bits of the node input matrix and the sign bits of the node weight matrix belonging to the convolution computing node to obtain a first output matrix specifically comprises:
Extracting the sign bit of the node input matrix to obtain an input sign bit matrix;
extracting the sign bit of the node weight matrix to obtain a weight sign bit matrix;
and executing a logical exclusive NOR operation on the corresponding positions of the input sign-bit matrix and the weight sign-bit matrix to obtain the first output matrix.
5. The method according to claim 1 or 4, wherein performing a logical and operation in the first output matrix according to a set cyclic order to obtain a second output matrix, specifically includes:
Traversing the first output matrix row by row, dividing each row into N groups with K elements in each group, and cyclically performing a logical AND operation within the K elements of each group in the first output matrix according to the set cyclic order to obtain an intermediate output matrix; N is a positive integer not less than 2, and K is a positive integer satisfying 1 ≤ K < N;
and carrying out character statistics on the intermediate output matrix, and generating the second output matrix based on a character statistics result.
6. The method of claim 5, wherein the performing character statistics on the intermediate output matrix and generating the second output matrix based on the character statistics comprises:
and counting the number of characters '1' contained in the corresponding position of the intermediate output matrix, and filling the number into the corresponding position of the second output matrix.
7. The method of claim 1, wherein after the obtaining the output data, the method further comprises:
and converting the output data into a target output matrix according to the output dimension information read from the system configuration file.
8. A simplified reasoning system of a binary neural network, the system comprising:
The reading module is used for reading the neural network calculation graph and the weight related data from the system configuration file of the neural network; the neural network computation graph comprises a plurality of computation nodes and a reasoning flow direction among the nodes, and the weight related data comprises weight values and matrix quantization information;
The first quantization module is used for quantizing the weight into a binary weight matrix based on the matrix quantization information and loading the binary weight matrix into a corresponding computing node in the neural network computing graph;
A second quantization module for quantizing input data into a binary input matrix based on the matrix quantization information;
The inference calculation module is used for controlling the inference calculation of the input matrix and the weight matrix in the neural network calculation graph according to the inference flow direction until all calculation nodes of the neural network calculation graph are traversed, so as to obtain output data;
In the reasoning calculation of the convolution calculation nodes in the plurality of calculation nodes, performing logical exclusive nor operation on the sign bit of the node input matrix and the sign bit of the node weight matrix which belong to the convolution calculation nodes to obtain a first output matrix; performing logic AND operation circularly in the first output matrix according to a set circulation order to obtain a second output matrix; the second output matrix is converted into a node output matrix based on output dimensions belonging to the convolution computing node to serve as the node input matrix of the next computing node.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the program is executed by the processor.
CN202410170334.7A 2024-02-06 2024-02-06 Simplified reasoning method, system, medium and device of binary neural network Pending CN118052283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410170334.7A CN118052283A (en) 2024-02-06 2024-02-06 Simplified reasoning method, system, medium and device of binary neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410170334.7A CN118052283A (en) 2024-02-06 2024-02-06 Simplified reasoning method, system, medium and device of binary neural network

Publications (1)

Publication Number Publication Date
CN118052283A 2024-05-17

Family

ID=91049619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410170334.7A Pending CN118052283A (en) 2024-02-06 2024-02-06 Simplified reasoning method, system, medium and device of binary neural network

Country Status (1)

Country Link
CN (1) CN118052283A (en)

Similar Documents

Publication Publication Date Title
US11727276B2 (en) Processing method and accelerating device
US9529590B2 (en) Processor for large graph algorithm computations and matrix operations
CN112292816B (en) Processing core data compression and storage system
CN110163359B (en) Computing device and method
CN112200300B (en) Convolutional neural network operation method and device
CN110163334B (en) Integrated circuit chip device and related product
US10884736B1 (en) Method and apparatus for a low energy programmable vector processing unit for neural networks backend processing
CN113723589A (en) Hybrid precision neural network
CN110738317A (en) FPGA-based deformable convolution network operation method, device and system
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
US11748100B2 (en) Processing in memory methods for convolutional operations
KR20200043617A (en) Artificial neural network module and scheduling method thereof for highly effective operation processing
WO2020256836A1 (en) Sparse convolutional neural network
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
WO2023115814A1 (en) Fpga hardware architecture, data processing method therefor and storage medium
CN118052283A (en) Simplified reasoning method, system, medium and device of binary neural network
US20220253709A1 (en) Compressing a Set of Coefficients for Subsequent Use in a Neural Network
KR102372869B1 (en) Matrix operator and matrix operation method for artificial neural network
US20230012127A1 (en) Neural network acceleration
CN115115044A (en) Configurable sparse convolution hardware acceleration method and system based on channel fusion
WO2023165290A1 (en) Data processing method and apparatus, and electronic device and storage medium
TWI798591B (en) Convolutional neural network operation method and device
US20230259579A1 (en) Runtime predictors for computation reduction in dependent computations
CN114692847B (en) Data processing circuit, data processing method and related products
US20230229917A1 (en) Hybrid multipy-accumulation operation with compressed weights

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination