CN112199072B - Data processing method, device and equipment based on neural network layer - Google Patents

Data processing method, device and equipment based on neural network layer

Info

Publication number
CN112199072B
CN112199072B
Authority
CN
China
Prior art keywords
preset
data
format
linear
logarithmic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011229722.6A
Other languages
Chinese (zh)
Other versions
CN112199072A (en)
Inventor
张建杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011229722.6A
Publication of CN112199072A
Application granted
Publication of CN112199072B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50Adding; Subtracting
    • G06F7/501Half or full adders, i.e. basic adder cells for one denomination
    • G06F7/5013Half or full adders, i.e. basic adder cells for one denomination using algebraic addition of the input signals, e.g. Kirchhoff adders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An embodiment of the invention provides a data processing method, apparatus and device based on a neural network layer. In a first aspect, input data of a neural network target layer is converted from a preset first linear format into a preset logarithmic format, and a logarithmic domain operator in the target layer operates on the input data in the preset logarithmic format to obtain first operated data; the scheme thus converts operations on linear domain data into operations on logarithmic domain data, that is, operations using a multiplier into operations using an adder, so that the operation amount of the neural network is reduced. In a second aspect, the first operated data is converted from the preset logarithmic format into a preset second linear format, and a linear domain operator in the target layer operates on the first operated data in the preset second linear format to obtain second operated data; the scheme thus converts the logarithmic domain data back into linear domain data, satisfying the requirement of operating on linear domain data.

Description

Data processing method, device and equipment based on neural network layer
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a data processing method, apparatus, and device based on a neural network layer.
Background
Machine learning has been widely used in various fields such as image classification, object detection, natural language processing, and the like. Machine learning is understood to mean training a neural network using sample data, i.e., iteratively adjusting network parameters in the neural network, and the trained neural network may be used for image classification, object detection, natural language processing, and the like.
As data volumes grow and algorithms become more complex, the operation amount of neural networks also grows, which raises the performance requirements, and therefore the cost, of the hardware on which neural networks are deployed. A scheme for reducing the operation amount of the neural network is therefore needed.
Disclosure of Invention
The embodiment of the invention aims to provide a data processing method, device and equipment based on a neural network layer so as to reduce the operation amount of the neural network.
To achieve the above objective, an embodiment of the present invention provides a data processing method based on a neural network layer, including:
Acquiring input data of a neural network target layer; if the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features;
converting the input data from a preset first linear format to a preset logarithmic format;
operating on the input data in the preset logarithmic format by using a logarithmic domain operator in the target layer to obtain first operated data;
converting the first operated data from the preset logarithmic format into a preset second linear format;
and operating on the first operated data in the preset second linear format by using a linear domain operator in the target layer to obtain second operated data.
Optionally, the preset logarithmic format includes: an exponent part and a fractional part; the converting the input data from a preset first linear format to a preset logarithmic format includes:
counting the number of leading 0 bits of the input data;
determining the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits;
determining the fractional part of the input data in the preset first linear format based on the number of leading 0 bits;
mapping the fractional part in the preset first linear format to the fractional part in the preset logarithmic format based on a preset first mapping relation;
and splicing the exponent part in the preset logarithmic format with the fractional part in the preset logarithmic format to obtain the input data in the preset logarithmic format.
Optionally, the converting the first operated-on data from the preset logarithmic format to a preset second linear format includes:
splitting the first operated data in the preset logarithmic format into an exponent part and a fractional part;
mapping, based on a preset second mapping relation, the fractional part of the split first operated data in the preset logarithmic format to the fractional part in the preset second linear format;
and obtaining the first operated data in the preset second linear format by shifting the fractional part in the preset second linear format to the left, where the number of bits shifted left is the same as the value of the exponent part of the first operated data in the preset logarithmic format.
Optionally, after the operating on the first operated data in the preset second linear format by using the linear domain operator in the target layer to obtain the second operated data, the method further includes:
truncating the second operated data in the preset second linear format to obtain the second operated data in the preset first linear format.
Optionally, the operation unit in the target layer includes a matrix multiplication unit, the log domain operator is a log domain adder in the matrix multiplication unit, and the linear domain operator is a linear domain accumulator in the matrix multiplication unit.
Optionally, after the obtaining the input data of the target layer of the neural network, the method further includes:
caching the input data into a caching module, wherein the caching module also caches network parameters of the target layer, and the network parameters are in a preset logarithmic format;
the converting the input data from a preset first linear format to a preset logarithmic format includes:
converting the input data cached in the caching module from a preset first linear format to a preset logarithmic format;
the operating on the input data in the preset logarithmic format by using the logarithmic domain operator in the target layer to obtain the first operated data includes:
operating, by using the logarithmic domain operator in the target layer, on the input data in the preset logarithmic format and the network parameters cached in the caching module to obtain the first operated data.
To achieve the above object, an embodiment of the present invention further provides a data processing device based on a neural network layer, including:
the acquisition module is used for acquiring input data of a neural network target layer; if the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features;
the first conversion module is used for converting the input data from a preset first linear format into a preset logarithmic format;
the first operation module is used for operating the input data in the preset logarithmic format by utilizing a logarithmic domain operator in the target layer to obtain first operated data;
the second conversion module is used for converting the first operated data from the preset logarithmic format into a preset second linear format;
and the second operation module is used for operating on the first operated data in the preset second linear format by using the linear domain operator in the target layer to obtain second operated data.
Optionally, the preset logarithmic format includes: an exponential portion and a fractional portion; the first conversion module includes:
a statistics sub-module, configured to count the number of leading 0 bits of the input data;
a first determining sub-module, configured to determine the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits;
a second determining sub-module, configured to determine the fractional part of the input data in the preset first linear format based on the number of leading 0 bits;
a first mapping sub-module, configured to map the fractional part in the preset first linear format to the fractional part in the preset logarithmic format based on a preset first mapping relation;
and a splicing sub-module, configured to splice the exponent part in the preset logarithmic format with the fractional part in the preset logarithmic format to obtain the input data in the preset logarithmic format.
Optionally, the second conversion module includes:
a splitting sub-module, configured to split the first operated data in the preset logarithmic format into an exponent part and a fractional part;
a second mapping sub-module, configured to map, based on a preset second mapping relation, the fractional part of the split first operated data in the preset logarithmic format to the fractional part in the preset second linear format;
and a shifting sub-module, configured to obtain the first operated data in the preset second linear format by shifting the fractional part in the preset second linear format to the left, where the number of bits shifted left is the same as the value of the exponent part of the first operated data in the preset logarithmic format.
Optionally, the apparatus further includes:
and a truncating module, configured to truncate the second operated data in the preset second linear format to obtain the second operated data in the preset first linear format.
Optionally, the operation unit in the target layer includes a matrix multiplication unit, the log domain operator is a log domain adder in the matrix multiplication unit, and the linear domain operator is a linear domain accumulator in the matrix multiplication unit.
Optionally, the apparatus further includes:
the caching module, configured to cache the input data and the network parameters of the target layer, where the network parameters are in a preset logarithmic format;
the first conversion module is specifically configured to: converting the input data cached in the caching module from a preset first linear format to a preset logarithmic format;
the first operation module is specifically configured to: and operating the input data in the preset logarithmic format and the network parameters cached in the caching module by utilizing a logarithmic domain operator in the target layer to obtain first operated data.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor and a memory;
a memory for storing a computer program;
and the processor is used for realizing any data processing method based on the neural network layer when executing the program stored in the memory.
According to the embodiment of the invention, in a first aspect, input data of the neural network target layer is converted from a preset first linear format into a preset logarithmic format, and a logarithmic domain operator in the target layer operates on the input data in the preset logarithmic format to obtain first operated data; the scheme thus converts operations on linear domain data into operations on logarithmic domain data, that is, operations using a multiplier into operations using an adder, so that the operation amount of the neural network is reduced. In a second aspect, the first operated data is converted from the preset logarithmic format into a preset second linear format, and a linear domain operator in the target layer operates on the first operated data in the preset second linear format to obtain second operated data; the scheme thus converts the logarithmic domain data back into linear domain data, satisfying the requirement of operating on linear domain data.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of a linear format of data according to an embodiment of the present invention;
FIG. 1b is a schematic diagram of a logarithmic format of data according to an embodiment of the present invention;
FIG. 1c is a schematic structural diagram of a hybrid quantization module according to an embodiment of the present invention;
FIG. 1d is a schematic structural diagram of a low bit linear domain data to logarithmic domain data module according to an embodiment of the present invention;
FIG. 1e is a schematic diagram of a circuit structure for counting the number of leading 0s according to an embodiment of the present invention;
FIG. 1f is a schematic diagram of the process of counting the number of leading 0s using the circuit shown in FIG. 1e according to an embodiment of the present invention;
FIG. 1g is a schematic diagram of converting low bit linear domain data into logarithmic domain data according to an embodiment of the present invention;
FIG. 1h is a schematic structural diagram of a module for converting logarithmic domain data into high-precision linear domain data according to an embodiment of the present invention;
FIG. 1i is a schematic diagram of converting logarithmic domain data into high-precision linear domain data according to an embodiment of the present invention;
FIG. 1j is a schematic diagram of a circuit for implementing the accumulation operation according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a data processing method based on a neural network layer according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing device based on a neural network layer according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Term interpretation:
Deep neural network: DNN (Deep Neural Network), which can be understood as a neural network comprising multiple hidden layers.
Neural network quantization: as the depth of a neural network increases, the number of network nodes grows and the computation scale becomes very large, which is demanding on the hardware device. If a network with good performance is to be deployed on hardware with limited resources, the computation scale can be reduced by means of compression, coding and the like; quantization is one of the widely used compression methods.
Hybrid quantization: in this embodiment, this refers to a hybrid quantization method that combines linear quantization and logarithmic quantization.
Matrix multiplication unit: a multiply-accumulate calculation unit, the basic operation unit of the convolutional layers and fully connected layers of a deep neural network.
FeatureMap data: the output data of a given layer of the deep neural network, hereinafter abbreviated as Map data.
Weight data: the network parameters in the convolutional layers and fully connected layers of the deep neural network.
Bit: the binary bit is the smallest unit of information; in general, n bits of information can represent 2^n choices.
Rounding: for convenience in calculation, due to limitations of calculation tools, or based on other considerations, a number may be replaced by one with fewer digits. For numbers with many digits, the digits after a chosen significant digit are removed according to a certain rule, and the remainder is adjusted to be as close to the original value as possible; this process is called rounding.
The inventive concept of the present invention is as follows:
The embodiment of the invention aims to accelerate the hardware computation of convolutional layers, fully connected layers and the like in a deep neural network. A neural network is trained with floating-point numbers, but floating-point calculation consumes a large amount of resources in a hardware implementation, which is not conducive to deployment of the neural network. A common approach adopts a quantization process from floating-point numbers to linear numbers and implements the neural network computation in the linear domain, for example with linear int8/int32; in practical application the computation is actually completed by a matrix multiplication unit, and linear numbers such as int8/int32 still consume a large amount of hardware resources in the multiplication computation. This embodiment therefore adopts a hybrid quantization scheme: a data conversion part completes the conversion from linear domain data to logarithmic domain data, addition operations (equivalent to multiplication in the linear domain) are performed in the logarithmic domain, and a data conversion part converts the logarithmic domain result back to the linear domain.
In an embodiment of the present invention, the data format includes a linear format and a logarithmic format. The linear format may be as shown in fig. 1a, including a sign bit a_{n-1} and data bits a_{n-2} … a_k, a_{k-1} … a_1, a_0, where the sign bit indicates whether the data is positive or negative. The logarithmic format may be as shown in fig. 1b, including a sign bit b_{m-1}, exponent bits b_{m-2} … b_k, and fraction bits b_{k-1} … b_1, b_0; the sign bit indicates whether the data is a positive or negative value, the exponent bits represent the exponent part of the data, and the fraction bits represent the fraction part of the data. In the following, data in the linear format is referred to as linear domain data, and data in the logarithmic format is referred to as logarithmic domain data.
In the following, the linear format is further subdivided into a low bit linear format and a high-precision linear format, where the high-precision linear format has more fraction bits than the low bit linear format. The high-precision linear format may be converted to the low bit linear format by truncating or rounding the high-precision linear format data; the specific conversion method is not limited.
The logarithmic domain data can be expressed as {sign, exponent, fraction}, where sign represents the sign bit, exponent represents the exponent part, and fraction represents the fraction part. The fraction part differs from that of conventional floating-point format data: the fraction of conventional floating-point data is linear domain data, whereas the fraction here is in exponential form, equivalent to the result of taking the log2 logarithm of the exponential form of the linear domain data.
For example, assume a piece of signed linear domain data is sign·(2^exponent × 2^0.fraction). The process of converting this linear domain data into logarithmic domain data can be expressed as:

log_data = log2(sign·(2^exponent × 2^0.fraction)) = sign·(exponent + 0.fraction)

The process can be understood as a quantization process, and log_data is the quantized logarithmic domain data. The quantized exponent and fraction may be stored separately using the logarithmic format shown in fig. 1b.
The multiplication of linear domain data may be converted into the addition of logarithmic domain data. For example, assume two pieces of linear domain data x1 and x2 are to be multiplied, and x1 and x2 are converted into logarithmic domain data respectively:

x1 = s1·2^(e1 + 0.f1),  x2 = s2·2^(e2 + 0.f2)

where s1 and s2 denote the sign bits, e1 and e2 the exponent parts, and f1 and f2 the fraction parts respectively. The multiplication of x1 and x2 can then be expressed in the logarithmic domain as:

log2(x1 × x2) = (s1 nand s2)·(e1 + e2 + 0.(f1 + f2))

where nand represents a nand operation on the sign bits. It can be seen that the above procedure converts the multiplication of linear domain data into the addition of logarithmic domain data.
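As an illustration only (not part of the patent text), the multiply-as-add relationship above can be checked with a short floating-point sketch; the function names are assumptions, and Python floats stand in for the fixed-point bit fields of the hardware:

```python
import math

def to_log_domain(v: float):
    """Split v into (sign, exponent, fraction) with |v| = 2 ** (exponent + fraction)."""
    sign = -1.0 if v < 0 else 1.0
    log_v = math.log2(abs(v))
    exponent = math.floor(log_v)          # integer exponent part
    fraction = log_v - exponent           # 0 <= fraction < 1
    return sign, exponent, fraction

def log_domain_multiply(a, b):
    """Multiply two log-domain numbers using only additions plus sign logic."""
    s1, e1, f1 = a
    s2, e2, f2 = b
    # Exponents and fractions add; the patent's figure combines the sign
    # bits with a nand gate, which for +/-1.0 signs here is multiplication.
    return s1 * s2, e1 + e2, f1 + f2

x1, x2 = 6.0, -5.0
s, e, f = log_domain_multiply(to_log_domain(x1), to_log_domain(x2))
print(s * 2 ** (e + f))                   # ~= -30.0, matching x1 * x2
```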
Based on the above inventive concept, the main innovations of the embodiments of the present invention include:
1. A hybrid quantization matrix multiplication unit device applied to a deep neural network model is configured, operating in a hybrid quantization mode that combines linear quantization and logarithmic quantization. The hybrid quantization computation acceleration framework may be as shown in fig. 1c and specifically comprises: an input data buffer module 100 for buffering input linearly quantized data; a low bit linear domain data to logarithmic domain data module 200 for converting linear domain data into logarithmic domain data; an adder module 300 for adding logarithmic domain data; a logarithmic domain data to high-precision linear domain data module 400 for converting logarithmic domain data into linear domain data; a linear domain data accumulation module 500 for completing the matrix multiplication accumulation operation; a high-precision linear domain data to low bit linear domain data module 600 for truncating linear domain data; and an output data buffer module 700 for buffering output linear domain data. This hybrid quantization scheme can be seamlessly applied to hardware computation acceleration of linearly quantized deep neural networks.
2. The device adds a module for converting logarithmic domain data into linear domain data, namely the logarithmic domain data to high-precision linear domain data module 400 shown in fig. 1h, which comprises a truncating module 410 for generating the exponent part and the fraction part of the logarithmic domain data; a second lookup table module 420 for converting the logarithmic domain fraction part into a linear domain fraction part; and a second shift module 430 for generating the final linearly quantized data, so as to facilitate processing in subsequent linearly quantized network layers.
3. In the process of converting linear domain data into logarithmic domain data, the device counts leading 0s with a computation tree, namely the leading zero computation tree module 210 in the low bit linear domain data to logarithmic domain data module 200; and when the final logarithmic domain data is assembled, a data splicing mode is used instead of an addition circuit. Compared with some related schemes, this is simple to implement and saves hardware resources.
Referring to fig. 1c, the hybrid quantization module provided in the embodiment of the present invention may include: an input data buffer module 100, a low bit linear domain data to logarithmic domain data module 200, an adder module 300, a logarithmic domain data to high precision linear domain data module 400, a linear domain data accumulation module 500, a high precision linear domain data to low bit linear domain data module 600, and an output data buffer module 700.
The Weight data may be converted into a logarithmic format in advance and buffered to the input data buffer module 100.
Taking a certain level of the neural network as an example, the input data of the level may be output data of the previous level, that is, map data, and the input data of the level may be low-bit linear domain data, that is, data in a low-bit linear format. The low bit linear domain data is buffered to the input data buffer module 100. The low bit linear domain data to logarithmic domain data module 200 converts the buffered low bit linear domain data from a low bit linear format to a logarithmic format.
When the matrix multiplication unit processes the linear domain data, the matrix multiplication unit performs multiplication operation and then performs accumulation operation. In the scheme, the linear domain data are converted into the logarithmic domain data, and the multiplication operation is changed into the addition operation. Adder module 300 performs an addition operation on the log-format Map data and the Weight data to obtain first operated data.
The log domain data to high precision linear domain data module 400 converts the first operated data in logarithmic format into high-precision linear domain data. The accumulation operation of the matrix multiplication unit is performed in this scheme by the linear domain data accumulation module 500 on the high-precision linear domain data; the data obtained after the accumulation operation is referred to as second operated data and is in the high-precision linear format.
The high-precision linear domain data to low bit linear domain data module 600 converts the second operated data from the high-precision linear format to the low bit linear format to obtain low bit linear domain data; as described above, truncation or rounding may be used, and the specific conversion method is not limited. The converted data in the low bit linear format may be buffered in the output data buffer module 700, from which the next level of the neural network may obtain its low bit linear domain data, that is, Map data.
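To make the data flow concrete, the following behavioral sketch (an illustration, not the patent's hardware) shows how modules 200 through 500 cooperate on one inner product; to_log and the float arithmetic are stand-ins for the fixed-point circuits:

```python
import math

def to_log(v: float):
    """Float -> (sign, integer exponent, fractional exponent), as above."""
    s = -1.0 if v < 0 else 1.0
    e = math.floor(math.log2(abs(v)))
    return s, e, math.log2(abs(v)) - e

def hybrid_dot(map_data, log_weights):
    """Inner product in which every multiply is a log-domain add."""
    acc = 0.0                                    # linear-domain accumulator (module 500)
    for m, (s2, e2, f2) in zip(map_data, log_weights):
        s1, e1, f1 = to_log(m)                   # module 200: linear -> log
        s, e, f = s1 * s2, e1 + e2, f1 + f2      # module 300: adder replaces multiplier
        acc += s * 2 ** (e + f)                  # module 400: log -> linear, then accumulate
    return acc                                   # module 600 would truncate to a low bit format

map_data = [1.5, 2.0, 3.0]
log_weights = [to_log(w) for w in (4.0, -1.0, 2.0)]  # Weight data pre-converted to log format
print(hybrid_dot(map_data, log_weights))             # ~= 1.5*4 - 2.0 + 3*2 = 10.0
```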
The following describes each module in detail:
the input data buffer module 100:
In some related schemes, Map data and Weight data are stored in DRAM (Dynamic Random Access Memory), and the adder module 300 typically needs to obtain its data from SRAM (Static Random-Access Memory). In this embodiment, Map data and Weight data are cached in the input data buffer module 100, so that the adder module 300 can read the Map data and the Weight data from the input data buffer module 100 without frequent SRAM reads.
The low bit linear domain data to logarithmic domain data module 200:
Referring to fig. 1d, the low bit linear domain data to logarithmic domain data module 200 may include: a preamble 0 computation tree module 210, a first shift module 220, a first look-up table module 230, and a data stitching module 240.
The leading 0 computation tree module 210 counts the number of leading 0s of the input low bit linear domain data. The implementation circuit may have a tree structure; referring to fig. 1e, the uppermost layer of the tree groups the low bit linear domain data (excluding the sign bit) two bits at a time from the high-order end and counts the number of leading 0 bits of each group, i.e. {a_{n-2}, a_{n-3}}, {…}, {a_k, a_{k-1}}, {…}, {a_2, a_1}, a_0 in fig. 1e.
In some cases the bit width of the low bit linear domain data is even, so after the sign bit is removed the lowest bit a_0 forms a group on its own. The processing circuit for the lowest bit therefore differs from the others: if the lowest bit is 0, the leading 0 count of its group is 1; if the lowest bit is 1, the leading 0 count of its group is 0. Hence, as shown in fig. 1e, the processing circuit for the lowest bit a_0 is an inverter.
The processing circuit for each other two-bit group, which may be denoted add-mux-1, comprises an addition circuit add and a data selection circuit mux, and determines whether the counted number of leading 0 bits is valid according to left (the left-side, high-order bit). As shown at the upper right of fig. 1e, the first-layer processing circuit add-mux-1 inverts its input bits; left is the control signal of a two-input selector and controls whether the output lz_cnt is 0 or the result of adding the inverted left and right (the right-side, low-order bit). With an output lz_cnt of 2 bits: when left is 0, lz_cnt is the sum of the inverted left and right; when left is 1, lz_cnt is 0.
The next layer of the tree further processes the results of the previous layer. Its processing circuit, which may be denoted add-mux-2, also comprises an addition circuit add and a data selection circuit mux. As shown at the lower right of fig. 1e, add-mux-2 determines whether the output lz_cnt is the left value or the (left + right) value according to the highest bit of left (left[h] in fig. 1e): when left[h] is 1, the (left + right) value is output; when left[h] is 0, the left value is output.
Except for the first layer, the data bit width in every layer of the tree is even, so no separate processing circuit (like that of a_0 in the first layer) is needed. The leading 0 count lz_count of the low bit linear domain data is obtained through the tree-structured circuit shown in fig. 1e.
The specific procedure for counting the number of leading 0s with the circuit shown in fig. 1e is illustrated below. Referring to fig. 1f, assume the uppermost layer of the tree groups the low bit linear domain data 0001011 (excluding the sign bit) two bits at a time from the high-order end, giving the groups {0,0}, {0,1}, {0,1} and {1}, as shown in fig. 1f.
The processing circuit for the lowest-order group {1} is an inverter; since the lowest bit is 1, the leading 0 count of its group is 0. The results of the uppermost layer of the tree are each represented as two bits (2'b), so the leading 0 count of the lowest-order group is represented as 2'b00.
The processing circuits for the other groups are denoted add-mux-1; the specific operation of add-mux-1, comprising the addition circuit add and the data selection circuit mux, is described above and not repeated here. Through the add-mux-1 circuits, the leading 0 counts of {0,0}, {0,1} and {0,1} are 2'b10, 2'b01 and 2'b01 respectively.
The next layer of the tree further processes the results of the previous layer with the processing circuit add-mux-2, which also comprises an addition circuit add and a data selection circuit mux; its specific operation is described above and not repeated here.
The results of the second layer in fig. 1f are each represented as three bits (3'b). The first-layer results 2'b10 and 2'b01 are processed by one add-mux-2 circuit, giving 3'b011, and the first-layer results 2'b01 and 2'b00 are processed by the other add-mux-2 circuit, giving 3'b001.
The third layer in fig. 1f further processes the results of the second layer, and its result is represented as four bits (4'b). The second-layer outputs 3'b011 and 3'b001 are processed by one add-mux-2 circuit, giving 4'b0011, that is, 3 in decimal; the number of leading 0s of the low bit linear domain data 0001011 is thus counted as lz_count = 3.
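The walk-through above can be modeled behaviorally in a few lines of Python (an illustrative sketch of the tree reduction, not the circuit itself; the per-bit leaf layer is equivalent to the two-bit grouping in fig. 1e):

```python
def leading_zeros_tree(bits: str) -> int:
    """Count leading 0s of a bit string by pairwise tree reduction (cf. fig. 1e/1f).
    Each node carries (leading-0 count, group-is-all-zeros flag)."""
    layer = [(0, False) if b == '1' else (1, True) for b in bits]
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer), 2):
            if i + 1 == len(layer):              # odd element passes through, like a_0
                nxt.append(layer[i])
            else:
                (lc, lz), (rc, rz) = layer[i], layer[i + 1]
                # Add the right count only when the left group is all zeros,
                # mirroring the left[h] selection in add-mux-2.
                nxt.append((lc + rc if lz else lc, lz and rz))
        layer = nxt
    return layer[0][0]

print(leading_zeros_tree('0001011'))             # 3, matching the walk-through above
```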
The first shift module 220 left-shifts the input low bit linear domain data by the number of leading 0 bits, yielding the fractional part of the linear domain data. The fractional part of the linear domain data is then mapped to the fractional part of the logarithmic domain data by the first lookup table module 230; the mapping process may be expressed as 1.XXXXXX → 2^0.YYYYYY, where the input index of the first lookup table module 230 is the XXXXXX value of the linear domain data and the output result is the exponent fraction YYYYYY of 2. The conversion accuracy depends on the number of fraction bits of the exponent of 2: the greater the number of bits, the higher the conversion accuracy, and the more resources the first lookup table module 230 consumes accordingly.
In addition, the first shift module 220 takes the bit width of the data bits of the low bit linear domain data, minus the number of leading 0 bits, minus 1 (minus 1 because the most significant bit of the normalized data is 1), yielding the exponent part of the logarithmic domain data.
The data splicing module 240 splices the exponent part of the logarithmic domain data, the mapped fraction part YYYYYY of the logarithmic domain data and the sign bit to obtain data in the logarithmic format, i.e. logarithmic domain data. Overall, the low bit linear domain data to logarithmic domain data module 200 implements the conversion of low bit linear domain data from the linear format to the logarithmic format.
For example, referring to fig. 1g, assume the low bit linear domain data is 8'b0_0111100, where 8'b indicates a bit width of 8 bits and the most significant bit is the sign bit; the target logarithmic format may be represented as (1, 3, 6), where 1 is the bit width of the sign bit, 3 the bit width of the exponent part and 6 the bit width of the fraction part. The number of leading 0 bits is 1, and the first shift module 220 shifts the data bits 0111100 left by 1 bit, yielding the fractional part 1.111000 of the linear domain data. The fractional part 1.111000 of the linear domain data is then mapped to the fractional part 0.111000 of the logarithmic domain data by the first lookup table module 230. In addition, the first shift module 220 takes the bit width of the data bits of the low bit linear domain data, minus the number of leading 0 bits, minus 1 (minus 1 because the most significant bit of the normalized data is 1), that is, 7 − 1 − 1 = 5, yielding the exponent part 5 of the logarithmic domain data, whose binary form is 3'b101 (3'b denoting a bit width of 3 bits). The sign bit, the exponent part 101 and the fractional part 0.111000 of the logarithmic domain data are spliced to obtain the logarithmic domain data 10'b0_101_111000, where "_" denotes a separator between the sign bit, exponent bits and fraction bits, and 10'b denotes a bit width of 10 bits.
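A sketch of this conversion path (illustrative only; the rounded log2 used here stands in for the unpublished contents of the first lookup table module 230, so the low-order fraction bits may differ from the figure's walk-through):

```python
import math

def linear_to_log(data: int, data_bits: int = 7, frac_bits: int = 6):
    """Low bit linear data (1 sign bit + data_bits magnitude bits) ->
    (sign, exponent, fraction) in the logarithmic format, mirroring fig. 1g."""
    sign = (data >> data_bits) & 1
    mag = data & ((1 << data_bits) - 1)      # assumes mag != 0
    lz = data_bits - mag.bit_length()        # leading-0 count (module 210)
    exponent = data_bits - lz - 1            # minus 1: MSB of normalized data is 1
    linear_frac = mag / (1 << exponent)      # normalized value 1.XXXXXX in [1, 2)
    # Lookup-table step (module 230), modeled here as a rounded log2; the
    # patent does not publish its table, so low-order bits may differ.
    log_frac = round(math.log2(linear_frac) * (1 << frac_bits))
    return sign, exponent, log_frac

s, e, f = linear_to_log(0b0_0111100)
print(s, format(e, '03b'), format(f, '06b'))   # 0 101 111010 (exponent matches
                                               # fig. 1g; fraction depends on the table)
```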
Adder module 300:
From the foregoing, the addition of logarithmic domain data corresponds to the multiplication of linear domain data. In the above example,

log2(x1 × x2) = (s1 nand s2)·(e1 + e2 + 0.(f1 + f2))

and the adder module 300 completes the addition e1 + e2 + 0.(f1 + f2).
The logarithmic domain data to high precision linear domain data module 400:
referring to fig. 1h, the logarithmic domain data to high precision linear domain data module 400 may include: a truncating module 410, a second lookup table module 420, and a second shifting module 430.
The truncating module 410 truncates the input logarithmic domain data into an exponent part and a fraction part. The fraction part enters the second lookup table module 420, which maps the fraction part of the logarithmic domain data to the fraction part of the linear domain data; the mapping process may be expressed as 2^0.YYYYYY → 1.XXXXXX, where the input index of the second lookup table module 420 is YYYYYY and the output result is XXXXXX. The high precision of the conversion lies in that XXXXXX may have a large bit width, with the specific bit width determined according to the practical application. The second shift module 430 shifts the fraction part left; the number of shifted bits is the value of the exponent part of the logarithmic domain data, and high-precision linear domain data is obtained after the shift.
For example, referring to fig. 1i, it is assumed that the logarithmic domain data 11' b0_1010_001010, the bit width of which is 11 bits, may be expressed as (1, 4, 6), 1 represents the bit width of a sign bit, 4 represents the bit width of an exponent portion, and 6 represents the bit width of a fraction portion. The truncating module 410 truncates the log domain data into an exponent portion 10 (4 'b 1010) and a fraction portion of 6' b001010. The second lookup table module 420 maps the fractional part to 6' b000111, the second shift module 430 shifts the fractional part 6' b000111 exponent part to the left by the number of bits of the exponent part of the logarithmic domain data, i.e. 10 bits, which corresponds to adding 100 s after 6' b000111 and the previous bit of 000111 is 1 in a fixed notation format. Assuming that the high-precision linear domain data is 23-bit data, the resulting high-precision linear domain data is 23' b0_0000010001110000_000000, where "_represents a separator separating sign bits, exponent bits, and decimal places, and the decimal place of the high-precision linear domain data is 6-bit.
Linear domain data accumulation module 500:
As described above, when the matrix multiplication unit processes linear domain data, the multiplication operation is performed first and then the accumulation operation. In this scheme the accumulation operation is performed by the linear domain data accumulation module 500 on the high-precision linear domain data; the data obtained after the accumulation operation is referred to as second operated data and is in the high-precision linear format.
The implementation circuit of the accumulation operation may refer to fig. 1j. During accumulation, a register temporarily stores the input data, and accumulation proceeds continuously as data arrives. The accumulated value can become quite large, so the bit width of the accumulated data can be adjusted to meet the actual requirement; with an appropriate bit width for the accumulated data, the whole accumulation process incurs no precision loss.
The linear domain data accumulation module 500 completes the accumulation process of the high-precision linear domain data, and the overflow condition when the accumulation operation is actually performed is an important factor for determining the bit width of the high-precision linear domain data.
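A behavioral sketch of such an accumulator (illustrative only; acc_bits and the overflow check are assumptions standing in for the register sizing discussed above):

```python
class LinearAccumulator:
    """Register-plus-adder accumulator (cf. fig. 1j). acc_bits is chosen wide
    enough that the running sum never overflows, so no precision is lost."""
    def __init__(self, acc_bits: int = 32):
        self.acc_bits = acc_bits
        self.value = 0                        # the register of fig. 1j

    def add(self, x: int) -> int:
        self.value += x
        # In hardware this would be an overflow check on the register width.
        assert self.value.bit_length() < self.acc_bits, "acc_bits too narrow"
        return self.value

acc = LinearAccumulator()
for term in (72704, -1024, 4096):             # high-precision fixed-point terms
    acc.add(term)
print(acc.value)                              # 75776
```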
The high-precision linear domain data-to-low bit linear domain data module 600:
As described above, the high-precision linear domain data may be converted into low bit linear domain data by means of truncation or rounding; the specific conversion method is not limited. The module 600 clamps the accumulated high-precision, large-bit-width data so that the linear domain data input to and output from the whole hybrid quantization module have the same bit width, allowing the hybrid quantization module to be seamlessly embedded into a given layer of the deep neural network. The bit width of the low bit linear domain data can be determined according to actual requirements.
Output data buffer module 700:
the converted data in the low bit linear format may be buffered in the output data buffer module 700. The next level of neural network may obtain low bit linear domain data from the output data buffer module 700.
By applying the embodiment of the invention, in the first aspect, the operation on linear domain data is converted into an operation on logarithmic domain data, that is, an operation using a multiplier is converted into an operation using an adder, so that the operation amount of the neural network is reduced; in the second aspect, the input and output of the hybrid quantization module are both linear domain data, which satisfies the requirement of other levels of the neural network to operate on linear domain data; for example, levels such as the BN (Batch Normalization) layer, the ReLu (Rectified Linear Unit) layer and the pooling layer need to operate on linear domain data; in the third aspect, the data splicing module 240 splices the exponent part, the fraction part and the sign bit of the logarithmic domain data to obtain the logarithmic domain data, and compared with some related schemes in which an addition operation is used to convert linear domain data into logarithmic domain data, the direct splicing method saves computing resources.
Based on the same inventive concept, the embodiments of the present invention provide a data processing method, apparatus and device based on a neural network layer, where the method and apparatus may be applied to various electronic devices, and are not limited in particular. The data processing method based on the neural network layer will be described in detail first.
Fig. 2 is a schematic flow chart of a data processing method based on a neural network layer according to an embodiment of the present invention, including:
s201: input data of a target layer of the neural network is obtained. If the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features.
The neural network may be a deep neural network, or may also be a convolutional neural network, etc., and the specific neural network type is not limited. The target layer may be any layer in the neural network, which may be a convolution layer, a full connection layer, etc., and the specific hierarchy type is not limited. If the target layer is not the first layer of the neural network, the input data may be output data of a layer above the target layer, that is, map data. If the target layer is the first layer of the neural network, the input data may be data externally input into the neural network.
For example, the neural network may be a neural network for image processing, such as a neural network for image classification, a neural network for object detection, a neural network for object tracking, and the like, which are not listed one by one. In this case, if the target layer is the first layer of the neural network, the input data may be an image externally input into the neural network, which may be a single image or may be a continuous video frame image. If the target layer is not the first layer of the neural network, the input data may be output data of a layer above the target layer, that is, image features.
As another example, the neural network may be a neural network for audio processing, such as a neural network for audio classification, a neural network for audio recognition, and the like, to name but a few. In this case, if the target layer is the first layer of the neural network, the input data may be audio externally input into the neural network. If the target layer is not the first layer of the neural network, the input data may be output data of a layer above the target layer, i.e., audio features.
As another example, the neural network may be a neural network for text processing, such as a neural network for semantic recognition, a neural network for text detection, and the like, to name but a few. In this case, if the target layer is the first layer of the neural network, the input data may be text externally input into the neural network. If the target layer is not the first layer of the neural network, the input data may be output data of a layer above the target layer, that is, text features.
S202: the input data is converted from a preset first linear format to a preset logarithmic format.
The preset first linear format corresponds to the low bit linear format described above. The data in the preset logarithmic format may be as shown in fig. 1b, including a sign bit, exponent bits and fraction bits. Alternatively, the data in the preset logarithmic format may include only the exponent bits and the fraction bits.
In one embodiment, the preset logarithmic format includes: an exponent part and a fractional part; S202 may include: counting the number of leading 0 bits of the input data; determining the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits; determining the fractional part of the input data in the preset first linear format based on the number of leading 0 bits; mapping the fractional part in the preset first linear format to the fractional part in the preset logarithmic format based on a preset first mapping relation; and splicing the exponent part in the preset logarithmic format with the fractional part in the preset logarithmic format to obtain the input data in the preset logarithmic format.
For example, the bit width of the data bits of the input data, minus the number of leading 0 bits, minus 1 (minus 1 because the most significant bit of the normalized data is 1), gives the exponent part of the input data in the preset logarithmic format. The input data may be shifted left by the number of leading 0 bits to obtain the fractional part of the linear domain data. Assuming the preset first mapping relation is expressed as 1.XXXXXX → 2^0.YYYYYY, the fractional part XXXXXX of the linear domain data is mapped to the fractional part YYYYYY in the preset logarithmic format through the preset first mapping relation. The exponent part in the preset logarithmic format is then spliced with the fractional part in the preset logarithmic format to obtain the data in the preset logarithmic format.
If the input data further includes a sign bit, the sign bit may remain unchanged during the conversion of the input data from the preset first linear format to the preset logarithmic format. In the above embodiment, the sign bit, the exponent part in the preset logarithmic format and the fractional part in the preset logarithmic format may be spliced to obtain the data in the preset logarithmic format.
The logarithmic format is a data format, and there are various ways of converting data into the logarithmic format, and the above embodiment is a conversion manner proposed by the present invention, and the embodiment of the present invention is not limited to a specific conversion manner.
Referring to the above, the low bit linear domain data to logarithmic domain data module 200 may be used to convert the input data from a predetermined first linear format to a predetermined logarithmic format.
S203: and operating the input data in a preset logarithmic format by utilizing a logarithmic domain operator in the target layer to obtain first operated data.
In one case, the arithmetic unit in the target layer includes a matrix multiplication unit, and the log-domain operator is a log-domain adder in the matrix multiplication unit. In this case, the log-domain operator may correspond to the adder module 300 in the above. As described above, the multiplication operation in the linear domain may be converted into addition operation in the logarithmic domain.
Alternatively, the log-domain arithmetic unit may be another arithmetic unit, such as a logic arithmetic unit, which is not particularly limited.
In one embodiment, after S201, the input data may be cached in a caching module, where the caching module further caches the network parameters of the target layer, where the network parameters are in a preset logarithmic format. In this embodiment, S202 may include: converting the input data cached in the caching module from a preset first linear format to a preset logarithmic format; s203 may include: and operating the input data in the preset logarithmic format and the network parameters cached in the caching module by utilizing a logarithmic domain operator in the target layer to obtain first operated data.
The caching module may correspond to the input data buffer module 100 in the above. By providing the caching module, frequent SRAM reads are avoided, which optimizes the data processing flow.
S204: and converting the first operated data from a preset logarithmic format to a preset second linear format.
The preset second linear format corresponds to the high-precision linear format in the above. The data in the preset logarithmic format may be shown with reference to fig. 1b, and includes sign bits, exponent bits and fraction bits. Alternatively, the data in the predetermined logarithmic format may include only the exponent and the fraction.
In one embodiment, S204 may include: splitting the first operated data in the preset logarithmic format into an exponent part and a fractional part; mapping, based on a preset second mapping relation, the fractional part of the split first operated data in the preset logarithmic format to the fractional part in the preset second linear format; and obtaining the first operated data in the preset second linear format by shifting the fractional part in the preset second linear format to the left, where the number of bits shifted left is the same as the value of the exponent part of the first operated data in the preset logarithmic format.
For example, the preset second mapping relation may be expressed as 2^0.YYYYYY → 1.XXXXXX, with the split fractional part being YYYYYY and the fractional part in the preset second linear format obtained by the mapping being XXXXXX. The fractional part in the preset second linear format is shifted left, the number of shifted bits being the value of the exponent part of the first operated data in the preset logarithmic format; after the shift, the data in the preset second linear format, i.e. the high-precision linear domain data, is obtained. The high precision lies in that XXXXXX may have a large bit width, with the specific bit width determined according to the practical application.
If the first operated data further includes a sign bit, the sign bit may remain unchanged during the conversion of the first operated data from the preset logarithmic format to the preset second linear format. In the above embodiment, the first operated data in the preset second linear format may be obtained by shifting the fractional part in the preset second linear format to the left by m bits with the sign bit unchanged, where m is the value of the exponent part of the first operated data in the preset logarithmic format.
The high-precision linear format is a data format, and various ways of converting the data in the logarithmic format into the data in the high-precision linear format are available, and the above embodiment is a conversion way proposed by the present invention, and the embodiment of the present invention is not limited to a specific conversion way.
Referring to the above, the module 400 for converting logarithmic domain data into high-precision linear domain data may be used to convert the first operation data from a preset logarithmic format into a preset second linear format.
S205: operating on the first operated data in the preset second linear format by using a linear domain operator in the target layer to obtain second operated data.
As described above, in one case the operation unit in the target layer includes a matrix multiplication unit, and the linear domain operator is the linear domain accumulator in that matrix multiplication unit; the linear domain operator may then correspond to the linear domain data accumulation module 500 described above. As described above, when the matrix multiplication unit processes linear domain data, the multiplication is performed first and the accumulation afterwards; the accumulation is what this scheme expresses as operating on the first operated data in the preset second linear format by using the linear domain operator in the target layer to obtain the second operated data.
Alternatively, the linear domain operator may be another operator, such as a logic operator; this is not specifically limited.
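For the matrix-multiplication case, the division of labour can be sketched in Python as follows, reusing F_BITS and log_to_linear from the sketch above. The helper names log_mul and dot_product and the XOR handling of the sign bits are assumptions of this illustration, not specifics of the scheme.

# Each multiplication becomes an addition of log-domain fields (the log-domain
# adder); only the final summation runs in the linear domain (the accumulator).
def log_mul(a, b):
    sa, ea, fa = a
    sb, eb, fb = b
    f = fa + fb                                 # add the fraction parts
    e = ea + eb + (f >> F_BITS)                 # carry out of the fraction into the exponent
    return (sa ^ sb, e, f & (2 ** F_BITS - 1))

def dot_product(xs, ws):
    acc = 0                                     # linear-domain accumulator
    for x, w in zip(xs, ws):
        s, e, f = log_mul(x, w)                 # multiplication as a log-domain addition
        acc += log_to_linear(s, e, f)           # convert, then accumulate linearly
    return acc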
In one embodiment, after S205, the second operated data in the preset second linear format may be truncated to obtain the second operated data in the preset first linear format.
The high-precision linear domain data to low-bit linear domain data module 600 described above may be used to convert the second operated data from the preset second linear format to the preset first linear format. The conversion may include truncating or rounding the data in the preset second linear format; the specific conversion manner is not limited.
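As a minimal illustration of this step, the following sketch assumes the preset first linear format is an 8-bit signed integer and chooses truncation with saturation; rounding would be an equally valid variant, as noted above.

# Drop the fractional bits of the high-precision value and saturate to the
# assumed 8-bit range of the first linear format.
def truncate_to_low_bit(linear: int, frac_bits: int = LIN_FRAC_BITS) -> int:
    v = linear >> frac_bits          # truncation; a rounding step could be used instead
    return max(-128, min(127, v))    # saturate to the assumed 8-bit signed range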
If the neural network has a further layer after the target layer, the input data of that next layer may be the second operated data in the preset first linear format. The next layer may then be treated as the target layer, and the embodiment of the present invention continues processing the data in the same way.
The data processing process in the embodiment of the present invention may take place while the neural network is being trained or while the neural network is being used; this is not specifically limited.
By applying the embodiment of the present invention: in the first aspect, operations on linear domain data are converted into operations on logarithmic domain data, i.e., operations using a multiplier are converted into operations using an adder, which reduces the computation load of the neural network. In the second aspect, the input and output of the target layer of the neural network are both linear domain data, which satisfies the need of other layers of the neural network, such as BN, ReLU and pooling layers, to operate on linear domain data. In the third aspect, in one embodiment, the input data in the preset logarithmic format is obtained by stitching; compared with some related schemes that use an addition operation to convert linear domain data into logarithmic domain data, this direct stitching saves computing resources.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a data processing device based on a neural network layer, as shown in fig. 3, including:
an acquiring module 301, configured to acquire input data of a target layer of a neural network; if the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features;
a first conversion module 302, configured to convert the input data from a preset first linear format to a preset logarithmic format;
the first operation module 303 is configured to operate on the input data in the preset logarithmic format by using a logarithmic domain operator in the target layer, so as to obtain first operated data;
the second conversion module 304 is configured to convert the first operated data from the preset logarithmic format to a preset second linear format;
the second operation module 305 is configured to operate on the first operated data in the preset second linear format by using the linear domain operator in the target layer, so as to obtain second operated data.
The first conversion module 302 may correspond to the low-bit linear domain data to logarithmic domain data module 200 described above, the first operation module 303 to the adder module 300, the second conversion module 304 to the logarithmic domain data to high-precision linear domain data module 400, and the second operation module 305 to the linear domain data accumulation module 500.
In one embodiment, the preset logarithmic format includes: an exponent part and a fraction part; and the first conversion module 302 includes: a statistics sub-module, a first determination sub-module, a second determination sub-module, a first mapping sub-module, and a stitching sub-module (not shown in the figure), wherein,
the statistics sub-module is configured to count the number of leading 0 bits of the input data;
the first determination sub-module is configured to determine the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits;
the second determination sub-module is configured to determine the fraction part of the input data in the preset first linear format based on the number of leading 0 bits;
the first mapping sub-module is configured to map the fraction part in the preset first linear format into the fraction part in the preset logarithmic format based on a preset first mapping relation;
and the stitching sub-module is configured to stitch the exponent part in the preset logarithmic format with the fraction part in the preset logarithmic format to obtain the input data in the preset logarithmic format.
The statistics sub-module may correspond to the leading-0 computation tree module 210 described above, the first and second determination sub-modules to the first shift module 220, the first mapping sub-module to the first lookup table module 230, and the stitching sub-module to the data stitching module 240.
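The statistics, determination, mapping and stitching steps can likewise be sketched in Python, reusing F_BITS from the earlier sketch. Here the input is assumed to be an 8-bit signed integer, int.bit_length stands in for the leading-0 computation tree, and the lookup table stands in for the first lookup table module 230; the names and the exact fraction alignment are assumptions of this sketch.

import math

IN_BITS = 8                          # assumed total width of the first linear format

# Preset first mapping relation log2(1.X) = 0.Y as a lookup table over every
# possible linear-domain fraction X.
FIRST_LUT = [round(math.log2(1.0 + x / 2 ** F_BITS) * 2 ** F_BITS)
             for x in range(2 ** F_BITS)]

def linear_to_log(value: int):
    """Count leading 0s, derive the exponent and fraction, map, then stitch."""
    sign = 1 if value < 0 else 0
    mag = abs(value)
    if mag == 0:
        return (sign, 0, 0)          # zero has no logarithm; real hardware needs a zero flag
    width = IN_BITS - 1                                # magnitude bits, sign bit excluded
    lead0 = width - mag.bit_length()                   # what the leading-0 tree computes
    exp = width - 1 - lead0                            # exponent part from the leading-0 count
    frac_lin = ((mag - (1 << exp)) << F_BITS) >> exp   # bits after the leading 1
    frac_log = FIRST_LUT[frac_lin]                     # first mapping relation
    return (sign, exp, frac_log)     # in hardware the three fields are stitched into one word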
In one embodiment, the second conversion module 304 includes: a splitting sub-module, a second mapping sub-module and a shifting sub-module (not shown in the figure), wherein,
the splitting sub-module is configured to split the first operated data in the preset logarithmic format into an exponent part and a fraction part;
the second mapping sub-module is configured to map the split fraction part of the first operated data in the preset logarithmic format into the fraction part in the preset second linear format based on a preset second mapping relation;
and the shifting sub-module is configured to obtain the first operated data in the preset second linear format by shifting the fraction part in the preset second linear format to the left, where the number of bits shifted leftwards equals the value of the exponent part of the first operated data in the preset logarithmic format.
The splitting sub-module may correspond to the truncate module 410 described above, the second mapping sub-module to the second lookup table module 420, and the shifting sub-module to the second shift module 430.
In one embodiment, the apparatus further comprises:
a truncating module (not shown in the figure), configured to truncate the second operated data in the preset second linear format to obtain the second operated data in the preset first linear format.
The truncating module may correspond to the high-precision linear domain data to low-bit linear domain data module 600 described above.
In one embodiment, the operation unit in the target layer includes a matrix multiplication unit, the logarithmic domain operator is a logarithmic domain adder in the matrix multiplication unit, and the linear domain operator is a linear domain accumulator in the matrix multiplication unit.
In one embodiment, the apparatus further comprises:
a caching module (not shown in the figure), configured to cache the input data and the network parameters of the target layer, where the network parameters are in the preset logarithmic format. The caching module may correspond to the input data caching module 100 described above.
The first conversion module 302 is specifically configured to convert the input data cached in the caching module from the preset first linear format to the preset logarithmic format;
and the first operation module 303 is specifically configured to operate on the input data in the preset logarithmic format and the network parameters cached in the caching module by using the logarithmic domain operator in the target layer to obtain the first operated data.
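Chaining the sketches above gives a rough end-to-end picture of one target layer with weights pre-converted to the logarithmic format and cached. It illustrates the data flow only, under all the assumptions already stated; the names are ours.

def target_layer(inputs, cached_log_weights):
    """One pass through the target layer: S202 converts the inputs, S203-S205
    run inside dot_product, and the result is truncated back to the first
    linear format for the next layer."""
    log_inputs = [linear_to_log(x) for x in inputs]      # S202
    acc = dot_product(log_inputs, cached_log_weights)    # S203 + S204 + S205
    return truncate_to_low_bit(acc)                      # post-S205 truncation

# Example: weights cached in the log format, inputs in the low-bit linear format.
cached = [linear_to_log(w) for w in (5, 2, -1)]
print(target_layer([3, -7, 12], cached))   # approximately 3*5 - 7*2 - 12*1 = -11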
By applying the embodiment of the present invention: in the first aspect, operations on linear domain data are converted into operations on logarithmic domain data, i.e., operations using a multiplier are converted into operations using an adder, which reduces the computation load of the neural network. In the second aspect, the input and output of the target layer of the neural network are both linear domain data, which satisfies the need of other layers of the neural network to operate on linear domain data. In the third aspect, in one embodiment, the input data in the preset logarithmic format is obtained by stitching; compared with some related schemes that use an addition operation to convert linear domain data into logarithmic domain data, this direct stitching saves computing resources.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 401 and a memory 402, wherein
the memory 402 is configured to store a computer program;
and the processor 401 is configured to implement any one of the above data processing methods based on a neural network layer when executing the program stored in the memory 402.
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is provided, in which a computer program is stored; the computer program, when executed by a processor, implements any one of the above data processing methods based on a neural network layer.
In yet another embodiment of the present invention, a computer program product containing instructions is also provided; when run on a computer, the instructions cause the computer to perform any one of the above data processing methods based on a neural network layer.
In the above embodiments, the implementation may be realised wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realised wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, device, computer-readable storage medium, and computer program product embodiments are described relatively briefly since they substantially correspond to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A data processing method based on a neural network layer, comprising:
acquiring input data of a target layer of a neural network; if the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features; and converting the input data from a preset first linear format to a preset logarithmic format;
operating on the input data in the preset logarithmic format by using a logarithmic domain operator in the target layer to obtain first operated data;
converting the first operated data from the preset logarithmic format to a preset second linear format;
operating on the first operated data in the preset second linear format by using a linear domain operator in the target layer to obtain second operated data;
wherein the preset logarithmic format includes: an exponent part and a fraction part; and the converting the input data from the preset first linear format to the preset logarithmic format includes:
counting the number of leading 0 bits of the input data;
determining the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits;
determining the fraction part of the input data in the preset first linear format based on the number of leading 0 bits;
mapping the fraction part in the preset first linear format into the fraction part in the preset logarithmic format based on a preset first mapping relation;
stitching the exponent part in the preset logarithmic format with the fraction part in the preset logarithmic format to obtain the input data in the preset logarithmic format;
wherein the counting the number of leading 0 bits of the input data includes:
counting the number of leading 0 bits of the input data by a leading-0 computation tree module, wherein the circuit implementing the leading-0 computation tree module has a tree structure; at the uppermost layer of the tree structure, the input data excluding the sign bit is divided, starting from the high bit, into groups of two bits each, and the number of leading 0 bits within each group is counted separately; at the other layers of the tree structure, the number of leading 0 bits of the input data is computed from the per-group counts; when the bit width of the data is even, the lowest bit forms a group by itself: if the lowest bit is 0, the number of leading 0 bits of that group is counted as 1, and if the lowest bit is 1, the number of leading 0 bits of that group is counted as 0;
after the operating on the first operated data in the preset second linear format by using the linear domain operator in the target layer to obtain the second operated data, the method further includes:
truncating the second operated data in the preset second linear format to obtain the second operated data in the preset first linear format.
2. The method according to claim 1, wherein the converting the first operated data from the preset logarithmic format to the preset second linear format includes:
splitting the first operated data in the preset logarithmic format into an exponent part and a fraction part;
mapping the split fraction part of the first operated data in the preset logarithmic format into a fraction part in the preset second linear format based on a preset second mapping relation;
obtaining the first operated data in the preset second linear format by shifting the fraction part in the preset second linear format to the left, wherein the number of bits shifted leftwards equals the value of the exponent part of the first operated data in the preset logarithmic format.
3. The method according to any one of claims 1-2, wherein the operation unit in the target layer includes a matrix multiplication unit, the logarithmic domain operator is a logarithmic domain adder in the matrix multiplication unit, and the linear domain operator is a linear domain accumulator in the matrix multiplication unit.
4. The method according to claim 1, further comprising, after the obtaining the input data of the target layer of the neural network:
caching the input data into a caching module, wherein the caching module also caches network parameters of the target layer, and the network parameters are in a preset logarithmic format;
the converting the input data from a preset first linear format to a preset logarithmic format includes:
converting the input data cached in the caching module from a preset first linear format to a preset logarithmic format;
the operating on the input data in the preset logarithmic format by using the logarithmic domain operator in the target layer to obtain the first operated data includes:
operating on the input data in the preset logarithmic format and the network parameters cached in the caching module by using the logarithmic domain operator in the target layer to obtain the first operated data.
5. A data processing apparatus based on a neural network layer, comprising:
the acquisition module is used for acquiring input data of a target layer of a neural network; if the target layer of the neural network is an input layer of the neural network, the input data is any one of the following: images, audio, text; if the target layer of the neural network is not an input layer of the neural network, the input data is any one of the following: image features, audio features, text features;
The first conversion module is used for converting the input data from a preset first linear format into a preset logarithmic format;
the first operation module is used for operating the input data in the preset logarithmic format by utilizing a logarithmic domain operator in the target layer to obtain first operated data;
the second conversion module is used for converting the first operated data from the preset logarithmic format into a preset second linear format; and the second operation module is used for operating on the first operated data in the preset second linear format by using the linear domain operator in the target layer to obtain second operated data;
the preset logarithmic format includes: an exponent part and a fraction part; the first conversion module includes:
a statistics sub-module, configured to count the number of leading 0 bits of the input data;
a first determination sub-module, configured to determine the exponent part of the input data in the preset logarithmic format based on the number of leading 0 bits;
a second determination sub-module, configured to determine the fraction part of the input data in the preset first linear format based on the number of leading 0 bits;
a first mapping sub-module, configured to map the fraction part in the preset first linear format into the fraction part in the preset logarithmic format based on a preset first mapping relation;
a stitching sub-module, configured to stitch the exponent part in the preset logarithmic format with the fraction part in the preset logarithmic format to obtain the input data in the preset logarithmic format;
wherein the counting the number of leading 0 bits of the input data includes:
counting the number of leading 0 bits of the input data by a leading-0 computation tree module, wherein the circuit implementing the leading-0 computation tree module has a tree structure; at the uppermost layer of the tree structure, the input data excluding the sign bit is divided, starting from the high bit, into groups of two bits each, and the number of leading 0 bits within each group is counted separately; at the other layers of the tree structure, the number of leading 0 bits of the input data is computed from the per-group counts; when the bit width of the data is even, the lowest bit forms a group by itself: if the lowest bit is 0, the number of leading 0 bits of that group is counted as 1, and if the lowest bit is 1, the number of leading 0 bits of that group is counted as 0;
the apparatus further comprises:
a truncating module, configured to truncate the second operated data in the preset second linear format to obtain the second operated data in the preset first linear format.
6. The apparatus of claim 5, wherein the second conversion module comprises:
a splitting sub-module, configured to split the first operated data in the preset logarithmic format into an exponent part and a fraction part;
a second mapping sub-module, configured to map the split fraction part of the first operated data in the preset logarithmic format into the fraction part in the preset second linear format based on a preset second mapping relation;
and a shifting sub-module, configured to obtain the first operated data in the preset second linear format by shifting the fraction part in the preset second linear format to the left, where the number of bits shifted leftwards equals the value of the exponent part of the first operated data in the preset logarithmic format.
7. The apparatus according to any one of claims 5-6, wherein the operation unit in the target layer includes a matrix multiplication unit, the logarithmic domain operator is a logarithmic domain adder in the matrix multiplication unit, and the linear domain operator is a linear domain accumulator in the matrix multiplication unit.
8. The apparatus of claim 5, wherein the apparatus further comprises:
the caching module is used for caching the input data and the network parameters of the target layer, wherein the network parameters are in a preset logarithmic format;
the first conversion module is specifically configured to: convert the input data cached in the caching module from the preset first linear format to the preset logarithmic format;
and the first operation module is specifically configured to: operate on the input data in the preset logarithmic format and the network parameters cached in the caching module by using the logarithmic domain operator in the target layer to obtain the first operated data.
9. An electronic device comprising a processor and a memory;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-4 when executing a program stored on a memory.
CN202011229722.6A 2020-11-06 2020-11-06 Data processing method, device and equipment based on neural network layer Active CN112199072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011229722.6A CN112199072B (en) 2020-11-06 2020-11-06 Data processing method, device and equipment based on neural network layer


Publications (2)

Publication Number Publication Date
CN112199072A CN112199072A (en) 2021-01-08
CN112199072B (en) 2023-06-02






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant