WO2019165602A1 - Data conversion method and apparatus (数据转换方法和装置) - Google Patents

Data conversion method and apparatus

Info

Publication number
WO2019165602A1
WO2019165602A1 (PCT/CN2018/077573; CN2018077573W)
Authority
WO
WIPO (PCT)
Prior art keywords
value
real
output value
target layer
field
Prior art date
Application number
PCT/CN2018/077573
Other languages
English (en)
French (fr)
Inventor
李似锦 (LI Sijin)
赵尧 (ZHAO Yao)
杨康 (YANG Kang)
Original Assignee
SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority to CN201880011394.7A priority Critical patent/CN110337636A/zh
Priority to PCT/CN2018/077573 priority patent/WO2019165602A1/zh
Publication of WO2019165602A1 publication Critical patent/WO2019165602A1/zh
Priority to US17/000,915 priority patent/US20200389182A1/en

Classifications

    • G06F7/5443 Sum of products
    • G06F7/485 Adding; Subtracting
    • G06F7/4876 Multiplying
    • G06F7/5235 Multiplying only using indirect methods, e.g. quarter square method, via logarithmic domain
    • G06F9/3001 Arithmetic instructions
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F2207/4824 Neural networks
    • G06N3/048 Activation functions
    • H03M7/24 Conversion to or from floating-point codes

Definitions

  • the present application relates to the field of data processing, and in particular, to a data conversion method and apparatus.
  • Current mainstream neural network computing frameworks basically perform their calculations using floating-point numbers.
  • During training, the calculation of gradients needs to be based on a floating-point representation to ensure sufficient accuracy. In the forward-propagation process of a neural network, the weight coefficients of each layer, especially those of the convolutional layers and fully connected layers, and the output values of each layer are likewise expressed as floating-point numbers.
  • The main computational load is concentrated in the convolution operation, which consists of a large number of multiply-accumulate operations. This consumes considerable hardware resources on the one hand, and considerable power and bandwidth on the other.
  • Converting data from the real-number domain to the logarithmic domain requires reference to a Full Scale Range (FSR). The FSR, which can also be called the conversion reference value, is chosen based on experience and differs between networks, so it needs to be adjusted manually as a parameter.
  • Moreover, the existing method of converting data from the real-number domain to the logarithmic domain is applicable only when the data is positive, yet the weight coefficients, input feature values, and output values are negative in many cases. These two points limit the expressive ability of the network and reduce its accuracy.
  • The present application provides a data conversion method and device that can improve the expressive ability of the network and improve its accuracy.
  • A data conversion method is provided, comprising: determining a weight reference value according to the weight logarithmic-domain bit width of a first target layer of a neural network and the magnitude of the maximum weight coefficient; and converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width.
  • In this solution, the weight reference value is determined according to the weight logarithmic-domain bit width and the maximum weight coefficient, and the weight coefficients are converted to the logarithmic domain based on the weight reference value and the weight logarithmic-domain bit width. Because the weight reference value in the logarithmic domain is not an empirical value but is determined from the weight logarithmic-domain bit width and the maximum weight coefficient, the expressive ability of the network can be improved and its accuracy increased.
  • A data conversion method is also provided, comprising: determining the input feature values of a first target layer of a neural network; and performing a multiply-accumulate calculation on the input feature values and the logarithmic-domain weight coefficients by means of shift operations to obtain the output value of the real-number domain of the first target layer.
  • A data conversion apparatus is provided, comprising a processor and a memory for storing instructions executed by the processor, the processor being configured to: determine a weight reference value according to the weight logarithmic-domain bit width of a first target layer of a neural network and the magnitude of the maximum weight coefficient; and convert the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width.
  • A data conversion apparatus is also provided, comprising a processor and a memory for storing instructions executed by the processor, the processor being configured to: determine the input feature values of a first target layer of the neural network; and perform a multiply-accumulate calculation on the input feature values and the logarithmic-domain weight coefficients by means of shift operations to obtain the output value of the real-number domain of the first target layer.
  • Figure 1 is a schematic diagram of the framework of a deep convolutional neural network.
  • FIG. 2 is a schematic flowchart of a data conversion method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a multiply and accumulate operation flow according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a multiply and accumulate operation flow of another embodiment of the present application.
  • FIG. 5A, FIG. 5B and FIG. 5C are schematic diagrams of several cases of merge preprocessing according to an embodiment of the present application;
  • FIG. 5D is a schematic diagram of a layer connection manner of the BN layer after the convolution layer.
  • FIG. 6 is a schematic block diagram of a data conversion apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a data conversion apparatus according to another embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a data conversion apparatus according to another embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a data conversion device of another embodiment of the present application.
  • Figure 1 is a schematic diagram of the framework of a deep convolutional neural network.
  • The input feature values of a deep convolutional neural network are fed in through the input layer and pass through hidden layers that perform operations such as convolution, transposed convolution (deconvolution), batch normalization (BN), scaling (Scale), full connection, concatenation, pooling, element-wise addition, and activation, yielding feature values; the feature values produced by the output layer are referred to in this document as output values.
  • the hidden layer of the deep convolutional neural network may include multiple layers of cascades.
  • The input of each layer is the output of the previous layer, which is a feature map, and each layer performs at least one of the operations described above on one or more input feature maps to obtain the output of that layer.
  • the output of each layer is also a feature map.
  • each layer is named after the implemented function, for example, a layer that implements a convolution operation is called a convolutional layer.
  • The hidden layers may further include a transposed convolution layer, a BN layer, a Scale layer, a pooling layer, a fully connected layer, a concatenation layer, an element-wise addition layer, an activation layer, and the like, which are not enumerated here.
  • the specific operation flow of each layer can refer to the existing technology, and will not be described in detail herein.
  • each layer may have one input and/or one output, or multiple inputs and/or multiple outputs.
  • The width and height of the feature maps are often decremented layer by layer (for example, in FIG. 1 the width and height of the input, feature map #1, feature map #2, feature map #3, and the output decrease layer by layer). For a semantic segmentation task, after the width and height of the feature maps have been decremented to a certain depth, they may be incremented again by transposed-convolution or upsampling operations.
  • the convolution layer is followed by an activation layer.
  • Common activation layers include the Rectified Linear Unit (ReLU) layer, the sigmoid layer, and the hyperbolic tangent (tanh) layer.
  • layers that require more weight parameters for operations are: convolutional layer, fully connected layer, transposed convolutional layer, and BN layer.
  • the representation of data in the real field means that the data is represented by the size of the data itself.
  • The representation of data in the logarithmic domain means that the data is represented by the magnitude of its absolute value in logarithmic form (for example, by the base-2 logarithm of the absolute value of the data).
  • the embodiment of the present application provides a data conversion method, which includes an offline part and an online part.
  • The offline part determines, before or at the start of the operation of the neural network, the weight reference value corresponding to the weight coefficients in the logarithmic domain, and converts the weight coefficients into the logarithmic domain.
  • the online part is the specific operation process of the neural network, that is, the process of obtaining the output value.
  • Where the logarithmic-domain weight coefficients appear in the above process, the existing schemes have two problems. First, the size of the FSR needs to be manually adjusted for different networks. Second, the existing scheme for converting data from the real-number domain to the logarithmic domain is applicable only when the real-domain data is positive, whereas the weight coefficients, input feature values, and output values are negative in many cases. These two points limit the expressive ability of the network and reduce the accuracy of the neural network (hereinafter referred to as the network).
  • In the following, the weight logarithmic-domain bit width of the weight coefficients is denoted BW_W, the logarithmic-domain bit width of the input feature values is denoted BW_X, and the logarithmic-domain bit width of the output values is denoted BW_Y.
  • FIG. 2 is a schematic flowchart of a data conversion method 200 according to an embodiment of the present application. As shown in FIG. 2, method 200 includes the following steps.
  • In the method, the weight reference value is determined according to the weight logarithmic-domain bit width and the maximum weight coefficient, and the weight coefficients are converted to the logarithmic domain based on the weight reference value and the weight logarithmic-domain bit width. Because the weight reference value of the coefficients in the logarithmic domain is not an empirical value but is determined from the weight logarithmic-domain bit width and the maximum weight coefficient, the expressive ability of the network can be improved and its accuracy increased.
  • The maximum weight coefficient can be taken as the reference weight value of the weight coefficients, denoted RW. The reference weight value may also be chosen as the maximum weight coefficient after outliers are removed, or as a value other than the maximum weight coefficient; this is not limited in the embodiments of the present application. The weight reference value of the layer may thus be determined by the maximum weight coefficient of the first target layer.
  • The weight reference value is denoted BASE_W. It should be noted that embodiments of the present application can calculate the weight reference value according to the required accuracy. The weight reference value can be an integer or contain fractional bits, and it can be positive or negative. Its value can be given by the following formula (1).
  • ceil() is the round-up (ceiling) function.
  • Determining the weight reference value BASE_W according to formula (1) makes it possible to represent the larger weight coefficients with higher accuracy when converting the weight coefficients to the logarithmic domain.
  • The 2^(BW_W−1) term in formula (1) is given for the case where the weight coefficient converted to the logarithmic domain includes a one-bit sign; when the logarithmic-domain weight coefficient has no sign bit, this term can instead be 2^BW_W.
  • the embodiment of the present application is not limited to determining BASE_W by the formula (1), and BASE_W can be determined based on other principles and by other formulas.
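Formula (1) itself did not survive this extraction. A minimal sketch of one plausible form, consistent with the surrounding description (ceil() rounding up, a 2^(BW_W−1) term accounting for the one-bit sign, and the largest weights retaining the most accuracy), is given below; the exact formula is an assumption, not the patent's verbatim definition.

```python
import math

def weight_base(rw: float, bw_w: int) -> int:
    """Hypothetical reconstruction of formula (1): choose BASE_W so that the
    largest representable log-domain code (2**(bw_w - 1) - 1, with one of the
    bw_w bits reserved for the sign) lands on the maximum weight coefficient RW."""
    return math.ceil(math.log2(rw)) - (2 ** (bw_w - 1) - 1)
```

For example, with RW = 100 and BW_W = 4, ceil(log2(100)) = 7 and BASE_W = 7 − 7 = 0, so the representable magnitudes are 2^0 through 2^7, matching the ±(0–128) example discussed later in the text.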
  • In S220, all of the weight coefficients in the first target layer may be converted into the logarithmic domain, or only the weight coefficients of a part of the first target layer may be converted; this is not limited.
  • Converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width may include: converting the weight coefficients to the logarithmic domain according to the weight reference value, the weight logarithmic-domain bit width, and the weight coefficients themselves.
  • The weight logarithmic-domain bit width may include a one-bit sign, and the sign of a weight coefficient in the logarithmic domain is consistent with its sign in the real-number domain.
  • In the existing scheme, when a data value is negative, the data is uniformly converted into the logarithmic-domain value corresponding to real-domain zero. In the present application, by contrast, the positive or negative sign of the weight coefficient is preserved, which is beneficial to improving the accuracy of the network.
  • the conversion of the weight parameter to the logarithmic domain can be calculated by the following formula (2).
  • sign() can be expressed as the following formula (3).
  • Round() can be expressed as the following formula (4).
  • Clip() can be expressed specifically as the following formula (5).
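Formulas (2)–(5) are not reproduced in this extraction. The sketch below shows one log-domain conversion consistent with the description, with sign() preserving the sign (formula (3)), Round() picking the nearest exponent (formula (4)), and Clip() bounding the code to the non-negative representable range (formula (5)); the exact encoding is an assumption.

```python
import math

def sign(x: float) -> int:
    # formula (3), as commonly defined: returns 1, -1, or 0
    return (x > 0) - (x < 0)

def quantize_log(w: float, base_w: int, bw_w: int):
    """Assumed form of formula (2): code = Clip(Round(log2|w|) - BASE_W,
    0, 2**(bw_w - 1) - 1). Real-domain zero maps to a reserved code."""
    if w == 0:
        return 0, 0
    code = round(math.log2(abs(w))) - base_w       # Round(), formula (4)
    code = max(0, min(code, 2 ** (bw_w - 1) - 1))  # Clip(), formula (5)
    return sign(w), code

def dequantize_log(s: int, code: int, base_w: int) -> float:
    # reconstruct the real-domain value: w ~= s * 2**(code + BASE_W)
    return s * 2.0 ** (code + base_w)
```

With BASE_W = 0 and BW_W = 4, quantize_log(5, 0, 4) gives (1, 2), which dequantizes to 4 (the nearest power of two), while quantize_log(-300, 0, 4) clips to (-1, 7), i.e. −128 — the saturation behaviour that Clip() provides.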
  • In this way, BASE_W and the logarithmic-domain weight coefficients can be used to represent the weight coefficient w of the real-number domain.
  • In this case, the weight logarithmic-domain bit width BW_W consists entirely of integer bits.
  • For example, ±(0–128) can be realized with a 4-bit width, that is, weight coefficient values of ±(0, 1, 2, 4, 8, 16, 32, 64, 128) can be represented, where 1 bit is the sign bit.
  • The weight logarithmic-domain bit width BW_W of the embodiments of the present application may also include fractional bits.
  • For example, ±(0–2^3.5) can be represented by a 4-bit width (one sign bit, two integer bits, and one fractional bit), that is, weight coefficient values of ±(0, 2^0, 2^0.5, 2^1, 2^1.5, 2^2, 2^2.5, 2^3, 2^3.5) can be represented.
  • Alternatively, the weight logarithmic-domain bit width may not include a sign bit.
  • The offline part may also include determining the output reference value of the output value in the logarithmic domain, but this is not a necessary step in some scenarios. That is, in practical applications it is possible to convert only the weights to the logarithmic domain without converting the output values, so this step is optional.
  • Accordingly, the method 200 may further include determining an output reference value according to the output-value logarithmic-domain bit width of the first target layer and the magnitude of the reference output value RY. This step may be performed after S220, before S210, or at the same time as S210 to S220; this is not limited by the embodiments of the present application.
  • The reference output value RY can be determined by the following steps. The maximum output value of the first target layer is calculated for each of a plurality of input samples, and the reference output value RY is selected from the plurality of maximum output values. Specifically, selecting RY may include: sorting the plurality of maximum output values and selecting RY from them according to a preset selection parameter.
  • The plurality of maximum output values are sorted, for example in ascending or descending order, or according to some other preset rule. A value is then selected from the M maximum output values according to the preset selection parameter (for example, the value at a specific position after sorting) as the reference output value RY. For instance, if the M maximum output values are arranged in ascending order and the selection parameter is a, the (a×M)-th maximum output value is selected as the reference output value RY, where a is greater than or equal to 0 and less than or equal to 1.
  • The reference output value RY can also be determined by other methods; this is not limited by the embodiments of the present application.
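The RY selection steps above can be sketched as follows. The exact indexing convention for the a×M-th element (floor vs. ceiling, 0- vs. 1-based) is not specified in this text, so the floor-based, 0-based choice here is an assumption.

```python
def select_reference_output(max_outputs, a):
    """Pick the reference output value RY: sort the per-sample maximum output
    values in ascending order and take the element at fraction a of the list,
    with 0 <= a <= 1 (a = 1 keeps the overall maximum)."""
    ordered = sorted(max_outputs)
    idx = min(int(a * len(ordered)), len(ordered) - 1)  # clamp a = 1 into range
    return ordered[idx]
```

Choosing a slightly below 1 discards outlier activations when fixing the output range; for the five per-sample maxima [3, 1, 9, 7, 5], a = 0.5 selects 5.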
  • the determination of the output reference value BASE_Y can be calculated by the following formula (6).
  • In this way, both the weight coefficients and the output values can be expressed as difference values relative to their reference values, ensuring that every difference is non-negative and that only the reference value may be negative. Each weight coefficient and output value then saves 1 bit of bit width, which reduces storage overhead and can produce a significant bandwidth gain at the large data scales of a neural network.
  • A convolution operation, a fully connected operation, or another layer operation of the neural network may be represented by the multiply-accumulate operation of formula (7).
  • kc is the number of channels of the input feature value
  • kh is the height of the convolution kernel
  • kw is the width of the convolution kernel
  • x is the input feature value
  • w is the weight coefficient
  • y is the output value.
  • the method 200 further includes the following steps.
  • the input feature value of the first target layer is determined.
  • the input feature value and the weight coefficient of the logarithmic domain are multiplied and accumulated to obtain the output value of the real domain of the first target layer.
  • each embodiment of the present application can obtain an output value of a real field by an addition operation combined with a shift operation.
  • Depending on whether the input feature value is a real-domain input feature value or a logarithmic-domain input feature value, the embodiments of the present application use different processing manners.
  • the input feature value is an input feature value of the real field.
  • Performing, by shift operations, the multiply-accumulate of the input feature values and the logarithmic-domain weight coefficients to obtain the output value of the real-number domain of the first target layer may include: performing a first shift operation to multiply-accumulate the real-domain input feature values and the logarithmic-domain weight coefficients, obtaining a multiply-accumulated value; and performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number domain of the first target layer.
  • The input feature values in some embodiments of the present application are not converted to the logarithmic domain, because converting the input feature values into the logarithmic domain results in a loss of detail; the input feature values therefore retain the real-domain representation.
  • The weight coefficient w may already have been converted to the logarithmic domain in the offline part, represented by BASE_W and a non-negative logarithmic-domain value. The output reference value BASE_Y of the output value has also been determined in the offline part.
  • The input feature values and the output values may be fixed-point numbers in the real-number domain, and the weight coefficient w may already have been converted to the logarithmic domain in the offline part, represented by BASE_W and a non-negative logarithmic-domain value.
  • the fixed point format of the input eigenvalue is QA.B
  • the fixed point format of the output value is QC.D.
  • A and C denote the integer bit widths
  • B and D denote the fractional bit widths
  • Performing the second shift operation on the multiply-accumulated value to obtain the output value of the real-number domain of the first target layer may include: shifting the multiply-accumulated value according to the fractional bit width of the real-domain input feature value and the fractional bit width of the real-domain output value. Since the weight coefficient is represented in the logarithmic domain by BASE_W and a non-negative value, the multiply-accumulated value is further shifted according to the fractional bit width of the real-domain input feature value, the fractional bit width of the real-domain output value, and the weight reference value, to obtain the output value of the real-number domain of the first target layer.
  • bitshift(y_sum, B−BASE_W−D) in formula (8) is the second shift operation, which can be expressed specifically as the following formula (9).
  • y_sum can be calculated by the following formulas (10) and (11).
  • The output value y of the real-number domain in fixed-point format QC.D can thus be obtained from formulas (8) to (11).
  • the fixed point format of the input feature value is Q7.0
  • the fixed point format of the output value is Q4.3.
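Formulas (8)–(11) can be sketched as below for the real-domain-input case. The bitshift() direction convention (here: a positive argument shifts right, a negative one shifts left) and the per-product shift are assumptions reconstructed from the fixed-point definitions; with that convention the text's bitshift(y_sum, B − BASE_W − D) comes out dimensionally consistent.

```python
def bitshift(v: int, n: int) -> int:
    """Assumed convention for formula (9): positive n shifts right,
    negative n shifts left."""
    return v >> n if n >= 0 else v << (-n)

def mac_real_input(x_ints, w_signs, w_codes, b, d, base_w):
    """Assumed reconstruction of formulas (8), (10), (11): x is the integer
    form of a QA.B fixed-point input, each weight is s * 2**(code + BASE_W),
    so every product is a single shift of x (the first shift operation)."""
    y_sum = 0
    for x, s, c in zip(x_ints, w_signs, w_codes):
        y_sum += s * bitshift(x, -c)        # x * 2**code
    return bitshift(y_sum, b - base_w - d)  # second shift operation, formula (8)
```

With Q7.0 inputs x = [3, 2], weights +2 and −1 (signs [1, −1], codes [1, 0], BASE_W = 0) and a Q4.3 output, y_sum = 3·2 − 2·1 = 4, and the returned Q4.3 integer is 32, i.e. y = 32/2³ = 4.0.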
  • FIG. 3 is a schematic diagram of a multiply and accumulate operation flow 300 in accordance with an embodiment of the present application.
  • the process 300 includes the following steps. S310 and S320 are implemented in the offline part, and S330 and S340 are implemented in the online part.
  • The method of the present scheme may further include the step of converting the real-domain output value to the logarithmic domain. Specifically, after the multiply-accumulated value is shifted according to the fractional bit width of the real-domain input feature value, the fractional bit width of the real-domain output value, and the weight reference value to obtain the output value of the real-number domain of the first target layer, the method may further include: converting the real-domain output value to the logarithmic domain according to the output reference value, the output-value logarithmic-domain bit width, and the real-domain output value. The output-value logarithmic-domain bit width may include a one-bit sign, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number domain.
  • the conversion of the output value to the logarithmic domain can be calculated by the formula (12) similar to the formula (2).
  • Alternatively, performing the second shift operation on the multiply-accumulated value to obtain the output value of the real-number domain of the first target layer may include: performing a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the output value of the real-number domain of the first target layer.
  • The decrement of 1 in formula (13) reserves a one-bit mantissa, so this number can be regarded as a fixed-point number with one fractional bit.
  • Bitshift() and y_sum can refer to formulas (9) through (11). The bitshift(y_sum, BASE_Y−BASE_W−1) in formula (13) is the second shift operation.
  • The output value y of the real-number domain here is the output value obtained after the output reference value BASE_Y has been taken into account. If needed, this output value y can then be converted to the logarithmic domain.
  • In this case, the method 200 may further include: converting the real-domain output value to the logarithmic domain according to the output-value logarithmic-domain bit width and the magnitude of the real-domain output value. The output-value logarithmic-domain bit width includes a one-bit sign, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number domain.
  • The conversion of the real-domain output value y to the logarithmic domain can be calculated by the following formula (14).
  • the input feature value is an input feature value of a logarithmic domain.
  • Performing, by shift operations, the multiply-accumulate of the input feature values and the logarithmic-domain weight coefficients to obtain the output value of the real-number domain of the first target layer may include: performing a third shift operation to multiply-accumulate the logarithmic-domain input feature values and the logarithmic-domain weight coefficients, obtaining a multiply-accumulated value; and performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number domain of the first target layer.
  • This alternative applies to the intermediate layers of the neural network, where the input feature value of an intermediate layer is the output value of the previous layer and has already been converted to the logarithmic domain.
• the output reference value of the output value of the previous layer of the first target layer (an intermediate layer of the neural network) can be regarded as the input reference value of the input feature value of the first target layer, denoted BASE_X.
• the output reference value of the output value of the first target layer is BASE_Y, and the weight reference value of the weight coefficient of the first target layer is BASE_W.
• performing a fourth shift operation on the multiply-accumulated value to obtain an output value of the real-number field of the first target layer may include: performing a shift operation on the multiply-accumulated value according to the input reference value of the input feature value of the logarithmic domain, the output reference value, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
• Bitshift() is defined in equation (9); among them, bitshift(y_sum, BASE_Y - BASE_W - BASE_X - 1) in formula (15) is the fourth shift operation.
• y_sum can be calculated by the following formulas (16) and (17).
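The shift-only multiply-accumulate for log-domain inputs and weights can be sketched as follows, under the assumption that a log-domain pair (sign, code) stands for sign * 2**(code - BASE); the accumulation loosely mirrors formulas (16) and (17), and the final alignment uses the shift amount of formula (15). All encoding details are assumptions:

```python
def log_mac(x_codes, w_codes, base_x, base_w, base_y):
    # Each product of two log-domain values is a single shift by the sum of
    # the codes; the product sign is the product of the +/-1 signs (an XOR of
    # sign bits in hardware), so no multiplier is needed.
    y_sum = 0
    for (sx, cx), (sw, cw) in zip(x_codes, w_codes):
        y_sum += sx * sw * (1 << (cx + cw))
    # Fourth shift operation per formula (15):
    # bitshift(y_sum, BASE_Y - BASE_W - BASE_X - 1); "- 1" keeps one mantissa bit.
    shift = base_y - base_w - base_x - 1
    return y_sum << shift if shift >= 0 else y_sum >> (-shift)
```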
• the output value y can be converted to the logarithmic domain by, for example, equation (12); the embodiment of the present application does not limit this.
  • FIG. 4 is a schematic diagram of a multiply-accumulate operation flow 400 of another embodiment of the present application.
  • the process 400 includes the following steps. S410 to S430 are implemented in the offline portion, and S440 and S450 are implemented in the online portion.
• S420: Convert the weight coefficient of the real-number field to the logarithmic domain according to the weight reference value, to obtain the weight coefficient of the logarithmic domain. Specifically, the weight coefficient of the real-number field in the first target layer is converted to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width.
• S430: Calculate an output reference value according to the reference output value. Specifically, the output reference value is determined according to the output value logarithmic-domain bit width of the first target layer and the magnitude of the reference output value.
• S450: Convert the output value of the real-number field to the logarithmic domain according to the magnitude of the output value of the real-number field and the output reference value.
  • the output value of the real field is converted to a logarithmic domain based on the output value log field width, the output reference value, and the output value of the real field.
• log2() can be implemented by scanning the bits from high to low, excluding the sign bit, and locating the first bit position that is not 0.
• in the hardware design, the two multiplication operations are implemented by an XOR and sign-bit concatenation; that is, no multiplier is required.
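The two hardware tricks just described can be sketched as follows, with assumed encodings (a fixed word width, and signs stored as 0 for positive / 1 for negative):

```python
def leading_one_log2(v, width=16):
    # Hardware-style log2() sketch: scan the bits from high to low, excluding
    # the sign bit, and return the position of the first bit that is not 0.
    # For a positive integer this equals floor(log2(v)); a priority encoder
    # performs the same job in hardware.
    for pos in range(width - 2, -1, -1):  # width - 1 would be the sign bit
        if (v >> pos) & 1:
            return pos
    return 0

def product_sign(sign_a, sign_b):
    # With signs encoded as 0 (positive) / 1 (negative), the sign of a product
    # is simply the XOR of the two sign bits, so the "multiplications" reduce
    # to an XOR plus bit concatenation and need no multiplier.
    return sign_a ^ sign_b
```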
• the embodiment of the present application further provides a data conversion method, which includes the following steps: determining an input feature value of the first target layer of the neural network; and performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain an output value of the real-number field of the first target layer.
• the weight coefficient of the logarithmic domain can be obtained in an existing manner or by the method in the embodiment of the present application; this is not limited in the embodiment of the present application.
• the first target layer of the embodiment of the present application may include one of a convolution layer, a transposed convolution layer, a BN layer, a Scale layer, a pooling layer, a fully connected layer, a Concatenation layer, an element-wise addition layer, and an activation layer, or a layer formed by combining at least two of these layers. That is, the data conversion method 200 of the embodiment of the present application can be applied to any one or more of the hidden layers of a neural network.
• in a case where the first target layer is a layer formed by combining at least two layers, the data conversion method 200 may further include performing combining pre-processing on the at least two layers of the neural network to obtain the combined first target layer. This process can be regarded as a pre-processing part of the data fixed-point method.
• the parameters of the convolution layer, the BN layer, and the Scale layer are fixed in the inference phase.
• the parameters of the BN layer and the Scale layer can be merged into the parameters of the convolution layer, so that the intellectual property core (IP core) running the neural network does not need special processing for the BN layer and the Scale layer.
• originally, the convolution layer was directly followed by the activation layer.
• the BN layer can be introduced after the convolution layer and before the activation layer.
• x_i is the output of the convolution layer; let X be the input of the convolution layer and W be the weight coefficient matrix; for the offset value, there is:
• performing combining pre-processing on at least two layers of the neural network to obtain the combined first target layer may include: performing combining pre-processing on the convolution layer and the BN layer of the neural network to obtain the first target layer; or performing combining pre-processing on the convolution layer and the Scale layer of the neural network to obtain the first target layer; or performing combining pre-processing on the convolution layer, the BN layer, and the Scale layer of the neural network to obtain the first target layer.
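The BN-folding step described above can be sketched as follows, assuming per-output-channel BN parameters gamma, beta, mean, and var; the function name and data layout are illustrative, not the patent's notation:

```python
import math

def fold_bn_into_conv(weights, bias, gamma, beta, mean, var, eps=1e-5):
    # Merge-preprocessing sketch: fold BatchNorm parameters into the
    # preceding convolution so that inference needs no separate BN layer.
    # weights[c] is the list of weight coefficients of output channel c;
    # gamma, beta, mean, var are per-channel BN parameters. The folded layer
    # computes W'*X + b' = gamma*(W*X + b - mean)/sqrt(var + eps) + beta.
    folded_w, folded_b = [], []
    for c in range(len(weights)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in weights[c]])
        folded_b.append((bias[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b
```

The same folding applies to a Scale layer, which is just the affine part (gamma, beta) without the running statistics.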
• the maximum weight coefficient may be the maximum value of the weight coefficients of the first target layer formed by performing combining pre-processing on at least two layers of the neural network.
• the maximum output value is, for each of the plurality of input samples, the maximum output value of the first target layer formed after the combining.
  • FIGS. 5A, 5B, and 5C are schematic diagrams of several cases of merge preprocessing in the embodiment of the present application.
• FIG. 5A shows the simplest layer connection, in which a convolution layer is followed by a BN layer.
• before the combining pre-processing is performed, the convolution layer is followed by the BN layer, which is followed by the activation layer; after the convolution layer and the BN layer are merged into the first target layer, the first target layer is followed by the activation layer.
• FIG. 5D shows the resulting two-layer structure.
• if the IP core supports the processing of the Scale layer, the combination of the convolution layer and the BN layer in the combining pre-processing can be replaced by the combination of the convolution layer and the Scale layer.
• as shown in FIG. 5B, before the combining pre-processing is performed, the convolution layer is followed by the Scale layer, which is followed by the activation layer; after the convolution layer and the Scale layer are merged into the first target layer, the first target layer is followed by the activation layer.
• FIG. 5D shows the similar resulting two-layer structure.
  • FIG. 6 is a schematic block diagram of a data conversion device 600 according to an embodiment of the present application.
  • the data conversion device 600 includes a weight reference determination module 610 and a weight logarithmic conversion module 620.
  • the weight reference determination module 610 is configured to determine the weight reference value according to the weight log field width and the maximum weight coefficient of the first target layer of the neural network.
  • the weight logarithmic conversion module 620 is configured to convert the weight coefficients in the first target layer to a logarithmic domain according to the weight reference value and the weight log domain width.
• the data conversion apparatus of the embodiment of the present application determines the weight reference value according to the weight logarithmic-domain bit width and the maximum weight coefficient, and converts the weight coefficient to the logarithmic domain based on the weight reference value and the weight logarithmic-domain bit width.
• thus the weight reference value of the coefficient in the logarithmic domain is not an empirical value, but is determined according to the weight logarithmic-domain bit width and the maximum weight coefficient, which can improve the expressive ability of the network and the accuracy of the network.
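One way such a data-driven reference value could be derived is sketched below; the ceiling rule and the returned exponent range are assumptions, since the patent's exact formula is not reproduced in this chunk:

```python
import math

def weight_reference(max_weight, bitwidth):
    # Hypothetical derivation of BASE_W from the maximum weight coefficient
    # and the weight logarithmic-domain bit width: anchor the reference at the
    # ceiling of log2 of the largest magnitude, so every weight's log2 lies at
    # or below BASE_W; the bit width (one bit being the sign) then bounds the
    # smallest exponent the log-domain code can still reach.
    base_w = math.ceil(math.log2(max_weight))
    min_exponent = base_w - ((1 << (bitwidth - 1)) - 1)
    return base_w, min_exponent
```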
• the weight logarithmic conversion module 620 converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width includes: the weight logarithmic conversion module 620 converts the weight coefficients to the logarithmic domain according to the weight reference value, the weight logarithmic-domain bit width, and the magnitude of the weight coefficients.
• the weight logarithmic-domain bit width includes a one-bit sign bit, and the sign of the weight coefficient in the logarithmic domain is consistent with the sign of the weight coefficient in the real-number field.
  • the input feature value is an input feature value of a real field.
• the real number output module 630 performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs, by a first shift operation, multiply-accumulate calculation on the input feature value of the real-number field and the weight coefficient of the logarithmic domain to obtain a multiply-accumulated value; and the real number output module 630 performs a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
• the real number output module 630 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field, to obtain the output value of the real-number field of the first target layer.
• the real number output module 630 performing a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
• the data conversion device 600 may further include a logarithmic output module 640, configured to, after the real number output module 630 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value to obtain the output value of the real-number field of the first target layer, convert the output value of the real-number field to the logarithmic domain according to the output reference value, the output value logarithmic-domain bit width, and the output value of the real-number field.
• the output value logarithmic-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
• the real number output module 630 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the output value of the real-number field of the first target layer.
• the data conversion device 600 may further include a logarithmic output module 640, configured to, after the real number output module 630 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value to obtain the output value of the real-number field of the first target layer, convert the output value of the real-number field to the logarithmic domain according to the output value logarithmic-domain bit width and the output value of the real-number field.
• the output value logarithmic-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
• the output reference determining module 660 selecting the reference output value from the plurality of maximum output values includes: the output reference determining module 660 sorts the plurality of maximum output values and selects the reference output value from among them according to a preset selection parameter.
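The selection step can be sketched as follows, assuming the preset selection parameter is a quantile-style ratio over the sorted per-sample maxima; the patent does not fix a particular rule, so the function name and the ratio semantics are assumptions:

```python
def select_reference_output(max_outputs, keep_ratio=0.95):
    # Collect the maximum output value of the layer for each input sample,
    # sort them, and pick one according to a preset selection parameter.
    # Here the parameter discards the largest (1 - keep_ratio) fraction as
    # outliers, so the reference output value is robust to a few extreme
    # activations.
    ordered = sorted(max_outputs)
    index = min(int(len(ordered) * keep_ratio), len(ordered) - 1)
    return ordered[index]
```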
  • the input feature value is an input feature value of a logarithmic domain.
• the real number output module 630 performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs, by a third shift operation, multiply-accumulate calculation on the input feature value of the logarithmic domain and the weight coefficient of the logarithmic domain to obtain a multiply-accumulated value; and the real number output module 630 performs a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
• the real number output module 630 performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real number output module 630 performs a shift operation on the multiply-accumulated value according to the input reference value of the input feature value of the logarithmic domain, the output reference value, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
  • the maximum weight coefficient is a maximum value of a weight coefficient of the first target layer formed by combining pre-processing on at least two layers of the neural network.
  • the data conversion device 600 may further include a pre-processing module 670.
  • the pre-processing module 670 is configured to perform a combined pre-processing on at least two layers of the neural network to obtain a first target layer formed by combining.
• the maximum output value is, for each of the plurality of input samples, the maximum output value of the first target layer formed by the combining.
• the pre-processing module 670 performing combining pre-processing on at least two layers of the neural network to obtain the combined first target layer includes: the pre-processing module 670 performs combining pre-processing on the convolution layer and the normalization layer of the neural network to obtain the first target layer; or performs combining pre-processing on the convolution layer and the scaling layer of the neural network to obtain the first target layer; or performs combining pre-processing on the convolution layer, the normalization layer, and the scaling layer of the neural network to obtain the first target layer.
• the first target layer includes one of a convolution layer, a transposed convolution layer, a normalization layer, a scaling layer, a pooling layer, a fully connected layer, a concatenation layer, an element-wise addition layer, and an activation layer, or a layer formed by combining at least two of these layers.
• the weight reference determination module 610, the weight logarithmic conversion module 620, the real number output module 630, the logarithmic output module 640, the output reference determination module 650, the output reference determination module 660, and the pre-processing module 670 may be implemented by a processor and a memory.
  • FIG. 7 is a schematic block diagram of a data conversion apparatus 700 of another embodiment of the present application.
• the data conversion apparatus 700 shown in FIG. 7 may include a processor 710 and a memory 720 in which computer instructions are stored; when the processor 710 executes the computer instructions, the data conversion apparatus 700 performs the following steps.
• the weight reference value is determined according to the weight logarithmic-domain bit width of the first target layer of the neural network and the magnitude of the maximum weight coefficient.
  • the weight coefficients in the first target layer are converted to a log domain according to the weight reference value and the weight log field width.
• the processor 710 converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight logarithmic-domain bit width includes: converting the weight coefficients to the logarithmic domain according to the weight reference value, the weight logarithmic-domain bit width, and the magnitude of the weight coefficients.
• the weight logarithmic-domain bit width includes a one-bit sign bit, and the sign of the weight coefficient in the logarithmic domain is consistent with the sign of the weight coefficient in the real-number field.
  • the processor 710 is further configured to perform the following steps:
• determining the input feature value of the first target layer, and performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer.
  • the input feature value is an input feature value of a real field.
• the processor 710 performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer includes: performing a first shift operation to multiply-accumulate the input feature value of the real-number field and the weight coefficient of the logarithmic domain to obtain a multiply-accumulated value; and performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
• the processor 710 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field, to obtain the output value of the real-number field of the first target layer.
• the processor 710 performing a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
• after the processor 710 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value, the processor 710 further performs the following step: converting the output value of the real-number field to the logarithmic domain according to the output reference value, the output value logarithmic-domain bit width, and the output value of the real-number field.
• the output value logarithmic-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
• the processor 710 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the output value of the real-number field of the first target layer.
• after the processor 710 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value to obtain the output value of the real-number field of the first target layer, the processor 710 is further configured to perform the following step: converting the output value of the real-number field to the logarithmic domain according to the output value logarithmic-domain bit width and the output value of the real-number field.
• the output value logarithmic-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
• the processor 710 is further configured to: determine the output reference value according to the output value logarithmic-domain bit width of the first target layer and the magnitude of the reference output value.
• the processor 710 is further configured to: calculate a maximum output value of each of the plurality of input samples in the first target layer; and select the reference output value from the plurality of maximum output values.
• the processor 710 selecting the reference output value from the plurality of maximum output values includes: sorting the plurality of maximum output values and selecting the reference output value from among them according to a preset selection parameter.
  • the input feature value is an input feature value of a logarithmic domain.
• the processor 710 performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer includes: performing a third shift operation to multiply-accumulate the input feature value of the logarithmic domain and the weight coefficient of the logarithmic domain to obtain a multiply-accumulated value; and performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
• the processor 710 performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the input reference value of the input feature value of the logarithmic domain, the output reference value, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
  • the maximum weight coefficient is a maximum value of a weight coefficient of the first target layer formed by combining pre-processing on at least two layers of the neural network.
  • the processor 710 is further configured to perform the following steps: performing a combined pre-processing on at least two layers of the neural network to obtain a first target layer formed by combining.
• the maximum output value is, for each of the plurality of input samples, the maximum output value of the first target layer formed by the combining.
• the processor 710 performing combining pre-processing on at least two layers of the neural network to obtain the combined first target layer includes: performing combining pre-processing on the convolution layer and the normalization layer of the neural network to obtain the first target layer; or performing combining pre-processing on the convolution layer and the scaling layer of the neural network to obtain the first target layer; or performing combining pre-processing on the convolution layer, the normalization layer, and the scaling layer of the neural network to obtain the first target layer.
• the first target layer includes one of a convolution layer, a transposed convolution layer, a normalization layer, a scaling layer, a pooling layer, a fully connected layer, a concatenation layer, an element-wise addition layer, and an activation layer, or a layer formed by combining at least two of these layers.
  • FIG. 8 is a schematic block diagram of a data conversion apparatus 800 of another embodiment of the present application.
• The data conversion device 800 includes a real number output module 810.
• the real number output module 810 is configured to determine the input feature value of the first target layer of the neural network, and to perform multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer.
• the data conversion device of the embodiment of the present application can complete the multiply-accumulate operation by performing simple addition and shift operations on the input feature value and the weight coefficient of the logarithmic domain, without requiring a multiplier, thereby reducing equipment cost.
  • the input feature value is an input feature value of the real field.
• the real number output module 810 performing multiply-accumulate calculation on the input feature value and the weight coefficient of the logarithmic domain by a shift operation to obtain the output value of the real-number field of the first target layer includes: the real number output module 810 performs, by a first shift operation, multiply-accumulate calculation on the input feature value of the real-number field and the weight coefficient of the logarithmic domain to obtain a multiply-accumulated value; and the real number output module 810 performs a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
• the real number output module 810 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real number output module 810 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field, to obtain the output value of the real-number field of the first target layer.
• the real number output module 810 performing a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field and the decimal place bit width of the output value of the real-number field to obtain the output value of the real-number field of the first target layer includes: the real number output module 810 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value, to obtain the output value of the real-number field of the first target layer.
• the data conversion apparatus 800 may further include a logarithmic output module 840, configured to, after the real number output module 810 performs a shift operation on the multiply-accumulated value according to the decimal place bit width of the input feature value of the real-number field, the decimal place bit width of the output value of the real-number field, and the weight reference value to obtain the output value of the real-number field of the first target layer, convert the output value of the real-number field to the logarithmic domain according to the output reference value, the output value logarithmic-domain bit width, and the output value of the real-number field.
• the output value logarithmic-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
• the real number output module 810 performing a shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real number output module 810 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the output value of the real-number field of the first target layer.
• the data conversion device 800 may further include a logarithmic output module 840, configured to, after the real number output module 810 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value to obtain the output value of the real-number field of the first target layer, convert the output value of the real-number field to the logarithmic domain according to the output value logarithmic-domain bit width and the output value of the real-number field.
  • the output value log field width includes a one-bit sign bit, and the output value in the log field is consistent with the output value in the real field.
• the data conversion apparatus 800 may further include an output reference determining module 850, configured to determine the output reference value according to the output value logarithmic-domain bit width of the first target layer and the magnitude of the reference output value.
• the data conversion apparatus 800 may further include an output reference determining module 860, configured to calculate a maximum output value of each of the plurality of input samples in the first target layer, and to select the reference output value from the plurality of maximum output values.
• the output reference determining module 860 selecting the reference output value from the plurality of maximum output values includes: the output reference determining module 860 sorts the plurality of maximum output values and selects the reference output value from among them according to a preset selection parameter.
  • the input feature value is an input feature value of a logarithmic domain.
  • the real-number output module 810 performing multiply-accumulate calculation on the input feature values and the log-domain weight coefficients by a shift operation to obtain the output value of the real-number field of the first target layer includes: the real-number output module 810 performing, by a third shift operation, multiply-accumulate calculation on the log-domain input feature values and the log-domain weight coefficients to obtain a multiply-accumulated value; and the real-number output module 810 performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
  • the real-number output module 810 performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: the real-number output module 810 performing a shift operation on the multiply-accumulated value according to the input reference value of the log-domain input feature values, the output reference value and the weight reference value, to obtain the output value of the real-number field of the first target layer.
  • the data conversion device 800 may further include a weight reference determination module 820 and a weight logarithmic conversion module 830.
  • the weight reference determination module 820 is configured to determine the weight reference value according to the weight log-domain bit width of the first target layer and the magnitude of the maximum weight coefficient.
  • the weight logarithmic conversion module 830 is configured to convert the weight coefficients of the real-number field in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width, to obtain the log-domain weight coefficients.
  • the weight logarithmic conversion module 830 converting the weight coefficients of the real-number field in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width to obtain the log-domain weight coefficients includes: the weight logarithmic conversion module 830 converting the weight coefficients of the real-number field to the logarithmic domain according to the weight reference value, the weight log-domain bit width and the magnitudes of the weight coefficients, to obtain the log-domain weight coefficients.
  • the weight log-domain bit width includes a one-bit sign bit, and the sign of a weight coefficient in the logarithmic domain is consistent with its sign in the real-number field.
  • the maximum weight coefficient is the maximum value of the weight coefficients of the first target layer formed by performing combination pre-processing on at least two layers of the neural network.
  • the data conversion device 800 may further include a pre-processing module 870.
  • the pre-processing module 870 is configured to perform combination pre-processing on at least two layers of the neural network to obtain the merged first target layer.
  • the maximum output value is the maximum output value, at the merged first target layer, of each of the plurality of input samples.
  • the pre-processing module 870 performing combination pre-processing on at least two layers of the neural network to obtain the merged first target layer includes: performing combination pre-processing on a convolution layer and a normalization layer of the neural network to obtain the first target layer; or performing combination pre-processing on a convolution layer and a scaling layer of the neural network to obtain the first target layer; or performing combination pre-processing on a convolution layer, a normalization layer and a scaling layer of the neural network to obtain the first target layer.
  • the first target layer includes one of a convolution layer, a transposed convolution layer, a normalization layer, a scaling layer, a pooling layer, a fully connected layer, a concatenation layer, an element-wise addition layer and an activation layer, or a layer formed by combining at least two of these layers.
  • FIG. 9 is a schematic block diagram of a data conversion apparatus 900 of another embodiment of the present application.
  • the data conversion apparatus 900 shown in FIG. 9 may include a processor 910 and a memory 920 in which computer instructions are stored; when the processor 910 executes the computer instructions, the data conversion apparatus 900 performs the following steps: determining an input feature value of the first target layer of the neural network; and performing multiply-accumulate calculation on the input feature value and the log-domain weight coefficients by a shift operation, to obtain the output value of the real-number field of the first target layer.
  • the input feature value is an input feature value of the real field.
  • the processor 910 performing multiply-accumulate calculation on the input feature value and the log-domain weight coefficients by a shift operation to obtain the output value of the real-number field of the first target layer includes: performing, by a first shift operation, multiply-accumulate calculation on the input feature value of the real-number field and the log-domain weight coefficients to obtain a multiply-accumulated value; and performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
  • the processor 910 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: shifting the multiply-accumulated value according to the decimal-place width of the input feature value of the real-number field and the decimal-place width of the output value of the real-number field, to obtain the output value of the real-number field of the first target layer.
  • the processor 910 shifting the multiply-accumulated value according to the decimal-place width of the input feature value of the real-number field and the decimal-place width of the output value of the real-number field to obtain the output value of the real-number field of the first target layer includes: shifting the multiply-accumulated value according to the decimal-place width of the input feature value of the real-number field, the decimal-place width of the output value of the real-number field and the weight reference value, to obtain the output value of the real-number field of the first target layer.
  • after the processor 910 shifts the multiply-accumulated value according to the decimal-place width of the input feature value of the real-number field, the decimal-place width of the output value of the real-number field and the weight reference value, the processor 910 further performs the following step: converting the output value of the real-number field to the logarithmic domain according to the output reference value, the output-value log-domain bit width and the magnitude of the output value of the real-number field.
  • the processor 910 performing a second shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the output value of the real-number field of the first target layer.
  • after the processor 910 performs a shift operation on the multiply-accumulated value according to the weight reference value and the output reference value to obtain the output value of the real-number field of the first target layer, the processor 910 is further configured to perform the following step: converting the output value of the real-number field to the logarithmic domain according to the output-value log-domain bit width and the magnitude of the output value of the real-number field.
  • the output-value log-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with the sign of the output value in the real-number field.
  • the processor 910 is further configured to: determine the output reference value according to the output-value log-domain bit width of the first target layer and the magnitude of the reference output value.
  • the processor 910 is further configured to: calculate the maximum output value of each of a plurality of input samples at the first target layer; and select the reference output value from the plurality of maximum output values.
  • the processor 910 selecting the reference output value from the plurality of maximum output values includes: sorting the plurality of maximum output values, and selecting the reference output value from the plurality of maximum output values according to a preset selection parameter.
  • the input feature value is an input feature value of a logarithmic domain.
  • the processor 910 performing multiply-accumulate calculation on the input feature value and the log-domain weight coefficients by a shift operation to obtain the output value of the real-number field of the first target layer includes: performing, by a third shift operation, multiply-accumulate calculation on the log-domain input feature values and the log-domain weight coefficients to obtain a multiply-accumulated value; and performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer.
  • the processor 910 performing a fourth shift operation on the multiply-accumulated value to obtain the output value of the real-number field of the first target layer includes: performing a shift operation on the multiply-accumulated value according to the input reference value of the log-domain input feature values, the output reference value and the weight reference value, to obtain the output value of the real-number field of the first target layer.
  • the processor 910 is further configured to: determine the weight reference value according to the weight log-domain bit width of the first target layer and the magnitude of the maximum weight coefficient; and convert the weight coefficients of the real-number field in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width, to obtain the log-domain weight coefficients.
  • the processor 910 converting the weight coefficients of the real-number field in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width to obtain the log-domain weight coefficients includes: converting the weight coefficients of the real-number field to the logarithmic domain according to the weight reference value, the weight log-domain bit width and the magnitudes of the weight coefficients, to obtain the log-domain weight coefficients.
  • the weight log-domain bit width includes a one-bit sign bit, and the sign of a weight coefficient in the logarithmic domain is consistent with its sign in the real-number field.
  • the maximum weight coefficient is the maximum value of the weight coefficients of the first target layer formed by performing combination pre-processing on at least two layers of the neural network.
  • the processor 910 is further configured to perform the following step: performing combination pre-processing on at least two layers of the neural network to obtain the merged first target layer.
  • the maximum output value is the maximum output value, at the merged first target layer, of each of the plurality of input samples.
  • the processor 910 performs a combined pre-processing on at least two layers of the neural network to obtain a merged first target layer, including: combining the convolutional layer and the normalized layer of the neural network. Pre-processing to obtain a first target layer; or, combining the convolution layer and the scaling layer of the neural network to obtain a first target layer; or, performing a convolution layer, a normalization layer, and a scaling layer of the neural network The pre-processing is combined to obtain a first target layer.
  • the first target layer includes one of a convolution layer, a transposed convolution layer, a normalization layer, a scaling layer, a pooling layer, a fully connected layer, a concatenation layer, an element-wise addition layer and an activation layer, or a layer formed by combining at least two of these layers.
  • the apparatus of the embodiments of the present application may be implemented based on a memory and a processor; the memory is configured to store instructions for executing the method of the embodiments of the present application, and the processor executes the instructions so that the apparatus performs the method of the embodiments of the present application.
  • the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory referred to in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory.
  • the volatile memory may be a Random Access Memory (RAM), which acts as an external cache.
  • by way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate SDRAM (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM) and direct rambus random access memory (DR RAM).
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • the memories described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
  • the embodiment of the present application further provides a computer readable storage medium having stored thereon instructions for causing a computer to execute the methods of the foregoing method embodiments when the instructions are run on a computer.
  • the embodiment of the present application further provides a computing device, which includes the above computer readable storage medium.
  • Embodiments of the present application can be applied to the field of aircraft, especially drones.
  • the division into circuits, sub-circuits and sub-units in the various embodiments of the present application is merely illustrative. Those of ordinary skill in the art will appreciate that the circuits, sub-circuits and sub-units of the various examples described in the embodiments disclosed herein can be further separated or combined.
  • a computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • the size of the sequence numbers of the foregoing processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B from A does not mean that B is only determined based on A, and that B can also be determined based on A and/or other information.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division; in actual implementation there may be another division manner, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.


Abstract

A data conversion method and apparatus. The method includes: determining a weight reference value according to the weight log-domain bit width of a first target layer of a neural network and the magnitude of the maximum weight coefficient; and converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width. In this method, the weight reference value of the weight coefficients in the logarithmic domain is not an empirical value but is determined from the weight log-domain bit width and the maximum weight coefficient, which can improve the expressive capability of the network and increase its accuracy.

Description

Data conversion method and apparatus
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner does not object to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
The present application relates to the field of data processing, and in particular to a data conversion method and apparatus.
Background
Current mainstream neural network computing frameworks essentially all use floating-point numbers for training computation. In the back-propagation process of a neural network, gradient computation needs to be based on a floating-point representation to guarantee sufficient precision; in the forward-propagation process, the weight coefficients of each layer, especially the convolution layers and the fully connected layers, and the output values of each layer are also represented as floating-point numbers. For example, in the inference computation of a deep convolutional neural network, the bulk of the computation is concentrated in convolution, which consists of a large number of multiply-accumulate operations. This consumes more hardware resources on the one hand, and increases power and bandwidth consumption on the other.
There are several ways to optimize the convolution operation. One is to convert floating-point numbers to fixed-point numbers. However, even with fixed-point numbers, a neural network accelerator still needs a large number of multipliers for the fixed-point multiply-accumulate operations to guarantee real-time performance. Another way is to convert the data from the real-number domain to the logarithmic domain, turning the multiplications in the multiply-accumulate operations into additions.
In existing solutions, converting data from the real-number domain to the logarithmic domain requires a Full Scale Range (FSR) reference. The FSR, which may also be called a conversion reference value, is obtained empirically and has to be tuned manually for different networks. Moreover, the existing methods for converting data from the real-number domain to the logarithmic domain only apply when the data are positive, whereas weight coefficients, input feature values and output values are in many cases negative. Both points impair the expressive capability of the network and reduce its accuracy.
Summary of the invention
The present application provides a data conversion method and apparatus, which can improve the expressive capability of a network and increase its accuracy.
In a first aspect, a data conversion method is provided. The method includes: determining a weight reference value according to the weight log-domain bit width of a first target layer of a neural network and the magnitude of the maximum weight coefficient; and converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width.
In the data conversion method of the first aspect, the weight reference value is determined from the weight log-domain bit width and the magnitude of the maximum weight coefficient, and the weight coefficients are converted to the logarithmic domain based on the weight reference value and the weight log-domain bit width. Since the log-domain weight reference value of the weight coefficients is not an empirical value but is determined from the weight log-domain bit width and the maximum weight coefficient, the expressive capability of the network can be improved and its accuracy increased.
In a second aspect, a data conversion method is provided. The method includes: determining an input feature value of a first target layer of a neural network; and performing multiply-accumulate calculation on the input feature value and log-domain weight coefficients by shift operations, to obtain an output value of the first target layer in the real-number domain.
In the data conversion method of the second aspect, the multiply-accumulate operation on the input feature values and the log-domain weight coefficients can be realized with simple additions and shifts; no multiplier is needed, which can reduce device cost.
In a third aspect, a data conversion apparatus is provided. The apparatus includes a processor and a memory, the memory storing instructions to be executed by the processor, and the processor being configured to perform the following steps: determining a weight reference value according to the weight log-domain bit width of a first target layer of a neural network and the magnitude of the maximum weight coefficient; and converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width.
In a fourth aspect, a data conversion apparatus is provided. The apparatus includes a processor and a memory, the memory storing instructions to be executed by the processor, and the processor being configured to perform the following steps: determining an input feature value of a first target layer of a neural network; and performing multiply-accumulate calculation on the input feature value and log-domain weight coefficients by shift operations, to obtain an output value of the first target layer in the real-number domain.
Brief description of the drawings
FIG. 1 is a schematic diagram of the framework of a deep convolutional neural network.
FIG. 2 is a schematic flowchart of a data conversion method according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a multiply-accumulate operation flow according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a multiply-accumulate operation flow according to another embodiment of the present application.
FIG. 5A, FIG. 5B and FIG. 5C are schematic diagrams of several cases of combination pre-processing according to embodiments of the present application; FIG. 5D is a schematic diagram of a layer connection in which a BN layer follows a convolution layer.
FIG. 6 is a schematic block diagram of a data conversion apparatus according to an embodiment of the present application.
FIG. 7 is a schematic block diagram of a data conversion apparatus according to another embodiment of the present application.
FIG. 8 is a schematic block diagram of a data conversion apparatus according to another embodiment of the present application.
FIG. 9 is a schematic block diagram of a data conversion apparatus according to another embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application.
Related technologies and concepts involved in the embodiments of the present application are introduced first.
Neural network (taking the Deep Convolutional Neural Network (DCNN) as an example):
FIG. 1 is a schematic diagram of the framework of a deep convolutional neural network. The input feature values of the deep convolutional neural network (fed in through the input layer) undergo, in the hidden layers, operations such as convolution, transposed convolution (or deconvolution), Batch Normalization (BN), scaling (Scale), fully connected, concatenation, pooling, element-wise addition and activation, to obtain the output feature values (produced by the output layer, referred to herein simply as output values). The operations that may be involved in the hidden layers of the neural network of the embodiments of the present application are not limited to the above.
The hidden layers of a deep convolutional neural network may include multiple cascaded layers. The input of each layer is the output of the previous layer and is a feature map; each layer performs at least one of the operations described above on one or more groups of input feature maps to obtain the output of the layer, which is also a feature map. In general, each layer is named after the function it implements; for example, a layer implementing the convolution operation is called a convolution layer. In addition, the hidden layers may include a transposed convolution layer, a BN layer, a Scale layer, a pooling layer, a fully connected layer, a Concatenation layer, an element-wise addition layer, an activation layer and so on, which are not enumerated here one by one. The specific operation flow of each layer may refer to the existing technology and is not described herein.
It should be understood that each layer (including the input layer and the output layer) may have one input and/or one output, or multiple inputs and/or multiple outputs. In classification and detection tasks in the vision field, the width and height of the feature maps usually decrease layer by layer (for example, the width and height of the input, feature map #1, feature map #2, feature map #3 and the output shown in FIG. 1 decrease layer by layer); in semantic segmentation tasks, after the width and height of the feature maps have decreased to a certain depth, they may be increased again layer by layer by transposed convolution or upsampling operations.
Usually, a convolution layer is immediately followed by an activation layer; common activation layers include the Rectified Linear Unit (ReLU) layer, the sigmoid layer and the hyperbolic tangent (tanh) layer. Since the BN layer was proposed, more and more neural networks first perform BN processing after convolution and then perform the activation computation.
Currently, the layers that require relatively many weight parameters for their operations are: the convolution layer, the fully connected layer, the transposed convolution layer and the BN layer.
Real-number domain:
Representing data in the real-number domain means representing data by its own magnitude.
Logarithmic domain:
Representing data in the logarithmic domain means representing data by the magnitude of the logarithm of its absolute value (for example, the base-2 log value of the absolute value of the data).
The embodiments of the present application provide a data conversion method that includes an offline part and an online part. The offline part, before the neural network starts computing or outside the computation, determines the weight reference value corresponding to the weight coefficients in the logarithmic domain and converts the weight coefficients to the logarithmic domain; it may also determine the output reference value corresponding to the output values of each layer in the logarithmic domain. The online part is the concrete computation process of the neural network, i.e., the process of obtaining the output values.
First, the flow of a neuron's multiplication after data have been converted from the real-number domain to the logarithmic domain is explained. For example, suppose the neuron's real-domain weight coefficient is w = 0.25 and its real-domain input feature value is x = 128. In the traditional real-domain computation, the real-domain output value is y = w × x = 0.25 × 128 = 32; this multiplication requires a multiplier, which places very high demands on the hardware. The weight coefficient w = 0.25 = 2^-2 can be represented in the logarithmic domain simply as w̃ = -2, and the input feature value x = 128 = 2^7 can be represented in the logarithmic domain simply as x̃ = 7. The above multiplication can then be converted to an addition in the logarithmic domain: the output value y = 2^-2 × 2^7 = 2^(-2+7) = 2^5, i.e., the output value y is represented in the logarithmic domain as ỹ = 5. Converting the log-domain output value ỹ = 5 back to the real-number domain requires only a shift operation: y = 1 << (-2+7) = 32. In this way, the multiplication result is obtained with only an addition and a shift operation.
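The worked example above can be sketched in a few lines; this is a minimal illustration, with the names (`w_log`, `x_log`, `y_log`) chosen here for clarity rather than taken from the patent:

```python
# Real-domain multiply w * x done in the log domain: one addition plus one shift.
w_log = -2          # w = 0.25 = 2**-2, log-domain representation
x_log = 7           # x = 128 = 2**7, log-domain representation

y_log = w_log + x_log      # multiplication becomes addition: -2 + 7 = 5
y = 1 << y_log             # shift back to the real domain: 1 << 5 = 32

assert y == int(0.25 * 128)
```

The addition and the final shift are the only operations a hardware implementation would need here; no multiplier is involved.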
The log-domain weight coefficient w̃ = -2 appeared in the above process. To simplify the representation of data in the logarithmic domain, an existing solution proposes deriving the FSR from empirical values. For example, with FSR = 10 and a bit width of 3, the range corresponding to the log-domain data is {0, 3, 4, 5, 6, 7, 8, 9}; -2 can then be mapped to some value within this range, thereby avoiding negative log-domain weight coefficients.
In the existing solutions, the FSR has to be adjusted manually for different networks. Moreover, the existing schemes for converting data from the real-number domain to the logarithmic domain only apply when the real-domain data are positive, whereas weight coefficients, input feature values and output values are in many cases negative. Both points impair the expressive capability of the network and reduce the accuracy of the neural network (hereinafter simply referred to as the network).
Herein, assume that the weight log-domain bit width given for the weight coefficients is BW_W, the input-value log-domain bit width of the input feature values is BW_X, and the output-value log-domain bit width of the output values is BW_Y.
FIG. 2 is a schematic flowchart of a data conversion method 200 according to an embodiment of the present application. As shown in FIG. 2, the method 200 includes the following steps.
S210: determine a weight reference value according to the weight log-domain bit width of a first target layer of the neural network and the magnitude of the maximum weight coefficient.
S220: convert the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width.
In the data conversion method of this embodiment of the present application, the weight reference value is determined from the weight log-domain bit width and the magnitude of the maximum weight coefficient, and the weight coefficients are converted to the logarithmic domain based on the weight reference value and the weight log-domain bit width. Since the log-domain weight reference value of the weight coefficients is not an empirical value but is determined from the weight log-domain bit width and the maximum weight coefficient, the expressive capability of the network can be improved and its accuracy increased.
It should be understood that the maximum weight coefficient can be regarded as the reference weight value of the weight coefficients, denoted RW. In the embodiments of the present application, the reference weight value may also be taken as the maximum weight coefficient after removing outliers, or as a value other than the maximum weight coefficient, which is not limited in the embodiments of the present application. For any layer in the neural network, for example the first target layer, the magnitude of the layer's weight reference value may be determined by the maximum weight coefficient of the first target layer. The weight reference value is denoted BASE_W. It should be noted that each embodiment of the present application may compute the weight reference value according to the required precision. The weight reference value may be an integer or include fractional bits; it may be positive or negative. Its value may be given by the following formula (1):
BASE_W = ceil(log2|RW|) - 2^(BW_W-1) + 1      formula (1)
where ceil() is the round-up (ceiling) function.
Determining the weight reference value BASE_W according to formula (1) gives larger weight coefficients higher accuracy when the weight coefficients are converted to the logarithmic domain.
It should be understood that the 2^(BW_W-1) term in formula (1) is given for the case where the conversion of the weight coefficients to the logarithmic domain includes a one-bit sign bit; when no sign bit is set for the log-domain weight coefficients, this term may instead be 2^BW_W. The embodiments of the present application are not limited to determining BASE_W by formula (1); BASE_W may be determined based on other principles and by other formulas.
It should also be understood that in S220 all of the weight coefficients in the first target layer may be converted to the logarithmic domain, or only a part of the weight coefficients in the first target layer may be converted to the logarithmic domain, which is not limited in the embodiments of the present application.
S220, converting the weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width, may include: converting the weight coefficients to the logarithmic domain according to the weight reference value, the weight log-domain bit width and the magnitudes of the weight coefficients.
In the embodiments of the present application, the weight log-domain bit width may include a one-bit sign bit, and the sign of a weight coefficient in the logarithmic domain is consistent with its sign in the real-number domain. In the existing solutions, when data are converted to the logarithmic domain, negative values are uniformly converted to the log-domain value corresponding to a real-domain value of 0. In the embodiments of the present application, the signs of the weight coefficients are preserved, which helps improve the accuracy of the network.
Specifically, the conversion of the weight parameters to the logarithmic domain may be computed by the following formula (2):
w̃ = sign(w) × Clip(Round(log2|w|) - BASE_W, 0, 2^(BW_W-1) - 1)      formula (2)
where sign() may be expressed as the following formula (3):
sign(z) = 1 for z ≥ 0, and sign(z) = -1 for z < 0      formula (3)
Round() may be expressed as the following formula (4):
Round(z) = int(z + 0.5)      formula (4)
where int is the truncation-to-integer function.
Clip() may be expressed as the following formula (5):
Clip(z, min, max) = min for z < min; z for min ≤ z ≤ max; and max for z > max      formula (5)
Thus, the real-domain weight coefficient w can be represented by BASE_W and the log-domain weight coefficient w̃.
In a specific example, for the weight coefficients of the first target layer, the weight log-domain bit width is BW_W = 4 and the reference weight value is RW = 64. Then the weight reference value BASE_W = ceil(log2|64|) - 2^(4-1) + 1 = 6 - 8 + 1 = -1.
The value range of w̃ is ±(0, 1, ..., 7), where -8 represents a real-domain 0 and the sign represents the sign in the real-number domain.
In the above example the weight log-domain bit width BW_W consists entirely of integer bits; for example, after taking the weight reference value BASE_W into account, a 4-bit width can represent the weight coefficient values ±(0-128), i.e., ±(0, 1, 2, 4, 8, 16, 32, 64 and 128), where one bit is the sign bit. The weight log-domain bit width BW_W of the embodiments of the present application may also include fractional bits. For example, after taking the weight reference value BASE_W into account, a 4-bit width (two integer bits and one fractional bit) can represent the weight coefficient values ±(0-2^3.5), i.e., ±(0, 2^0, 2^0.5, 2^1, 2^1.5, 2^2, 2^2.5, 2^3 and 2^3.5), where one bit is the sign bit.
In the embodiments of the present application, the weight log-domain bit width may also include no sign bit. For example, when the weight coefficients are all positive, the weight log-domain bit width may not include a sign bit.
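Formulas (1) to (5) can be sketched as follows; this is a minimal illustration under the assumption that the reserved code -2^(BW_W-1) (e.g., -8 for a 4-bit width) marks a real-domain zero, with all function and variable names chosen here for illustration:

```python
import math

def base_w(rw, bw_w):
    # Formula (1): BASE_W = ceil(log2|RW|) - 2**(BW_W - 1) + 1
    return math.ceil(math.log2(abs(rw))) - 2 ** (bw_w - 1) + 1

def weight_to_log_domain(w, base, bw_w):
    # Formula (2): keep the real-domain sign, and store the non-negative
    # difference Clip(Round(log2|w|) - BASE_W, 0, 2**(BW_W-1) - 1).
    if w == 0:
        return -(2 ** (bw_w - 1))                    # reserved code for real-domain 0
    sign = 1 if w > 0 else -1
    diff = int(math.log2(abs(w)) + 0.5) - base       # Round() per formula (4)
    diff = max(0, min(diff, 2 ** (bw_w - 1) - 1))    # Clip() per formula (5)
    return sign * diff

# Example from the text: BW_W = 4, RW = 64 gives BASE_W = 6 - 8 + 1 = -1,
# and the largest weight maps to the largest difference code.
BASE_W = base_w(64, 4)
```

Storing the sign separately from the non-negative difference value is what allows negative weights to survive the conversion, unlike the FSR-based scheme described earlier.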
The above is the process of converting the weight coefficients to the logarithmic domain in the offline part. The offline part may further include determining the output reference value corresponding to the output values in the logarithmic domain, although in certain scenarios this is not a necessary step: in practical applications it may be necessary only to convert the weights to the logarithmic domain, without converting the output values, so this step is optional. Accordingly, the method 200 may further include: determining the output reference value according to the output-value log-domain bit width of the first target layer and the magnitude of the reference output value RY. This step may be performed after S220, before S210, or simultaneously with S210 to S220, which is not limited in the embodiments of the present application.
The reference output value RY may be determined by the following steps: computing the maximum output value of each of a plurality of input samples at the first target layer, and selecting the reference output value RY from the plurality of maximum output values. Specifically, selecting the reference output value RY from the plurality of maximum output values may include: sorting the plurality of maximum output values, and selecting the reference output value RY from them according to a preset selection parameter.
Specifically, the plurality of maximum output values (for example, M maximum output values) are sorted, for example in ascending or descending order, or according to some preset rule. After sorting, one maximum output value is selected from the M maximum output values according to the preset selection parameter (for example, the parameter picks the value at a particular position after sorting) as the reference output value RY.
In a specific example, the M maximum output values are arranged in ascending order, the selection parameter is a, and the (a×M)-th maximum output value is selected as the reference output value RY, where a is greater than or equal to 0 and less than or equal to 1. In some embodiments, the selection parameter a may select the maximum value (i.e., a = 1) or the second-largest value; the method of selecting the reference output value RY is not limited here.
It should be understood that the reference output value RY may also be determined by other methods, which is not limited in the embodiments of the present application.
Specifically, determining the output reference value BASE_Y according to the output-value log-domain bit width BW_Y and the reference output value RY may be computed by the following formula (6):
BASE_Y = ceil(log2|RY|) - 2^(BW_Y-1) + 1      formula (6)
It should be understood that the embodiments of the present application are not limited to determining BASE_Y by formula (6); BASE_Y may be determined based on other principles and by other formulas.
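The selection of RY and the computation of BASE_Y can be sketched as below; the index handling for the (a×M)-th element is one plausible reading of the text and is an assumption here, as are the function names:

```python
import math

def pick_reference_output(max_outputs, a):
    # Sort the per-sample maximum output values in ascending order and take
    # the one at position a*M (1-indexed), a being the selection parameter.
    ordered = sorted(max_outputs)
    idx = max(0, min(len(ordered) - 1, int(a * len(ordered)) - 1))
    return ordered[idx]

def base_y(ry, bw_y):
    # Formula (6): BASE_Y = ceil(log2|RY|) - 2**(BW_Y - 1) + 1
    return math.ceil(math.log2(abs(ry))) - 2 ** (bw_y - 1) + 1

ry = pick_reference_output([3.0, 40.0, 17.0, 64.0], a=1.0)  # a = 1 picks the maximum
```

Choosing a < 1 discards the very largest activations, which is one way to make the reference robust to outlier samples.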
In the embodiments of the present application, both the weight coefficients and the output values can be represented as difference values relative to their reference values, so that every difference value is non-negative and only the reference value may be negative. In this way each weight coefficient and output value can save one bit of width, which reduces storage overhead; for the enormous data scale of a neural network this can produce a significant bandwidth gain.
For the online part, a convolution operation, a fully connected operation, or the operation of another layer of the neural network may be expressed by the multiply-accumulate operation of formula (7):
y = Σ_kc Σ_kh Σ_kw (x × w)      formula (7)
where kc is the number of channels of the input feature values, kh is the height of the convolution kernel, kw is the width of the convolution kernel, x is an input feature value, w is a weight coefficient, and y is the output value.
Accordingly, after the weight coefficients in the first target layer are converted to the logarithmic domain according to the weight reference value and the weight log-domain bit width in S220, the method 200 further includes the following steps: determining the input feature values of the first target layer; and performing multiply-accumulate calculation on the input feature values and the log-domain weight coefficients by shift operations, to obtain the output values of the first target layer in the real-number domain.
Specifically, the embodiments of the present application can obtain the real-domain output values through additions combined with shift operations. Depending on whether the input feature values are in the real-number domain or in the logarithmic domain, the embodiments of the present application may handle them differently.
In one optional solution, the input feature values are real-domain input feature values. Performing multiply-accumulate calculation on the input feature values and the log-domain weight coefficients by shift operations to obtain the real-domain output values of the first target layer may include: performing, by a first shift operation, multiply-accumulate calculation on the real-domain input feature values and the log-domain weight coefficients to obtain a multiply-accumulated value; and performing a second shift operation on the multiply-accumulated value to obtain the real-domain output values of the first target layer.
Specifically, in some embodiments of the present application the input feature values (for example, the input feature values of the first layer of the neural network) are not converted to the logarithmic domain, because converting them would cause a loss of detail; the input feature values therefore retain their real-domain representation. The weight coefficient w may already have been converted to the logarithmic domain in the offline part and be represented by BASE_W and a non-negative w̃. The output reference value BASE_Y of the output values has also been determined in the offline part.
Specifically, in one particular case of this optional solution, the input feature values and output values may be real-domain fixed-point numbers, and the weight coefficient w may already have been converted to the logarithmic domain in the offline part and be represented by BASE_W and a non-negative w̃. Suppose the fixed-point format of the input feature values is QA.B and that of the output values is QC.D, where A and C denote the integer bit widths and B and D denote the fractional bit widths.
Thus, performing the second shift operation on the multiply-accumulated value to obtain the real-domain output values of the first target layer may include: shifting the multiply-accumulated value according to the fractional bit width of the real-domain input feature values and the fractional bit width of the real-domain output values, to obtain the real-domain output values of the first target layer. Since the weight coefficients are log-domain values represented by BASE_W and a non-negative w̃, further, the multiply-accumulated value is shifted according to the fractional bit width of the real-domain input feature values, the fractional bit width of the real-domain output values and the weight reference value, to obtain the real-domain output values of the first target layer.
Specifically, the multiply-accumulate operation of formula (7) can be simplified to formula (8):
y = bitshift(y_sum, B - BASE_W - D)      formula (8)
where bitshift(y_sum, B - BASE_W - D) in formula (8) is the second shift operation, and bitshift() may be expressed as the following formula (9):
bitshift(z, s) = z >> s for s ≥ 0, and z << (-s) for s < 0      formula (9)
y_sum may be computed by the following formulas (10) and (11):
y_sum = Σ_kc Σ_kh Σ_kw sign(w̃) × (x << |w̃|)      formula (10)
where x << |w̃| in formula (10) is the first shift operation, which may be expressed as the following formula (11):
x << |w̃| = bitshift(x, -|w̃|)      formula (11)
Through formulas (8) to (11), the real-domain output value y in fixed-point format QC.D can be obtained.
In a specific example, suppose the fixed-point format of the input feature values is Q7.0 and the real-domain input feature values are x1 = 4, x2 = 8 and x3 = 16. The fixed-point format of the output value is Q4.3. The weight log-domain bit width is BW_W = 4, the weight reference value is BASE_W = -7, and the log-domain weight coefficients are w̃1 = -1, w̃2 = 2 and w̃3 = -8.
Then the real-domain output value is y = (-(4 << 1) + (8 << 2)) >> (0 - (-7) - 3) = (-8 + 32) >> 4 = 1,
where << denotes a left shift and >> denotes a right shift. Since w̃3 = -8 represents a real-domain 0, x3 × w3 does not need to be computed.
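The example above can be sketched as follows; this is a minimal illustration in which the log-domain weights follow the sign-plus-difference encoding of formula (2), with -8 as the reserved code for a real-domain zero (the function and parameter names are chosen here for illustration):

```python
def mac_real_input(xs, ws_log, b, d, base_w, zero_code=-8):
    # First shift operation (formulas (10)/(11)): each product x*w becomes
    # sign(w_log) * (x << |w_log|); the reserved zero code is skipped.
    y_sum = 0
    for x, wl in zip(xs, ws_log):
        if wl == zero_code:
            continue
        sign = 1 if wl >= 0 else -1
        y_sum += sign * (x << abs(wl))
    # Second shift operation (formulas (8)/(9)): align the fixed-point
    # fractional widths B (input) and D (output) with BASE_W.
    s = b - base_w - d
    return y_sum >> s if s >= 0 else y_sum << -s

# Q7.0 inputs, Q4.3 output, BASE_W = -7, as in the text:
y = mac_real_input([4, 8, 16], [-1, 2, -8], b=0, d=3, base_w=-7)
```

Only additions and shifts appear in the loop, which is the point of the scheme: a hardware datapath can realize the whole multiply-accumulate without a multiplier.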
FIG. 3 is a schematic diagram of a multiply-accumulate operation flow 300 according to an embodiment of the present application. The flow 300 includes the following steps; S310 and S320 are implemented in the offline part, and S330 and S340 in the online part.
S310: compute the weight reference value from the maximum weight coefficient. Specifically, determine the weight reference value according to the weight log-domain bit width of the first target layer and the magnitude of the maximum weight coefficient.
S320: convert the real-domain weight coefficients to the logarithmic domain according to the weight reference value, to obtain the log-domain weight coefficients. Specifically, convert the real-domain weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width, to obtain the log-domain weight coefficients.
S330: compute the real-domain output value from the real-domain input feature values and the log-domain weight coefficients. Specifically, perform, by the first shift operation, multiply-accumulate calculation on the real-domain input feature values and the log-domain weight coefficients to obtain a multiply-accumulated value; and perform the second shift operation on the multiply-accumulated value to obtain the real-domain output value of the first target layer.
S340: output the real-domain output value, which may be an output value in real-domain fixed-point format.
If subsequent computation requires converting the real-domain output value y to the logarithmic domain, the method of this solution may further include a step of converting the real-domain output value to the logarithmic domain. Specifically, after the multiply-accumulated value is shifted according to the fractional bit width of the real-domain input feature values, the fractional bit width of the real-domain output values and the weight reference value to obtain the real-domain output value of the first target layer, the method may further include: converting the real-domain output value to the logarithmic domain according to the output reference value, the output-value log-domain bit width and the magnitude of the real-domain output value. The output-value log-domain bit width may include a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with its sign in the real-number domain.
Specifically, the conversion of the output value to the logarithmic domain may be computed by formula (12), which is analogous to formula (2):
ỹ = sign(y) × Clip(Round(log2|y|) - BASE_Y, 0, 2^(BW_Y-1) - 1)      formula (12)
In another particular case of the above optional solution, performing the second shift operation on the multiply-accumulated value to obtain the real-domain output value of the first target layer may include: shifting the multiply-accumulated value according to the weight reference value and the output reference value, to obtain the real-domain output value of the first target layer.
Specifically, the multiply-accumulate operation of formula (7) can be simplified to formula (13):
y = bitshift(y_sum, BASE_Y - BASE_W - 1)      formula (13)
The subtraction of 1 in formula (13) is to keep one extra mantissa bit, so this number can be regarded as a fixed-point number with one fractional bit.
For bitshift() and y_sum, refer to formulas (9) to (11); bitshift(y_sum, BASE_Y - BASE_W - 1) in formula (13) is the second shift operation.
Here the real-domain output value y is the real-domain output value after the output reference value BASE_Y has already been taken into account. This real-domain output value y can be converted to the logarithmic domain. Specifically, after the multiply-accumulated value is shifted according to the weight reference value and the output reference value to obtain the real-domain output value of the first target layer, the method 200 may further include: converting the real-domain output value to the logarithmic domain according to the output-value log-domain bit width and the magnitude of the real-domain output value. The output-value log-domain bit width includes a one-bit sign bit, and the sign of the output value in the logarithmic domain is consistent with its sign in the real-number domain.
Specifically, the conversion of this real-domain output value y to the logarithmic domain may be computed by the following formula (14):
ỹ = sign(y) × Clip(Round(log2|y|), 0, 2^(BW_Y-1) - 1)      formula (14)
For sign(), Round() and Clip(), refer to formulas (3) to (5).
In another optional solution, the input feature values are log-domain input feature values. Performing multiply-accumulate calculation on the input feature values and the log-domain weight coefficients by shift operations to obtain the real-domain output values of the first target layer may include: performing, by a third shift operation, multiply-accumulate calculation on the log-domain input feature values and the log-domain weight coefficients to obtain a multiply-accumulated value; and performing a fourth shift operation on the multiply-accumulated value to obtain the real-domain output values of the first target layer. This optional solution applies to an intermediate layer of the neural network, where the input feature values are the output values of the previous layer and have already been converted to the logarithmic domain.
It should be understood that the output reference value of the output values of the layer preceding the first target layer (an intermediate layer of the neural network) discussed here can be regarded as the input reference value of the input feature values of the first target layer, denoted BASE_X. The output reference value of the output values of the first target layer is BASE_Y, and the weight reference value of the weight coefficients of the first target layer is BASE_W.
Specifically, performing the fourth shift operation on the multiply-accumulated value to obtain the real-domain output values of the first target layer may include: shifting the multiply-accumulated value according to the input reference value of the log-domain input feature values, the output reference value and the weight reference value, to obtain the real-domain output values of the first target layer.
Specifically, the multiply-accumulate operation of formula (7) can be simplified to formula (15):
y = bitshift(y_sum, BASE_Y - BASE_W - BASE_X - 1)      formula (15)
For bitshift(), refer to formula (9); bitshift(y_sum, BASE_Y - BASE_W - BASE_X - 1) in formula (15) is the fourth shift operation.
y_sum may be computed by the following formulas (16) and (17):
y_sum = Σ_kc Σ_kh Σ_kw sign(x̃) × sign(w̃) × (1 << (|x̃| + |w̃|))      formula (16)
1 << (|x̃| + |w̃|) = bitshift(1, -(|x̃| + |w̃|))      formula (17)
where 1 << (|x̃| + |w̃|) in formula (16) is the third shift operation.
In a specific example, suppose the log-domain input feature values are x̃1 = -2 and x̃2 = 5, and the input reference value is BASE_X = 2. The weight log-domain bit width is BW_W = 4, the weight reference value is BASE_W = -7, and the log-domain weight coefficients are w̃1 = 1 and w̃2 = 0. The output log-domain bit width is BW_Y = 4 and the output reference value is BASE_Y = 3.
Then the real-domain output value is y = [-(1 << (1+2)) + (1 << (0+5))] >> (3 - (-7) - 2 - 1) = [-8 + 32] >> 7 = 0.
The output value y can be converted to the logarithmic domain by formula (12), or not converted to the logarithmic domain, which is not limited in the embodiments of the present application.
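A minimal sketch of formulas (15) to (17) follows; the particular split of the negative sign between x̃ and w̃ in the example is one plausible reading of the text, and the names are chosen here for illustration:

```python
def mac_log_input(xs_log, ws_log, base_x, base_w, base_y, zero_code=-8):
    # Third shift operation (formulas (16)/(17)): each product becomes
    # sign(x_log) * sign(w_log) * (1 << (|x_log| + |w_log|)).
    y_sum = 0
    for xl, wl in zip(xs_log, ws_log):
        if xl == zero_code or wl == zero_code:   # reserved code: real-domain 0
            continue
        sign = (1 if xl >= 0 else -1) * (1 if wl >= 0 else -1)
        y_sum += sign * (1 << (abs(xl) + abs(wl)))
    # Fourth shift operation (formula (15)).
    s = base_y - base_w - base_x - 1
    return y_sum >> s if s >= 0 else y_sum << -s

# Worked example above: BASE_X = 2, BASE_W = -7, BASE_Y = 3
y = mac_log_input([-2, 5], [1, 0], base_x=2, base_w=-7, base_y=3)
```

The sign product here corresponds to the XOR-and-concatenate trick mentioned below for hardware: combining two sign bits never requires a multiplier.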
FIG. 4 is a schematic diagram of a multiply-accumulate operation flow 400 according to another embodiment of the present application. The flow 400 includes the following steps; S410 to S430 are implemented in the offline part, and S440 and S450 in the online part.
S410: compute the weight reference value from the maximum weight coefficient. Specifically, determine the weight reference value according to the weight log-domain bit width of the first target layer and the magnitude of the maximum weight coefficient.
S420: convert the real-domain weight coefficients to the logarithmic domain according to the weight reference value, to obtain the log-domain weight coefficients. Specifically, convert the real-domain weight coefficients in the first target layer to the logarithmic domain according to the weight reference value and the weight log-domain bit width, to obtain the log-domain weight coefficients.
S430: compute the output reference value from the reference output value. Specifically, determine the output reference value according to the output-value log-domain bit width of the first target layer and the magnitude of the reference output value.
S440: compute the real-domain output value from the real-domain input feature values, the log-domain weight coefficients and the output reference value.
S450: convert the real-domain output value to the logarithmic domain according to the magnitude of the real-domain output value and the output reference value; that is, convert the real-domain output value to the logarithmic domain according to the output-value log-domain bit width, the output reference value and the magnitude of the real-domain output value.
In the embodiments of the present application, log2() can be implemented by finding, excluding the sign bit, the position of the first bit that is not 0, scanning from the most significant bit to the least significant bit. The two multiplications by sign(x̃) and sign(w̃) in formula (16) are, in hardware design, an XOR and a sign-bit concatenation; that is to say, no multiplier is needed.
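The leading-one trick described above amounts to finding the highest set bit; a minimal sketch (in Python this is also available as `int.bit_length() - 1`, shown here as an explicit loop to mirror the hardware scan):

```python
def ilog2(v):
    # Integer part of log2|v|: position of the highest set bit of |v|,
    # corresponding to a most-significant-bit-down scan; v must be nonzero.
    v = abs(v)
    pos = -1
    while v:
        v >>= 1
        pos += 1
    return pos
```

For example, `ilog2(128)` yields 7, matching x = 128 = 2^7 in the earlier worked example.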
It should be understood that an embodiment of the present application further provides a data conversion method including the following steps: determining the input feature values of a first target layer of a neural network; and performing multiply-accumulate calculation on the input feature values and log-domain weight coefficients by shift operations, to obtain the real-domain output values of the first target layer. The log-domain weight coefficients may be obtained in an existing manner or by the method of the embodiments of the present application, which is not limited in the embodiments of the present application.
The first target layer of the embodiments of the present application may include one of a convolution layer, a transposed convolution layer, a BN layer, a Scale layer, a pooling layer, a fully connected layer, a Concatenation layer, an element-wise addition layer and an activation layer, or a layer formed by combining at least two of these layers. That is, the data conversion method 200 of the embodiments of the present application can be applied to any one or more of the hidden layers of a neural network.
Corresponding to the case where the first target layer is a layer formed by combining at least two layers, the data conversion method 200 may further include: performing combination pre-processing on at least two layers of the neural network to obtain the merged first target layer. This processing can be regarded as the pre-processing part of the data fixed-point method.
神经网络的训练阶段完成后,推理阶段的卷积层、BN层和Scale层的 参数是固定的。通过计算推导可以知道,其实BN层和Scale层的参数是可以合并到卷积层的参数里,这样神经网络的知识产权核(Intellectual Property core,IP核)就不需要专门为BN层和Scale层设计专用电路了。
早期的神经网络中,卷积层之后为激活层。为了防止网络过拟合、加快收敛速度、增强网络泛化能力等,在卷积层之后激活层之前可以引入BN层。BN层的输入包括B={x_1,...,x_m}={x_i}以及参数γ和β,其中,x_i既是卷积层的输出又是BN层的输入,参数γ和β在训练阶段进行计算,在推理阶段均为常数。BN层的输出为{y_i=BN_γ,β(x_i)}。
其中,
μ_B = (1/m)·Σ_{i=1}^{m} x_i
σ_B² = (1/m)·Σ_{i=1}^{m} (x_i − μ_B)²
x̂_i = (x_i − μ_B)/√(σ_B² + ε)
y_i = γ·x̂_i + β
因此 x̂_i 和 y_i 的计算可以化简为:
y_i = a·x_i + b
其中,a = γ/√(σ_B² + ε),b = β − γ·μ_B/√(σ_B² + ε)。
x_i是卷积层的输出,令X为卷积层的输入,W为权重系数矩阵,偏置值记为b_conv,则有:
x_i = W·X + b_conv
y_i = a·x_i + b = (a·W)·X + (a·b_conv + b)
由此,卷积层和BN层的合并完成。
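上述卷积层与BN层的合并可以用如下NumPy草稿示意。其中参数名为示意性假设,并假设权重W的第0维为输出通道维:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    # 将BN参数(γ、β、均值、方差)折叠进前面卷积层的权重W与偏置b,
    # 使合并后的层一次完成卷积+BN:W' = a·W,b' = a·(b−μ)+β。
    a = gamma / np.sqrt(var + eps)                        # 每输出通道的缩放系数a
    W_folded = W * a.reshape(-1, *([1] * (W.ndim - 1)))   # 按输出通道广播
    b_folded = a * (b - mean) + beta
    return W_folded, b_folded
```

折叠后,合并层对任意输入都与"先卷积再BN"等价,推理阶段无需再为BN层设计专用电路。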
Scale层本身就是要计算y_i = a·x_i + b,参考BN层与卷积层的合并,也可将Scale层和卷积层合并。在Caffe框架下,BN层的输出是 x̂_i = (x_i − μ_B)/√(σ_B² + ε),即只完成归一化而不包含缩放和平移,因此基于Caffe框架设计的神经网络,通常会在BN层后加入Scale层以实现完整的归一化。
由此,对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层,可以包括:对神经网络的卷积层和BN层进行合并预处理,得到第一目标层;或,对神经网络的卷积层和Scale层进行合并预处理,得到第一目标层;或,对神经网络的卷积层、BN层和Scale层进行合并预处理,得到第一目标层。
相对应地,本申请实施例中,最大权重系数可以为对神经网络的至少两层进行合并预处理后形成的第一目标层的权重系数的最大值。
本申请实施例中,最大输出值为多个输入样本中每个输入样本在合并后形成的第一目标层的最大输出值。
图5A、图5B和图5C是本申请实施例的合并预处理的几种情况的示意图。图5D是最简单的卷积层之后是激活层的一种层连接方式。
如图5A所示,未进行合并预处理之前,卷积层后是BN层,再之后是激活层,将卷积层和BN层合并为第一目标层,其后为激活层,得到类似于图5D的两层结构。
应理解,某些IP核支持Scale层的处理,那么合并预处理中卷积层与BN层的合并,可以替换为卷积层与Scale层的合并。如图5B所示,未进行合并预处理之前,卷积层后是Scale层,再之后是激活层,将卷积层和Scale层合并为第一目标层,其后为激活层,得到类似于图5D的两层结构。
如图5C所示,未进行合并预处理之前,卷积层后是BN层,继而为Scale层,再之后是激活层,将卷积层、BN层和Scale层合并为第一目标层,其后为激活层,得到类似于图5D的两层结构。
以上详细说明了本申请实施例的方法,下面详细说明本申请实施例的装置。
图6是本申请一个实施例的数据转换装置600的示意性框图。数据转换装置600包括权重基准确定模块610和权重对数转换模块620。权重基准确定模块610用于根据神经网络的第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值。权重对数转换模块620用于根据权重基准值和权重对数域位宽,将第一目标层中的权重系数转换到对数域。
本申请实施例的数据转换装置,根据权重对数域位宽和最大权重系数的大小,确定权重基准值,基于权重基准值和权重对数域位宽,将权重系数转换到对数域,权重系数在对数域的权重基准值不是经验值,而是根据权重对数域位宽和最大权重系数确定的,可以改善网络的表达能力,提高网络的准确率。
可选地,作为一个实施例,权重对数转换模块620根据权重基准值和权重对数域位宽,将第一目标层中的权重系数转换到对数域,包括:权重对数转换模块620根据权重基准值、权重对数域位宽和权重系数的大小将权重系数转换到对数域。
可选地,作为一个实施例,权重对数域位宽中包括一位符号位,权重系数在对数域的符号与权重系数在实数域的符号一致。
可选地,作为一个实施例,数据转换装置600还可以包括实数输出模块630。实数输出模块630用于在权重对数转换模块620根据权重基准值和权重对数域位宽将第一目标层中的权重系数转换到对数域之后,确定所述第一目标层的输入特征值;并通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,所述输入特征值是实数域的输入特征值。实数输出模块630通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:实数输出模块630通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;实数输出模块630对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
实数输出模块630对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块630根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,实数输出模块630根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块630根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,数据转换装置600还可以包括对数输出模块640,用于在实数输出模块630根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,根据输出基准值、输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,实数输出模块630对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块630根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,数据转换装置600还可以包括对数输出模块640,用于在实数输出模块630根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,根据输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,数据转换装置600还可以包括输出基准确定模块650,用于根据第一目标层的输出值对数域位宽和参考输出值的大小,确定输出基准值。
可选地,作为一个实施例,数据转换装置600还可以包括输出参考确定模块660,用于计算多个输入样本中每个输入样本在第一目标层的最大输出值;从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,输出参考确定模块660从多个最大输出值中选取出参考输出值,包括:输出参考确定模块660对多个最大输出值进行排序,按照预设的选取参数从多个最大输出值中选取出参考输出值。
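上述"对多个最大输出值进行排序,按照预设的选取参数选出参考输出值"的一种可能做法可以用如下草稿示意。此处将选取参数假设为一个分位数keep_ratio,仅为示意,并非本申请限定的具体规则:

```python
def select_reference_output(max_outputs, keep_ratio=0.95):
    # 对每个输入样本在第一目标层的最大输出值排序,
    # 按假设的分位数选取参考输出值,以抑制个别离群样本
    # 对输出基准值的影响。
    ordered = sorted(max_outputs)
    idx = min(len(ordered) - 1, int(keep_ratio * (len(ordered) - 1)))
    return ordered[idx]
```

例如keep_ratio=1.0时取全部样本的最大值,keep_ratio越小越保守。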
可选地,作为一个实施例,所述输入特征值是对数域的输入特征值。实数输出模块630通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:实数输出模块630通过第三移位运算,对所述对数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;实数输出模块630对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,实数输出模块630对乘累加值进行第四移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块630根据对数域的输入特征值的输入基准值、输出基准值和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,最大权重系数为对神经网络的至少两层进行合并预处理后形成的第一目标层的权重系数的最大值。
可选地,作为一个实施例,数据转换装置600还可以包括预处理模块670。预处理模块670用于对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层。
可选地,作为一个实施例,最大输出值为多个输入样本中每个输入样本在合并后形成的第一目标层的最大输出值。
可选地,作为一个实施例,预处理模块670对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层,包括:预处理模块670对神经网络的卷积层和归一化层进行合并预处理,得到第一目标层;或,对神经网络的卷积层和缩放层进行合并预处理,得到第一目标层;或,对神经网络的卷积层、归一化层和缩放层进行合并预处理,得到第一目标层。
可选地,作为一个实施例,第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
应理解,上述权重基准确定模块610、权重对数转换模块620、实数输出模块630、对数输出模块640、输出基准确定模块650、输出参考确定模块660和预处理模块670可以由处理器和存储器实现。
图7是本申请另一个实施例的数据转换装置700的示意性框图。如图7所示的数据转换装置700可以包括处理器710和存储器720,存储器720中存储有计算机指令,处理器710执行计算机指令时,使得数据转换装置700执行以下步骤。根据神经网络的第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值。根据权重基准值和权重对数域位宽,将第一目标层中的权重系数转换到对数域。
可选地,作为一个实施例,处理器710根据权重基准值和权重对数域位宽,将第一目标层中的权重系数转换到对数域,包括:根据权重基准值、权重对数域位宽和权重系数的大小将权重系数转换到对数域。
可选地,作为一个实施例,权重对数域位宽中包括一位符号位,权重系数在对数域的符号与权重系数在实数域的符号一致。
可选地,作为一个实施例,处理器710在根据权重基准值和权重对数域位宽,将第一目标层中的权重系数转换到对数域之后,还用于执行以下步骤:确定所述第一目标层的输入特征值;通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,所述输入特征值是实数域的输入特征值。处理器710通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器710对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器710根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值,包括:根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器710在根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,还用于执行以下步骤:根据输出基准值、输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,处理器710对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器710在根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,还用于执行以下步骤:根据输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,处理器710还用于执行以下步骤:根据第一目标层的输出值对数域位宽和参考输出值的大小,确定输出基准值。
可选地,作为一个实施例,处理器710还用于执行以下步骤:计算多个输入样本中每个输入样本在第一目标层的最大输出值;从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,处理器710从多个最大输出值中选取出参考输出值,包括:对多个最大输出值进行排序,按照预设的选取参数从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,输入特征值是对数域的输入特征值。处理器710通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:通过第三移位运算,对所述对数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器710对乘累加值进行第四移位运算,得到第一目标层的实数域的输出值,包括:根据对数域的输入特征值的输入基准值、输出基准值和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,最大权重系数为对神经网络的至少两层进行合并预处理后形成的第一目标层的权重系数的最大值。
可选地,作为一个实施例,处理器710还用于执行以下步骤:对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层。
可选地,作为一个实施例,最大输出值为多个输入样本中每个输入样本在合并后形成的第一目标层的最大输出值。
可选地,作为一个实施例,处理器710对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层,包括:对神经网络的卷积层和归一化层进行合并预处理,得到第一目标层;或,对神经网络的卷积层和缩放层 进行合并预处理,得到第一目标层;或,对神经网络的卷积层、归一化层和缩放层进行合并预处理,得到第一目标层。
可选地,作为一个实施例,第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
图8是本申请另一个实施例的数据转换装置800的示意性框图。数据转换装置800包括实数输出模块810。实数输出模块810用于确定神经网络的第一目标层的输入特征值;通过移位运算,对所述输入特征值和对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
本申请实施例的数据转换装置,对输入特征值和对数域的权重系数进行简单的加法和移位操作即可实现乘累加运算,不需要乘法器,可以降低设备成本。
可选地,作为一个实施例,输入特征值是实数域的输入特征值。实数输出模块810通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:实数输出模块810通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;实数输出模块810对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,实数输出模块810对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块810根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,实数输出模块810根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块810根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,数据转换装置800还可以包括对数输出模块840,用于在实数输出模块810根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,根据输出基准值、输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,实数输出模块810对乘累加值进行移位运算,得到第一目标层的实数域的输出值,包括:实数输出模块810根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,数据转换装置800还可以包括对数输出模块840,用于在实数输出模块810根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,根据输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,数据转换装置800还可以包括输出基准确定模块850,用于根据第一目标层的输出值对数域位宽和参考输出值的大小,确定输出基准值。
可选地,作为一个实施例,数据转换装置800还可以包括输出参考确定模块860,用于计算多个输入样本中每个输入样本在第一目标层的最大输出值;从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,输出参考确定模块860从多个最大输出值中选取出参考输出值,包括:输出参考确定模块860对多个最大输出值进行排序,按照预设的选取参数从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,所述输入特征值是对数域的输入特征值。实数输出模块810通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:实数输出模块810通过第三移位运算,对所述对数域的输入特征值和对数域的权重系数进行乘累加计算,得到乘累加值;实数输出模块810对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,实数输出模块810对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:实数输出模块810根据所述对数域的输入特征值的输入基准值、输出基准值和权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
可选地,作为一个实施例,数据转换装置800还可以包括权重基准确定模块820和权重对数转换模块830。权重基准确定模块820用于根据第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值。权重对数转换模块830用于根据权重基准值和权重对数域位宽,将第一目标层中的实数域的权重系数转换到对数域,得到对数域的权重系数。
可选地,作为一个实施例,权重对数转换模块830根据权重基准值和权重对数域位宽,将第一目标层中的实数域的权重系数转换到对数域,得到对数域的权重系数,包括:权重对数转换模块830根据权重基准值、权重对数域位宽和权重系数的大小将实数域的权重系数转换到对数域,得到对数域的权重系数。
可选地,作为一个实施例,权重对数域位宽中包括一位符号位,权重系数在对数域的符号与权重系数在实数域的符号一致。
可选地,作为一个实施例,最大权重系数为对神经网络的至少两层进行合并预处理后形成的第一目标层的权重系数的最大值。
可选地,作为一个实施例,数据转换装置800还可以包括预处理模块870。预处理模块870用于对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层。
可选地,作为一个实施例,最大输出值为多个输入样本中每个输入样本在合并后形成的第一目标层的最大输出值。
可选地,作为一个实施例,预处理模块870对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层,包括:预处理模块870对神经网络的卷积层和归一化层进行合并预处理,得到第一目标层;或,对神经网络的卷积层和缩放层进行合并预处理,得到第一目标层;或,对神经网络的卷积层、归一化层和缩放层进行合并预处理,得到第一目标层。
可选地,作为一个实施例,第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
应理解,上述实数输出模块810、权重基准确定模块820、权重对数转换模块830、对数输出模块840、输出基准确定模块850、输出参考确定模块860和预处理模块870可以由处理器和存储器实现。
图9是本申请另一个实施例的数据转换装置900的示意性框图。如图9所示的数据转换装置900可以包括处理器910和存储器920,存储器920中存储有计算机指令,处理器910执行计算机指令时,使得数据转换装置900执行以下步骤。确定神经网络的第一目标层的输入特征值;通过移位运算,对所述输入特征值和对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,输入特征值是实数域的输入特征值。所述处理器910通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器910对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器910根据实数域的输入特征值的小数位位宽和实数域的输出值的小数位位宽,对乘累加值进行移位运算,得到第一目标层的实数域的输出值,包括:根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器910在根据实数域的输入特征值的小数位位宽、实数域的输出值的小数位位宽和权重基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,还用于执行以下步骤:根据输出基准值、输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,处理器910对乘累加值进行第二移位运算,得到第一目标层的实数域的输出值,包括:根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器910在根据权重基准值和输出基准值,对乘累加值进行移位运算,得到第一目标层的实数域的输出值之后,还用于执行以下步骤:根据输出值对数域位宽和实数域的输出值的大小,将实数域的输出值转换到对数域。
可选地,作为一个实施例,输出值对数域位宽中包括一位符号位,输出值在对数域的符号与输出值在实数域的符号一致。
可选地,作为一个实施例,处理器910还用于执行以下步骤:根据第一目标层的输出值对数域位宽和参考输出值的大小,确定输出基准值。
可选地,作为一个实施例,处理器910还用于执行以下步骤:计算多个输入样本中每个输入样本在第一目标层的最大输出值;从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,处理器910从多个最大输出值中选取出参考输出值,包括:对多个最大输出值进行排序,按照预设的选取参数从多个最大输出值中选取出参考输出值。
可选地,作为一个实施例,输入特征值是对数域的输入特征值。处理器910通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:通过第三移位运算,对所述对数域的输入特征值和对数域的权重系数进行乘累加计算,得到乘累加值;对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
可选地,作为一个实施例,处理器910对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:根据所述对数域的输入特征值的输入基准值、输出基准值和权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
可选地,作为一个实施例,处理器910还用于执行以下步骤:根据第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值;根据权重基准值和权重对数域位宽,将第一目标层中的实数域的权重系数转换到对数域,得到对数域的权重系数。
可选地,作为一个实施例,处理器910根据权重基准值和权重对数域位宽,将第一目标层中的实数域的权重系数转换到对数域,得到对数域的权重系数,包括:根据权重基准值、权重对数域位宽和权重系数的大小将实数域的权重系数转换到对数域,得到对数域的权重系数。
可选地,作为一个实施例,权重对数域位宽中包括一位符号位,权重系数在对数域的符号与权重系数在实数域的符号一致。
可选地,作为一个实施例,最大权重系数为对神经网络的至少两层进行合并预处理后形成的第一目标层的权重系数的最大值。
可选地,作为一个实施例,处理器910还用于执行以下步骤:对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层。
可选地,作为一个实施例,最大输出值为多个输入样本中每个输入样本在合并后形成的第一目标层的最大输出值。
可选地,作为一个实施例,处理器910对神经网络的至少两层进行合并预处理,得到合并后形成的第一目标层,包括:对神经网络的卷积层和归一化层进行合并预处理,得到第一目标层;或,对神经网络的卷积层和缩放层进行合并预处理,得到第一目标层;或,对神经网络的卷积层、归一化层和缩放层进行合并预处理,得到第一目标层。
可选地,作为一个实施例,第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
应理解,本申请各实施例的装置可以基于存储器和处理器实现,各存储器用于存储用于执行本申请各实施例的方法的指令,处理器执行上述指令,使得装置执行本申请各实施例的方法。
应理解,本申请实施例中提及的处理器可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)集成在处理器中。
应注意,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请实施例还提供一种计算机可读存储介质,其上存储有指令,当指令在计算机上运行时,使得计算机执行上述各方法实施例的方法。
本申请实施例还提供一种计算设备,该计算设备包括上述计算机可读存储介质。
本申请实施例可以应用在飞行器,尤其是无人机领域。
应理解,本申请各实施例的电路、子电路、子单元的划分只是示意性的。本领域普通技术人员可以意识到,本文中所公开的实施例描述的各示例的电路、子电路和子单元,能够再行拆分或组合。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(Digital Video Disc,DVD))、或者半导体介质(例如,固态硬盘(Solid State Disk,SSD))等。
应理解,本申请各实施例均是以总位宽为16位(bit)为例进行说明的,本申请各实施例可以适用于其他的位宽。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
应理解,在本申请实施例中,“与A相应的B”表示B与A相关联,根据A可以确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (88)

  1. 一种数据转换方法,其特征在于,包括:
    根据神经网络的第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值;
    根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域,包括:
    根据所述权重基准值、所述权重对数域位宽和所述权重系数的大小将所述权重系数转换到对数域。
  3. 根据权利要求2所述的方法,其特征在于,所述权重对数域位宽中包括一位符号位,所述权重系数在对数域的符号与所述权重系数在实数域的符号一致。
  4. 根据权利要求1所述的方法,其特征在于,在根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域之后,所述方法还包括:
    确定所述第一目标层的输入特征值;
    通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
  5. 根据权利要求4所述的方法,其特征在于,所述输入特征值是实数域的输入特征值,所述通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
  6. 根据权利要求5所述的方法,其特征在于,所述对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的 小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  8. 根据权利要求7所述的方法,其特征在于,在所述根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值之后,所述方法还包括:
    根据输出基准值、输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  9. 根据权利要求8所述的方法,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  10. 根据权利要求5所述的方法,其特征在于,所述对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  11. 根据权利要求10所述的方法,其特征在于,在根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值之后,所述方法还包括:
    根据输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  12. 根据权利要求11所述的方法,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  13. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    根据所述第一目标层的输出值对数域位宽和参考输出值的大小,确定所 述输出基准值。
  14. 根据权利要求13所述的方法,其特征在于,所述方法还包括:
    计算多个输入样本中每个所述输入样本在所述第一目标层的最大输出值;
    从多个所述最大输出值中选取出所述参考输出值。
  15. 根据权利要求14所述的方法,其特征在于,所述从多个所述最大输出值中选取出所述参考输出值,包括:
    对多个所述最大输出值进行排序,按照预设的选取参数从所述多个最大输出值中选取出所述参考输出值。
  16. 根据权利要求4所述的方法,其特征在于,所述输入特征值是对数域的输入特征值,所述通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第三移位运算,对所述对数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
  17. 根据权利要求16所述的方法,其特征在于,所述对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述对数域的输入特征值的输入基准值、所述输出基准值和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  18. 根据权利要求1所述的方法,其特征在于,所述最大权重系数为对所述神经网络的至少两层进行合并预处理后形成的所述第一目标层的权重系数的最大值。
  19. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层。
  20. 根据权利要求14所述的方法,其特征在于,所述最大输出值为所述多个输入样本中每个所述输入样本在合并后形成的所述第一目标层的所述最大输出值。
  21. 根据权利要求19所述的方法,其特征在于,所述对所述神经网络 的至少两层进行合并预处理,得到合并后形成的所述第一目标层,包括:
    对所述神经网络的卷积层和归一化层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层和缩放层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层、归一化层和缩放层进行合并预处理,得到所述第一目标层。
  22. 根据权利要求1所述的方法,其特征在于,所述第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
  23. 一种数据转换方法,其特征在于,包括:
    确定神经网络的第一目标层的输入特征值;
    通过移位运算,对所述输入特征值和对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
  24. 根据权利要求23所述的方法,其特征在于,所述输入特征值是实数域的输入特征值,所述通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
  25. 根据权利要求24所述的方法,其特征在于,所述对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  26. 根据权利要求25所述的方法,其特征在于,所述根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一 目标层的所述实数域的输出值。
  27. 根据权利要求26所述的方法,其特征在于,在所述根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值之后,所述方法还包括:
    根据输出基准值、输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  28. 根据权利要求27所述的方法,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  29. 根据权利要求24所述的方法,其特征在于,所述对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  30. 根据权利要求29所述的方法,其特征在于,在根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值之后,所述方法还包括:
    根据输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  31. 根据权利要求30所述的方法,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  32. 根据权利要求27或29所述的方法,其特征在于,所述方法还包括:
    根据所述第一目标层的输出值对数域位宽和参考输出值的大小,确定所述输出基准值。
  33. 根据权利要求32所述的方法,其特征在于,所述方法还包括:
    计算多个输入样本中每个所述输入样本在所述第一目标层的最大输出值;
    从多个所述最大输出值中选取出所述参考输出值。
  34. 根据权利要求33所述的方法,其特征在于,所述从多个所述最大输出值中选取出所述参考输出值,包括:
    对多个所述最大输出值进行排序,按照预设的选取参数从所述多个最大输出值中选取出所述参考输出值。
  35. 根据权利要求23所述的方法,其特征在于,所述输入特征值是对数域的输入特征值,所述通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第三移位运算,对所述对数域的输入特征值和对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
  36. 根据权利要求35所述的方法,其特征在于,所述对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述对数域的输入特征值的输入基准值、输出基准值和权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  37. 根据权利要求23所述的方法,其特征在于,所述方法还包括:
    根据所述第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值;
    根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的实数域的权重系数转换到对数域,得到所述对数域的权重系数。
  38. 根据权利要求37所述的方法,其特征在于,所述根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的实数域的权重系数转换到对数域,得到所述对数域的权重系数,包括:
    根据所述权重基准值、所述权重对数域位宽和所述权重系数的大小将所述实数域的权重系数转换到对数域,得到所述对数域的权重系数。
  39. 根据权利要求38所述的方法,其特征在于,所述权重对数域位宽中包括一位符号位,所述权重系数在对数域的符号与所述权重系数在实数域的符号一致。
  40. 根据权利要求37所述的方法,其特征在于,所述最大权重系数为对所述神经网络的至少两层进行合并预处理后形成的所述第一目标层的权重系数的最大值。
  41. 根据权利要求23所述的方法,其特征在于,所述方法还包括:
    对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层。
  42. 根据权利要求33所述的方法,其特征在于,所述最大输出值为所述多个输入样本中每个所述输入样本在合并后形成的所述第一目标层的所述最大输出值。
  43. 根据权利要求42所述的方法,其特征在于,所述对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层,包括:
    对所述神经网络的卷积层和归一化层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层和缩放层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层、归一化层和缩放层进行合并预处理,得到所述第一目标层。
  44. 根据权利要求23所述的方法,其特征在于,所述第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
  45. 一种数据转换装置,其特征在于,包括处理器和存储器,所述存储器用于存储处理器执行的指令,所述处理器用于执行以下步骤:
    根据神经网络的第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值;
    根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域。
  46. 根据权利要求45所述的装置,其特征在于,所述处理器根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域,包括:
    根据所述权重基准值、所述权重对数域位宽和所述权重系数的大小将所述权重系数转换到对数域。
  47. 根据权利要求46所述的装置,其特征在于,所述权重对数域位宽中包括一位符号位,所述权重系数在对数域的符号与所述权重系数在实数域的符号一致。
  48. 根据权利要求45所述的装置,其特征在于,所述处理器在根据所述 权重基准值和所述权重对数域位宽,将所述第一目标层中的权重系数转换到对数域之后,还用于执行以下步骤:
    确定所述第一目标层的输入特征值;
    通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
  49. 根据权利要求48所述的装置,其特征在于,所述输入特征值是实数域的输入特征值,所述处理器通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
  50. 根据权利要求49所述的装置,其特征在于,所述处理器对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  51. 根据权利要求50所述的装置,其特征在于,所述处理器根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  52. 根据权利要求51所述的装置,其特征在于,所述处理器在根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值之后,还用于执行以下步骤:
    根据输出基准值、输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  53. 根据权利要求52所述的装置,其特征在于,所述输出值对数域位宽 中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  54. 根据权利要求49所述的装置,其特征在于,所述处理器对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  55. 根据权利要求54所述的装置,其特征在于,所述处理器在根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值之后,还用于执行以下步骤:
    根据输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  56. 根据权利要求55所述的装置,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  57. 根据权利要求52所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    根据所述第一目标层的输出值对数域位宽和参考输出值的大小,确定所述输出基准值。
  58. 根据权利要求57所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    计算多个输入样本中每个所述输入样本在所述第一目标层的最大输出值;
    从多个所述最大输出值中选取出所述参考输出值。
  59. 根据权利要求58所述的装置,其特征在于,所述处理器从多个所述最大输出值中选取出所述参考输出值,包括:
    对多个所述最大输出值进行排序,按照预设的选取参数从所述多个最大输出值中选取出所述参考输出值。
  60. 根据权利要求48所述的装置,其特征在于,所述输入特征值是对数域的输入特征值,所述处理器通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第三移位运算,对所述对数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
  61. 根据权利要求60所述的装置,其特征在于,所述处理器对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述对数域的输入特征值的输入基准值、所述输出基准值和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  62. 根据权利要求45所述的装置,其特征在于,所述最大权重系数为对所述神经网络的至少两层进行合并预处理后形成的所述第一目标层的权重系数的最大值。
  63. 根据权利要求45所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层。
  64. 根据权利要求58所述的装置,其特征在于,所述最大输出值为所述多个输入样本中每个所述输入样本在合并后形成的所述第一目标层的所述最大输出值。
  65. 根据权利要求63所述的装置,其特征在于,所述处理器对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层,包括:
    对所述神经网络的卷积层和归一化层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层和缩放层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层、归一化层和缩放层进行合并预处理,得到所述第一目标层。
  66. 根据权利要求45所述的装置,其特征在于,所述第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
  67. 一种数据转换装置,其特征在于,包括处理器和存储器,所述存储 器用于存储处理器执行的指令,所述处理器用于执行以下步骤:
    确定神经网络的第一目标层的输入特征值;
    通过移位运算,对所述输入特征值和对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值。
  68. 根据权利要求67所述的装置,其特征在于,所述输入特征值是实数域的输入特征值,所述处理器通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第一移位运算,对所述实数域的输入特征值和所述对数域的权重系数进行乘累加计算,得到乘累加值;
    对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值。
  69. 根据权利要求68所述的装置,其特征在于,所述处理器对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  70. 根据权利要求69所述的装置,其特征在于,所述处理器根据所述实数域的输入特征值的小数位位宽和所述实数域的输出值的小数位位宽,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  71. 根据权利要求70所述的装置,其特征在于,所述处理器在根据所述实数域的输入特征值的小数位位宽、所述实数域的输出值的小数位位宽和所述权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值之后,还用于执行以下步骤:
    根据输出基准值、输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  72. 根据权利要求71所述的装置,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符 号一致。
  73. 根据权利要求68所述的装置,其特征在于,所述处理器对所述乘累加值进行第二移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值。
  74. 根据权利要求73所述的装置,其特征在于,所述处理器在根据所述权重基准值和输出基准值,对所述乘累加值进行移位运算,得到所述第一目标层的实数域的输出值之后,还用于执行以下步骤:
    根据输出值对数域位宽和所述实数域的输出值的大小,将所述实数域的输出值转换到对数域。
  75. 根据权利要求74所述的装置,其特征在于,所述输出值对数域位宽中包括一位符号位,所述输出值在对数域的符号与所述输出值在实数域的符号一致。
  76. 根据权利要求71或73所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    根据所述第一目标层的输出值对数域位宽和参考输出值的大小,确定所述输出基准值。
  77. 根据权利要求76所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    计算多个输入样本中每个所述输入样本在所述第一目标层的最大输出值;
    从多个所述最大输出值中选取出所述参考输出值。
  78. 根据权利要求77所述的装置,其特征在于,所述处理器从多个所述最大输出值中选取出所述参考输出值,包括:
    对多个所述最大输出值进行排序,按照预设的选取参数从所述多个最大输出值中选取出所述参考输出值。
  79. 根据权利要求67所述的装置,其特征在于,所述输入特征值是对数域的输入特征值,所述处理器通过移位运算,对所述输入特征值和所述对数域的权重系数进行乘累加计算,得到所述第一目标层的实数域的输出值,包括:
    通过第三移位运算,对所述对数域的输入特征值和对数域的权重系数进 行乘累加计算,得到乘累加值;
    对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值。
  80. 根据权利要求79所述的装置,其特征在于,所述处理器对所述乘累加值进行第四移位运算,得到所述第一目标层的实数域的输出值,包括:
    根据所述对数域的输入特征值的输入基准值、输出基准值和权重基准值,对所述乘累加值进行移位运算,得到所述第一目标层的所述实数域的输出值。
  81. 根据权利要求67所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    根据所述第一目标层的权重对数域位宽和最大权重系数的大小,确定权重基准值;
    根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的实数域的权重系数转换到对数域,得到所述对数域的权重系数。
  82. 根据权利要求81所述的装置,其特征在于,所述处理器根据所述权重基准值和所述权重对数域位宽,将所述第一目标层中的实数域的权重系数转换到对数域,得到所述对数域的权重系数,包括:
    根据所述权重基准值、所述权重对数域位宽和所述权重系数的大小将所述实数域的权重系数转换到对数域,得到所述对数域的权重系数。
  83. 根据权利要求82所述的装置,其特征在于,所述权重对数域位宽中包括一位符号位,所述权重系数在对数域的符号与所述权重系数在实数域的符号一致。
  84. 根据权利要求81所述的装置,其特征在于,所述最大权重系数为对所述神经网络的至少两层进行合并预处理后形成的所述第一目标层的权重系数的最大值。
  85. 根据权利要求67所述的装置,其特征在于,所述处理器还用于执行以下步骤:
    对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层。
  86. 根据权利要求77所述的装置,其特征在于,所述最大输出值为所述多个输入样本中每个所述输入样本在合并后形成的所述第一目标层的所述 最大输出值。
  87. 根据权利要求86所述的装置,其特征在于,所述处理器对所述神经网络的至少两层进行合并预处理,得到合并后形成的所述第一目标层,包括:
    对所述神经网络的卷积层和归一化层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层和缩放层进行合并预处理,得到所述第一目标层;或,
    对所述神经网络的卷积层、归一化层和缩放层进行合并预处理,得到所述第一目标层。
  88. 根据权利要求67所述的装置,其特征在于,所述第一目标层包括卷积层、转置卷积层、归一化层、缩放层、池化层、全连接层、拼接层、元素智能加法层和激活层中的一层或至少两层合并后的层。
PCT/CN2018/077573 2018-02-28 2018-02-28 数据转换方法和装置 WO2019165602A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880011394.7A CN110337636A (zh) 2018-02-28 2018-02-28 数据转换方法和装置
PCT/CN2018/077573 WO2019165602A1 (zh) 2018-02-28 2018-02-28 数据转换方法和装置
US17/000,915 US20200389182A1 (en) 2018-02-28 2020-08-24 Data conversion method and apparatus


Publications (1)

Publication Number Publication Date
WO2019165602A1 (zh) 2019-09-06

Family

ID=67804735


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831356A (zh) * 2020-07-09 2020-10-27 北京灵汐科技有限公司 权重精度配置方法、装置、设备及存储介质
WO2021194095A1 (en) * 2020-03-24 2021-09-30 Lg Electronics Inc. Training a neural network using stochastic whitening batch normalization

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
EP3471271A1 (en) * 2017-10-16 2019-04-17 Acoustical Beauty Improved convolutions of digital signals using a bit requirement optimization of a target digital signal
US11037027B2 (en) * 2018-10-25 2021-06-15 Raytheon Company Computer architecture for and-or neural networks

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103731159A (zh) * 2014-01-09 2014-04-16 北京邮电大学 一种对先验信息迭代应用的混合域fft多进制和积译码算法
CN105320495A (zh) * 2014-07-22 2016-02-10 英特尔公司 用于卷积神经网络的权重移位机制
WO2016165120A1 (en) * 2015-04-17 2016-10-20 Microsoft Technology Licensing, Llc Deep neural support vector machines
CN106204474A (zh) * 2011-03-02 2016-12-07 杜比实验室特许公司 局部多等级色调映射运算器
CN107220025A (zh) * 2017-04-24 2017-09-29 华为机器有限公司 处理乘加运算的装置和处理乘加运算的方法



Also Published As

Publication number Publication date
US20200389182A1 (en) 2020-12-10
CN110337636A (zh) 2019-10-15

