CN109542512A - A kind of data processing method, device and storage medium

A kind of data processing method, device and storage medium

Info

Publication number
CN109542512A
CN109542512A (application CN201811313447.9A; granted as CN109542512B)
Authority
CN
China
Prior art keywords
value
current
channel
convolutional layer
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811313447.9A
Other languages
Chinese (zh)
Other versions
CN109542512B (en)
Inventor
戴彦
蔡林金
丁缙
姚达
吴永坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811313447.9A priority Critical patent/CN109542512B/en
Publication of CN109542512A publication Critical patent/CN109542512A/en
Application granted
Publication of CN109542512B publication Critical patent/CN109542512B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Error Detection And Correction (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiments of the invention disclose a data processing method, apparatus, and storage medium. After obtaining a preset convolutional neural network, the method determines a current convolutional layer and obtains the bit-expansion channel value of that layer. Then, while the current convolutional layer performs feature extraction on the input data and accumulates the per-channel features, the number of accumulated channels is counted; each time that count reaches an integral multiple of the bit-expansion channel value, the operation bit width of the current convolutional layer in the register is expanded, so that the data produced by the accumulation can be stored. The scheme avoids overflow while reducing the number of instructions and the complexity of the convolution operation, thereby accelerating performance and reducing the consumption of computing resources.

Description

Data processing method, device and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data processing method, apparatus, and storage medium.
Background
With the widespread application of Convolutional Neural Networks (CNNs) in the field of data processing (such as image processing), and the popularization of mobile terminals, quantized convolutional neural networks are receiving more and more attention. Quantization means compressing the model, that is, representing the weight values and feature values with a smaller number of operation bits. Through quantization, the convolutional neural network can be made lighter and can then be deployed on platforms with limited hardware resources, such as mobile terminals.
During the research and practice of the prior art, the inventor of the present invention found that, because the convolution operation is mainly based on multiplication and addition operations, as the number of operation bits is reduced after quantization, multiple multiplication and accumulation operations are very likely to cause overflow (i.e. the operation result exceeds the range that can be represented by the current number of operation bits). In order to avoid overflow, the operation bit number in the register needs to be frequently expanded to a higher bit number, which not only complicates the operation, but also increases the complexity of the quantized convolution operation, generates more extra overhead, and increases the consumption of the computing resource.
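The overflow described here is easy to reproduce. The sketch below (made-up values; a Python stand-in for an 8-bit register rather than real SIMD code) accumulates three int8-sized products in a simulated 8-bit two's-complement accumulator and wraps around:

```python
def wrap_to_int8(x):
    """Simulate an 8-bit two's-complement register: values wrap modulo 256."""
    return ((x + 128) % 256) - 128

# three int8-range products from a quantized multiply-accumulate (made-up values)
products = [100, 100, 100]
acc = 0
for p in products:
    acc = wrap_to_int8(acc + p)  # accumulate in the simulated 8-bit register

true_sum = sum(products)  # 300, which exceeds the int8 maximum of 127
```

Because 300 does not fit in 8 bits, the wrapped accumulator silently diverges from the true sum, which is why the operation bit width must be widened before the accumulation overflows.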
Disclosure of Invention
The embodiments of the invention provide a data processing method, a data processing apparatus, and a storage medium that can flexibly and simply expand the operation bit width, avoid overflow, reduce the complexity of the quantized convolution operation, and reduce the consumption of computing resources.
The embodiment of the invention provides a data processing method, which is characterized by comprising the following steps:
acquiring a preset convolutional neural network, wherein the convolutional neural network comprises a plurality of convolutional layers;
determining a current convolutional layer, and acquiring input data and a bit-expansion channel value of the current convolutional layer;
performing feature extraction on input data by adopting a current convolutional layer to obtain feature information corresponding to a plurality of channels;
accumulating the characteristic information corresponding to the plurality of channels, and counting the number of the accumulated channels;
if the accumulated channel number reaches an integral multiple of the bit-expansion channel value, expanding the operation bit width of the current convolutional layer in the register so as to store the data generated in the accumulation process;
and after the characteristic information corresponding to the plurality of channels is accumulated, returning to the step of determining the current convolutional layer until all convolutional layers are processed.
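The method steps above can be sketched as follows. The layer dictionaries, the `extract` callback, and the bit-doubling rule are illustrative stand-ins, not the patent's implementation:

```python
def run_network(layers, input_data):
    """Sketch of the claimed loop: per layer, accumulate per-channel features
    and widen the operation bit width every m accumulated channels."""
    data = input_data
    widen_events = 0
    for layer in layers:
        m = layer["expand_channel_value"]   # bit-expansion channel value
        bits = layer["initial_bits"]        # current operation bit width
        features = layer["extract"](data)   # one feature value per channel (toy)
        acc = 0
        for count, feat in enumerate(features, start=1):
            acc += feat
            if count % m == 0:              # reached a multiple of m channels:
                bits *= 2                   # widen the register, e.g. 8 -> 16
                widen_events += 1
        data = acc                          # accumulated result feeds the next layer
    return data, widen_events

# toy network: one layer with 4 channels and a bit-expansion channel value of 2
layers = [{"expand_channel_value": 2, "initial_bits": 8,
           "extract": lambda d: [d] * 4}]
out, widens = run_network(layers, 3)
```

With m = 2 and four channels, widening fires exactly twice rather than after every accumulation, which is the instruction-count saving the scheme claims.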
Correspondingly, an embodiment of the present invention provides a data processing apparatus, including an obtaining unit, a determining unit, an extracting unit, an accumulating unit, and an expanding unit, as follows:
the acquisition unit is used for acquiring a preset convolutional neural network, wherein the convolutional neural network comprises a plurality of convolutional layers;
the determining unit is used for determining the current convolutional layer and acquiring the input data and the bit expansion channel value of the current convolutional layer;
the extraction unit is used for extracting the characteristics of the input data by adopting the current convolutional layer to obtain the characteristic information corresponding to a plurality of channels;
the accumulation unit is used for accumulating the feature information corresponding to the plurality of channels and counting the number of accumulated channels; after the feature information corresponding to the plurality of channels has been accumulated, it triggers the determining unit to perform the operation of determining the current convolutional layer again, until all convolutional layers have been processed;
and the expanding unit is used for expanding the operation bit width of the current convolutional layer in the register when the accumulated channel number reaches an integral multiple of the bit-expansion channel value, so as to store the data generated in the accumulation process.
Optionally, in some embodiments, the extension unit may include a calling subunit and an extension subunit, as follows:
the calling subunit is used for calling the bit expansion instruction when the accumulated channel number reaches the integral multiple of the bit expansion channel value;
and the extension subunit is used for extending the operation bit width of the current convolutional layer in the register according to the bit expansion instruction, so as to store the data generated in the accumulation process.
Optionally, in some embodiments, the bit expansion instruction is a multiply-add instruction with a bit expansion function, and the extension subunit is specifically configured to determine the current operation bit width of the current convolutional layer in the register and the target bit width to which it may be extended, and to extend the operation bit width of the current convolutional layer in the register to the target bit width by using the multiply-add instruction with the bit expansion function.
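Widening multiply-add instructions of this kind exist on real hardware; for example, ARM NEON's VMLAL ("multiply-accumulate long") multiplies narrow operands and accumulates into lanes of twice their width. A scalar Python model of such an instruction (the function name and explicit overflow check are our own, for illustration) might look like:

```python
def widening_mla(acc, a, b, acc_bits=16):
    """Scalar model of a multiply-accumulate-long instruction: two narrow
    operands are multiplied and added into a wider accumulator; raise if
    even the wide accumulator would overflow."""
    hi = (1 << (acc_bits - 1)) - 1
    lo = -(1 << (acc_bits - 1))
    result = acc + a * b
    if not lo <= result <= hi:
        raise OverflowError("accumulator overflow at %d bits" % acc_bits)
    return result

# int8 operands into an int16 accumulator: 127 * 127 = 16129 fits comfortably
acc = widening_mla(0, 127, 127)
```

A single widening multiply-add replaces a separate widen-then-accumulate instruction pair, which is the source of the instruction-count reduction.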
Optionally, in some embodiments, the determining unit includes a first determining subunit, an input subunit, and a second determining subunit, as follows:
the first determining subunit is used for determining the current convolutional layer;
the input subunit is used for acquiring input data of the current convolutional layer;
the second determining subunit is configured to determine the weight values of the convolution kernel of the current convolutional layer and the value range of the feature values of the input data, and to determine the bit-expansion channel value of the current convolutional layer according to the weight values and the feature-value range.
Optionally, in some embodiments, the second determining subunit is specifically configured to calculate, according to the weight values and the feature-value range, the maximum number of channels for which the current operation bit width does not overflow, and to take that number as the bit-expansion channel value of the current convolutional layer;
or, the second determining subunit is specifically configured to obtain, from a preset static code library, the bit-expansion channel value corresponding to the weight values and the feature-value range, as the bit-expansion channel value of the current convolutional layer.
Optionally, in some embodiments, the data processing apparatus may further include a setup unit, as follows:
the establishing unit is used for obtaining the weight values and feature-value ranges of a plurality of groups of samples; calculating, from the weight values and feature-value range of each group of samples, the maximum number of channels for which the preset operation bit width does not overflow, to obtain the bit-expansion channel value corresponding to each group of samples; establishing the correspondence among the weight values, feature-value ranges, and bit-expansion channel values of each group of samples; and storing the correspondence in the static code library.
Optionally, in some embodiments, the second determining subunit may be specifically configured to calculate the maximum value range that the current operation bit width can represent, determine the maximum feature value from the feature-value range, perform a convolution operation on the maximum feature value and the weight values to obtain the maximum value of each channel in the current convolutional layer, calculate from those maxima the maximum number of channels that can be accommodated within the maximum value range, and determine the bit-expansion channel value of the current convolutional layer from the calculated maximum channel number.
Optionally, in some embodiments, the second determining subunit may be specifically configured to use the calculated maximum channel number as the bit-expansion channel value of the current convolutional layer.
Optionally, in some embodiments, the second determining subunit may be specifically configured to perform fine-grained adjustment on the calculated maximum channel number according to a prediction policy, so as to obtain the bit-expansion channel value of the current convolutional layer.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform steps in any data processing method provided in the embodiment of the present invention.
After the preset convolutional neural network is obtained, the current convolutional layer can be determined and its bit-expansion channel value obtained. Then, while feature extraction and feature accumulation are performed on the input data using the current convolutional layer, the number of accumulated channels is counted; when that number reaches an integral multiple of the bit-expansion channel value, the operation bit width of the current convolutional layer in the register is expanded so as to store the data generated in the accumulation process. Because the scheme can flexibly determine the bit-expansion timing according to the actual situation of the current operating environment and thereby expand the operation bit width in the register automatically, compared with the existing scheme, which can only expand bits frequently, it can reduce the number of instructions and accelerate performance while avoiding overflow. In addition, with fewer instructions the complexity of the quantized convolution operation is also reduced, which in turn reduces the consumption of computing resources.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1a is a schematic view of a data processing method according to an embodiment of the present invention;
FIG. 1b is a flow chart of a data processing method provided by an embodiment of the invention;
FIG. 1c is a schematic diagram of a convolution operation process in the data processing method according to the embodiment of the present invention;
FIG. 2a is a flowchart illustrating the calculation of a bit-extended channel value according to an embodiment of the present invention;
FIG. 2b is another flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3a is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a data processing method, a data processing device and a storage medium. The data processing apparatus may be specifically integrated in a network device, such as a server or a mobile terminal.
Taking the data processing apparatus integrated in a network device as an example: after the network device obtains a preset convolutional neural network (which includes multiple convolutional layers), it may determine a current convolutional layer and obtain the input data and bit-expansion channel value of that layer. For example, referring to fig. 1a, if the current convolutional layer is convolutional layer 1, it may obtain the input data of convolutional layer 1 (input data 1) and the bit-expansion channel value of convolutional layer 1, m1; if the current convolutional layer is convolutional layer 2, it may obtain input data 2 and the bit-expansion channel value m2 of convolutional layer 2; and so on. Then, feature extraction is performed on the input data using the current convolutional layer (for example, convolutional layer 1 performs feature extraction on input data 1, convolutional layer 2 on input data 2, and so on) to obtain the feature information corresponding to the multiple channels of the current convolutional layer. Thereafter, the feature information corresponding to the multiple channels may be accumulated, and each time a further "bit-expansion channel value" worth of channels has been accumulated, one bit expansion is performed (i.e. the operation bit width of the current convolutional layer in the register is expanded), for example from the original 8 bits to 16 bits, or from 16 bits to 32 bits, so that the data generated in the accumulation process can be stored.
After the feature information corresponding to the channels has been accumulated, the accumulated result serves as the input data of the next convolutional layer, the next convolutional layer becomes the new current convolutional layer, and the above operations are repeated until all convolutional layers have been processed, thereby preventing overflow while expanding bits automatically.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The first embodiment,
The present embodiment will be described from the perspective of a data processing apparatus, which may be specifically integrated in a network device, such as a server or a terminal, where the terminal may include a mobile phone, a tablet computer, a notebook computer, or a Personal Computer (PC), etc.
A method of data processing, comprising: obtaining a preset convolutional neural network; determining a current convolutional layer and obtaining the input data and bit-expansion channel value of the current convolutional layer; performing feature extraction on the input data using the current convolutional layer to obtain feature information corresponding to a plurality of channels; accumulating the feature information corresponding to the plurality of channels and counting the number of accumulated channels; if the number of accumulated channels reaches an integral multiple of the bit-expansion channel value, expanding the operation bit width of the current convolutional layer in the register to store the data generated in the accumulation process; and after the feature information corresponding to the plurality of channels has been accumulated, returning to the step of determining the current convolutional layer until all convolutional layers have been processed.
As shown in fig. 1b, the specific flow of the data processing method may be as follows:
101. and acquiring a preset convolutional neural network.
The network structure of the convolutional neural network may be determined according to the requirements of practical applications, and for example, may include a plurality of convolutional layers, and may further include structures such as an active layer, a pooling layer, and a full-link layer. Parameters of each layer in the network structure, such as a Convolution Kernel (Convolution Kernel) size of the convolutional layer, a weight of the Convolution Kernel, a step size, and the like, may also be set according to a specific application scenario or requirement, and are not described herein again.
The convolutional neural network may be implemented by code, for example, a set of template codes may be written, and then different convolutional core codes may be dynamically generated based on the template codes by using a Just-In-Time Compiler (JIT) according to the network structures and parameters of each layer.
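Template-based generation of per-layer kernel code can be illustrated with plain string substitution; the one-line template below is invented for illustration, whereas the real scheme uses a JIT compiler over full convolution templates:

```python
# hypothetical one-line kernel template; a real template would contain the
# whole convolution loop body, not just a declaration
KERNEL_TEMPLATE = (
    "// conv kernel for layer {name}: widen the accumulator every {m} channels\n"
    "void conv_{name}(const int8_t *in, int32_t *out);\n"
)

def generate_kernel_code(name, expand_channel_value):
    """Fill the template with layer-specific parameters (JIT codegen sketch)."""
    return KERNEL_TEMPLATE.format(name=name, m=expand_channel_value)

src = generate_kernel_code("conv1", 16)
```

Each layer's bit-expansion channel value is baked into its generated kernel, so the widening schedule costs nothing at run time.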
Specifically, the preset codes of a plurality of convolutional neural networks may be stored in a code library, such as a static code library, and then, when an acquisition request of a convolutional neural network is received, a corresponding code is acquired from the code library according to a convolutional neural network identifier carried in the acquisition request, so that the corresponding convolutional neural network can be obtained; that is to say, in a specific implementation, the step "obtaining the preset convolutional neural network" may include:
receiving an acquisition request for the convolutional neural network, where the acquisition request carries a convolutional neural network identifier; and obtaining the corresponding code from the code library according to the identifier, thereby obtaining the convolutional neural network.
Of course, the codes of the convolutional neural network may be stored in other forms in a local (i.e., the data processing apparatus) or other devices, which are not listed here.
102. And determining the current convolutional layer, and acquiring the input data and the bit expansion channel value of the current convolutional layer.
For example, taking the convolutional neural network including convolutional layer 1, convolutional layer 2, and convolutional layer 3 as an example, if the current convolutional layer is convolutional layer 1, then the input data and the extended bit channel value of convolutional layer 1 may be obtained at this time; if the current convolutional layer is convolutional layer 2, then the input data and the bit expansion channel value of convolutional layer 2 can be obtained at this time; if the current convolutional layer is convolutional layer 3, then the input data and the bit-extended channel value of convolutional layer 3 can be obtained, and so on.
The bit expansion channel value refers to a bit expansion (i.e. expansion of the operation bit number in the register) period, and the period is represented by the accumulated channel number; for example, if the bit-extended channel value of a convolutional layer is m, when the characteristic information of a plurality of channels of the convolutional layer is accumulated, bit extension is required once every m channels are accumulated.
The step of obtaining the bit expansion channel value of the current convolutional layer may include:
determining a weight value of a convolution kernel of the current convolution layer and a characteristic value range of input data, and determining a bit expansion channel value of the current convolution layer according to the weight value and the characteristic value range.
The method for determining the extended bit channel value of the current convolutional layer according to the weight value and the eigenvalue value range may be various, for example, the extended bit channel value may be obtained by directly calculating according to the weight value and the eigenvalue value range, or may be obtained by querying, specifically as follows:
(1) the first method is as follows:
calculating the maximum channel number without overflow of the current operation digit according to the weight value (namely the weight value of the convolution kernel of the current convolution layer) and the value range of the characteristic value to obtain the bit expansion channel value of the current convolution layer; for example, the following may be specifically mentioned:
and S1, calculating the maximum numerical range which can be represented by the current operation digit.
For example, if the current operation bit number is 8 bits (bit), and one sign bit (i.e., "+" or "-" sign) is removed, the maximum value range that can be represented is "27-1 ", in binary representation" 1111111 "; if the current operation bit number is 16 bits (bit), and one sign bit is removed, the maximum value range which can be represented by the current operation bit number is 215-1 ", in binary representation, the binary number" 111111111111111 ", and so on.
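The rule in S1 for a signed register with one sign bit is simply 2^(n−1) − 1, which a short helper makes concrete:

```python
def signed_max(n_bits):
    """Largest value a signed n-bit register can hold: 2**(n_bits - 1) - 1."""
    return (1 << (n_bits - 1)) - 1

# 8-bit register: 2**7 - 1 = 127 (binary 1111111); 16-bit: 2**15 - 1 = 32767
limits = {bits: signed_max(bits) for bits in (8, 16, 32)}
```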
And S2, determining the maximum characteristic value according to the characteristic value range.
This is because, when each of the feature values takes a maximum value, the value of the operation result is maximized, and if the number of operation bits is still guaranteed not to overflow in such a worst case, overflow does not occur in other cases.
For example, if the feature values of the input data take values in the range (−2^r, 2^r), the maximum feature value is 2^r, where the range of r may be determined according to the requirements of the practical application; for example, if the register used to store the input data is 8 bits wide (one of which is the sign bit), r takes values in 0–7, and if it is 16 bits wide (one sign bit), r takes values in 0–15, and so on.
S3, performing a convolution operation on the maximum feature value and the weight values to obtain the maximum value of each channel in the current convolutional layer, expressed by the formula:

F_max = Σ_i (w_i × d_i_max)

where F_max is the maximum value of a channel in the current convolutional layer, w_i is the i-th weight of the convolution kernel of the current convolutional layer, and d_i_max is the maximum feature value.
The weight values (i.e. the weights of the convolution kernel of the current convolutional layer) may be set according to the requirements of the practical application and may take values in the range (−2^t, 2^t); the range of t is likewise determined by the application, for example t takes values in 0–7 if the register storing the weight values is 8 bits wide (one sign bit), or 0–15 if it is 16 bits wide (one sign bit), and so on.
And S4, calculating the maximum number of channels which can be accommodated in the maximum value range according to the maximum value of each channel.
For example, taking the current operation bit width as 16 bits, if overflow is not allowed to occur, the convolution result of the current convolutional layer needs to satisfy the condition:

Σ_{j=1}^{m} F_j_max ≤ 2^15 − 1, where F_j_max = Σ_i (w_i × d_ji_max)

where d_ji_max is the maximum feature value used in calculating the maximum value F_j_max of the j-th channel. With F_j_max known, the value of m, i.e. the maximum number of channels that can be accommodated within this maximum value range, can be calculated.
For another example, taking the current operation bit width as 32 bits, if overflow is not allowed to occur, the convolution result of the current convolutional layer needs to satisfy the condition:

Σ_{j=1}^{m} F_j_max ≤ 2^31 − 1, where F_j_max = Σ_i (w_i × d_ji_max)

where d_ji_max is the maximum feature value used in calculating the maximum value F_j_max of the j-th channel. As before, with F_j_max known, the value of m, i.e. the maximum number of channels that can be accommodated within this maximum value range, can be calculated.
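Putting S1 through S4 together, m can be computed in one pass. The kernel size and operand ranges below are made-up worst-case values for illustration:

```python
def max_channels_no_overflow(weights, d_max, acc_bits):
    """m = floor((2**(acc_bits - 1) - 1) / F_max), where F_max is the
    worst-case (all features at their maximum) value of a single channel."""
    f_max = sum(abs(w) * d_max for w in weights)  # S3: channel maximum
    limit = (1 << (acc_bits - 1)) - 1             # S1: representable maximum
    return limit // f_max                         # S4: channels that fit

# made-up example: a 3x3 kernel at the int8 extremes, 32-bit accumulator
m = max_channels_no_overflow([127] * 9, 127, 32)
```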
S5, determining the extended channel value of the current convolution layer according to the maximum channel number obtained by calculation.
For example, the calculated maximum number of channels may be specifically used as the extended channel value of the current convolutional layer.
Optionally, since the maximum channel number is calculated on the assumption that every feature value takes the maximum feature value, while in general this is not the case, the calculated maximum channel number may also be given a fine-grained adjustment in order to further reduce the number of multiply-add instructions in the convolution operation (the larger the bit-expansion channel value, the more multiply-add instructions can be saved), with the adjusted maximum channel number used as the bit-expansion channel value of the current convolutional layer. That is, the step of determining the bit-expansion channel value of the current convolutional layer according to the calculated maximum channel number may also include:
and performing fine-grained adjustment on the calculated maximum channel number according to a prediction strategy to obtain the bit expansion channel value of the current convolutional layer.
The prediction strategy may be set according to the requirements of the actual application. For example, the maximum channel number (the value of m) may be increased step by step; for the same input feature map and convolution kernel, different values of m are tested for overflow, and the largest value of m that does not cause overflow, denoted m′, is selected and used as the bit-expansion channel value of the current convolutional layer.
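The step-up search described here can be sketched as follows; `overflows` is a hypothetical harness that runs the convolution at a given m and reports whether the accumulator wrapped, and `m_start` is assumed to be a known-safe value (such as the analytically computed maximum):

```python
def tune_expand_channel_value(m_start, m_max, overflows):
    """Step m upward and keep the largest value for which the overflow test
    still passes; the result is the m' of the prediction strategy."""
    best = m_start  # m_start is assumed to be overflow-free
    for m in range(m_start, m_max + 1):
        if overflows(m):
            break
        best = m
    return best

# toy harness standing in for running the real convolution at each m:
# pretend overflow first occurs at m > 12
best = tune_expand_channel_value(8, 64, lambda m: m > 12)
```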
(2) The second method comprises the following steps:
and acquiring the bit expansion channel value corresponding to the weighted value and the characteristic value range from a preset static code library to obtain the bit expansion channel value of the current convolutional layer.
For example, for some application scenarios, the value ranges of the weight values and feature values to be used may be analyzed in advance, the corresponding bit-expansion channel values calculated, and the correspondence among the weight values, feature-value ranges, and bit-expansion channel values stored in a static code library for subsequent calls. That is, before the step of obtaining the bit-expansion channel value corresponding to the weight values and feature-value range from the preset static code library to obtain the bit-expansion channel value of the current convolutional layer, the data processing method may further include:
obtaining the weight values and eigenvalue value ranges of a plurality of groups of samples; calculating, according to the weight value and eigenvalue value range of each group of samples, the maximum number of channels at which the preset number of operation bits does not overflow, so as to obtain the bit expansion channel value corresponding to each group of samples; establishing the correspondence among the weight value, eigenvalue value range, and bit expansion channel value of each group of samples; and storing the correspondence into the static code library.
The step of "calculating the maximum number of channels at which the preset number of operation bits does not overflow according to the weight value and eigenvalue value range of each group of samples" is similar to "manner one": for example, the maximum value range that the current number of operation bits can represent may be calculated; the maximum eigenvalue may be determined according to the eigenvalue value range; a convolution operation may then be performed on the maximum eigenvalue and the weight value to obtain the maximum value of each channel; the maximum number of channels that can be accommodated within the maximum value range may be calculated according to the maximum value of each channel; and the bit expansion channel value of the sample may be determined according to the calculated maximum number of channels. See the foregoing embodiments for details, which are not repeated here.
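Under the stated worst-case assumptions (every eigenvalue equals the maximum eigenvalue d_max, every weight equals w_max, and n products are accumulated per channel), the per-sample calculation can be sketched as follows; the function name is hypothetical:

```python
def max_channels(op_bits, d_max, w_max, n):
    """Largest channel count m such that accumulating m worst-case
    channel sums (n products of d_max * w_max each) still fits in
    op_bits signed bits, one bit being reserved for the sign."""
    limit = 2 ** (op_bits - 1) - 1       # e.g. 2**15 - 1 for 16 bits
    per_channel_max = n * d_max * w_max  # worst-case value of one channel
    return limit // per_channel_max
```

For 16 operation bits, a 3 × 3 kernel (n = 9), eigenvalues up to 127, and weights up to 3, this gives m = 9.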
The correspondence relationship may be stored in a code library that is independent of the "convolutional neural network" code, or may be stored in a code library that stores the "convolutional neural network" code. Optionally, in specific implementation, corresponding convolution operation codes may be generated based on different bit expansion channel values and stored in a code library, so that when a convolution neural network is used, the corresponding convolution operation codes may be directly called from the code library to be executed without separately obtaining the bit expansion channel values.
Optionally, a general function may be further stored in the static code library, and is used to process an unusual bit expansion channel value, for example, the general function may be adopted to calculate, according to the weight value (that is, the weight value of the convolution kernel of the current convolution layer) and the range of the eigenvalue, the maximum channel number where overflow is not generated in the current operation digit, and obtain the bit expansion channel value of the current convolution layer, and the like, where a specific calculation manner may refer to "manner one", which is not described herein again.
103. And performing feature extraction on the input data by adopting the current convolutional layer to obtain feature information corresponding to a plurality of channels.
For example, a channel currently needing to be processed may be determined; then, the feature values of the input data of the current convolutional layer on that channel and the weight values of the convolution kernel are obtained; the feature values and weight values are multiplied respectively and then accumulated to obtain the feature information corresponding to that channel; and then the step of determining the channel currently needing to be processed is executed again (that is, another channel is processed), until all channels are processed, so as to obtain the feature information corresponding to the plurality of channels of the current convolutional layer.
For example, referring to fig. 1c, taking channel X and a 3 × 3 convolution kernel as an example, if the feature values of the input data (i.e., the input feature map) of the current convolutional layer on channel X are "d1, d2, d3, d4, d5, d6, d7, d8, d9, …, di", and the weight values of the convolution kernel are "w1, w2, w3, w4, w5, w6, w7, w8, w9, …, wi", then after one convolution operation, the operation result (i.e., the gray square in fig. 1c) can be obtained as: d1·w1 + d2·w2 + d3·w3 + … + di·wi.
Since there are n points on the input feature map, the convolution operation needs to be performed at n points; therefore, for the whole input feature map, on channel X, the final operation result F can be recorded as F = (F1, F2, …, Fn), where Fk denotes the convolution result at the k-th point.
for convenience of description, in the embodiment of the present invention, the characteristic information corresponding to the channel X of the current convolutional layer is also referred to as F.
Similarly, the above-mentioned processing may be performed on other channels, so as to obtain the feature information corresponding to all channels of the current convolutional layer.
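The per-channel multiply-then-accumulate described above can be sketched as follows (a simplified model that treats each channel as a flat list of eigenvalues at one output position; names are hypothetical):

```python
def channel_feature(values, weights):
    """Multiply each eigenvalue by the matching kernel weight,
    then accumulate: the feature information of one channel."""
    return sum(d * w for d, w in zip(values, weights))

def extract_features(input_data, weights):
    """Apply the same kernel weights to every channel in turn."""
    return [channel_feature(channel, weights) for channel in input_data]

# Two channels, a 2-element "kernel" for brevity:
features = extract_features([[1, 2], [3, 4]], [10, 1])
```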
104. And accumulating the characteristic information corresponding to the plurality of channels, and counting the number of the accumulated channels.
Since the input data (input feature map) generally includes a plurality of channels, when the convolutional neural network processes a certain input data map, it is necessary to perform a convolution operation on the values of the plurality of channels at the corresponding positions in the input data. If the number of channels is denoted as M, the result S of the convolution operation over the plurality of channels (i.e., the result obtained by accumulating the feature information corresponding to the plurality of channels) may be represented as: S = ∑j=1..M ∑i dji·wi.
where dji is the characteristic value of the input data (i.e., the input characteristic map) of the current convolutional layer on the j-th channel, and wi is the weight value of the convolution kernel of the current convolutional layer.
105. If the accumulated channel number reaches the integral multiple of the bit expansion channel value, expanding the operation digit of the current convolution layer in the register so as to store the data generated in the accumulation process; for example, the following can be specifically seen:
(1) if the accumulated channel number reaches the integral multiple of the bit expansion channel value, the bit expansion instruction is called.
The bit expansion instruction may be set according to a requirement of an actual application, for example, the bit expansion instruction may be specifically a multiply-add instruction with a bit expansion function, and the like.
For example, if the bit-extended channel value is m, when the number of accumulated channels reaches m, 2m, and 3m, etc., a bit-extended instruction needs to be called, that is, a bit-extended instruction needs to be called once every time m channels are accumulated, so as to extend the number of operation bits of the current convolutional layer in the register.
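A minimal sketch of this trigger logic (the `trace` list stands in for actually issuing a bit expansion instruction, which is hardware-specific; names are hypothetical):

```python
def accumulate_with_expansion(channel_features, m, trace):
    """Accumulate per-channel feature information; every time the number
    of accumulated channels reaches an integral multiple of m, record
    that a bit expansion instruction would be issued at this point."""
    total = 0
    for count, feature in enumerate(channel_features, start=1):
        total += feature
        if count % m == 0:       # multiple of the bit expansion value
            trace.append(count)  # stand-in for the expansion instruction
    return total
```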
(2) And expanding the operation digit of the current convolution layer in the register according to the bit expansion instruction so as to store the data generated in the accumulation process.
For example, taking the bit expansion instruction as a multiply-add instruction with bit expansion function as an example, the step "expanding the operation bit number of the current convolution layer in the register according to the bit expansion instruction" may include:
and determining the operation digit of the current convolution layer in the register and a target operation digit allowing expansion, and expanding the operation digit of the current convolution layer in the register to the target operation digit by adopting the multiply-add instruction with the bit expansion function.
For example, if the number of operation bits of the current convolution layer in the register is 16 bits, and the number of target operation bits allowed to be expanded is 32 bits, then the multiply-add instruction with the bit expansion function may be used to expand the number of operation bits of the current convolution layer in the register to 32 bits, and so on.
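The effect of widening the accumulator can be illustrated by emulating fixed-width two's-complement registers in Python (Python integers are unbounded, so the wrap-around must be simulated explicitly; names are hypothetical):

```python
def to_signed(value, bits):
    """Reinterpret value modulo 2**bits as a two's-complement signed
    number, mimicking what a fixed-width register would hold."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

# A 16-bit accumulator wraps once the partial sum passes 2**15 - 1 ...
wrapped = to_signed(30000 + 10000, 16)   # -25536: overflow
# ... while the same sum is safe after expanding the width to 32 bits.
widened = to_signed(30000 + 10000, 32)   # 40000: correct
```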
106. And after the characteristic information corresponding to the plurality of channels is accumulated, returning to the step of determining the current convolutional layer until all convolutional layers are processed.
For example, taking M channels as an example, if the feature information corresponding to the M channels is accumulated, it indicates that the convolution processing is completed on the input data (i.e., the input feature map) of the current convolutional layer, so at this time, the output data (i.e., the output feature map) of the current convolutional layer may be used as the input data of the next convolutional layer, and the current convolutional layer is updated to the next convolutional layer, and the extended channel value of the next convolutional layer is obtained (i.e., the step 102 is returned to be executed), then the operations of steps 103 to 106 are executed, and so on until all convolutional layers are processed.
As can be seen from the above, after the preset convolutional neural network is obtained, the current convolutional layer may be determined, and the bit expansion channel value of the current convolutional layer is obtained, then, in the process of performing feature extraction and feature accumulation on input data by using the current convolutional layer, statistics is performed on the number of accumulated channels, and when the number of accumulated channels reaches an integral multiple of the bit expansion channel value, the number of operation bits of the current convolutional layer in the register is expanded, so as to store data generated in the accumulation process; because the scheme can flexibly determine the bit expanding time according to the actual situation of the current operation environment so as to automatically expand the operation bits in the register, compared with the existing scheme which can only expand the bits frequently, the scheme can reduce the number of instructions and realize the performance acceleration while avoiding overflow; in addition, due to the reduction of the number of instructions, the complexity of the convolution operation after quantization is also reduced, and further, the consumption of computing resources can be reduced.
Example II,
In order to better carry out the above process, further details will be given below by way of example. In this embodiment, the data processing apparatus is specifically integrated in a network device, and the bit expansion channel value is obtained from a static code library.
And (I) setting a bit expansion channel value.
Firstly, the value ranges of the weight values and characteristic values to be used in certain application scenarios may be analyzed; secondly, each weight value and characteristic value range is taken as a group of samples; and then, the network device calculates, according to the weight value and characteristic value range of each group of samples, the maximum number of channels at which the preset number of operation bits does not overflow, so as to obtain the bit expansion channel value corresponding to each group of samples.
For example, as shown in fig. 2a, taking a group of samples as an example, the bit-extended channel value corresponding to the group of samples can be calculated by the following method:
and A201, the network equipment calculates the maximum numerical range which can be represented by the preset operation digit.
For example, if the preset number of operation bits is 8 bits and one sign bit (i.e., "+" or "−") is removed, the maximum value range that can be represented is 2^7 − 1; if the current number of operation bits is 16 bits and one sign bit is removed, the maximum value range that can be represented is 2^15 − 1; if the preset number of operation bits is 32 bits and one sign bit is removed, the maximum value range that can be represented is 2^31 − 1; and so on.
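These ranges follow one formula, sketched here for illustration (the function name is hypothetical):

```python
def max_representable(op_bits):
    """Maximum positive value once one sign bit is removed."""
    return 2 ** (op_bits - 1) - 1
```

This gives 127 for 8 bits, 32767 for 16 bits, and 2147483647 for 32 bits.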
And A202, the network equipment determines the maximum eigenvalue according to the eigenvalue value range of the group of samples.
For example, if the eigenvalues of the group of samples range from −2^r to 2^r, then the maximum eigenvalue is 2^r.
The value range of r may be determined according to the requirements of the practical application. For example, if the number of register bits for storing the characteristic value is 16 bits (one of which is a sign bit), then r ranges from 0 to 15, and the maximum characteristic value of the group of samples is 2^15.
For another example, if the number of register bits for storing the feature value is 32 bits (one of which is a sign bit), then r ranges from 0 to 31, and the maximum feature value of the group of samples is 2^31; and so on.
And A203, the network equipment performs convolution operation on the maximum characteristic value and the weight value of the group of samples to obtain the maximum value of each channel, and calculates the maximum channel number which can be contained in the maximum value range according to the maximum value of each channel.
For example, taking 16 operation bits as an example, the maximum value range is 2^15 − 1; therefore, the following constraint can be derived: ∑j=1..m ∑i=1..n djimax·wi ≤ 2^15 − 1.
where n is the number of eigenvalues and djimax is the maximum eigenvalue used in calculating the maximum value of the j-th channel. With the maximum value of the j-th channel known, the value of m, i.e., the maximum number of channels that can be accommodated within this maximum value range, can be calculated.
And A204, the network equipment determines the bit expansion channel value of the group of samples according to the maximum channel number obtained by calculation.
For example, the network device may directly use the calculated maximum number of channels as the bit-extended channel value corresponding to the group of samples.
Optionally, in order to further reduce the number of multiply-add instructions in the convolution operation process, fine-grained adjustment may be performed on the calculated maximum number of channels, and the adjusted maximum number of channels is used as the extended channel value corresponding to the group of samples; for example, the size of "maximum number of channels (i.e. m value)" may be increased step by step, and for the same input feature map and convolution kernel, it is tested whether overflow will occur by using different m values, and then the maximum m value that will not cause overflow, such as m 'value, is selected from the m values, that is, the m' value may be used as the bit-extended channel value corresponding to the group of samples.
By analogy, bit expansion channel values corresponding to multiple groups of samples can be obtained, and then the corresponding relation between the weight value, the characteristic value range and the bit expansion channel value of each group of samples can be stored in the static code library, so that the corresponding bit expansion channel value can be called from the static code library according to different weight values and different characteristic value ranges.
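A hypothetical sketch of such a table (the keys and the per-channel worst-case model are assumptions for illustration; a real static code library would store generated code rather than a Python dict):

```python
def build_table(samples, op_bits=16, n=9):
    """samples: iterable of (w_max, d_max) value-range pairs; returns a
    mapping from each pair to its bit expansion channel value, using the
    same worst-case bound as manner one."""
    limit = 2 ** (op_bits - 1) - 1
    return {
        (w_max, d_max): limit // (n * d_max * w_max)
        for w_max, d_max in samples
    }

table = build_table([(3, 127), (1, 255)])
```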
Optionally, a general function may be further stored in the static code library, and is used to process an unusual bit expansion channel value, so that if a corresponding bit expansion channel value cannot be directly called from the static library subsequently, the general function may be used to calculate a maximum channel number where no overflow occurs in a current operation bit number according to a weighted value and a characteristic value range of a convolution kernel of a current convolution layer, and obtain the bit expansion channel value of the current convolution layer, which is not described herein again.
And (II) using the bit expansion channel value.
As shown in fig. 2b, a specific flow of a data processing method may be as follows:
and B201, the network equipment acquires a preset convolutional neural network.
The network structure of the convolutional neural network may be determined according to the requirements of practical applications; for example, it may include a plurality of convolutional layers, and may further include structures such as an activation layer, a pooling layer, and a fully-connected layer. The parameters of each layer in the network structure, such as the convolution kernel size of a convolutional layer, the weights of the convolution kernel, and the stride, may also be set according to the specific application scenario or requirement.
The convolutional neural network may be implemented by code; for example, a set of template codes may be written, and then, according to the network structure and the parameters of each layer, different convolution kernel codes may be dynamically generated from the template codes by just-in-time (JIT) compilation.
Specifically, the preset codes of a plurality of convolutional neural networks may be stored in a code library, such as a static code library, and then, when an acquisition request of a convolutional neural network is received, the network device acquires a corresponding code from the code library according to a convolutional neural network identifier carried in the acquisition request, so as to obtain the corresponding convolutional neural network.
Wherein the static code library may be stored in the network device or other device. Of course, the codes of the convolutional neural network may be stored in other forms in a local or other device, which will not be listed here.
B202, the network device determines the current convolutional layer, and determines the weight value of the convolution kernel of the current convolutional layer and the value range of the characteristic value of the input data.
For example, taking a convolutional neural network including convolutional layer 1, convolutional layer 2, and convolutional layer 3 as an example: if the current convolutional layer is convolutional layer 1, the weight value of the convolution kernel of convolutional layer 1 and the value range of the characteristic value of its input data may be obtained; if the current convolutional layer is convolutional layer 2, the weight value of the convolution kernel of convolutional layer 2 and the value range of the characteristic value of its input data may be obtained; if the current convolutional layer is convolutional layer 3, the weight value of the convolution kernel of convolutional layer 3 and the value range of the characteristic value of its input data may be obtained; and so on.
And B203, the network device obtains, from the preset static code library, the bit expansion channel value corresponding to the weight value and characteristic value range obtained in step B202, so as to obtain the bit expansion channel value of the current convolutional layer.
Because the static code library stores the correspondence between each group of weight values, characteristic value ranges, and bit expansion channel values, the bit expansion channel value of the current convolutional layer can be obtained simply by querying the static code library according to the weight value and the characteristic value range.
Optionally, if the bit expansion channel value corresponding to the weight value and characteristic value range obtained in step B202 cannot be found in the static code library, the preset general function for processing unusual bit expansion channel values may be called from the static code library, and the general function may then be used to calculate, according to the weight value and the characteristic value range, the maximum number of channels at which the current number of operation bits does not overflow, so as to obtain the bit expansion channel value of the current convolutional layer.
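The lookup-with-fallback logic can be sketched as follows (hypothetical names; the fallback reuses the worst-case calculation of manner one as the "general function"):

```python
def expansion_value(table, w_max, d_max, op_bits=16, n=9):
    """Look the bit expansion channel value up in the static table; if
    the (weight range, eigenvalue range) pair is absent, fall back to a
    general function that recomputes it from the ranges directly."""
    key = (w_max, d_max)
    if key in table:
        return table[key]
    # general function: the same worst-case bound as manner one
    return (2 ** (op_bits - 1) - 1) // (n * d_max * w_max)
```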
And B204, the network equipment extracts the characteristics of the input data by adopting the current convolution layer to obtain the characteristic information corresponding to the plurality of channels.
For example, the network device may determine a channel that currently needs to be processed, obtain the feature values of the input data of the current convolutional layer on that channel and the weight values of the convolution kernel, multiply the feature values and weight values respectively and then accumulate them to obtain the feature information corresponding to that channel, and then return to the step of determining the channel that currently needs to be processed (i.e., process another channel), until all channels are processed, so as to obtain the feature information corresponding to the plurality of channels of the current convolutional layer.
For example, taking channel 1, channel 2, and channel 3 as examples, if the characteristic values of the input data of the current convolutional layer on channel 1 are "d11, d12, d13, d14, d15, d16, d17, d18, d19, …, d1i", the characteristic values on channel 2 are "d21, d22, d23, d24, d25, d26, d27, d28, d29, …, d2i", the characteristic values on channel 3 are "d31, d32, d33, d34, d35, d36, d37, d38, d39, …, d3i", and the weight values of the convolution kernel are "w1, w2, w3, w4, w5, w6, w7, w8, w9, …, wi", then:
on channel 1, the result of the convolution operation F1 can be written as: F1 = d11·w1 + d12·w2 + … + d1i·wi;
on channel 2, the result of the convolution operation F2 can be written as: F2 = d21·w1 + d22·w2 + … + d2i·wi;
on channel 3, the result of the convolution operation F3 can be written as: F3 = d31·w1 + d32·w2 + … + d3i·wi.
similarly, the feature information corresponding to all channels of the current convolutional layer can be obtained.
B205, the network device accumulates the characteristic information corresponding to the multiple channels, and counts the number of the accumulated channels.
Because the input data generally includes a plurality of channels, when the convolutional neural network processes a certain input data graph, it is necessary to perform a convolution operation on the values of the plurality of channels at the corresponding positions in the input data. If the number of channels is denoted as M, the result S obtained by accumulating the feature information corresponding to the plurality of channels may be represented as: S = ∑j=1..M ∑i dji·wi.
where dji is the characteristic value of the input data of the current convolutional layer on the j-th channel, and wi is the weight value of the convolution kernel of the current convolutional layer.
B206, if the accumulated channel number reaches the integral multiple of the bit expansion channel value, the network equipment calls a bit expansion instruction.
The bit expansion instruction may be set according to a requirement of an actual application, for example, the bit expansion instruction may be specifically a multiply-add instruction with a bit expansion function, and the like.
For example, if the bit expansion channel value is 3, then when the number of accumulated channels reaches 3, 6, and so on, a bit expansion instruction needs to be called; that is, a bit expansion instruction is called once every time 3 channels are accumulated, so as to expand the number of operation bits of the current convolutional layer in the register. Specifically, the number of accumulated channels may be counted by a counter, and when the count shows that the current number of accumulated channels reaches a multiple of 3 (for example, 3, 6, 9, and so on), a bit expansion instruction is called once to expand the number of operation bits of the current convolutional layer in the register. Alternatively, the counter may be cleared after each bit expansion and the counting then resumed, so that a bit expansion instruction is called whenever the counter reaches 3 again.
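The cleared-counter variant can be sketched as follows (hypothetical names; `expansions` counts how many times the bit expansion instruction would be called):

```python
def accumulate_with_cleared_counter(channel_features, m):
    """Counter variant: the counter is cleared after each bit expansion,
    and the expansion instruction fires whenever it reaches m again."""
    total, counter, expansions = 0, 0, 0
    for feature in channel_features:
        total += feature
        counter += 1
        if counter == m:
            expansions += 1  # the bit expansion instruction goes here
            counter = 0      # clear the counter, then keep counting
    return total, expansions
```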
Optionally, since the total channel number M may not be an integral multiple of the value m, a corresponding boundary condition M″ may be set according to the actual requirements, where M″ may be obtained as M″ = m × ⌊M / m⌋, i.e., the largest multiple of m that does not exceed M.
If the number of remaining channels after the last full multiple of m is less than m, this boundary needs handling. For example, if the total channel number M is 7, m is 3, and the current number of operation bits is 16 bits, then when the number of accumulated channels reaches 3 and 6, a bit expansion instruction needs to be called to expand the number of operation bits of the current convolutional layer in the register, as follows:
first time of bit expansion: when the accumulated channel number reaches 3, calling a bit expansion instruction to expand the operation bit number from 16 bits to 32 bits;
and (3) second time of bit expansion: when the number of accumulated channels reaches 6, a bit expansion instruction is invoked to expand the number of operational bits from 32 bits to 64 bits.
Because the total channel number 7 is not divisible by the bit expansion channel value 3, no third bit expansion is triggered between the second bit expansion and the point at which all channels have been accumulated; therefore, when the number of remaining channels is less than the bit expansion channel value, the bit expansion instruction may be called to forcibly expand the number of operation bits. Of course, this final bit expansion may also be omitted, which may be determined according to the requirements of the actual application and will not be described herein.
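A small sketch of the boundary handling (hypothetical names; the remainder channels after the last full multiple of m are reported rather than force-expanded, leaving the choice of boundary policy open):

```python
def expansion_points(total_channels, m):
    """Return the channel counts at which a bit expansion instruction is
    issued, plus the number of remainder channels accumulated after the
    last full multiple of m (left to the chosen boundary policy)."""
    points = list(range(m, total_channels + 1, m))
    remainder = total_channels % m
    return points, remainder

points, remainder = expansion_points(7, 3)
```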
B207, the network equipment expands the operation digit of the current convolution layer in the register according to the bit expansion instruction so as to store the data generated in the accumulation process.
For example, taking the bit-expansion instruction as a multiply-add instruction with bit-expansion function as an example, at this time, the network device may determine the number of operation bits of the current convolution layer in the register and the target number of operation bits allowed to be expanded, and then expand the number of operation bits of the current convolution layer in the register to the target number of operation bits by using the multiply-add instruction with bit-expansion function.
For example, if the number of operation bits of the current convolution layer in the register is 16 bits, and the number of target operation bits allowed to be extended is 32 bits, then the network device may extend the number of operation bits of the current convolution layer in the register to 32 bits by using the multiply-add instruction with bit extension function, and so on.
And B208, after the characteristic information corresponding to the plurality of channels of the current convolutional layer is accumulated, the network equipment returns to the step of determining the current convolutional layer (step B202) until all the convolutional layers are processed.
For example, the network device may specifically determine whether feature information corresponding to multiple channels of the current convolutional layer is accumulated completely, if so, determine whether all convolutional layers are processed completely, and if so, perform other processing of the convolutional neural network, for example, may use other layers of the convolutional neural network, such as performing pooling processing by using a pooling layer, performing full connection processing by using a full connection layer, and the like; if the convolution layer which is not processed exists, returning to the step of determining the current convolution layer (step B202); otherwise, if the accumulation is not completed, continuing to execute B205-B208 to accumulate the feature information corresponding to the other remaining channels.
For example, if the current total number of channels is M, it may be determined whether the feature information corresponding to the M channels has been accumulated completely. If not, the accumulation of the feature information corresponding to the M channels continues, the number of accumulated channels is counted, and during the accumulation the operation bits are expanded once every time m channels are accumulated. Otherwise, if the accumulation is complete, the current convolutional layer has finished the convolution processing of its input data, so the output data of the current convolutional layer can be used as the input data of the next convolutional layer, the current convolutional layer is updated to the next convolutional layer, and the bit expansion channel value of the next convolutional layer is obtained (i.e., execution returns to step B202); the operations of steps B203 to B208 are then executed, and so on, until all convolutional layers are processed.
For example, taking the convolutional neural network as an example, which includes convolutional layer 1, convolutional layer 2 and convolutional layer 3, and the current total channel number is 7, if the current convolutional layer is convolutional layer 1, after the feature information corresponding to 7 channels of the convolutional layer 1 is accumulated, the output data of the convolutional layer 1 can be used as the input data of the convolutional layer 2, and the current convolutional layer is updated to the convolutional layer 2, then, the bit-extended channel value of convolutional layer 2 is obtained, and then the operations of steps B203-B208 are performed, and so on, after the characteristic information corresponding to 7 channels of the convolutional layer 2 is accumulated, the output data of the convolutional layer 2 is used as the input data of the convolutional layer 3, and the current convolutional layer is updated to the convolutional layer 3, then, the bit-extended channel value of convolutional layer 3 is obtained, and the operations of steps B203 to B208 are continuously executed until the feature information corresponding to 7 channels of convolutional layer 3 is accumulated.
Optionally, in specific implementation, corresponding convolution operation codes may be generated based on different bit expansion channel values and stored in the static code base, so that when the convolutional neural network is used, the corresponding convolution operation codes may be directly called from the code base to be executed without separately obtaining the bit expansion channel values.
In addition, in a specific implementation, the convolution operation code may be mapped to hardware by using a Field-Programmable Gate Array (FPGA); alternatively, the convolution code may be mapped to Application-Specific Integrated Circuit (ASIC) hardware to implement the convolution operation, which is not described herein.
As can be seen from the above, in this embodiment, for the value ranges of the weight values and the feature values commonly used in some application scenarios, corresponding bit expansion channel values are set in the static code library in advance, and then, after the preset convolutional neural network is obtained, the corresponding bit expansion channel values can be obtained from the static code library according to the value ranges of the weight values and the feature values of the current convolutional layer, and the operation digits of the current convolutional layer in the register are expanded based on the bit expansion channel values, so as to store the data generated in the accumulation process; because the scheme can flexibly and dynamically determine the bit expanding time according to the actual situation of the current operation environment so as to automatically expand the operation bits in the register, compared with the existing scheme which can only expand the bits frequently, the scheme can reduce the number of instructions and realize the performance acceleration while avoiding overflow; in addition, due to the reduction of the number of instructions, the complexity of the convolution operation after quantization is also reduced, and further, the consumption of computing resources can be reduced.
In addition, because this embodiment can directly call the corresponding bit expansion channel value from the static code library, without calculating it from the weight value and feature value range of the current convolutional layer, the processing efficiency can also be improved.
Example III,
In order to better implement the above method, an embodiment of the present invention further provides a data processing apparatus, which may be specifically integrated in a network device, such as a server or a mobile terminal.
For example, referring to fig. 3a, the data processing apparatus may comprise an obtaining unit 301, a determining unit 302, an extraction unit 303, an accumulation unit 304 and an expansion unit 305, as follows:
(1) An obtaining unit 301;
an obtaining unit 301, configured to obtain a preset convolutional neural network.
The network structure of the convolutional neural network may be determined according to the requirements of practical applications; for example, it may include a plurality of convolutional layers, and may further include structures such as an activation layer, a pooling layer, and a fully-connected layer. Parameters of each layer in the network structure, such as the convolution kernel size of a convolutional layer, the weight values of the convolution kernel, and the step size, may also be set according to a specific application scenario or requirement, and are not described herein again.
The convolutional neural network may be implemented by code; for example, a set of template codes may be written, and then, according to the network structure and the parameters of each layer, different convolution kernel codes may be dynamically generated from the template codes through just-in-time (JIT) compilation.
Specifically, the preset codes of a plurality of convolutional neural networks may be stored in a code base, such as a static code base, and then, when the obtaining unit 301 receives an obtaining request of a convolutional neural network, the corresponding code is obtained from the code base according to a convolutional neural network identifier carried in the obtaining request, that is:
the obtaining unit 301 may be specifically configured to receive an obtaining request for the convolutional neural network, and obtain the corresponding code from the code base according to the convolutional neural network identifier carried in the obtaining request, so as to obtain the convolutional neural network.
Of course, the codes of the convolutional neural network may be stored in other forms in a local or other device, which will not be listed here.
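As a minimal sketch of the template-based JIT generation described above, per-layer kernel code can be rendered from a shared template string and compiled at run time. The template text and the names `TEMPLATE`, `generate_kernel` are illustrative assumptions, not the patent's actual implementation:

```python
# Sketch of JIT-style kernel generation from a template (illustrative only).
TEMPLATE = (
    "def conv_{name}(x, w):\n"
    "    # multiply-accumulate the input patch x with the kernel weights w\n"
    "    return sum(xi * wi for xi, wi in zip(x, w))\n"
)

def generate_kernel(name):
    """Render the template for one layer and compile it into a function."""
    source = TEMPLATE.format(name=name)
    scope = {}
    exec(source, scope)          # compile the generated per-layer code
    return scope[f"conv_{name}"]

conv_l1 = generate_kernel("l1")
conv_l1([1, 2, 3], [4, 5, 6])    # 1*4 + 2*5 + 3*6 = 32
```

In practice the generated code for each network could be cached in a code base keyed by a network identifier, which is the retrieval step the obtaining unit 301 performs.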
(2) A determining unit 302;
a determining unit 302, configured to determine a current convolutional layer, and obtain input data and a bit expansion channel value of the current convolutional layer.
For example, the determining unit 302 may include a first determining subunit, an input subunit, and a second determining subunit, as follows:
the first determining subunit is used for determining the current convolutional layer.
The input subunit is used for acquiring the input data of the current convolutional layer.
The second determining subunit is configured to determine a weight value of a convolution kernel of the current convolution layer and a value range of a feature value of the input data, and determine a bit expansion channel value of the current convolution layer according to the weight value and the value range of the feature value.
The bit expansion channel value of the current convolutional layer may be determined in various manners; for example, it may be obtained by direct calculation according to the weight value and the feature value range, or it may be obtained by querying, that is:
A. Manner one:
the second determining subunit may be specifically configured to calculate, according to the weight value and the feature value range, the maximum number of channels for which the current operation digit does not overflow, so as to obtain the bit expansion channel value of the current convolutional layer.
For example, the second determining subunit may be specifically configured to calculate the maximum numerical range that can be represented by the current operation digit, determine the maximum feature value according to the feature value range, perform a convolution operation on the maximum feature value and the weight value to obtain the maximum value of each channel in the current convolutional layer, calculate the maximum number of channels that can be accommodated in the maximum numerical range according to the maximum value of each channel, and determine the bit expansion channel value of the current convolutional layer according to the calculated maximum number of channels; for example, the calculated maximum channel number may be directly used as the bit expansion channel value of the current convolutional layer, that is:
the second determining subunit may be specifically configured to use the calculated maximum number of channels as the bit expansion channel value of the current convolutional layer.
Optionally, since the maximum number of channels is calculated on the basis that every feature value equals the maximum feature value, which is generally not the case, in order to further reduce the number of multiply-add instructions in the convolution operation, fine-grained adjustment may also be performed on the calculated maximum number of channels, and the adjusted value is used as the bit expansion channel value of the current convolutional layer, that is:
the second determining subunit may be specifically configured to perform fine-grained adjustment on the calculated maximum number of channels according to a prediction policy, so as to obtain the bit expansion channel value of the current convolutional layer.
The prediction policy may be set according to the requirements of the actual application; for example, the maximum number of channels (i.e., the m value) may be increased step by step, different m values may be tested for overflow on the same input feature map and convolution kernel, and the largest m value that does not cause overflow, denoted m', may be selected; this m' value may then be used as the bit expansion channel value of the current convolutional layer.
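To make manner one concrete, the maximum channel count can be sketched as below. This is a minimal illustration, not the patent's implementation: the signed 16-bit accumulator width, the quantization ranges, and the name `max_channels` are assumptions.

```python
def max_channels(weight_max, feature_max, kernel_elems, acc_bits=16):
    """Largest channel count whose accumulated per-channel maxima still fit
    in a signed acc_bits-wide register (the "manner one" calculation).

    weight_max / feature_max: largest absolute quantized values in range
    kernel_elems: multiply-adds per channel (a 3x3 kernel gives 9)
    """
    acc_limit = 2 ** (acc_bits - 1) - 1          # e.g. 32767 for 16 bits
    per_channel_max = weight_max * feature_max * kernel_elems
    return acc_limit // per_channel_max

# e.g. 4-bit weights (|w| <= 7), 8-bit features (|x| <= 127), 3x3 kernel:
m = max_channels(7, 127, 9)   # 32767 // 8001 -> m = 4
```

Note that with full 8-bit weights and features (`max_channels(127, 127, 9)`) the result is 0: a single channel's worst case already exceeds 16 bits, which is exactly the situation where every multiply-add would need widening; the fine-grained m' adjustment above exploits the fact that real feature values rarely all hit the maximum.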
B. Manner two:
the second determining subunit may be specifically configured to obtain, from a preset static code library, the bit expansion channel value corresponding to the weight value and the feature value range, so as to obtain the bit expansion channel value of the current convolutional layer.
For example, for some application scenarios, the value ranges of the weight values and feature values to be used may be analyzed in advance, the corresponding bit expansion channel values may be obtained through calculation, and then the correspondence between the weight value, the feature value range, and the bit expansion channel value may be stored in a static code library for subsequent calling; that is, as shown in fig. 3b, the data processing apparatus may further comprise an establishing unit 306, as follows:
the establishing unit 306 may be configured to obtain the weight values and feature value ranges of multiple groups of samples, calculate, according to the weight value and feature value range of each group of samples, the maximum number of channels for which a preset operation digit does not overflow, so as to obtain the bit expansion channel value corresponding to each group of samples, establish the correspondence between the weight value, the feature value range, and the bit expansion channel value of each group of samples, and store the correspondence in the static code library.
The manner of calculating, according to the weight value and feature value range of each group of samples, the maximum channel number for which the preset operation digit does not overflow is similar to manner one. For example, the maximum numerical range that can be represented by the preset operation digit may be calculated, the maximum feature value may be determined according to the feature value range, a convolution operation may then be performed on the maximum feature value and the weight value to obtain the maximum value of each channel in the current sample, the maximum channel number that can be accommodated in the maximum numerical range may be calculated according to the maximum value of each channel, and the bit expansion channel value of the sample may be determined according to the calculated maximum channel number, and so on, as detailed in the foregoing embodiment, which is not described herein.
The correspondence relationship may be stored in a code library that is independent of the "convolutional neural network" code, or may be stored in a code library that stores the "convolutional neural network" code.
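The precomputed correspondence can be sketched as a simple lookup table, as below. This is an illustrative sketch under assumed names (`build_bit_expansion_table`, `lookup_m`) and assumed sample ranges; the patent leaves the storage format of the static code library open.

```python
def build_bit_expansion_table(samples, kernel_elems=9, acc_bits=16):
    """Precompute the (weight range, feature range) -> bit expansion channel
    value correspondence that the establishing unit stores in advance."""
    acc_limit = 2 ** (acc_bits - 1) - 1
    return {
        (w_max, f_max): acc_limit // (w_max * f_max * kernel_elems)
        for w_max, f_max in samples
    }

# assumed sample ranges for some common quantization settings
TABLE = build_bit_expansion_table([(3, 127), (7, 127), (15, 63)])

def lookup_m(w_max, f_max):
    # Returns None for an unusual range; in that case the "general function"
    # mentioned in the text would compute the value on the fly instead.
    return TABLE.get((w_max, f_max))
```

Looking the value up at run time replaces the per-layer calculation of manner one, which is the efficiency gain the text attributes to manner two.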
Optionally, a general function may be further stored in the static code library for handling unusual bit expansion channel values. For example, the second determining subunit may be specifically configured to call the general function from the preset static code library, and then calculate, by using the general function, the maximum channel number for which the current operation digit does not overflow according to the weight value and the feature value range, so as to obtain the bit expansion channel value of the current convolutional layer; the specific calculation manner may refer to "manner one", and is not described herein.
(3) An extraction unit 303;
an extracting unit 303, configured to perform feature extraction on the input data by using the current convolutional layer to obtain feature information corresponding to multiple channels.
For example, the extracting unit 303 may be specifically configured to determine the channel that needs to be processed currently, obtain the feature value of the input data of the current convolutional layer on that channel and the weight value of the convolution kernel, multiply and accumulate the feature values and the weight values to obtain the feature information corresponding to that channel, and then return to the operation of determining the channel that needs to be processed currently (i.e., process another channel) until all channels are processed, so as to obtain the feature information corresponding to the multiple channels of the current convolutional layer.
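The per-channel multiply-and-accumulate loop just described can be sketched as follows; a minimal illustration in which `extract_features` and the flattened patch/kernel representation are assumptions for clarity:

```python
def extract_features(patches, kernels):
    """Per-channel multiply-accumulate performed by the extraction unit:
    for each channel, multiply the feature values by the kernel weights
    and accumulate, yielding one piece of feature information per channel.

    patches: one flattened input patch per channel
    kernels: the matching flattened kernel weights per channel
    """
    features = []
    for x_patch, kernel in zip(patches, kernels):   # one pass per channel
        features.append(sum(x * w for x, w in zip(x_patch, kernel)))
    return features

extract_features([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # -> [17, 53]
```

The list returned here is the per-channel feature information that the accumulation unit 304 then sums across channels.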
(4) An accumulation unit 304;
the accumulation unit 304 is configured to accumulate the feature information corresponding to the multiple channels, count the number of accumulated channels, and, after the feature information corresponding to the multiple channels has been accumulated, trigger the determining unit 302 to perform the operation of determining the current convolutional layer, until all convolutional layers are processed.
For example, taking M channels as an example, if the accumulation unit 304 has accumulated the feature information corresponding to the M channels, the convolution processing of the input data of the current convolutional layer has been completed; at this time, the output data of the current convolutional layer may be used as the input data of the next convolutional layer, the current convolutional layer is updated to the next convolutional layer, and the bit expansion channel value of the next convolutional layer is obtained (i.e., the determining unit 302 performs the operation of determining the current convolutional layer), and so on, until all convolutional layers are processed.
(5) An expansion unit 305;
The expansion unit 305 is configured to expand the operation digit of the current convolutional layer in the register when the accumulated channel number reaches an integral multiple of the bit expansion channel value, so as to store the data generated in the accumulation process.
For example, the expansion unit 305 may include a calling subunit and an expansion subunit, as follows:
the calling subunit may be configured to call the bit expansion instruction when the accumulated number of channels reaches an integer multiple of the bit expansion channel value.
The bit expansion instruction may be set according to a requirement of an actual application, for example, the bit expansion instruction may be specifically a multiply-add instruction with a bit expansion function, and the like.
For example, if the bit-expansion channel value is m, when the number of accumulated channels reaches m, 2m, and 3m, etc., the calling subunit needs to call the bit-expansion instruction, that is, each time m channels are accumulated, the calling subunit needs to call the bit-expansion instruction once to trigger the expansion subunit to expand the number of operation bits of the current convolution layer in the register.
The expansion subunit may be configured to expand the operation digit of the current convolutional layer in the register according to the bit expansion instruction, so as to store the data generated in the accumulation process.
For example, taking the bit-expansion instruction as a multiply-add instruction with bit-expansion function as an example, the expansion subunit may be specifically configured to determine the number of operation bits of the current convolution layer in the register and a target number of operation bits allowed to be expanded, and expand the number of operation bits of the current convolution layer in the register to the target number of operation bits by using the multiply-add instruction with bit-expansion function.
For example, if the number of operation bits of the current convolution layer in the register is 16 bits, and the number of target operation bits allowed to be expanded is 32 bits, then the expansion subunit may expand the number of operation bits of the current convolution layer in the register to 32 bits by using the multiply-add instruction with the bit expansion function, and so on.
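The 16-to-32-bit expansion driven by the accumulated channel count can be sketched as below. This models the behavior in plain Python rather than with a real multiply-add-with-bit-expansion instruction; the register widths and the names `accumulate_with_expansion`, `wide_total`, `narrow` are illustrative assumptions.

```python
INT16_MAX = 2 ** 15 - 1   # capacity of the narrow 16-bit register

def accumulate_with_expansion(channel_sums, m):
    """Accumulate per-channel results in a narrow 16-bit partial sum and
    widen it into a 32-bit total whenever the accumulated channel count
    reaches an integral multiple of m (the bit expansion channel value)."""
    wide_total = 0   # models the 32-bit target register
    narrow = 0       # models the 16-bit working register
    for count, s in enumerate(channel_sums, start=1):
        narrow += s
        assert abs(narrow) <= INT16_MAX, "m chosen too large: 16-bit overflow"
        if count % m == 0:          # the bit expansion instruction fires here
            wide_total += narrow    # flush into the wider register
            narrow = 0
    return wide_total + narrow      # fold in any remaining partial sum

# 8 channels, each at its per-channel maximum of 8001, with m = 4:
accumulate_with_expansion([8001] * 8, 4)   # -> 64008, no 16-bit overflow
```

Expanding only once per m channels, instead of on every multiply-add, is what reduces the instruction count relative to a scheme that must widen frequently.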
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the data processing apparatus of this embodiment, after the preset convolutional neural network is obtained, the determining unit 302 may determine the current convolutional layer and obtain the bit expansion channel value of the current convolutional layer; then, while the extracting unit 303 performs feature extraction on the input data and the accumulation unit 304 accumulates the feature information, the accumulation unit 304 counts the number of accumulated channels, and when the accumulated channel number reaches an integral multiple of the bit expansion channel value, the expansion unit 305 expands the operation digit of the current convolutional layer in the register, so as to store the data generated in the accumulation process. Because the scheme can flexibly determine the bit expansion timing according to the actual situation of the current operation environment, so as to automatically expand the operation digit in the register, compared with the existing scheme, which must perform bit expansion frequently, this scheme can reduce the number of instructions and achieve performance acceleration while avoiding overflow. In addition, due to the reduction in the number of instructions, the complexity of the quantized convolution operation is also reduced, which further reduces the consumption of computing resources.
Example IV,
An embodiment of the present invention further provides a network device, as shown in fig. 4, which shows a schematic structural diagram of the network device according to the embodiment of the present invention, specifically:
the network device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the network device architecture shown in fig. 4 does not constitute a limitation of network devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the network device, connects various parts of the entire network device by using various interfaces and lines, and performs various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the network device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the network device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The network device further includes a power supply 403 for supplying power to each component, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The network device may also include an input unit 404, where the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
the method comprises the steps of obtaining a preset convolutional neural network, determining a current convolutional layer, obtaining input data and bit expansion channel values of the current convolutional layer, performing feature extraction on the input data by adopting the current convolutional layer to obtain feature information corresponding to a plurality of channels, accumulating the feature information corresponding to the plurality of channels, counting the number of accumulated channels, expanding the operation number of the current convolutional layer in a register if the number of accumulated channels reaches integral multiple of the bit expansion channel values to store data generated in the accumulation process, and returning to the step of determining the current convolutional layer after the feature information corresponding to the plurality of channels is accumulated until all convolutional layers are processed.
For example, the weight value of the convolution kernel of the current convolutional layer and the feature value range of the input data may be determined, and the bit expansion channel value of the current convolutional layer may be determined according to the weight value and the feature value range; for example, the maximum channel number for which the current operation digit does not overflow may be calculated according to the weight value (i.e., the weight value of the convolution kernel of the current convolutional layer) and the feature value range, so as to obtain the bit expansion channel value of the current convolutional layer; alternatively, the bit expansion channel value corresponding to the weight value and the feature value range may be obtained from a preset static code library, so as to obtain the bit expansion channel value of the current convolutional layer.
For example, specifically, for some application scenarios, a value range of a weight value and a feature value to be used may be analyzed, and then a corresponding bit expansion channel value may be obtained through calculation, and then a correspondence relationship between the weight value, the value range of the feature value, and the bit expansion channel value is stored in a static code library for subsequent calling; that is, the processor 401 may also run an application program stored in the memory 402, thereby implementing the following functions:
the method comprises the steps of obtaining the weighted values and the eigenvalue value ranges of a plurality of groups of samples, calculating the maximum channel number without overflow of preset operation digits according to the weighted values and the eigenvalue value ranges of each group of samples, obtaining the bit expansion channel value corresponding to each group of samples, establishing the corresponding relation among the weighted values, the eigenvalue value ranges and the bit expansion channel values of each group of samples, and storing the corresponding relation into a static code library.
Optionally, a general function may be further stored in the static code library for handling unusual bit expansion channel values, as described in the foregoing embodiments, which is not repeated herein.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, after the network device of this embodiment acquires the preset convolutional neural network, it may determine the current convolutional layer and acquire the bit expansion channel value of the current convolutional layer; then, while performing feature extraction and feature accumulation on the input data with the current convolutional layer, it counts the number of accumulated channels, and when the accumulated channel number reaches an integral multiple of the bit expansion channel value, it expands the operation digit of the current convolutional layer in the register, so as to store the data generated in the accumulation process. Because the scheme can flexibly determine the bit expansion timing according to the actual situation of the current operation environment, so as to automatically expand the operation digit in the register, compared with the existing scheme, which must perform bit expansion frequently, this scheme can reduce the number of instructions and achieve performance acceleration while avoiding overflow. In addition, due to the reduction in the number of instructions, the complexity of the quantized convolution operation is also reduced, which further reduces the consumption of computing resources.
Example V,
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the data processing methods provided by the embodiments of the present invention. For example, the instructions may perform the following steps:
the method comprises the steps of obtaining a preset convolutional neural network, determining a current convolutional layer, obtaining input data and bit expansion channel values of the current convolutional layer, performing feature extraction on the input data by adopting the current convolutional layer to obtain feature information corresponding to a plurality of channels, accumulating the feature information corresponding to the plurality of channels, counting the number of accumulated channels, expanding the operation number of the current convolutional layer in a register if the number of accumulated channels reaches integral multiple of the bit expansion channel values to store data generated in the accumulation process, and returning to the step of determining the current convolutional layer after the feature information corresponding to the plurality of channels is accumulated until all convolutional layers are processed.
For example, the weight value of the convolution kernel of the current convolutional layer and the feature value range of the input data may be determined, and the bit expansion channel value of the current convolutional layer may be determined according to the weight value and the feature value range; for example, the maximum channel number for which the current operation digit does not overflow may be calculated according to the weight value (i.e., the weight value of the convolution kernel of the current convolutional layer) and the feature value range, so as to obtain the bit expansion channel value of the current convolutional layer; alternatively, the bit expansion channel value corresponding to the weight value and the feature value range may be obtained from a preset static code library, so as to obtain the bit expansion channel value of the current convolutional layer.
For example, specifically, for some application scenarios, a value range of a weight value and a feature value to be used may be analyzed, and then a corresponding bit expansion channel value may be obtained through calculation, and then a correspondence relationship between the weight value, the value range of the feature value, and the bit expansion channel value is stored in a static code library for subsequent calling; i.e., the instructions may also perform the steps of:
the method comprises the steps of obtaining the weighted values and the eigenvalue value ranges of a plurality of groups of samples, calculating the maximum channel number without overflow of preset operation digits according to the weighted values and the eigenvalue value ranges of each group of samples, obtaining the bit expansion channel value corresponding to each group of samples, establishing the corresponding relation among the weighted values, the eigenvalue value ranges and the bit expansion channel values of each group of samples, and storing the corresponding relation into a static code library.
Optionally, a general function may be further stored in the static code library for handling unusual bit expansion channel values, as described in the foregoing embodiments, which is not repeated herein.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any data processing method provided in the embodiment of the present invention, the beneficial effects that can be achieved by any data processing method provided in the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing detailed description has provided a data processing method, apparatus, and storage medium according to embodiments of the present invention, and the present invention has been described in detail using specific examples to explain the principles and implementations of the present invention, and the description of the foregoing embodiments is only used to help understand the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (15)

1. A data processing method, comprising:
acquiring a preset convolutional neural network, wherein the convolutional neural network comprises a plurality of convolutional layers;
determining a current convolutional layer, and acquiring input data and a bit expansion channel value of the current convolutional layer;
performing feature extraction on input data by adopting a current convolutional layer to obtain feature information corresponding to a plurality of channels;
accumulating the characteristic information corresponding to the plurality of channels, and counting the number of the accumulated channels;
if the accumulated channel number reaches the integral multiple of the bit expansion channel value, expanding the operation digit of the current convolution layer in the register so as to store the data generated in the accumulation process;
and after the characteristic information corresponding to the plurality of channels is accumulated, returning to the step of determining the current convolutional layer until all convolutional layers are processed.
2. The method of claim 1, wherein, if the accumulated channel number reaches an integral multiple of the bit expansion channel value, expanding the operation digit of the current convolutional layer in the register to store the data generated in the accumulation process comprises:
if the accumulated channel number reaches the integral multiple of the bit expansion channel value, calling a bit expansion instruction;
and expanding the operation digit of the current convolution layer in the register according to the bit expansion instruction so as to store the data generated in the accumulation process.
3. The method of claim 2, wherein the bit-expanding instruction is a multiply-add instruction with bit-expanding function, and wherein expanding the number of operation bits of the current convolution layer in the register according to the bit-expanding instruction comprises:
determining the operation digit of the current convolution layer in a register and a target operation digit allowing expansion;
and expanding the operation digit of the current convolution layer in the register to the target operation digit by adopting the multiply-add instruction with the bit expansion function.
4. The method of any one of claims 1 to 3, wherein acquiring the bit-expansion channel value of the current convolutional layer comprises:
determining a weight value of a convolution kernel of the current convolutional layer and a feature value range of the input data;
and determining the bit-expansion channel value of the current convolutional layer according to the weight value and the feature value range.
5. The method of claim 4, wherein determining the bit-expansion channel value of the current convolutional layer according to the weight value and the feature value range comprises:
calculating, according to the weight value and the feature value range, the maximum number of channels for which the current operation bit width does not overflow, to obtain the bit-expansion channel value of the current convolutional layer; or,
acquiring, from a preset static code library, the bit-expansion channel value corresponding to the weight value and the feature value range, to obtain the bit-expansion channel value of the current convolutional layer.
6. The method of claim 5, wherein before acquiring, from the preset static code library, the bit-expansion channel value corresponding to the weight value and the feature value range, the method further comprises:
acquiring weight values and feature value ranges of a plurality of groups of samples;
calculating, according to the weight value and feature value range of each group of samples, the maximum number of channels for which a preset operation bit width does not overflow, to obtain the bit-expansion channel value corresponding to each group of samples;
and establishing a correspondence among the weight value, the feature value range, and the bit-expansion channel value of each group of samples, and storing the correspondence in the static code library.
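The offline step this claim describes, precomputing a bit-expansion channel value for each (weights, feature range) sample and storing the mapping for run-time lookup, might be sketched as follows. The helper name, the 16-bit default, and the keying scheme are all illustrative assumptions:

```python
def build_expansion_table(samples, acc_bits=16):
    """Offline step: for each (weights, feature_max) sample, compute the
    largest channel count the preset bit width can absorb without
    overflow, keyed so it can be looked up at run time."""
    table = {}
    range_max = (1 << (acc_bits - 1)) - 1  # e.g. 32767 for 16 bits
    for weights, feature_max in samples:
        # worst-case per-channel result: max feature against every tap
        per_channel_max = feature_max * sum(abs(w) for w in weights)
        table[(tuple(weights), feature_max)] = range_max // per_channel_max
    return table

# run-time lookup then replaces the per-layer recomputation
table = build_expansion_table([([1, 2, 1], 127), ([3, 3], 255)])
```

Trading a dictionary (or generated static code) lookup for the arithmetic matters when the same layer configurations recur across inferences, which is the scenario the static code library targets.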
7. The method of claim 5, wherein calculating, according to the weight value and the feature value range, the maximum number of channels for which the current operation bit width does not overflow, to obtain the bit-expansion channel value of the current convolutional layer, comprises:
calculating the maximum numerical range that the current operation bit width can represent;
determining a maximum feature value according to the feature value range;
performing a convolution operation on the maximum feature value and the weight value to obtain the maximum value of each channel in the current convolutional layer;
calculating, according to the maximum value of each channel, the maximum number of channels that the maximum numerical range can accommodate;
and determining the bit-expansion channel value of the current convolutional layer according to the calculated maximum number of channels.
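The arithmetic in this claim reduces to dividing the representable range of the current bit width by the worst-case per-channel convolution result. A hedged sketch, assuming signed integers and taking the worst case as the maximum feature value applied against the absolute weight sum (the function name and parameters are illustrative):

```python
def max_safe_channels(acc_bits, feature_max, kernel_weights):
    """Largest number of per-channel convolution results that an
    `acc_bits`-bit signed accumulator can sum without overflow."""
    range_max = (1 << (acc_bits - 1)) - 1  # e.g. 32767 for 16 bits
    # worst case per channel: the maximum feature value hits every tap
    per_channel_max = feature_max * sum(abs(w) for w in kernel_weights)
    return range_max // per_channel_max
```

For instance, with a 16-bit accumulator, 8-bit features up to 127, and a kernel whose absolute weights sum to 4, at most 32767 // 508 = 64 channels can be accumulated before the register must be widened.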
8. The method of claim 7, wherein determining the bit-expansion channel value of the current convolutional layer according to the calculated maximum number of channels comprises:
taking the calculated maximum number of channels as the bit-expansion channel value of the current convolutional layer; or,
performing fine-grained adjustment on the calculated maximum number of channels according to a prediction strategy to obtain the bit-expansion channel value of the current convolutional layer.
9. A data processing apparatus, comprising:
an acquisition unit, configured to acquire a preset convolutional neural network, the convolutional neural network comprising a plurality of convolutional layers;
a determining unit, configured to determine a current convolutional layer and acquire input data and a bit-expansion channel value of the current convolutional layer;
an extraction unit, configured to perform feature extraction on the input data by using the current convolutional layer to obtain feature information corresponding to a plurality of channels;
an accumulation unit, configured to accumulate the feature information corresponding to the plurality of channels, count the number of accumulated channels, and, after the feature information corresponding to the plurality of channels has been accumulated, trigger the determining unit to determine the current convolutional layer again until all convolutional layers have been processed;
and an expansion unit, configured to expand the operation bit width of the current convolutional layer in a register when the number of accumulated channels reaches an integer multiple of the bit-expansion channel value, so as to store the data generated during accumulation.
10. The apparatus of claim 9, wherein the expansion unit comprises a calling subunit and an expansion subunit;
the calling subunit is configured to invoke a bit-expansion instruction when the number of accumulated channels reaches an integer multiple of the bit-expansion channel value;
and the expansion subunit is configured to expand the operation bit width of the current convolutional layer in the register according to the bit-expansion instruction, so as to store the data generated during accumulation.
11. The apparatus of claim 9 or 10, wherein the determining unit comprises a first determining subunit, an input subunit, and a second determining subunit;
the first determining subunit is configured to determine the current convolutional layer;
the input subunit is configured to acquire the input data of the current convolutional layer;
and the second determining subunit is configured to determine a weight value of a convolution kernel of the current convolutional layer and a feature value range of the input data, and determine the bit-expansion channel value of the current convolutional layer according to the weight value and the feature value range.
12. The apparatus of claim 11, wherein
the second determining subunit is specifically configured to calculate, according to the weight value and the feature value range, the maximum number of channels for which the current operation bit width does not overflow, to obtain the bit-expansion channel value of the current convolutional layer; or,
the second determining subunit is specifically configured to acquire, from a preset static code library, the bit-expansion channel value corresponding to the weight value and the feature value range, to obtain the bit-expansion channel value of the current convolutional layer.
13. The apparatus of claim 12, further comprising an establishing unit;
the establishing unit is configured to acquire weight values and feature value ranges of a plurality of groups of samples, calculate, according to the weight value and feature value range of each group of samples, the maximum number of channels for which a preset operation bit width does not overflow, to obtain the bit-expansion channel value corresponding to each group of samples, establish a correspondence among the weight value, the feature value range, and the bit-expansion channel value of each group of samples, and store the correspondence in the static code library.
14. The apparatus of claim 12, wherein
the second determining subunit is specifically configured to calculate the maximum numerical range that the current operation bit width can represent, determine a maximum feature value according to the feature value range, perform a convolution operation on the maximum feature value and the weight value to obtain the maximum value of each channel in the current convolutional layer, calculate, according to the maximum value of each channel, the maximum number of channels that the maximum numerical range can accommodate, and determine the bit-expansion channel value of the current convolutional layer according to the calculated maximum number of channels.
15. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the data processing method according to any one of claims 1 to 8.
CN201811313447.9A 2018-11-06 2018-11-06 Data processing method, device and storage medium Active CN109542512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811313447.9A CN109542512B (en) 2018-11-06 2018-11-06 Data processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN109542512A true CN109542512A (en) 2019-03-29
CN109542512B CN109542512B (en) 2020-09-04

Family

ID=65844419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811313447.9A Active CN109542512B (en) 2018-11-06 2018-11-06 Data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109542512B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320495A (en) * 2014-07-22 2016-02-10 英特尔公司 Weight Shifting Mechanism for Convolutional Neural Networks
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN108009393A (en) * 2017-10-31 2018-05-08 深圳市易成自动驾驶技术有限公司 Data processing method, device and computer-readable recording medium
CN108038860A (en) * 2017-11-30 2018-05-15 杭州电子科技大学 Spine segmentation method based on the full convolutional neural networks of 3D
US20180157966A1 (en) * 2016-12-01 2018-06-07 Via Alliance Semiconductor Co., Ltd. Neural network unit that performs efficient 3-dimensional convolutions
CN108564168A (en) * 2018-04-03 2018-09-21 中国科学院计算技术研究所 A kind of design method to supporting more precision convolutional neural networks processors
US20180285715A1 (en) * 2017-03-28 2018-10-04 Samsung Electronics Co., Ltd. Convolutional neural network (cnn) processing method and apparatus
AU2018101313A4 (en) * 2018-09-07 2018-10-11 Gao, Jiafan Mr Image recognition of dangerous tools based on deep learning

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767980B (en) * 2019-04-02 2024-03-05 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111767204A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Overflow risk detection method, device and equipment
CN111767980A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN111767204B (en) * 2019-04-02 2024-05-28 杭州海康威视数字技术股份有限公司 Spill risk detection method, device and equipment
CN110502278B (en) * 2019-07-24 2021-07-16 瑞芯微电子股份有限公司 Neural network coprocessor based on RISC-V extended instruction and coprocessing method thereof
CN110502278A (en) * 2019-07-24 2019-11-26 福州瑞芯微电子股份有限公司 Neural network coprocessor and its association's processing method based on RiscV extended instruction
CN112766474A (en) * 2019-11-04 2021-05-07 北京地平线机器人技术研发有限公司 Method, apparatus, medium, and electronic device for implementing convolution operation
CN112766474B (en) * 2019-11-04 2024-03-22 北京地平线机器人技术研发有限公司 Method, device, medium and electronic equipment for realizing convolution operation
CN111310115A (en) * 2020-01-22 2020-06-19 深圳市商汤科技有限公司 Data processing method, device and chip, electronic equipment and storage medium
CN111310115B (en) * 2020-01-22 2024-05-24 深圳市商汤科技有限公司 Data processing method and device, chip, electronic equipment and storage medium
WO2021174691A1 (en) * 2020-03-03 2021-09-10 平安科技(深圳)有限公司 Data processing optimization method and apparatus, and storage medium and computer device
CN113673664A (en) * 2020-05-14 2021-11-19 杭州海康威视数字技术股份有限公司 Data overflow detection method, device, equipment and storage medium
CN113673664B (en) * 2020-05-14 2023-09-12 杭州海康威视数字技术股份有限公司 Data overflow detection method, device, equipment and storage medium
CN112215745A (en) * 2020-09-30 2021-01-12 深圳云天励飞技术股份有限公司 Image processing method and device and electronic equipment
CN112817735A (en) * 2021-03-08 2021-05-18 上海壁仞智能科技有限公司 Computing device, computing device and method for thread group accumulation

Also Published As

Publication number Publication date
CN109542512B (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN109542512B (en) Data processing method, device and storage medium
CN110888746B (en) Memory management method and device, storage medium and electronic equipment
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
CN110933178B (en) Method for adjusting node configuration in cluster system and server
US20210176174A1 (en) Load balancing device and method for an edge computing network
CN113656179A (en) Scheduling method and device of cloud computing resources, electronic equipment and storage medium
CN106802772A (en) The method of data record, device and solid state hard disc
CN107608778A (en) Application program management-control method, device, storage medium and electronic equipment
CN111581174A (en) Resource management method and device based on distributed cluster system
CN110347477B (en) Service self-adaptive deployment method and device in cloud environment
US8281091B2 (en) Automatic selection of storage volumes in a data storage system
CN113014659B (en) Microservice migration method and device, storage medium and electronic equipment
CN116204311A (en) Pod cluster capacity expansion and contraction method and device, computer equipment and storage medium
CN116955271A (en) Method and device for storing data copy, electronic equipment and storage medium
CN115658083A (en) Model deployment method, model operation method, model deployment device, model operation medium and electronic equipment
CN111078405B (en) Memory allocation method and device, storage medium and electronic equipment
CN111930691B (en) Model calling method, device and system
CN113840313B (en) Network mode control method and device of mobile terminal and computer equipment
US11494697B2 (en) Method of selecting a machine learning model for performance prediction based on versioning information
CN114490002A (en) Data processing system, task scheduling method, device, chip and electronic equipment
CN113742059A (en) Task allocation method and device, computer equipment and storage medium
CN113225830A (en) Data network uplink scheduling method and device and electronic equipment
CN107577439B (en) Method, apparatus, device and computer readable storage medium for allocating processing resources
CN113094171B (en) Data processing method, device, electronic equipment and storage medium
CN116912760B (en) Internet of things data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant