CN112446461A - Neural network model training method and device - Google Patents

Neural network model training method and device

Info

Publication number
CN112446461A
Authority
CN
China
Prior art keywords
network
weight
network layer
activation quantity
training
Prior art date
Legal status
Pending
Application number
CN201910808066.6A
Other languages
Chinese (zh)
Inventor
张渊
谢迪
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910808066.6A priority Critical patent/CN112446461A/en
Priority to PCT/CN2020/111912 priority patent/WO2021037174A1/en
Publication of CN112446461A publication Critical patent/CN112446461A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

Embodiments of the present application provide a neural network model training method and device: training samples are obtained, and the neural network model is trained using them. During training, integer fixed-point coding is performed on the first activation quantity input to each network layer and on the network weight of each network layer, so that both become integer fixed-point data with a specified bit width. The operations involved, such as matrix multiplication and matrix addition, are then carried out in integer fixed-point format. Because the bit width of integer fixed-point data is significantly smaller than that of single-precision floating-point data, the hardware resource overhead required to run the neural network model can be greatly reduced.

Description

Neural network model training method and device
Technical Field
The application relates to the technical field of machine learning, in particular to a neural network model training method and device.
Background
Deep neural networks are an emerging field of machine learning research: they analyze data by simulating the mechanism of the human brain, and constitute an intelligent model that analyzes and learns by modeling the human brain. At present, deep neural networks, such as convolutional neural networks, recurrent neural networks, and long short-term memory networks, have been successfully applied to target detection and segmentation, behavior detection and recognition, voice recognition, and other tasks.
At present, neural network models are usually trained with single-precision floating-point data to ensure the precision of model convergence. However, single-precision floating-point data has a large bit width and the amount of data participating in the operations is large, so running the neural network model requires high hardware resource overhead.
Disclosure of Invention
The embodiment of the application aims to provide a neural network model training method and device so as to reduce hardware resource overhead required by running a neural network model. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a neural network model training method, where the method includes:
obtaining a training sample;
training the neural network model by using the training sample, wherein when the neural network model is trained, aiming at each network layer in the neural network model, the following steps are respectively executed:
acquiring a first activation quantity input into a network layer and a network weight of the network layer;
performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width;
and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
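The three per-layer steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are invented, the 8-bit width and the power-of-two scale selection are assumptions (the patent's exact encoding formulas appear only as images), and a fully connected layer stands in for an arbitrary network layer.

```python
import numpy as np

def encode_fixed_point(x, bit_width=8):
    """Encode a float tensor as (sp, ip): one shared power-of-two scale sp
    and signed integer fixed-point values ip of the given bit width.
    (Illustrative rule; the patent's formulas (1)-(2) are images.)"""
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return 1.0, np.zeros_like(x, dtype=np.int32)
    # smallest power of two such that max_abs / sp fits the signed range
    E = int(np.ceil(np.log2(max_abs / (2 ** (bit_width - 1) - 1))))
    sp = 2.0 ** E
    ip = np.clip(np.round(x / sp),
                 -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1)
    return sp, ip.astype(np.int32)

def layer_forward(activation_in, weight, bit_width=8):
    """S301-S303 for one layer: get the first activation quantity and the
    network weight, encode both, and compute the second activation quantity."""
    sp_a, ip_a = encode_fixed_point(activation_in, bit_width)
    sp_w, ip_w = encode_fixed_point(weight, bit_width)
    acc = ip_a.astype(np.int64) @ ip_w.astype(np.int64)  # integer MACs only
    return (sp_a * sp_w) * acc  # rescale once to obtain the output
```

Because every scalar is sp · ip with one shared sp per tensor, the matrix product accumulates entirely in integer arithmetic and the two scales are applied once at the end.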
Optionally, the method is applied to a camera; the training sample is a training sample with a specified target; the neural network model is a target detection model for detecting a specified target;
before the step of training the neural network model using the training samples, the method further comprises:
starting a target detection function;
judging whether a model online training function is started or not;
the step of training the neural network model by using the training samples includes:
and if the model on-line training function is started, training the target detection model by using a training sample with a specified target.
Optionally, the step of training the neural network model by using the training sample includes:
inputting a training sample into the neural network model and performing a forward operation on it through the network layers in front-to-back order to obtain the forward operation result of the neural network model, wherein, during the forward operation, integer fixed-point coding is performed, for each network layer, on the first activation quantity input to the network layer and on the network weight of the network layer, coding them into integer fixed-point data with a specified bit width; the second activation quantity output by the network layer is calculated from the coded first activation quantity and network weight and serves as the first activation quantity input to the next network layer, until the second activation quantity output by the last network layer is determined as the forward operation result;
comparing the forward operation result with a preset nominal value to obtain a loss value;
inputting the loss value into the neural network model and performing a reverse operation on it through the network layers in back-to-front order to obtain the weight gradient of each network layer, wherein, during the reverse operation, integer fixed-point coding is performed, for each network layer, on the first activation quantity input to the network layer, the first activation quantity gradient, and the network weight, coding them into integer fixed-point data with the specified bit width; the second activation quantity gradient and the weight gradient output by the network layer are calculated from the coded first activation quantity, first activation quantity gradient, and network weight, and the second activation quantity gradient serves as the first activation quantity gradient input to the next network layer, until the weight gradients of all network layers have been calculated;
and adjusting the network weight of each network layer according to the weight gradient of each network layer.
Optionally, the step of adjusting the network weight of each network layer according to the weight gradient of each network layer includes:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width;
and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
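A sketch of this weight adjustment, with plain SGD standing in for the unspecified "preset optimization algorithm"; the quantize helper, its 8-bit width, and the power-of-two scale rule are illustrative assumptions, not the patent's exact formulas.

```python
import numpy as np

def quantize(x, bit_width=8):
    """Round a tensor onto an sp * ip grid with one shared power-of-two
    scale sp (illustrative; not the patent's exact formulas)."""
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return np.zeros_like(x)
    sp = 2.0 ** np.ceil(np.log2(max_abs / (2 ** (bit_width - 1) - 1)))
    return sp * np.round(x / sp)

def sgd_update(weight, weight_grad, lr=0.1, bit_width=8):
    """Adjust one layer's network weight from its weight gradient after
    both have been integer fixed-point encoded."""
    return quantize(weight, bit_width) - lr * quantize(weight_grad, bit_width)
```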
Optionally, after calculating a second activation amount output by the network layer according to the encoded first activation amount and the network weight, the method provided in the embodiment of the present application further includes:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with the specified bit width.
Optionally, the step of performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width includes:
and respectively coding each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing the global dynamic range and an integer fixed-point value of the specified bit width.
Optionally, if the network layer is a convolutional layer, the size of the network weight is C × R × R × N, and the corresponding parameter values are the same for all scalar values within each three-dimensional tensor of size C × R × R;
if the network layer is a fully connected layer, the size of the network weight is M × N, and the corresponding parameter values are the same for all scalar values within each column vector of size 1 × N;
the parameter values corresponding to all scalar values in the first activation quantity are the same.
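The grouping this clause describes, one shared parameter value per C × R × R sub-tensor of a convolution kernel and one per 1 × N vector of a fully connected weight, might be sketched as follows; the (N, C, R, R) array layout, the 8-bit range (127), and the power-of-two scale rule are assumptions for illustration.

```python
import numpy as np

def _scale(max_abs, bit_width=8):
    # shared power-of-two scale for a group with largest magnitude max_abs
    if max_abs == 0:
        return 1.0
    return 2.0 ** float(np.ceil(np.log2(max_abs / (2 ** (bit_width - 1) - 1))))

def shared_scales_conv(weight):
    """One parameter value per output filter of a conv kernel stored as
    (N, C, R, R), i.e. per three-dimensional C x R x R sub-tensor."""
    return np.array([_scale(np.max(np.abs(weight[p])))
                     for p in range(weight.shape[0])])

def shared_scales_fc(weight):
    """One parameter value per 1 x N vector of an M x N fully connected
    weight (per row in this layout)."""
    return np.array([_scale(np.max(np.abs(row))) for row in weight])
```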
In a second aspect, an embodiment of the present application provides a neural network model training apparatus, including:
the acquisition module is used for acquiring a training sample;
the training module is used for training the neural network model by utilizing the training sample, wherein when the training module trains the neural network model, the training module respectively executes the following steps aiming at each network layer in the neural network model:
acquiring a first activation quantity input into a network layer and a network weight of the network layer;
performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width;
and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
Optionally, the apparatus is applied to a camera; the training sample is a training sample with a specified target; the neural network model is a target detection model for detecting a specified target;
the device also includes:
the starting module is used for starting a target detection function;
the judging module is used for judging whether a model on-line training function is started or not;
the training module is specifically configured to:
and if the judgment result of the judgment module is that the on-line model training function is started, training the target detection model by using a training sample with a specified target.
Optionally, the training module is specifically configured to:
inputting a training sample into the neural network model and performing a forward operation on it through the network layers in front-to-back order to obtain the forward operation result of the neural network model, wherein, during the forward operation, integer fixed-point coding is performed, for each network layer, on the first activation quantity input to the network layer and on the network weight of the network layer, coding them into integer fixed-point data with a specified bit width; the second activation quantity output by the network layer is calculated from the coded first activation quantity and network weight and serves as the first activation quantity input to the next network layer, until the second activation quantity output by the last network layer is determined as the forward operation result;
comparing the forward operation result with a preset nominal value to obtain a loss value;
inputting the loss value into the neural network model and performing a reverse operation on it through the network layers in back-to-front order to obtain the weight gradient of each network layer, wherein, during the reverse operation, integer fixed-point coding is performed, for each network layer, on the first activation quantity input to the network layer, the first activation quantity gradient, and the network weight, coding them into integer fixed-point data with the specified bit width; the second activation quantity gradient and the weight gradient output by the network layer are calculated from the coded first activation quantity, first activation quantity gradient, and network weight, and the second activation quantity gradient serves as the first activation quantity gradient input to the next network layer, until the weight gradients of all network layers have been calculated;
and adjusting the network weight of each network layer according to the weight gradient of each network layer.
Optionally, the training module, when configured to adjust the network weights of the network layers according to the weight gradients of the network layers, is specifically configured to:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width;
and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
Optionally, the training module is further configured to:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with the specified bit width.
Optionally, when performing integer fixed-point coding on the first activation quantity and the network weight and coding them into integer fixed-point data with a specified bit width, the training module is specifically configured to:
respectively code each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing the global dynamic range and an integer fixed-point value of the specified bit width.
Optionally, if the network layer is a convolutional layer, the size of the network weight is C × R × R × N, and the corresponding parameter values are the same for all scalar values within each three-dimensional tensor of size C × R × R;
if the network layer is a fully connected layer, the size of the network weight is M × N, and the corresponding parameter values are the same for all scalar values within each column vector of size 1 × N;
the parameter values corresponding to all scalar values in the first activation quantity are the same.
In a third aspect, an embodiment of the present application provides a computer device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the method provided in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, implement the method provided in the first aspect of the embodiments of the present application.
The neural network model training method and device provided by the embodiments of the present application obtain training samples and train the neural network model with them. During training, the following steps are performed for each network layer in the neural network model: obtain the first activation quantity input to the network layer and the network weight of the network layer; perform integer fixed-point coding on the first activation quantity and the network weight, coding them into integer fixed-point data with a specified bit width; and calculate the second activation quantity output by the network layer from the coded first activation quantity and network weight. Because the operations involved, such as matrix multiplication and matrix addition, are carried out in integer fixed-point format, and the bit width of integer fixed-point data is significantly smaller than that of single-precision floating-point data, the hardware resource overhead required to run the neural network model can be greatly reduced.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart illustrating a neural network model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a neural network model training process according to an embodiment of the present application;
fig. 3 is a schematic diagram of an execution flow of each network layer in a neural network model in a process of training the neural network model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a tensor space structure corresponding to a convolution kernel of a four-dimensional tensor of size C × R × R × N according to an embodiment of the present application;
FIG. 5 is a diagram illustrating the encoding of each scalar value within a three-dimensional tensor of size C × R × R according to an embodiment of the present application;
fig. 6 is a schematic diagram of the tensor space structure corresponding to a two-dimensional matrix of size M × N according to an embodiment of the present application;
FIG. 7 is a diagram illustrating the encoding of each scalar value within a column vector of size 1 × N according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the manner in which each scalar value is encoded within the activation volume and activation volume gradient three-dimensional tensor according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of a target detection model training method applied to a camera according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a neural network model training apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the embodiments herein without creative effort fall within the protection scope of the present application.
In order to reduce hardware resource overhead required for running a neural network model, embodiments of the present application provide a neural network model training method, apparatus, computer device, and machine-readable storage medium. Next, a neural network model training method provided in the embodiment of the present application is first described.
The execution body of the neural network training method provided by the embodiments of the present application may be a computer device with a neural network model training function: for example, a computer device implementing functions such as target detection and segmentation, behavior detection and recognition, or voice recognition; a camera with target detection and segmentation or behavior detection and recognition functions; or a microphone with a voice recognition function. The execution body includes at least a core processing chip with data processing capability. The neural network training method provided by the embodiments of the present application may be implemented by at least one of software, a hardware circuit, or a logic circuit in the execution body.
As shown in fig. 1, a neural network model training method provided in an embodiment of the present application may include the following steps.
And S101, obtaining a training sample.
When neural network training is performed, a large number of training samples generally need to be collected, and the samples collected differ according to the function to be realized by the neural network model. For example, if a detection model is trained for face detection, the collected training samples are face samples; if a tracking model is trained for vehicle tracking, the collected training samples are vehicle samples.
And S102, training the neural network model by using the training sample.
Training samples are input into the neural network model and processed with the Back Propagation (BP) algorithm or another model training algorithm; the calculation result is compared with a set nominal value, and the network weights of the neural network model are adjusted based on the comparison result. Different training samples are input in turn, the above steps are iterated, and the network weights are continually adjusted, so that the output of the neural network model approaches the nominal value more and more closely. Training is considered complete when the difference between the output of the neural network model and the nominal value is small enough, or when the output of the neural network model converges.
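The iterate-compare-adjust loop just described can be outlined schematically; everything here (the function names, the squared-error loss, and the stopping tolerance) is illustrative rather than taken from the patent.

```python
import numpy as np

def train(forward_backward, update, weights, samples, nominals,
          epochs=100, tol=1e-4):
    """Schematic training loop: run each sample through the model, compare
    the output with its nominal value, and adjust the weights until the
    epoch loss is small enough or the epoch budget runs out."""
    loss = float("inf")
    for _ in range(epochs):
        loss = 0.0
        for x, y in zip(samples, nominals):
            out, grad = forward_backward(weights, x, y)
            loss += 0.5 * float(np.sum((np.asarray(out) - y) ** 2))
            weights = update(weights, grad)
        if loss < tol:
            break
    return weights, loss
```

With a tiny linear model (out = w · x) and an SGD update, the loop drives the output toward the nominal values within a few epochs.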
Taking the BP algorithm as an example, the main calculation operations and data flow in the neural network model training process are shown in fig. 2. During the forward operation, each network layer mainly performs the convolution operation Y_i = W_i * Y_{i-1}; during the reverse operation, each network layer mainly performs the convolution operation dY_{i-1} = dY_i * W_i and the matrix multiplication operation dW_i = dY_i * Y_{i-1}. The forward operation proceeds from the first network layer backward, and the reverse operation proceeds from the last network layer forward. W_i denotes the network weight of the i-th network layer, e.g., convolutional-layer or fully-connected-layer parameters; Y_i denotes the activation quantity input to or output from the i-th network layer; dW_i denotes the weight gradient corresponding to the i-th network layer; and dY_i denotes the activation quantity gradient input to the i-th network layer.
As shown in fig. 2, when the neural network model is trained with the BP algorithm, the training sample X is input into the neural network model, and the forward operation proceeds through the k network layers, which perform convolution operations in front-to-back order, to obtain the model output Y_k. The model output is compared with a nominal value through a loss function to obtain the loss value dY_k. The reverse operation of the neural network model is then performed: the k network layers sequentially perform convolution and matrix multiplication operations in back-to-front order to obtain the weight gradient corresponding to each network layer, and the network weights are adjusted according to the weight gradients. Through continuous iteration, the output of the neural network model approaches the nominal value ever more closely.
In the embodiment of the present application, in the process of training the neural network model, each network layer in the neural network model needs to perform each step shown in fig. 3.
S301, obtaining the first activation quantity input into the network layer and the network weight of the network layer.
During the forward operation, the first activation quantity input to the i-th network layer is Y_i; during the reverse operation, the first activation quantity gradient input to the i-th network layer is dY_i.
S302, integer fixed point coding is carried out on the first activation quantity and the network weight, and the first activation quantity and the network weight are coded into integer fixed point data with a designated bit width.
For the i-th network layer, integer fixed-point coding must be performed on the first activation quantities Y_i and dY_i input to the network layer and on the network weight W_i of the network layer; integer fixed-point coding encodes data in floating-point format into data in integer fixed-point format.
Optionally, S302 may specifically be:
and respectively coding each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing the global dynamic range and an integer fixed-point value of the specified bit width.
Specifically, each scalar value in the first activation quantity and the network weight is encoded as the product of a parameter value sp characterizing the global dynamic range and an integer fixed-point value ip of the specified bit width, where sp = 2^E, E is a signed binary number with bit width EB, EB is a set bit width, ip is a signed binary number with bit width IB, and IB is a bit width set according to the size of the original floating-point data. The integer fixed-point value ip and the parameter value sp are calculated as follows:
[Formulas (1) and (2), which define the integer fixed-point value ip and the parameter value sp, appear only as images in the original publication.]
where s is the sign bit of the binary number x, taking the value 0 or 1, and x_i is the i-th bit of the binary number x, also taking the value 0 or 1.
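Since formulas (1) and (2) are reproduced only as images in the source, the following is one plausible realization consistent with the surrounding text: sp = 2^E with E a signed EB-bit exponent and ip a signed IB-bit integer. The exponent-selection rule and the defaults IB=8, EB=5 are assumptions, not the patent's exact definitions.

```python
import numpy as np

def encode_scalar(x, IB=8, EB=5):
    """Encode float x as (E, ip) with sp = 2**E: E a signed EB-bit
    exponent, ip a signed IB-bit integer. One plausible reading of
    formulas (1)-(2); the defaults IB=8, EB=5 are assumptions."""
    if x == 0:
        return 0, 0
    # pick E so that |x| / 2**E fits the signed IB-bit range
    E = int(np.ceil(np.log2(abs(x) / (2 ** (IB - 1) - 1))))
    E = max(-(2 ** (EB - 1)), min(2 ** (EB - 1) - 1, E))  # clamp to EB bits
    ip = int(np.clip(round(x / 2.0 ** E),
                     -(2 ** (IB - 1)), 2 ** (IB - 1) - 1))
    return E, ip

def decode_scalar(E, ip):
    # recover the encoded value sp * ip
    return ip * 2.0 ** E
```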
Optionally, if the network layer is a convolutional layer, the size of the network weight is C × R × R × N, and the corresponding parameter values are the same for all scalar values within each three-dimensional tensor of size C × R × R; if the network layer is a fully connected layer, the size of the network weight is M × N, and the corresponding parameter values are the same for all scalar values within each column vector of size 1 × N; the parameter values corresponding to all scalar values in the first activation quantity are the same.
W_i is the network weight corresponding to the i-th layer of the neural network model, and the network layer type is convolutional layer or fully connected layer. If the i-th layer is a convolutional layer, W_i is a four-dimensional tensor convolution kernel of size C × R × R × N, whose corresponding tensor space structure is shown in fig. 4, where C denotes the input-channel dimension of the convolution kernel, R denotes its spatial dimension, and N denotes its output-channel dimension. For each three-dimensional tensor W_i^p of size C × R × R, each scalar value w within it can be expressed as:
w=sp·ip (3)
where each three-dimensional tensor W_i^p shares one sp, and each scalar value w corresponds to one integer fixed-point value ip. The encoding of each scalar value within a three-dimensional tensor of size C × R × R is shown in fig. 5. ip and sp are calculated as in formulas (1) and (2), which are not repeated here.
Similarly, if the i-th layer is a fully connected layer, W_i is a two-dimensional matrix of size M × N, whose corresponding tensor space structure is shown in fig. 6; the M × N matrix is divided into M column vectors of size 1 × N. For each column vector W_i^p of size 1 × N, each scalar value w within it is expressed by equation (3) above. Each column vector W_i^p shares one sp, and each scalar value w corresponds to one integer fixed-point value ip. The encoding of each scalar value within a column vector of size 1 × N is shown in fig. 7. ip and sp are calculated as in formulas (1) and (2), which are not repeated here.
Y_i and dY_i are the activation quantity and activation quantity gradient corresponding to the i-th layer of the neural network model; both are three-dimensional tensors of size C × H × W. Each scalar value y or dy within Y_i or dY_i can be expressed as:
y=sp·ip (4)
dy=sp·ip (5)
where each three-dimensional tensor Y_i or dY_i shares one sp, and each scalar value y or dy corresponds to one integer fixed-point value ip. The encoding of each scalar value within the activation quantity and activation quantity gradient three-dimensional tensors is shown in fig. 8. ip and sp are calculated as in formulas (1) and (2), which are not repeated here.
S303, calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
As described above, each scalar value in the first activation quantity and the network weight is integer fixed-point encoded, so every coded value is an integer fixed-point value. The operations that account for most of the resource overhead, such as convolution and matrix multiplication, are thereby converted from floating-point operations to integer fixed-point operations in both the forward and reverse passes, which greatly improves the training efficiency of the neural network on a hardware platform.
Optionally, S102 may specifically be implemented by the following steps:
the method comprises the steps of firstly, inputting training samples into a neural network model, carrying out forward operation on the training samples according to the sequence of each network layer in the neural network model from front to back to obtain a forward operation result of the neural network model, wherein when the forward operation is carried out, aiming at each network layer, respectively carrying out integer fixed point coding on a first activation quantity input into the network layer and a network weight of the network layer, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, calculating a second activation quantity output by the network layer according to the coded first activation quantity and the coded network weight of each network layer, and calculating the second activation quantity as the first activation quantity input into the next network layer until the second activation quantity output by the last network layer is determined as the forward operation result.
And secondly, comparing the forward operation result with a preset nominal value to obtain a loss value.
Thirdly, the loss value is input into the neural network model, and reverse operation is performed on the loss value according to the back-to-front order of the network layers to obtain the weight gradient of each network layer in the neural network model. During the reverse operation, for each network layer, integer fixed-point coding is performed on the first activation quantity, the first activation quantity gradient and the network weight of the network layer, coding them into integer fixed-point data with the specified bit width; the second activation quantity gradient and the weight gradient output by the network layer are calculated according to the coded first activation quantity, first activation quantity gradient and network weight, and the second activation quantity gradient is used as the first activation quantity gradient input into the next network layer, until the weight gradients of all the network layers are calculated.
And fourthly, adjusting the network weight of each network layer according to the weight gradient of each network layer.
The process from the first step to the fourth step is the operation process of the BP algorithm; the training of the neural network model is realized by continuously and cyclically executing these four steps. The forward operation calculates the second activation quantity by multiplying the first activation quantity with the network weight, Y_i = W_i * Y_{i-1}; the reverse operation calculates the second activation quantity gradient by multiplying the first activation quantity gradient with the network weight, dY_{i-1} = dY_i * W_i, and calculates the weight gradient by multiplying the first activation quantity gradient with the first activation quantity, dW_i = dY_i * Y_{i-1}. With integer fixed-point encoding, these floating-point operations become integer fixed-point operations:
f32(Y_{k+1}) = f32(Y_k) * f32(W_k) → int_YB(Y_{k+1}) = int_YB(Y_k) * int_WB(W_k) (6)
f32(dY_{k-1}) = f32(dY_k) * f32(W_k) → int_dYB(dY_{k-1}) = int_dYB(dY_k) * int_WB(W_k) (7)
f32(dW_k) = f32(dY_k) * f32(Y_{k-1}) → int_dWB(dW_k) = int_dYB(dY_k) * int_YB(Y_{k-1}) (8)
where YB, WB, dYB and dWB are integer bit-width values, and f32(·) and int(·) denote the 32-bit floating-point format and the integer fixed-point format, respectively.
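The conversion in equations (6) to (8) can be illustrated with a small sketch. Because each operand is a shared scale times an integer tensor, a matrix product reduces to an integer-only multiply-accumulate plus one scalar scale multiplication; the function name and the explicit scale values below are assumptions for illustration, not the patent's exact scheme.

```python
import numpy as np

def fixed_point_matmul(ip_a, sp_a, ip_b, sp_b):
    """Product of two fixed-point-encoded tensors, as in equations (6)-(8).

    Since A = sp_a * ip_a and B = sp_b * ip_b, it holds that
    A @ B = (sp_a * sp_b) * (ip_a @ ip_b): the expensive multiply-
    accumulate runs entirely on integers, and only the two shared
    scales are multiplied in floating point.
    """
    ip_out = ip_a.astype(np.int64) @ ip_b.astype(np.int64)  # integer-only MAC
    sp_out = sp_a * sp_b
    return ip_out, sp_out

# Forward pass of equation (6): Y_{k+1} = Y_k * W_k with encoded operands.
ip_y = np.array([[2, -1], [4, 3]])   # int_YB(Y_k), shared scale 0.25
ip_w = np.array([[1], [2]])          # int_WB(W_k), shared scale 0.25
ip_out, sp_out = fixed_point_matmul(ip_y, 0.25, ip_w, 0.25)
```

Note that the integer accumulator grows beyond the operands' bit widths, which is why the output activation is typically re-encoded to the specified bit width before being fed to the next layer.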
Optionally, the fourth step may be specifically implemented by the following steps:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width; and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
After the weight gradients of each network layer are calculated, they may be encoded; the specific encoding process may refer to the process of encoding the network weights above, and is not described herein again. After encoding, the network weight needs to be adjusted based on the weight gradient. The adjustment process mainly involves matrix addition; specifically, by adopting an optimization algorithm such as SGD (Stochastic Gradient Descent), the network weight update can be converted from the floating-point format to the integer fixed-point format. Taking the SGD optimization algorithm as an example, the conversion of the network weight update is shown in equations (9) to (11).
f32(dW) = f32(dW) + f32(λ_w)·f32(W) → int_dWB(dW) = int_dWB(dW) + int_λB(λ_w)·int_WB(W) (9)
f32(W_old) = f32(m)·f32(dW_old) + f32(η)·f32(dW) → int_WB(W_old) = int_mB(m)·int_dWB(dW_old) + int_ηB(η)·int_dWB(dW) (10)
f32(W) = f32(W) + f32(W_old) → int_WB(W) = int_WB(W) + int_WB(W_old) (11)
where dW is the weight gradient of the network layer at the current moment, dW_old is the weight gradient of the network layer at the previous moment, W is the network weight of the network layer at the current moment, and λ_w, η and m are training hyperparameters (which may be set as required).
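Under the same encoding idea, equations (9) to (11) can be sketched as an SGD step on integer tensors. This is a simplification: the patent encodes each hyperparameter with its own bit width (int_λB, int_mB, int_ηB), whereas the sketch keeps λ_w, η and m as plain floats and rounds their products back to integers; the function name is hypothetical, and the sign convention follows equations (10) and (11), where the update term W_old is added to W.

```python
import numpy as np

def sgd_fixed_point_step(ip_W, ip_dW, ip_dW_old, lam, eta, m):
    """One SGD weight update in the spirit of equations (9)-(11).

    All tensors are the integer parts of fixed-point encodings sharing
    compatible scales; lam, eta and m stay as floats here, a
    simplification of the patent's fully fixed-point scheme.
    """
    # (9)  weight decay folded into the gradient: dW <- dW + lam * W
    ip_dW = ip_dW + np.round(lam * ip_W).astype(np.int64)
    # (10) momentum-style update term: W_old <- m * dW_old + eta * dW
    ip_W_old = np.round(m * ip_dW_old + eta * ip_dW).astype(np.int64)
    # (11) apply the update term: W <- W + W_old
    ip_W = ip_W + ip_W_old
    return ip_W, ip_W_old
```

Because the update term is added to W, a negative η moves the weights against the gradient, giving the usual descent direction.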
Optionally, after step S303 is executed, the neural network model training method provided in the embodiment of the present application may further execute:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with the specified bit width.
After the operation of each network layer, the bit width of the resulting integer fixed-point data generally grows. If such data were input into a subsequent network layer for operation, the longer bit width could reduce operation efficiency. To ensure operation efficiency, the calculated second activation quantity may therefore be integer fixed-point encoded again to reduce its bit width, so that the bit width of the second activation quantity meets the calculation requirement of the next network layer.
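This re-encoding step can be sketched as follows: the widened accumulator values are decoded back to the values they represent, then re-quantized to the specified bit width with a fresh shared scale. The max-abs scale choice and the function name are assumptions for illustration.

```python
import numpy as np

def requantize(sp, ip, bit_width):
    """Re-encode a fixed-point tensor whose integer part has outgrown
    the specified bit width back into `bit_width`-bit integers.

    Assumption: the new shared scale is chosen from the maximum absolute
    represented value, so the integer parts fit the target width again.
    """
    qmax = 2 ** (bit_width - 1) - 1
    x = sp * np.asarray(ip, dtype=np.float64)      # values represented
    m = np.max(np.abs(x))
    sp_new = m / qmax if m > 0 else 1.0
    ip_new = np.clip(np.round(x / sp_new), -qmax - 1, qmax).astype(np.int32)
    return sp_new, ip_new
```

For example, a second activation quantity held in wide accumulators (sp = 0.001, ip values in the thousands) can be brought back to 8-bit integers with one call, at the cost of a small rounding error.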
By applying the embodiment of the application, the training sample is obtained, and the neural network model is trained by utilizing the training sample. When the neural network model is trained, aiming at each network layer in the neural network model, respectively executing: the method comprises the steps of obtaining a first activation quantity input into a network layer and a network weight of the network layer, conducting integer fixed point coding on the first activation quantity and the network weight, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight. When the neural network model is trained, integer fixed-point coding is carried out on the first activation quantity input into each network layer and the network weight of each network layer, the first activation quantity and the network weight after coding are integer fixed-point data with specified bit width, when operation is carried out, the involved operations such as matrix multiplication, matrix addition and the like all adopt integer fixed-point formats, and the bit width of the integer fixed-point data is obviously less than the bit width of single-precision floating-point data, so that the hardware resource overhead required by running the neural network model can be greatly reduced.
The neural network model training method described above is mainly suitable for resource-constrained edge devices such as cameras. For a camera, the intelligent inference functions mainly include target detection, target tracking, face recognition and the like. Taking target detection as an example, the following introduces a training method for a target detection model deployed on a camera; as shown in fig. 9, the method mainly includes the following steps:
and S901, starting a target detection function.
When target detection is required according to the actual needs of the user, the camera can start the target detection function based on the user's selection.
And S902, judging whether to start the model on-line training function, if so, executing S903, and otherwise, waiting for starting the model on-line training function.
Before the target detection model is used for target detection, it needs to be trained. Whether online training is performed can be selected by the user; in general, only after the model online training function is started does the camera train the target detection model according to the steps of the embodiment shown in fig. 1.
And S903, training the target detection model by using the obtained training sample with the specified target.
When the target detection model is trained, the training sample input into the target detection model is a training sample with a specified target, so that the trained target detection model can detect the specified target. The specific way of training the target detection model is the same as the way of training the neural network model in the embodiment shown in fig. 3, and details are not repeated here.
Because the camera trains the target detection model in the training mode of the embodiment shown in fig. 3, the first activation quantity input into each network layer and the network weight of each network layer are integer fixed-point encoded during training, and the coded first activation quantity and network weight are integer fixed-point data with a specified bit width. During operation, the involved matrix multiplication, matrix addition and other operations all use the integer fixed-point format, and the bit width of integer fixed-point data is significantly less than that of single-precision floating-point data, so the hardware resource overhead of the camera can be greatly reduced. Moreover, performing online training of the target detection model on the camera gives the camera a scene self-adaptation capability.
Corresponding to the above method embodiment, an embodiment of the present application provides a neural network model training apparatus, as shown in fig. 10, the apparatus may include:
an obtaining module 1010, configured to obtain a training sample;
a training module 1020, configured to train the neural network model by using the training samples, where the training module 1020 executes the following steps for each network layer in the neural network model when training the neural network model:
acquiring a first activation quantity input into a network layer and a network weight of the network layer;
performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width;
and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
Optionally, the apparatus is applied to a camera; the training sample is a training sample with a specified target; the neural network model is a target detection model for detecting a specified target;
the apparatus may further include:
the starting module is used for starting a target detection function;
the judging module is used for judging whether a model on-line training function is started or not;
training module 1020 may be specifically configured to:
and if the judgment result of the judgment module is that the on-line model training function is started, training the target detection model by using a training sample with a specified target.
Optionally, the training module 1020 may be specifically configured to:
inputting a training sample into a neural network model, and performing forward operation on the training sample according to the sequence of each network layer in the neural network model from front to back to obtain a forward operation result of the neural network model, wherein when performing forward operation, a first activation quantity input into the network layer and a network weight of the network layer are respectively subjected to integer fixed point coding aiming at each network layer, the first activation quantity and the network weight are coded into integer fixed point data with a specified bit width, a second activation quantity output by the network layer is calculated according to the coded first activation quantity and the network weight, and the second activation quantity is used as the first activation quantity input into the next network layer for calculation until the second activation quantity output by the last network layer is determined as the forward operation result;
comparing the forward operation result with a preset nominal value to obtain a loss value;
inputting the loss value into the neural network model, and performing reverse operation on the loss value according to the back-to-front order of the network layers to obtain the weight gradient of each network layer in the neural network model, wherein, during the reverse operation, for each network layer, integer fixed-point coding is performed on the first activation quantity, the first activation quantity gradient and the network weight of the network layer, coding them into integer fixed-point data with the specified bit width, a second activation quantity gradient and a weight gradient output by the network layer are calculated according to the coded first activation quantity, first activation quantity gradient and network weight, and the second activation quantity gradient is used as the first activation quantity gradient input into the next network layer for calculation, until the weight gradients of all the network layers are calculated;
and adjusting the network weight of each network layer according to the weight gradient of each network layer.
Optionally, the training module 1020, when being configured to adjust the network weights of the network layers according to the weight gradients of the network layers, may specifically be configured to:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width;
and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
Optionally, the training module 1020 may further be configured to:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with the specified bit width.
Optionally, the training module 1020, when being configured to perform integer fixed point coding on the first activation quantity and the network weight, and code the first activation quantity and the network weight as integer fixed point data with a specified bit width, may be specifically configured to:
and respectively coding each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing the global dynamic range and an integer fixed point value of the designated bit width.
Optionally, if the network layer is a convolutional layer, the size of the network weight is C × R × R × N, and for each scalar numerical value in each three-dimensional tensor with the size of C × R × R, the corresponding parameter values are the same;
if the network layer is a full connection layer, the network weight is MxN, and corresponding parameter values are the same for each scalar numerical value in each column vector with the size of 1 xN;
the parameter values corresponding to the scalar values in the first activation quantity are the same.
By applying the embodiment of the application, the training sample is obtained, and the neural network model is trained by utilizing the training sample. When the neural network model is trained, aiming at each network layer in the neural network model, respectively executing: the method comprises the steps of obtaining a first activation quantity input into a network layer and a network weight of the network layer, conducting integer fixed point coding on the first activation quantity and the network weight, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight. When the neural network model is trained, integer fixed-point coding is carried out on the first activation quantity input into each network layer and the network weight of each network layer, the first activation quantity and the network weight after coding are integer fixed-point data with specified bit width, when operation is carried out, the involved operations such as matrix multiplication, matrix addition and the like all adopt integer fixed-point formats, and the bit width of the integer fixed-point data is obviously less than the bit width of single-precision floating-point data, so that the hardware resource overhead required by running the neural network model can be greatly reduced.
The present application embodiment provides a computer device, as shown in fig. 11, which may include a processor 1101 and a machine-readable storage medium 1102, where the machine-readable storage medium 1102 stores machine-executable instructions capable of being executed by the processor 1101, and the processor 1101 is caused by the machine-executable instructions to: all steps of the neural network model training method described above are implemented.
The machine-readable storage medium may include a RAM (Random Access Memory) and a NVM (Non-Volatile Memory), such as at least one disk Memory. Alternatively, the machine-readable storage medium may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The machine-readable storage medium 1102 and the processor 1101 may be in data communication by way of a wired or wireless connection, and the computer device may communicate with other devices by way of a wired or wireless communication interface. Fig. 11 shows only an example of data transmission between the processor 1101 and the machine-readable storage medium 1102 through a bus, and the connection manner is not limited in particular.
In this embodiment, the processor 1101 can realize that by reading the machine executable instructions stored in the machine readable storage medium 1102 and by executing the machine executable instructions: and acquiring a training sample, and training the neural network model by using the training sample. When the neural network model is trained, aiming at each network layer in the neural network model, respectively executing: the method comprises the steps of obtaining a first activation quantity input into a network layer and a network weight of the network layer, conducting integer fixed point coding on the first activation quantity and the network weight, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight. When the neural network model is trained, integer fixed-point coding is carried out on the first activation quantity input into each network layer and the network weight of each network layer, the first activation quantity and the network weight after coding are integer fixed-point data with specified bit width, when operation is carried out, the involved operations such as matrix multiplication, matrix addition and the like all adopt integer fixed-point formats, and the bit width of the integer fixed-point data is obviously less than the bit width of single-precision floating-point data, so that the hardware resource overhead required by running the neural network model can be greatly reduced.
The embodiment of the application also provides a machine-readable storage medium, which stores machine executable instructions and realizes all the steps of the neural network model training method when being called and executed by a processor.
In this embodiment, the machine-readable storage medium stores machine-executable instructions for executing the neural network model training method provided in this embodiment when running, so that the following can be implemented: and acquiring a training sample, and training the neural network model by using the training sample. When the neural network model is trained, aiming at each network layer in the neural network model, respectively executing: the method comprises the steps of obtaining a first activation quantity input into a network layer and a network weight of the network layer, conducting integer fixed point coding on the first activation quantity and the network weight, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight. When the neural network model is trained, integer fixed-point coding is carried out on the first activation quantity input into each network layer and the network weight of each network layer, the first activation quantity and the network weight after coding are integer fixed-point data with specified bit width, when operation is carried out, the involved operations such as matrix multiplication, matrix addition and the like all adopt integer fixed-point formats, and the bit width of the integer fixed-point data is obviously less than the bit width of single-precision floating-point data, so that the hardware resource overhead required by running the neural network model can be greatly reduced.
For the embodiments of the computer device and the machine-readable storage medium, the contents of the related methods are substantially similar to those of the foregoing method embodiments, so that the description is relatively simple, and for the relevant points, reference may be made to partial descriptions of the method embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, computer device, and machine-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (14)

1. A neural network model training method, the method comprising:
obtaining a training sample;
training a neural network model by using the training sample, wherein when the neural network model is trained, the following steps are respectively executed for each network layer in the neural network model:
acquiring a first activation quantity input into the network layer and a network weight of the network layer;
performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width;
and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
2. The method of claim 1, wherein the method is applied to a camera; the training sample is a training sample with a specified target; the neural network model is a target detection model used for detecting the specified target;
before the training of the neural network model using the training samples, the method further comprises:
starting a target detection function;
judging whether a model online training function is started or not;
the training of the neural network model by using the training samples comprises:
and if the model on-line training function is started, training the target detection model by using the training sample with the specified target.
3. The method of claim 1, wherein training a neural network model using the training samples comprises:
inputting the training sample into a neural network model, and performing forward operation on the training sample according to the sequence of each network layer in the neural network model from front to back to obtain a forward operation result of the neural network model, wherein when performing forward operation, for each network layer, performing integer fixed point coding on a first activation quantity input into the network layer and a network weight of the network layer, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight, and calculating the second activation quantity as a first activation quantity input into a next network layer until a second activation quantity output by a last network layer is determined as a forward operation result;
comparing the forward operation result with a preset nominal value to obtain a loss value;
inputting the loss value into the neural network model, and performing reverse operation on the loss value according to the sequence of each network layer in the neural network model from back to front to obtain a weight gradient of each network layer in the neural network model, wherein during the reverse operation, a first activation quantity, a first activation quantity gradient and a network weight of the network layer are respectively subjected to integer fixed-point coding for each network layer, the first activation quantity, the first activation quantity gradient and the network weight are coded into integer fixed-point data with a specified bit width, a second activation quantity gradient and a weight gradient output by the network layer are calculated according to the coded first activation quantity, the first activation quantity gradient and the network weight, and the second activation quantity gradient is used as a first activation quantity gradient input into the next network layer for calculation, until the weight gradients of all network layers are calculated;
and adjusting the network weight of each network layer according to the weight gradient of each network layer.
4. The method according to claim 3, wherein the adjusting the network weight of each network layer according to the weight gradient of each network layer comprises:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width;
and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
5. The method according to claim 1, wherein after the calculating a second activation amount output by the network layer according to the encoded first activation amount and the network weight, the method further comprises:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with a specified bit width.
6. The method according to claim 1, wherein the performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight as integer fixed point data with a specified bit width comprises:
and respectively coding each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing a global dynamic range and an integer fixed point value of a specified bit width.
7. The method of claim 6, wherein if the network layer is a convolutional layer, the network weight has a size of C × R × R × N, and the corresponding parameter values are the same for each scalar value in each three-dimensional tensor having a size of C × R × R;
if the network layer is a full connection layer, the size of the network weight is MxN, and the corresponding parameter values are the same for each scalar numerical value in each column vector with the size of 1 xN;
and the parameter values corresponding to the scalar numerical values in the first activation quantity are the same.
8. An apparatus for neural network model training, the apparatus comprising:
the acquisition module is used for acquiring a training sample;
a training module, configured to train a neural network model using the training samples, where the training module, when training the neural network model, respectively executes the following steps for each network layer in the neural network model:
acquiring a first activation quantity input into the network layer and a network weight of the network layer;
performing integer fixed point coding on the first activation quantity and the network weight, and coding the first activation quantity and the network weight into integer fixed point data with a specified bit width;
and calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight.
9. The apparatus according to claim 8, wherein the apparatus is applied to a camera; the training sample is a training sample with a specified target; the neural network model is a target detection model used for detecting the specified target;
the device further comprises:
the starting module is used for starting a target detection function;
the judging module is used for judging whether a model on-line training function is started or not;
the training module is specifically configured to:
and if the judgment result of the judgment module is that the model on-line training function is started, training the target detection model by using the training sample with the specified target.
10. The apparatus of claim 8, wherein the training module is specifically configured to:
inputting the training sample into a neural network model, and performing forward operation on the training sample according to the sequence of each network layer in the neural network model from front to back to obtain a forward operation result of the neural network model, wherein when performing the forward operation, for each network layer, performing integer fixed point coding on a first activation quantity input into the network layer and a network weight of the network layer, coding the first activation quantity and the network weight into integer fixed point data with a specified bit width, calculating a second activation quantity output by the network layer according to the coded first activation quantity and the network weight, and taking the second activation quantity as a first activation quantity input into a next network layer for calculation, until a second activation quantity output by a last network layer is determined as a forward operation result;
comparing the forward operation result with a preset nominal value to obtain a loss value;
inputting the loss value into the neural network model, and performing reverse operation on the loss value according to the sequence of each network layer in the neural network model from back to front to obtain a weight gradient of each network layer in the neural network model, wherein during the reverse operation, for each network layer, a first activation quantity gradient and a network weight of the network layer are respectively subjected to integer fixed point coding, the first activation quantity gradient and the network weight are coded into integer fixed point data with a specified bit width, a second activation quantity gradient and a weight gradient output by the network layer are calculated according to the coded first activation quantity, the first activation quantity gradient and the network weight, and the second activation quantity gradient is used as a first activation quantity gradient input into the next network layer for calculation, until the weight gradients of all network layers are calculated;
and adjusting the network weight of each network layer according to the weight gradient of each network layer.
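The forward/reverse flow of claim 10, reduced to a single toy full connection layer: quantize the input activation and weight, compute the output activation, then quantize the activation quantity gradient on the way back and form the weight gradient from the encoded operands. The squared-error loss, absence of bias, and all names are illustrative assumptions, not the patent's method.

```python
import numpy as np

def quantize(x, bits=8):
    """Shared-scale integer fixed point encoding (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(x))), 1e-12) / qmax
    return scale, np.clip(np.round(x / scale), -qmax - 1, qmax)

def train_step(x, w, target, bits=8):
    """One forward + reverse pass over a single fully connected layer,
    with activations, weights, and gradients integer fixed point encoded."""
    # forward: encode first activation and weight, compute second activation
    sx, qx = quantize(x, bits)
    sw, qw = quantize(w, bits)
    y = (sx * sw) * (qx @ qw)            # second activation quantity
    loss = 0.5 * np.sum((y - target) ** 2)
    # reverse: encode the activation quantity gradient, then propagate
    sg, qg = quantize(y - target, bits)  # dL/dy, encoded
    g_w = (sx * sg) * np.outer(qx, qg)   # weight gradient for this layer
    g_x = (sw * sg) * (qw @ qg)          # gradient handed to the previous layer
    return loss, g_w, g_x
```

With `x` of shape `(in,)` and `w` of shape `(in, out)`, `g_w` matches `w`'s shape and `g_x` matches `x`'s, so layers chain back to front exactly as the claim describes.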
11. The apparatus according to claim 10, wherein the training module, when configured to adjust the network weights of the network layers according to the weight gradients of the network layers, is specifically configured to:
performing integer fixed point coding on the weight gradient of each network layer, and coding the weight gradient of each network layer into integer fixed point data with a specified bit width;
and calculating the adjusted network weight of each network layer by using a preset optimization algorithm according to the encoded weight gradient of each network layer and the encoded network weight of each network layer.
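Claim 11's update — encode the weight gradient, then let a preset optimization algorithm consume the encoded values — could look like this, with SGD-plus-momentum standing in for the unspecified optimizer; everything here is an illustrative assumption.

```python
import numpy as np

def quantize(g, bits=8):
    """Shared-scale integer fixed point encoding of a gradient (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(g))), 1e-12) / qmax
    return scale, np.clip(np.round(g / scale), -qmax - 1, qmax)

def adjust_weights(w, grad, velocity, lr=0.01, momentum=0.9, bits=8):
    """Adjust one network layer's weight from its encoded weight gradient."""
    scale, q = quantize(grad, bits)               # encoded weight gradient
    velocity = momentum * velocity - lr * (scale * q)
    return w + velocity, velocity
```

The optimizer only ever sees `scale * q`, so the update itself can be carried out on the compact encoded form of the gradient.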
12. The apparatus of claim 8, wherein the training module is further configured to:
and performing integer fixed point encoding on the second activation quantity, and encoding the second activation quantity into integer fixed point data with a specified bit width.
13. The apparatus according to claim 8, wherein the training module, when configured to perform integer fixed point coding on the first activation quantity and the network weight, and encode the first activation quantity and the network weight as integer fixed point data having a specified bit width, is specifically configured to:
and respectively coding each scalar numerical value in the first activation quantity and the network weight as a product of a parameter value representing a global dynamic range and an integer fixed point value of a specified bit width.
14. The apparatus of claim 13, wherein if the network layer is a convolutional layer, the network weight has a size of C x R x R x N, and the corresponding parameter values are the same for each scalar value in each three-dimensional tensor of size C x R x R;
if the network layer is a full connection layer, the size of the network weight is M x N, and the corresponding parameter values are the same for each scalar numerical value in each column vector with the size of 1 x N;
and the parameter values corresponding to the scalar numerical values in the first activation quantity are the same.
CN201910808066.6A 2019-08-29 2019-08-29 Neural network model training method and device Pending CN112446461A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910808066.6A CN112446461A (en) 2019-08-29 2019-08-29 Neural network model training method and device
PCT/CN2020/111912 WO2021037174A1 (en) 2019-08-29 2020-08-27 Neural network model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910808066.6A CN112446461A (en) 2019-08-29 2019-08-29 Neural network model training method and device

Publications (1)

Publication Number Publication Date
CN112446461A true CN112446461A (en) 2021-03-05

Family

ID=74685187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910808066.6A Pending CN112446461A (en) 2019-08-29 2019-08-29 Neural network model training method and device

Country Status (2)

Country Link
CN (1) CN112446461A (en)
WO (1) WO2021037174A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106575379A (en) * 2014-09-09 2017-04-19 英特尔公司 Improved fixed point integer implementations for neural networks
CN109902745A (en) * 2019-03-01 2019-06-18 Chengdu Kangqiao Electronics Co., Ltd. CNN-based low-precision training and 8-bit integer quantization inference method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934331B * 2016-04-29 2020-06-19 Cambricon Technologies Corporation Limited Apparatus and method for performing artificial neural network forward operations
CN110096968B (en) * 2019-04-10 2023-02-07 西安电子科技大学 Ultra-high-speed static gesture recognition method based on depth model optimization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU, SHUANG et al.: "Training and Inference with Integers in Deep Neural Networks", ICLR 2018, pages 1-14 *

Also Published As

Publication number Publication date
WO2021037174A1 (en) 2021-03-04

Similar Documents

Publication Publication Date Title
US10713818B1 (en) Image compression with recurrent neural networks
US20240119286A1 (en) Adaptive artificial neural network selection techniques
US20190130255A1 (en) Method and apparatus for generating fixed-point type neural network
CN106897254B (en) Network representation learning method
JP2019528502A (en) Method and apparatus for optimizing a model applicable to pattern recognition and terminal device
CN112149797B (en) Neural network structure optimization method and device and electronic equipment
KR20190034985A (en) Method and apparatus of artificial neural network quantization
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
CN111105017B (en) Neural network quantization method and device and electronic equipment
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN113132723B (en) Image compression method and device
CN111401550A (en) Neural network model quantification method and device and electronic equipment
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN110084250B (en) Image description method and system
CN110647974A (en) Network layer operation method and device in deep neural network
CN114698395A (en) Quantification method and device of neural network model, and data processing method and device
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
CN110874635A (en) Deep neural network model compression method and device
CN112561050B (en) Neural network model training method and device
CN112446461A (en) Neural network model training method and device
RU62314U1 (en) FORMAL NEURON
CN111091495A (en) High-resolution compressive sensing reconstruction method for laser image based on residual error network
CN111916049B (en) Voice synthesis method and device
CN115906941B (en) Neural network adaptive exit method, device, equipment and readable storage medium
CN116030537A (en) Three-dimensional human body posture estimation method based on multi-branch attention-seeking convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination