CN110751265A - Lightweight neural network construction method and system and electronic equipment

Info

Publication number
CN110751265A
CN110751265A (application CN201910904649.9A)
Authority
CN
China
Prior art keywords
convolution
tensor
neural network
decomposition
matrix
Prior art date
Legal status
Pending
Application number
CN201910904649.9A
Other languages
Chinese (zh)
Inventor
周阳
张涌
宁立
王书强
邬晶晶
姜元爽
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910904649.9A
Publication of CN110751265A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application relates to a lightweight neural network construction method and system and an electronic device. The method comprises the following steps: step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm; step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution; step c: constructing a lightweight neural network using the optimized depthwise separable convolution. Compressing the 1 × 1 convolution of the depthwise separable convolution with the tensor train decomposition algorithm greatly reduces the parameter count of the depthwise separable convolution while maintaining model performance. Quantizing the kernel matrix parameters after tensor decomposition from 32 bits to a low bit width with the weight quantization algorithm reduces the computation of the model and speeds up its forward inference, so the lightweight neural network constructed by the method can be better deployed on embedded devices with limited computing power and storage.

Description

Lightweight neural network construction method and system and electronic equipment
Technical Field
The application belongs to the technical field of deep neural networks, and particularly relates to a lightweight neural network construction method and system and an electronic device.
Background
Deep learning has achieved increasingly strong results in many fields such as image recognition, natural language processing, and speech recognition. To reach extremely high accuracy, researchers generally adopt deeper and more complex network structures, but this greatly increases the parameter count and computation of the neural network and raises the hardware requirements (processor, memory, compute card, and bandwidth), making it difficult to deploy large deep neural networks directly on embedded devices with limited computing power and storage while achieving usable speed. As artificial intelligence is applied across industries, the demand for deploying these large networks on embedded devices keeps growing, and how to compress and accelerate neural networks is an important issue that must be addressed for the industrialization of artificial intelligence.
Deploying a deep neural network on an embedded device first requires dealing with the device's limited storage space and computing capacity, so a very compact and efficient lightweight neural network structure must be designed. MobileNet [Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017] is currently the most representative lightweight neural network; it replaces the traditional convolution operation with the depthwise separable convolution (depth-wise separable convolution), which markedly reduces the computation of the convolution operation while preserving model performance. The depthwise separable convolution splits the conventional convolution operation into two steps: the first step is a depthwise convolution, in which each convolution kernel is convolved with only its corresponding feature map; the second step is a pointwise convolution with kernel size 1 × 1, i.e., a 1 × 1 convolution, which linearly combines the different channels of the feature map. The 1 × 1 convolution in the depthwise separable convolution can be regarded as mapping a set of feature maps through a fully connected matrix; most of the parameters come from this fully connected mapping matrix, which contains a large number of redundant parameters (the 1 × 1 convolution in MobileNet accounts for about 75% of the parameters and 95% of the computation).
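To make the source of these savings concrete, the short sketch below tallies the parameter counts of a standard convolution against a depthwise separable one (the channel sizes are made-up illustrations, not values from MobileNet or this application); the share held by the 1 × 1 pointwise matrix depends on the channel widths, and across MobileNet as a whole it is the roughly 75% cited above.

```python
# Illustrative parameter counts: standard convolution versus depthwise
# separable convolution, with a K x K kernel, M input channels and N
# output channels. All sizes here are assumptions for illustration.
K, M, N = 3, 512, 512

standard_params = K * K * M * N        # one K x K x M kernel per output channel
depthwise_params = K * K * M           # one K x K kernel per input channel
pointwise_params = 1 * 1 * M * N       # the 1 x 1 fully connected mapping matrix
separable_params = depthwise_params + pointwise_params

print(f"standard:  {standard_params:,}")   # 2,359,296
print(f"separable: {separable_params:,}")  # 266,752
print(f"1x1 share: {pointwise_params / separable_params:.1%}")  # ~98% at these widths
```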
Disclosure of Invention
The present application provides a lightweight neural network construction method and system and an electronic device, aiming to solve, at least to some extent, one of the technical problems described above in the prior art.
In order to solve the above problems, the present application provides the following technical solutions:
a lightweight neural network construction method comprises the following steps:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
The technical solution adopted in the embodiments of the application further includes: in step a, the convolution kernel parameter matrix is set to 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, so the total parameter count of the 1 × 1 convolution is MN; decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm specifically comprises:

step a1: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$;

step a2: performing tensor train decomposition on the tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$;

step a3: factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$, where $i_k \in \{1, \ldots, m_k\}$;

step a4: obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation, where $j_k \in \{1, \ldots, n_k\}$. The operation of the decomposed 1 × 1 convolution is expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
the technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the quantizing the matrix kernel parameters after tensor decomposition by using a weight quantization algorithm specifically includes:
step b 1: extracting the weight of the current layer, and calculating the values of a scaling coefficient S and a zero point Z;
step b 2: calculating a quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, wherein the q is r/S + Z;
step b 3: performing corresponding calculation on input data and the quantized weight parameters, and storing the obtained result in a uint32 form;
step b 4: adding the offset in the form of the uint32 and the result in step b3 and quantifying the result to the form of uint 8;
step b 5: the result in the form of the uint8 is input to the activation function, resulting in output data for the layer, the result being in the form of the uint 8.
The technical solution adopted in the embodiments of the application further includes: in step c, the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
Another technical solution adopted in the embodiments of the application is a lightweight neural network construction system, comprising:
a network decomposition module: for decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: for quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
a model construction module: for constructing a lightweight neural network using the optimized depthwise separable convolution.
The technical solution adopted in the embodiments of the application further includes: assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and the network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm as follows: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$; performing tensor train decomposition on the tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$; factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$; and obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
the technical scheme adopted by the embodiment of the application further comprises the following steps: the parameter quantization module performs quantization operation on the matrix kernel parameters after tensor decomposition by using a weight quantization algorithm, and specifically comprises the following steps: 1: extracting the weight of the current layer, and calculating the values of a scaling coefficient S and a zero point Z; 2: calculating a quantization value q corresponding to the actual value r through the scaling coefficient S and the zero point Z, wherein the q is r/S + Z; 3: performing corresponding calculation on input data and the quantized weight parameters, and storing the obtained result in a uint32 form; 4: adding the offset in the form of the uint32 and the result in 3 and quantizing the result to the form of uint 8; 5: the result in the form of the uint8 is input to the activation function, resulting in output data for the layer, the result being in the form of the uint 8.
The technical solution adopted in the embodiments of the application further includes: the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
Another technical solution adopted in the embodiments of the application is an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the lightweight neural network construction method described above:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
Compared with the prior art, the embodiments of the application have the following beneficial effects: the lightweight neural network construction method and system and the electronic device compress the 1 × 1 convolution of the depthwise separable convolution with a tensor train decomposition algorithm, greatly reducing the parameter count of the depthwise separable convolution while maintaining model performance. Quantizing the kernel matrix parameters after tensor decomposition from 32 bits to a low bit width with a weight quantization algorithm reduces the computation of the model and speeds up its forward inference. The lightweight neural network constructed by this method needs less storage space and less computing power, and can be better deployed on embedded devices with limited computing power and storage.
Drawings
FIG. 1 is a flow chart of a method of constructing a lightweight neural network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of tensor train decomposition;
FIG. 3 is a schematic diagram of the depthwise separable convolution after tensor train decomposition is introduced;
FIG. 4 is a schematic diagram of the operation process of the 1 × 1 convolution after quantization;
FIG. 5 is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a hardware device for the lightweight neural network construction method provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To overcome the defects of the prior art, the application provides a depthwise separable convolution based on tensor train decomposition and, combined with a weight quantization algorithm, builds a lightweight neural network targeted at embedded devices. First, to address the redundancy of the 1 × 1 convolution parameters in the depthwise separable convolution, a tensor train decomposition algorithm is used to decompose the fully connected mapping matrix of the 1 × 1 convolution, further reducing the parameter count of the lightweight neural network. Second, the matrix kernel parameters after tensor decomposition are quantized with a weight quantization method, accelerating the model's forward inference and reducing its size. Finally, a lightweight neural network for embedded devices is constructed with the optimized depthwise separable convolution.
Specifically, please refer to FIG. 1, which is a flowchart of a method for constructing a lightweight neural network according to an embodiment of the present application. The lightweight neural network construction method comprises the following steps:

step 100: compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm to reduce the parameter count of the lightweight network;

In step 100, the tensor train (Tensor-Train) decomposition algorithm is a tensor decomposition algorithm in which each element of a high-dimensional tensor can be expressed as a product of matrices (a Matrix Product State), that is:

$$A(i_1, i_2, \ldots, i_d) = G_1(i_1)\, G_2(i_2) \cdots G_d(i_d) \qquad (1)$$

In equation (1), $G_k(i_k)$ is a matrix of size $r_{k-1} \times r_k$, where the $r_k$ are the tensor train decomposition ranks (TT-ranks); to ensure that the result of the matrix chain multiplication is a scalar, $r_0 = r_d = 1$.
FIG. 2 is a schematic diagram of tensor train decomposition. Applying the tensor train decomposition algorithm to the 1 × 1 convolution effectively reduces the parameter count of the depthwise separable convolution while retaining good computational performance. In the embodiment of the present application, the principle of compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is a linear combination of the input feature maps that exchanges information between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, so the total parameter count of the 1 × 1 convolution is MN. This parameter matrix is a fully connected matrix containing a large amount of parameter redundancy; decomposing it by tensor train decomposition further reduces the model's parameter count. The depthwise separable convolution after introducing tensor train decomposition is shown in FIG. 3. The tensor train decomposition algorithm is implemented in the following concrete steps:

step 101: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$;

step 102: performing tensor train decomposition on the converted tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$;

step 103: factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$, where $i_k \in \{1, \ldots, m_k\}$;

step 104: obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation, where $j_k \in \{1, \ldots, n_k\}$. The operation of the decomposed 1 × 1 convolution is expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
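A minimal NumPy sketch of this decomposed 1 × 1 convolution is given below. It assumes d = 3 factors and illustrative mode sizes and TT-ranks (none of these values come from the application); each einsum contracts one input factor $i_k$ against core $G_k$ and emits the output factor $j_k$:

```python
import numpy as np

H, W = 8, 8
m, n = [4, 4, 4], [4, 4, 8]    # M = 64 input channels, N = 128 output channels
r = [1, 4, 4, 1]               # TT-ranks r_0..r_3 (illustrative)

# Core k maps input factor m_k to output factor n_k: shape (r_{k-1}, m_k, n_k, r_k).
cores = [np.random.randn(r[k], m[k], n[k], r[k + 1]) * 0.1 for k in range(3)]

x = np.random.randn(H, W, 4, 4, 4)   # input with factored channel axis (i_1,i_2,i_3)

y = x[..., None]                     # append the r_0 = 1 rank axis
# Each contraction consumes input factor i_k and emits output factor j_k,
# carrying the TT-rank axis through the chain.
y = np.einsum('hwxyza,axjb->hwyzjb', y, cores[0])   # i_1 -> j_1
y = np.einsum('hwyzjb,bykc->hwzjkc', y, cores[1])   # i_2 -> j_2
y = np.einsum('hwzjkc,czld->hwjkld', y, cores[2])   # i_3 -> j_3
y = y[..., 0].reshape(H, W, -1)      # drop the r_3 = 1 axis, merge j_1 j_2 j_3

print(y.shape)   # (8, 8, 128), the same output shape as a dense 1x1 convolution
```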
step 200: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
in step 200, although the parameter quantity is reduced by the tensor train decomposition algorithm, the parameter matrix is decomposed into a plurality of matrix cores, the number of calculation layers of the matrix is increased, and the required calculation quantity is not obviously reduced. Therefore, in the present application, the weight quantization algorithm is applied to the decomposed 1 × 1 convolution parameter matrix kernel, and the 32bit parameter is quantized to a low bit (in the embodiment of the present application, it is preferable to quantize the 32bit parameter to 8bit, and specifically, the quantization can be set according to actual operation), so that the calculation amount can be obviously reduced, the forward operation speed of the model is increased, the size of the model is reduced, and the storage space required by the model is compressed. Weight quantization algorithm [ Krishnaoorthi. quantization depth relational networks for effectiveness reference: Awhitepaperpper [ J ]. arXiv prediction arXiv:1806.08342,2018 ]. The method is a neural network forward acceleration technology which is widely used at present, the weight parameters are quantized from 32 bits to low bits, the operation amount of the neural network can be obviously reduced, almost no precision loss can be realized, and a schematic diagram of the operation process of the 1 × 1 convolution after quantization is shown in fig. 4.
Specifically, the weight quantization algorithm includes the following steps:

step 201: extracting the weights of the current layer and calculating the scaling coefficient S and the zero point Z;
step 202: calculating the quantized value q corresponding to the real value r from the scaling coefficient S and the zero point Z, where q = r/S + Z;
step 203: performing the corresponding computation between the input data and the quantized weight parameters, and storing the result in uint32 form;
step 204: adding the uint32-form bias to the result of step 203 and quantizing the sum to uint8 form in the same manner as steps 201 and 202;
step 205: feeding the result of step 204 into the activation function to obtain the layer's output data, in uint8 form.
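The following sketch illustrates steps 201 and 202 under common affine-quantization assumptions; the range handling and the helper name `quantize_uint8` are choices made here, not prescribed by the application, and steps 203 to 205 are only outlined in comments:

```python
import numpy as np

def quantize_uint8(w):
    """Map float32 weights to uint8 via q = round(r / S + Z) (steps 201-202)."""
    r_min, r_max = min(w.min(), 0.0), max(w.max(), 0.0)  # range must contain 0
    S = (r_max - r_min) / 255.0          # scaling coefficient (step 201)
    Z = int(round(-r_min / S))           # zero point: the uint8 code for r = 0
    q = np.clip(np.round(w / S + Z), 0, 255).astype(np.uint8)  # step 202
    return q, S, Z

w = np.random.randn(64, 128).astype(np.float32) * 0.05
q, S, Z = quantize_uint8(w)

# Dequantize to check the round trip: r is approximately S * (q - Z).
r_hat = S * (q.astype(np.float32) - Z)
print(float(np.abs(w - r_hat).max()))    # error bounded by roughly S / 2

# Steps 203-205 (outline): multiply the quantized inputs by q, accumulate
# the products in a 32-bit integer buffer, add the 32-bit bias, then
# requantize the sum to uint8 before applying the activation function.
```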
Step 300: constructing a lightweight neural network for the embedded device using the optimized depth separable convolution;
in step 300, the lightweight neural network of the embodiment of the present application mainly comprises a depth separable convolution based on tensor train decomposition, the size of the separation convolution kernels is 3 × 3 by replacing the conventional depth separable convolution with the depth separable convolution based on tensor train decomposition, the superposition of a plurality of small 3 × 3 convolution kernels has fewer parameters and better nonlinear representation compared with a larger convolution kernel, the activation function uses Relu6, and Batch Normalization is used. And the depth separable convolution parameters after the tensor train is decomposed are quantized from 32 bits to 8 bits by using a weight quantization algorithm, so that the model precision is maintained, and the size and the inference speed of the model are obviously improved compared with those of the MobileNet.
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in Table 1.

Table 1: Main architecture of the lightweight neural network
(The architecture table is provided as an image in the original publication.)
The parameter count is 3.19M before compression and 0.922M after compression, a substantial reduction in the parameters of the depthwise separable convolution.
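Where such savings come from can be sanity-checked for a single 1 × 1 layer: a dense M × N matrix costs MN parameters, while the TT format costs $\sum_k r_{k-1} m_k n_k r_k$. The factorization and ranks below are illustrative assumptions, not the configuration behind the 3.19M and 0.922M figures:

```python
# Dense 1x1 convolution versus its TT format, for one layer with
# M = N = 1024 channels. Factorization and ranks are assumptions.
m, n = [4, 8, 8, 4], [8, 8, 8, 2]   # M = 4*8*8*4 = 1024, N = 8*8*8*2 = 1024
r = [1, 8, 8, 8, 1]                 # TT-ranks r_0..r_4

full_params = 1024 * 1024                                         # MN = 1,048,576
tt_params = sum(r[k] * m[k] * n[k] * r[k + 1] for k in range(4))  # 8,512

print(full_params, tt_params, round(full_params / tt_params))    # ~123x fewer
```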
Please refer to FIG. 5, which is a schematic structural diagram of a lightweight neural network construction system according to an embodiment of the present application. The lightweight neural network construction system comprises a network decomposition module, a parameter quantization module, and a model construction module.
A network decomposition module: for compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm to reduce the parameter count of the lightweight network. The tensor train decomposition algorithm is a tensor decomposition algorithm in which each element of a high-dimensional tensor can be expressed as a product of matrices (a Matrix Product State), that is:

$$A(i_1, i_2, \ldots, i_d) = G_1(i_1)\, G_2(i_2) \cdots G_d(i_d) \qquad (1)$$

In equation (1), $G_k(i_k)$ is a matrix of size $r_{k-1} \times r_k$, where the $r_k$ are the tensor train decomposition ranks (TT-ranks); to ensure that the result of the matrix chain multiplication is a scalar, $r_0 = r_d = 1$.

Applying the tensor train decomposition algorithm to the 1 × 1 convolution effectively reduces the parameter count of the depthwise separable convolution while retaining good computational performance. In the embodiment of the present application, the principle of compressing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with the tensor train decomposition algorithm is as follows: the essence of the 1 × 1 convolution is a linear combination of the input feature maps that exchanges information between them. Let the convolution kernel parameter matrix be 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, so the total parameter count of the 1 × 1 convolution is MN. This parameter matrix is a fully connected matrix containing a large amount of parameter redundancy; decomposing it by tensor train decomposition further reduces the model's parameter count. The depthwise separable convolution after introducing tensor train decomposition is shown in FIG. 3.
The tensor train decomposition algorithm specifically comprises: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$; performing tensor train decomposition on the converted tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$; factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$; and obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation. The operation of the decomposed 1 × 1 convolution is expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
a parameter quantization module: the matrix kernel parameter after tensor decomposition is subjected to quantization operation by using a weight quantization algorithm to obtain an optimized depth separable convolution; although the parameter quantity is reduced by the tensor train decomposition algorithm, the parameter matrix is decomposed into a plurality of matrix cores, the number of operation layers of the matrix is increased, and the required calculation quantity is not obviously reduced. Therefore, the weight quantization algorithm is applied to the decomposed 1 × 1 convolution parameter matrix kernel, the 32bit parameter is quantized to a low bit, the calculation amount can be obviously reduced, the forward calculation speed of the model is accelerated, the size of the model is reduced, and the storage space required by the model is compressed. The weight quantization algorithm is a neural network forward acceleration technology which is widely used at present, the weight parameters are quantized from 32 bits to low bits, the operation amount of the neural network can be obviously reduced, and almost no precision loss can be realized.
Specifically, the weight quantization algorithm comprises:
1: extracting the weights of the current layer and calculating the scaling coefficient S and the zero point Z;
2: calculating the quantized value q corresponding to the real value r from the scaling coefficient S and the zero point Z, where q = r/S + Z;
3: performing the corresponding computation between the input data and the quantized weight parameters, and storing the result in uint32 form;
4: adding the uint32-form bias to the result of 3 and quantizing the sum to uint8 form in the same manner;
5: feeding the result of 4 into the activation function to obtain the layer's output data, in uint8 form.
A model construction module: for constructing a lightweight neural network for the embedded device using the optimized depthwise separable convolution. The lightweight neural network of the embodiment of the present application is composed mainly of the depthwise separable convolution based on tensor train decomposition, which replaces the conventional depthwise separable convolution. The depthwise convolution kernels are all of size 3 × 3: stacking several small 3 × 3 kernels gives fewer parameters and better nonlinear representation than a single larger kernel. The activation function is ReLU6, and Batch Normalization is used. The depthwise separable convolution parameters after tensor train decomposition are quantized from 32 bits to 8 bits with the weight quantization algorithm, which maintains model accuracy while markedly improving model size and inference speed compared with MobileNet.
For the ImageNet dataset, the main architecture of the lightweight neural network of the embodiments of the present application is shown in Table 1.

Table 1: Main architecture of the lightweight neural network
(The architecture table is provided as an image in the original publication.)
The parameter count is 3.19M before compression and 0.922M after compression, a substantial reduction in the parameters of the depthwise separable convolution.
Fig. 6 is a schematic structural diagram of a hardware device of a lightweight neural network construction method provided in an embodiment of the present application. As shown in fig. 6, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 6.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
The above product can execute the method provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to the method. For technical details not described in detail in this embodiment, refer to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
The lightweight neural network construction method and system and the electronic device of the embodiments of the present application compress the 1 × 1 convolution of the depthwise separable convolution with a tensor train decomposition algorithm, greatly reducing the parameter count of the depthwise separable convolution while maintaining model performance. Quantizing the kernel matrix parameters after tensor decomposition from 32 bits to a low bit width with a weight quantization algorithm reduces the computation of the model and speeds up its forward inference. The lightweight neural network constructed in this way needs less storage space and less computing power, and can be better deployed on embedded devices with limited computing power and storage.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A lightweight neural network construction method is characterized by comprising the following steps:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
2. The lightweight neural network construction method according to claim 1, wherein in step a, assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm specifically comprises:

step a1: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$;

step a2: performing tensor train decomposition on the tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$;

step a3: factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$, where $i_k \in \{1, \ldots, m_k\}$;

step a4: obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation, where $j_k \in \{1, \ldots, n_k\}$, the operation of the decomposed 1 × 1 convolution being expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
3. The lightweight neural network construction method according to claim 2, wherein in step b, quantizing the matrix kernel parameters after tensor decomposition with a weight quantization algorithm specifically comprises:
step b1: extracting the weights of the current layer and calculating the scaling coefficient S and the zero point Z;
step b2: calculating the quantized value q corresponding to the real value r from the scaling coefficient S and the zero point Z, where q = r/S + Z;
step b3: performing the corresponding computation between the input data and the quantized weight parameters, and storing the result in uint32 form;
step b4: adding the uint32-form bias to the result of step b3 and quantizing the sum to uint8 form;
step b5: feeding the uint8-form result of step b4 into the activation function to obtain the layer's output data, in uint8 form.
4. The lightweight neural network construction method according to any one of claims 1 to 3, wherein in step c, the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3, and the activation function is ReLU6.
5. A lightweight neural network construction system, comprising:
a network decomposition module: for decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
a parameter quantization module: for quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
a model construction module: for constructing a lightweight neural network using the optimized depthwise separable convolution.
6. The lightweight neural network construction system according to claim 5, wherein assuming the convolution kernel parameter matrix is 1 × 1 × M × N, where M is the number of input feature map channels and N is the number of output feature map channels, the total parameter count of the 1 × 1 convolution is MN, and the network decomposition module decomposes the 1 × 1 convolution parameter matrix in the depthwise separable convolution with a tensor train decomposition algorithm as follows: extracting the current layer's 1 × 1 convolution kernel parameter matrix and converting it into a tensor $\mathcal{A}$ of dimensions $(m_1 n_1, \ldots, m_d n_d)$, where $M = \prod_{k=1}^{d} m_k$ and $N = \prod_{k=1}^{d} n_k$; performing tensor train decomposition on the tensor $\mathcal{A}$ to obtain the kernel matrices $G_k[m_k, n_k]$; factoring the channel dimension M of the input feature map in the same way to obtain $\mathcal{X}(x, y, i_1, \ldots, i_d)$; and obtaining the output feature map $\mathcal{Y}(x, y, j_1, \ldots, j_d)$ after the tensor operation, the operation of the decomposed 1 × 1 convolution being expressed as:

$$\mathcal{Y}(x, y, j_1, \ldots, j_d) = \sum_{i_1, \ldots, i_d} \mathcal{X}(x, y, i_1, \ldots, i_d)\, G_1[i_1, j_1] \cdots G_d[i_d, j_d]$$
7. The lightweight neural network construction system according to claim 6, wherein the parameter quantization module quantizes the matrix kernel parameters after tensor decomposition with a weight quantization algorithm as follows: 1: extracting the weights of the current layer and calculating the scaling coefficient S and the zero point Z; 2: calculating the quantized value q corresponding to the real value r from the scaling coefficient S and the zero point Z, where q = r/S + Z; 3: performing the corresponding computation between the input data and the quantized weight parameters, and storing the result in uint32 form; 4: adding the uint32-form bias to the result of 3 and quantizing the sum to uint8 form; 5: feeding the uint8-form result into the activation function to obtain the layer's output data, in uint8 form.
8. The lightweight neural network construction system according to any one of claims 5 to 7, wherein the depthwise convolution kernels of the lightweight neural network are all of size 3 × 3 and the activation function is ReLU6.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the lightweight neural network construction method of any one of claims 1 to 4:
step a: decomposing the 1 × 1 convolution parameter matrix in the depthwise separable convolution using a tensor train decomposition algorithm;
step b: quantizing the matrix kernel parameters after tensor decomposition using a weight quantization algorithm to obtain an optimized depthwise separable convolution;
step c: constructing a lightweight neural network using the optimized depthwise separable convolution.
CN201910904649.9A 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment Pending CN110751265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904649.9A CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910904649.9A CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN110751265A (en) 2020-02-04

Family

ID=69276977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904649.9A Pending CN110751265A (en) 2019-09-24 2019-09-24 Lightweight neural network construction method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN110751265A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562231B2 (en) * 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
CN111291317A (en) * 2020-02-26 2020-06-16 上海海事大学 Approximate matrix convolution neural network binary greedy recursion method
CN111291317B (en) * 2020-02-26 2023-03-24 上海海事大学 Approximate matrix convolution neural network binary greedy recursion method
CN113470653A (en) * 2020-03-31 2021-10-01 华为技术有限公司 Voiceprint recognition method, electronic equipment and system
WO2022068623A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Model training method and related device
EP4241206A4 (en) * 2020-12-01 2024-01-03 Huawei Tech Co Ltd Device and method for implementing a tensor-train decomposition operation

Similar Documents

Publication Publication Date Title
CN110751265A (en) Lightweight neural network construction method and system and electronic equipment
CN107516129B (en) Dimension self-adaptive Tucker decomposition-based deep network compression method
Yu et al. On compressing deep models by low rank and sparse decomposition
WO2020233130A1 (en) Deep neural network compression method and related device
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN107395211B (en) Data processing method and device based on convolutional neural network model
CN110663048B (en) Execution method, execution device, learning method, learning device, and recording medium for deep neural network
CN106557812A (en) The compression of depth convolutional neural networks and speeding scheme based on dct transform
CN111382867A (en) Neural network compression method, data processing method and related device
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN115713109A (en) Multi-head attention model compression method for image classification
DE102017117381A1 (en) Accelerator for sparse folding neural networks
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN114978189A (en) Data coding method and related equipment
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
WO2023174256A1 (en) Data compression method and related device
CN112115837A (en) Target detection method based on YoloV3 and dual-threshold model compression
WO2023051335A1 (en) Data encoding method, data decoding method, and data processing apparatus
CN114677545B (en) Lightweight image classification method based on similarity pruning and efficient module
Brillet et al. Tunable cnn compression through dimensionality reduction
WO2023159820A1 (en) Image compression method, image decompression method, and apparatuses
CN114154626B (en) Filter pruning method for image classification task
CN115564043A (en) Image classification model pruning method and device, electronic equipment and storage medium
CN114154621A (en) Convolutional neural network image processing method and device based on FPGA
CN114595802A (en) Data compression-based impulse neural network acceleration method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200204)