CN111324860B - Lightweight CNN calculation method and device based on random matrix approximation - Google Patents

Lightweight CNN calculation method and device based on random matrix approximation

Info

Publication number
CN111324860B
CN111324860B
Authority
CN
China
Prior art keywords
matrix
weight
data
representation
column
Prior art date
Legal status
Active
Application number
CN202010086785.4A
Other languages
Chinese (zh)
Other versions
CN111324860A (en)
Inventor
李斌
陈沛鋆
刘宏福
赵成林
许方敏
Current Assignee
Wuxi Bupt Sensing Technology & Industry Academy Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Wuxi Bupt Sensing Technology & Industry Academy Co ltd
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Wuxi Bupt Sensing Technology & Industry Academy Co ltd and Beijing University of Posts and Telecommunications
Priority to CN202010086785.4A
Publication of CN111324860A
Application granted
Publication of CN111324860B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a lightweight CNN calculation method and device based on random matrix approximation, comprising the following steps: performing dimension reduction processing on the data samples to obtain a low-dimensional representation of the data samples; performing dimension reduction processing on the weight parameters of the model to obtain a low-dimensional weight representation of the weight parameters; and training the CNN model using the low-dimensional representation of the data samples and the low-dimensional weight representation of the weight parameters. By reducing the data volume of the data samples and of the network's weight parameters, and carrying out the CNN model operations with the dimension-reduced low-dimensional representation and low-dimensional weight representation, the invention reduces the complexity of the model operations and the storage and computing resources they require, so that the model operations can be realized on terminal devices with lower configurations.

Description

Lightweight CNN calculation method and device based on random matrix approximation
Technical Field
The invention relates to the technical field of machine learning, in particular to a lightweight CNN computing method and device based on random matrix approximation.
Background
Machine learning is a multidisciplinary field combining techniques from mathematics, computer science and related areas, and has been widely studied, popularized and applied in recent years. Various machine learning models perform well in fields such as images, video, semantic analysis, machine translation and sequence processing. However, as machine learning technology develops, model structures become increasingly complex, the computational complexity of training and inference grows ever higher, and the storage and computing demands on hardware rise accordingly, so that models can often only be deployed in servers and their functions cannot be realized on low-configuration terminal devices.
Disclosure of Invention
Therefore, the invention aims to provide a lightweight CNN calculation method and device based on random matrix approximation, so as to solve the problem of excessively high hardware requirements caused by model complexity.
Based on the above object, the present invention provides a lightweight CNN calculation method based on random matrix approximation, including:
performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
performing dimension reduction processing on the weight parameters of the model to obtain a low-dimensional weight characterization of the weight parameters;
and training a CNN model by using the low-dimensional characterization and the low-dimensional weight characterization.
Optionally, performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample includes:
converting the tensor of data samples into a data matrix;
extracting a row data representation matrix and a column data representation matrix from the data matrix;
and calculating to obtain a core data representation matrix according to the row data representation matrix and the column data representation matrix.
Optionally, performing dimension reduction processing on the weight parameters of the model to obtain low-dimensional weight characterization of the weight parameters, including:
initializing a weight parameter tensor;
converting the weight parameter tensor into a weight matrix;
extracting a row weight representation matrix and a column weight representation matrix from the weight matrix;
and calculating to obtain a core weight representation matrix according to the weight matrix, the row weight representation matrix and the column weight representation matrix.
Optionally, extracting the row weight representation matrix and the column weight representation matrix from the weight matrix includes:
constructing a row weight sampling matrix with only one element with a value of 1 in each column, and calculating the row weight sampling matrix and the weight matrix to obtain the row weight characterization matrix;
constructing a column weight sampling matrix with only one element with a value of 1 in each column, and calculating the column weight sampling matrix and the weight matrix to obtain the column weight characterization matrix.
Optionally, the method further comprises:
and performing forward reasoning calculation by using the low-dimensional characterization and the low-dimensional weight characterization.
Optionally, the forward reasoning calculation includes a training mode for training a model, and in the training mode, the row weight representation matrix, the column weight representation matrix and the core weight representation matrix are stored, and the model is trained according to the low-dimensional representation, the row weight representation matrix, the column weight representation matrix and the core weight representation matrix.
Optionally, the forward reasoning calculation includes an application mode for prediction with the model. In the application mode, the column weight representation matrix and a product matrix are stored, the product matrix being obtained by multiplying the core weight representation matrix and the row weight representation matrix, and model prediction operation is performed according to the low-dimensional representation, the column weight representation matrix and the product matrix.
Optionally, the method further comprises:
and performing backward propagation calculation by using the low-dimensional characterization and the low-dimensional weight characterization.
Optionally, the method further comprises:
updating model parameters by using the low-dimensional characterization and the low-dimensional weight characterization.
The embodiment of the invention also provides a lightweight CNN computing device based on random matrix approximation, which comprises:
the sample processing module is used for performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
the parameter processing module is used for carrying out dimension reduction processing on the weight parameters of the model to obtain low-dimensional weight characterization of the weight parameters;
and the model training module is used for carrying out CNN model training by utilizing the low-dimensional representation and the low-dimensional weight representation.
From the above, it can be seen that the lightweight CNN calculation method and apparatus based on random matrix approximation provided by the invention obtain the low-dimensional representation by performing dimension reduction processing on the data samples, obtain the low-dimensional weight representation by performing dimension reduction processing on the weight parameters of the model, and perform CNN model training using the low-dimensional representation and the low-dimensional weight representation. This reduces the complexity of model operation and the storage and calculation resources it requires, so that model operation can be realized on terminal equipment with lower configuration.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the invention;
FIG. 2 is a flow chart of a method for computing a low-dimensional representation in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of a method for calculating a low-dimensional weight characterization according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating data sample conversion according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating weight parameter conversion according to an embodiment of the present invention;
FIG. 6 is a block diagram of an apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present invention should be given the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first", "second" and the like used in this disclosure do not denote any order, quantity or importance, but are merely used to distinguish one element from another. The word "comprising", "comprises" or the like means that the elements or items preceding the word encompass the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right" and so on are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
In some implementations, as machine learning technology develops, machine learning models with specific prediction functions become structurally more complex in practical applications and contain a large number of redundant structures, and their training and inference consume considerable storage and computing resources. To realize the prediction functions of such models, they are therefore deployed on servers with higher hardware configurations, such as central or cloud servers, which increases the burden on those servers. Alternatively, the server trains the model and the trained model is deployed on lower-configuration edge devices to perform inference. However, model training then requires uploading data to the server, which on the one hand carries a risk of data leakage and is unsuitable for application scenarios with high data-security requirements, and on the other hand forces the edge devices and the server to exchange data frequently whenever the mapping relationship or task pattern changes, which cannot satisfy scenarios with limited network resources and low-latency requirements.
In order to solve the above problems, embodiments of the present invention provide a lightweight CNN computing method and apparatus based on random matrix approximation. By reducing the data volume of the input data samples and the data volume of the network parameters, the complexity of model training and reasoning operations is reduced, and so are the requirements on hardware configuration, so that the training and reasoning operations of a model can be implemented on hardware devices with lower configurations.
FIG. 1 is a flow chart of a method according to an embodiment of the invention. As shown in the figure, the lightweight CNN calculation method based on random matrix approximation provided by the embodiment of the invention includes:
s101: performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
s102: performing dimension reduction processing on the weight parameters of the model to obtain a low-dimensional weight characterization of the weight parameters;
s103: and performing CNN model training by using the low-dimensional characterization and the low-dimensional weight characterization.
In this embodiment, it is noted that some network model compression methods simplify the model structure, for example by zeroing out smaller elements of the convolution kernels or concentrating elements of larger absolute value along specific dimensions, and then repeatedly retraining and compressing the kernels until a convergence condition is met. These simplification methods do not consider that the data samples used to train the model also contain a large amount of redundant data, which makes the model training process long and the computation very complex. The lightweight CNN calculation method reduces the data volume of the data samples and the data volume of the network's weight parameters, and trains the CNN model using the reduced low-dimensional representation and low-dimensional weight representation. This reduces the complexity of training, reduces the storage and computing resources required by training, and allows training to be carried out on terminal equipment with lower configuration.
FIG. 2 is a flow chart of a method for computing a low-dimensional representation in accordance with an embodiment of the present invention. As shown in the figure, in some embodiments, in step S101, the dimension reduction processing is performed on the data sample to obtain a low-dimensional representation of the data sample, which includes:
s201: converting the tensor of data samples into a data matrix;
s202: extracting a row data representation matrix and a column data representation matrix from the data matrix;
s203: calculating a core data representation matrix according to the row data representation matrix and the column data representation matrix.
The row data characterization matrix, the column data characterization matrix and the core data characterization matrix are then used as the low-dimensional characterization for training the model, and CNN model training is performed.
In this embodiment, the original data sample is a high-dimensional data sample tensor. In order to reduce the data volume and complexity of the data sample tensor, it is converted into a data matrix, a row data characterization matrix and a column data characterization matrix are extracted from the data matrix, and a core data characterization matrix is calculated from the row and column data characterization matrices. The simplified row, column and core data characterization matrices are used as input data to train the model, so model training is simplified by optimizing and simplifying the data samples. Drawing on the idea of high-dimensional matrix approximation, this embodiment extracts three low-dimensional data characterization matrices from the high-dimensional data samples through random sampling and approximate calculation, which effectively reduces computational and space complexity without degrading the generalization performance of the model.
FIG. 3 is a flow chart of a method for calculating a low-dimensional weight characterization according to an embodiment of the present invention. As shown in the figure, in some embodiments, in step S102, the dimension reduction processing is performed on the weight parameters of the model to obtain a low-dimensional weight representation, which includes:
s301: initializing a weight parameter tensor;
s302: converting the weight parameter tensor into a weight matrix;
s303: extracting a row weight representation matrix and a column weight representation matrix from the weight matrix;
s304: and calculating according to the weight matrix, the row weight characterization matrix and the column weight characterization matrix to obtain a core weight characterization matrix.
Then, the row weight characterization matrix, the column weight characterization matrix and the core weight characterization matrix are used as the low-dimensional weight characterization for training the model, and CNN model training is performed.
In this embodiment, the weight parameters of the convolution kernels form a high-dimensional weight parameter tensor. In order to reduce the complexity of the weight parameter tensor, it is converted into a high-dimensional weight matrix, a row weight characterization matrix and a column weight characterization matrix are extracted from the weight matrix, and a core weight characterization matrix is calculated from the weight matrix and the row and column weight characterization matrices. Using the simplified row, column and core weight characterization matrices as the low-dimensional weight characterization for training effectively reduces the amount of computation of the training model.
The lightweight CNN calculation method of the present invention is described in detail below with reference to specific embodiments.
1. Simplifying data samples of an input model
As shown in Figs. 4 and 5, let the data sample tensor of the convolutional layer of the convolutional neural network (Convolutional Neural Network, CNN) be $X \in \mathbb{R}^{m \times h \times w \times c}$, where $m$ is the number of data samples, $h$ and $w$ are the height and width of the data sample tensor, respectively, $c$ is the number of channels of the data sample tensor, $k$ and $l$ are the height and width of the convolution kernel, respectively, and $n$ is the number of convolution kernels.
The data sample tensor $X$ is expanded into a two-dimensional matrix $X_m$, whose size depends on the height and width of the convolution kernel, the scan stride, and the edge padding mode. Taking convolution kernel height $k$, width $l$, scan stride 1 and no edge padding as an example, expanding $X$ yields the two-dimensional matrix $X_m \in \mathbb{R}^{b \times a}$, where $b = m(h-k+1)(w-l+1)$ and $a = klc$:
$$X_m = \mathrm{unfold}(X, \mathrm{height}=k, \mathrm{width}=l, \mathrm{strides}=1, \mathrm{padding}=0) \tag{1}$$
where unfold is the expansion function, and the parameters height, width, strides and padding denote the height and width of the convolution kernel, the scan stride and the edge padding mode, respectively.
After obtaining the two-dimensional matrix $X_m$, the row data characterization matrix and the column data characterization matrix are obtained by sampling from $X_m$. In some embodiments, a suitable sampling scheme is selected to extract these matrices from $X_m$. Optionally, the sampling method is: in the matrix sampled from $X_m$, compute the sum of squares of the elements in each column and use it as that column's sampling probability, so that model performance is hardly affected after training.
For the row data characterization matrix, let the number of sampled rows be $t$ ($t \ll b$) and construct a row data sampling matrix $S_r \in \mathbb{R}^{b \times t}$ in which each column contains exactly one element with value 1 and all other elements are 0. For the column data characterization matrix, let the number of sampled columns be $s$ ($s \ll a$) and construct a column data sampling matrix $S_c \in \mathbb{R}^{a \times s}$ in which each column likewise contains exactly one element with value 1 and all other elements are 0:
$$S_r, S_c = \mathrm{GetIndex}(X_m, s, t) \tag{2}$$
where GetIndex denotes the method used to compute the row and column data sampling matrices, for example uniform sampling, or weighted sampling according to the absolute value or square of the elements.
Then the row data sampling matrix $S_r$ is used to sample the two-dimensional matrix $X_m$, yielding the row data characterization matrix $X_r \in \mathbb{R}^{t \times a}$:
$$X_r = S_r^T X_m \tag{3}$$
Similarly, the column data sampling matrix $S_c$ is used to sample $X_m$, yielding the column data characterization matrix $X_c \in \mathbb{R}^{b \times s}$:
$$X_c = X_m S_c \tag{4}$$
Thereafter, the core data characterization matrix $X_u \in \mathbb{R}^{s \times t}$ is constructed from the row data characterization matrix $X_r$ and the column data characterization matrix $X_c$. In some schemes, the overlapping part $V$ of $X_r$ and $X_c$ may be selected,
$$V = S_r^T X_m S_c \tag{5}$$
and the core data characterization matrix computed as its Moore-Penrose pseudo-inverse,
$$X_u = V^{\dagger} \tag{6}$$
so that $X_m \approx X_c X_u X_r$. It should be noted that constructing the core data characterization matrix from $X_r$ and $X_c$ is not limited to the above manner; a portion of the row data characterization matrix and a portion of the column data characterization matrix may be selected, or a specific algorithm may operate on the elements of the row and column data characterization matrices to obtain the core data characterization matrix.
In this embodiment, because the original data samples are very complex, they are simplified: the row and column data characterization matrices are extracted, the core data characterization matrix is calculated from them, and these three matrices serve as the input data of the training model. This greatly reduces the amount of computation of model training, and hence the storage and computing resources it requires, so that the functions of the training model can be realized even on hardware equipment with lower configuration.
2. Simplifying weight parameters
As shown in Figs. 4 and 5, let the data sample tensor of the convolution layer be $X \in \mathbb{R}^{m \times h \times w \times c}$ and the weight parameter tensor be $W \in \mathbb{R}^{k \times l \times c \times n}$, where $m$ is the number of input samples, $h$ and $w$ are the height and width of the input sample tensor, respectively, $c$ is the number of channels of the data sample tensor, $k$ and $l$ are the height and width of the convolution kernel, respectively, and $n$ is the number of convolution kernels.
In this embodiment, during initialization, the weight parameter tensor $W$ is first obtained by the Xavier random initialization method, so that each element $W_{i,x,y,z}$ of $W$ obeys the uniform distribution:
$$W_{i,x,y,z} \sim U\!\left(-\sqrt{\frac{6}{a+n}},\ \sqrt{\frac{6}{a+n}}\right) \tag{7}$$
The weight parameter tensor $W$ is then flattened along its first three dimensions in row-major order, yielding the high-dimensional weight matrix $W_m \in \mathbb{R}^{a \times n}$, with $a = klc$.
A row weight sampling matrix and a column weight sampling matrix are constructed, and low-dimensional weight matrices are extracted from the high-dimensional weight matrix by sampling with them. Specifically:
Determine the number of sampled rows $p$ ($p \ll a$) and the number of sampled columns $q$ ($q \ll n$), and compute the row weight sampling matrix $T_r \in \mathbb{R}^{a \times p}$ and the column weight sampling matrix $T_c \in \mathbb{R}^{n \times q}$ on $W_m$ using the maximum volume method:
$$T_r, T_c = \mathrm{MaxVol}(W_m, p, q) \tag{8}$$
where MaxVol denotes the computation of the maximum volume method.
The row weight sampling matrix $T_r$ is constructed so that each of its columns contains exactly one element with value 1 and all other elements are 0; the row weight characterization matrix $W_r \in \mathbb{R}^{p \times n}$ is extracted from the weight matrix $W_m$ as:
$$W_r = T_r^T W_m \tag{9}$$
The column weight sampling matrix $T_c$ is constructed likewise, with exactly one element of value 1 per column and all other elements 0; the column weight characterization matrix $W_c \in \mathbb{R}^{a \times q}$ is extracted from $W_m$ as:
$$W_c = W_m T_c \tag{10}$$
The core weight characterization matrix $W_u \in \mathbb{R}^{q \times p}$ is computed from the weight matrix $W_m$, the row weight characterization matrix $W_r$ and the column weight characterization matrix $W_c$ by solving for the $W_u$ that minimizes the approximation error $\|W_m - W_c W_u W_r\|_F$; its closed-form solution is:
$$W_u = W_c^{\dagger} W_m W_r^{\dagger} \tag{11}$$
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudo-inverse.
in the embodiment of the invention, forward reasoning calculation is performed by utilizing the low-dimensional representation of the data sample and the low-dimensional weight representation of the weight parameter. In some embodiments, forward reasoning computation is divided into training patterns for training models and application patterns for prediction with models.
In training mode, the three reduced low-dimensional weight characterization matrices are stored: the row weight characterization matrix $W_r$, the column weight characterization matrix $W_c$ and the core weight characterization matrix $W_u$. The simplified data characterization matrices $X_r$, $X_c$ and $X_u$ are multiplied with the simplified weight characterization matrices $W_r$, $W_c$ and $W_u$, a bias matrix $b$ is added, and the activation function $f$ is applied, yielding the output result $X_m' \in \mathbb{R}^{b \times n}$:
$$X_m' = f(X_c (X_u (X_r W_c) W_u W_r) + b) \tag{13}$$
Computing the matrices $W_r$, $W_c$, $W_u$, $X_r$, $X_c$, $X_u$ in the multiplication order of Eq. (13) reduces the amount of computation to the greatest extent; the computational complexity is about $O(atq + bns)$, greatly reduced compared with the complexity $O(abn)$ of conventional CNN forward reasoning calculation.
In application mode, only two low-dimensional weight characterization matrices are stored: the column weight characterization matrix $W_c$ and the product matrix $W_t \in \mathbb{R}^{q \times n}$, which is the product of the core weight characterization matrix $W_u$ and the row weight characterization matrix $W_r$:
$$W_t = W_u W_r \tag{14}$$
In application mode, the simplified data characterization matrices $X_r$, $X_c$ and $X_u$ are multiplied with the simplified column weight characterization matrix $W_c$ and the product matrix $W_t$, the bias matrix is added, and the activation function $f$ is applied, yielding the output result:
$$X_m' = f(X_c (X_u (X_r W_c) W_t) + b) \tag{15}$$
After the output result $X_m'$ is obtained, it is converted into the output result tensor $X' \in \mathbb{R}^{m \times (h-k+1) \times (w-l+1) \times n}$ (for the stride-1, no-padding example above), which is returned as the final output.
Computing the matrices $W_c$, $W_t$, $X_r$, $X_c$, $X_u$ in the multiplication order of Eq. (15) reduces the amount of computation to the greatest extent; the computational complexity is about $O(atq + bns)$, greatly reduced compared with the complexity $O(abn)$ of conventional CNN forward reasoning calculation.
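The two forward-inference modes can be sketched directly from Eqs. (13)-(15); keeping the parentheses as written is what ensures the small intermediate products are formed first. Here `f` stands for any activation function:

```python
def forward_train(X_r, X_c, X_u, W_r, W_c, W_u, bias, f):
    """Training-mode forward pass, Eq. (13)."""
    return f(X_c @ (X_u @ (X_r @ W_c) @ W_u @ W_r) + bias)

def forward_apply(X_r, X_c, X_u, W_c, W_t, bias, f):
    """Application-mode forward pass, Eq. (15), with the product
    matrix W_t = W_u @ W_r precomputed once after training (Eq. (14))."""
    return f(X_c @ (X_u @ (X_r @ W_c) @ W_t) + bias)
```

For example, passing `f=lambda z: np.maximum(z, 0.0)` evaluates the layer with a ReLU activation.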
In the embodiment of the invention, backward propagation calculation is performed using the low-dimensional characterization of the simplified data samples and the low-dimensional weight characterization of the simplified weight parameters. In some embodiments, the backward error tensor of the current layer is converted into an error matrix; a row error matrix, a column error matrix and a core error matrix are computed from the row, column and core data characterization matrices, the row, column and core weight characterization matrices, and the error matrix; and the backward error tensor passed to the next layer is obtained from the row, column and core error matrices.
Let the error tensor received by the current layer be $\delta \in \mathbb{R}^{m \times (h-k+1) \times (w-l+1) \times n}$. First, $\delta$ is converted into the two-dimensional error matrix $\delta_m \in \mathbb{R}^{b \times n}$. Then, from the weight characterization matrices $W_r$, $W_c$, $W_u$ and the data characterization matrices $X_r$, $X_c$, $X_u$, the column error matrix $\delta_c'$, the row error matrix $\delta_r'$ and the core error matrix $\delta_u'$ are computed, following the sampling and pseudo-inverse operations of the above derivation and applying the chain rule (Eq. (16)), where $D$ is an intermediate variable and $I$ is an identity matrix.
In evaluating Eq. (16), right-multiplication by the transpose $S_c^T$ of the column data sampling matrix requires no matrix multiplication: since each column of $S_c$ contains exactly one element with value 1 and all other elements are 0, right-multiplying by $S_c^T$ is the inverse of column sampling, which amounts to placing the columns of the multiplied matrix back into an all-zero matrix according to the sampled indices. Similarly, left-multiplication by the row data sampling matrix $S_r$ is the inverse of row sampling, and only requires placing the rows of the multiplied matrix back into an all-zero matrix according to the sampled indices. Compared with matrix multiplication, this calculation has low complexity and greatly reduces the amount of computation.
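The index-scatter trick just described can be written as plain assignments; a sketch, where `rows` and `cols` are the sampled indices recorded when $S_r$ and $S_c$ were built:

```python
def scatter_rows(A, rows, b):
    """Equivalent of S_r @ A: place the t rows of A back into an
    all-zero matrix with b rows at the sampled row indices."""
    out = np.zeros((b, A.shape[1]))
    out[rows, :] = A
    return out

def scatter_cols(A, cols, a):
    """Equivalent of A @ S_c.T: place the s columns of A back into an
    all-zero matrix with a columns at the sampled column indices."""
    out = np.zeros((A.shape[0], a))
    out[:, cols] = A
    return out
```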
From the row error matrix, the column error matrix and the core error matrix, the total back-propagated error matrix is computed as:
$$\delta_m' = \delta_c' + \delta_u' + \delta_r' \tag{17}$$
Then the total error matrix $\delta_m'$ is converted into the error tensor $\delta' \in \mathbb{R}^{m \times h \times w \times c}$, which is passed to the next layer:
$$\delta' = \mathrm{fold}(\delta_m', \mathrm{height}=k, \mathrm{width}=l, \mathrm{strides}=1, \mathrm{padding}=0) \tag{18}$$
where fold is the folding function (the inverse of unfold), whose parameters have the same meaning as those of the unfold function.
In the embodiment of the invention, the model parameters are updated using the simplified low-dimensional characterization and low-dimensional weight characterization. In some embodiments, the weight gradients are computed using the low-dimensional characterization and the low-dimensional weight characterization, and the model parameters are updated according to these gradients. By the chain rule, the column weight gradient $\Delta W_c$ and the row weight gradient $\Delta W_r$ are obtained as:
$$\Delta W_c = X_r^T (X_u^T ((X_c^T \delta_m) W_r^T W_u^T)) \tag{19}$$
$$\Delta W_r = (W_u^T (W_c^T X_r^T) X_u^T)(X_c^T \delta_m) \tag{20}$$
Computing the matrices in the multiplication order of Eqs. (19) and (20) reduces the amount of computation to the greatest extent; the computational complexity is about $O(atq + bns)$, greatly reduced compared with the complexity $O(abn)$ of conventional CNN parameter updates.
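A sketch of the gradient computations of Eqs. (19) and (20), sharing the small intermediate $X_c^T \delta_m$:

```python
def weight_gradients(X_r, X_c, X_u, W_r, W_c, W_u, delta_m):
    """Column and row weight gradients, Eqs. (19)-(20)."""
    G = X_c.T @ delta_m                              # (s, n), shared term
    dW_c = X_r.T @ (X_u.T @ (G @ W_r.T @ W_u.T))     # (a, q), Eq. (19)
    dW_r = (W_u.T @ (W_c.T @ X_r.T) @ X_u.T) @ G     # (p, n), Eq. (20)
    return dW_c, dW_r
```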
In this embodiment, in order to maintain the coupling relationship among the row weight characterization matrix, the column weight characterization matrix and the core weight characterization matrix, so that model training converges more quickly, all three weight characterization matrices need to be updated. Specifically: first, according to the computed row weight gradient $\Delta W_r$ and column weight gradient $\Delta W_c$, the row and column weight characterization matrices are updated, yielding the once-updated row weight characterization matrix $W_r'$ and column weight characterization matrix $W_c'$:
$$W_r' = W_r + lr \cdot \Delta W_r \tag{21}$$
$$W_c' = W_c + lr \cdot \Delta W_c \tag{22}$$
where $lr$ is the learning rate.
Then, the average matrix $M \in \mathbb{R}^{p \times q}$ of the overlapping parts of the once-updated $W_r'$ and $W_c'$ is computed:
$$M = (T_r^T W_c' + W_r' T_c) / 2 \tag{23}$$
The average matrix $M$ replaces the overlapping parts of the once-updated row and column weight characterization matrices, yielding the twice-updated row and column weight characterization matrices. To satisfy the dimension requirement of matrix addition, a function $h$ is defined that pads zero elements around a matrix so that its dimensions match those required for the addition.
According to the once-updated column weight characterization matrix $W_c'$, the average matrix $M$ and the row weight sampling matrix $T_r$, the column weight characterization matrix is updated a second time, yielding $W_c''$:
$$W_c'' = W_c' + h(M - T_r^T W_c') \tag{24}$$
Similarly, according to the once-updated row weight characterization matrix $W_r'$, the average matrix $M$ and the column weight sampling matrix $T_c$, the row weight characterization matrix is updated a second time, yielding $W_r''$:
$$W_r'' = W_r' + h(M - W_r' T_c) \tag{25}$$
Finally, the core weight characterization matrix is updated according to the average matrix $M$, yielding the updated core weight characterization matrix $W_u'$.
The update of the weight characterization matrices uses the sampling processes of the row weight sampling matrix $T_r$ and the column weight sampling matrix $T_c$, which can be realized directly by index extraction; its computational complexity is $O(p^3 + q^3)$, which is negligible compared with the forward reasoning calculation and the model parameter update calculation.
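A sketch of this coupled update, Eqs. (21)-(25): the zero-padding function $h$ reduces to index assignment, and since the patent's formula for the core update is not reproduced above, computing $W_u'$ as the pseudo-inverse of $M$ (mirroring the pseudo-inverse construction of $X_u$) is our assumption:

```python
def update_weight_characterizations(W_r, W_c, dW_r, dW_c, T_r, T_c, lr):
    """Two-stage coupled update of the weight characterization matrices."""
    W_r1 = W_r + lr * dW_r                    # Eq. (21)
    W_c1 = W_c + lr * dW_c                    # Eq. (22)
    M = (T_r.T @ W_c1 + W_r1 @ T_c) / 2.0     # overlap average, Eq. (23)
    rows = T_r.argmax(axis=0)                 # sampled row indices
    cols = T_c.argmax(axis=0)                 # sampled column indices
    W_c2 = W_c1.copy(); W_c2[rows, :] = M     # h(...) as assignment, Eq. (24)
    W_r2 = W_r1.copy(); W_r2[:, cols] = M     # Eq. (25)
    W_u2 = np.linalg.pinv(M)                  # assumed core update
    return W_r2, W_c2, W_u2
```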
According to the lightweight CNN calculation method of the embodiment of the invention, by reducing the data volume of the data samples fed to the model and the data volume of the model's weight parameters, the amount and complexity of the computations involved in CNN model training, forward reasoning and parameter updating are greatly reduced, and so are the computing resources required. Moreover, it is not necessary to store all data samples and all weight parameters; only the low-dimensional characterization and the low-dimensional weight characterization need to be stored, which greatly reduces the space complexity and the storage resources required. With the lightweight CNN calculation method of the embodiment of the invention, the model can be deployed both on high-configuration servers and on low-configuration edge devices; the server and/or the edge device can train the model, use it to perform model calculations such as prediction on input data, and update it. This satisfies the adaptability, real-time and data-confidentiality requirements of various application scenarios, broadens the application scenarios of the model, and improves resource utilization.
It should be noted that, the method of the embodiment of the present invention may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present invention, the devices interacting with each other to accomplish the method.
Fig. 6 is a block diagram of a device according to an embodiment of the present invention. As shown in the figure, the lightweight CNN computing device based on random matrix approximation provided by the embodiment of the present invention includes:
the sample processing module is used for performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
the parameter processing module is used for carrying out dimension reduction processing on the weight parameters of the model to obtain low-dimensional weight characterization of the weight parameters;
and the model training module is used for performing CNN model training by utilizing the low-dimensional representation and the low-dimensional weight representation.
The device of the foregoing embodiment is configured to implement the corresponding method of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device according to the embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims (5)

1. The lightweight CNN calculation method based on random matrix approximation is characterized by comprising the following steps:
performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample, wherein the dimension reduction processing comprises: converting the tensor of data samples into a data matrix; extracting a row data representation matrix and a column data representation matrix from the data matrix; and calculating a core data representation matrix according to the row data representation matrix and the column data representation matrix;
performing dimension reduction processing on the weight parameters of the model to obtain a low-dimensional weight representation of the weight parameters, wherein the dimension reduction processing comprises: initializing a weight parameter tensor; converting the weight parameter tensor into a weight matrix; extracting a row weight representation matrix and a column weight representation matrix from the weight matrix; and calculating a core weight representation matrix according to the weight matrix, the row weight representation matrix and the column weight representation matrix, the calculation method being: solving a closed-form solution of the core weight representation matrix such that the approximation error between the weight matrix and the product of the column weight representation matrix, the core weight representation matrix and the row weight representation matrix is minimized;
performing CNN model training by using the low-dimensional characterization and the low-dimensional weight characterization; the model is deployed on a server and/or edge equipment and used for predicting input data, wherein the input data comprises image data, video data and/or text data;
forward reasoning calculation using the low-dimensional characterization and the low-dimensional weight characterization, including:
in a training mode, multiplying the row data representation matrix, the column data representation matrix and the core data representation matrix with the row weight representation matrix, the column weight representation matrix and the core weight representation matrix, adding a bias matrix, and obtaining an output result through the action of an activation function;
in an application mode, multiplying the row data representation matrix, the column data representation matrix and the core data representation matrix by the column weight representation matrix and the product matrix, adding a bias matrix, and obtaining an output result through the action of an activation function; the product matrix is obtained by multiplying the core weight representation matrix and the row weight representation matrix.
2. The method of claim 1, wherein extracting the row weight characterization matrix and the column weight characterization matrix from the weight matrix comprises:
constructing a row weight sampling matrix with only one element with a value of 1 in each column, and calculating the row weight sampling matrix and the weight matrix to obtain the row weight characterization matrix;
constructing a column weight sampling matrix with only one element with a value of 1 in each column, and calculating the column weight sampling matrix and the weight matrix to obtain the column weight characterization matrix.
3. The method as recited in claim 1, further comprising:
and performing backward propagation calculation by using the low-dimensional characterization and the low-dimensional weight characterization.
4. The method as recited in claim 1, further comprising:
updating model parameters by using the low-dimensional characterization and the low-dimensional weight characterization.
5. A lightweight CNN computing device based on stochastic matrix approximation, comprising:
the sample processing module is used for performing dimension reduction processing on the data sample to obtain a low-dimensional representation of the data sample, including: converting the tensor of data samples into a data matrix; extracting a row data representation matrix and a column data representation matrix from the data matrix; and calculating a core data representation matrix according to the row data representation matrix and the column data representation matrix;
the parameter processing module is used for performing dimension reduction processing on the weight parameters of the model to obtain a low-dimensional weight representation of the weight parameters, including: initializing a weight parameter tensor; converting the weight parameter tensor into a weight matrix; extracting a row weight representation matrix and a column weight representation matrix from the weight matrix; and calculating a core weight representation matrix according to the weight matrix, the row weight representation matrix and the column weight representation matrix, the calculation method being: solving a closed-form solution of the core weight representation matrix such that the approximation error between the weight matrix and the product of the column weight representation matrix, the core weight representation matrix and the row weight representation matrix is minimized;
the model training module is used for performing CNN model training by utilizing the low-dimensional representation and the low-dimensional weight representation; the model is deployed on a server and/or edge equipment and used for predicting input data, wherein the input data comprises image data, video data and/or text data;
the forward reasoning calculation module is used for, in a training mode, multiplying the row data representation matrix, the column data representation matrix and the core data representation matrix with the row weight representation matrix, the column weight representation matrix and the core weight representation matrix, adding a bias matrix, and obtaining an output result through the action of an activation function; and, in an application mode, multiplying the row data representation matrix, the column data representation matrix and the core data representation matrix by the column weight representation matrix and the product matrix, adding a bias matrix, and obtaining an output result through the action of an activation function; the product matrix is obtained by multiplying the core weight representation matrix and the row weight representation matrix.
CN202010086785.4A 2020-02-11 2020-02-11 Lightweight CNN calculation method and device based on random matrix approximation Active CN111324860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086785.4A CN111324860B (en) 2020-02-11 2020-02-11 Lightweight CNN calculation method and device based on random matrix approximation

Publications (2)

Publication Number Publication Date
CN111324860A CN111324860A (en) 2020-06-23
CN111324860B true CN111324860B (en) 2024-01-23

Family

ID=71172601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086785.4A Active CN111324860B (en) 2020-02-11 2020-02-11 Lightweight CNN calculation method and device based on random matrix approximation

Country Status (1)

Country Link
CN (1) CN111324860B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650928A (en) * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Method and device for optimizing neural network
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110020724A (en) * 2019-03-18 2019-07-16 浙江大学 A kind of neural network column Sparse methods based on weight conspicuousness

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷恒鑫 et al., "利用CUR矩阵分解提高特征选择与矩阵恢复能力" (Improving feature selection and matrix recovery capability using CUR matrix decomposition), 计算机应用 (Journal of Computer Applications), vol. 37, no. 3, pp. 640-653 *

Also Published As

Publication number Publication date
CN111324860A (en) 2020-06-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant