CN111340182A - Input feature approximation low-complexity CNN training method and device - Google Patents

Input feature approximation low-complexity CNN training method and device

Info

Publication number
CN111340182A
CN111340182A (application CN202010086794.3A)
Authority
CN
China
Prior art keywords
matrix
training
model
data
characterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010086794.3A
Other languages
Chinese (zh)
Other versions
CN111340182B (en)
Inventor
李斌
陈沛鋆
刘宏福
赵成林
许方敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Sensing Technology And Business Beijing University Of Posts And Telecommunication
Beijing University of Posts and Telecommunications
Original Assignee
Institute Of Sensing Technology And Business Beijing University Of Posts And Telecommunication
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Sensing Technology And Business Beijing University Of Posts And Telecommunication and Beijing University of Posts and Telecommunications
Priority to CN202010086794.3A
Publication of CN111340182A
Application granted
Publication of CN111340182B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a low-complexity CNN training method and device based on input feature approximation. The method comprises: performing dimensionality reduction on the data samples to obtain a low-dimensional representation of the data samples; and training the CNN model using the low-dimensional representation as the input data for training the model. By reducing the dimensionality of the data samples and training the model on the reduced-volume low-dimensional representation, the invention lowers the complexity of CNN model training, reduces the storage and computing resources required for training, allows model-related operations to be carried out on terminal devices with lower configurations, and expands the applicable scenarios.

Description

Input feature approximation low-complexity CNN training method and device
Technical Field
The invention relates to the technical field of machine learning, in particular to a low-complexity CNN training method and device for input feature approximation.
Background
Machine learning is a multidisciplinary field combining mathematics, computer science and other technologies, and has been widely researched, popularized and applied in recent years. Machine learning models perform well in images, video, semantic analysis, machine translation, sequence processing and other fields. However, with the development of deep learning, model structures have become increasingly complex, the computational complexity of training and inference keeps rising, and the demands on the storage and computing resources of hardware devices are high, so that models can often only be deployed on servers and their functions cannot be realized on low-configuration terminal devices.
Disclosure of Invention
In view of the above, the present invention provides a low-complexity CNN training method and apparatus based on input feature approximation, so as to solve the problem of high hardware requirements caused by complex models.
Based on the above purpose, the present invention provides a low complexity CNN training method for input feature approximation, which includes:
performing dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
and training the CNN model by taking the low-dimensional representation as input data for training the model.
Optionally, the performing dimension reduction processing on the data sample to obtain a low-dimensional characterization of the data sample includes:
converting the data sample tensor into a data matrix;
extracting a row characterization matrix and a column characterization matrix from the data matrix;
and calculating a core characterization matrix from the row characterization matrix and the column characterization matrix.
Optionally, training the CNN model by using the low-dimensional characterization as input data for training the model, including:
and taking the row characterization matrix, the column characterization matrix and the core characterization matrix as the input data for training the model, and training the CNN model.
Optionally, the extracting of a row characterization matrix and a column characterization matrix from the data matrix includes:
constructing a row sampling matrix with only one value of 1 in each column, and calculating the row sampling matrix and the data matrix to obtain a row characterization matrix;
and constructing a column sampling matrix with only one value of 1 in each column, and calculating the column sampling matrix and the data matrix to obtain the column characterization matrix.
Optionally, the calculating of a core characterization matrix from the row characterization matrix and the column characterization matrix includes:
and selecting the overlapping part of the row characterization matrix and the column characterization matrix and performing a pseudo-inverse calculation on it to obtain the core characterization matrix.
Optionally, the method further includes:
and carrying out forward reasoning calculation by using the input data.
Optionally, the method further includes:
and performing back propagation calculation by using the input data.
Optionally, the method further includes:
updating model parameters using the input data.
The embodiment of the present invention further provides a low complexity CNN training device for input feature approximation, including:
the sample processing module is used for carrying out dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
and the model training module is used for training the CNN model by taking the low-dimensional representation as input data for training the model.
The embodiment of the invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above input-feature-approximation low-complexity CNN training method when executing the program.
From the above, it can be seen that the low-complexity CNN training method and apparatus based on input feature approximation provided by the invention perform dimensionality reduction on the data samples to obtain a low-dimensional representation, and train the CNN model using that representation as the training input data. By reducing the dimensionality of the data samples and training on the reduced-volume low-dimensional representation, the complexity of model training can be reduced, the storage and computing resources required for training are reduced, model-related operations can be realized on terminal devices with lower configurations, and the applicable scenarios are expanded.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for computing a low-dimensional representation according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the conversion of a data sample tensor into a two-dimensional matrix according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a transformation of a weight matrix according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a sampling method according to an embodiment of the present invention;
FIG. 6 is a block diagram of an apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
In some implementations, as machine learning technology develops, the structures of machine learning models with specific prediction functions become increasingly complex in practice and contain a large number of redundant structures, so the storage and computing resources required for training and inference are high. To realize the prediction functions of such models, they can be deployed on servers with higher hardware configurations, such as a central server or a cloud server, but this increases the load on the servers. One solution is for a server to train the model and then deploy the trained model on low-configuration edge devices, which perform only the inference operations. However, model training then requires uploading data to the server: on the one hand this carries a risk of data leakage and is unsuitable for application scenarios with high data-security requirements, and on the other hand, for scenarios in which the mapping relationship or task pattern changes, the edge devices and the server need to exchange data frequently, which cannot meet the needs of scenarios with limited network resources and low-latency requirements.
In order to solve the above problems, embodiments of the present invention provide a low-complexity CNN training method and apparatus based on input feature approximation, which reduce the complexity of model training and inference and lower the hardware configuration requirements by reducing the dimensionality of the input data samples, so that the training and inference operations of a model can be carried out on hardware devices with lower configurations.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention. As shown in the figure, the low-complexity CNN training method for input feature approximation provided in the embodiment of the present invention includes:
S101: performing dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
S102: training the CNN model using the low-dimensional representation as the input data for training the model.
In this embodiment, it is observed that although some network compression methods can simplify the structure of a model, for example by compressing the convolution kernels, they do not take into account that the data samples used to train the model often contain a large amount of redundant data, which makes the training process complex. According to the CNN model training method provided by the embodiment of the invention, the dimensionality of the data samples is reduced, the low-dimensional representation with the reduced data volume is used as the input data for training, and the CNN model is trained on it; this reduces the complexity of model training and the storage and computing resources required, so that the training operation can be carried out on terminal devices with lower configurations.
FIG. 2 is a schematic flow chart of a method for computing a low-dimensional representation according to an embodiment of the present invention. As shown in the figure, in an embodiment, in step S101, the dimensionality reduction processing performed on the data sample to obtain a low-dimensional representation of the data sample includes:
S201: converting the data sample tensor into a data matrix;
S202: extracting a row characterization matrix and a column characterization matrix from the data matrix;
S203: calculating a core characterization matrix from the row characterization matrix and the column characterization matrix.
Then, the model is trained using the row characterization matrix, the column characterization matrix and the core characterization matrix as the input data for training the model.
In this embodiment, the original data samples are high-dimensional data sample tensors. To reduce the data volume, the data sample tensor is converted into a data matrix, a row characterization matrix and a column characterization matrix are extracted from the data matrix, a core characterization matrix is computed from them, and the simplified row, column and core characterization matrices are used as the input data for training the model, so that the training operation is simplified by optimizing and simplifying the data samples. Drawing on the idea of approximating a high-dimensional matrix, this embodiment extracts three low-dimensional characterization matrices from the high-dimensional data samples through random sampling and approximate calculation, which effectively reduces the computational complexity and the space complexity without degrading the generalization performance of the model.
The low complexity CNN training method of the present invention is described below with reference to specific embodiments.
Let the data sample tensor of a convolutional layer of a convolutional neural network (CNN) be X ∈ R^{m×h×w×c} and the weight tensor be W ∈ R^{k×l×c×n}, where m is the number of data samples, h and w are respectively the height and width of the data sample tensor, c is the number of channels of the data sample tensor, k and l are respectively the height and width of the convolution kernel, and n is the number of convolution kernels.
The data sample tensor X is unfolded into a two-dimensional matrix X_m, whose size depends on the height and width of the convolution kernel, the scan stride and the edge padding mode. In one implementation, taking a convolution kernel of height k and width l, a scan stride of 1 and no edge padding as an example, unfolding the data sample tensor X yields a two-dimensional matrix X_m ∈ R^{a×b}, where a = m(h-k+1)(w-l+1) and b = klc. As shown in FIG. 3, the two-dimensional matrix X_m is expressed as:
X_m = unfold(X, height=k, width=l, strides=1, padding=0)    (1)
where unfold is an unfolding function, and the parameters height, width, strides and padding denote the height and width of the convolution kernel, the scan stride and the edge padding mode, respectively.
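As an illustration of this unfolding step, a minimal NumPy sketch for the stride-1, no-padding case described above might look as follows; it mirrors equation (1) but is an illustrative sketch, not the patented implementation:

import numpy as np

def unfold(X, height, width, strides=1, padding=0):
    """Unfold a data sample tensor X of shape (m, h, w, c) into the
    two-dimensional matrix X_m of shape (m*(h-k+1)*(w-l+1), k*l*c),
    assuming strides=1 and padding=0 as in the example above."""
    assert strides == 1 and padding == 0
    m, h, w, c = X.shape
    k, l = height, width
    rows = []
    for i in range(h - k + 1):
        for j in range(w - l + 1):
            # each k x l x c patch of every sample becomes one row of length k*l*c
            rows.append(X[:, i:i + k, j:j + l, :].reshape(m, -1))
    return np.concatenate(rows, axis=0)

# example: a = m*(h-k+1)*(w-l+1) = 4*6*6 = 144 rows, b = k*l*c = 27 columns
X = np.random.randn(4, 8, 8, 3)            # m=4, h=w=8, c=3
X_m = unfold(X, height=3, width=3)         # X_m has shape (144, 27)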
After the two-dimensional matrix X_m is obtained, a row characterization matrix and a column characterization matrix are extracted from it by sampling. In some embodiments, a suitable sampling method is selected for extracting the row characterization matrix and the column characterization matrix from X_m. Optionally, the sampling method is as follows: uniform row sampling is performed on X_m, and in the matrix obtained by this sampling, the ratio of the sum of the squared elements of each column to the sum of the squared elements of all columns is taken as the sampling probability of that column; a model trained with this sampling method suffers almost no loss in performance. As shown in FIG. 5, the sampling probability of the j-th column is:
p_j = Σ_i A(i,j)^2 / Σ_{i',j'} A(i',j')^2    (2)
where A denotes the matrix obtained by the row sampling.
For the row characterization matrix, let the number of sampled rows be t (t << a), and construct a row sampling matrix S_r ∈ R^{a×t} according to the selected row sampling method. Each column of the constructed row sampling matrix S_r has only one element with value 1, and all its other elements are 0. The row sampling matrix S_r is then used to sample the two-dimensional matrix X_m to obtain the row characterization matrix R ∈ R^{t×b}:
R = S_r^T X_m    (3)
For the column characterization matrix, let the number of sampled columns be s (s << b), and construct a column sampling matrix S_c ∈ R^{b×s} according to the selected column sampling method. Each column of the constructed column sampling matrix S_c has only one element with value 1, and all its other elements are 0. The column sampling matrix S_c is then used to sample the two-dimensional matrix X_m to obtain the column characterization matrix C ∈ R^{a×s}:
C = X_m S_c    (4)
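Continuing the NumPy sketch above, the row and column characterization matrices of equations (3) and (4) can be formed as below; uniform row sampling and squared-norm column sampling follow the optional sampling method described above, and the helper names and sample counts are illustrative assumptions:

def column_probabilities(M):
    """Sampling probability of each column: its sum of squared elements
    divided by the sum of squared elements of all columns (equation (2))."""
    col_sq = np.sum(M ** 2, axis=0)
    return col_sq / col_sq.sum()

def sampling_matrix(dim, num_samples, probs=None):
    """Sampling matrix of shape (dim, num_samples); every column contains
    exactly one element equal to 1 (uniform sampling when probs is None)."""
    idx = np.random.choice(dim, size=num_samples, replace=False, p=probs)
    S = np.zeros((dim, num_samples))
    S[idx, np.arange(num_samples)] = 1.0
    return S, idx

a, b = X_m.shape
t, s = 32, 16                                  # numbers of sampled rows/columns, t << a, s << b

S_r, row_idx = sampling_matrix(a, t)           # uniform row sampling
R = S_r.T @ X_m                                # row characterization matrix, shape (t, b), eq. (3)

S_c, col_idx = sampling_matrix(b, s, probs=column_probabilities(R))
C = X_m @ S_c                                  # column characterization matrix, shape (a, s), eq. (4)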
Then a core characterization matrix is constructed from the row characterization matrix and the column characterization matrix,
U ∈ R^{s×t}. In some embodiments, the overlapping part of the row characterization matrix R and the column characterization matrix C,
V = S_r^T X_m S_c,  V ∈ R^{t×s}    (5)
may be selected to compute the core characterization matrix U as its Moore-Penrose pseudo-inverse:
U = V^+    (6)
it should be noted that the method for constructing the core characterization matrix U according to the row characterization matrix R and the column characterization matrix C is not limited to the above manner, and a part of the row characterization matrix and a part of the column characterization matrix may also be selected to construct the core characterization matrix, or an operation is performed according to an element of the row characterization matrix and an element of the column characterization matrix by using a specific algorithm to obtain the core characterization matrix.
In this embodiment, because the original data samples are very complex, they are simplified to reduce the data volume: the row characterization matrix and the column characterization matrix are extracted, the core characterization matrix is computed from them, and the row, column and core characterization matrices are used as the input data for training the CNN model. This greatly reduces the amount of computation in training, and hence the storage and computing resources required, so that the training function can be realized even on hardware devices with low configurations.
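Continuing the same sketch, the overlap of the sampled rows and columns and the core characterization matrix of equations (5) and (6) can then be obtained with a pseudo-inverse:

# overlap of the sampled rows and columns (equation (5)), shape (t, s)
V = S_r.T @ X_m @ S_c

# core characterization matrix via the Moore-Penrose pseudo-inverse (equation (6)), shape (s, t)
U = np.linalg.pinv(V)

# X_m is now approximated by C @ U @ R, and (R, C, U) replace X_m as the training input
approx_error = np.linalg.norm(X_m - C @ U @ R) / np.linalg.norm(X_m)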
In the embodiment of the invention, forward inference is computed for the trained model using the input data. In some embodiments, the weight parameters of the network are processed to obtain a weight matrix, the input data is multiplied by the weight matrix, a bias vector is added to the product, and the output of the model is obtained after applying an activation function.
Referring to FIG. 4, for the CNN network the first three dimensions of the convolution kernel tensor W are first combined to obtain a two-dimensional weight matrix W_m ∈ R^{klc×n}. The row characterization matrix R, the column characterization matrix C, the core characterization matrix U and the weight matrix W_m are then multiplied together, a bias vector b ∈ R^n is added to the product, and the output of the model X_m` ∈ R^{a×n} is obtained after applying the activation function f. This is expressed as:
X_m` = f(C(U(R W_m)) + b)    (7)
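Continuing the sketch, the forward computation of equation (7) can be written as below; the weight matrix, bias vector and activation function are illustrative stand-ins, and the products are grouped from the innermost term outward as in equation (7):

n = 10                                   # number of convolution kernels (output channels)
W_m = np.random.randn(b, n)              # two-dimensional weight matrix, shape (k*l*c, n)
bias = np.zeros(n)                       # bias vector b of equation (7)
relu = lambda z: np.maximum(z, 0.0)      # activation function f (ReLU as an example)

# equation (7): X_m` = f(C (U (R W_m)) + b), evaluated from the innermost product outward
X_m_out = relu(C @ (U @ (R @ W_m)) + bias)     # output, shape (a, n)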
As shown in equation (7), the row characterization matrix R is first multiplied by the two-dimensional weight matrix W_m, the core characterization matrix U is then multiplied by the result, and the column characterization matrix C is finally multiplied by that product; performing the multiplications in this order requires the smallest number of operations and reduces the computation of the model as much as possible.
In the embodiment of the invention, back propagation is computed for the trained model using the input data. In some embodiments, the backward error tensor received by the current layer is converted into an error matrix; a row error matrix, a column error matrix and a core error matrix are computed from the row characterization matrix, the column characterization matrix, the core characterization matrix and the error matrix; and the backward error tensor passed to the next layer is obtained from the row error matrix, the column error matrix and the core error matrix.
Let the error tensor received by the current layer be δ ∈ R^{m×(h-k+1)×(w-l+1)×n}. First, the error tensor δ is converted into a two-dimensional error matrix δ_m ∈ R^{a×n}. The column error matrix δ_c`, the row error matrix δ_r` and the core error matrix δ_u` are then computed separately; following the sampling and pseudo-inverse operations used in the approximation, they are obtained by the chain rule from δ_m, the sampling matrices and the characterization matrices (equation (8)).
In this calculation, the transpose S_c^T of the right-multiplied column sampling matrix S_c requires no matrix multiplication: since each column of S_c has only one element with value 1 and all its other elements are 0, right-multiplying by S_c^T corresponds to the inverse of column sampling, i.e. the columns of the matrix it multiplies are placed back into an all-zero matrix according to the sampled indices. Similarly, the operation of the left-multiplied row sampling matrix S_r is the inverse of row sampling: the rows of the matrix it multiplies only need to be placed back into an all-zero matrix according to the sampled indices. Compared with a matrix multiplication, the complexity of this computation is low and the amount of calculation is greatly reduced.
The total error matrix of the back propagation is then calculated from the row error matrix, the column error matrix and the core error matrix as follows:
δ_m` = δ_c` + δ_u` + δ_r`,  δ_m` ∈ R^{a×b}    (9)
The total error matrix δ_m` is then converted into the error tensor δ` ∈ R^{m×h×w×c}, which is passed to the next layer:
δ` = fold(δ_m`, height=k, width=l, strides=1, padding=0)    (10)
where fold is a folding function whose parameters have the same meaning as those of the unfold function.
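The folding step of equation (10) and the inverse-of-sampling operations discussed above can be sketched as below, again for the stride-1, no-padding case and consistent with the earlier unfold sketch; the helper names are illustrative assumptions, and overlapping patches are accumulated:

def fold(D, height, width, out_shape, strides=1, padding=0):
    """Inverse of the earlier unfold sketch: scatter-add every k x l x c patch
    row of D back into a tensor of shape (m, h, w, c); overlaps accumulate."""
    assert strides == 1 and padding == 0
    m, h, w, c = out_shape
    k, l = height, width
    out = np.zeros(out_shape)
    block = 0
    for i in range(h - k + 1):
        for j in range(w - l + 1):
            out[:, i:i + k, j:j + l, :] += D[block * m:(block + 1) * m, :].reshape(m, k, l, c)
            block += 1
    return out

def unsample_columns(M, col_idx, total_cols):
    """Equivalent to M @ S_c.T: place the columns of M back into an
    all-zero matrix at the sampled column indices (inverse of column sampling)."""
    out = np.zeros((M.shape[0], total_cols))
    out[:, col_idx] = M
    return out

def unsample_rows(M, row_idx, total_rows):
    """Equivalent to S_r @ M: place the rows of M back into an
    all-zero matrix at the sampled row indices (inverse of row sampling)."""
    out = np.zeros((total_rows, M.shape[1]))
    out[row_idx, :] = M
    return out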
In the embodiment of the invention, the model parameters are updated by using the input data. In some embodiments, a weight gradient is calculated using the input data, and the model parameters are updated based on the weight gradient.
The error matrix δ_m is multiplied by the row characterization matrix R, the column characterization matrix C and the core characterization matrix U to calculate the weight gradient ΔW_m ∈ R^{b×n}. The calculation formula is as follows:
ΔW_m = R^T(U^T(C^T δ_m)) / m    (11)
As shown in equation (11), the transpose C^T of the column characterization matrix C is first multiplied by the error matrix δ_m, the transpose U^T of the core characterization matrix U is then multiplied by the result, and that product is finally multiplied by the transpose R^T of the row characterization matrix R; these multiplications are performed in the order requiring the smallest amount of computation. After the weight gradient ΔW_m is obtained, it is converted into the gradient tensor ΔW ∈ R^{k×l×c×n}.
The model is then updated using a gradient descent based optimization algorithm.
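Continuing the sketch, the weight gradient of equation (11) and a plain gradient-descent step might look as follows; the error matrix delta_m and the learning rate are illustrative stand-ins:

m_samples = 4                        # number of data samples m (from the example above)
lr = 0.01                            # learning rate (illustrative)
delta_m = np.random.randn(a, n)      # stand-in for the error matrix of the current layer

# equation (11): compute C^T @ delta_m first, then multiply by U^T, then by R^T
dW_m = R.T @ (U.T @ (C.T @ delta_m)) / m_samples     # weight gradient, shape (b, n)

# the gradient can also be reshaped into the convolution-kernel tensor (k, l, c, n)
dW = dW_m.reshape(3, 3, 3, n)        # k = l = 3 and c = 3 in the example above

W_m -= lr * dW_m                     # gradient-descent step on the weight matrix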
In the low-complexity CNN training method described above, the computational complexity of preparing the training input depends on the chosen sampling method and is generally lower than that of the forward propagation, backward propagation and model update computations. The computational complexity of the forward propagation and of the model parameter update is O(asn + stn + btn), and the computational complexity of the backward propagation is also about O(asn + stn + btn). Overall, the computational complexity of training and updating the CNN model can therefore be reduced to O(asn + stn + btn), which is far below the complexity O(abn) of existing model training, so the required computing resources are greatly reduced. In addition, because the input data samples must be retained for the parameter update computation, only the row characterization matrix, the column characterization matrix and the core characterization matrix extracted from the data samples need to be stored, instead of all data samples as in the prior art; the space complexity is reduced to O(as + tb + st), far below the original space complexity O(ab), so the required storage resources are greatly reduced. The low-complexity CNN model training method provided by the embodiments of the invention allows the model to be deployed both on high-configuration servers and on low-configuration edge devices; model training, input prediction and model updating can be carried out by the server and/or the edge devices, meeting the adaptivity, real-time and data-confidentiality requirements of various application scenarios, expanding the applicable scenarios of the model and improving resource utilization.
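As a rough, standalone numerical illustration of these complexity expressions (the layer sizes below are assumptions chosen for the example, not values taken from the publication):

# illustrative layer sizes: a rows of X_m, b = k*l*c columns, n output channels
a_ex, b_ex, n_ex = 100_000, 1_152, 128     # e.g. k = l = 3, c = 128
t_ex, s_ex = 1_000, 100                    # numbers of sampled rows and columns

full_compute    = a_ex * b_ex * n_ex                                            # O(abn)
reduced_compute = a_ex * s_ex * n_ex + s_ex * t_ex * n_ex + b_ex * t_ex * n_ex  # O(asn+stn+btn)
full_storage    = a_ex * b_ex                                                   # O(ab)
reduced_storage = a_ex * s_ex + t_ex * b_ex + s_ex * t_ex                       # O(as+tb+st)

print(full_compute / reduced_compute)      # roughly a 10x reduction in computation
print(full_storage / reduced_storage)      # roughly a 10x reduction in storage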
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one of the multiple devices may only perform one or more steps of the method according to the embodiment of the present invention, and the multiple devices interact with each other to complete the method.
FIG. 6 is a block diagram of an apparatus according to an embodiment of the present invention. As shown in the figure, the low-complexity CNN training apparatus for input feature approximation provided in the embodiment of the present invention includes:
the sample processing module is used for carrying out dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
and the model training module is used for training the CNN model by taking the low-dimensional representation as input data for training the model.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 7 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A low-complexity CNN training method for input feature approximation is characterized by comprising the following steps:
performing dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
and training the CNN model by taking the low-dimensional representation as input data for training the model.
2. The method of claim 1, wherein the performing the dimensionality reduction on the data sample to obtain the low-dimensional characterization of the data sample comprises:
converting the data sample tensor into a data matrix;
extracting a row characterization matrix and a column characterization matrix from the data matrix;
and calculating a core characterization matrix from the row characterization matrix and the column characterization matrix.
3. The method of claim 2, wherein training a CNN model using the low-dimensional tokens as input data for training the model comprises:
and taking the row characterization matrix, the column characterization matrix and the core characterization matrix as the input data for training the model, and training the CNN model.
4. The method of claim 2, wherein extracting a row characterization matrix and a column characterization matrix from the data matrix comprises:
constructing a row sampling matrix with only one value of 1 in each column, and calculating the row sampling matrix and the data matrix to obtain a row characterization matrix;
and constructing a column sampling matrix with only one value of 1 in each column, and calculating the column sampling matrix and the data matrix to obtain the column characterization matrix.
5. The method of claim 4, wherein computing a core characterization matrix from the row characterization matrix and the column characterization matrix comprises:
and selecting the overlapping part of the row characterization matrix and the column characterization matrix and performing a pseudo-inverse calculation on it to obtain the core characterization matrix.
6. The method of claim 1, further comprising:
and carrying out forward reasoning calculation by using the input data.
7. The method of claim 1, further comprising:
and performing back propagation calculation by using the input data.
8. The method of claim 1, further comprising:
updating model parameters using the input data.
9. A low complexity CNN training device for input feature approximation, comprising:
the sample processing module is used for carrying out dimensionality reduction processing on the data sample to obtain a low-dimensional representation of the data sample;
and the model training module is used for training the CNN model by taking the low-dimensional representation as input data for training the model.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the program.
CN202010086794.3A 2020-02-11 2020-02-11 Low-complexity CNN training method and device for input feature approximation Active CN111340182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086794.3A CN111340182B (en) 2020-02-11 2020-02-11 Low-complexity CNN training method and device for input feature approximation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010086794.3A CN111340182B (en) 2020-02-11 2020-02-11 Low-complexity CNN training method and device for input feature approximation

Publications (2)

Publication Number Publication Date
CN111340182A true CN111340182A (en) 2020-06-26
CN111340182B CN111340182B (en) 2024-04-02

Family

ID=71181459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086794.3A Active CN111340182B (en) 2020-02-11 2020-02-11 Low-complexity CNN training method and device for input feature approximation

Country Status (1)

Country Link
CN (1) CN111340182B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022042603A1 (en) * 2020-08-27 2022-03-03 International Business Machines Corporation Tensor comparison across distributed machine learning environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008269573A (en) * 2007-02-09 2008-11-06 Denso Corp Arithmetic unit and program
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110163261A (en) * 2019-04-28 2019-08-23 平安科技(深圳)有限公司 Unbalanced data disaggregated model training method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008269573A (en) * 2007-02-09 2008-11-06 Denso Corp Arithmetic unit and program
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110163261A (en) * 2019-04-28 2019-08-23 平安科技(深圳)有限公司 Unbalanced data disaggregated model training method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷恒鑫等 (LEI Hengxin et al.), "利用CUR矩阵分解提高特征选择与矩阵恢复能力" [Improving feature selection and matrix recovery capability using CUR matrix decomposition], 计算机应用 (Journal of Computer Applications), vol. 37, no. 3, pages 640-646

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022042603A1 (en) * 2020-08-27 2022-03-03 International Business Machines Corporation Tensor comparison across distributed machine learning environment
GB2613316A (en) * 2020-08-27 2023-05-31 Ibm Tensor comparison across distributed machine learning environment
US11954611B2 (en) 2020-08-27 2024-04-09 International Business Machines Corporation Tensor comparison across a distributed machine learning environment

Also Published As

Publication number Publication date
CN111340182B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
US10810483B2 (en) Superpixel methods for convolutional neural networks
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
KR20180101055A (en) Neural network device and operating method of neural network device
WO2020256704A1 (en) Real-time video ultra resolution
CN112883149B (en) Natural language processing method and device
CN111209933A (en) Network traffic classification method and device based on neural network and attention mechanism
CN109034206A (en) Image classification recognition methods, device, electronic equipment and computer-readable medium
CN112883227B (en) Video abstract generation method and device based on multi-scale time sequence characteristics
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
US20180005113A1 (en) Information processing apparatus, non-transitory computer-readable storage medium, and learning-network learning value computing method
CN112329808A (en) Optimization method and system of Deeplab semantic segmentation algorithm
EP4120141A1 (en) Method and apparatus for performing deconvolution processing on feature data by utilizing convolution hardware
CN113535912A (en) Text association method based on graph convolution network and attention mechanism and related equipment
Li et al. Fast principal component analysis for hyperspectral imaging based on cloud computing
CN111340182B (en) Low-complexity CNN training method and device for input feature approximation
CN112819155B (en) Deep neural network model hierarchical compression method and device applied to edge equipment
Li et al. Object detection network pruning with multi-task information fusion
CN111324860B (en) Lightweight CNN calculation method and device based on random matrix approximation
CN112561050A (en) Neural network model training method and device
CN113902107A (en) Data processing method, readable medium and electronic device for neural network model full connection layer
CN114943332A (en) Training method of lightweight YOLO model and related equipment
CN114611659A (en) STN-based efficient and safe federal learning method and related equipment
CN112580772B (en) Compression method and device for convolutional neural network
CN111445282B (en) Service processing method, device and equipment based on user behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant