RU2734579C1

RU2734579C1 - Artificial neural networks compression system based on iterative application of tensor approximations

Info

Publication number: RU2734579C1
Application number: RU2019145091A
Authority: RU
Inventors: Юлия Валерьевна Гусак; Евгений Сергеевич Пономарев; Лариса Борисовна Маркеева; Анджей Станислав Чихоцкий; Иван Валерьевич ОСЕЛЕДЕЦ; Максим Дмитриевич Холявченко
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2020-10-20

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to the compression of artificial neural networks (NN). The system comprises a compression device and a fine-tuning device. The compression device comprises a module for automatic determination of compression parameters (rank selector module) and a module that replaces the parameters of convolutional/fully connected layers of the NN with their low-rank approximation, obtained using tensor/matrix decompositions (tensor approximator module). The compression device receives the NN as input; for each convolutional/fully connected layer of the NN, the rank selector module automatically selects the rank of the tensor decomposition used to approximate the weight tensor, after which the tensor approximator module replaces the layer weight with its low-rank approximation such that the total number of parameters of the new tensors is less than the number of parameters in the original tensor. The fine-tuning device receives the converted NN from the compression device and outputs an optimized NN with better predictive ability, achieved by correcting the model parameters through backpropagation using a database.

EFFECT: improved compression efficiency for artificial neural networks.

1 cl, 5 dwg

Description

FIELD OF TECHNOLOGY

This technical solution relates to the field of information technology, in particular to a system for compressing artificial neural networks (NN) based on the iterative application of tensor approximations.

PRIOR ART

Solutions are known in the prior art that use Toeplitz-like matrices to approximate weight tensors in recurrent neural networks (RNN) and long short-term memory (LSTM) networks. These methods rest on a fairly strong assumption about the structure of the weight tensor. If this assumption does not hold, compression by this method may be impossible. Such solutions are disclosed, for example, in the following documents: US20170076196A1 (IPC G06N3/04, publ. 2017-03-16); CN107038476A (IPC G06N3/04, publ. 2017-08-11).

In addition, solutions are known in the prior art that reduce the number of filters in convolutional layers by removing them. Although this method reduces the number of neurons in the neural network, it does not fully exploit the structural properties of the weight tensor, so compression by such methods is less aggressive. Such solutions are disclosed in the following documents: CN107392305A (IPC G06N3/04, publ. 2017-11-24); US201716346313A1 (IPC G06N3/04, publ. 2019-09-12); CN201910338123A (IPC G06K9/62, publ. 2019-08-27).

Patent US20160217369A1 (IPC G06N3/08, publ. 2016-07-28) discloses a method for compressing a neural network. The known solution includes replacing at least one layer in a neural network with multiple compressed layers to create a compressed neural network; inserting a nonlinearity between the compressed layers of the compressed network; and fine-tuning the compressed network by updating the weight values in at least one of the compressed layers.

Also known from the prior art is a method for compressing convolutional neural networks based on Tucker decomposition and principal component analysis (CN110032951A, IPC G06K9/00, publ. 2019-07-19). When selecting the rank, this method uses the weight tensor of the current layer together with the weight tensors of the two adjacent layers, so that the compression of individual layers is no longer completely independent. Using information from adjacent layers makes the choice of rank more reasonable. In addition, to avoid the increase in network depth caused by a compression method based on Tucker decomposition alone, Tucker decomposition and principal component analysis are combined to compress the weight tensor of each convolutional layer. The original network depth is thereby preserved, and problems such as vanishing gradients, caused by a significant increase in the number of network layers, are avoided.

The disadvantage of the solutions known from the prior art is that they compress each layer only once. This causes a sharp decrease in the number of parameters and a significant drop in quality, which complicates the subsequent fine-tuning procedure (adjusting the NN parameters by backpropagation) intended to restore the original quality of the model.

The claimed solution does not have this drawback, since the NN layers can be compressed several times in a row. This avoids a sharp drop in the quality of the model's predictions after any single compression iteration and thereby simplifies the fine-tuning procedure performed after each compression of the layers.

Thus, the main difference between the claimed solution and the prior-art solutions based on tensor approximations is that it allows compression to be performed iteratively.

As a result, for a given quality, this procedure achieves a higher degree of compression than non-iterative analogues.

Moreover, the claimed compression system includes components that are not found in analogues. One of the modes of automatic rank selection uses a Bayesian approach followed by rank weakening.

The claimed system can also compress a convolutional layer of the NN by finding an approximation not of the original layer weight but of a modification of it, obtained by reindexing the elements of the weight.

SUMMARY OF THE INVENTION

Neural networks have proven effective for image classification, image segmentation, and object detection in images. However, modern convolutional NNs contain hundreds of millions of parameters, which prevents them from running efficiently on embedded systems, where memory and computing power are limited.

The technical problem addressed by the claimed technical solution is the creation of a system for compressing artificial NNs based on the iterative application of tensor approximations.

The technical result achieved by solving the above technical problem is efficient compression of the NN, which reduces the size of the NN while maintaining the quality of its predictions.

This result makes it possible to use the resources of client devices, for example mobile phones, more efficiently. This can be critical for systems with limited data storage. It also makes it easier to distribute applications over the Internet. Compression reduces the amount of computation and therefore reduces power consumption.

The claimed solution uses an iterative approach to compression based on alternating two procedures: compressing the NN layers and restoring the quality of the NN predictions.

The compression parameters are found automatically: for each layer, the system automatically finds the rank of the tensor decomposition that is used to approximate the weight tensor and that determines the structure of the compressed layer. This algorithm speeds up convolutional NNs and reduces model size without reducing the original accuracy.

The claimed result is achieved by implementing a system for compressing artificial NNs based on the iterative application of tensor approximations, comprising:

a compression device, which consists of a module for automatic determination of compression parameters (rank selector module) and a module that replaces the parameters of convolutional/fully connected layers of the NN with their low-rank approximation obtained using tensor/matrix decompositions (tensor approximator module), and

a fine-tuning device, wherein:

• the compression device receives the NN as input; for each convolutional/fully connected layer of the NN, the rank selector module automatically selects the rank of the tensor decomposition used to approximate the weight tensor, after which the tensor approximator module replaces the layer weight with its low-rank approximation such that the total number of parameters of the new tensors is less than the number of parameters in the original tensor,

as a result of which, when a convolutional/fully connected layer is processed by the compression device for the first time, the original layer is replaced by a decomposed layer, which is a sequence of several convolutional/fully connected layers whose weights are initialized with the factors of the tensor decomposition used for the approximation; when an already decomposed layer is processed again, the number of convolutional/fully connected layers does not change, but the number of parameters in each component of the decomposed layer decreases because the approximation rank decreases;

• the fine-tuning device receives the converted NN as input and outputs an optimized NN with better predictive ability, achieved by adjusting the model parameters through backpropagation using a database.

In our case, the weight of a convolutional layer is a four-dimensional tensor, and the weight of a fully connected layer is a matrix (a two-dimensional tensor).

DESCRIPTION OF DRAWINGS

The implementation of the invention is described below with reference to the accompanying drawings, which are presented to clarify the essence of the invention and in no way limit its scope. The following drawings are attached to the application:

Fig. 1 illustrates an example of a system for compressing artificial NNs based on the iterative application of tensor approximations;

Fig. 2 illustrates the operation of the compression device using one convolutional layer as an example;

Fig. 3 illustrates an object detection model used to demonstrate the operation of the compression system;

Fig. 4 illustrates the compression of a decomposed layer;

Fig. 5 illustrates an example of the general layout of a computing device.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description of an implementation of the invention sets forth numerous implementation details intended to provide a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used, with or without these implementation details. In other instances, well-known methods, procedures, and components are not described in detail so as not to obscure the features of the present invention.

In addition, it will be clear from the description that the invention is not limited to the implementation given. Numerous possible modifications, changes, variations, and substitutions that retain the spirit and form of the present invention will be apparent to those skilled in the art.

The terms and concepts necessary to implement this technical solution are described below.

Database (DB) - a collection of data organized according to a conceptual structure describing the characteristics of these data and the relationships among them, supporting one or more application areas (ISO/IEC 2382:2015, 2121423 "database").

Neural network (hereinafter NN) - a computational or logical circuit built from processing elements that are simplified functional models of neurons. Openly available third-party libraries (for example, PyTorch, TensorFlow) make it possible to work efficiently with artificial neural networks.

Layer of a neural network - a set of neurons in the network grouped by how they function.

In image classification and object detection models, data are transformed by passing them through the layers of a neural network. This computing architecture is known in the art as a convolutional artificial neural network. Many computer vision tasks are currently solved successfully with this tool.

A convolutional NN processes an input image by alternating linear layers (convolutional, fully connected), nonlinear layers, and MaxPooling layers.

Data processing by a convolutional layer consists of applying the convolution operation to the input tensor: a three-dimensional input tensor with dimensions (H, W, Cin) is transformed by successively taking the scalar product of sections of the tensor, each of size (K1, K2, Cin), with the kernel of the convolutional layer, a four-dimensional weight tensor with dimensions (K1, K2, Cin, Cout).

Here, (H, W) are the height and width and Cin is the number of channels of the input tensor, (K1, K2) are the spatial dimensions of the convolution kernel, and Cout is the number of output channels of the convolution. The tensor resulting from this operation has dimensions (H, W, Cout). In addition, some layers do not apply the convolution to every section of the input tensor (shifting by 1 pixel along H and W each time) but use an arbitrary shift (stride). In this case the size of the output tensor is (Hnew, Wnew, Cout).
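The shape bookkeeping described above can be illustrated with a minimal NumPy sketch. The function name `conv2d` and the `pad` argument are assumptions made for illustration and do not come from the patent; zero padding is introduced only so that the (H, W, Cout) output size mentioned in the text can be reproduced for a 3x3 kernel with a shift of 1.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    """Naive convolution of x (H, W, Cin) with a kernel w (K1, K2, Cin, Cout)."""
    k1, k2, cin, cout = w.shape
    assert x.shape[2] == cin, "channel mismatch between input and kernel"
    # Zero-pad the two spatial axes only (illustrative assumption).
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h_new = (x.shape[0] - k1) // stride + 1
    w_new = (x.shape[1] - k2) // stride + 1
    out = np.empty((h_new, w_new, cout))
    for i in range(h_new):
        for j in range(w_new):
            # Section of the input tensor of size (K1, K2, Cin).
            patch = x[i * stride:i * stride + k1, j * stride:j * stride + k2, :]
            # Scalar product of the section with each of the Cout filters.
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))        # (H, W, Cin)
w = rng.standard_normal((3, 3, 3, 16))    # (K1, K2, Cin, Cout)
same = conv2d(x, w, stride=1, pad=1)      # shape (H, W, Cout)
strided = conv2d(x, w, stride=2, pad=1)   # shape (Hnew, Wnew, Cout)
```

With a shift of 1 the spatial size is preserved, while a shift of 2 roughly halves it.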

A fully connected layer of an artificial NN applies the following transformations in sequence to an input vector of dimension L: the vector is multiplied by a weight matrix of dimension (M, L), and the result is added to a bias vector of dimension M.
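As a small illustration of the transformation just described, the sketch below applies a weight matrix of dimension (M, L) and a bias of dimension M to an input of dimension L. The function name and the numeric values are illustrative assumptions only.

```python
import numpy as np

def fully_connected(x, weight, bias):
    """Apply y = W x + b, with W of shape (M, L), x of shape (L,), b of shape (M,)."""
    return weight @ x + bias

L_in, M_out = 4, 3
weight = np.arange(M_out * L_in, dtype=float).reshape(M_out, L_in)
bias = np.ones(M_out)
x = np.ones(L_in)

y = fully_connected(x, weight, bias)
# Number of stored parameters of the layer: M*L weights plus M biases.
n_params = weight.size + bias.size
```

The parameter count M*L + M is the quantity that a low-rank replacement of the weight matrix aims to reduce.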

In addition, each linear layer is followed by a nonlinear transformation of the resulting vector (a nonlinear layer).

Another type of layer is MaxPooling, which copies the maximum elements of sections of the input tensor of size (K, K, Cin). This layer is usually used to reduce the height and width of the processed tensor by a factor of 2.
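A minimal sketch of such a pooling layer is given below, assuming non-overlapping K x K windows and a maximum taken independently in each channel (the usual convention, which the patent text does not spell out). The name `max_pool` is hypothetical.

```python
import numpy as np

def max_pool(x, k=2):
    """Max-pool x (H, W, C) over non-overlapping k x k windows, per channel."""
    h, w, c = x.shape
    h2, w2 = h // k, w // k
    x = x[:h2 * k, :w2 * k, :]                 # drop rows/columns that do not fill a window
    return x.reshape(h2, k, w2, k, c).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool(x, k=2)                      # height and width are halved
```

For K = 2 the output has half the height and half the width of the input, matching the typical use mentioned above.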

The tensor produced by one of the layers is called the output of a hidden layer and is passed as the input tensor to the next component of the artificial neural network (i.e., it is sent for processing to the next hidden layer).

The algorithm is described in general terms below (Fig. 1).

Deeper and more complex neural network models are created because of the need for models with better predictive ability. However, such networks contain tens of millions of parameters and often cannot be deployed efficiently on portable and mobile devices because of the limited computing power and memory of those devices. Thus, there is a problem with storing modern neural network models and running them in real time, in particular models for classification and detection on video data.

Methods based on low-rank matrix and tensor approximations provide good compression [1, 2, 3, 4]. However, all previous approaches follow the same scheme: one-time compression with a significant loss of model quality, followed by fine-tuning to restore quality.

The claimed solution substantially improves on these schemes by applying compression and fine-tuning not once but iteratively, with automatic selection of the compression parameters.

Fig. 1 shows a general view of the main elements of the claimed system (100) for compressing an NN based on multistep tensor approximations. The system (100) includes a database (110), an NN (120), an NN compression device (300), a converted NN (400), an NN fine-tuning device (500), and an optimized NN (600).

The claimed system produces an NN (600) optimized in terms of memory footprint and speed by alternating two steps: processing the NN with the compression device (300) and processing the NN with the fine-tuning device (500). Iterating these two steps compresses the NN gradually and keeps the prediction quality of the resulting model above that of models obtained with existing analogues.

The alternation of steps ends once the number of operations performed by the optimized NN (600) when processing a signal, the number of model parameters, or its predictive quality falls below a specified threshold. The thresholds are set before the system (100) starts, as, respectively, the maximum number of operations/parameters and the maximum drop in NN quality that are considered acceptable.
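The alternation and its stopping thresholds can be sketched as a short loop. Everything in this sketch is a hypothetical placeholder: the callables `compress`, `fine_tune`, `evaluate`, and `count_params` stand in for the compression device (300), the fine-tuning device (500), and the measurements, and the toy dictionary model exists only to make the example runnable. Returning the last model that stayed within the quality threshold is a design assumption, not something the patent specifies.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_params: int          # stop once the model is at or below this size
    max_quality_drop: float  # largest acceptable drop relative to the initial quality

def iterative_compression(model, compress, fine_tune, evaluate, count_params,
                          budget, max_iters=20):
    """Alternate compression and fine-tuning until a threshold is reached."""
    baseline = evaluate(model)
    for _ in range(max_iters):
        candidate = fine_tune(compress(model))
        if baseline - evaluate(candidate) > budget.max_quality_drop:
            break                      # quality threshold crossed: keep the previous model
        model = candidate
        if count_params(model) <= budget.max_params:
            break                      # size target reached
    return model

# Toy stand-ins (assumptions): each compression removes 20% of the parameters
# and costs 0.03 quality; fine-tuning recovers 0.02.
toy = {"params": 1000, "quality": 0.90}
result = iterative_compression(
    toy,
    compress=lambda m: {"params": m["params"] * 4 // 5, "quality": m["quality"] - 0.03},
    fine_tune=lambda m: {"params": m["params"], "quality": m["quality"] + 0.02},
    evaluate=lambda m: m["quality"],
    count_params=lambda m: m["params"],
    budget=Budget(max_params=500, max_quality_drop=0.05),
)
```

A threshold on the number of operations could be added in the same way as the parameter threshold.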

The compression device (300) receives the NN (120) as input and outputs the converted NN (400), which contains fewer parameters and requires fewer operations to process a signal. Compression is achieved by reducing the number of parameters in the convolutional and fully connected layers of the NN (120).

The fine-tuning device (500) receives as input the converted NN (400) produced by the compression device and outputs the optimized NN (600), which has better predictive ability. The quality of the predictions of the NN (120) is improved by adjusting the model parameters through backpropagation using the database (110).

The operation of the compression device (300) consists in the sequential application of two modules:

• the rank selector module (310);

• the tensor approximator module (320).

The parameters (weight) of a convolutional layer form a four-dimensional tensor. The weight of a fully connected layer is a matrix (a two-dimensional tensor). Approximating the original tensor by means of a tensor decomposition means finding new tensors (factors) such that each element of the original tensor is a certain linear function of the elements of the new tensors. Each tensor decomposition is characterized by its rank, i.e., a parameter that determines the sizes of the factors.

An approximation is called low-rank if the total number of parameters in the new tensors is smaller than the number of parameters in the original tensor.

Thus, when the NN is processed by the compression device (300), the rank selector module (310) automatically selects, for each convolutional/fully connected layer of the NN, the rank of the tensor decomposition, which the tensor approximator module (320) then uses to replace the layer weight with its low-rank approximation.

As a result, when a convolutional/fully connected layer is first processed by the compression device (300), the original layer is replaced by a decomposed layer (411), which is a sequence of several convolutional/fully connected layers (the weights of the new layers are initialized with the factors of the tensor decomposition used for the approximation). When an already decomposed layer (411) is processed again by the compression device (300), the number of convolutional/fully connected layers does not change, but the number of parameters in each component (412) of the decomposed layer (411) decreases because the approximation rank is reduced.

The proposed system (100) can be used to compress and accelerate any NN that contains convolutional and fully connected layers.

The tensor approximator (320) and rank selector (310) modules can operate in several modes, which allows the compression device (300) to be used more effectively depending on the goal of compression (either removing redundant model parameters, or obtaining a model with the smallest number of parameters/floating-point operations whose prediction quality is not below a given level).

The compression system (100) uses a new Bayesian approach to automatic rank selection as one of the operating modes of the rank selector module (310). In addition, the tensor approximator module (320) includes a mode (not previously found in analogs) in which the tensor approximation is sought not for the original layer weight but for a tensor that is a modification of it, obtained by reindexing the elements of the weight.

The key difference between the proposed compression system (100) and existing analogs is its iterative nature, that is, the alternating processing of the NN by the compression device (300) and the fine-tuning device (500).

The iterative approach yields models whose prediction quality matches that of the compressed models produced by analogs, while containing fewer parameters.

The following describes how a single layer is compressed.

To compress NN layers, the tensor approximator module (320) can operate in three modes, “Tucker-2”, “CP-3”, and “Truncated-SVD”, which correspond to approximating the weight tensor in the Tucker decomposition, canonical polyadic decomposition, and singular value decomposition formats, respectively (see [14]).

The scheme for compressing a convolutional layer of the NN is given below (Fig. 2).

Each convolutional layer has a corresponding 4-dimensional weight tensor of size (K, K, Cin, Cout). The automatic parameter selection module (310) determines the rank of the tensor decomposition, which the compression module (320) then uses to approximate the weight tensor (the operation of the module (310) is described in detail below).

The compression module (320) replaces the 4-dimensional tensor with the approximation found, which is equivalent to replacing the original convolutional layer with a sequence of three convolutional layers with smaller weight tensors.

The compression module (320), operating in the “Tucker-2” mode, approximates the 4-dimensional weight tensor W of size (K, K, Cin, Cout) by a smaller 4-dimensional tensor W2 of size (K, K, R1, R2), where R1 < Cin and R2 < Cout, and two matrices of size (Cin, R1) and (Cout, R2), respectively (i.e., the approximation is a Tucker-2 tensor decomposition of rank (R1, R2)). The elements of the matrices are then used to create two 4-dimensional tensors W1 and W3 of size (1, 1, Cin, R1) and (1, 1, R2, Cout), respectively.

Thus, processing the input tensor with a convolutional layer with weight tensor W is replaced by sequentially processing the input tensor with three convolutional layers with weight tensors W1, W2, W3.
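The effect of the Tucker-2 replacement on the parameter count can be checked with a short calculation. The sketch below only counts the weights of the tensors W, W1, W2, and W3 using the sizes stated above; the function names and the example layer dimensions are illustrative assumptions, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in the original tensor W of size (K, K, Cin, Cout)."""
    return k * k * c_in * c_out


def tucker2_params(k, c_in, c_out, r1, r2):
    """Total number of weights in W1, W2, W3 after a rank-(R1, R2) Tucker-2 step."""
    w1 = 1 * 1 * c_in * r1   # (1, 1, Cin, R1)
    w2 = k * k * r1 * r2     # (K, K, R1, R2)
    w3 = 1 * 1 * r2 * c_out  # (1, 1, R2, Cout)
    return w1 + w2 + w3


def is_low_rank(k, c_in, c_out, r1, r2):
    """Low-rank in the sense used here: the factors hold fewer parameters than W."""
    return tucker2_params(k, c_in, c_out, r1, r2) < conv_params(k, c_in, c_out)


# Hypothetical layer: 3x3 kernel, 256 input and output channels, ranks (64, 64).
original = conv_params(3, 256, 256)
compressed = tucker2_params(3, 256, 256, 64, 64)
```

For this hypothetical layer the original tensor holds 589,824 weights and the three factors hold 69,632, so the approximation is low-rank under the definition given earlier.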

In the CP-3 mode, the compression module (320) replaces the layer with three layers whose weight tensors have sizes (1, 1, Cin, R), (K, K, 1, R), and (1, 1, R, Cout). This is done in the following steps.

First, the 4-dimensional weight tensor of size (K, K, Cin, Cout) is converted into a 3-dimensional tensor of size (K*K, Cin, Cout) by reindexing its elements. The 3-dimensional tensor is then approximated by three matrices of size (Cin, R), (K*K, R), and (Cout, R) (i.e., the approximation is a CP-3 tensor decomposition of rank R). The elements of the matrices are then used to create three 4-dimensional tensors W1, W2, and W3 of size (1, 1, Cin, R), (K, K, 1, R), and (1, 1, R, Cout), respectively.

Thus, processing the input tensor with a convolutional layer with weight tensor W is replaced by sequentially processing the input tensor with three convolutional layers with weight tensors W1, W2, W3, where the second layer is a channel-wise convolution.
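The reshaping of the CP-3 factor matrices into the three layer weights can be illustrated with NumPy. This is a sketch under stated assumptions: the factor matrices are taken as given (the decomposition algorithm itself is not shown), and the function names and the row-major reindexing of the (K, K) axes into K*K are choices made for the example, not details confirmed by the description.

```python
import numpy as np


def cp3_factors_to_conv_weights(a_in, b_spatial, c_fac, k):
    """Turn CP-3 factor matrices into the weight tensors W1, W2, W3.

    a_in:      (Cin, R)  factor for the input channels
    b_spatial: (K*K, R)  factor for the flattened spatial positions
    c_fac:     (Cout, R) factor for the output channels
    """
    c_in, r = a_in.shape
    c_out = c_fac.shape[0]
    w1 = a_in.reshape(1, 1, c_in, r)       # (1, 1, Cin, R)
    w2 = b_spatial.reshape(k, k, 1, r)     # (K, K, 1, R), channel-wise kernel
    w3 = c_fac.T.reshape(1, 1, r, c_out)   # (1, 1, R, Cout)
    return w1, w2, w3


def cp3_reconstruct(a_in, b_spatial, c_fac, k):
    """Rank-R approximation of W: W[s, i, o] = sum_r B[s, r] * A[i, r] * C[o, r]."""
    w3d = np.einsum("sr,ir,or->sio", b_spatial, a_in, c_fac)  # (K*K, Cin, Cout)
    return w3d.reshape(k, k, a_in.shape[0], c_fac.shape[0])   # (K, K, Cin, Cout)
```

Composing the three small tensors along the rank index gives back the same 4-dimensional tensor as `cp3_reconstruct`, which is why the three layers applied in sequence act like a single layer with the approximated weight.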

The scheme for compressing a fully connected layer of the NN is given below.

Each fully connected layer has a corresponding 2-dimensional weight tensor of size (Cin, Cout). The module for automatic determination of compression parameters (310) determines the rank R of the tensor decomposition, and the module (320) replaces the 2-dimensional tensor with the approximation found using “Truncated-SVD”. That is, the original layer is replaced by a sequence of two fully connected layers with smaller weight tensors of size (Cin, R) and (R, Cout).
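A truncated SVD of a fully connected weight can be sketched with NumPy as follows. The function name is hypothetical, and folding the singular values into the first factor is one possible convention assumed here; the description does not say how they are distributed between the two new layers.

```python
import numpy as np


def truncated_svd_layers(weight, rank):
    """Split a (Cin, Cout) weight into (Cin, R) and (R, Cout) factors.

    The rank-R product of the two factors is the best rank-R approximation
    of `weight` in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    first = u[:, :rank] * s[:rank]  # (Cin, R), singular values folded in
    second = vt[:rank, :]           # (R, Cout)
    return first, second


# Hypothetical example: a 64x32 weight of exact rank 6 is recovered without loss.
rng = np.random.default_rng(0)
weight = rng.normal(size=(64, 6)) @ rng.normal(size=(6, 32))
first, second = truncated_svd_layers(weight, rank=6)
x = rng.normal(size=(5, 64))
same_output = np.allclose(x @ weight, (x @ first) @ second)
```

Here the two factors hold 64*6 + 6*32 = 576 weights instead of 64*32 = 2048, and applying them in sequence reproduces the output of the original layer.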

The compression of a decomposed layer (213) is described below (Fig. 4).

When an already decomposed layer (213) is processed again by the compression device (300), the number of convolutional/fully connected layers does not change, but the number of parameters in each component (214) of the decomposed layer (213) decreases because the approximation rank is reduced. Thus, the compression device (300) outputs the decomposed layer (413).

Description of automatic rank selection.

The rank selector module (310) for automatic determination of compression parameters can operate in two modes: “bayesian” and “threshold”.

The “bayesian” mode.

Automatic rank selection in the “bayesian” mode is performed using a Bayesian approach. For convenience, we introduce two terms: extreme rank and weakened rank.

The extreme rank is the value at which the low-rank approximation of the weight tensor contains no redundancy.

The weakened rank is the value at which a certain amount of redundancy is retained in the tensor approximation after decomposition.

In the Bayesian approach, first, the extreme rank is found using GAS EVBMF (Global Analytic Solution of Empirical Variational Bayesian Matrix Factorization, see [13]), and second, the rank is weakened (i.e., the value of the extreme rank is increased).

GAS EVBMF can find the rank of a matrix automatically by performing Bayesian inference; however, it provides a suboptimal solution.

In contrast to the solution in [2], the claimed solution uses GAS EVBMF not to set the rank R for approximating the weight tensor, but only to determine the extreme rank (i.e., Rextr = Revbmf). To determine the extreme rank with GAS EVBMF for an approximation in the Tucker-2 tensor format, the claimed solution applies it separately to two unfoldings of the weight tensor (i.e., to two matrices that are obtained from the 4-dimensional tensor by reindexing its elements and have one dimension equal to the number of channels).
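The two unfoldings can be sketched with NumPy as shown below. GAS EVBMF itself is not implemented here: `estimate_rank` is a hypothetical callback that stands in for it, and the function names and the ordering of the non-channel axes inside each unfolding are assumptions made for the illustration.

```python
import numpy as np


def channel_unfoldings(weight):
    """Unfold a (K, K, Cin, Cout) weight along its two channel axes.

    Returns a (Cin, K*K*Cout) matrix and a (Cout, K*K*Cin) matrix.
    """
    _, _, c_in, c_out = weight.shape
    unfold_in = np.moveaxis(weight, 2, 0).reshape(c_in, -1)
    unfold_out = np.moveaxis(weight, 3, 0).reshape(c_out, -1)
    return unfold_in, unfold_out


def extreme_ranks(weight, estimate_rank):
    """Apply a matrix rank estimator separately to both unfoldings.

    `estimate_rank` is a placeholder for GAS EVBMF; any callable that maps
    a matrix to an integer rank can be plugged in.
    """
    unfold_in, unfold_out = channel_unfoldings(weight)
    return int(estimate_rank(unfold_in)), int(estimate_rank(unfold_out))
```

For a quick check, `np.linalg.matrix_rank` can be passed as `estimate_rank`; it is a numerical rank, not a Bayesian estimate, so it serves only to exercise the reshaping.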

The weakened rank Rweak depends linearly on the extreme rank and serves to retain more redundancy in the tensor approximation.

Setting R = Rweak makes fine-tuning easier and yields a compression step with higher accuracy.

The weakened rank is defined as follows: Rweak = Rinit - w * (Rinit - Rextr), where w is a hyperparameter called the weakening coefficient, 0 < w < 1, and Rinit is the initial rank of the tensor. This gives Rextr ≤ Rweak ≤ Rinit.

The optimal value of w lies in the range 0.5 ≤ w ≤ 0.9. If the initial rank is less than 21, the algorithm in the claimed solution considers such kernels small enough and does not compress them.
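The weakened-rank formula can be written out directly. In the sketch below, the function and constant names are hypothetical, and rounding the result up to an integer is an assumption, since the description does not state how a fractional Rweak is converted to a rank.

```python
import math

MIN_RANK_TO_COMPRESS = 21  # initial ranks below this value are left uncompressed


def weakened_rank(r_init, r_extr, w):
    """Rweak = Rinit - w * (Rinit - Rextr), with 0 < w < 1.

    Returns r_init unchanged for kernels considered too small to compress.
    Rounding up to an integer is an assumption of this sketch.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("weakening coefficient w must satisfy 0 < w < 1")
    if r_init < MIN_RANK_TO_COMPRESS:
        return r_init
    return math.ceil(r_init - w * (r_init - r_extr))
```

For example, with Rinit = 256, Rextr = 100, and w = 0.5 the formula gives 256 - 0.5 * 156 = 178, which lies between the extreme and the initial rank as required.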

Automatic rank selection in the “threshold” mode sets the decomposition rank of each layer to the minimum rank at which the quality of the entire model does not fall below a quality threshold specified by the user.
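The “threshold” rule can be illustrated with a simple search. This is a sketch only: the description does not specify how candidate ranks are enumerated or searched, so the ascending linear scan, the function name, and the `evaluate_with_rank` callback (which is assumed to return the quality of the whole model when the layer is decomposed at the given rank) are all hypothetical.

```python
def select_rank_by_threshold(candidate_ranks, evaluate_with_rank, quality_threshold):
    """Return the smallest candidate rank that keeps model quality acceptable.

    `evaluate_with_rank(rank)` is a hypothetical callback that measures the
    quality of the entire model with the layer approximated at `rank`.
    Returns None when no candidate reaches the threshold.
    """
    for rank in sorted(candidate_ranks):
        if evaluate_with_rank(rank) >= quality_threshold:
            return rank
    return None


# Toy quality curve used only to exercise the function: quality grows with rank.
toy_quality = lambda rank: min(1.0, rank / 100)
chosen = select_rank_by_threshold([8, 16, 32, 64, 128], toy_quality, 0.5)
```

A linear scan evaluates the model once per candidate; if quality can be assumed to grow monotonically with rank, a binary search over the candidates would need fewer evaluations.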

Description of the fine-tuning device

NN tuning device.

After the model has been compressed, its quality is measured on a validation set (for example, PASCAL VOC val [12]), and the model is trained using the fine-tuning device (500):

The target dataset must consist of images and annotations that contain the correct position of each object, its class, the coordinates of its bounding box, and, optionally, a segmentation mask.

The output of the NN is computed for the images of the training set and their annotations. A specially constructed loss function is evaluated, which takes into account the accuracy of determining the class and the boundaries of the object.

For a detailed description of the loss function, see [11].

Using backpropagation with the chosen optimization method (stochastic gradient descent with momentum 0.9 and an initial learning rate of 0.01), the weights of the entire model are adjusted. The process is repeated for several epochs until the desired model quality is achieved or the specified maximum number of iterations is reached.

In the experiments with Faster R-CNN [11], 180,000 iterations were used, with the learning rate reduced by a factor of 10 at steps 120,000 and 160,000.
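The step schedule described above can be expressed as a small function. The function name and the convention that the reduction takes effect from the milestone step onward are assumptions of this sketch; the values 0.01, 120,000, and 160,000 are taken from the text.

```python
def learning_rate(step, base_lr=0.01, milestones=(120_000, 160_000), gamma=0.1):
    """Piecewise-constant learning rate: multiplied by `gamma` at each milestone."""
    lr = base_lr
    for milestone in milestones:
        if step >= milestone:
            lr *= gamma
    return lr
```

With these defaults the rate is 0.01 for the first 120,000 steps, about 0.001 up to step 160,000, and about 0.0001 for the remaining steps of the 180,000-step run.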

The model and the training results are saved; at this point the tuning procedure can be considered complete, and the algorithm proceeds to the next stage.

Database and data processing.

Data preprocessing is part of the integrated system for compressing NNs, in particular computer vision models.

For modules that solve image classification, object detection, segmentation, or any other computer vision task, datasets containing images and annotations in the prescribed format must be loaded onto the computing device.

For the object detection task with datasets in the COCO (Common Objects in Context) [9] and PascalVOC [12] formats, as well as for the image classification task with datasets in the ImageNet, CIFAR, SVHN, and STL10 formats, the system (110) loads the data and interprets them.

Next, for the selected task, the target NN model (120) must be loaded for subsequent compression. The system (100) allows the use of pre-trained models distributed in a prescribed format that contain the weight values of each layer of the neural network. After loading, the target model has the following characteristics:

computational complexity, expressed as the number of floating-point operations;

number of parameters;

execution speed on the target architecture;

accuracy for the given task on a test subset of the dataset (for example, the COCO-2017 test set [9]).

Accuracy has a different mathematical meaning for each task. In the classification task, model quality is measured by the accuracy metric (the proportion of correct model predictions); in the object detection task, by the mAP metric, which is described in [7].

The selection of operating modes for the compression device is described below.

The choice of operating mode for the compression device (300) depends on the criteria imposed on the compressed model.

If the goal of compression is to remove redundant model parameters, the rank selector module (310) should use the “bayesian” mode, and the tensor approximator module (320) operates in the “Tucker-2” mode for the convolutional layers of the neural network and in the “Truncated-SVD” mode for convolutional layers with a (1, 1, Cin, Cout) kernel and for fully connected layers.

If the goal is to obtain a model with the smallest number of parameters/floating-point operations whose quality is not below a given level, the following modes should be selected: the “threshold” mode for the rank selector module (310), and, for the tensor approximator module (320), “CP-3” for the convolutional layers of the neural network and “Truncated-SVD” for convolutional layers with a (1, 1, Cin, Cout) kernel and for fully connected layers.
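The two mode-selection rules above can be summarized as a lookup. The function name, the goal labels "remove_redundancy" and "min_size_at_quality", and the layer-kind strings are hypothetical identifiers introduced for this sketch; the mode names themselves are those used in the description.

```python
def select_modes(goal, layer_kind, kernel_size=None):
    """Return (rank selector mode, tensor approximator mode) for one layer.

    goal:       "remove_redundancy" or "min_size_at_quality" (hypothetical labels)
    layer_kind: "conv" or "fully_connected"
    """
    if goal == "remove_redundancy":
        rank_mode, conv_mode = "bayesian", "Tucker-2"
    elif goal == "min_size_at_quality":
        rank_mode, conv_mode = "threshold", "CP-3"
    else:
        raise ValueError(f"unknown goal: {goal}")

    # 1x1 convolutions and fully connected layers are handled as matrices.
    if layer_kind == "fully_connected" or (layer_kind == "conv" and kernel_size == 1):
        return rank_mode, "Truncated-SVD"
    if layer_kind == "conv":
        return rank_mode, conv_mode
    raise ValueError(f"unknown layer kind: {layer_kind}")
```

A 1x1 convolution is routed to “Truncated-SVD” because its (1, 1, Cin, Cout) kernel carries the same information as a (Cin, Cout) matrix.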

An example of the operation of an object detection model is given below (see Fig. 3).

The Faster R-CNN object detection NN [11], on which the claimed solution demonstrates the effective operation of the automatic compression system (100), consists of several parts:

• Backbone. An artificial NN for feature extraction (210). The network architecture is a sequence of convolutional, nonlinear, and MaxPooling [6] layers (for example, it can be the ResNet [5] or VGG [10] architecture). The result of passing through each layer is an intermediate three-dimensional tensor.

Fig. 3 shows a convolutional layer (211) and a three-dimensional tensor (212), which is the output of the layer (211).

The network receives as input a three-channel, pre-normalized RGB image of size (h, w, 3). The result of passing through the network is the output intermediate three-dimensional tensor (220) of size (H, W, 1024).

• The output tensor (220) of the backbone network is fed to the region proposal network block (230), which includes an artificial NN (231) that predicts the regions where objects are potentially located (232), and a block (233) that filters overlapping regions.

• The NN (231) consists of a convolutional layer with a (3x3x1024x1024) kernel and two convolutional layers executed in parallel: an objectness layer with a (1x1x1024xnum_anchors) kernel and a bbox_rpn layer with a (1x1x1024x(4*num_anchors)) kernel, where num_anchors is the number of predefined anchor regions, or anchors (rectangular boxes with a fixed aspect ratio). Boxes of three sizes with three different aspect ratios (1:1, 1:2, 2:1) are typically used, i.e., num_anchors=9.

For the Faster R-CNN ResNet 50 C4 architecture, the box sizes are [8x8, 16x16, 32x32] pixels (for the 1:1 aspect ratio) on the output intermediate layer, which corresponds to 128, 256, and 512 pixels on the original image, respectively. These values are heuristic and work under the assumption that most objects fit well into such boxes. For each anchor region, a prediction is then made as to whether an object is present inside the region (using the objectness layer, whose output is denoted o in the text), and the bbox_rpn layer is used to compute corrections to the boundaries of the anchor region, namely the shift of the region (Δx, Δy) and the change in its height and width (t_h, t_w).
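The anchor set described above can be enumerated as follows. This is a sketch under assumptions not stated in the description: the stride of 16 is inferred from the 8-to-128 pixel correspondence, the area of each box is kept constant across aspect ratios (a common Faster R-CNN convention), ratios are read as height:width, and the function names are hypothetical.

```python
def anchor_shapes(sizes=(8, 16, 32), ratios=((1, 1), (1, 2), (2, 1)), stride=16):
    """Return (height, width) in image pixels for every size/ratio combination."""
    shapes = []
    for size in sizes:
        area = (size * stride) ** 2  # area of the 1:1 box in image pixels
        for ratio_h, ratio_w in ratios:
            height = (area * ratio_h / ratio_w) ** 0.5
            width = area / height
            shapes.append((height, width))
    return shapes


def rpn_output_channels(num_anchors):
    """Output channels of the objectness layer and of the bbox_rpn layer."""
    return num_anchors, 4 * num_anchors


shapes = anchor_shapes()
objectness_channels, bbox_channels = rpn_output_channels(len(shapes))
```

Three sizes and three ratios give num_anchors = 9, so the objectness layer has 9 output channels and the bbox_rpn layer has 36, matching the kernel sizes given for the NN (231).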

• As a result, the outputs of (231) are interpreted as a set of potentially detected objects, the so-called list of predicted regions of interest (232). Each element of this list (also called an RoI) corresponds to one region and is represented as [(o, x, y, h, w)], where o is the probability that an object is present in the region (the objectness measure), x, y are the coordinates of the upper-left corner of the region, and h, w are the height and width of the region.

• All regions of interest then undergo the non-maximum suppression procedure (233), which filters overlapping regions. The following operation is performed several times in succession. The region with the maximum value of the objectness measure o is selected, and its overlap with each of the remaining regions is computed as IoU (intersection over union, a generally accepted quality metric for object detection; see the definition in, e.g., [7]). Regions whose IoU exceeds the threshold of 0.7 are discarded.

The operation is then repeated for the next region among those remaining. Non-maximum suppression ends once a specified number of regions has been reached or the objectness threshold has been reached. The selected regions of interest are passed on for processing by the next part of the detection model.

• The selected list of regions is passed to the RoIPooling block (240), where the part of the feature tensor corresponding to each RoI is interpolated into a tensor of fixed size, in this case 14x14x512 (241). For a precise description of the interpolation method, see [8].

• Each such tensor (241) then enters the classifier block, called the head (250). The input tensor passes through several convolutional layers, after which the three-dimensional tensor of the intermediate layer is flattened into a one-dimensional vector and fed to two parallel fully connected layers: a classifier of size n_classes+1 (with a softmax nonlinearity, which turns the vector into an analogue of a probability vector in which every element is non-negative and all elements sum to one) and a linear regressor of size 4*(n_classes+1). Here n_classes is the number of object classes to be recognized (20 for PascalVOC and 80 for COCO). One value per class corresponds to its probability, and the 4 values from the regressor correspond to the offsets of the box relative to the initial approximation.

• Thus, for each input image, the device returns a list (260) of entries, each containing the most probable class (c) and the coordinates of the box surrounding the object (x, y, h, w).
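The non-maximum suppression step described in the list above can be sketched as follows. This is a minimal illustration, not the implementation used in the patent: the function names `iou` and `nms` and the parameters `max_regions` and `min_objectness` are hypothetical, and each RoI is assumed to be a tuple (o, x, y, h, w) with (x, y) the top-left corner, as in the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, h, w)."""
    ax, ay, ah, aw = a
    bx, by, bh, bw = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = ah * aw + bh * bw - inter
    return inter / union if union > 0 else 0.0


def nms(rois, iou_threshold=0.7, max_regions=None, min_objectness=0.0):
    """Greedy non-maximum suppression over RoIs given as (o, x, y, h, w).

    max_regions and min_objectness are assumed stopping criteria that mirror
    the "specified number of regions" and "objectness threshold" in the text.
    """
    remaining = sorted(rois, key=lambda r: r[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # region with the highest objectness
        if best[0] < min_objectness:
            break
        kept.append(best)
        if max_regions is not None and len(kept) >= max_regions:
            break
        # Discard regions that overlap the selected one too strongly.
        remaining = [r for r in remaining
                     if iou(best[1:], r[1:]) <= iou_threshold]
    return kept


rois = [(0.9, 0, 0, 10, 10), (0.8, 0, 1, 10, 10), (0.7, 50, 50, 10, 10)]
kept = nms(rois)
```

In this example the second region overlaps the first with IoU = 90/110 ≈ 0.82, which exceeds 0.7, so it is discarded; the third region does not overlap and is kept.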

In the Faster R-CNN detection architecture on which the operation of the compression system (100) is demonstrated, only the layers of the backbone block are modified by the compression device (300).

Optimized neural network model.

After the above compression and fine-tuning procedures have been repeated iteratively, the architecture of the artificial neural network changes: in its final form it is the original architecture with the selected convolutional layers modified (each such convolutional layer is converted into three layers, as described above).

As a result, the number of parameters and the number of operations change, while the quality of the network as a whole stays within the desired limit.
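The effect on the parameter count can be illustrated with a short calculation. The sketch below assumes the usual Tucker-2 layout from [2], in which a k×k convolution is replaced by a 1×1 convolution, a k×k convolution on the reduced channels, and another 1×1 convolution; biases are ignored, and the channel counts and ranks in the example are hypothetical values chosen only for illustration.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Number of weights in the three layers that replace the convolution.

    1x1 conv: c_in -> r_in, k x k conv: r_in -> r_out, 1x1 conv: r_out -> c_out.
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


# Hypothetical layer: 256 -> 256 channels, 3x3 kernel, both ranks equal to 64.
original = conv_params(256, 256, 3)
decomposed = tucker2_params(256, 256, 3, 64, 64)
ratio = original / decomposed
```

Here the original layer has 589,824 weights and the decomposed one 69,632, so this single layer shrinks by a factor of roughly 8.5. Lowering the ranks on a later compression iteration reduces the count further without changing the number of layers.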

In the case of the object detection model described above, where the backbone has a ResNet or VGG architecture, the claimed system, with the rank selector module operating in ‘bayesian’ mode and the tensor approximator module in ‘Tucker-2’ mode, yields compressed models that are 1.5 times lighter and whose prediction quality is 0.4% better than that of the original models.

FIG. 5 shows an example of the general form of a computing system (700) on which the iterative neural network compression system (100) can be implemented.

In general, the system (700) comprises, connected by a common data bus, one or more processors (701), memory means such as RAM (702) and ROM (703), input/output interfaces (704), input/output devices (705), and a networking device (706).

The processor (701) (or several processors, a multi-core processor, etc.) can be selected from the range of devices in wide use today, for example from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc.

RAM (702) is random-access memory intended to store machine-readable instructions executed by the processor (701) to perform the required logical data processing operations. RAM (702) typically contains the executable instructions of the operating system and of the relevant software components (applications, software modules, etc.). The available memory of a graphics card or graphics processor may also serve as RAM (702).

ROM (703) is one or more persistent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

Various types of I/O interfaces (704) are used to support the operation of the components of the system (700) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, Type-C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

Various I/O means (705) are used for user interaction with the computing system (700), for example a keyboard, display (monitor), touch display, touchpad, joystick, mouse, light pen, stylus, touch panel, trackball, speakers, microphone, augmented reality devices, optical sensors, tablet, indicator lights, projector, camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

The networking means (706) provides data transmission over an internal or external computer network, for example an intranet, the Internet, a LAN, etc. One or more of the following may be used as the means (706), without limitation: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc.

In addition, satellite navigation means such as GPS, GLONASS, BeiDou or Galileo may be used as part of the system (700).

The specific choice of elements of the system (700) for implementing various hardware and software architectures may vary, provided the required functionality is preserved.

These application materials present a preferred disclosure of the implementation of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the claimed scope of legal protection and are obvious to those skilled in the relevant art.

Bibliography:

[1] Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. CoRR, abs/1405.3866.

[2] Kim, Y. D., Park, E., Yoo, S., Choi, T., Yang, L., & Shin, D. (2015). Compression of deep convolutional neural networks for fast and low power mobile applications. International Conference on Learning Representations.

[3] Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., & Lempitsky, V. (2015). Speeding-up convolutional neural networks using fine-tuned CP-decomposition. International Conference on Learning Representations.

[4] Zhang, X., Zou, J., He, K., & Sun, J. (2016). Accelerating deep convolutional networks for classification and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10), 1943–1955.

[5] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2016.90.

[6] Nagi, J., Ducatelle, F., Di Caro, G. A., Cireşan, D., Meier, U., Giusti, A., Gambardella, L. M. (2011). Max-pooling convolutional neural networks for vision-based hand gesture recognition. 2011 IEEE International Conference on Signal and Image Processing Applications, ICSIPA 2011. https://doi.org/10.1109/ICSIPA.2011.6144164.

[7] mAP (mean Average Precision) for Object Detection - Jonathan Hui - Medium https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173.

[8] Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2015.169.

[9] Tsung-Yi Lin, Genevieve Patterson, Matteo R. Ronchi, Yin Cui, Michael Maire, Serge Belongie, … Piotr Dollár. (2018). COCO - Common Objects in Context. COCO Dataset, 740–741. Retrieved from http://cocodataset.org/#home.

[10] Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings.

[11] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2016.2577031.

[12] Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision. https://doi.org/10.1007/s11263-009-0275-4.

[13] Nakajima, S., Tomioka, R., Sugiyama, M., & Babacan, S. D. (2012). Perfect dimensionality recovery by variational Bayesian PCA. In Advances in Neural Information Processing Systems, pages 971–979.

[14] Cichocki, A., Lee, N., Oseledets, I., Phan, A. H., Zhao, Q., & Mandic, D. P. (2016). Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends® in Machine Learning, 9(4-5), 249-429.

Claims

1. A system for compressing artificial neural networks (NN) based on iterative application of tensor approximations, comprising: a compression device, which consists of a module for automatic determination of compression parameters (rank selector module) and a module that replaces the parameters of convolutional/fully connected layers of the NN with their low-rank approximation obtained using tensor/matrix decompositions (tensor approximator module), and a fine-tuning device, wherein:

• the compression device receives the NN as input, the rank selector module automatically selects, for each convolutional/fully connected layer of the NN, the rank of the tensor decomposition used to approximate the weight tensor, after which the tensor approximator module replaces the weight of the layer with its low-rank approximation so that the total number of parameters of the new tensors is less than the number of parameters in the original tensor,

as a result of which, when a convolutional/fully connected layer is processed by the compression device for the first time, the original layer is replaced by a decomposed layer, which is a sequence of several convolutional/fully connected layers, the weights of the new layers being initialized with the factors of the tensor decomposition used for the approximation, and when an already decomposed layer is compressed again, the number of convolutional/fully connected layers does not change, but the number of parameters in each component of the decomposed layer decreases due to a decrease in the approximation rank;

• the fine-tuning device receives the transformed NN from the compression device as input and outputs an optimized NN, which has better predictive ability owing to the adjustment of the model parameters, performed by the backpropagation method using a database.

2. The system according to claim 1, characterized in that the weight of the convolutional layer is a four-dimensional tensor, and the weight of the fully connected layer is a matrix (two-dimensional tensor).