RU2656990C1 - System and method for a shift-invariant artificial neural network

Info

Publication number: RU2656990C1
Application number: RU2017131720A
Authority: RU
Inventors: Владимир Петрович Парамонов; Виталий Сергеевич ЛАВРУХИН; Алексей Станиславович ЧЕРНЯВСКИЙ
Original assignee: Самсунг Электроникс Ко., Лтд.
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2018-06-07

Abstract

FIELD: information technology.

SUBSTANCE: the group of inventions relates to artificial neural networks and can be used to process and recognize signals such as images, video or sound. The method comprises the steps of: feeding input data to the current layer of a trained neural network; processing the input data to obtain output data; and, if the number of the current layer of the neural network is less than N, proceeding to the next layer of the neural network, or, if the number is N, outputting the output data. The processing step comprises either applying a shift operation to the input data in one of the predetermined directions, calculating a linear weighted sum of the input data, and adding the shifted input data and the weighted sum result to obtain the output data, this processing being applied for each predetermined direction; or applying a shift operation to the input data in one or more predetermined directions to obtain shifted input data in each of the predetermined directions, calculating a linear weighted sum of the shifted input data in each of the predetermined directions, and summing the input data and the weighted sum results for each of the predetermined directions to obtain the output data.

EFFECT: reducing the consumption of computing resources while maintaining a high degree of recognition accuracy.

10 cl, 14 dwg

Description

FIELD OF THE INVENTION

This invention relates to the field of machine learning, namely to shift-invariant artificial neural networks. In particular, the present invention relates to the processing and recognition of signals, such as images, video or sound, using such artificial neural networks.

BACKGROUND

Neural networks are currently among the effective approaches to recognition, regression, prediction and/or classification of signals. In particular, a convolutional neural network (CNN) is typically used to process and recognize image, video, sound (e.g., speech) and 3D data (e.g., medical data). A CNN learns and develops the necessary hierarchy of features (in other words, convolution kernels) during training on an exemplary labeled data set for which the correct recognition answers are given. In other words, during training the neural network selects weights for the convolution kernels; in the first layers the weights correspond to general features inherent in all signals of a given type, for example, images. For images, such features may be, for example, lines at different angles. The number of layers in a neural network determines its depth. Going deeper, i.e. to subsequent layers, the network begins to extract more complex features, and the convolution kernel weights become more specific to the given classes or to the particular regression task. The last layer of the convolutional neural network performs classification or regression, depending on the problem statement, based on information received from the preceding layers. The convolution kernel weights are selected, for example, by the error backpropagation method.

Thus, a CNN can recognize data (for example, classify an object in an image) on the basis of these kernels and output either the single most probable object class name (Top-1 class) or several most probable class names, for example the 5 most probable class names (Top-5 classes). Classification quality is usually compared using the “Top-1 error” and “Top-5 error” metrics. Under the “Top-1 error” metric, the classification of the input signal by the neural network is considered correct if the correct answer matches the Top-1 class. Under the “Top-5 error” metric, the classification is considered correct if the correct answer is among the Top-5 classes output by the network.
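
The two metrics described above can be illustrated with a minimal Python sketch. The function name `top_k_error` and the score values are hypothetical and chosen only for illustration; the sketch assumes that the network outputs one score per class and that a higher score means a more probable class.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    wrong = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from the highest score to the lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(labels)


# Hypothetical scores for 4 samples over 6 classes (illustrative values only).
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.30, 0.25, 0.20, 0.10, 0.05],
    [0.05, 0.10, 0.15, 0.20, 0.22, 0.28],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
]
labels = [0, 2, 0, 5]  # correct class index for each sample

top1 = top_k_error(scores, labels, 1)
top5 = top_k_error(scores, labels, 5)
```

In this toy data only the first sample is correct under Top-1, while the second sample becomes correct under Top-5 because its true class is ranked second.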

In recent years the depth of the neural networks in use has increased, improving accuracy on various machine pattern recognition tasks, while the size of convolution kernels has decreased. In particular, the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition has been held since 2010; in 2010 the Top-1 classification error obtained with known shallow networks was 28.2%, and in 2011 it was 25.8%. In 2012 the depth of the network used (AlexNet) increased to 8 layers and the error was 16.4%; this result was improved in 2013, when an error of 11.7% was achieved. In 2014 the depth of the neural network used was already 19 layers, which allowed objects to be recognized with an error of only 7.3% (VGG network). In the same year another well-known network, GoogleNet, was developed, with a depth of 22 layers and an error of 6.7%. The best result was achieved in 2015 with the ResNet network, which has a depth of 152 layers and a 3×3 convolution kernel; the error was only 3.57%. However, the use of such deep convolutional neural networks requires high consumption of computing resources, namely a large amount of memory, since repeated application of convolution operations is time-consuming and laborious.

One known solution describing a parallel convolutional neural network is disclosed, for example, in patent document WO 2014/105865 A1 (Google Inc., “System and method for parallelizing convolutional neural networks”). The known CNN is implemented by a plurality of convolutional neural networks, each residing on a respective processing node. Each CNN has a plurality of layers. A subset of the layers is interconnected between the processing nodes so that activation signals are propagated forward across the nodes. A disadvantage of the known solution is its high consumption of computing resources.

Document US 20140180986 A1 (Google Inc., “System and method for addressing overfitting in a neural network”) proposes a system for training a neural network. A switch is associated with feature detectors in at least some layers of the neural network. For each training example, the switch selectively disables each of the feature detectors in accordance with a predetermined probability. The weights from each training example are then normalized for applying the neural network to test data. However, when this known solution is implemented, memory consumption is also large owing to the huge number of parameters in the network.

Thus, the problem of the prior art is that the training process is a difficult mathematical and computational task that requires significant hardware resources, time and labor.

The objective of the present invention is to create a faster and simpler neural network architecture for processing images, video or sound. In particular, it is important to design a simple and efficient neural network while minimizing the number of hyperparameters (for example, the number of layers and the kernel sizes) that a person must specify when configuring the neural network architecture.

SUMMARY OF THE INVENTION

This objective is achieved by the method and system characterized in the independent claims. Additional embodiments of the present invention are presented in the dependent claims.

The present invention proposes an efficient approach in which shift and linear weighted sum operations are used instead of the convolution operation. Layers with laborious convolution operations are thereby avoided, which lowers computational cost while maintaining high recognition accuracy; signal elements are still considered in context, i.e. taking into account some neighborhood of the element under consideration. This advantage is achieved by reducing the linear weighted sum operation to the well-known general matrix multiplication (GEMM) algorithm, with no additional memory required. GEMM operations are optimized for GPU/CPU, which results in lower consumption of computing resources.

According to a first aspect of the claimed group of inventions, there is provided a method for processing signals using a neural network having N layers, comprising the steps of:

- feeding input data to the current layer of a trained neural network, the input data being a set of numerical values of the signal to be processed;

- processing the input data to obtain output data; and

- if the number of the current layer of the neural network is less than N, proceeding to the next layer of the neural network and repeating the steps of the method using the obtained output data as input data, and

- if the number of the current layer of the neural network is equal to N, outputting the obtained output data;

wherein the processing step comprises applying a shift operation to the input data in one of the predetermined directions to obtain shifted input data, calculating a linear weighted sum of the input data, and summing the shifted input data and the result of the weighted sum calculation to obtain the output data, this processing being applied for each predetermined direction; or applying a shift operation to the input data in one or more predetermined directions to obtain shifted input data in each of the one or more predetermined directions, calculating a linear weighted sum of the shifted input data in each of the one or more predetermined directions, and summing the results of the weighted sum calculation for each of the one or more predetermined directions and the input data to obtain the output data.
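
The method steps and the two alternatives of the processing step can be sketched in Python as follows. This is only an illustrative reading of the text, not the patented implementation. It rests on several assumptions that the text above does not state: the data are stored as C×H×W nested lists, cells vacated by a shift are filled with zeros, the weighted sum is taken across channels at each position with a square C×C weight matrix per direction, the first alternative is chained once per direction, and any nonlinearity between layers is omitted. All function names are hypothetical.

```python
def shift(x, dy, dx):
    """Shift each channel of x (C x H x W) by (dy, dx); vacated cells become 0 (assumption)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                si, sj = i - dy, j - dx  # position the value is taken from
                if 0 <= si < H and 0 <= sj < W:
                    out[c][i][j] = x[c][si][sj]
    return out


def weighted_sum(x, w):
    """Linear weighted sum across channels: out[k][i][j] = sum_c w[k][c] * x[c][i][j]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[k][c] * x[c][i][j] for c in range(C)) for j in range(W)]
             for i in range(H)] for k in range(len(w))]


def add(a, b):
    """Element-wise sum of two arrays of identical shape."""
    return [[[a[c][i][j] + b[c][i][j] for j in range(len(a[0][0]))]
             for i in range(len(a[0]))] for c in range(len(a))]


def process_variant_a(x, weights, directions):
    """First alternative: for each direction, out = shift(x) + weighted_sum(x)."""
    for w, (dy, dx) in zip(weights, directions):
        x = add(shift(x, dy, dx), weighted_sum(x, w))
    return x


def process_variant_b(x, weights, directions):
    """Second alternative: out = x + sum over directions of weighted_sum(shift(x))."""
    out = x
    for w, (dy, dx) in zip(weights, directions):
        out = add(out, weighted_sum(shift(x, dy, dx), w))
    return out


def run_network(x, layers, process):
    """Pass data through the N layers; each layer's output is the next layer's input."""
    for weights, directions in layers:
        x = process(x, weights, directions)
    return x  # output of layer N


# One channel, 2x2 signal, a single direction (one pixel to the right).
x = [[[1, 2], [3, 4]]]
layer = ([[[2.0]]], [(0, 1)])  # (weights per direction, directions)
out_a = process_variant_a(x, *layer)
out_b = process_variant_b(x, *layer)
out_two_layers = run_network(x, [layer, layer], process_variant_b)
```

With these toy values the shifted input is [[0, 1], [0, 3]], so the first alternative adds it to the weighted input 2·x, while the second adds the weighted shifted input to x itself.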

The calculation of the linear weighted sum reduces to a general matrix multiplication (GEMM) of two matrices.
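
The reduction to GEMM can be illustrated with a short Python sketch. It assumes, as an interpretation rather than a statement of the patent, that the weighted sum is taken across channels at each position, so that a K×C weight matrix multiplied by the C×(H·W) data matrix gives the result directly. The helper names are hypothetical, and the plain-Python `gemm` stands in for an optimized library routine.

```python
def gemm(a, b):
    """Plain matrix product of a (M x K) and b (K x N), given as nested lists."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b)))
             for n in range(len(b[0]))] for m in range(len(a))]


def flatten_channels(x):
    """View C x H x W data as a C x (H*W) matrix, one row per channel."""
    return [[value for row in channel for value in row] for channel in x]


def unflatten_channels(m, height, width):
    """Inverse of flatten_channels: K x (H*W) matrix back to K x H x W."""
    return [[row[i * width:(i + 1) * width] for i in range(height)] for row in m]


def weighted_sum_gemm(x, w):
    """Linear weighted sum across channels, computed as one matrix product."""
    height, width = len(x[0]), len(x[0][0])
    return unflatten_channels(gemm(w, flatten_channels(x)), height, width)


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 channels of size 2x2
w = [[1, 0], [1, -1]]                      # 2 output channels from 2 input channels
y = weighted_sum_gemm(x, w)
```

In this pure-Python sketch the flattening step copies the data. For data stored contiguously in channel-major order the same step would be a reinterpretation of the existing buffer, which is one plausible reading of the statement that no additional memory is needed.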

The method further comprises preliminary steps of:

- obtaining an exemplary labeled data set for training;

- training the neural network on the obtained exemplary labeled data set by determining the corresponding weights.

The signal to be processed may be an image, and a predetermined direction may be the direction toward one of the adjacent neighboring pixels, the neighboring pixels being determined solely by the topology of the data (for example, 1D (sound), 2D (image), 3D (video), etc.).

The output data are a name of the signal, a classification of the signal, a list of the most probable names of the signal or, in the case of a regression task, another sought signal.

According to another aspect of the claimed group of inventions, there is provided a system for processing signals using a neural network having N layers, comprising:

- a receiving device configured to receive input data, the input data being a set of numerical values of the signal to be processed;

- a memory configured to store the received input data;

- a processing device configured to read the input data from the memory and to perform the steps of the signal processing method according to the first aspect, the processing device being further configured to write the output data that it produces to the memory.

Said processing device is a central processing unit (CPU) and/or a graphics processing unit (GPU).

The technical effects of the present invention are as follows:

- simplicity: fewer hyperparameters need to be tuned when developing a new neural network for a specific task;

- speed: lower time cost and lower memory consumption on CPU and GPU, because GEMM is performed directly instead of the convolution operation, without additional resource-intensive preparatory operations, as described below;

- accuracy: the same degree of accuracy as the convolutional neural networks currently in use;

- mobile processing (on a mobile device): no need for additional hardware or cloud support.

Thus, the technical result achieved by using the present group of inventions is reduced consumption of computing resources when recognizing signals with a neural network, while maintaining a high degree of recognition accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features and advantages of the present invention will become apparent upon reading the following description and viewing the accompanying drawings, in which:

FIG. 1 is a schematic illustration of a unit block of a network in accordance with a known convolutional neural network;

FIG. 2a is a schematic illustration of one variant of a unit block of a network in accordance with an embodiment of the invention;

FIG. 2b is a schematic illustration of another variant of a unit block of a network in accordance with an embodiment of the invention;

FIG. 3 illustrates the reduction of the computations to multiplication of two-dimensional matrices;

FIG. 4a is a schematic illustration of reducing the application of two parallel shift operations in different directions to the simultaneous application of shift operations in these directions, in accordance with an embodiment of the invention;

FIG. 4b illustrates an exemplary representation of two directions with respect to the image under consideration;

FIG. 5a is a schematic illustration of a unit block of a network for a set of sequential images, with simultaneous application of shift operations in three directions, in accordance with an embodiment of the invention;

FIG. 5b illustrates an exemplary representation of three directions with respect to the set of sequential images under consideration;

FIG. 6a is a graph comparing the training accuracy of the proposed neural network with that of the ResNet neural network on the CIFAR-10 data set;

FIG. 6b is a graph comparing the training accuracy of the proposed neural network with that of the ResNet neural network on the CIFAR-100 data set;

FIG. 6c is a graph comparing the training accuracy of the proposed neural network with that of the ResNet neural network on the ImageNet data set;

FIG. 7a illustrates test results on the CIFAR-10 data set for the known ResNet neural network and the proposed neural network;

FIG. 7b illustrates test results on the CIFAR-100 data set for the known ResNet neural network and the proposed neural network;

FIG. 7c illustrates test results on the ImageNet data set for the known ResNet neural network and the proposed neural network.

DETAILED DESCRIPTION OF THE INVENTION

Various embodiments of the present invention are described in more detail below with reference to the drawings. However, the present invention can be embodied in many other forms and should not be construed as limited to any particular structure or function presented in the following description. Based on the present description, a person skilled in the art will understand that the scope of legal protection of the present invention covers any embodiment of the present invention disclosed herein, regardless of whether it is implemented independently or in combination with any other embodiment of the present invention. For example, a system may be implemented or a method may be practiced using any number of the embodiments set forth herein. In addition, it should be understood that any embodiment of the present invention disclosed herein may be embodied by one or more elements of the claims.

The word “exemplary” is used herein to mean “serving as an example or illustration”. Any embodiment described herein as “exemplary” should not necessarily be construed as preferred or advantageous over other embodiments.

An embodiment of the present invention is described below using image processing as an example; however, the claimed invention is also applicable to processing signals of other types, for example video, sound (speech) and 3D medical data. In this document, input data may mean any data expressed as a matrix of numbers making up an image, a series of numbers making up an audio stream, multidimensional arrays of numbers making up a video stream or medical data, and so on. For example, in a grayscale image the numerical value of a pixel ranges from 0 (black pixel) to 255 (white pixel). In a color image, for example, each pixel has 3 channels: B (blue), G (green) and R (red), each of which takes a value in the range from 0 to 255.

By definition, convolution is a mathematical operation applied to two functions that produces a third function, which can sometimes be regarded as a modified version of the first. In this case the second function is called the convolution kernel. The essence of the convolution operation (or filtering with a kernel) is that each fragment of the processed image is multiplied elementwise by a matrix (the convolution kernel), and the result is summed and written to the corresponding position of the output image. The main property of such filtering is that, with proper normalization of the signal, the output value at each pixel is larger the more closely the image fragment in the neighborhood of that pixel resembles the kernel itself. Thus an image convolved with some kernel yields another image of the same size, each pixel of which indicates the degree of similarity of the image fragment to the convolution kernel, i.e. takes a value from 0 (the pixel and its neighborhood, i.e. the image fragment, do not resemble the kernel) to 1 (the pixel and its neighborhood coincide with the kernel). This other image is called a feature map. To improve the time and memory efficiency of the network architecture, known convolutional neural networks resort to small kernels. As the kernel size decreases, the architecture of such networks becomes simpler but considerably deeper in order to achieve greater accuracy, as shown in the discussion of the prior art. At present, as a rule, 3×3 convolution kernels are used, or 1×3 and 3×1 kernels are applied in sequence (separable convolution). Separable convolution reduces the number of operations required for the task. According to the present invention, the laborious convolution operation is not used, and the operation that is used can be called a degenerate convolution with a 1×1 kernel. The local nature of the data processing is nevertheless preserved, since the connectivity of the processed signal is still taken into account. In other words, the pairwise relationship between neighboring pixels is taken into account.
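The filtering described above (multiply each image fragment elementwise by the kernel, sum, and write the result to the same position) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation from the patent; the function name `conv2d_same`, the zero padding at the border, and the example data are assumptions, and the values are not normalized to the 0..1 range mentioned in the text.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image` (both lists of lists).

    The border is zero-padded (an assumption of this sketch) so that the
    output feature map has the same size as the input image.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for r in range(k_rows):
                for s in range(k_cols):
                    yy, xx = y + r - r_off, x + s - c_off
                    # Positions outside the image contribute zero.
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += kernel[r][s] * image[yy][xx]
            out[y][x] = acc
    return out


# Hypothetical example: a 3x3 kernel that responds to a vertical line.
kernel = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
image = [[0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
feature_map = conv2d_same(image, kernel)
```

The response is largest where the image fragment matches the kernel (the middle of the vertical line) and zero where there is no overlap with the line.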

As a rule, the convolution operation is fairly slow, and the kernel size is an additional parameter that must be chosen when designing a network for a specific task, because both the pixel and some neighborhood around it have to be considered. In particular, in a traditional network with a 3×3 convolution kernel, it is necessary to consider the current pixel together with its 8 neighboring pixels, including the diagonal ones, and the 9 corresponding weights.

For the well-known AlexNet neural network, convolution operations account for most of the GPU and CPU run time during image processing: 95% of the GPU time and 90% of the CPU time are spent on convolutions in the network layers. By contrast, the activation operation (ReLU), a function applied to map the input data nonlinearly in order to improve generalization, and the pixel pooling operation, which reduces the dimensionality of the generated feature maps while essentially keeping only the significant information, do not require much time or memory.

FIG. 1 is a schematic representation of a unit block of a traditional neural network, which is applied N times, i.e. as many times as the network is deep. According to FIG. 1, in a traditional neural network the input data I={I_c} of an image of size P×Q, where P is the image width, Q is its height, and c=1…C is the channel number, follow two processing paths; the input data are a matrix of the numerical values of the image. In particular, for the color image channels B (blue), G (green) and R (red), 3 numerical channel values are considered for each pixel, and so on. The input data are thus essentially a set of C feature maps of size P×Q. On the first path, the activation function (ReLU) is applied to the input data, after which the input data undergo a convolution operation; on the second path, the input data pass unchanged. The kernel size depends on the number C of input channels, since each kernel is itself multi-channel (the number of channels in a kernel equals the number of input channels). The results of the two paths are then added to obtain the output data O={O_k}, which are essentially a set of K feature maps of size P×Q. The two paths are then traversed again, with the feature maps fed to the block as input data, and so on. It should be noted, however, that the activation operation may also be applied after the convolution operation, after the summation operation, or not at all; this is left to the engineer's discretion for the task at hand. In the example of FIG. 1, a 3×3 convolution kernel is used; the figure also shows a region comprising the current pixel and its neighborhood. Crosses in this region mark the pixel positions that must be considered to obtain information about the current pixel and its neighborhood in one layer of the neural network, 9 parameters in total. Applying each convolution kernel yields one image of size P×Q, the size of the original image; applying K kernels therefore yields a P×Q×K array, where K is the number of channels in the output image (the number of output channels equals the number of kernels). For example, applying 10 kernels yields data of size P×Q×10, since 10 different convolution results are obtained.

FIGS. 2a and 2b schematically show two variants of a unit block of the neural network disclosed in the present invention; this block is likewise applied N times, corresponding to the depth of the network. In the proposed neural network, the input data I={I_c}, received by a corresponding receiving device, also follow two processing paths, where c is the channel number. In the first variant, shown in FIG. 2a, the activation function (ReLU) is applied to the input data on the first path, after which the input data undergo the computation of a linear weighted sum ($\sum_{c=1}^{C} w_c^k I_c$, where w_c^k are the weights and k=1…K is the kernel number, which essentially corresponds to the number of the corresponding output channel), so that general matrix multiplication (GEMM) is applied instead of convolution operations. On the second path, the input data pass unchanged but with a shift in one of the directions x_i, where i=1…4, giving 4 parameters (weights representing the relationship with each of the 4 neighboring pixels, horizontally and vertically only). The shift operation is understood herein as a logical shift of the elements of a set of numerical values in a given direction. In particular, when the shift operation is applied to the input data in the direction x_i, the numerical values in the matrix are shifted by one position in that direction and the vacated positions are filled with zeros, which in this case is analogous to multiplying the matrix of numerical values of the image by the corresponding shift matrix. The set of numerical values obtained by applying the shift operation is hereinafter referred to as the shifted input data.
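The shift operation described above, and its correspondence to multiplication by a shift matrix, can be illustrated with a small sketch. The direction names and the helper functions `shift` and `matmul` are assumptions introduced for illustration; the text does not say which of the directions x_i is "right", "left", "up" or "down".

```python
def shift(matrix, direction):
    """Shift a 2D list by one position; vacated cells are filled with 0.

    direction: 'right', 'left', 'down' or 'up' (hypothetical names).
    """
    rows, cols = len(matrix), len(matrix[0])
    dy, dx = {"right": (0, 1), "left": (0, -1),
              "down": (1, 0), "up": (-1, 0)}[direction]
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sy, sx = y - dy, x - dx  # source cell that moves into (y, x)
            if 0 <= sy < rows and 0 <= sx < cols:
                out[y][x] = matrix[sy][sx]
    return out


def matmul(a, b):
    """Plain matrix product of two 2D lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Shift matrix with ones on the superdiagonal: multiplying the image by it
# from the right moves every column one position to the right.
S = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]

shifted = shift(image, "right")
via_matrix = matmul(image, S)
```

Both `shifted` and `via_matrix` hold the same values here; a shift along the other axis would correspond to multiplying by a shift matrix from the left.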

The results of the two paths are then added to obtain the output data O={O_k}, which are feature maps. The two paths are then traversed again, with the feature maps as input data, and so on.

In the second variant, shown in FIG. 2b, the activation function (ReLU) is applied to the input data on the first path, after which the input data undergo a shift operation in one of the directions x_i, where i=1…4. The linear weighted sum ($\sum_{c=1}^{C} w_c^k I_c$) is then computed on the first path. On the second path, the input data pass unchanged. The results of the two paths are then added to obtain the output data O={O_k}, and the two paths are traversed again with the output data as input data, and so on. According to the present disclosure, the activation operation may also be applied after the convolution operation, after the summation operation, or not at all; this is left to the engineer's discretion for the task at hand.
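The two unit-block variants of FIG. 2a and FIG. 2b can be compared side by side in a small sketch. It assumes that the number of output maps K equals the number of input maps C (so the two paths can be added elementwise), a single fixed shift direction, and hypothetical helper names; none of these details are fixed by the text.

```python
def relu_map(channel):
    """Apply ReLU elementwise to one feature map (2D list)."""
    return [[max(0, v) for v in row] for row in channel]


def shift_right(channel):
    """Shift one feature map one pixel to the right, zero-filling column 0."""
    return [[0] + row[:-1] for row in channel]


def weighted_sum(channels, weights):
    """Return K maps, where map k is sum over c of weights[k][c] * channels[c]."""
    rows, cols = len(channels[0]), len(channels[0][0])
    return [[[sum(w_k[c] * channels[c][y][x] for c in range(len(channels)))
              for x in range(cols)] for y in range(rows)]
            for w_k in weights]


def add(a, b):
    """Elementwise sum of two equally shaped stacks of feature maps."""
    return [[[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def block_2a(inputs, weights):
    # Path 1: ReLU, then weighted sum. Path 2: shifted input.
    path1 = weighted_sum([relu_map(ch) for ch in inputs], weights)
    path2 = [shift_right(ch) for ch in inputs]
    return add(path1, path2)


def block_2b(inputs, weights):
    # Path 1: ReLU, then shift, then weighted sum. Path 2: unchanged input.
    path1 = weighted_sum([shift_right(relu_map(ch)) for ch in inputs], weights)
    return add(path1, inputs)


# Hypothetical data: C = 2 input maps of size 2x2, K = 2 weight rows.
inputs = [[[1, -2], [3, 4]],
          [[-1, 2], [0, 1]]]
weights = [[1, 0],   # weights[k][c] plays the role of w_c^k
           [0, 2]]
out_a = block_2a(inputs, weights)
out_b = block_2b(inputs, weights)
```

The only difference between the variants is where the shift sits: on the skip path in `block_2a`, and before the weighted sum in `block_2b`.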

These figures also show a region comprising the current pixel and its neighborhood, in which crosses mark the connection positions that must be considered to obtain information about the current pixel and its neighborhood: the above-mentioned 4 parameters, which correspond to the weights assigned to the marked connection positions. The current pixel in the region is marked with a dot. In one layer of the neural network, one depicted region is considered, i.e. a pixel in connection (combination) with a neighboring pixel. Thus, to obtain information about the current pixel and its neighborhood, simple processing of 4 parameters in 4 layers of the neural network is used instead of the previously known complex processing of 9 parameters in one layer. The shift and linear weighted sum operations used are simpler and computationally cheaper than the convolution operation, and the smaller number of parameters needed to analyze the same neighborhood of the pixel reduces the memory requirement, which further makes the claimed method more practical on mobile devices.

As mentioned above, a pooling operation may also be used in the neural network. Traditionally, pooling is used to reduce the dimensionality of the data so as to extract as much information as possible from the available data at different scales; whether pooling is needed therefore depends on the specific application. Accordingly, the pooling operation may be applied in every layer of the neural network, at any stage of that layer, or not at all.

In the present invention, as already indicated above, what is considered is not a pixel but the relationship of a pixel with a neighboring pixel in one of four directions; that is, one weight is assigned not to a single pixel, as is traditionally done, but to two neighboring pixels at once. It should be noted, however, that diagonal pixels are also taken into account according to the claimed invention: at the same time as the relationship of a pixel with, for example, the pixel above it is evaluated, the relationship of that upper pixel with the pixel beside it is also evaluated, and so on. Thus, as indicated above, in a known convolutional neural network a pixel and the region around it can be analyzed by considering one layer and 9 parameters corresponding to adjustable weights (see FIG. 1), whereas in the present invention the same region is also analyzed, but by considering 4 network layers and 4 parameters, 1 parameter per layer (see FIG. 2a and FIG. 2b).
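The statement that four single-direction layers cover the same 3×3 region, diagonals included, can be checked on a toy case. The sketch below is an illustration under simplifying assumptions: one channel, a FIG. 2b-style layer without activation, a weight of 1 in every layer, and a hypothetical order of directions. It pushes a single non-zero pixel through four layers and records which offsets it reaches.

```python
def shift(channel, dy, dx):
    """Shift a 2D list by (dy, dx), filling vacated cells with 0."""
    rows, cols = len(channel), len(channel[0])
    return [[channel[y - dy][x - dx]
             if 0 <= y - dy < rows and 0 <= x - dx < cols else 0
             for x in range(cols)] for y in range(rows)]


def layer(channel, dy, dx, weight=1):
    """Simplified single-channel layer: out = in + weight * shift(in)."""
    moved = shift(channel, dy, dx)
    return [[channel[y][x] + weight * moved[y][x]
             for x in range(len(channel[0]))]
            for y in range(len(channel))]


# 5x5 image with a single non-zero pixel in the centre.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1

out = image
for dy, dx in [(0, 1), (1, 0), (0, -1), (-1, 0)]:  # right, down, left, up
    out = layer(out, dy, dx)

# Offsets (relative to the centre) that the pixel has influenced.
reached = {(y - 2, x - 2)
           for y in range(5) for x in range(5) if out[y][x] != 0}
```

After the four layers, `reached` contains all nine offsets of the 3×3 neighborhood, including the four diagonal ones, although each layer used only one horizontal or vertical shift.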

The convolution operation is, in general, equivalent to general matrix multiplication (GEMM), in particular to the multiplication of three-dimensional matrices, which is performed many times in known convolutional neural networks. In traditional approaches, however, making GEMM applicable requires the data to be transformed, regrouped and copied in memory, which increases resource consumption. A software implementation of this operation contains a loop nest 6 levels deep, since the output is three-dimensional (a pixel with the 3 coordinates k, p, q) and the sum is taken over three dimensions (c, r, s). The classical convolution formula, written for the three-dimensional case, is as follows:

$$O_k(p,q) = \sum_{c=1}^{C}\sum_{r=1}^{R}\sum_{s=1}^{S} w^k_{c,r,s}\, I_c\big(g(p,u,r),\, g(q,v,s)\big)$$

where w is a set of K convolution kernels of size C×R×S; C is the number of channels in the kernels (equal to the above-mentioned number of input channels, i.e. feature maps in I); R is the width of the convolution kernels; S is their height; I is a set of C feature maps of size P×Q (the input data); O is a set of K feature maps of size P×Q (the output data); k is the index of the output feature map (equal to the index of the convolution kernel); p=1,…,P and q=1,…,Q are the coordinates of the current pixel of the output image; c=1,…,C is the index of the input feature map (the channel number); r=1,…,R and s=1,…,S are local coordinates in the neighborhood of the current pixel; g is a function that controls the stride over the pixels of the input data (for example, every second pixel, every third pixel, and so on); and u and v are the vertical and horizontal stride sizes. Such a computation is inefficient in software, since it requires either sequential summation or parallelization by creating copies of the data, which undesirably fills up memory. The traditional approach is therefore very complex and resource-intensive.
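The six-level loop nest mentioned above can be written out directly. The sketch below is an illustration only: it assumes no padding (so the output is smaller than the input, unlike the same-size case in the text), takes the stride function to be g(p, u, r) = p·u + r with zero-based indices, and uses a hypothetical array layout `I[c][p][q]` and `w[k][c][r][s]`.

```python
def conv_naive(I, w, u=1, v=1):
    """Direct convolution with six nested loops.

    I[c][p][q]    : C input feature maps
    w[k][c][r][s] : K kernels, each of size C x R x S
    u, v          : strides along the two spatial axes
    No padding is applied (an assumption of this sketch).
    """
    C, P_in, Q_in = len(I), len(I[0]), len(I[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P_out = (P_in - R) // u + 1
    Q_out = (Q_in - S) // v + 1
    O = [[[0] * Q_out for _ in range(P_out)] for _ in range(K)]
    for k in range(K):                      # output map
        for p in range(P_out):              # output coordinate 1
            for q in range(Q_out):          # output coordinate 2
                for c in range(C):          # input channel
                    for r in range(R):      # kernel coordinate 1
                        for s in range(S):  # kernel coordinate 2
                            O[k][p][q] += (w[k][c][r][s]
                                           * I[c][p * u + r][q * v + s])
    return O


# Hypothetical data: one 3x3 input map and one 2x2 kernel.
I = [[[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]]]
w = [[[[1, 0],
       [0, 1]]]]
O = conv_naive(I, w)
```

The three outer loops enumerate the output coordinates k, p, q and the three inner loops accumulate the sum over c, r, s, which matches the structure of the formula above.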

The claimed technical solution amounts to applying a linear weighted sum, since the kernel sizes are R=1, S=1, i.e. the convolution operation as such is effectively eliminated. The present invention reduces to the multiplication of two two-dimensional matrices (GEMM proper):

$$O = w\,I$$

Although I is a three-dimensional matrix, it is stored linearly in memory and can be accessed in memory as a two-dimensional one, as illustrated in FIG. 3. Here w is the matrix of weights of one layer of the neural network, each row of which is a vector of coefficients of the linear weighted sum (the analogue of a convolution kernel of size 1×1×C). Thus, performing the K operations $O_k = \sum_{c=1}^{C} w_c^k I_c$, one for each k, reduces to a single multiplication of two matrices (see FIG. 3). Memory access and reading of the required information stored therein are performed by a corresponding processing device configured to carry out all of the above operations needed to process the received signals. Such a processing device may be, for example, a central processing unit (CPU) and/or a graphics processing unit (GPU). No preprocessing of the input data is needed before GEMM is applied.
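The reduction of the K weighted sums to a single matrix product can be sketched as follows. The channel-major memory order, the dimensions, and the weight values are assumptions made for illustration (FIG. 3 is not reproduced here), and `gemm` is a plain-Python stand-in for an optimized GEMM routine.

```python
def gemm(a, b):
    """Multiply an M x N matrix by an N x L matrix (lists of lists)."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C, P, Q, K = 2, 2, 3, 2

# Input stored linearly in memory; channel-major order is an assumption.
flat = list(range(C * P * Q))

# View the flat buffer as a C x (P*Q) matrix. Python slicing copies the
# data; a real implementation would reinterpret the same memory in place.
I2d = [flat[c * P * Q:(c + 1) * P * Q] for c in range(C)]

# K x C weight matrix: row k holds the coefficients w_c^k.
w = [[1, 1],
     [2, -1]]

O2d = gemm(w, I2d)  # K x (P*Q): all output maps in one product

# Reference: the K weighted sums computed pixel by pixel.
reference = [[sum(w[k][c] * flat[c * P * Q + i] for c in range(C))
              for i in range(P * Q)] for k in range(K)]
```

Each row of `O2d` is one output feature map in flattened form, so no regrouping of the input is needed before the product.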

According to another embodiment, shift operations are applied simultaneously in two or more different directions. Applying shift operations in two directions at once is shown in FIG. 4a as two of the above-described paths in parallel, with shifts in two different directions, x₁ and x₂, whose results are added to each other and to the unchanged input data. The output data obtained in this way are then fed back to the input of the unit block as input data, and so on. This embodiment introduces no additional complexity, since the number of kernels is divided by the number of different directions; the number of parameters in a network layer therefore remains unchanged.

For clarity, FIG. 4b illustrates an exemplary representation of the directions x₁ and x₂ with respect to the image under consideration. Of course, the shift operation may be applied simultaneously in any two of the four directions x₁, x₂, x₃ (corresponding to the direction -x₁) and x₄ (corresponding to the direction -x₂).

FIG. 5a is a schematic illustration of a unit block of the claimed neural network for the case of a set of sequential images, in which shift operations are applied essentially simultaneously in three directions x₁, x₂ and x₃; in this embodiment the direction x₃ essentially corresponds to the “time axis”, i.e. to the given order of the images in the set, in particular the order of frames in a video. An exemplary representation of these three directions with respect to the set of sequential images is shown in FIG. 5b. The application of such a unit block is likewise repeated N times, which corresponds to the depth of the network. As FIG. 5a illustrates, this embodiment is analogous to the simultaneous application of shift operations described above for two directions: three of the above-described paths are applied in parallel, with shifts in three different directions x₁, x₂ and x₃, and their results are added to one another and to the unchanged input data. The output data thus obtained are then fed back to the input of the unit block as input data, and so on. Thus, one layer of the claimed neural network according to this embodiment processes the entire array of images in the sequence (in particular, frames in a video) at once and passes all output data to the next layer as input.
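The three-path unit block described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the video is stored as a (T, H, W, C) array, each shift moves the data by one position with zero filling, the linear weighted sum mixes channels at every position with a square C×C matrix, and activation, normalization and subsampling are omitted. The helper names `shift_axis`, `unit_block` and `forward` are hypothetical.

```python
import numpy as np

def shift_axis(x, axis, step=1):
    """Shift x by `step` positions along `axis`, zero-filling the vacated slice."""
    out = np.zeros_like(x)
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    src[axis] = slice(0, x.shape[axis] - step)
    dst[axis] = slice(step, x.shape[axis])
    out[tuple(dst)] = x[tuple(src)]
    return out

def unit_block(x, weights):
    """x: (T, H, W, C) video tensor; weights: dict mapping axis -> (C, C) matrix.

    Output = unchanged input + sum over directions of
    (linear weighted sum over channels of the shifted input).
    """
    out = x.copy()
    for axis, w in weights.items():
        out += shift_axis(x, axis) @ w
    return out

def forward(x, layers):
    """Apply the unit block once per layer (N = len(layers))."""
    for weights in layers:
        x = unit_block(x, weights)
    return x

rng = np.random.default_rng(0)
T, H, W, C, N = 4, 5, 6, 3, 2
video = rng.standard_normal((T, H, W, C))
# Assumed mapping: axis 2 ~ x1 (width), axis 1 ~ x2 (height), axis 0 ~ x3 (time).
layers = [{ax: 0.1 * rng.standard_normal((C, C)) for ax in (2, 1, 0)}
          for _ in range(N)]
result = forward(video, layers)
```

With all weight matrices set to zero, the block reduces to the identity path, which is a quick way to check that the input is carried through unchanged.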

FIGS. 6a, 6b and 6c show graphs comparing the accuracy of the training results of the claimed neural network with those of the ResNet neural network on the CIFAR-10, CIFAR-100 and ImageNet datasets, respectively. The CIFAR-10 dataset contains a plurality of images from 10 different classes. The CIFAR-100 dataset contains a plurality of images from 100 different classes. ImageNet is the largest dataset, comprising 1000 categories that contain more than 1.2 million images. The relationship between training iterations and error percentage for the ResNet neural network is known, for example, from “Identity Mappings in Deep Residual Networks” (Kaiming He et al.). From the comparison it can be concluded that the accuracy of the proposed method is not inferior to, or is approximately equal to, the accuracy of the conventional convolutional neural network ResNet (values are compared for the “Top-1 error” and “Top-5 error” metrics). In the graphs, results for the “Top-1 error” metric are shown with solid lines and results for the “Top-5 error” metric with dashed lines; results for the known convolutional neural network (with a 3×3 convolution kernel) are drawn in black, and results for the claimed network (using shift operations and calculation of a linear weighted sum) in gray. It should also be noted that for the training results on CIFAR-10, a known ResNet network with 110 layers and 1.15 million parameters was considered, while the claimed neural network was considered with 20 layers and 1 million parameters. On CIFAR-100, the known ResNet network was considered with 164 layers and 1.7 million parameters, and the claimed neural network with 38 layers and 2 million parameters. On ImageNet, the known ResNet network was considered with 18 layers and 12 million parameters, and the claimed neural network with 32 layers and 13 million parameters. As the graphs illustrate, the claimed neural network is not inferior in accuracy to the known neural network, having similar error percentages throughout all iterations of the training process.

FIGS. 7a, 7b and 7c give brief tables of test results in the “Top-1 error” metric for the known ResNet network and the claimed neural network on the CIFAR-10, CIFAR-100 and ImageNet datasets, respectively. In particular, the error of the ResNet network on CIFAR-10 is 6.19%, while the error of the present network on the same dataset is 6.28%; the error of the ResNet network on CIFAR-100 is 24.83%, while the error of the present network on the same dataset is 25.04%; and the error of the ResNet network on ImageNet is 29.96%, while the error of the present network on the same dataset is 30.67%. These test results likewise confirm that the claimed neural network is not inferior in accuracy to the known ResNet network.
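The size of the differences between the quoted Top-1 errors can be worked out directly from the figures in the text. The short sketch below only performs that arithmetic; the dictionary `top1_error` and the derived `gaps` are hypothetical names, and no values beyond those quoted above are used. The gaps come out at 0.09, 0.21 and 0.71 percentage points, respectively.

```python
# Top-1 error percentages quoted in the text: (ResNet, claimed network).
top1_error = {
    "CIFAR-10": (6.19, 6.28),
    "CIFAR-100": (24.83, 25.04),
    "ImageNet": (29.96, 30.67),
}

gaps = {}
for dataset, (resnet, claimed) in top1_error.items():
    # Absolute gap in percentage points.
    absolute = round(claimed - resnet, 2)
    # Gap expressed as a percentage of the ResNet error.
    relative = round(100 * (claimed - resnet) / resnet, 1)
    gaps[dataset] = (absolute, relative)
    print(f"{dataset}: +{absolute:.2f} pp ({relative:.1f}% relative)")
```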

It should be noted that similar comparison results have been obtained with respect to other known neural networks, such as NIN, DSN, FitNet, Highway, ELU, original-ResNet, stoc-depth, pre-act-ResNet and others, confirming that the claimed neural network is not inferior to, or is approximately equal in accuracy to, these known neural networks, whose results are disclosed, for example, in the article “Wide Residual Networks” (Sergey Zagoruyko, Nikos Komodakis, version of June 14, 2017), in particular in Table 5 on page 8 of that article. It should also be noted that the networks having a slightly higher accuracy (i.e., a lower error percentage) use a much larger number of parameters.

It should additionally be noted that, to detect the type of a network or to verify that a particular network is being used, one can, knowing its parameters, superimpose a specially constructed noise signal on an image. To the human eye the image does not change, but the recognition result becomes incorrect, because the neural network is misled. The specific incorrect recognition result is known in advance to the designer of the noise signal, which allows the designer to detect the use of the particular neural network for which that noise signal was constructed. Methods for constructing such signals are known from the prior art and are disclosed, for example, in “Intriguing properties of neural networks”, Christian Szegedy et al., arXiv, 2013.
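The idea of a noise signal built from known parameters can be shown on a deliberately small example. The sketch below is not the construction of Szegedy et al., which uses box-constrained L-BFGS on a deep network; it substitutes a single sign-gradient step on a toy linear three-class scorer. The weights `W`, the input `x` and the chosen `target` class are made-up values, and the toy does not model whether the noise would be visible to a human.

```python
import numpy as np

# Toy stand-in for "a network with known parameters": a linear 3-class scorer.
# The weights and the input are made-up illustrative values.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
x = np.array([1.0, 0.8, 0.3, 0.3])

def predict(v):
    """Return the index of the highest-scoring class."""
    return int(np.argmax(W @ v))

original = predict(x)   # class chosen for the clean input
target = 2              # wrong class chosen in advance by the noise designer

# Direction that raises the target score relative to the current winner:
# the sign of the gradient of (score[target] - score[original]) w.r.t. x.
direction = np.sign(W[target] - W[original])
# Smallest uniform step that closes the score gap, plus a 20% safety margin.
gap = (W[original] - W[target]) @ x
eps = 1.2 * gap / np.abs(W[target] - W[original]).sum()

noise = eps * direction
adversarial = x + noise
```

Here `predict(adversarial)` returns the pre-selected target class, so a model that reproduces this specific wrong answer on the perturbed input is likely to be using these parameters. With more classes or a nonlinear network, a single step of this kind is not guaranteed to reach the target, and an iterative method would be needed.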

The claimed invention can be applied in self-driving vehicles (for image and video recognition, in particular for detecting pedestrians, traffic signs and vehicles), video conferencing (for video processing, in particular video compression and decoding, and video “enhancement” by reconstructing details), medicine (ultrasound, in particular lesion detection and image quality improvement through noise removal), and mobile device security (for user identification, in particular recognition of the iris, face or fingerprints).

Those skilled in the art will understand that the embodiments shown are exemplary and may, as needed, be adjusted to achieve greater efficiency in a particular application, unless the description specifically states otherwise. Reference to elements of the system in the singular does not exclude a plurality of such elements, unless explicitly stated otherwise.

Although exemplary embodiments of the invention are shown in this description, it should be understood that various changes and modifications may be made without departing from the scope of protection of the present invention as defined by the appended claims.

Claims

1. A method of processing signals using a neural network having N layers, comprising the steps of:

- supplying input data to a current layer of the trained neural network, the input data being a set of numerical values of the signal to be processed;

- processing the supplied input data to obtain output data; and

- if the number of the current layer of the neural network is less than N, proceeding to the next layer of the neural network and repeating the steps of the method using the obtained output data as input data, and

- if the number of the current layer of the neural network is N, outputting the output data;

wherein the processing step comprises:

applying an input data shift operation in one of predetermined directions to obtain shifted input data, calculating a linear weighted sum of the input data, and summing the shifted input data and the result of the weighted sum calculation to obtain the output data, this processing being applied for each predetermined direction, or

applying an input data shift operation in one or more predetermined directions to obtain shifted input data in each of the one or more predetermined directions, calculating a linear weighted sum of the shifted input data in each of the one or more predetermined directions, and summing the input data and the results of the weighted sum calculation for each of the one or more predetermined directions to obtain the output data.

2. The method according to claim 1, wherein the calculation of the linear weighted sum is reduced to a general matrix-matrix multiplication (GEMM) operation.

3. The method according to claim 1, further comprising preliminary steps of:

- obtaining an exemplary labeled data set for training;

- training the neural network on the obtained exemplary labeled data set by determining the corresponding weight coefficients.

4. The method according to claim 1, further comprising steps of applying an activation operation and/or a pixel subsampling operation in any predetermined order.

5. The method according to claim 1, wherein the signal is an image, and the predetermined directions are directions toward each of the adjacent pixels.

6. The method according to claim 1, wherein the output is the name of the signal.

7. The method according to claim 1, wherein the output is a signal classification.

8. The method according to claim 1, wherein the output is a list of the most likely signal names.

9. A system for processing signals using a neural network having N layers, comprising:

- a receiving device configured to receive input data, which is a set of numerical values of the signal to be processed;

- a memory configured to store the received input data;

- a processing device configured to read the input data from the memory and perform the steps of the signal processing method according to claim 1,

wherein the processing device is further configured to write the output data to the memory.

10. The system for processing signals using a neural network according to claim 9, wherein the processing device is a central processing unit (CPU) and/or a graphics processing unit (GPU).
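The reduction of the linear weighted sum to GEMM named in claim 2 can be illustrated with a minimal sketch. It assumes that the weighted sum combines the input channels at every pixel with one weight per (input channel, output channel) pair; the array shapes and the names `x`, `weights`, `reference` and `gemm` are hypothetical. Under that assumption, flattening the pixels into rows turns the whole layer into a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, C_out = 4, 5, 3, 2
x = rng.standard_normal((H, W, C_in))          # (shifted) input feature map
weights = rng.standard_normal((C_in, C_out))   # one weight per channel pair

# Reference: explicit weighted sum over input channels at every pixel.
reference = np.zeros((H, W, C_out))
for i in range(H):
    for j in range(W):
        for o in range(C_out):
            reference[i, j, o] = sum(x[i, j, c] * weights[c, o]
                                     for c in range(C_in))

# GEMM form: flatten pixels into rows, then compute one
# (H*W, C_in) x (C_in, C_out) matrix product and restore the layout.
gemm = (x.reshape(H * W, C_in) @ weights).reshape(H, W, C_out)
```

Both arrays hold the same values; the second form hands the computation to an optimized matrix-multiplication routine instead of nested loops.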