RU2716914C1

RU2716914C1 - Method for automatic classification of x-ray images using transparency masks

Info

Publication number: RU2716914C1
Application number: RU2019133540A
Authority: RU
Inventors: Анатолий Рудольфович Дабагов; Сергей Алексеевич Филист; Дмитрий Сергеевич Кондрашов
Priority date: 2019-10-22
Filing date: 2019-10-22
Publication date: 2020-03-17

Abstract

FIELD: image processing means.

SUBSTANCE: the invention relates to digital image processing methods and can be used in intelligent X-ray image classification systems. The result is achieved by a method of automatic classification of X-ray images using transparency masks, which involves forming a digital X-ray image as an array of optical densities of the object, obtaining depth layers of the image by processing the original digital image with local filters unique to each layer, reducing the dimensionality of the images in the depth layers by pooling (sub-sampling), forming a space of informative features for a trainable fully connected neural network from the sub-sampled depth layers, and classifying the resulting vector of informative features with the fully connected neural network.

EFFECT: technical result consists in improvement of accuracy of recognition of areas of interest when analyzing graphic information.

8 cl, 12 dwg

Description

The invention relates to digital image processing methods and can be used in intelligent systems for classifying X-ray images. Intelligent systems for the differential diagnosis of oncological diseases are built on artificial neural networks (ANNs). ANNs are computer models that can reproduce capabilities of human intelligence by using the computing power of computers and, by processing large volumes of information, learn from annotated examples.

There are two approaches to using ANNs to interpret X-ray images. The first approach extracts a vector of informative features from the image or from a segment of it and then builds a classification model, either trainable or non-trainable, on that feature vector. The second approach applies the classifier directly to the data of the image or region of interest.

We call the first approach pixel-wise classification: each pixel of the image is assigned to a particular class. As a rule, this is a binary classification, so the classifier outputs a binary image on the original raster in which pixels of the target class are assigned code 255 and pixels outside it are assigned code 0. As an implementation of a classification method based on this approach, consider an automated system for detecting microcalcifications in digital mammograms [Computer aided detection system for microcalcifications in digital mammograms / Hayat Mohamed, Mai S. Mabrouk, Amr Sharawy // Computer Methods and Programs in Biomedicine 116 (2014) 226–235, https://doi.org/10.1016/j.cmpb.2014.04.010]. The classification method consists of four stages: (1) image preprocessing, which can use known methods of contrast enhancement by histogram modification and noise removal, for example by median or morphological filtering; (2) segmentation, that is, extraction of regions of interest in the image, which can be done by thresholding or by more complex methods, for example [Patent 2629629 Russian Federation, IPC G06K 9/34. Application No. 2016132680; filed 09.08.2016; published 30.08.2017, Bull. No. 25]; (3) extraction of informative features, based on analysis of the texture, spectral, statistical or geometric characteristics of the segment; (4) classification, which involves building a trainable or non-trainable classifier such as an ANN, a support vector machine, the k-nearest neighbors method, and so on.
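
As a rough illustration of the pixel-wise output coding described above, the sketch below combines a median filter (stage 1) with simple thresholding (stage 2) and writes code 255 for the target class and 0 otherwise. The 3 x 3 window, the threshold value and the function names are assumptions made for the example; they are not taken from the cited system.

```python
import numpy as np

def median3x3(img):
    """3x3 median filter with edge replication (noise removal)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the median per pixel.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)

def pixelwise_binary(img, threshold):
    """Assign code 255 to pixels of the target class and 0 to all others."""
    return np.where(median3x3(img) > threshold, 255, 0).astype(np.uint8)

# Toy image: a bright 4x4 region on a dark background (values are illustrative).
image = np.full((6, 6), 10.0)
image[1:5, 1:5] = 200.0
binary = pixelwise_binary(image, threshold=100.0)  # threshold is an assumption
```

The result has the same raster as the input and contains only the codes 0 and 255, which is the property the paragraph relies on.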

The second approach is implemented with convolutional neural networks (CNNs) [Aksenov S.V., Kostin K.A., Ivanova A.V., Liang J., Zamyatin A.V. An ensemble of convolutional neural networks for the use in video endoscopy. Sovremennye tehnologii v medicine 2018; 10(2): 7–19, https://doi.org/10.17691/stm2018.10.2.01]. Building a CNN architecture comes down to "collapsing" the input image into three-dimensional layers of size 2×2 or 1×1 so that the outputs give the probability that the input image belongs to each of the available classes. A CNN architecture is divided into several blocks (layers) with specific parameter values, and alternating these blocks makes it possible to form the most effective architecture. As with multilayer feedforward neural networks, increasing the number of layers and connections within a CNN makes it possible to build more complex models that can handle more complex patterns.

An automated system for diagnosing medical images using deep CNNs is known (US patent 9,589,374 B1, 07.03.2017). That invention discloses methods of applying deep CNNs to the analysis of medical images for real-time diagnosis. It analyzes CT and MRI scans, which are processed by two CNNs and other software modules to produce a response giving the probability that the patient's scans contain regions of interest requiring further analysis by the attending physician.

The article "Automatic Liver and Tumor Segmentation of CT and MRI Volumes Using Cascaded Fully Convolutional Neural Networks" (Patrick Ferdinand Christ et al. Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI). 20.02.2017) considers approaches to the automated analysis of CT and MRI scans for detecting liver pathologies using a U-Net type convolutional neural network.

The drawback of the technical solutions listed above is that the CNN is tuned to a specific task. This lack of universality makes it difficult to use in X-ray diagnostics. To overcome this drawback, the data analysis channels in the CNN structure are parallelized so that each channel is tuned to a particular class of images.

The work [Anokhina O.A., Shustova M.V. Methods of neural network processing of biomedical MRI data. Collection of scientific papers of the V International Scientific and Practical Conference "Actual Problems of Medicine in Russia and Abroad" (Novosibirsk, 11 February 2018), 2018, pp. 37-46] proposes a solution to the problem of automatically segmenting brain MRI scans into different tissue classes using a set of CNNs. For each class, the input image is split into a number of fragments of different sizes. Each CNN is trained to classify fragments of one particular size. The classification quality is higher than with a single network, because a separate neural network is better tuned to extracting features from a fragment of a specific size. The number of output classes N is set by the user. In the experiments, N was set to 9 for scans of newborns (8 tissue classes and background), to 8 for scans of elderly adults (7 tissue classes and background), and to 7 for scans of young adults (6 tissue classes and background). Segmentation of the scans with CNNs shows good results.

However, increasing the number of data analysis channels leads to an unjustified growth of the decision-making scheme and to unacceptable economic and time costs. Fast decision-making is important in mass screening examinations and in the examination of intensive-care patients.

Classifiers often preprocess their input data. For preprocessing, winsorizing, which removes intensity outliers, is followed by the N4ITK image intensity normalization method [Nicholas J. Tustison, Brian B. Avants, Philip A. Cook. N4ITK: Improved N3 Bias Correction // IEEE Transactions on Medical Imaging (Volume: 29, Issue: 6, June 2010)]; both statistically significantly improve segmentation quality. The image is then transformed so that the mean and variance of its intensity are 0 and 1, respectively.
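
A minimal sketch of two of the preprocessing steps named above, winsorizing and rescaling to zero mean and unit variance, is given below. The percentile bounds (1 and 99) are an assumption chosen for illustration, and the N4ITK step is deliberately left out because it requires a dedicated implementation.

```python
import numpy as np

def winsorize(img, lower_pct=1.0, upper_pct=99.0):
    """Clip intensities to the given percentiles (the bounds are assumed values)."""
    lo, hi = np.percentile(img, [lower_pct, upper_pct])
    return np.clip(img, lo, hi)

def standardize(img):
    """Shift and scale so that the mean is 0 and the variance is 1."""
    return (img - img.mean()) / img.std()

rng = np.random.default_rng(0)
image = rng.normal(100.0, 15.0, size=(64, 64))
image[0, 0] = 10_000.0  # an artificial intensity outlier
prepared = standardize(winsorize(image))
```

Clipping before standardizing keeps a single extreme pixel from dominating the mean and variance estimates.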

Segmentation quality is improved by enlarging the set of labeled images with rotations by multiples of 90°. The CNN architecture used in [Sérgio Pereira, Adriano Pinto, Victor Alves, and Carlos A. Silva. Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images // IEEE Transactions on Medical Imaging (Volume: 35, Issue: 5, May 2016)] uses the following layers:

– convolutional layers (Conv.);

– sub-sampling layers that select the maximum (Max-pool.);

– fully connected layers (FC).
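
The rotation-based augmentation and the max-selecting sub-sampling layer mentioned above can be sketched in a few lines. This is an illustrative NumPy version, not the cited architecture; the function names and the 2 x 2 window with stride 2 are assumptions.

```python
import numpy as np

def rotations_90(img, label):
    """Return the patch rotated by 0, 90, 180 and 270 degrees, each with its label."""
    return [(np.rot90(img, k), label) for k in range(4)]

def max_pool_2x2(img):
    """2x2 max-pooling with stride 2 (height and width must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

patch = np.arange(16, dtype=float).reshape(4, 4)
augmented = rotations_90(patch, label=1)  # four labeled patches from one
pooled = max_pool_2x2(patch)              # 4x4 -> 2x2
```

When the label is itself a per-pixel mask rather than a single class, the mask has to be rotated together with the image.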

Moving from processing a sequence of two-dimensional images to three-dimensional images yields more information about the object under study. The effectiveness of 3D convolutional neural networks for segmenting regions of interest is demonstrated in [Konstantinos Kamnitsas, Christian Ledig, Virginia F.J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Daniel Rueckert, Ben Glocker. Efficient Multi-Scale 3D CNN with fully connected CRF for Accurate Brain Lesion Segmentation. // Medical Image Analysis, Volume 36, pages 61-78. URL: https://arxiv.org/pdf/1603.05959.pdf (accessed: 25.01.2018)]. The multi-scale three-dimensional CNN used there is distinguished by its two pathways (dual pathway multi-scale 3D Convolutional Neural Network): on one pathway the network learns to locate the zone of interest within the brain, and on the other it captures the appearance of the structures.

In its technical essence, this solution is the closest analogue. Its main drawback is that the CNN is configured without separating the responses by weight coefficients and then re-weighting them in the layers of each CNN, and without dividing the training set by pathology type while the ensemble is trained, which leads to a fairly high error rate when recognizing structural changes in X-ray images. Moreover, this solution as such is not used for analyzing X-ray images.

The technical problem addressed by the proposed method is to minimize the false-positive error of the CNN ensemble and, accordingly, to increase the accuracy of recognizing regions of interest when analyzing graphic information by means of a new CNN architecture.

The problem is solved as follows. The known method of automatic classification of X-ray images involves forming a digital X-ray image as a matrix of optical densities of the object; obtaining image layers (the depth dimension) by processing the original digital image with local filters unique to each layer; then reducing the dimensionality of the images in the depth layers by pooling (sub-sampling); forming a space of informative features for a trainable fully connected neural network by unrolling the depth layers; and training the fully connected neural network on examples of X-ray images with specified morphological formations. In the proposed method, the input digital image is supplemented with a transparency mask, and the local filters are implemented as identity operators that form the depth layers by indexing the scale of the filter mask. Pooling is performed by forming three-dimensional tensors, two per depth layer, by processing the depth layers with two differential operators whose number of elements is determined by the scale of the depth layer, with binary outputs and a threshold activation function. Each scale is thus characterized by three-dimensional megapixels, the number of which is determined by the scale mask of the corresponding depth layer.
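
One possible reading of this pooling step is sketched below. It assumes that a depth layer at scale s is the unchanged image (identity operator) viewed as a grid of s x s megapixels, and that the two differential operators are Haar-like horizontal and vertical differences applied inside each megapixel and then thresholded to binary outputs. The row-wise and column-wise layout of the third dimension, the zero threshold and all function names are assumptions; the text above does not fix these details.

```python
import numpy as np

def to_megapixels(img, s):
    """View an image as a grid of s x s megapixels: shape (H//s, W//s, s, s)."""
    h, w = img.shape
    cropped = img[:h - h % s, :w - w % s]
    return cropped.reshape(h // s, s, w // s, s).swapaxes(1, 2)

def haar_binary_tensors(img, s, threshold=0.0):
    """Return two binary 3D tensors (horizontal, vertical) for one depth layer."""
    blocks = to_megapixels(img, s)
    half = s // 2
    # Horizontal operator: left half minus right half, one response per row.
    horiz = blocks[..., :half].sum(axis=-1) - blocks[..., half:].sum(axis=-1)
    # Vertical operator: top half minus bottom half, one response per column.
    vert = blocks[..., :half, :].sum(axis=-2) - blocks[..., half:, :].sum(axis=-2)
    # Threshold activation gives binary outputs.
    return (horiz > threshold).astype(np.uint8), (vert > threshold).astype(np.uint8)

# Toy segment: 64 rows x 80 columns, intensity falling from left to right.
segment = np.tile(np.arange(80, 0, -1, dtype=float), (64, 1))
horiz, vert = haar_binary_tensors(segment, s=16)
```

With U scales processed this way, the sketch yields 2U tensors, two per depth layer. For the toy segment every horizontal response fires and no vertical one does, because the intensity changes only along the rows.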

A transparency mask and single-layer perceptrons serve as the sub-samplers. The transparency mask thins out the three-dimensional tensors of informative features so that only the features belonging to one segment remain at the output. The single-layer perceptrons are trained to recognize a particular class of pathology on the condition that their input is the vector of informative features corresponding to a megapixel of a segment at a particular scale; their structure therefore does not depend on the structure of the segment, only on the depth of the three-dimensional tensor of informative features. A single-layer perceptron aggregates the decision over all megapixels of the thinned three-dimensional tensor of the segment being classified. The classification results for the mega-segments from the single-layer perceptrons are passed to an intelligent analyzer, which aggregates the classification results over the whole segment, detecting outliers within each single-layer perceptron and analyzing outliers across all single-layer perceptrons. The aggregated classification results of the single-layer perceptrons form the inputs of a fully connected neural network tuned to classify a segment with a given pathology.
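
The thinning and per-megapixel classification described above might look roughly like the following sketch. It assumes that a megapixel is kept when at least half of its pixels lie inside the transparency mask and that the perceptron decisions are aggregated by a simple mean; both rules, the weights and the function names are hypothetical. The outlier analysis performed by the intelligent analyzer is not modeled.

```python
import numpy as np

def megapixel_mask(transparency, s, min_fill=0.5):
    """Keep a megapixel if enough of its pixels are 1 (min_fill is an assumption)."""
    h, w = transparency.shape
    blocks = transparency[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
    return blocks.mean(axis=(1, 3)) >= min_fill

def perceptron(features, weights, bias):
    """Single-layer perceptron with threshold activation: one decision per row."""
    return (features @ weights + bias > 0).astype(np.uint8)

def classify_segment(tensor, transparency, s, weights, bias):
    """Thin the tensor with the mask, classify each megapixel, average the votes."""
    keep = megapixel_mask(transparency, s)
    votes = perceptron(tensor[keep], weights, bias)
    return float(votes.mean()) if votes.size else 0.0

s, depth = 16, 16
transparency = np.zeros((64, 80), dtype=np.uint8)
transparency[:, :32] = 1                  # the segment covers the left part
tensor = np.zeros((4, 5, depth))
tensor[:, :2, :] = 1.0                    # toy features inside the segment
weights, bias = np.ones(depth), -depth / 2  # untrained, illustrative weights
keep = megapixel_mask(transparency, s)
score = classify_segment(tensor, transparency, s, weights, bias)
```

Because the perceptron sees one megapixel vector at a time, its weight vector has the length of the tensor depth and is independent of the segment's shape, which matches the remark in the paragraph.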

Fig. 1 shows the block diagram of a device that implements the proposed method.

Fig. 2 shows a high-level flowchart of the algorithm that implements the proposed method.

Fig. 3 shows examples of transparency masks determined at the first stage of classifying an X-ray image.

Fig. 4 shows how the identity operator interacts with the original image, for an identity operator with a 2 x 2 convolution matrix.

Fig. 5 illustrates classic pooling.

Fig. 6 illustrates the interaction of the horizontal Haar wavelet with the scale mask of a depth image (a) and the formation of the third dimension of the tensor of informative features (b).

Fig. 7 shows a fragment of an X-ray image with a superimposed scale mask.

Fig. 8 shows an example of a three-dimensional tensor of informative features for a segment of 80 x 64 pixels and a mask (megapixel) of 16 x 16 pixels, used to form the space of informative features for a single-layer perceptron (weak classifier).

Fig. 9 shows the flowchart of the pooling algorithm.

Fig. 10 shows the block diagram of the classifier that implements the proposed method.

Fig. 11 shows the structure of the fully connected neural network used at the fourth stage of classifying X-ray image segments.

Fig. 12 shows the flowchart of the intelligent analyzer's operating algorithm.

The device in Fig. 1 consists of: a digital X-ray apparatus 1; a computer 2, whose first input is connected to the output of X-ray apparatus 1; a memory unit 3 for storing X-ray images, whose input is connected to the first output of computer 2 and whose output is connected to the second input of computer 2, and which holds data files 4 with grayscale raster X-ray images, files 5 with transparency masks and files 6 with processed (classified) X-ray images; a memory unit 7 for storing the working programs and the convolutional neural network models, whose inputs are connected to the third output of computer 2 and whose outputs are connected to the third input of computer 2; and a video monitor 8 connected to the second output of computer 2.

The method is implemented by a sequence of software modules 9…15 (Fig. 2), which include the pooling technique.

Pooling is implemented by executing the sequence of software modules 16…34 shown in Fig. 9.

The classifier in Fig. 10 comprises the following sequential procedures: conversion of X-ray image 35 into depth layers 36; conversion of depth layers 36 into three-dimensional tensors of informative features 37; thinning of the tensors of informative features 37 by transparency mask 38; sequential connection of the megapixels of tensors 37 to single-layer perceptrons 40 through multiplexer unit 39; classification of a vector of dimension 2U, where U is the number of depth layers, by intelligent analyzer 41; and classification of the region of interest by fully connected neural network 42. Output (2U + 1) of intelligent analyzer 41 is connected to input (2U + 1) of multiplexer unit 39.

The fully connected neural network 42 (Fig. 11) consists of an input layer 43, a hidden layer 44 and an output layer 45 connected in series.
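
A forward pass through a network of this shape (input layer, one hidden layer, output layer) can be sketched as follows. The input size 2U, the hidden layer width, the sigmoid activation and the random weights are all assumptions made for the example; the text above specifies only the three-layer structure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> output layer, connected in series."""
    hidden = sigmoid(w_hidden @ x + b_hidden)
    return sigmoid(w_out @ hidden + b_out)

U = 3                         # assumed number of depth layers
rng = np.random.default_rng(0)
x = rng.random(2 * U)         # assumed input: one aggregated value per perceptron
w_hidden, b_hidden = rng.normal(size=(4, 2 * U)), np.zeros(4)  # hidden width 4 is arbitrary
w_out, b_out = rng.normal(size=(1, 4)), np.zeros(1)
score = forward(x, w_hidden, b_hidden, w_out, b_out)
```

In practice the weights would come from training on labeled segments rather than from a random generator.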

The method is implemented according to the algorithm whose flowchart is shown in Fig. 2. An X-ray image is obtained with X-ray apparatus 1. Computer 2 (Fig. 1) loads this image into data files 4 of memory unit 3 (Fig. 1). The computer then processes the image in four stages.

At the first stage, a transparency matrix is formed (block 10 of FIG. 2). The size of the transparency matrix corresponds to the size of the X-ray image matrix. The elements of the transparency matrix take the value zero or one, that is, it is a binary matrix. The value one is assigned to those pixels of the transparency matrix that belong to a region of interest (segment) of the X-ray image. Consequently, the transparency matrix generator is a two-alternative classifier that identifies regions of interest in the X-ray image; at the subsequent stages these regions are classified by "finer" classifiers, which determine the disease to which the corresponding region of interest can be attributed. The procedure for identifying regions of interest, which corresponds to the first processing stage and is represented by block 10 in the flowchart of FIG. 2, is described in detail in [Method of cascade segmentation of breast radiographs / Dabagov A.R. et al. // Bulletin of the South-Western State University. Series: Management, Computer Engineering, Computer Science. Medical Instrumentation. 2019. Vol. 9, No. 1 (30). P. 49-61]. The transparency masks are stored in files 3 of FIG. 1.

FIG. 3a shows a transparency mask for the segments formed by the rib shadows on a chest radiograph; FIG. 3b shows a transparency mask for the pixels falling within the lung fields on a chest radiograph; FIG. 3c shows a transparency mask for the region of interest "pneumonia"; FIG. 3d shows a transparency mask for an X-ray mammogram with the region of interest "cancer".

At the second stage, the deep layers are formed (block 11 of FIG. 2). The formation of deep layers is based on the technique used to form convolutional layers in convolutional neural networks (CNNs). Unlike the CNN architecture, however, this method uses identity operators. Identity operators have the same mask structure as convolution operators, that is, the result of their operation (the brightness of the pixel with coordinates i, j) can be represented by the expression

[expression (1): formula not reproduced in the source text]

where u is the convolution identifier (scale identifier), M1 × M2 is the size of the convolution matrix, and the coefficients of this matrix are the convolution weights with identifier u.
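Since expression (1) did not survive in this text, the following is only a plausible reading of it, built from the surrounding definitions (a mask of size M1 × M2, weights with identifier u, and the mask indices g and q that appear in rule (2)). The symbols y, x and w and the index convention i + g, j + q are assumptions and are not taken from the source.

```latex
y^{u}_{i,j} \;=\; \sum_{g=0}^{M_1-1} \; \sum_{q=0}^{M_2-1} w^{u}_{g,q}\, x_{i+g,\,j+q}
```

Here x would be the brightness of the original image, w the mask weights for scale u, and y the resulting brightness in the deep layer.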

A distinctive feature of the identity operator is that only one of its weight coefficients is non-zero (equal to one), namely the one whose coordinate coincides with the coordinate of the active pixel in the image.

This condition corresponds to the following production-type rule:

IF ((g = M1-1) AND (q = M2-1)) THEN the weight with indices (g, q) equals 1, ELSE it equals 0. (2)
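Rule (2) can be illustrated with a minimal sketch. The function names are hypothetical, and anchoring the mask at its top-left corner is an assumption, since the source text does not fix the index convention of expression (1).

```python
def identity_kernel(m1, m2):
    """M1 x M2 mask whose only non-zero weight (1) sits at g = M1-1, q = M2-1, as in rule (2)."""
    return [[1 if (g == m1 - 1 and q == m2 - 1) else 0 for q in range(m2)]
            for g in range(m1)]


def apply_mask(image, kernel, i, j):
    """Weighted sum of the pixels covered by the mask placed at (i, j).

    Assumption: (i, j) is the top-left corner of the mask.
    """
    return sum(kernel[g][q] * image[i + g][j + q]
               for g in range(len(kernel))
               for q in range(len(kernel[0])))


image = [[10, 11, 12, 13],
         [20, 21, 22, 23],
         [30, 31, 32, 33],
         [40, 41, 42, 43]]
kernel = identity_kernel(2, 2)           # [[0, 0], [0, 1]]
value = apply_mask(image, kernel, 1, 1)  # copies image[2][2] unchanged
```

Because a single weight equals one, the weighted sum simply passes one pixel of the original image through, which is why the operator is called an identity operator.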

An example of the identity operator for a convolution matrix size (scale) of 2 × 2, and the scheme of its interaction with the original image when a deep layer of this scale is obtained, are shown in FIG. 4.

Thus, the identity operator maps the original image into a deep layer and assigns it a particular scale (it partitions the original image into megapixels). The scales of the deep layers are chosen from the series 2 × 2, 4 × 4, 8 × 8, …, that is, from the series 2^(n+1), where n takes the values of the natural numbers. The dimension of the image in the deep layer is thereby reduced to (N1-M1/2) x (N2-M2/2), where N1 × N2 is the size of the original image and M1 × M1 is the size of the megapixel.

At the third stage, the dimensionality of the deep layers is reduced; in CNN technology this operation is called pooling or subsampling (block 12 of FIG. 2).

The task of pooling is to shrink the feature maps, that is, to reduce the number of features in the deep layers. FIG. 5 illustrates the classical pooling used in CNNs. Like a convolutional layer, it requires a matrix of weight coefficients. FIG. 5 shows an example of pooling with a 2 × 2 matrix, which reduces the dimension of a deep layer from 24 × 24 to 12 × 12. As a result of pooling, every four elements of the original image that fall within the 2 × 2 pooling matrix are replaced by one element of the output image, namely the maximum of the four. Other ways of choosing the output element and the stride of the pooling matrix are described in the literature.
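The classical pooling described above can be sketched in a few lines. This is an illustration of standard 2 × 2 max pooling with stride 2, not of the pooling proposed in this method; the ramp image is made-up test data.

```python
def max_pool_2x2(layer):
    """Classical 2 x 2 max pooling with stride 2 on a list-of-lists image with even dimensions."""
    rows, cols = len(layer), len(layer[0])
    return [[max(layer[r][c], layer[r][c + 1], layer[r + 1][c], layer[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


# A 24 x 24 layer filled with a simple ramp, used only as test data.
layer = [[r * 24 + c for c in range(24)] for r in range(24)]
pooled = max_pool_2x2(layer)
shape = (len(pooled), len(pooled[0]))  # 24 x 24 becomes 12 x 12
```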

This method uses a pooling technique that differs from the classical one. Its essence is that the elements of the deep layer (1) located within the scale mask corresponding to this deep layer are assigned a certain functional Z. In forming this functional, we take into account that, as a result of the first-stage segmentation, the segments have already been selected by brightness and texture. Further processing of a segment image is therefore aimed only at its classification. Each scale mask of the deep layer can be described by a vector of informative features whose number of elements must be smaller than that of the scale mask (as in the classical pooling of FIG. 5). The formation of this vector is based on the Viola-Jones method, in which a set of Haar wavelets (primitives) is used to describe the texture within a scale window. In the proposed pooling technique, only two Haar wavelets are used for each scale window: a vertical one and a horizontal one. The size of the vertical wavelet is M2/2 × M1, and the size of the horizontal wavelet is M1/2 × M2. The horizontal Haar wavelet moves over the mask in the vertical direction with step ΔV, and the vertical Haar wavelet moves over the mask in the horizontal direction with step ΔG. At each i-th position of the wavelet, the difference

Z_i = W_i - B_i, (3)

is calculated, where W_i is the sum of the brightnesses of the pixels under the "white" part of the Haar wavelet and B_i is the sum of the brightnesses of the pixels under the "black" part of the Haar wavelet.
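A minimal sketch of expression (3) for the horizontal wavelet follows. Several details are assumptions, because the text does not fix them: the left half of the wavelet is treated as "white" and the right half as "black", and int(M1/(2ΔV)) positions are taken. The function name is hypothetical.

```python
def horizontal_haar_responses(window, step=1):
    """Z_i = W_i - B_i for a wavelet of M1/2 rows x M2 columns sliding downwards.

    Assumptions: left half of the wavelet is "white", right half is "black",
    and int(M1 / (2 * step)) positions are evaluated.
    """
    m1, m2 = len(window), len(window[0])
    height, half = m1 // 2, m2 // 2
    responses = []
    for i in range(int(m1 / (2 * step))):
        top = i * step
        rows = window[top:top + height]
        white = sum(sum(row[:half]) for row in rows)  # W_i
        black = sum(sum(row[half:]) for row in rows)  # B_i
        responses.append(white - black)
    return responses


# 4 x 4 scale window: bright left half, dark right half.
window = [[1, 1, 0, 0] for _ in range(4)]
z = horizontal_haar_responses(window)
```

The vertical wavelet would be handled the same way with rows and columns exchanged.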

This yields two variational series whose numbers of elements are int(M1/(2ΔV)) and int(M2/(2ΔG)). If M1 = M2 is chosen, the two series contain the same number of elements, and they can be treated as the elements of a complex series with the corresponding real (first series) and imaginary (second series) parts.
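The pairing of the two series into one complex series can be sketched as follows. The sample values and the function name are illustrative only; equal steps ΔV = ΔG are assumed so that both series have the same length.

```python
def to_complex_series(first, second):
    """Pair two equal-length series as real and imaginary parts."""
    if len(first) != len(second):
        raise ValueError("series must have the same length")
    return [complex(re, im) for re, im in zip(first, second)]


m1 = m2 = 16
dv = dg = 1
n_first = int(m1 / (2 * dv))   # length of the first series
n_second = int(m2 / (2 * dg))  # length of the second series

series = to_complex_series([1, 0, -1], [0, 1, 1])
```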

When classifying a segment, it must be taken into account that it need not be rectangular in the X-ray image, and a wide variety of contour configurations is allowed. This gives the first requirement on the space of informative features, and hence on the pooling technique: the vector of informative features must not depend on the scale of the deep layer or on the number of scale windows in the segment. The second requirement is that the informative features be invariant to the dynamic range of pixel brightness in the X-ray image.

To satisfy the second requirement on the space of informative features, we assume that (3) can take only the values -1, 0 and +1. For this purpose, we introduce a threshold parameter η and transform the result of the interaction of the masked pixels with the Haar wavelet as follows:

[expression (4): formula not reproduced in the source text]
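Expression (4) is not available in this text, so the sketch below shows only one way such a three-level quantization could look. The symmetric use of a single threshold η is an assumption; the description of block 22 mentions two threshold values, so the actual rule may differ.

```python
def ternarize(z_values, eta):
    """Map each Z_i to +1, 0 or -1 using a symmetric threshold eta (an assumption)."""
    result = []
    for z in z_values:
        if z > eta:
            result.append(1)
        elif z < -eta:
            result.append(-1)
        else:
            result.append(0)
    return result


f = ternarize([40, -3, -25, 7], eta=10)
```

Quantizing to three levels removes the dependence on the absolute brightness range, which is the point of the second requirement.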

FIG. 6 shows an example of determining the vector of informative features of a scale window. For simplicity, a "chessboard" image is taken as the example deep layer. The scale mask is chosen to be 2 × 2 elements, with a 1 × 2 horizontal Haar wavelet, as illustrated in FIG. 6a. Moving the Haar wavelet over the scale mask yields samples according to (3), whose approximation is shown by curve 1 in FIG. 6b. Curve 2 in FIG. 6b illustrates the samples obtained according to (4).

FIG. 7 shows a fragment of a real X-ray image with a scale mask forming a megapixel; scanning this mask with Haar wavelets produces the vector of informative features (4).

We now turn to the pooling procedure that satisfies the first requirement. The functional Z must be chosen so that its value depends not on the size of the segment but only on its texture. This requirement arises because the subsampled deep image is a set of elements indexed by i = 1, 2, …, L, where L is the number of times the pooling mask of scale u fits into the u-th deep layer. To eliminate the influence of the number of scale masks in a deep-layer segment on the vector of informative features, a single-layer perceptron is used. The process of normalizing the length of the vector of informative features is explained with a specific example. Suppose the segment is a rectangle of 80 × 64 pixels. Consider a deep image with a scale mask of 16 × 16 pixels. Passing a Haar wavelet over each scale mask yields an eight-component vector of informative features, so the segment can be represented by a 5 × 4 × 8 tensor. In FIG. 8 this tensor is shown as a rectangular parallelepiped. We assume that correlations appear only between the elements of adjacent masks, and feed four eight-component vectors to the single-layer perceptron (in FIG. 8 they are joined by circles). Since two Haar wavelets are used, the number of eight-component vectors at the input of the single-layer perceptron increases to 8; that is, this deep layer requires either one neural network with one output and 64 inputs or two neural networks with 32 inputs each.
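The numbers in this example can be checked with a few lines of arithmetic. The wavelet step of 1 is taken from the default stated elsewhere in the description; the variable names are illustrative.

```python
segment_h, segment_w = 80, 64  # segment size in pixels (from the example)
mask = 16                      # scale mask of 16 x 16 pixels
step = 1                       # wavelet step, assumed to be the default

grid_rows = segment_h // mask              # scale masks along the first axis
grid_cols = segment_w // mask              # scale masks along the second axis
components = mask // (2 * step)            # vector components per scale mask
tensor_shape = (grid_rows, grid_cols, components)

adjacent_masks = 4                                  # vectors fed together
inputs_per_wavelet = adjacent_masks * components    # one network per wavelet
inputs_both_wavelets = 2 * inputs_per_wavelet       # one network for both
```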

Since the neural network must answer the question of whether an element of a deep-image segment defined by the scale mask belongs to a given known class, the single-layer perceptron is trained for a known class with a known scale mask; its structure therefore does not depend on the number of scale masks (megapixels) included in the segment being classified. This single-layer perceptron is a weak classifier. The weak classifiers form the input vector for the fully connected neural network.
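A weak classifier of this kind can be sketched as a single weighted sum followed by an activation. The sigmoid activation, the weights and the bias below are hypothetical placeholders; the source specifies neither the activation nor trained values.

```python
import math


def perceptron(inputs, weights, bias):
    """Single-layer perceptron: weighted sum followed by a sigmoid (activation is an assumption)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical, untrained parameters for a 32-input weak classifier.
weights = [0.1] * 32
bias = -0.5
# Four adjacent eight-component vectors with values in {-1, 0, +1}.
features = [1, 0, -1, 1, 0, 0, 1, -1] * 4
score = perceptron(features, weights, bias)
```

The input size is fixed at 32 regardless of how many megapixels the segment contains, which is how the structure stays independent of segment size.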

The algorithm that implements the pooling technique is shown in FIG. 9. Its purpose is to form, from the analysis of the deep layers, the informative features for the fully connected neural network that implements the fourth stage, the classification of segments of the X-ray image; the algorithm therefore includes a global loop over the U deep images of the X-ray image (block 20). The identifier U specifies the number of scale windows (scales) used to classify the segments of the X-ray image; these are determined in block 11 of FIG. 2.

In block 16, the X-ray image, of size N1 × N2 pixels, is loaded. In block 17, the transparency mask determined at the first classification stage is loaded. In block 18, the segment to be classified is selected. In block 18, the scale masks used to form the deep layers are loaded. We assume that U scale masks are used to form the deep layers. In block 21, the corresponding scale mask is entered; it formats the X-ray image according to its size (M1×M2)_u. In block 22, the two threshold values used to form the vector F according to (4) are entered. In block 24, the horizontal and vertical Haar wavelets are formed. The size of the vertical wavelet is (M2/2×M1)_u, and the size of the horizontal wavelet is (M1/2×M2)_u. The horizontal Haar wavelet moves over the mask in the vertical direction with step ΔV, and the vertical Haar wavelet moves over the mask in the horizontal direction with step ΔG. The step size is set in block 9. By default ΔV = ΔG = 1, in which case this block may be omitted.

The tensor of informative features for the "weak" classifiers is itself calculated in a loop organized by block 25. The loop scans the X-ray image in raster (line-by-line) order. At each step, the group of pixels falling within the scale mask is analyzed. Each group of pixels is assigned a complex vector of size D, which forms the third dimension. The first component of the complex vector is formed by blocks 27-29, and the second by blocks 30-32. Blocks 25 and 26 organize the scanning of the pseudo-matrix of the deep layer. The elements of this matrix are the groups of X-ray image pixels falling within the scale mask, that is, the megapixels. The number of megapixels in the u-th deep layer is V_u × G_u, where V_u = int(N1/M1_u) and G_u = int(N2/M2_u).

To simplify the description of the algorithm, blocks 27-29 and 30-32 are presented for the case ΔV = ΔG and M1 = M2. In this case, the numbers of vector components along the coordinate d obtained by scanning a megapixel with the horizontal and vertical Haar wavelets are equal and are given by D = M1/(2ΔV) = M2/(2ΔG).

After the two tensors have been obtained for each deep layer (one of them is shown in FIG. 8), they are thinned in block 33 according to the transparency matrix and the segment selected in block 18. Each megapixel remaining after thinning forms a weak classifier whose output has two components characterizing whether the analyzed segment belongs to the given class. The "weak" classifier is formed by a single-layer perceptron, implemented in block 34. Its input receives not only the elements of layer D corresponding to the megapixel of this "weak" classifier, but also the elements of the D layers of adjacent megapixels (circled in FIG. 8). A single-layer perceptron is tuned to one class; therefore, two single-layer perceptrons are used for all megasegments of the deep layer u, and they are trained on examples of the D layers obtained by horizontal and vertical scanning of the corresponding megapixel with Haar wavelets.
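The thinning step can be sketched as a selection of megapixel coordinates. The criterion used here, that a megapixel is kept only when every pixel it covers has mask value 1, is an assumption; the source does not state how partially covered megapixels are treated. The function name is hypothetical.

```python
def megapixels_in_segment(mask, m):
    """Return grid coordinates of the m x m megapixels kept after thinning.

    Assumption: a megapixel is kept when all pixels it covers have mask value 1.
    """
    rows, cols = len(mask) // m, len(mask[0]) // m
    kept = []
    for i in range(rows):
        for j in range(cols):
            block = [mask[i * m + g][j * m + q] for g in range(m) for q in range(m)]
            if all(v == 1 for v in block):
                kept.append((i, j))
    return kept


mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1]]
kept = megapixels_in_segment(mask, 2)
```

Only the feature vectors at the kept coordinates would then be passed on to the single-layer perceptrons.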

At the fourth stage, the segment is classified by a multilayer perceptron. The fully connected layer is implemented as two block-type neural networks whose hidden layers are trained on the vectors obtained by vertical and horizontal scanning of the corresponding megapixel with the Haar wavelet. The final layers of the fully connected neural network aggregate the results of the independent classifications by the block-type neural networks.

FIG. 10 shows the complete block diagram of a classifier that implements the proposed method.

The input data for the classifier is the matrix of pixel samples of the X-ray image 35, from which the deep layers 36 are formed. Using the pooling technique, the tensors of informative features 37 are obtained from the deep layers 36 and are thinned by means of the transparency mask 38. The multiplexer 39 connects the D layers of the tensors 37 to the inputs of the single-layer perceptrons 40; this switching is provided by the intelligent analyzer 41, which also forms the input data for the fully connected neural network 42.

The fully connected neural network 42 (FIG. 11) includes an input layer 43, which is the output of the intelligent analyzer 41, hidden layers 44 and an output layer 45.

FIG. 12 shows the flowchart of the operation of the intelligent analyzer 41 (FIG. 10). Its input data are the outputs of the single-layer perceptrons 40 (FIG. 10). Information about each megapixel of the deep layer U arrives sequentially at the outputs of the single-layer perceptrons. The intelligent analyzer must store these data, which are designated in the flowchart as two sets, and must also store the number of megapixels in the deep layer U, denoted by the identifier K_u.

Blocks 49 and 50 scan the deep layer U. The scan proceeds megapixel by megapixel. The size of the X-ray image in megapixels in the deep layer U is defined as:

[formula not reproduced in the source text]

where (M1×M2)_u is the size of the scale mask.

The coordinates (i, j) of the megapixel are compared with the coordinates of the segment on the transparency mask. If the megapixel falls within the segment, the input data from the corresponding D layers of the tensor are fed to the single-layer perceptrons; this is implemented by blocks 51, 53 and 54. At the same time, the counter of segments in layer U is incremented (block 52), and the output data of the single-layer perceptrons 40 are stored in block 55.

After the megapixels of the deep layer U have been scanned, the number of pixels K_u belonging to the segment being classified is stored in block 56.

The essence of the intelligent analysis of these two data sets is as follows. Since the megasegments whose information is reflected in the two sets belong to one and the same segment, the elements of each set should be relatively homogeneous, that is, the variance of the elements of each set must not exceed the threshold assigned to that set.

If these conditions are not met (blocks 58 and 59), outliers must be sought in the data and excluded when the corresponding input of the fully connected neural network 42 is formed.

Outliers for the first set are determined in blocks 61 to 64, and for the second set in blocks 66 to 69. If there are no outliers, the inputs of the neural network 42 are formed by block 60; otherwise they are formed by blocks 65 and 70.
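The variance check and outlier handling for one set can be sketched as below. The text here does not state the rule the blocks use to detect outliers, so the sketch assumes a simple one: a value is an outlier when it lies more than `z_limit` standard deviations from the mean. The function name, the `z_limit` parameter and the use of the mean as the network input are likewise assumptions made for illustration.

```python
from statistics import mean, pstdev, pvariance
from typing import Sequence


def analyzer_input(
    values: Sequence[float],
    var_threshold: float,
    z_limit: float = 2.0,
) -> float:
    """Form one network input from the weak-classifier outputs of a segment.

    If the variance is within the threshold, the set is treated as
    homogeneous and all values are averaged. Otherwise values further
    than z_limit standard deviations from the mean (assumed rule) are
    dropped before averaging.
    """
    if not values:
        raise ValueError("segment contains no megapixels")
    if pvariance(values) <= var_threshold:
        return mean(values)          # homogeneous set: no outlier search
    mu, sigma = mean(values), pstdev(values)
    kept = [v for v in values if abs(v - mu) <= z_limit * sigma]
    return mean(kept) if kept else mu


outputs = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 5.0]
feature = analyzer_input(outputs, var_threshold=0.05)
```

In this example the value 5.0 is excluded, so the resulting feature reflects only the six mutually consistent outputs.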

Thus, to classify morphological structures with pathological formations in grayscale raster X-ray images, it is proposed to divide the original image into parts (segments) that differ in semantic content and then to classify these segments. Segmentation of an X-ray image yields regions of interest of arbitrary shape (non-raster images). The task is therefore to classify the segments of the original image rather than the image itself. Classification software generally works with “matrix” images, that is, rectangular or square rasters. To convert a non-raster image (segment) into a raster one, a transparency mask (image) of the same size as the original image is attached to it. Mask pixels equal to one select the pixels of the original image that belong to the segment and are to be processed, while mask pixels equal to zero replace the brightness of the corresponding pixels of the original image with the background brightness.
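The effect of the transparency mask on the original image can be illustrated with a minimal sketch. The function name `apply_transparency_mask` and the default background brightness of zero are assumptions chosen for the example; images are represented as plain nested lists.

```python
from typing import List, Sequence


def apply_transparency_mask(
    image: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    background: float = 0.0,
) -> List[List[float]]:
    """Keep pixels where the mask is 1; set all others to the background."""
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("mask must have the same size as the image")
    return [
        [pixel if flag == 1 else background for pixel, flag in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


segment = apply_transparency_mask([[10, 20], [30, 40]], [[1, 0], [0, 1]])
# segment == [[10, 0.0], [0.0, 40]]
```

The result is still a rectangular raster of the original size, which is what allows standard raster-processing procedures to be applied to an arbitrarily shaped segment.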

The selected segment is classified by a modified convolutional neural network, which retains the pooling layers and the fully connected layers of the classical convolutional network. So that standard procedures for processing rectangular-raster images can be used, the transparency mask is applied only after pooling. Unlike the classical convolutional network, the proposed solution inserts, after the pooling layer, a layer of weak classifiers implemented as single-layer perceptrons. The number of weak classifiers does not depend on the structure or size of the mega-segment (the scale of the depth layer). Each weak classifier is tuned to a specific segment class and a specific scale mask size. Consequently, the number of weak classifiers involved in classifying a segment equals the number of scales (depth layers) used in the convolutional neural network multiplied by two. The decisions of each weak classifier over all megapixels of the segment are aggregated by the “intelligent analyzer” layer, which forms the space of informative features for the fully connected neural network. Each weak classifier decides whether a megapixel of the analyzed segment belongs to a given class, but the fully connected neural network receives information averaged over the segment.

The averaging is performed by the intelligent analyzer with outliers taken into account. If a weak classifier's result for a mega-segment is identified as an outlier, that is, a value that differs sharply from the other values in the collected data set, it does not take part in forming the input data for the fully connected neural network.

Claims

1. A method for the automatic classification of X-ray images using transparency masks, which provides for forming a digital X-ray image as a matrix of optical densities of the object, obtaining depth layers of the image by processing the original digital image with local filters unique to each layer, reducing the dimensionality of the images in the depth layers by pooling (sub-sampling), forming a space of informative features for a trainable fully connected neural network from the sub-sampled depth layers, and classifying the resulting vector of informative features by means of the fully connected neural network, characterized in that the input digital image is supplemented by a transparency mask obtained by preliminary segmentation of the X-ray image, and the classified feature vector is formed not from all image pixels but only from those pixels that are not masked by the transparency mask.

2. The method according to claim 1, characterized in that the local filters are implemented as identical operators with different scale masks, which form the depth layers by indexing the scale of the filter mask, which converts the pixels of the original image into megapixels, that is, sets of pixels of the original image that fall within the boundaries of the filter mask.

3. The method according to claim 1, characterized in that the pooling consists of four stages: at the first stage, each megapixel of the depth layer of the image is represented as two vectors obtained by means of two differential operators adapted to the horizontal and vertical directions; at the second stage, the classified segment is selected on the transparency mask, and vectors whose megapixels do not belong to the classified segment are removed from the resulting three-dimensional tensors; at the third stage, a vector of informative features for the “weak” classifier is formed from four adjacent vectors of each megapixel; at the fourth stage, each weak classifier of each depth layer determines the degree to which a given megapixel belongs to a given class.

4. The method according to claim 3, characterized in that at the first stage of the pooling the differentiation results Z_i are compared with the threshold η, as a result of which the components of the vector of informative features take on the value

5. The method according to claim 3, characterized in that Haar wavelets of sizes M2/2 × M1 and M1/2 × M2 are used as the differential operators, where M1 × M2 is the size of the scale mask of the corresponding depth layer.

6. The method according to claim 3, characterized in that single-layer perceptrons are used as the “weak” classifiers.

7. The method according to claim 1, characterized in that the fully connected neural network comprises two fully connected block networks: the inputs of the first are connected to the outputs of the weak classifiers whose informative features are formed from the megapixel vectors of the first tensor, and the inputs of the second are connected to the outputs of the weak classifiers whose informative features are formed from the megapixel vectors of the second tensor.

8. The method according to claim 7, characterized in that the inputs of the fully connected neural network are formed by intelligent analysis of the results of classifying the megapixels of the global layer by the single-layer perceptrons, and, if the classification results for a megapixel are “outliers”, those results do not participate in forming the corresponding input of the fully connected neural network.