WO2020242341A1

WO2020242341A1 - Method for selecting and classifying blood cell types by means of deep convolutional neural networks

Info

Publication number: WO2020242341A1
Application number: PCT/RU2019/000687
Authority: WO
Inventors: Александр Михайлович ГРОМОВ; Вадим Сергеевич КОНУШИН
Original assignee: Общество С Ограниченной Ответственностью "Лаб Кмд"
Priority date: 2019-05-27
Filing date: 2019-09-27
Publication date: 2020-12-03
Also published as: RU2732895C1

Abstract

This technical solution relates in general to the field of computer science and medicine, and specifically to a method for selecting and classifying blood cell types by means of deep convolutional neural networks. The technical result consists in the automatic detection and classification of blood cell types by means of deep convolutional neural networks. A computer-implementable method for selecting and classifying blood cell types by means of deep convolutional neural networks, consisting in carrying out steps in which: an image comprising blood cells is produced; the blood cells are detected in the produced image; normal blood cells and edge blood cells are differentiated; the normal blood cells are selected and cut out of the image, while the edge blood cells are excluded from further analysis; then the blood cells are classified by type, wherein: for each image of a blood cell that has been cut out, a set of images is produced by an augmentation method; the set of images produced for each cell is analyzed, and each blood cell is classified by type based on said set.

Description

Method for the isolation and classification of blood cell types using deep convolutional neural networks

FIELD OF TECHNOLOGY

This technical solution, in general, relates to the field of computing and medicine, and in particular to a method for the isolation and classification of blood cell types using deep convolutional neural networks.

LEVEL OF TECHNOLOGY

At present, intelligent systems are being intensively developed for the automatic processing of medical images. Automated processing and analysis of medical images is a universal tool for medical diagnostics.

The classification of blood cells in a microscopic image is, in computer vision terms, an object recognition task.

Blood is a complex functional system that provides timely delivery of oxygen and nutrients to tissue cells and the removal of metabolic products from organs and interstitial spaces. The blood system subtly reacts to the effects of environmental factors with a set of specific and non-specific components. An important characteristic of the physiology and pathology of the blood system is the quantitative and qualitative composition of the erythrocyte population.

Visual assessment of the morphological characteristics of blood cells is an integral part of the analysis of human blood. Determining the number of blood cells of different types and their ratio is an important and one of the most frequent tests in clinical laboratory diagnostics.

Historically, identification and counting of blood cells were performed using a microscope in a “manual” mode, while the analyzed blood sample was in a static state. In recent years, another approach to the identification and counting of blood cells has been intensively developed - the method of digital microscopy.

At present, this promising field is still developing, and work continues on finding suitable algorithms and programs that minimize errors in counting blood cells. The following solutions, which disclose methods for the automatic differentiation of peripheral blood cells, are known from the prior art: CN103745210B "Method and device for classifying white blood cells", patent holder AVE SCIENCE & TECHNOLOGY CO., LTD, publication date 02/06/2018; US20180322327A1 "Machine learning classification and training for digital microscopy cytology images", filed by TECHCYTE INC., publication date 11/08/2018; KR101927852B1 "Method and Apparatus for Identifying Cell Species Using 3D Refractive Index Tomography and Machine Learning Algorithm", publication date 13.12.2018.

In addition, at the current level of technology, solutions are known from the companies CellaVision (http://www.cellavision.com/en/) and VisionHema (http://visionhemaultimate.ru/). Each is a device with software into which a slide cassette is loaded, after which the system automatically performs a WBC count and erythrocyte morphology analysis.

However, the solutions known from the prior art for the automatic differentiation of blood cells have limited functionality, namely, these solutions do not have a cell detection step. In addition, all of the above solutions solve only the problem of classifying the simplest blood cells (5 types).

SUMMARY OF THE INVENTION

The technical problem to be solved by the claimed technical solution is the creation of a computer-implemented method for the isolation and classification of blood cell types using deep convolutional neural networks, which is characterized in an independent claim. Additional embodiments of the present invention are presented in the dependent claims.

This technical solution is aimed at eliminating the disadvantages inherent in existing solutions known from the prior art.

The technical result consists in automatic detection and classification of types of blood cells using deep convolutional neural networks.

The specified technical result is achieved due to the implementation of a computer-implemented method for the isolation and classification of types of blood cells using deep convolutional neural networks, which consists in performing the stages at which:

• an image containing blood cells is obtained;

• blood cells are detected in the obtained image;

• normal blood cells are distinguished from border blood cells;

• normal blood cells are isolated and cut out of the image, and border blood cells are excluded from further analysis;

• the blood cells are then classified by type, wherein: a set of images is obtained for each cut-out blood cell image using the augmentation method; the set of images obtained for each cell is analyzed and, based on this set, each blood cell is classified by type.

In a particular embodiment, each detected blood cell is defined by the coordinates of its upper-left corner, its width, and its height.

In another particular embodiment, normal blood cells are selected with the coordinates of the bounding rectangle.

In another particular embodiment, a deep convolutional neural network is pre-trained on two datasets: ImageNet 22k and Places365.

In another particular embodiment, a single-stage RetinaNet detector is used to detect blood cells in an image.

DESCRIPTION OF DRAWINGS

The implementation of the invention will be described in the following in accordance with the accompanying drawings, which are presented to clarify the essence of the invention and in no way limit the scope of the invention. The following drawings are attached to the application:

FIG. 1 illustrates a computer-implemented method for the isolation and classification of blood cell types using deep convolutional neural networks;

FIG. 2 illustrates a block diagram of the claimed solution;

FIG. 3 illustrates a detailed description of the detector architecture;

FIG. 4 illustrates an example of FPN construction;

FIG. 5 illustrates an example of generating anchors;

FIG. 6 illustrates an example of visualization of the operation of the augmentation method;

FIG. 7 illustrates an example of a general arrangement of a computing device.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description of an implementation of the invention, numerous implementation details are set forth to provide a thorough understanding of the present invention. However, it will be obvious to those skilled in the art how the present invention can be used, with or without these implementation details. In other cases, known methods, procedures, and components have not been described in detail so as not to obscure the features of the present invention.

This technical solution can be implemented on a computer in the form of an automated system (AS) or a computer-readable medium containing instructions for performing the above method.

The technical solution can be implemented as a distributed computer system.

In addition, it will be clear from the above description that the invention is not limited to the above implementation. Numerous possible modifications, changes, variations and substitutions, while retaining the spirit and form of the present invention, will be apparent to those skilled in the art.

Let us introduce a number of definitions and concepts that will be used to describe the implementation of the declared solution.

A convolutional neural network (CNN) is a special architecture of artificial neural networks aimed at efficient image recognition and is part of deep learning technologies.

Deep learning is characterized as a class of machine learning algorithms that:

• uses a multilayer system of nonlinear filters to extract features with transformations. Each subsequent layer receives the output of the previous layer as input. A deep learning system can combine supervised and unsupervised learning algorithms; the search for cells and their subsequent classification are supervised learning tasks;

• has several layers for identifying features or parameters of the data representation. The features are organized hierarchically: higher-level features are derived from lower-level features;

• is part of the broader field of machine learning concerned with learning data representations;

• forms, during training, layers at several levels of representation that correspond to different levels of abstraction; the layers form a hierarchy of concepts.

Deep neural networks are currently becoming one of the most popular machine learning methods. They show better results than alternative methods in areas such as speech recognition, natural language processing, computer vision, medical informatics, etc. One of the reasons for the successful application of deep neural networks is that the network automatically extracts from the data the important features necessary to solve the problem.

Augmentation (test-time augmentation, TTA) is the application of image transformations such as rotation, compression, adding noise, magnification, resizing, color changes, rescaling, and cropping. It improves the quality of the classifier by averaging the predictions for the original image and for its augmented versions.

Blood cells, or blood corpuscles, are cells that make up the blood and are formed in the red bone marrow during hematopoiesis. There are three main types of blood cells: erythrocytes (red blood cells), leukocytes (white blood cells), and thrombocytes (platelets).

Diagnostics plays an important role in medicine. A timely accurate diagnosis facilitates the choice of a treatment method and significantly increases the likelihood of a patient's recovery. The use of neural networks is one of the ways to improve the efficiency of medical diagnostics.

The present invention is directed to providing a computer-implemented method for isolating and classifying blood cell types using deep convolutional neural networks.

In the claimed solution, the recognition of pathological cells can be divided into two stages: detection of cells and classification of cells.

As shown in FIG. 1, the claimed computer-implemented method for isolating and classifying types of blood cells using deep convolutional neural networks (100) is implemented as follows: In step (101), an image containing blood cells is obtained.

Next, at step (102), blood cells are detected in the obtained image. Each detected blood cell is described by four numbers: the coordinates of the upper-left corner, and the width and height of the cell. All values are measured in pixels.

At the stage of detecting blood cells, the architecture of a deep convolutional neural network built using RetinaNet is used (Fig. 3).

The basis of this network is the MobileNet-128 network (the architecture of the MobileNet-128 network is described in the article "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", https://arxiv.org/abs/1704.04861), since this network has a lighter structure, which makes the output files of the trained network smaller.

RetinaNet is a single unified network consisting of a main neural network (NN) for feature extraction and two auxiliary networks for specific tasks (an example of the RetinaNet architecture is given in the article "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002). The main neural network is responsible for computing the feature map over the entire input image and is a self-contained convolutional network.

The first auxiliary NN (the classification NN) performs classification on the output of the main NN; the second auxiliary NN (the localization NN) performs convolutional regression of the bounding box.

The Feature Pyramid Network (FPN) architecture is used to generate spatial feature maps. The loss function is the Focal Loss function described in the article "Focal Loss for Dense Object Detection" (https://arxiv.org/abs/1708.02002).

The claimed solution for the detection of blood cells uses a one-stage detector of the RetinaNet family. Object detection is the output of the four coordinates of the rectangle into which the object of interest is inscribed.

The architecture of a one-stage RetinaNet detector is shown in FIG. 3.

The Feature Pyramid Network (FPN) built on the MobileNet-128 architecture is used as a backbone. Using MobileNet-128 increased the speed of operation without degrading the metrics. Two subnets are attached to the FPN output: the first is responsible for classifying the anchors, the second for their regression.

FPN

The FPN (Feature Pyramid Network) is built on top of the MobileNet-128 deep convolutional network. The pyramid consists of 5 levels: P3, P4, P5, P6, P7. The first 3 levels are connected to C3, C4, C5 through a convolutional layer with 256 filters of size 1 x 1. C3, C4, C5 are the feature maps of the MobileNet-128 network after the 3rd, 4th and 5th subsampling layers, which reduce the input image by factors of 8, 16 and 32, respectively. P5 is obtained by applying a convolutional layer with 256 filters of size 1 x 1 to C5. P4 is obtained by element-wise addition of the result of applying a convolutional layer with 256 filters of size 1 x 1 to C4 and the result of upsampling P5 by a factor of 2, followed by a convolutional layer with 256 filters of size 3 x 3 and a stride of 1. P3 is obtained in the same way, except that it is connected to P4 and C3 (Fig. 4). P6 is obtained by applying a convolutional layer with 256 filters of size 3 x 3 and a stride of 2 to P5. P7 is obtained by applying the ReLU activation function and then a convolutional layer with 256 filters of size 3 x 3 and a stride of 2 to P6.
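The downsampling factors of the pyramid can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a square input whose side is a multiple of 128 (so no rounding is involved) and that the two extra levels each halve the resolution of the preceding one, as the stride-2 convolutions described above imply. The function and constant names are hypothetical.

```python
# Downsampling factor of each pyramid level relative to the input image.
# C3, C4, C5 reduce the input by 8, 16 and 32; the two extra levels are
# produced by stride-2 convolutions, so each halves the resolution again.
PYRAMID_STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}


def pyramid_map_sizes(image_side):
    """Return the side length of each (square) feature map.

    Assumption of this sketch: image_side is a multiple of 128, so the
    sizes are exact; a real network rounds according to its padding rules.
    """
    if image_side % max(PYRAMID_STRIDES.values()) != 0:
        raise ValueError("image_side must be a multiple of 128 in this sketch")
    return {level: image_side // stride
            for level, stride in PYRAMID_STRIDES.items()}


sizes = pyramid_map_sizes(1024)
for level, side in sizes.items():
    print(f"{level}: {side} x {side}")
```

For a 1024 x 1024 input this reproduces the five map sizes (128, 64, 32, 16 and 8 pixels per side) that the description uses later when counting anchors.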

Figure 5 shows an example of anchor generation. Each grid cell corresponds to a pixel of the output feature map; a predefined set of anchors is generated for each pixel. This example generates 4 anchors per pixel.

Generation of anchors. RetinaNet is a single-stage detector: unlike Faster R-CNN, where hypotheses are generated by a separate RPN neural network, each pixel of the feature maps obtained after the FPN (5 maps in total) is assigned a predetermined set of anchors. Anchors have areas of 32², 64², 128², 256², 512² at the levels P3, P4, P5, P6, P7, respectively. Three aspect ratios are used, {1:1, 1:2, 2:1}, and 3 scale factors, {2^0, 2^(1/3), 2^(2/3)}. Thus, a total of 9 anchors are generated for each pixel of a feature map, and the size of the anchors depends on the pyramid level. Each anchor is associated with a vector of length 4 (the regression problem) and a vector of length K, where K is the number of classes (the classification problem). Anchors are matched to the reference rectangles using the IoU criterion (intersection over union): if the IoU is greater than 0.5, the anchor is considered to coincide with the reference rectangle; if the IoU is less than 0.4, the anchor is assigned to the background; otherwise the anchor is ignored.

Classification network. This network consists of 4 consecutive convolutional layers with 256 filters of size 3 x 3, each followed by a ReLU activation layer; the last layer is a convolutional layer with K * A filters, where A is the number of anchors generated per pixel and K is the number of classes.

Regression network. This network consists of 4 consecutive convolutional layers with 256 filters of size 3 x 3, each followed by a ReLU activation layer (ReLU(x) = max(0, x) is the neuron activation function); the last layer is a convolutional layer with 4 * A filters, where A is the number of anchors generated per pixel, and the factor of 4 means that 4 values must be predicted for each anchor: the coordinates of the upper-left corner, the width and the height.
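The anchor scheme and the IoU matching rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the claimed solution: boxes are written as (x, y, width, height) with (x, y) the upper-left corner, the aspect ratios are interpreted as width:height, and all function names are hypothetical.

```python
import math

ASPECT_RATIOS = (1.0, 0.5, 2.0)                # 1:1, 1:2, 2:1 (assumed width:height)
SCALES = (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))  # three scale factors per level
BASE_SIZES = {"P3": 32, "P4": 64, "P5": 128, "P6": 256, "P7": 512}
NUM_ANCHORS = len(ASPECT_RATIOS) * len(SCALES)  # A = 9 anchors per pixel


def anchors_at(cx, cy, base_size):
    """Return the 9 anchors (x, y, w, h) centred on the point (cx, cy)."""
    anchors = []
    for scale in SCALES:
        area = (base_size * scale) ** 2
        for ratio in ASPECT_RATIOS:       # ratio = w / h, the area is preserved
            w = math.sqrt(area * ratio)
            h = area / w
            anchors.append((cx - w / 2, cy - h / 2, w, h))
    return anchors


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def assign(anchor, reference_boxes):
    """Label an anchor using the 0.5 / 0.4 IoU thresholds from the description."""
    best = max((iou(anchor, ref) for ref in reference_boxes), default=0.0)
    if best > 0.5:
        return "object"
    if best < 0.4:
        return "background"
    return "ignored"


def head_output_channels(num_classes):
    """Filters in the last layer of the classification and regression subnets."""
    return num_classes * NUM_ANCHORS, 4 * NUM_ANCHORS


anchors = anchors_at(64, 64, BASE_SIZES["P3"])
print(len(anchors), assign(anchors[0], [(48, 48, 32, 32)]))
```

Anchors labelled "ignored" contribute nothing to training, which keeps ambiguous overlaps from being counted as either cells or background.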

Focal loss. Suppose the input image is 1024 x 1024 pixels; then 5 maps of 128 x 128, 64 x 64, 32 x 32, 16 x 16, 8 x 8 are generated for it, and 9 anchors are generated for each pixel of the resulting maps. In total 196,416 anchors are generated, and the anchors that match a reference rectangle make up only about 0.1 percent of this number. To combat this imbalance, focal loss is used:

$$L(p_t) = -(1 - p_t)^{\gamma} \log(p_t),$$

where

$$p_t = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{otherwise} \end{cases}$$

p is the predicted probability, and y indicates membership in the reference class.

Thus, the more confident the detector is in the correct class, the lower the error value will be.
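The anchor count and the behaviour of the loss can be checked numerically. The sketch below evaluates the focal loss for a single anchor and a single class; the value gamma = 2.0 is an assumption taken from the cited "Focal Loss for Dense Object Detection" article, since the description does not state the value used, and the function name is hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0):
    """Focal loss for one anchor and one class.

    p     -- predicted probability of the class, 0 < p < 1
    y     -- 1 if the anchor belongs to the reference class, otherwise 0
    gamma -- focusing parameter (assumed value, not given in the description)
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Anchor count for a 1024 x 1024 input: 9 anchors per pixel of each map.
map_sides = [128, 64, 32, 16, 8]
total_anchors = 9 * sum(side * side for side in map_sides)
print(total_anchors)

# A confident correct prediction is penalised far less than an uncertain one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

With gamma = 0 the expression reduces to the ordinary cross-entropy term, so the factor (1 - p_t)^gamma is what suppresses the contribution of the many easy background anchors.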

At step (103), normal blood cells are distinguished from border cells. A border cell is a cell located at the edge of the image, so that only part of it (for example, half or a third) is visible. Such border cells cannot be classified.

In step (104), normal blood cells are isolated and cut from the image, and the border blood cells are excluded from further analysis. At the same time, normal blood cells are distinguished by the coordinates of the bounding rectangle.
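Steps (103) and (104) can be illustrated with a short sketch. The description does not specify how a border cell is recognized, so the rule used here (a bounding rectangle that comes within `margin` pixels of an image edge) is purely an assumption for illustration, as are the `Detection` type and the function names. The image is represented as a plain list of pixel rows to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x: int        # column of the upper-left corner, in pixels
    y: int        # row of the upper-left corner, in pixels
    width: int
    height: int


def is_border_cell(det, image_width, image_height, margin=1):
    """Hypothetical rule: the box touches or nearly touches an image edge."""
    return (det.x < margin
            or det.y < margin
            or det.x + det.width > image_width - margin
            or det.y + det.height > image_height - margin)


def crop(image, det):
    """Cut the bounding rectangle out of an image stored as a list of rows."""
    rows = image[det.y:det.y + det.height]
    return [row[det.x:det.x + det.width] for row in rows]


def cut_out_normal_cells(image, detections):
    """Keep only normal cells and return their cropped images."""
    height = len(image)
    width = len(image[0]) if image else 0
    return [crop(image, det) for det in detections
            if not is_border_cell(det, width, height)]


# A 10 x 10 toy image whose pixel value encodes its position (row * 10 + column).
image = [[row * 10 + col for col in range(10)] for row in range(10)]
detections = [Detection(0, 3, 4, 4), Detection(3, 3, 4, 4), Detection(7, 7, 3, 3)]
crops = cut_out_normal_cells(image, detections)
print(len(crops))
```

Only the middle detection survives in this toy example; the other two touch the left and the bottom-right edges of the image and are dropped before classification.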

In step (105), blood cell types are classified.

A set of images is obtained for each cut-out blood cell image using the augmentation method. The test-time augmentation (TTA) approach consists in classifying not just one image but a set of images derived from it by rotating, mirroring, and cropping part of the original image (see Fig. 6). Each image in the resulting set is analyzed, and each blood cell is classified by type based on this set.

Examples of blood and bone marrow cell types that can be classified in the claimed solution using deep convolutional neural networks are presented in Table 1 below.

Table 1

For classification, the claimed solution uses a custom network based on the approach used in networks of the ResNet family ("Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385).

The architecture used is intermediate between the ResNet-50 and ResNet-101 architectures. It contains 71 layers and is hereinafter referred to as ResNet-71.

Architecture:

Layer 1 - 64 convolutions of size 7 by 7, applied with a stride of 2; BatchNormalization (a layer that normalizes activations); the ReLU activation function; max pooling of size 3 by 3, applied with a stride of 2. Max pooling is an operation for dimensionality reduction and feature combining.

Layer 2 - a sequence of convolutions: 64 convolutions of size 1 by 1, 64 convolutions of size 3 by 3, 256 convolutions of size 1 by 1; this sequence is repeated 3 times.

Layer 3 - a sequence of convolutions: 128 convolutions of size 1 by 1, 128 convolutions of size 3 by 3, 512 convolutions of size 1 by 1; this sequence is repeated 4 times. In the first of the 4 repetitions, the 3 by 3 convolution has a stride of 2.

Layer 4 - a sequence of convolutions: 256 convolutions of size 1 by 1, 256 convolutions of size 3 by 3, 1024 convolutions of size 1 by 1; this sequence is repeated 12 times. In the first of the 12 repetitions, the 3 by 3 convolution has a stride of 2.

Layer 5 - a sequence of convolutions: 512 convolutions of size 1 by 1, 512 convolutions of size 3 by 3, 1024 convolutions of size 1 by 1; this sequence is repeated 3 times. In the first of the 3 repetitions, the 3 by 3 convolution has a stride of 2.

On layers 2, 3, 4, and 5, BatchNormalization and Relu are applied before each convolution.

Layer 6 - BatchNormalization, the ReLU activation function, then average pooling of size 7 by 7 (an operation similar to max pooling).

Layer 7 is a fully connected layer whose number of outputs equals the number of blood cell types.
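The layer listing above can be written down as a small configuration table, which also makes it easy to check how the spatial resolution shrinks through the network. The sketch below assumes a 224 x 224 input, the customary size for ResNet-style classifiers; the description does not state the input size, so this is an assumption, as are the names used.

```python
# (name, number of repetitions, filters of the three convolutions,
#  stride of the 3 by 3 convolution in the first repetition)
STAGES = [
    ("layer 2", 3, (64, 64, 256), 1),
    ("layer 3", 4, (128, 128, 512), 2),
    ("layer 4", 12, (256, 256, 1024), 2),
    ("layer 5", 3, (512, 512, 1024), 2),
]


def feature_map_side(input_side):
    """Side of the feature map that reaches the average pooling in layer 6.

    Integer division models stride-2 operations on even sizes; padding
    details of a real implementation are ignored in this sketch.
    """
    side = input_side // 2      # layer 1: 7 by 7 convolution, stride 2
    side = side // 2            # layer 1: 3 by 3 max pooling, stride 2
    for _name, _repeats, _filters, first_stride in STAGES:
        side = side // first_stride
    return side


total_repeats = sum(repeats for _name, repeats, _filters, _stride in STAGES)
print(total_repeats, feature_map_side(224))
```

Under the assumed 224-pixel input the final feature map is 7 by 7, which is consistent with the 7 by 7 average pooling in layer 6.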

This neural network is first pre-trained on two image datasets: ImageNet 22k (http://image-net.org/) and Places365 (http://places2.csail.mit.edu/download.html).

In addition, to improve the accuracy of the classifier, test-time augmentation (TTA) is used: not just one image is classified, but a set of images derived from it by rotating, mirroring, and cropping part of the original image. A schematic visualization of this procedure is shown in Fig. 6.

A detailed description of the operation of the augmentation method (TTA).

The following set of transformations is applied to each blood cell image: 23 rotations with a step of 15 degrees (15 * 23 = 345; a 24th rotation would return the cell to its original orientation). All 24 images (the original and its 23 rotations) are also mirrored, and a slight crop is applied: a new height and width are chosen at random, and a region of that size is cropped from the original image. The new height and width are 95-100 percent of the original height and width, with the exact percentage chosen at random. Thus, from the image of one cell, 47 additional images are obtained: 23 rotations * 2 (since mirroring is applied) + 1 (the mirrored original image); together with the original, 48 images belonging to one cell are classified. The classification results are then accumulated, and the final answer is selected on the basis of the harmonic mean.
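The counting of augmented images and the aggregation step can be sketched as follows. The description does not say exactly how the harmonic mean is applied, so taking the harmonic mean of each class probability over all 48 variants and choosing the class with the highest value is an assumption of this sketch. The `classify` and `transform` callables are hypothetical placeholders for the trained network and for the image operations (rotation, mirroring and the random 95-100 percent crop).

```python
def tta_plan():
    """List the 48 (rotation in degrees, mirrored) variants of one cell image."""
    angles = [15 * k for k in range(24)]   # 0 (original) plus 23 rotations
    return [(angle, mirrored) for angle in angles for mirrored in (False, True)]


def harmonic_mean(values, eps=1e-12):
    """Harmonic mean; eps guards against division by zero probabilities."""
    return len(values) / sum(1.0 / max(v, eps) for v in values)


def classify_with_tta(image, classify, transform):
    """Classify every variant and combine the per-class probabilities.

    classify(image)                    -> list of class probabilities
    transform(image, angle, mirrored)  -> transformed image
    Both callables are hypothetical and supplied by the caller.
    """
    predictions = [classify(transform(image, angle, mirrored))
                   for angle, mirrored in tta_plan()]
    num_classes = len(predictions[0])
    scores = [harmonic_mean([p[c] for p in predictions])
              for c in range(num_classes)]
    best_class = max(range(num_classes), key=scores.__getitem__)
    return best_class, scores


# Toy stand-ins: the "transform" only tags the image, the "classifier" is constant.
def toy_transform(image, angle, mirrored):
    return (image, angle, mirrored)


def toy_classify(image):
    return [0.7, 0.3]


plan = tta_plan()
best, scores = classify_with_tta("cell", toy_classify, toy_transform)
print(len(plan), best)
```

The harmonic mean is dominated by the smallest values, so under this reading a class scores well only if it is predicted consistently across all rotated and mirrored variants.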

Fast and accurate diagnosis provides faster and more effective treatment. It is well known that there are not many qualified specialists in this field.

Photographs of blood have to be sent to medical organizations where experts work. This takes time and increases the experts' workload. Preliminary analysis using neural networks can reduce both the time and the work required. An expert opinion will then be needed only in difficult and uncertain cases.

As a result, all this will improve the quality of medical care, take less time from doctors and provide more information for decisions about treatment.

Aspects of the present invention may also be implemented with a data processing device, that is, a computer or system (or means such as a central/graphics processor or microprocessor) that reads and executes a program written to a memory device to perform the functions of the above-described embodiment(s), and by the method shown in FIG. 1, the steps of which are performed by a computer or apparatus by, for example, reading and executing a program stored in a memory device to perform the functions of the above-described embodiment(s). To this end, the program is supplied to the computer, for example, via a network or from a recording medium of various types serving as a storage device (for example, a computer-readable medium).

With reference to FIG. 7, a general diagram of a computing device (700) with which aspects of the present invention may be implemented will now be presented.

In the general case, the device (700) contains the following components, connected by a universal bus (710): at least one processor (701), at least one memory (702), data storage means (703), input/output interfaces (704), I/O means (705), and networking means (706).

The processor (701) performs all the basic computational operations necessary for the operation of the device (700) or the functionality of one or more of its components. The processor (701) executes the necessary machine-readable instructions contained in the main memory (702).

Memory (702), as a rule, can represent one or more devices of various types, such as: RAM, ROM, or their combinations and contains the necessary program logic that provides the required functionality, and an operating system that organizes the interaction interface and data processing protocols. HDD, SSD disks, flash memory, etc. can be used as ROM.

The data storage means (703) can be implemented as HDD or SSD disks, a RAID array, network storage, flash memory, optical storage devices (CD, DVD, MD, Blu-ray discs), etc. The means (703) provide long-term storage of various types of information, for example, the aforementioned files with user data sets, a database containing records of time intervals measured for each user, user identifiers, etc.

Interfaces (704) are standard means for connecting and working with a computer device, for example, USB, RS232, RJ45, LPT, COM, HDMI, PS / 2, Lightning, FireWire, etc.

The choice of interfaces (704) depends on the specific implementation of the device (700), which can be a personal computer, mainframe, server cluster, thin client, smartphone, laptop, be part of a bank terminal, ATM, etc.

As data I/O means (705), any embodiment of a system that implements the described method should use a mouse. The hardware implementation of the mouse can be any known one. The mouse can be connected to the computer either by wire, with the mouse cable plugged into a PS/2 or USB port on the system unit of a desktop computer, or wirelessly, with the mouse exchanging data over a wireless channel, for example a radio channel, with a base station that is in turn connected directly to the system unit, for example to one of the USB ports. In addition to the mouse, the data I/O means can also include a joystick, display (touchscreen), projector, touchpad, keyboard, trackball, light pen, speakers, microphone, etc.

The networking means (706) are selected from devices that provide network reception and transmission of data, for example, an Ethernet card, WLAN/Wi-Fi module, Bluetooth module, BLE module, NFC module, IrDA, RFID module, GSM modem (2G, 3G, 4G, 5G), etc. The means (706) provide data exchange over a wired or wireless data transmission channel, for example, WAN, PAN, LAN, Intranet, Internet, WLAN, WMAN or GSM.

In the present application materials, the preferred disclosure of the implementation of the claimed technical solution was presented, which should not be used as limiting other, particular embodiments of its implementation, which do not go beyond the scope of the claimed scope of legal protection and are obvious to specialists in the relevant field of technology.

Claims

Formula

1. A computer-implemented method for isolating and classifying types of blood cells using deep convolutional neural networks, comprising the steps of: obtaining an image containing blood cells; detecting blood cells in the obtained image; distinguishing between normal blood cells and border blood cells; isolating the normal blood cells and cutting them out of the image, while excluding the border blood cells from further analysis; and then classifying the blood cells by type, wherein: a set of images is obtained for each cut-out blood cell image using the augmentation method; and the set of images obtained for each cell is analyzed and, based on this set, each blood cell is classified by type.

2. The method according to claim 1, characterized in that each detected blood cell is defined by the coordinates of its upper-left corner, its width and its height.

3. The method according to claim 1, characterized in that normal blood cells are isolated by the coordinates of the bounding box.

4. The method according to claim 1, characterized in that a deep convolutional neural network is pre-trained on two datasets: ImageNet 22k and Places365.

5. The method according to claim 1, characterized in that a single-stage detector of the RetinaNet family is used to detect blood cells in the image.