WO2022005336A1

WO2022005336A1 - Noise-resilient vasculature localization method with regularized segmentation

Info

Publication number: WO2022005336A1
Application number: PCT/RU2021/050187
Authority: WO
Inventors: Dmitry Vladimirovich Dylov; Oleg Yur'yevich ROGOV; Vito Michele LELI; Aleksandr Yevgenyevich SARACHAKOV; Aleksandr Aleksandrovich RUBASHEVSKII
Original assignee: Autonomous Non-Profit Organization For Higher Education «Skolkovo Institute Of Science And Technology»
Priority date: 2020-06-29
Filing date: 2021-06-29
Publication date: 2022-01-06

Abstract

The present invention relates to a method for neural network training for near-infrared vein imaging, the method including obtaining a set of raw images based on near-infrared radiation from vein; pre-processing the set of raw images using a ves seines s filter and/or an adaptive filter, and/or a denoising network to enhance the visibility of veins and applying an annotation tool for removing the noise and for generating annotated ground truth masks; splitting the pre-processed images into a training set and a test set; feeding the training set and the test set to the neural network; calculating the gradient of loss to evaluate the accuracy of the neural network predictions based on the annotated masks; searching a set of weights and biases that minimizes the losses with the annotated masks; penalizing the neural network for mistakes of the first kind using loss function and/or a tube-like minimal path method. The present method allows a clear definition of a vein contour.

Description

NOISE-RESILIENT VASCULATURE LOCALIZATION METHOD WITH REGULARIZED SEGMENTATION

FIELD OF INVENTION

The present invention is generally directed to the field of vein imaging systems and methods. More specifically, it is directed to the methods for identifying venous systems via vasculature semantic segmentation in the NIR imaging and visible light imaging.

BACKGROUND ART

Currently, the most common tests to help recognize diseases or disorders are blood tests. Analytical studies have estimated that more than 1.5 million blood tests are performed per day, with roughly 45% of them associated with various degrees of discomfort, such as rashes, hematomas, or damaged veins due to re-venipuncture. The risk group includes people with obesity (2.1 billion people in the world), diabetes (415 millions), chronic venous damage (30 millions), infants and children up to 10 years of age (more than 1.5 billions). Poorly discernible and non- palpable veins characterize peripheral difficult venous access problem (PDVA), in which case even a healthcare professional can use technical means for guiding the needle of the vein- puncturing device. The most common causes of the above-mentioned problem are thin or deep veins, excessive layers of adipose tissue, loss of color contrast due to the skin tone or the hairiness, edemas, and previous puncture damages. Currently, there is no generally accepted solution, especially when particularly difficult cases of PDVA occur.

Recently, several commercial products have appeared that can partially solve this problem through hardware and software integration. Available devices include devices with external screens, augmented reality glasses, laser vein illumination stations, etc. There are various types of vein imaging systems such as a portable vein viewer apparatus, an identity recognition device and other vein imaging systems. In recent years, these imaging technologies can provide the imaging of different parts of a human body and its vasculature using a variety of methods, including methods based on the absorption of infrared light by the veins. However, devices that use a laser as an infrared light source are complicated and expensive to manufacture, and devices that use various types of LEDs as an infrared light source are not effective for identifying fine or deep veins or those veins that need to overcome certain skin characteristics, such as excessive layers of adipose tissue, or loss of color contrast due to the skin tone or the hairiness, edemas, and previous puncture damages.

In CN patent application No. 111178221 A, an identity recognition method and an identity recognition device based on a finger vein identification is disclosed. The method includes the following steps: acquiring an image of a finger vein of a target object, collected by an infrared camera; identifying the finger vein image using a finger vein neural network model; and determining the identification information of the target object based on the finger vein image identification result, wherein the neural network model of a finger vein is obtained by machine learning training using multiple data groups, each data group from the multiple data groups including an image of a finger vein and the identification information of the corresponding target object.

According to this method the poor stability of imaging equipment, the low recognition rate and the inability to overcome the effects of translation are improved.

However, due to the special location of a finger vein, it does not have any of such skin characteristics, as fine or deep veins arrangement, excessive layers of adipose tissue, or loss of color contrast due to the skin tone or the hairiness, edemas, and previous puncture damages. Consequently, the method disclosed in CN patent application No. 111178221 A is not intended for solving the problems of identifying veins in difficult cases of the skin having the above skin characteristics.

The closest prior art of the claimed invention is a portable handheld apparatus for identifying vein locations disclosed in US 8,463,364. The device comprises a first laser diode emitting infrared light and a second laser diode emitting only visible light. The device is used for identifying veins location using infrared light and projecting an image of those veins directly on a patient's skin using visible light. Infrared light emitted by first laser diode is processed through a circuitry to amplify, sum, and filter the outputted signals, and with the use of an image-processing algorithm, the contrasted image is projected onto the patient's skin surface using the second laser diode. However, certain patient's veins or a portion of their veins might not be displayed well or at all, since the patient's body parts can include some of the above-mentioned skin characteristics, including vein depth skin conditions (e.g. eczema, tattoos), hair, highly contoured skin surface, and adipose (i.e. fatty) tissue.

Therefore, nowadays, despite a plurality of methods, devices and systems for vein identification having been developed, none of them provide a method for detecting veins in patients who have said skin characteristics that prevent a clear definition of vein contour.

Therefore, improved means and methods for vein imaging systems that allow clear definition of a vein contour are desired.

SUMMARY OF THE INVENTION

In view of the foregoing, an object of the invention is to clearly define a vein contour, especially for people having thin or deep veins, excessive layers of adipose tissue, loss of color contrast due to the skin tone or the hairiness, edemas, and previous puncture damages. The methods and devices disclosed herein provide a clear definition of vein contour via vasculature semantic segmentation in the NIR imaging using infrared LED as an infrared source and visible light imaging, in combination with a vesselness filter and/or an adaptive filter, and/or a denoising network, U-Net-based neural network (CNN) with loss function and/or a tube-like minimal path method in patients with the above-mentioned skin characteristics. The methods and devices of the present invention provide it possible to determine veins pattern to facilitate for the healthcare professional to find the location of veins without additional equipment or manipulations. Moreover, due to appliance of LEDs as an infrared light source and/or ambient light as a visible light source, the device for implementing the method is cheaper and easier to manufacture. Finally, another technical effect of the invention is the reduction of the first and the second kind mistakes, more preferably, the mistakes of the first kind. In the invention, the mistakes of the first kind (false positive mistakes) are veins that have not been identified as veins, and the mistakes of the second kind (false negative mistakes) are objects that have been erroneously identified as veins. Therefore, it is more important to notice fewer veins but be almost certain that these are true veins.

First, to generalize vein detection among different patient cohorts the extraction of the vein pattern via vasculature semantic segmentation in the NIR imaging using infrared LED as an infrared source and visible light imaging is provided. For that, in some embodiments, a U-Net architecture is enhanced by penalizing for non-tubical morphology using a loss function and/or a tube-like minimal path method as well as vesselness filter and/or an adaptive filter and/or a denoising network pre-processing. The implemented U-Net architecture allows the pipeline to localize the finest and otherwise indiscernible veins.

The object of the present invention is to provide methods and devices for identifying venous systems with improved definition of a vein contour via vasculature semantic segmentation in the NIR imaging and visible light imaging.

In one aspect, the object is achieved by providing a method for neural network training for near-infrared vein imaging, the method including: obtaining a set of raw images based on near-infrared radiation from at least one vein; pre-processing the set of raw images using a vesselness filter and/or an adaptive filter and/or a denoising network to enhance the visibility of veins and applying an annotation tool for removing the noise and for generating annotated ground truth masks; splitting the pre-processed images into a training set and a test set; feeding the training set and the test set to the neural network; calculating the gradient of loss to evaluate the accuracy of the neural network predictions based on the annotated masks; searching a set of weights and biases that minimizes the losses with the annotated masks; penalizing the neural network for mistakes of the first kind using a loss function and/or a tube-like minimal path method.

Thus, according to the methods of the invention, the sequential execution of all steps of the methods for vein image visualization provides a step-by-step removal of noise in the vein images in infrared or visible light, wherein at least part of the noise from using LEDs is removed at the preprocessing step. The subsequent image segmentation using a neural network with loss function and/or a tube-like minimal path method provides a more accurate definition of a vein contour despite the interfering skin characteristics. Accurate definition of a vein contour in patients with skin characteristics, including vein depth skin conditions (e.g. eczema, tattoos), hair, highly contoured skin surface, and adipose (i.e. fatty) tissue, is also provided by penalizing for mistakes of the first kind. This is achieved by using loss function and/or a tube-like minimal path method that provides the determination of a tubular shape of objects in an image and subsequent penalization of the neural network, in case, if these objects are not tubular.

In one embodiment of the invention, the adaptive filter is selected from at least one of the following: Anisotropic Adaptive Filtering, Random Fields and Wiener Filtering, Learnable frequency domain filters.

In another embodiment of the invention, the loss function is selected from at least one of the following: a tubular loss function, a standard binary cross-entropy (BCE), Structure Similarity (SSIM) metrics, a Wasserstein distance, a peak signal-to-noise ratio (PSNR) or a combination thereof.

In another embodiment of the invention, the vesselness filter is a Frangi filter, a tubeness filter or a Ciahe filter.

In another aspect of the invention, a method for neural network training of visible light vein imaging is provided, the method comprising: obtaining a set of raw images based on visible light radiation from at least one vein; pre-processing the set of raw images using a vesselness filter and/or an adaptive filter and/or a denoising network to enhance the visibility of veins and applying an annotation tool for removing the noise and for generating annotated ground truth masks; splitting the pre-processed images into a training set and a test set; feeding the training set and the test set to the neural network; calculating the gradient of loss to evaluate the accuracy of the neural network predictions based on the annotated masks; searching a set of weights and biases that minimizes the losses with the annotated masks; penalizing the neural network for mistakes of the first kind using loss function and/or a tube-like minimal path method.

In one embodiment of the invention, the adaptive filter is selected from at least one of the following: Anisotropic Adaptive Filtering, Random Fields and Wiener Filtering, Learnable frequency domain filters. In another embodiment of the invention, the loss function is selected from at least one of the following: a tubular loss function, a standard binary cross-entropy (BCE), Structure Similarity (SSIM) metrics, a Wasserstein distance, a peak signal-to-noise ratio (PSNR) or a combination thereof.

In another embodiment of the invention, the vesselness filter is a Frangi filter, a tubeness filter or a Clahe filter.

In another aspect of the invention, a method for near-infrared vein image visualization is provided, the method comprising:

- emitting light by an infrared source onto a body part of a subject;

- capturing the infrared light scattered at and reflected from the body part of the subject by an infrared camera to form an image of veins comprised in the body part;

- pre-processing the image using a vesselness filter and/or an adaptive filter, and/or a denoising network;

- segmenting the image via a neural network trained using the method for neural network training for near-infrared vein imaging;

- projecting the segmented image onto the body part using a projector.

In one embodiment of the invention, before capturing the infrared light by an infrared camera the light scattered at and/or reflected by the body part transmits through a polarizing filter and an IR filter.

In another embodiment of the invention, the infrared source is an IR LED.

In another embodiment of the invention, the neural network is a convolutional neural network and/or a recurrent network.

In another embodiment of the invention, the denoising network and the convolutional network using for image segmentation are configured as a single architecture. In another embodiment of the invention, the convolutional and/or recurrent neural network is selected from at least one of the following: U-Net, Attention U-Net, U-Net with different encoders, Recurrent U-Net.

In another embodiment of the invention, the encoders are selected from at least one of the following: VGG11, ResNet34, VGG-X, ResNet-X, ResNet, ResNeXt, ResNeSt, Res2Ne(X)t, RegNet(x/y), SE-Net, SK-ResNe(X)t, DenseNet, Inception, EfficientNet, MobileNet, DPN or VGG.

In still another aspect of the invention, a method for visible light vein image visualization is provided, the method comprising:

- emitting light by a visible light source onto a body part of a subject;

- capturing the visible light scattered at and reflected from the body part of the subject by a camera to form an image of veins comprised in the body part;

- segmenting the image via neural networks trained using the method for neural network training of visible light vein imaging;

- projecting the segmented image on the body part using a projector.

In another embodiment of the invention, the visible light source is an ambient light.

In another embodiment of the invention, the neural networks is a convolutional neural network and/or recurrent network.

In another embodiment of the invention, the denoising network and the convolutional network using for image segmentation configured as a single architecture. In another embodiment of the invention, a convolutional neural network and/or recurrent neural network is selected from at least one of the following: U-Net, Attention U-Net, U-Net with different encoders, Recurrent U-Net.

In another embodiment of the invention, the encoders are selected from at least one of the following: VGG11, ResNet34, VGG-X. ResNet-X, ResNet, ResNeXt, ResNeSt, Res2Ne(X)t, RegNet(x/y), SE-Net, SK-ResNe(X)t, DenseNet, Inception, EfficientNet, MobileNet, DPN or

VGG.

In still another embodiment of the invention, the camera is a hyperspeetrai camera.

In another aspect of the invention, a device for near-infrared vein image semantic segmentation is provided, the device comprising:

- an infrared source for emitting infrared light onto a body part of a subject;

- a set of polarizing and IR filters configured to transmit the infrared light scattered and/or reflected by the body part;

- an infrared camera positioned to capture the infrared light scattered at and reflected from the body part of the subject transmitted by the set of polarizing and IR filters to form an image of veins comprised in the body part;

- a computation core configured to pre-process the image using a vesselness filter and/or an adaptive filter, and/or a denoising network and to segment the image via a neural network; and

- a projector configured to project the segmented image onto the body part.

In another embodiment of the invention, a device comprising a set of polarizing and IR filters configured to transmit the infrared light scattered and/or reflected by the body part.

In another aspect of the invention, a device for visible light vein image semantic segmentationrovided, the device comprising:

- a visible light source for emitting visible light onto a body part of a subject;

- a camera positioned to capture the visible light scattered at and reflected from the body part of the subject to form an image of veins comprised in the body part;

- a projector configured to project the segmented image onto the body part.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention reference will now be made, by way of example only, to the accompanying drawings, which are not necessarily drawn to scale, and wherein:

FIG. 1 is a perspective view of a hardware configuration of an experimental setup of vein scanner (3D view).

FIG. 2 shows a detailed experimental setup of FIG. 1.

FIG. 3 shows the segmentation pipeline using a vesselness filter, attention U-Net and tubular loss function.

FIG. 4 shows different segmentation cases from left to right: raw image, ground true mask, segmented (predicted) probability image, segmented (predicted) binarized mask.

FIG. 5 shows different architectures of networks.

DETAIFED DESCRIPTION OF THE EMBODIMENTS

The present disclosure will now be described more fully hereinafter with reference to example implementations thereof. These example implementations are described so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Indeed, the disclosure may be embodied in many different forms and should not be construed as limited to the implementations set forth herein; rather, these implementations are provided so that this disclosure will satisfy applicable legal requirements. As used in the specification and the appended claims, the singular forms “a,” “an,” “the” and the like include plural referents unless the context clearly dictates otherwise. Also, while reference may be made herein to quantitative measures, values, geometric relationships or the like, unless otherwise stated, any one or more if not all of these may be absolute or approximate to account for acceptable variations that may occur, such as those due to engineering tolerances or the like.

As used herein the term “mask” means a filter. The concept of masking is also known as spatial filtering. Masking is also known as filtering. In this case, the filtering operation is performed directly on the image. The general process of filtering and applying masks is consists of moving the filter mask from point to point in an image. At each point (x,y) of the original image, the response of a filter is calculated by a predefined relationship. All the filters values are pre defined and are a standard.

As used herein the term “segmentation” in the field of digital image processing and computer vision means image segmentation, i.e. the process of dividing a digital image into multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

As used herein the term “semantic segmentation” means the process of classifying each pixel belonging to a particular label.

As described hereinafter, example implementations of the present disclosure relate to vein imaging systems.

The most promising line of augmenting products are vein scanners that use near-infrared (NIR) cameras to provide an extra vein contrast that is present in the NIR part of the spectrum. The NIR image can clearly show the contours of large veins or veins close to skin on a separate screen or directly on a body part of the patient using a visible light projector. Therefore, a method for visualization a contour of deeper and thinner veins, as well as veins, the visualization of which is prevented by the above-mentioned skin characteristics or a body (thin or deep veins, excessive layers of adipose tissue, loss of color contrast due to the skin tone or the hairiness, edemas, and previous puncture damages) is required. In view of the absence of a proper device to solve the problem described herein, the object of the claimed invention is not only to provide improved means and methods for vein imaging systems, but also to make that simpler than those of the prior art. Thus, LEDs are used as an infrared source to simplify and reduce the costs of the system. However, using LEDs and/or visible light as a light source may be an additional cause of noise in vein images in addition to noise caused by skin characteristics. To solve the above-mentioned problems, the Inventor focused on implementing a neural network (CNN) with loss function and/or a tube-like minimal path method, infrared (LEDs) or visible light and a vesselness filter and/or an adaptive filter, and/or a denoising network, configured in a typical hardware configuration of a vein scanner. The provided configuration of the obtained set of software-based hardware units using the methods for vein image visualization via vasculature semantic segmentation disclosed herein allows the noise in the vein images in infrared or visible light to be removed, and thereby allows a clear vein contour projected onto a body part of a patient. The obtained set of software- based hardware units provides the achievement of the required technical effect, presented on Fig. 1 for a schematic and Fig. 2 for details.

Reffering to FIG. 1, 2, in one embodiment, is an experimental setup comprising an infrared source, such as (IR) diodes (e.g. Kingbright, F-7113SF4C, Taipei, Taiwan) or a panel of IR diodes (e.g. SFH 4780S), having a wavelength range from 750 nm to 900 nm and illuminating an area of interest on a body part of a patient, preferably a forearm; a set of filters; an IR camera (e.g. Raspberry NoIR camera V2) doping with particles sensitive to the NIR light for obtaining an IR image of deep or thin blood vessels, otherwise invisible in a regular light; a computation core (e.g. Raspberry Pi 4 with 4Gb RAM) in communication with the NIR camera and a projector for receiving an image, pre-processing and segmentation thereof, and subsequent transmission to the projector; and the projector (e.g. XGIMI Z3, Chengdu Sichuan, China) configured to project the processed and segmented image back onto the body part of the patient. The infrared source may include, for example, a coherent source (infrared lasers that are used with a certain allowed power in medicine), an incoherent source (commercial (non-laser) IR FEDs operating at wavelengths other than those indicated above), xenon lamps, luminescent and incandescent lamps, or the like, or some combination thereof.

The light scattered, absorbed and/or reflected by tissues and veins of a body part of a patient, preferably tissues of a forearm, is passed through a set of filters, such as an IR filter located in front of the IR camera to reduce the stray light effect of the ambient light, mostly in the visible spectral range, and crossed polarizers to reduce reflection from the air-skin interface to minimize a patch of light, and then the light is captured by the IR camera to form a raw image of the veins. Then the computational core processes the raw image captured by the IR camera and the processed image is transmitted to the projector. In one embodiment, the computational core comprise the Frangi-filtered input for U-Net and various modules for CV- and RL- based image-mask alignment. In another embodiment, the computational core comprise the CLAHE-filtered input for U-Net. A Contrast Limited Adaptive Histogram Equalization (CLAHE) filter is used to enhance the local contrast of the image. It is based on computation of several histograms for each corresponding section of the image to redistribute the lightness.

Alternatively, or in addition, in another embodiment, an experimental setup comprises a visible light source, such as natural light, illuminating an area of interest on a body part of a patient, preferably a forearm; a camera for obtaining an image of blood vessels; a computation core (e.g. Raspberry Pi 4 with 4Gb RAM) in communication with the camera and a projector for receiving an image, pre-processing and segmentation thereof, and subsequent transmission to the projector; and the projector (e.g. XGIMI Z3, Chengdu Sichuan, China) configured to project the processed and segmented image back onto the body part of the patient. The camera may include, for example, a hyperspectral camera.

The light scattered, absorbed and/or reflected by tissues and veins of a body part of a patient, preferably tissues of a forearm, is captured by the camera to form a raw image of the veins. Then the computational core processes the raw image captured by the camera and the processed image is transmitted to the projector. Preferably, the computational core comprise the Frangi- filtered input for U-Net.

In another embodiment, the computational core comprises the CLAHE-filtered input for

U-Net.

In another embodiment, the computational core comprises an adaptive filter or a denoising network.

Data preparation

In one embodiment, a dataset of NIR images of forearms is collected using the experimental setup shown in FIG. 1 and 2. In one embodiment, the raw images are processed with the Frangi vesselness filter to enhance the visibility of tube-like structures (veins). In particular, the filter entails image Hessian calculation, followed by a computation of the eigenvectors and the corresponding eigenvalues, which form the adjusted bounds for the image contrast. After this preprocessing, a Computer Vision Annotation Tool (CVAT) is used to remove the noise and to generate the annotated ground truth masks. The ground truth masks are produced during the dataset preparations by an expert by means of an annotation tool (in this embodiment such a tool is CVAT) as ground truth images for the neural network to train for segmentation tasks at the segmentation step. These masks are prepared with the dataset and are the references for the neural network during the training/validation process. In one embodiment, the collected dataset includes 320 forearm images, with 90 of these images being manually annotated and divided into 75 training images (training set) and 15 test images (test set). The pre-processed images are cropped to the resolution of 512 px x 512 px to match the input to the neural network.

In another embodiment, after this pre-processing any annotation tool (mostly deep- leaming-specific) can be used, such as Labelme, labelimg, Visual Object Tagging Tool (VoTT), imglab, or the like.

To ensure various plausible hand orientation in front of the camera, standard image augmentation techniques is applied to the training set, including scaling and randomly initialized rotations (the range of angles is restricted to {-180°, 180°}). To increase the range of illumination patterns, brightness augmentation methods, which can introduce bias into the training, are omitted. Instead, the object of the invention is focused on a true representation of the signal variability in the collected dataset, yielding better segmentation performance for the real scenarios. Specifically, the dataset contains a wide range of natural intensity variations associated with the skin characteristics, including vein depth skin conditions (e.g. eczema, tattoos), hair, highly contoured skin surface, and adipose (i.e. fatty) tissue.

For the purpose of segmentation strategies and techniques, U-Net is a preferable CNN architecture for an image segmentation in terms of efficiency. There exist many U-Net architectures which fine-tune the model for particular segmentation tasks. For example, noise- resistance property of networks can preserve high accuracy of predictions even on noisy and distorted data. Non-Noise-resistant networks are largely affected by the data noise level (due to architecture structure or data used for training). Denoising Network is a network capable of implementing data denoising techniques to ensure images better quality. In particular, denoising networks can be applied as a filter at the pre-processing step to remove noise, that is, before the images are fed to the neural network. Although a denoising network is a convolutional network, it is not used for the segmentation tasks, since its architecture is not designed to perform a segmentation. Temporal features encoded in a recurrent network, such as Long Short-Term Memory (LSTM), may also enhance CNNs for different medical segmentation needs. Yet, among the most promising segmentation models are the latest U-Net-based CNNs that use the attention mechanisms and the shape-awareness. Classical approaches to detect veins and vessels use Hessian- and skeleton-based methods, both of which currently experiencing some resurgence as the community realized that they could be used to additional regularization for tube-like structures with CNNs.

FIG. 3 illustrates a neural network training, in accordance with some embodiments of the present invention. Each U-Net-based architecture of the set of U-Net-based architectures is trained and compared to be used for a semantic segmentation task on the dataset. The set consists of six primary architectures: classic U-Net, U-Net with VGG-11 (TernausNet) and ResNet34 (AlbuNet) encoders, attention based U-Net, Recurrent U-Net and Attention Recurrent U-Net (shown in FIG. 5). It shall be note that in the other embodiments U-Net may be implemented with other encoders, such as VGG-X, ResNet-X, ResNet, ResNeXt, ResNeSt, Res2Ne(X)t, RegNet(x/y), SE-Net, SK- ResNe(X)t, DenseNet, Inception, EffieientNet, MobileNet, DPN, YGG. Moreover, neural networks can be used in coupe with differential image gradient. Discrete differentiation operators are used in case of computing an approximation of the gradient of the image intensity function. The gradient of an image measures how it is changing. In particular, it provides two pieces of information. The magnitude of the gradient tells us how quickly the image is changing, while the direction of the gradient tells us the direction in which the image is changing most rapidly. Such operators are based on convolving the image with a small, separable filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. CNN architectures automatically learn a set of such kernels so their use is common on such scenarios. FIG. 3 shows the segmentation pipeline demonstrating the highest results when using a combination of a vesselness filter and/or an adaptive filter and/or a denoising network, Attention U-Net with an attention gate to focus on target structures (veins) of varying shapes and sizes and a loss function, such as similarity measure called centerlineDice (short clDice), which is calculated on the intersection of the segmented masks (predicted by network) and their (morphological) skeleton. These masks are virtual and generated by the neural network and later used in the alignment procedures. In other embodiments, the standard binary cross-entropy (BCE) is used as a loss function. It should be mentioned that other proper function of this category, such as Categorical Crossentropy, Sparse Categorical Crossentropy, or the like can be used. However, the inventors has found that the clDice loss function gives the best results. ClDice is a modification of the standard Dice loss to optimize the segmentation of tubular structures.

According to another embodiment, a tubular loss function, a standard binary cross-entropy (BCE), Structure Similarity (SSIM) metrics, a Wasserstein distance, a peak signal-to-noise ratio (PSNR) or a combination thereof can be used as a loss function.

It shall be noted that using of appropriate loss functions improves the quality of the results. In particular, neural networks with noise-aware losses (SSIM, MS-SSIM) significantly improve results, even when the network architecture is left unchanged.

According to another embodiment, multi-threshold methods with Wasserstein distance known in the art can be used as a loss function. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. Learning with a Wasserstein loss may be used alone or in combination with the other losses.

In another embodiment, the segmentation pipeline comprises a combination of a vesselness filter and/or an adaptive filter and/or a denoismg network, neural network and a tube-like minimal path method to correct the segmented masks.

The tube-like minimal path method is an interactive method for tubular structure extraction. The main application of the method for the claimed invention is vessel tracking in 2D and 3D images. The basic tools are minimal paths solved using the fast marching algorithm. This allows interactive tools for the physician by clicking on a small number of points in order to obtain a minimal path between two points or a set of paths in the case of a tree structure. The approach is based on a variant of the minimal path method that models the vessel as a centerline and surface. This is done by adding one dimension for the local radius around the centerline. The metric is well oriented along the direction of the vessel, admits higher velocity on the centerline, and provides a good estimate of the vessel radius. Based on the optimally oriented flux this measure is required to be robust against the disturbance introduced by noise or adjacent structures with intensity similar to the target vessel. The tube-like minimal path method can be used separately or in combination with any other loss as a weighted component. In some embodiments, Anisotropic Adaptive Filtering, Random Fields and Wiener Filtering, Learnable frequency domain filters or the like can be used as an adaptive filter.

According to different embodiments, any of the above-mentioned U-Net- based architecture can be used in the segmentation pipeline.

Since using of a combination containing any of the above-mentioned U-Net-based architecture, a vesselness filter and/or an adaptive filter, and/or a denoising network and any of the loss functions will be apparent to a person skilled in the art, only one example embodiment of the invention is illustrated on FIG. 3.

To evaluate divergence of the predicted results from the ground truth p the following

loss functions is applied: binary cross entropy (BCE) loss - a sigmoid activation with a cross- entropy (CE) loss. (¹⁾

(2)

Additionally, the sum of the two mentioned above losses as a new cumulative loss is used:

(3)

In addition, clDice is used as a loss function. Masks in Dice formula are substituted with the intersection of the segmented masks and their skeletons:

(4)

where s and ŝ are skeletons of the masks, T_prec and T_sens are the topology precision and the topology sensitivity, respectively.

Finally, clDice loss is used as it shows the best result on the dataset. The goal of the training is to obtain high accuracy on validation while keeping the loss between training and validation sets to a similar value to avoid overfitting or underfitting.

For training, in one embodiment, data in batches with a size of 5 are used, since the representativeness in sets of larger numbers of pictures is improved. Size of batches are only limited by memory size. Then this data is mixed, used for making a prediction (forward pass) and then losses with annotated masks (an expert annotates the raw NIR images with the ground-truth binary masks) to train the network (segmentation “step”)) is considered and the gradient of loss is calculated (backward pass). This step is followed by the loss function optimization. The validation is run and the quality is verified with the correct annotation in the validation dataset (Manual image annotation is the process of manually defining regions in an image and creating a textual description of those regions. Such annotations can for instance be used to train machine learning algorithms for computer vision applications). For supervised learning Adam optimizer with default parameters is applied to train the model. Besides, “reduce learning rate on plateau” strategy is used, and different learning rates {0.01, 0.001, 0.0001} are set and the total learning epochs not less than 100 are set. To find the optimal configuration for the network training, a tuning of the hyperparameters over a selected range is required. The tuning is done using the Ignite framework.

Important to mention a training cycle technique, namely false positive penalization. While the validation accuracy is not on the plateau, the network keeps training. Then weights for non- vein objects are increased and start over last moment. Similar strategy of false negatives penalization can be applied when vein detection is bad in terms of false negative errors. Training network in several steps using false positives penalization policy provides better results as it properly works with shadows, hairs, tattoos and folds of forearms. Furthermore, false positive errors are more important in training, than false negatives, since it is better to notice fewer veins but be almost certain that these are true veins.

As for metrics, Intersection over Union (IoU), Precision fraction of relevant instances among the retrieved instances, and Recall (fraction of the total amount of relevant instances that were actually retrieved) are used. Hence, higher-order inconsistencies between ground truth segmented masks are detected and the ones segmented by the Frangi-supported U-Net: (5)

In addition, Structure Similarity (SSIM) metrics are used for accuracy monitoring. It quantifies image quality degradation between two images:

(6)

where α = β = γ = 1, 1 is the luminance component, c is the contrast component and s is the structure component:

(7)

(8)

(9)

where c₁,c₂,c₃ are stabilization variables, μ_i - average of i, σ_i - variance of i and σ_ij - covariance of i and j.

By defining a noise-free l x m image I and its noisy approximation

, mean squared error is written as: (10)

The loss function of the neural network based on the peak signal-to-noise ratio is used, being often referenced to as PSNR:

(11)

where MAX_I is the maximum plausible pixel value of the image.

Using PSNR loss enables the noise-aware training process of the neural network.

Let be the set of all joint distributions γ whose marginal distributions are μ and v.

Then Wasserstein distance is:

(12)

Frangi-based pre-processing of the Attention U-Net inputs for Dice loss case FIG. 4. It increases segmentation accuracy in comparison with all other pipeline modifications. The classic segmentation paradigm implies the bounding box being detected first and then segmented. The idea of attention gate is to segment through a refined attention form rather than a detected noisy box. Thus, a suppression of irrelevant areas is obtained, resulting in increased quality as shown in Table 1 (see below), where numerical results also include standard deviation.

Table 1. Segmentation models performance comparison

It is essential to mention that the SSIM metric shows high-quality results for all considered networks, while IoU metric differs from one architecture to another. This is because the SSIM metric is a signal-wise one, and IoU is a per-class metric. Nevertheless, SSIM metric shows that all the networks preserve accurate structure information in predictions as demonstrated in FIG. 4. Even though the metrics results are in suboptimal values, the used approach shows a good visual concordance between original images and predicted veins. This discrepancy can be explained by reasons derived from a manual assembly of a brand-new local dataset. Firstly, in the set collected at the moment (which is rather small being composed by 90 images in total), the forearm's visual characteristics show clearly a dispersion effect, e.g., color, hairline, vein depth, size. Secondly, very different structures of the veins are recorded: from barely noticeable points to large tree-like structures. Thirdly, the manual annotation is a sophisticated process due to the variability of original images contrast and brightness. Additionally, one has to deal with the noise level of the images received from the IR camera, such as vein light, contrast and different image resolutions. This can be compensated by taking advantage of the Frangi-based pre-processing of the U-Net inputs. All this affects the results: somewhere cut off parts of the veins (Fig. 3 A, B, D, E), somewhere non-vein objects are segmented as veins (Fig. 3 C, F). Therefore, a solution for the PDVA problem was proposed and realized as a hardware prototype with specific software of embodiments of the present invention.

Finally, the sequential execution of all steps of the methods of embodiments of the present invention for vein image visualization provides the removal of noise in the vein images in infrared or visible light, wherein at least part of the noise from using LEDs is removed at the preprocessing step. The subsequent image segmentation using a neural network with tubular loss function provides a more accurate definition of a vein contour despite the interfering skin characteristics. Therefore, it is the synergetic and sequential execution of all steps of the method implemented by the presented device that provides the extraction of vein ’confidence’ maps. Many modifications and other embodiments of the disclosure set forth herein will come to mind to one skilled in the art to which this disclosure pertains having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the disclosure is not to be limited to the specific implementations disclosed, and that modifications and other implementations are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example implementations in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative implementations without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims.

Claims

1. A method for neural network training for near-infrared vein imaging, the method including: obtaining a set of raw images based on near-infrared radiation from at least one vein; pre-processing the set of raw images using a vesselness filter and/or an adaptive filter and/or a denoising network to enhance the visibility of veins and applying an annotation tool for removing the noise and for generating annotated ground truth masks; splitting the pre-processed images into a training set and a test set; feeding the training set and the test set to the neural network; calculating the gradient of loss to evaluate the accuracy of the neural network predictions based on the annotated masks; searching a set of weights and biases that minimizes the losses with the annotated masks; penalizing the neural network for mistakes of the first kind using a loss function and/or a tube-like minimal path method.

2. The method according to claim 1, wherein the adaptive filter is selected from at least one of the following: Anisotropic Adaptive Filtering, Random Fields and Wiener Filtering, Learnable frequency domain filters.

3. The method according to claim 1, wherein the loss function is selected from at least one of the following: a tubular loss function, a standard binary cross-entropy (BCE), Structure Similarity (SSIM) metrics, a Wasserstein distance, a peak signal-to-noise ratio (PSNR) or a combination thereof.

4. The method according to claim 1, wherein the vesselness filter is a Frangi filter, a tubeness filter or a Clahe filter.

5. A method for neural network training of visible light vein imaging, the method comprising: obtaining a set of raw images based on visible light radiation from at least one vein; pre-processing the set of raw images using a vesselness filter and/or an adaptive filter and/or a denoising network to enhance the visibility of veins and applying an annotation tool for removing the noise and for generating annotated ground truth masks; splitting the pre-processed images into a training set and a test set; feeding the training set and the test set to the neural network; calculating the gradient of loss to evaluate the accuracy of the neural network predictions based on the annotated masks; searching a set of weights and biases that minimizes the losses with the annotated masks; penalizing the neural network for mistakes of the first kind using loss function and/or a tube-like minimal path method.

6. The method according to claim 5, wherein the adaptive filter is selected from at least one of the following: Anisotropic Adaptive Filtering, Random Fields and Wiener Filtering, Learnable frequency domain filters.

7. The method according to claim 5, wherein the loss function is selected from at least one of the following: a tubular loss function, a standard binary cross-entropy (BCE), Structure Similarity (SSIM) metrics, a Wasserstein distance, a peak signal-to-noise ratio (PSNR) or a combination thereof.

8. The method according to claim 5, wherein the vesselness filter is a Frangi filter, a tubeness filter or a Clahe filter.

9. A method for near-infrared vein image visualization, the method comprising:

- emitting light by an infrared source onto a body part of a subject;

- pre-processing the image using a vesselness filter and/or an adaptive filter and/or a denoising network;

- segmenting the image via a neural network trained using the method of claim 1 ;

- projecting the segmented image onto the body part using a projector.

10. The method according to claim 9 wherein before capturing the infrared light by an infrared camera the light scattered at and/or reflected by the body part transmits through a polarizing filter and an IR filter.

11. The method according to claim 9, wherein the infrared source is an IR LED.

12. The method according to claim 9, wherein the neural network is a convolutional neural network and/or a recurrent network.

13. The method according to claim 12, wherein the denoising network and the convolutional network used for image segmentation are configured as a single architecture.

14. The method according to claim 12, wherein the convolutional and/or the recurrent neural network is selected from at least one of the following: U-Net, Attention U-Net, U-Net with different encoders, Recurrent U-Net.

15. The method according to claim 14, wherein the encoders are selected from at least one from the following: VGG11 , ResNet34, YGG-X, ResNet-X, ResNet, ResNeXt, ResNeSt, Res2Ne(X)t, RegNet(x/y), SE-Net, SK-ResNe(X)t, DenseNet, Inception, EfficientNet, MobileNet, DPN or VGG,

16. A method for visible light vein image visualization, the method comprising:

- emitting light by a visible light source onto a body part of a subject;

- segmenting the image via neural networks trained using the method of claim 6;

- projecting the segmented image on the body part using a projector.

17. The method according to claim 16, wherein the visible light source is an ambient light.

18. The method according to claim 16, wherein the neural networks is a convolutional neural network and/or recurrent network.

19. The method according to claim 16, wherein the denoising network and the convolutional network used for image segmentation are configured as a single architecture.

20. The method according to claim 18, wherein the convolutional neural network and/or the recurrent neural network is selected from at least one of the following: U-Net, Attention U-Net, U-Net with different encoders, Recurrent U-Net.

21. The method according to claim 20, wherein the encoders are selected from at least one of the following: VGG11, ResNet34, VGG-X, ResNet-X, ResNet, ResNeXt, ResNeSt, Res2Ne(X)t, RegNet(x/y), SE-Net, SK-ResNe(X)t, DenseNet, Inception, EfficientNet, MobileNet, DPN or VGG.

22. The method according to claim 16, wherein the camera is a hyperspectra! camera.

23. A device for near-infrared vein image semantic segmentation, the device comprising:

- an infrared source for emitting infrared light onto a body part of a subject;

- an infrared camera positioned to capture the infrared light scattered at and reflected from the body part of the subject to form an image of veins comprised in the body part;

- a projector configured to project the segmented image onto the body part.

24. The device according to claim 23 comprising a set of polarizing and IR filters configured to transmit the infrared light scattered and/or reflected by the body part.

25. A device for visible light vein image semantic segmentation, the device comprising:

- a projector configured to project the segmented image onto the body part.