WO2023283411A2

WO2023283411A2 - Method for machine-learning based training and segmentation of overlapping objects

Info

Publication number: WO2023283411A2
Application number: PCT/US2022/036470
Authority: WO
Inventors: Max PIHLSTRÖM; Ida-Maria Sintorn
Original assignee: Intelligent Virus Imaging Inc.
Priority date: 2021-07-08
Filing date: 2022-07-08
Publication date: 2023-01-12
Also published as: WO2023283411A4; WO2023283411A3

Abstract

The method is for training and automatically segmenting overlapping objects (102, 104) in images such as overlapping objects in images acquired with an imaging device such as a microscope. The overlapping objects are divided into non-overlapping connected components and overlapping segments. The method includes combinatorial set theory in a training scheme and at inference of a machine learning approach for automatic segmentation of overlapping objects (102, 104) imaged with an electron microscope.

Description

METHOD FOR MACHINE -LEARNING BASED TRAINING AND SEGMENTATION OF OVERLAPPING OBJECTS

Technical Field

The present invention relates to a method for training and automatically segmenting overlapping objects in images such as overlapping objects in images acquired with an imaging device such as a microscope. More particularly, the present invention includes combinatorial set theory in a training scheme and at inference of a machine learning approach for automatic segmentation of overlapping objects imaged with, for example, an electron microscope.

Background and Summary of the Invention

In the pharmaceutical industry, viruses, Virus-Like Particles (VLPs), liposomes and lipid nanoparticles (LNPs) are extensively used as carriers for drug and gene delivery. The assessment of size, shape and content of these particles, that is the type and number of therapeutic substances or genes, are of prime importance for drug development and quality control as it is directly linked to the efficiency of the treatment as well as the stability (shelf life) and production costs.

Transmission electron microscopy (TEM) is a suitable and commonly used technique for characterizing these nano-sized particles. Different kinds of TEM are used for this purpose of which cryogenic TEM (CryoTEM) and room-temperature TEM (rtTEM), often using negative stain (nsTEM) preparation techniques, are the most common. In CryoTEM, the sample is kept in its native hydrated form by instantly freezing it to a cryogenic temperature to avoid the formation and effects of ice-crystals before imaging at the cryogenic temperatures in the microscope. In rtTEM, the sample is kept at room- temperature and for biological samples. This often means that a stain and/or some other kind of preservative agent must be added to embed the objects in order for important structures to be maintained and not destroyed by the drying process that occurs at room-temperature. A negative stain is sometimes used because it has the effect of increasing the contrast in the sample so that certain types of details and sample constituent differences are enhanced.

Artificial Intelligence (AI) and machine learning such as image-based deep-learning offers the ability of excluding the human's often poor capability of instructing the computer about what information to extract and how to measure and analyze in order to be able to learn relevant information from examples. It has the drawback of not incorporating information or "types" not represented in the examples into the deep-learning model. This type of information is, hence, not typically learnt by the machine/computer. Another problem is that the training models may also learn erroneous features when there are systematic errors in the training data.

Machine-learning techniques, more specifically deep learning and the computational structures known as convolutional neural networks, have lifted image processing and analysis to a whole new and different level. This is especially for computer vision applications that are based on the enormous amounts of accessible natural scene images available on the Internet. Biomedical applications, such as microscopy, face a different scenario with fewer accessible training images. Often, the purpose of the processing and analysis is slightly different and application specific.

In microscopy, a sample often contains multiple objects that should be detected, identified, and measured in different ways. The high number of different types of objects along with their measured characteristic features are in turn used for e.g., clinical diagnosis, disease pathway research and understanding, drug development and quality control. It is, thus, important that each object is segmented in its full size and shape so that the extracted measurements are not biased when objects are overlapping one another or when objects are partly or fully hidden behind other objects. Thanks to the shared weights of convolutional neural networks, these networks can very effectively learn to determine which pixels in an image belong to some type of object. A problem with this sharing of weights, however, is that when overlapping objects are present in the image, there is no obvious way of training a network to determine pixels that belong to the object type and simultaneously separate the segmentations of the individual overlapping objects.

The present invention provides an effective solution to the problem associated with overlapping objects. The method of the present invention relates to a sequence of smart set- theory combinatorial steps of object segments combined with machine-learning approaches. Using this simple, yet clever, combination allows for automatic, accurate and reliable training of such networks and once the networks are trained, detection of full object geometries of overlapping objects in complex images. An important aspect of the present invention is the idea and novel insight of splitting the segmentation task into to two machine-learning tasks. The first task produces two mask images corresponding to non-overlapping and overlapping object segments, and the second task combines the masks to full objects with correct geometry. The step of splitting the segmentation into two steps also allows for the inclusion of a simple, yet effective, modification of the overlapping segments, during the training stages, that makes the segmentation less sensitive to incorrect predictions made by the machine learning methods. The overall segmentation approach is thereby made more robust to handle complex images and images of varying appearances. More particularly, the method of the present invention relates to the steps listed below:

1) The training part of the machine-learning approach of the present invention which makes sure that individual objects are detected, and their full geometry can be segmented; and

2) The design of the machine-learning model of the present invention which performs automated segmentation of overlapping objects.

It should be noted that for both steps it is most important that overlapping objects are included and represented in order not to bias the final results and subsequent extracted measurements. In order to detect overlapping objects correctly in new images when applying/using the method steps of the present invention, so called "at inference," the training of how to handle overlapping objects must have been included in the training step. Although exemplified here as applied to TEM images of liposomes, the method of the present invention is applicable to other datatypes and contents, such as lipid nanoparticles (LNPs), gene therapy and viral biological vectors (i.e., the particles that contain or carry the genes), cellular organelles such as exosomes and ribosomes, and other gene/drug delivery particles as well as overlapping cells or subcellular organelles (such as nuclei) in various types of light and fluorescence microscopy images.

More particularly, the method of the present invention is for analyzing an image having overlapping objects. An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment. The first object comprising the first overlapping segment and a first non-overlapping connected component. The second object comprising the first overlapping segment and a second non-overlapping connected component. A first network or first computational structure receiving the input image from the imaging device. The first network or first computational structure calculating a first image containing only the first and second overlapping segments as a first output image and a second image containing only the first and second non overlapping connected components as a second output image. A separate second network extracting and processing each non overlapping connected component in the second output image together with the first output image and input image. The second network or second computational structure receiving the first non-overlapping connected component of the second output image, the first output image with overlapping segments and the input image. The second network or second computational structure calculating a resulting image of the first object based on the first non-overlapping connected component and the overlapping segments and the input image. The first object comprising the first non-overlapping connected component and the first overlapping segment.

In an alternative embodiment, the method of present invention is creating a training set for machine learning, for determining parameters of a first computational structure or first network having an improved tolerance to uncertain predictions made by the first computational structure or network. An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment. The first object comprising the first overlapping segment and a first non-overlapping connected component. The second object comprising the first overlapping segment and a second non-overlapping connected component. The first non-overlapping and the second non overlapping component having a first gap defined therebetween. Enlarging the first gap to a second gap wherein the second gap is greater than the first gap. Providing ground truth object information for a first and a second object. Setting pixel values of pixels in a first target image representing overlapping segments to one when the pixels belong to more than one ground truth object or when the pixels are located within a predetermined distance extending from the overlapping segments, otherwise setting the pixel values to zero. Setting the pixel values of pixels in a second target image representing non-overlapping connected components to one when the pixels belong to only one ground truth object and when a corresponding position is zero in the first target image, otherwise setting the pixel values to zero.

In another embodiment, the method of the present invention further comprises the step of providing the first network or first computational structure with a training input image or training input images having corresponding non-overlapping connected components and ground truth object segmentations of overlapping objects and training the first network or first computational structure by updating parameters based on the training input image.

In yet another embodiment, the method of the present invention further comprises the steps of making the objects in a ground truth object segmentations image having segmentations in overlapping segments known to the first network or first computational structure in a second training scheme, and the first overlapping segment separating the first connected component from the second connected component so that the first connected component is distinctly separated from the second connected component.

In yet another embodiment, the method of the present invention further comprises the steps of a second network or second computational structure learning in the second training scheme to reconstruct the first object based on the first non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.

In another embodiment, the method of the present invention further comprises the steps of the second network or second computational structure learning in the training scheme to reconstruct the second object based on the second non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.

In yet an alternative embodiment, the method of the present invention further comprises the step of creating a padding box encompassing the first non-overlapping connected component.

In an alternative embodiment, the method of the present invention further comprises the step of preparing one padding box for each connected component in the input image.

In another embodiment, the method of the present invention further comprises the step of creating a segmentation set by combining a region of the input image corresponding to the padding box with the non-overlapping connected component and with an expanded overlapping segment.

Brief Description of Drawings

Fig. 1A is an original image showing overlapping objects of liposomes depicted in a transmission electron microscope;

Fig. IB is an image of ground truth object segmentations overlaid on the original image shown in Fig. 1A;

Fig. 1C is an image of a total object mask of the present invention; Fig. ID is an image of expanded objects of the image in

Fig. IB of the present invention overlaid on the original image shown in Fig. 1A;

Fig. IE is an image of a corresponding AND-mask of the present invention; Fig. IF is an image of a corresponding XOR-mask of the present invention;

Fig. 2A is an original image depicting overlapping objects to be analyzed by using the method of the present invention; Fig. 2B is an image depicting overlapping objects which are incorrectly segmented and outlined as large undesirable conjoined objects;

Fig. 2C is an image of individually outlined objects with correct segmentations and the contours of each object are clearly shown (ground truth object segmentations);

Fig. 2D is an XOR-mask image of the present invention overlaid on the original image in Fig. 2A, depicting non overlapping connected components of all the objects;

Fig. 3A is a binary XOR-mask image of the present invention including a bounding box and a padding box of one connected component highlighted in gray;

Fig. 3B is the original image shown in Fig. 2A including the padding box of the present invention;

Fig. 3C is a binary AND-mask image disposed inside the padding box of the present invention of the images shown in Figs. 3A-3B;

Fig. 4A is a zoomed in portion of the binary XOR-mask image shown in Fig. 3A including a bounding box and a padding box encompassing the first selected connected component (highlighted in gray) of the present invention;

Fig. 4B is a binary image of the first selected connected component in the XOR-mask image shown in Fig. 4A sized as the padding box of the present invention;

Fig. 4C is a detailed zoomed in view of the input image (shown in Fig. 3B) showing the region corresponding to the padding box for the first selected connected component of the present invention;

Fig. 4D is a detailed zoomed in view of the AND-mask image shown in Fig. 3C corresponding to the region of the padding box of the present invention;

Fig. 4E is a view of a segmentation set of the first selected connected component of the present invention;

Fig. 4F is a resulting image of the first correctly segmented object corresponding to the first selected connected component analyzed by using the method of the present invention;

Fig. 5A is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing the second selected connected component (highlighted in gray) of the present invention;

Fig. 5B is a binary image of the second selected connected component in the XOR-mask shown in Fig 5A of the present invention;

Fig. 5C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the second selected connected component of the present invention;

Fig. 5D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the second selected connected component of the present invention;

Fig. 5E is a view of a segmentation set of the second selected connected component of the present invention;

Fig. 5F is a resulting image of the second correctly segmented object analyzed by using the method of the present invention;

Fig. 6A is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing the third selected connected component (highlighted in gray) of the present invention;

Fig. 6B is a binary image of the third selected connected component of the XOR-mask shown in Fig 6A of the present invention;

Fig. 6C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the third selected connected component of the present invention;

Fig. 6D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the third selected connected component of the present invention;

Fig. 6E is a view of a segmentation set of the third selected connected component of the present invention;

Fig. 6F is a resulting image of the third correctly segmented object analyzed by using the method of the present invention;

Fig. 7A is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing the fourth selected connected component (highlighted in gray) of the present invention;

Fig. 7B is a binary image of the fourth selected connected component of the XOR-mask shown in Fig 7A of the present invention;

Fig. 7C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fourth selected connected component of the present invention;

Fig. 7D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fourth selected connected component of the present invention; Fig. 7E is a view of a segmentation set of the fourth selected connected component of the present invention;

Fig. 7F is a resulting image of the fourth correctly segmented object analyzed by using the method of the present invention;

Fig. 8A is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing the fifth selected connected component (highlighted in gray) of the present invention;

Fig. 8B is a binary image of the fifth selected connected component of the XOR-mask shown in Fig 8A of the present invention;

Fig. 8C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fifth selected connected component of the present invention;

Fig. 8D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fifth selected connected component of the present invention;

Fig. 8E is a view of a segmentation set of the fifth selected connected component of the present invention; and

Fig. 8F is a resulting image of the fifth correctly segmented object analyzed by using the method of the present invention;

Fig. 9 is a final resulting image of all the correctly segmented objects analyzed by using the method of the present invention;

Fig. 10A is an original image to be analyzed by the detection or first network of the present invention;

Fig. 10B is the resulting non-binary XOR belongingness image of the detection network of the present invention; Fig. IOC is the resulting non-binary AND belongingness image of the detection network of the present invention;

Fig. 10D is the binary XOR mask of the present invention resulting from a binarization of the XOR belongingness image shown in Fig. 10B;

Fig. 10E is the binary AND mask of the present invention resulting from a binarization of the AND belongingness image shown in Fig. IOC;

Fig. 10F is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing a selected connected component (highlighted in gray) of the present invention;

Fig. 10G is a view of a segmentation set of the selected connected component of the present invention;

Fig. 10H is the resulting belongingness image of the segmentation set in Fig. 10G given as input and analyzed by the completion or second network of the present invention;

Fig. 101 is the resulting binary object mask of completion stage of the present invention;

Fig. 10J is an image with the final result, the correctly segmented object, for the selected connected component analyzed by the completion stage of the method of the present invention overlaid on the original input image; and

Fig. 10K is a final result image of all the objects analyzed by using the method of the present invention overlaid on the original input image (shown in Fig. 10A).

Detailed Description

The method of the present invention is preferably used in conjunction with any suitable detection method that produces a class belongingness image where each pixel value of each pixel in the image is interpreted as the certainty or probability of the corresponding pixel of the input image belonging to the considered class. For example, a pixel value close to 1 indicates an extremely high probability of the pixel belonging to the class such as belonging to an object in the image that is being investigated. Similarly, a pixel value close to 0 indicates an extremely low probability of the pixel belonging to the class such as belonging to the object in the image that is being investigated. Preferably, a computerized or computational device is used when training and applying the method of the present invention. More particularly, the device, preferably, has one or several computational structures or convolutional neural networks implemented that are first trained to carry out the detection and completion stages of the present invention. As indicated above, it should be understood that the principles or steps of the method of the present invention work with any suitable detection method used to detect objects in an image such as an image depicted with a microscope. For instance, during the training stage, a suitable detection method may be used to detect and outline objects for the ground truth object segmentations image that is used for training, i.e., determining the parameters of the computational structures or networks, as explained in detail below.

In the non-binary class belongingness image, the highest possible value is, preferably, interpreted by the computational device as the image pixel belonging to an object of the segmentation class with full certainty. As the values decrease, the certainty also decreases such that the lowest possible value 0 is interpreted by the device to mean that the image pixel completely or definitely does not belong to an object of the class. Values are typically floating point values ranging from 1 (highest) to 0 (lowest) but can also be of other types and in other ranges. It is, for example, not uncommon to have images represented by integers in the range 0-255.

The belongingness image can be converted to a binary segmentation mask by letting the computational device change each pixel value to either 1 or 0 according to a preferred procedure. The preferred procedure could, for instance, be a simple thresholding method where belongingness values above a certain value (i.e., the threshold) are converted to 1 and values equal to or below the threshold are converted to 0.

The binarization can also consider or combine multiple features or belongingness images according to pre-determined rules. Preferably, the binary segmentation mask is created in which all pixels are either 1 or 0.

The binary segmentation masks can be used to determine the connected components in the mask. For example, all the pixels in a connected component have the value 1 and form one connected continuous area that is not split up or separated by pixels that have the value 0. Preferably, any two neighboring pixels of the same binary pixel value (either 1 or 0) belong to the same connected component. In this way, connected components correspond to an intuitive notion of isolated groups or "islands" of 1-valued class pixels surrounded by 0-valued background pixels. For clarity, the 1-valued class pixels could be depicted in white color or in a certain pattern while the 0-valued class pixels could be depicted in black color or in another pattern than the pattern used to depict 1-valued class pixels.

Given a class ontology where individual objects are present in the input image, when the objects are all well- separated (i.e., when there is no overlap between the objects), each complete object is represented by a connected component in the binary segmentation mask. However, when the objects are overlapping in the image, individual entire objects of a class cannot be distinguished from one another in the binary segmentation mask. Instead, overlapping objects are conjoined into a single connected component (that includes many objects) so that the geometry of the individual objects cannot unequivocally be determined from the single conjoined connected component. In other words, the relationship between objects and single conjoined connected components is then not 1-to-l. This means that the individual objects are not the same as each conjoined connected component shown in the image because the conjoined connected component includes two or more objects that are conjoined and not easily separable from each other.

However, a 1-to-l relationship between objects and connected components can be achieved by using the method of the present invention that takes special consideration of regions in the input image where the objects overlap.

Instead of producing a single class belongingness image corresponding to the certainty of the presence of the object, the method of the present invention uses two separate images. The first image corresponds to the certainty of pixels belonging to a non-overlapping region or segment of an object in the image. The second image corresponds to the certainty of pixels belonging to regions or segments in the objects where there is overlap with another object or objects. In other words, pixels located in the overlapping segment belong to more than one object. The two different images are henceforth called the belongingness XOR-image and the belongingness AND-image, respectively. The belongingness images that are produced as output by the detection of the first computational structure or network are not binary. Instead, the belonginess images produced by the first network indicate the probability of each pixel belonging to the class by varying the shade of the gray scale from any shade between white (100% probability) to black (0% probability). Preferably, the probabilities according to the detection or first network are based on how the detection network has been trained to set the probabilities. Their conversions to binary images may simply be called the XOR-mask and the AND- mask. This simple way of keeping track of overlapping object regions and using them in the deep learning framework (instead of just using one "object mask") is a core feature of the present invention. It allows for a successful final segmentation of full geometry objects despite the objects being overlapped by other objects, while being computationally efficient.

The method of the present invention includes two separate consecutive stages i.e., a detection stage (first stage) followed by a completion stage (second stage).

However, as explained in detail below, before the device can carry out the detection and completion stages on a new image that contains overlapping objects, the networks in the device must first be trained to do the detection and the completion stages. In the preferred embodiment, the detection and completion stages, preferably, include a combination of at least two machine learning methods or networks.

In the detection stage, when the XOR- and AND-masks are created, the connected components of the XOR-mask map to individual objects of the input image in a 1-to-l fashion. This means the pixels in the pixel positions located in each connected component of the XOR mask belong to only one object. However, the connected components in the XOR-masks, do not generally represent the full object geometries in the original input image. Often a connected component only represents a portion of an object. This is because overlapping objects in the input image are represented by partial segmentations, i.e., the non-overlapping regions or segments because the overlapping segments of the objects are interpreted by the networks of the device as belonging to the background of the image. For example, in a binary XOR mask, the overlapping segments are interpreted by the device/networks as not belonging to the object but to the background. The full object geometries can be rectified in the later completion stage by using the AND-mask, as described in detail below.

The detection stage of the method of the present invention is, preferably, implemented in the computational device by using a supervised machine-learning method such as a fully convolution neural network e.g., U-Net trained on the XOR-mask and the AND-mask as targets for an input image wherein the input image is used by the trained U-Net to create the belongingness XOR image and the belongingness AND image. The target masks (i.e., the AND and XOR masks), used when training the supervised machine learning method for the detection task, can be created so that predictions by the network (the belongingness XOR image and the belongingness AND image) that deviate from the exact desired target (as stipulated by the binary ground truth masks) are better tolerated .

A First Network Is Trained To Carry Out The Detection Stage Geometrically correct segmentations (the ground truth object segmentations) of each individual object are, preferably, used together with the input image when the networks of the device are trained to create the XOR-mask and the AND-mask from the input image. The ground truth object segmentations may be created by using a suitable detection method in order to correctly outline or segment all individual objects in the image despite the objects overlapping one another. In other words, the position and shape of the outer edge of each object in the overlapping segment are known so that the objects are correctly segmented despite any overlap between objects. The image and information of the ground truth segmentations (geometries) of all the objects are only used during the training of the networks to make sure the networks are correctly trained since it is known what each object looks like in the ground truth object segmentations image despite the objects overlapping one another. During the training stage of the networks, target masks (AND and XOR masks) with an incorporated tolerance for errors are created, according to the steps below: A provisional XOR-mask is created based on the image of the ground truth object segmentations by setting pixel values in the mask to 0 when the pixel positions of the pixels are located not inside an object to be analyzed in the image or when the pixel positions of the pixels are part of or belong to two objects or more.

Otherwise, the pixel values are set to 1. This means the pixels located inside an object are set to 1 while the pixels located in the overlapping segments and the background are set to 0. As a target when training the first network, an AND-mask is created based on the image of the ground truth object segmentations by setting the pixel values in this mask to 1 when the pixel positions of the pixels are located inside or in close proximity to (nearly inside) two or more objects. This makes the overlapping segments depicted in the AND-mask slightly larger than the overlapping segments depicted in the image of the ground truth object segmentations. Otherwise, the pixel values are set to 0. The proximity criterion can, for example, be within a certain distance, to any pixel that is inside two or more objects. This expansion of the overlapping segments used in the AND-mask is an important step of the method of the present invention, during the training stage of the first and the second networks, because it makes the method less sensitive to flaws or inaccuracies in the predictions made by the networks that are being trained. It should be noted that the expansion of the overlapping segments in the AND mask is only done during the training stage but not when the fully trained network is creating AND masks when analyzing a new unknown image (at inference). Instead of expanding the overlapping segments, it would also be possible to reduce the size of the connected components relative to the overlapping segments so that a gap is formed between adjacent connected components.

3. To serve as an additional or the second target when training the network, an XOR-mask is created by setting the pixel values to 0 when the corresponding pixel in the provisional XOR-mask (from step 1) has a pixel value of 0 or when the corresponding pixel (same location) in the AND-mask has a pixel value of 1. The other pixels are set to 1. This means that a connected component for an overlapping object in the XOR mask is slightly smaller than the corresponding connected component in the provisional XOR mask.

In this way, overlapping segments of the objects (depicted in the AND-mask) render slightly bigger areas of the background disposed between connected components in the XOR-mask of step 3 than would be seen in the provisional XOR- mask of step 1. The expanded overlapping segments mean the connected components in the XOR-masks are rendered slightly smaller than the connected components in the original image because the overlapping segments in the original image have not been expanded. Crucially, a proper gap between two detected overlapping objects is guaranteed because the AND- mask has been expanded to also include "nearly inside" pixels. This gap ensures that the connected components in the XOR-mask are separated from one another while each connected component only belongs to and/or depicts one object. During the training stage, the connected components corresponding to different objects are thus distinctly separated from one another by the gaps created by the expanded overlapping segments in the AND-mask. Preferably, the size of the gap corresponds to the distance in the proximity criterion of step 2. The gap between the connected components in the XOR-mask used as the target mask when training the network for the detection stage prevents object segmentations from becoming undesirably conjoined when the network prediction deviates slightly from a target result, as specified by the ground truth object segmentations. This means that the gaps between the connected components in the XOR-mask when training the network allow for some errors in the network prediction when the method is used to create the XOR-mask and the AND-mask when using the method of the present invention on new unseen images (i.e., when using the trained network at inference) wherein the object segmentations are not known to the networks.

It should be understood that once the first or detection network has been trained by using a large number of images with objects and their corresponding ground truth masks, the trained first network is preferably used in the detection stage of a new input image with unknown object segmentations of overlapping objects. The image of the ground truth object segmentations is not used during the "real" detection stage i.e., when the trained first network is using its knowledge and applies it to the detection stage of a new image. The predictions made by the first network in the detection stage are preferably interpreted as the belongingness XOR-image and the belongingness AND-image. The connected components and overlapping segments of the belongingness images, preferably, have pixels with different shades ranging from white to gray that indicate the likelihood of the pixels belonging or not. Figs. lOB-lOC show examples of XOR and AND belongingness images, respectively. The belongingness images are preferably the output of the detection stage of the first network. The belongingness images can be converted to binary segmentation masks by, for example, applying a threshold value to the images, as mentioned above. The threshold value can be fixed (pre-determined and fixed in the computational device) or manually set (given to the computational device by the user) or automatically determined. Commonly used and suitable automatic thresholding methods are, for example, the Otsu thresholding method or the minimum error thresholding method.

A Second Network Is Preferably Trained To Carry Out The Completion Stage

To complete the connected components of the XOR-mask (determined or created during the detection stage) in the completion stage, a different second or completion network preferably combines the connected components of the XOR-mask with the segments in the AND-mask and the input image. In this way, a collection of pixel regions that represent the full geometry of each of the objects in the input image is generated. For each connected component in the XOR-mask, a pixel region surrounding it is determined. The corresponding regions in the input image and in the AND-mask are then used together with the connected component to reconstruct the full geometry of the object. The completion stage may be implemented by using a supervised machine-learning method such as a second fully convolutional neural network. The second or completion network could, for example, be a U-Net that is trained on full geometry objects (ground truth object segmentations) as prediction targets for input images in the form of a region around each connected component (representing a part of an object in a 1-1 fashion) in the XOR-mask, and the corresponding regions in the original image and the AND-mask. The second network is thus trained to carry out the completion stage. In other words, the first network is preferably trained to carry out the detection stage from which the XOR-mask and AND-mask are derived while the second network is trained to carry out the completion stage for each object. The AND-mask and XOR-mask of the above steps 2 and 3 in the training of the detection stage are preferably re-used as the training data when training the second network to carry out the completion stage of the present invention.

Given an object from the input image (i.e., aground truth object segmentation), training data to train the second network to carry out the completion stage can be created by the following steps:

1.By using the same ground truth object segmentations as was used when the training the first network for the detection stage, a surrounding rectangle, a padding box, is determined and created by the computational device with sides of a factor alpha, e.g., alpha=3, larger than the sides of the bounding box of the connected component in the input image. The padding box should be sized to ensure that the entire object to be detected and analyzed is included inside the padding box before being sent into or submitted to the second network. The padding box is positioned such that the center of the padding box aligns with the centroid or center of the connected component in consideration. The bounding box is the smallest box that can be fitted around the connected component and it is automatically determined by the computational device by investigating the locations (x,y) coordinates of all pixels belonging to the connected component. The smallest and largest x- and y-values encountered correspond to the top and bottom corner coordinates of the bounding box, respectively . A target mask region is created by rendering the same object in the ground truth object segmentations image with 1-valued pixels in a 0-valued rectangle of the same dimensions as the padding box. A XOR-mask region is created by cropping the XOR-mask used during the training of the first network of the detection stage, with the padding box so that the XOR- mask region has the same size as the padding box. Any pixel value of a pixel for which the corresponding pixel value in the target mask is 0 is set to 0. An AND-mask region is created by cropping the AND-mask, used during the training of the first network of the detection stage, with the padding box so that the AND- mask region and the padding box have the same size. An input image region is created by cropping the input image from step 1 with the padding box so that the input image region and the padding box have the same size. Training the second network to carry out the completion stage by using the same XOR-mask and AND-mask, as were used during the training of the first network to carry out the detection stage, enables the second network to produce full object geometries by taking as input the XOR-mask and the AND-mask produced during the detection stage.

In an inference scenario, when a trained network carries out the completion stage on a new image to complete objects in that image, the padding box for input creation cannot, of course, be constructed from the ground truth object geometry being the "a posteriori" result of the network. The image of the ground truth object segmentations is only used as input during the training stage of the networks to learn the detection stage and the completion stage. When the completion stage is carried out by the trained network on a new image, the padding box is instead derived from or based on the XOR-mask which is the output from the detection stage of the new image. In other words, the XOR-mask (which includes a connected component) created during the detection stage of objects in the new image serves as the basis for creating the bounding box and the padding box for the objects in the new image analyzed during the completion stage. The Trained Second Network Carries Out The Completion Stage Of A New Image By Using The Connected Component In The XOR- Mask From The Detection Stage As Input

Given a connected component, such as a connected component from the XOR-mask created during the detection stage, the input to the trained second network for the completion stage of the new image for full object geometry prediction is created as follows:

1.A surrounding rectangle, padding box, is created with sides a factor alpha, e.g., alpha=3, larger than the bounding box of the selected connected component in the XOR-mask. The padding box is preferably positioned such that the center of the padding box aligns with the centroid of the connected component (example shown in Fig. 4A).

2.A XOR-mask region is created by cropping the XOR-mask, created during the detection stage of the new input image, with the padding box. Any pixel that does not belong to the connected component is set to have a pixel value 0 (example shown in Fig. 4B).

3.An AND-mask region is created by cropping the AND-mask, created during the detection stage of the new input image, with the padding box (example shown in Fig. 4D). In other words, the same AND-mask is used as was created during the detection stage of the new input image.

4.An input image region is created by cropping the new input image with the padding box (example shown in Fig. 4C).

Fig. 4E shows the segmentation set used as input to the completion network. The prediction result of the second network used in the completion stage is interpreted as a belongingness image. Preferably, this belongingness image is converted to a segmentation mask (example shown in Fig. 4F), such that the segmentations are interpreted as representing the full geometries of the object in the cropped new input image. The conversion from the belongingness image to the segmentation mask could, for example, be achieved by applying a threshold to the values. Any suitable thresholding method could be used such as the commonly used Otsu thresholding method (used in the illustrative examples in this application) or the minimum error thresholding method.

Another approach could be to apply the watershed segmentation method to a gradient magnitude image derived from the belongingness image. Yet another approach could be to use the thresholded belonginess mask as a starting guess for a deformable or active shape model applied to the underlying original image.

Figs. 10A-K (described in detail below) illustrate all the steps of the present invention at inference, that is when the method is applied to a new image with overlapping objects .

Example

The first feature (training phase) of the present invention is the creation of a XOR-mask and an AND-mask from an image with prepared (ground truth) object segmentations that shows the correct segmentation of the objects although some of the objects are overlapping one another. Preferably, the image of the ground truth object segmentations is used as the target image to train the first network. XOR may stand for "exclusive or" meaning the mask depicts or shows pixel positions that belong to one object or another object (but not both) in the image. AND may stand for "and" meaning that the mask depicts or shows pixel positions of overlapping segments that belong to at least two objects in the image i.e., the pixel positions belong to one object and another object.

Fig. 1A shows an original input image 100 that contains overlapping or very close objects 102, 104, 106, 108, 110,

112, 114, 116 and 118. The objects may be liposomes depicted in a transmission electron microscope (TEM). Fig. IB is very similar to Fig. 1A but shows "ground truth" or ideal object segmentations overlaid on the original input image so that all the objects in the image can be clearly distinguished from one another despite the overlap. It should be understood that the image in Fig. IB is merely used to illustrate that the outlines of each object are fully known by using another suitable detection method to detect and outline each object in the original input image. As explained in more detail below, the image of the ground truth object segmentations is merely created and used as training data when training the first network to carry out the detection stage and when training the second network to carry out the completion stage.

The method steps of the present invention can be used regardless of which method was used to find and identify the objects in the TEM image 100 in order to create the ground truth object segmentations image. As best shown in Fig. IB, object 102 and object 104 overlap at an overlapping segment 120. Object 112 and object 106 overlap at a segment 126. Object 114 and object 106 overlap at a segment 128. Object 106 and object 102 overlap at a segment 130. Object 106 and object 104 overlap at a segment 132. Object 104 and object 116 overlap at a segment 134. Object 114 and object 116 overlap at a segment 138. Object 118 and object 116 overlap at a segment 140. Object 110 is very close to object 102 and object 108 is very close to object 102 but there is no clear overlap between the objects.

As explained above, a mask may be used to emphasize, mark or render/display certain sections in an image. For better clarity, the masks are preferably, but not necessarily, binary. A binary AND-mask 144 and a binary XOR mask 146 can be created in the device, as shown below:

1. A binary mask or total object mask 142 of all objects 102, 104, 106, 108, 110, 112, 114, 116 and 118 is first created, as shown in Fig. 1C. That is, all pixels belonging to any object 102..118 of image 100 in Figs. 1A and IB are set to 1 (white color) in the total object mask 142. Fig. IB shows image 100 overlaid with object overlapping regions as segments 120 ..140. All pixels not belonging to any object 102..118 in Fig. IB are set to 0 (black color) in the mask 142. The black color thus shows the background of the image or mask 142.

2. Image 100' in Fig. ID shows a new set of dilated overlapping segments 120', 122', 124', 126', 128',

130', 132', 134', 136', 138' and 140' that are created by dilating (growing or expanding) each object 102..118 in Fig. 1A by some amount, such as a specific distance or 10% or any other suitable percentage, to create expanded objects 102'..118' and corresponding expanded overlapping segments 120'..140' (best shown in Fig.

ID). The result is that, for example, overlapping segment 128', shown in Fig. ID, has been expanded from overlapping segment 128 shown in Fig. IB. Similarly, overlapping segment 130' has been expanded from overlapping segment 130 and so on.

3. The binary AND-mask 144, as shown in Fig. IE, is created, from the overlapping dilated or expanded segments 120'..140', shown in Fig. ID, in which each pixel that belongs to only one dilated object 102'..118' in Fig ID is set to background i.e., 0 or black color in Fig. IE. All pixels in Fig. ID that belong to two or more objects 102'..118' (AND operation) are set to foreground i.e., 1 or white color in Fig. IE. The result is that only the expanded overlapping segments 120'..140' are shown in white color in Fig. IE and everything else is shown in black color.

4. The binary XOR-mask 146, as shown in Fig. IF, is created from the set difference, i.e., by subtracting the AND-mask 144, shown in Fig. IE, from the total object mask 142, shown in Fig. 1C. This means the expanded overlapping segments shown in white color in Fig. IE are shown in black color in Fig. IF and that the remaining connected component of each non-expanded object, shown in white in Fig. 1C, is shown as white in Fig. IF.

One important feature or advantage of dilating or expanding the object segments 102..118 to create the overlapping object region segments 120'..140' prior to creating the AND-mask 144 is that it makes sure that there is a gap or margin between the connected components of the XOR- mask 146 so that no object or connected component (i.e., partial object) in the XOR-mask 146 is touching or in contact with another object or connected component. All the connected components are distinctly separated from one another so that no white area in the XOR-mask 146 is touching another white area. In this way, there is, for example, a gap 148 between the connected component associated with object 106 and the connected component associated with object 114, as shown in Fig. IF. It should be noted that there would be no, or only a 1-pixel wide, gap 148 if, for example, the non-expanded segment 128, instead of the expanded segment 128', was used to depict the overlap between object 114 and object 106. Similarly, there is another gap 150 between the connected component associated with object 106 and the connected component associated with object 112 and a gap 152 between the connected component associated with object 116 and the connected component associated with object 104. As a result of the expansion, each white area (connected component) in the XOR-mask 146 is distinctly outlined and is associated with or representing only one object. Some of the gaps between the connected components of the objects are represented by the expanded overlapping regions or segments. The fact that the connected components are not in contact with one another makes the machine-learning method of the present invention, described in detail below, very effective and robust and allows for some prediction errors of the networks while maintaining the accuracy of the detection and completion stages of the present invention both during the learning stages and when the trained networks are applied to new input images. A second important feature of the present invention is the step of training a fully convolutional neural network in the computational device or distributed devices, on a set of images that display ground truth object segmentations. In this way, the network, such as the first or detection network, is first trained to create the basis (belongingness images) for the corresponding AND-mask and XOR-mask as output based on the image of the ground truth object segmentations. More particularly, the first network creates an AND- belongingness image and an XOR-belongingness image prior to binarizing them to the AND-mask and XOR-mask, respectively. The idea is thus to train the first network to be able to carry out the same steps of creating the corresponding AND- mask and XOR-mask based on a new image as input where the correct object segmentation is not known. In other words, the trained network applies the same principles to the new image as the network learned, during the training of the detection stage, to apply to the image of the ground truth object segmentations. A central concept of the present invention is thus to first train the system on known images that include separate and overlapping objects in order to be able to apply it to unknown/new images that also include separate and overlapping objects. It should be understood that the networks embody the machine learning methods of the present invention. The networks are, preferably, computational structures in the family of machine learning methods. In other words, the networks are thus first trained on known examples of images to learn a method that they then can apply to unknown/new examples of images.

Fig. 2A shows an original input image 200a that includes overlapping objects. Normally, the image 200a with overlapping objects 202, 204, 206, 208, and 210 cannot be segmented correctly right away with a fully convolutional neural network which has been trained to outline complete objects that have the full and correct geometry. Although only objects 202..210 have been marked in Fig. 2A the principles that apply to objects 202..210 also apply to all the other objects in image 200a and other images. Objects 202..210 merely serve as examples to illustrate the steps and principles of the detection and completion stages of the present invention.

Fig. 2B shows the same view as in Fig. 2A overlaid with an incorrectly segmented image 200b of overlapping objects 202..210 which are identical to the objects in image 200a but image 200b emphasizes or illustrates as an overlay that the objects are incorrectly outlined as large undesirable conjoined objects wherein the individual objects of the conjoined objects are difficult to distinguish from one another due to the overlapping regions. This undesired and incorrect segmentation, corresponding to the typical result from a fully convolutional neural network which has been trained to outline complete objects (the full and correct geometry) in one go or step, makes the overlapping objects 202, 204, 206, 208 and 210 look like one large, conjoined object 199. As explained in detail below, the two-step approach of the method of the present invention is a critical feature in order for the networks to be able to reliably create correct segmentations of overlapping objects in an image.

Fig. 2C depicts the same view as shown in Fig. 2A overlaid with a ground truth object segmentations image 200c of individually outlined objects 202..210 with correct outlines of the objects. This means the entire objects 202..210 are individually outlined with correct segmentation despite being overlapped by another object. More particularly, all objects 202..210 are individually outlined and overlapping regions 212, 214, 216, and 218 are correctly acknowledged as belonging to at least two objects. It is to be understood that any suitable detection method may be used to correctly detect and outline the objects in order to create the ground truth object segmentations image 200c shown in Fig. 2C used when training the networks of the present invention.

Fig. 2D shows the same view as in Fig. 2A overlaid with outlines of the connected components of an XOR image 200d of overlapping objects 202..210 with incorrect outlining but individual detection of each connected component 201..209 separated by expanded overlapping segments 212', 214', 216' and 218'. It should be noted that the images 200a, 200b,

200c and 200d are based on the same view of objects.

A feature of the present invention, as a first step towards correct segmentations, is the creation of an XOR-mask as the one which outlines are shown overlaid on the original image 200d in Fig. 2D as an output of the detection stage of the present invention. However, the XOR-mask has incorrect outlining of the objects (because the mask excludes the expanded or dilated overlapping object regions 212'..218' and depicts them in black as background) but illustrates the individual detection of the connected components 201..209 that are associated with the corresponding objects 202..210, respectively. The XOR mask depicted in image 200d overlaid on the original image is an example that shows that the first network has learnt not to include the overlapping segments 212'..218'. More particularly, the XOR-mask in image 200d thus treats the expanded or dilated overlapping segments 212'..218', as explained in connection with Figs. 1A-1F, as belonging to the background 247 of the XOR-mask in image 200d. It should also be noticed that there is no overlap of segments in the XOR-mask shown in image 200d and each connected component 201..209 (i.e., the integral remaining portion of each object) corresponding to each object 202..210 are distinct and separated from the connected components of the other objects. The creation of the XOR-mask, shown overlaid on the original image in Fig. 2D, used when training the detection network is preferably done by first creating the total object mask (example shown in Fig. 1C) and then deducting the corresponding AND-mask, as explained in detail in connection with Figs. 1A-1F.

A third important feature of the present invention, which relates to the completion stage, is the creation and use of the padding box 252, best shown in Fig. 3A, which marks a region encompassing one connected component, such as the first connected component 201 corresponding to object 202 and some surroundings in one or more same-sized images. The padding box 252 is preferably constructed by enlarging (by some amount) the bounding box 254 that is the smallest sized box that surrounds or encompasses the selected connected component 201 in the XOR-mask image 200e. The XOR mask image 200e is the same mask as the one displayed overlaid on the original image in Fig. 2D. It shows the connected component 201 (marked in a gray shade) of the object 202. The bounding box 254 is placed and sized to have the minimum size that encompasses the connected component 201 (corresponding to object 202) in the XOR mask image 200e. Preferably, the padding box 252 should be sized (i.e., be large enough) and placed (centered) so that it encompasses all overlapping regions belonging to the object to be reconstructed in the completion stage. In practice, such as when the objects depicted correspond to gene carriers, it is often important that each object, that is to be analyzed in a later stage, is correctly outlined, despite being overlapped by another object, to ensure that the entire object is then being analyzed. This is one reason for constructing one padding box for each connected component (that is associated with an object).

More particularly, Fig. 3A depicts the XOR mask image 200e that shows the first connected component 201 of the selected first object 202 and its bounding box 254 and padding box 252. As mentioned earlier, the bounding box 254 is preferably the smallest sized box that encompasses the first connected component 201 of the first selected object 202. The larger size of the padding box 252 compared to the bounding box 254 ensures that the entire full geometry object

202 to be detected and analyzed is located inside the padding box 252. The bounding box 254 is thus associated with an object such as object 202. The XOR mask image 200e shown in Fig. 3A, is based on the original image 200a, shown in Fig. 2A. It should be understood that the bounding box 254 and padding box 252 are merely marked regions in the image to be analyzed. Fig. 3B shows the padding box 252 overlayed on the original input image 200d (that is virtually identical to image 200a) placed in the same area as the padding box 252 shown on the XOR mask image 200e. Fig. 3C shows a corresponding AND mask 200g with the padding box 252 marked.

A fourth feature of the completion stage of the present invention is the creation of segmentation sets or masks i.e., the padding box regions are copied from the full-size images 200e, 200f, 200g and depicted as images 200e', 200f', 200g' that have the same size as the padding box 252. More particularly, given the original input image 200f (shown in Fig. 3B) and its corresponding XOR-mask image 200e (shown in Fig. 3A) and AND-mask 200g (shown in Fig. 3C), a segmentation set is created for each connected component, such as connected component 201 in the XOR-mask 200e, by the following steps:

1. The padding box 252 of the connected component 201 corresponding to object 202 is constructed.

2. Three empty images, with the same size as the padding box 252, are created.

3. The connected component 201 in padding box region 252 of the XOR mask image 200e' (shown in Fig. 4A) is drawn (or rather copied by the device) onto XOR mask 246 (shown in Fig. 4B) such that any pixel that belongs to the connected component 201 of object 202 is drawn white while any other region is drawn black. This image thus depicts the binary XOR-mask 246 of the connected component 201 that is sized according to the padding box 252. 4. The content of the padding box 252 in image 200f (shown in Fig. 3B) is depicted in image 200f', as shown in Fig. 4C.

5. The content of the padding box 252 of the AND mask 200g (shown in Fig. 3C) is depicted as AND-mask 200g', as shown in Fig. 4D.

The image 200e' in Fig. 4A is a zoomed in part of the XOR mask image 200e (Fig. 3A). Just as Fig. 3A, Fig. 4A displays the XOR mask image 200e' zoomed in on the bounding box 254 and padding box 252 for the connected component 201 which is highlighted in gray. The zoomed in image 200f' in Fig. 4C is the same as the padding box 252 in image 200f shown in Fig. 3B. The AND mask 200g' depicted in Fig. 4D is also shown inside the padding box 252 shown in Fig. 3C.

After this procedure, the XOR mask 246, the image 200f', and the AND mask 200g' together constitute a segmentation set 258, illustrated jointly in Fig. 4E, as a composition of the XOR mask 246 (shown in Fig. 4B) and the AND mask 200g' (shown in Fig. 4D), overlaid on image 200f' (shown in Fig. 4C). It should be noted that applying the method of the present invention to new images does not require the use of any views related to the view of the ground truth object segmentations.

As mentioned above, the ground truth object segmentations are only used during the learning stages of the first and second networks. Once the networks have been fully trained, they only use input from the new image to be analyzed. It should also be noted that the overlap segments (AND-regions) 212'..218' are expanded when creating the segmentation set 258 used for training, while the connected component 201 is not expanded and has the correct size.

A fifth feature of the present invention is to train a supervised machine learning method, such as a second network, to carry out the completion stage. For example, the computational structure could be a fully convolutional neural network that, during the learning stages, is trained on a set of images including the image that shows the ground truth objects segmentations. One task of the second network is to output the complete geometry of an object given a segmentation set, such as segmentation set 258, as input.

The segmentation sets can be created from the XOR-mask and AND-mask according to the procedure describe above. While training the network, the segmentation set, and the view of the corresponding ground truth object segmentation are preferably used as input. Again, the ground truth object segmentations image is only used during the training stage to, preferably, train the first network to carry out the detection stage and the second network to carry out the completion stage.

A sixth feature of the present invention is the creation of the input to the completion stage from the output of the detection stage given only an image as input. Each connected component, such as the connected component 201, in the XOR- mask and the AND-mask output of the detection stage (typically provided by the trained fully convolutional first neural network) is, together with the input image, used to create a segmentation set (i.e., based on images 246, 200f', 200g') displayed jointly in the segmentation set 258 in Fig. 4E which is supplied as input to the second network that performs the completion stage (typically a trained fully convolutional neural network) so that the final output (i.e., the resulting image) is a segmentation image 260, shown in Fig. 4F, which correctly outlines the object 202. One connected component can be used to produce one segmentation set so there is, preferably, only one segmentation set for each connected component. The overlapping object segments 212..218 of the original input image 200f' (Fig. 4C) are interpreted by the device in the second network as being parts of the full geometry of the object 202 and are added to the connected component 201 to create the full geometry of object 202, as shown in the resulting image 260 (shown in Fig. 4F).

The completion stage is preferably executed multiple times to complete all connected components in the XOR-mask created during the detection stage by the first network. The process of completing five local connected components is illustrated in Figs. 5A-8F, and the combined final result with all five objects correctly outlined is shown in Fig. 9.

A seventh feature of the present invention is that the completion stage only needs to be performed for objects that are overlapping. These can be directly identified from the connected components in the AND-mask, allowing for efficient implementation and low computational cost. In the case of an image with objects with no overlaps, the AND-mask is completely black. The connected components of the XOR-mask derived from the output of the first network in the detection stage then correspond to the correctly outlined objects.

When the components in the AND-mask are sparse and far apart, computational optimization can be achieved by only performing the completion stage on the XOR-mask connected components in near proximity to the connected components in the AND-mask.

An eighth feature of the present invention is that the XOR-mask can be used to evaluate the degenerate case where an object is completely covered by a larger object. Any covered object can be recovered by checking for holes in the connected component of the covering object in the XOR-mask.

A correct outline of the covered object is then achieved by reversing the dilation (expansion), i.e., eroding (shrinking), the corresponding connected component in the AND-mask .

Figs. 5A-5F show the same principle views as Figs. 4A-4F but a bounding box 262 and padding box 264 are encompassing the second selected connected component 207 instead of the first selected connected component 201. The steps that have been described in connection with the connected component 201, as shown in Figs. 4A-4F, also apply to all the other connected components in the image such as connected components 203, 205, 207, and 209. More particularly, Fig.

5A is a zoomed in part (image 266) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 262 and the padding box 264 for the connected component 207 highlighted in gray. Fig. 5B is a binary XOR-mask 268 of the second connected component 207 i.e., the second selected (marked in gray) connected component in Fig. 5A. Fig. 5C is a corresponding input image 270 of the padding box 264 region of image 200a for the second selected connected component 207 corresponding to object 208. Fig. 5D is the binary AND-mask 272 of the padding box 264 showing the expanded overlapping segment 216' and portions of the expanded segments 214' and 218'. Fig. 5E is a view of a second segmentation set 274 of the second selected connected component 207. Fig. 5F is a resulting image 276 of the second selected connected component 207 that clearly shows the entire object 208 with correct segmentation and outline. The resulting image 276 is the result of the completion step related to the second selected connected component 207. The corresponding detection steps and completion steps are identical for the second selected connected component 207 corresponding to object 208 as were, for example, described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.

Figs. 6A-6F show the same principle views as Figs. 4A-4F and Figs. 5A-5F but the bounding box 280 and padding box 282 are encompassing the third selected connected component 205 instead of the first selected connected component 201 and second selected connected component 207, respectively. More particularly, Fig. 6A is a zoomed in part (image 284) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 280 and the padding box 282 for the connected component 205 highlighted in gray. Fig. 6B is a binary XOR-mask 286 of the third connected component 205 i.e., the third selected connected component shown in gray in Fig. 6A. Fig. 6C is a corresponding input image 288 of the padding box 282 region of image 200f for the third selected connected component 205 corresponding to object 206. Fig. 6D is the binary AND-mask 290 of the padding box 282 showing the expanded overlapping segment 214' and portions of the expanded segment 216'. Fig. 6E is a view of a third segmentation set 292 of the third selected connected component 205. Fig. 6F is a resulting image 294 of the third selected connected component 205 that clearly shows the entire object 206 with correct segmentation and outline. The resulting image 294 is the result of the completion step related to the third selected connected component 205. The corresponding detection steps and completion steps are identical for the third selected connected component 205 as were, for example, described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.

Figs. 7A-7F show the same principle views as Figs. 4A- 4F, 5A-5F and 6A-6F but the bounding box 298 and padding box 300 are encompassing the fourth selected connected component 209 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 7A is a zoomed in part (image 302) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 298 and the padding box 300 for the connected component 209 highlighted in gray. Fig. 7B is a binary XOR-mask 304 of the fourth connected component 209, the fourth selected connected component 209 shown in gray in Fig. 7A. Fig. 7C is a corresponding input image 306 of the padding box 300 region of image 200f for the fourth selected connected component 209 corresponding to object 210. Fig. 7D is a detailed binary AND-mask 308 of the padding box 300 showing the expanded overlapping segment 218'. Fig. 7E is a view of a fourth segmentation set 310 of the fourth selected connected component 209. Fig. 7F is a resulting image 312 of the fourth selected connected component 209 that clearly shows the entire object 210 with correct segmentation and outline. The resulting image 312 is the result of the completion step related to the fourth selected connected component 209 from the output of the detection stage. The corresponding detection steps and completion steps are identical for the fourth selected connected component 209 as were, for example, described in detail above regarding the first selected connected component 201 and shown in Figs. 4A- 4F.

Figs. 8A-8F show the same principle views as Figs. 4A-7F but the bounding box 320 and padding box 322 are encompassing the fifth selected connected component 203 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 8A is a zoomed in part (image 324) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 320 and the padding box 322 for the connected component 203 highlighted in gray. Fig. 8B is a binary XOR- mask 326 of the connected component 203, the fifth selected connected component shown in gray in Fig. 8A. Fig. 8C is a corresponding input image 328 of the padding box 322 region of image 200f for the fifth selected connected component 203 corresponding to object 204. Fig. 8D is the binary AND-mask 330 of the padding box 322 showing the expanded overlapping segment 212'. Fig. 8E is a view of a fifth segmentation set 332 of the fifth selected connected component 203. Fig. 8F is a resulting image 334 of the fifth selected connected component 203 that clearly shows the entire object 204 with correct segmentation and outline. The resulting image 334 is the result of the completion step related to the fifth selected connected component 203. The corresponding detection steps and completion steps are identical for the fifth selected connected component 203 as were, for example, described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F. Fig. 9 shows the final result image 336 that is a combination of image 260 (Fig. 4F), image 276 (Fig. 5F), image 294 (Fig. 6F), image 312 (Fig. 7F) and image 334 (Fig. 8F). The image 336 clearly shows and outlines the entire objects 202, 204, 206, 208 and 210 despite the overlapping regions described above. It should be noticed that image 336 shows the objects 202..210 as clearly as the ground truth image 200c, shown in Fig. 2C. This confirms that the detection and completion steps of the present invention produce the correct result.

Figs. 10A-K illustrate all steps of the present invention at inference when used to segment and outline the objects of a new image 400, shown in Fig. 10A, that has not been analyzed before by any of the networks of the present invention. The original image 400 is fed as input to the first neural network of the detection stage which has been trained to produce a grayscale (non-binary) XOR belongingness image 420 (shown in Fig. 10B) and a grayscale (non-binary)

AND belongingness image 440 (shown in Fig. IOC) as the results of the detection network. The belongingness images 420, 440 depict the pixels in white, gray, or black wherein the specific grayscale rendered indicates the probability of belonging to the XOR-mask and AND-mask respectively. The brighter the grayscale the higher the probability of belonging. The shades of gray between white and black indicate probabilities between white (1) and black (0). The belongingness images 420, 440 are then each binarized by e.g., the Otsu thresholding method or any other suitable method, to produce a binary XOR-mask 450 (shown in Fig. 10D) and a binary AND-mask 470 (shown in Fig. 10E), respectively. Fig. 10F shows an image 480 of the XOR-mask for a selected connected component 401 (highlighted in gray), together with its bounding box 482 and padding box 484. Fig. 10G shows the segmentation set 490 for the selected connected component 401, i.e., only the selected connected component 401 from the padding box 484 portion of the XOR-mask image 480, the part of the original image 400, and the overlapping segments of the AND-mask 470 disposed inside the padding box 484 of the selected connected component 401. Preferably, there is only one padding box per selected connected component. Fig. 10H shows the resulting gray-scale belongingness image 500 when using the segmentation set as input to the second completion neural network. The belongingness image 500 is thus the output from the second or completion network as a result of the analysis by the second network of the segmentation set 490. Fig. 101 shows the binarized belongingness image 510 based on view 500 in Fig. 10H that has been binarized (by, for example, the Otsu thresholding method), which is the final object segmentation mask of the selected connected component 401. Fig. 10J shows an image 512 of the final object segmentation mask overlaid on its padding box region of the original image 400. Fig. 10K shows the correct outlines of all objects detected by the method overlaid on the original image 400. As can be seen in Fig. 10K, the overlapping segments are correctly segmented and clearly shown so that the outline of the objects 404, 406, 408, 410 can be seen despite overlapping object 402. Similarly, the outline of object 402 can be clearly seen despite the object 402 overlapping or being overlapped by objects 404, 406, 408 and 410. An alternative way of creating the gap between connected components in the present invention is to erode or shrink the ground truth objects (used during training phase) or the connected components (at inference) prior to subtracting unexpanded overlapping regions when creating the XOR-mask, i.e., the opposite of what is described above wherein the connected components (XOR-mask) are kept fixed while the overlapping regions are expanded when creating the training data. Everything else is preferably kept the same. While the present invention has been described in accordance with preferred compositions and embodiments, it is to be understood that certain substitutions and alterations may be made thereto without departing from the spirit and scope of the following claims.

Claims

We claim:

1. A method of analyzing an image having overlapping objects, comprising: an imaging device providing an input image (100) having a first object (102) overlapping a second object (104) at a first overlapping segment (120), the first object (102) comprising the first overlapping segment (120) and a first non-overlapping connected component, the second object (104) comprising the first overlapping segment (120) and a second non-overlapping connected component; a first network or first computational structure receiving the input image (100) from the imaging device, the first network or first computational structure calculating a first image containing only the first overlapping segment (120) as a first output image and a second image containing only the first and second non-overlapping connected components as a second output image; a separate second network or second computational structure extracting and processing each non-overlapping connected component in the second output image together with the first output image and input image (100); the second network or second computational structure receiving the first non-overlapping connected component of the second output image, the first output image with overlapping segments and the input image (100); and the second network calculating a resulting image of the first object (102) based on the first non-overlapping connected component and the overlapping segments (120) and the input image (100).

2. A method of creating a training set for machine learning, for adjusting parameters of a first computational structure or first network having an improved tolerance to uncertain predictions made by the first computational structure or first network, comprising: an imaging device providing an input image (100) having a first object (102) overlapping a second object (104) at a first overlapping segment (120), the first object (102) comprising the first overlapping segment (120) and a first non-overlapping connected component, the second object (104) comprising the first overlapping segment (120) and a second non-overlapping connected component, the first non overlapping and the second non-overlapping component having a first gap (148) defined therebetween; enlarging the first gap (148) to a second gap wherein the second gap is greater than the first gap (148); providing ground truth object information for a first and a second object; setting pixel values of pixels in a first target image representing overlapping segments to one when the pixels belong to more than one ground truth object or when the pixels are located within a predetermined distance extending from the overlapping segments, otherwise setting the pixel values to zero; and setting the pixel values of pixels in a second target image representing non-overlapping connected components to one when the pixels belong to only one ground truth object and when a corresponding position is zero in the first target image, otherwise setting the pixel values to zero.

3. The method of claim 1 wherein the method further comprises the step of providing the first network or first computational structure with a training input image having corresponding non-overlapping connected components and ground truth object segmentations of overlapping objects and training the first network or first computational structure by updating parameters based on the training input image.

4. The method of claim 3 wherein the method further comprises the steps of making the objects in a ground truth object segmentations image having segmentations in overlapping segments known to the first network or first computational structure in a training scheme, and the first overlapping segment separating the first connected component from the second connected component so that the first connected component is distinctly separated from the second connected component.

5. The method according to claim 4 wherein the method further comprises the steps of a second network or second computational structure learning in a second training scheme to reconstruct the first object (102) based on the first non- overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.

6. The method according to claim 5 wherein the method further comprises the steps of the second network or second computational structure learning in the second training scheme to reconstruct the second object (104) based on the second non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.

7. The method according to claim 1 wherein the method further comprises the step of creating a padding box (252) encompassing the first non-overlapping connected component.

8. The method according to claim 7 wherein the method further comprises the step of preparing one padding box for each connected component in the input image (100).

9. The method according to claim 7 wherein the method further comprises the step of creating a segmentation set by combining a region of the input image (100) corresponding to the padding box (252) with the non-overlapping connected component and with an expanded overlapping segment.