WO2023283411A2 - Method for machine-learning based training and segmentation of overlapping objects
- Publication number: WO2023283411A2
- Application: PCT/US2022/036470
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/695—Preprocessing, e.g. image segmentation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present invention relates to a method for training and automatically segmenting overlapping objects in images, such as overlapping objects in images acquired with an imaging device such as a microscope. More particularly, the present invention incorporates combinatorial set theory into the training scheme and the inference of a machine-learning approach for automatic segmentation of overlapping objects imaged with, for example, an electron microscope.
- In the pharmaceutical industry, viruses, Virus-Like Particles (VLPs), liposomes and lipid nanoparticles (LNPs) are extensively used as carriers for drug and gene delivery.
- VLPs: Virus-Like Particles
- LNPs: lipid nanoparticles
- TEM: transmission electron microscopy
- CryoTEM: cryogenic TEM
- rtTEM: room-temperature TEM
- nsTEM: negative-stain TEM
- a stain and/or some other kind of preservative agent must be added to embed the objects in order for important structures to be maintained and not destroyed by the drying process that occurs at room temperature.
- a negative stain is sometimes used because it has the effect of increasing the contrast in the sample so that certain types of details and sample constituent differences are enhanced.
- AI: Artificial Intelligence
- machine learning, such as image-based deep learning, removes the reliance on humans' often poor ability to instruct the computer about what information to extract and how to measure and analyze it; instead, the computer learns the relevant information from examples. It has the drawback that information or "types" not represented in the examples are not incorporated into the deep-learning model, and this type of information is hence typically not learnt by the machine/computer. Another problem is that the training models may also learn erroneous features when there are systematic errors in the training data.
- Machine-learning techniques, more specifically deep learning and the computational structures known as convolutional neural networks, have lifted image processing and analysis to a whole new level. This is especially true for computer-vision applications, which can draw on the enormous amounts of natural scene images accessible on the Internet.
- Biomedical applications, such as microscopy, face a different scenario with fewer accessible training images.
- the purpose of the processing and analysis is slightly different and application specific.
- a sample In microscopy, a sample often contains multiple objects that should be detected, identified, and measured in different ways.
- the high number of different types of objects along with their measured characteristic features are in turn used for e.g., clinical diagnosis, disease pathway research and understanding, drug development and quality control. It is, thus, important that each object is segmented in its full size and shape so that the extracted measurements are not biased when objects are overlapping one another or when objects are partly or fully hidden behind other objects. Thanks to the shared weights of convolutional neural networks, these networks can very effectively learn to determine which pixels in an image belong to some type of object. A problem with this sharing of weights, however, is that when overlapping objects are present in the image, there is no obvious way of training a network to determine pixels that belong to the object type and simultaneously separate the segmentations of the individual overlapping objects.
- the present invention provides an effective solution to the problem associated with overlapping objects.
- the method of the present invention relates to a sequence of smart set- theory combinatorial steps of object segments combined with machine-learning approaches. Using this simple, yet clever, combination allows for automatic, accurate and reliable training of such networks and once the networks are trained, detection of full object geometries of overlapping objects in complex images.
- An important aspect of the present invention is the idea and novel insight of splitting the segmentation task into two machine-learning tasks. The first task produces two mask images corresponding to non-overlapping and overlapping object segments, and the second task combines the masks into full objects with correct geometry.
- the step of splitting the segmentation into two steps also allows for the inclusion of a simple, yet effective, modification of the overlapping segments, during the training stages, that makes the segmentation less sensitive to incorrect predictions made by the machine learning methods.
- the overall segmentation approach is thereby made more robust to handle complex images and images of varying appearances. More particularly, the method of the present invention relates to the steps described in detail below.
- the method of the present invention is applicable to other datatypes and contents, such as lipid nanoparticles (LNPs), gene therapy and viral biological vectors (i.e., the particles that contain or carry the genes), cellular organelles such as exosomes and ribosomes, and other gene/drug delivery particles as well as overlapping cells or subcellular organelles (such as nuclei) in various types of light and fluorescence microscopy images.
- the method of the present invention is for analyzing an image having overlapping objects, by the steps listed below (a schematic sketch of the two-stage pipeline follows the list).
- An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment.
- the first object comprising the first overlapping segment and a first non-overlapping connected component.
- the second object comprising the first overlapping segment and a second non-overlapping connected component.
- a first network or first computational structure receiving the input image from the imaging device.
- the first network or first computational structure calculating a first image containing only the first and second overlapping segments as a first output image and a second image containing only the first and second non-overlapping connected components as a second output image.
- a separate second network extracting and processing each non-overlapping connected component in the second output image together with the first output image and the input image.
- the second network or second computational structure receiving the first non-overlapping connected component of the second output image, the first output image with overlapping segments and the input image.
- the second network or second computational structure calculating a resulting image of the first object based on the first non-overlapping connected component and the overlapping segments and the input image.
- the first object comprising the first non-overlapping connected component and the first overlapping segment.
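For concreteness, the two-stage analysis described in the steps above can be sketched as follows in Python. This is a minimal sketch, not the patent's implementation: `detection_net` and `completion_net` are placeholders for the trained first and second networks, and the 0.5 threshold and `pad` margin are assumed example values.

```python
import numpy as np
from scipy import ndimage

def segment_overlapping_objects(image, detection_net, completion_net,
                                threshold=0.5, pad=16):
    # Stage 1 (detection): predict the two belongingness images and binarize
    # them into the XOR-mask (non-overlapping segments) and the AND-mask
    # (overlapping segments).
    xor_belongingness, and_belongingness = detection_net(image)
    xor_mask = xor_belongingness > threshold
    and_mask = and_belongingness > threshold

    # Each connected component of the XOR-mask maps 1-to-1 to one object.
    labels, n_components = ndimage.label(xor_mask)
    objects = []
    for i in range(1, n_components + 1):
        component = labels == i
        # Padding box: the component's bounding box enlarged by `pad` pixels.
        ys, xs = np.nonzero(component)
        box = (slice(max(ys.min() - pad, 0), ys.max() + pad + 1),
               slice(max(xs.min() - pad, 0), xs.max() + pad + 1))
        # Stage 2 (completion): reconstruct the full object geometry from the
        # component, the overlapping segments, and the input image in the box.
        belongingness = completion_net(component[box], and_mask[box], image[box])
        objects.append((box, belongingness > threshold))
    return objects
```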
- the method of the present invention creates a training set for machine learning, for determining parameters of a first computational structure or first network, having an improved tolerance to uncertain predictions made by the first computational structure or network.
- An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment.
- the first object comprising the first overlapping segment and a first non-overlapping connected component.
- the second object comprising the first overlapping segment and a second non-overlapping connected component.
- the first non-overlapping and the second non-overlapping connected component having a first gap defined therebetween, the first gap being enlarged to a second gap wherein the second gap is greater than the first gap.
- the method of the present invention further comprises the step of providing the first network or first computational structure with a training input image or training input images having corresponding non-overlapping connected components and ground truth object segmentations of overlapping objects and training the first network or first computational structure by updating parameters based on the training input image.
- the method of the present invention further comprises the steps of making the objects in a ground truth object segmentations image having segmentations in overlapping segments known to the first network or first computational structure in a second training scheme, and the first overlapping segment separating the first connected component from the second connected component so that the first connected component is distinctly separated from the second connected component.
- the method of the present invention further comprises the steps of a second network or second computational structure learning in the second training scheme to reconstruct the first object based on the first non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.
- the method of the present invention further comprises the steps of the second network or second computational structure learning in the training scheme to reconstruct the second object based on the second non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.
- the method of the present invention further comprises the step of creating a padding box encompassing the first non-overlapping connected component.
- the method of the present invention further comprises the step of preparing one padding box for each connected component in the input image.
- the method of the present invention further comprises the step of creating a segmentation set by combining a region of the input image corresponding to the padding box with the non-overlapping connected component and with an expanded overlapping segment.
- Fig. 1A is an original image showing overlapping objects of liposomes depicted in a transmission electron microscope
- Fig. 1B is an image of ground truth object segmentations overlaid on the original image shown in Fig. 1A;
- Fig. 1C is an image of a total object mask of the present invention
- Fig. 1D is an image of expanded objects of the image in Fig. 1B of the present invention overlaid on the original image shown in Fig. 1A;
- Fig. 1E is an image of a corresponding AND-mask of the present invention
- Fig. 1F is an image of a corresponding XOR-mask of the present invention
- Fig. 2A is an original image depicting overlapping objects to be analyzed by using the method of the present invention
- Fig. 2B is an image depicting overlapping objects which are incorrectly segmented and outlined as large undesirable conjoined objects
- Fig. 2C is an image of individually outlined objects with correct segmentations and the contours of each object are clearly shown (ground truth object segmentations);
- Fig. 2D is an XOR-mask image of the present invention overlaid on the original image in Fig. 2A, depicting non overlapping connected components of all the objects;
- Fig. 3A is a binary XOR-mask image of the present invention including a bounding box and a padding box of one connected component highlighted in gray;
- Fig. 3B is the original image shown in Fig. 2A including the padding box of the present invention
- Fig. 3C is a binary AND-mask image disposed inside the padding box of the present invention of the images shown in Figs. 3A-3B;
- Fig. 4A is a zoomed in portion of the binary XOR-mask image shown in Fig. 3A including a bounding box and a padding box encompassing the first selected connected component (highlighted in gray) of the present invention
- Fig. 4B is a binary image of the first selected connected component in the XOR-mask image shown in Fig. 4A sized as the padding box of the present invention
- Fig. 4C is a detailed zoomed in view of the input image (shown in Fig. 3B) showing the region corresponding to the padding box for the first selected connected component of the present invention
- Fig. 4D is a detailed zoomed in view of the AND-mask image shown in Fig. 3C corresponding to the region of the padding box of the present invention
- Fig. 4E is a view of a segmentation set of the first selected connected component of the present invention.
- Fig. 4F is a resulting image of the first correctly segmented object corresponding to the first selected connected component analyzed by using the method of the present invention
- Fig. 5A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the second selected connected component (highlighted in gray) of the present invention
- Fig. 5B is a binary image of the second selected connected component in the XOR-mask shown in Fig. 5A of the present invention.
- Fig. 5C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the second selected connected component of the present invention
- Fig. 5D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the second selected connected component of the present invention
- Fig. 5E is a view of a segmentation set of the second selected connected component of the present invention.
- Fig. 5F is a resulting image of the second correctly segmented object analyzed by using the method of the present invention.
- Fig. 6A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the third selected connected component (highlighted in gray) of the present invention
- Fig. 6B is a binary image of the third selected connected component of the XOR-mask shown in Fig. 6A of the present invention.
- Fig. 6C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the third selected connected component of the present invention
- Fig. 6D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the third selected connected component of the present invention
- Fig. 6E is a view of a segmentation set of the third selected connected component of the present invention.
- Fig. 6F is a resulting image of the third correctly segmented object analyzed by using the method of the present invention.
- Fig. 7A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the fourth selected connected component (highlighted in gray) of the present invention
- Fig. 7B is a binary image of the fourth selected connected component of the XOR-mask shown in Fig. 7A of the present invention.
- Fig. 7C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fourth selected connected component of the present invention.
- Fig. 7D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fourth selected connected component of the present invention
- Fig. 7E is a view of a segmentation set of the fourth selected connected component of the present invention
- Fig. 7F is a resulting image of the fourth correctly segmented object analyzed by using the method of the present invention.
- Fig. 8A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the fifth selected connected component (highlighted in gray) of the present invention
- Fig. 8B is a binary image of the fifth selected connected component of the XOR-mask shown in Fig. 8A of the present invention.
- Fig. 8C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fifth selected connected component of the present invention.
- Fig. 8D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fifth selected connected component of the present invention
- Fig. 8E is a view of a segmentation set of the fifth selected connected component of the present invention.
- Fig. 8F is a resulting image of the fifth correctly segmented object analyzed by using the method of the present invention.
- Fig. 9 is a final resulting image of all the correctly segmented objects analyzed by using the method of the present invention.
- Fig. 10A is an original image to be analyzed by the detection or first network of the present invention.
- Fig. 10B is the resulting non-binary XOR belongingness image of the detection network of the present invention
- Fig. 10C is the resulting non-binary AND belongingness image of the detection network of the present invention
- Fig. 10D is the binary XOR mask of the present invention resulting from a binarization of the XOR belongingness image shown in Fig. 10B;
- Fig. 10E is the binary AND mask of the present invention resulting from a binarization of the AND belongingness image shown in Fig. 10C;
- Fig. 10F is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing a selected connected component (highlighted in gray) of the present invention
- Fig. 10G is a view of a segmentation set of the selected connected component of the present invention.
- Fig. 10H is the resulting belongingness image of the segmentation set in Fig. 10G given as input and analyzed by the completion or second network of the present invention
- Fig. 10I is the resulting binary object mask of the completion stage of the present invention.
- Fig. 10J is an image with the final result, the correctly segmented object, for the selected connected component analyzed by the completion stage of the method of the present invention overlaid on the original input image;
- Fig. 10K is a final result image of all the objects analyzed by using the method of the present invention overlaid on the original input image (shown in Fig. 10A).
- the method of the present invention is preferably used in conjunction with any suitable detection method that produces a class belongingness image where each pixel value of each pixel in the image is interpreted as the certainty or probability of the corresponding pixel of the input image belonging to the considered class.
- a pixel value close to 1 indicates an extremely high probability of the pixel belonging to the class such as belonging to an object in the image that is being investigated.
- a pixel value close to 0 indicates an extremely low probability of the pixel belonging to the class such as belonging to the object in the image that is being investigated.
- a computerized or computational device is used when training and applying the method of the present invention.
- the device preferably has one or several computational structures or convolutional neural networks implemented that are first trained to carry out the detection and completion stages of the present invention.
- the principles or steps of the method of the present invention work with any suitable detection method used to detect objects in an image such as an image depicted with a microscope.
- a suitable detection method may be used to detect and outline objects for the ground truth object segmentations image that is used for training, i.e., determining the parameters of the computational structures or networks, as explained in detail below.
- the highest possible value is, preferably, interpreted by the computational device as the image pixel belonging to an object of the segmentation class with full certainty.
- as the pixel value decreases, the certainty also decreases, such that the lowest possible value 0 is interpreted by the device to mean that the image pixel completely or definitely does not belong to an object of the class.
- Values are typically floating-point values ranging from 0 (lowest) to 1 (highest) but can also be of other types and in other ranges. It is, for example, not uncommon to have images represented by integers in the range 0-255.
- the belongingness image can be converted to a binary segmentation mask by letting the computational device change each pixel value to either 1 or 0 according to a preferred procedure.
- the preferred procedure could, for instance, be a simple thresholding method where belongingness values above a certain value (i.e., the threshold) are converted to 1 and values equal to or below the threshold are converted to 0.
- the binarization can also consider or combine multiple features or belongingness images according to pre-determined rules.
- the binary segmentation mask is created in which all pixels are either 1 or 0.
- the binary segmentation masks can be used to determine the connected components in the mask. For example, all the pixels in a connected component have the value 1 and form one connected continuous area that is not split up or separated by pixels that have the value 0. Preferably, any two neighboring pixels of the same binary pixel value (either 1 or 0) belong to the same connected component.
- connected components correspond to an intuitive notion of isolated groups or "islands" of 1-valued class pixels surrounded by 0-valued background pixels. For clarity, the 1-valued class pixels could be depicted in white color or in a certain pattern while the 0-valued class pixels could be depicted in black color or in another pattern than the pattern used to depict 1-valued class pixels.
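As a concrete illustration of binarization followed by connected-component labelling as just described, consider the following minimal Python sketch; the 0.5 threshold, the toy array, and the use of scipy.ndimage are illustrative assumptions rather than part of the patent text.

```python
import numpy as np
from scipy import ndimage

belongingness = np.array([[0.9, 0.8, 0.1, 0.0],
                          [0.7, 0.2, 0.0, 0.6],
                          [0.0, 0.0, 0.9, 0.8]])
mask = (belongingness > 0.5).astype(np.uint8)   # binary segmentation mask

# Label the isolated "islands" of 1-valued pixels; the all-ones structure
# selects 8-connectivity, so diagonal neighbours join the same component.
labels, count = ndimage.label(mask, structure=np.ones((3, 3)))
print(count)   # 2 connected components in this toy mask
```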
- each complete object is represented by a connected component in the binary segmentation mask.
- when the objects are overlapping in the image, individual entire objects of a class cannot be distinguished from one another in the binary segmentation mask. Instead, overlapping objects are conjoined into a single connected component (that includes many objects) so that the geometry of the individual objects cannot unequivocally be determined from the single conjoined connected component. In other words, the relationship between objects and single conjoined connected components is then not 1-to-1. This means that the individual objects are not the same as each conjoined connected component shown in the image because the conjoined connected component includes two or more objects that are conjoined and not easily separable from each other.
- a 1-to-1 relationship between objects and connected components can be achieved by using the method of the present invention that takes special consideration of regions in the input image where the objects overlap.
- the method of the present invention uses two separate images.
- the first image corresponds to the certainty of pixels belonging to a non-overlapping region or segment of an object in the image.
- the second image corresponds to the certainty of pixels belonging to regions or segments in the objects where there is overlap with another object or objects. In other words, pixels located in the overlapping segment belong to more than one object.
- the two different images are henceforth called the belongingness XOR-image and the belongingness AND-image, respectively.
- the belongingness images that are produced as output by the detection of the first computational structure or network are not binary.
- the belongingness images produced by the first network indicate the probability of each pixel belonging to the class by varying the shade of the gray scale, from white (100% probability) to black (0% probability).
- the probabilities according to the detection or first network are based on how the detection network has been trained to set the probabilities. Their conversions to binary images may simply be called the XOR-mask and the AND-mask. This simple way of keeping track of overlapping object regions and using them in the deep-learning framework (instead of just using one "object mask") is a core feature of the present invention. It allows for a successful final segmentation of full-geometry objects despite the objects being overlapped by other objects, while being computationally efficient.
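The XOR/AND bookkeeping can be stated compactly in code. Below is a minimal sketch, assuming the ground truth is available as one boolean mask per object (a representation chosen here purely for illustration); the expansion of the overlapping segments used during training is added in the steps described further below.

```python
import numpy as np

# Assumed representation: an (n_objects, H, W) boolean stack with one full
# ground-truth mask per (possibly overlapping) object.
object_masks = np.zeros((2, 5, 5), dtype=bool)
object_masks[0, 1:4, 0:3] = True      # first object
object_masks[1, 1:4, 2:5] = True      # second object, overlapping in column 2

coverage = object_masks.sum(axis=0)   # number of objects covering each pixel
and_mask = coverage >= 2              # overlapping segments (inside >= 2 objects)
xor_mask = coverage == 1              # non-overlapping segments (inside exactly 1)
```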
- the method of the present invention includes two separate consecutive stages i.e., a detection stage (first stage) followed by a completion stage (second stage).
- the networks in the device must first be trained to do the detection and the completion stages.
- the detection and completion stages preferably, include a combination of at least two machine learning methods or networks.
- the connected components of the XOR-mask map to individual objects of the input image in a 1-to-1 fashion. This means the pixels in the pixel positions located in each connected component of the XOR mask belong to only one object.
- the connected components in the XOR-masks do not generally represent the full object geometries in the original input image. Often a connected component only represents a portion of an object. This is because overlapping objects in the input image are represented by partial segmentations, i.e., the non-overlapping regions or segments, because the overlapping segments of the objects are interpreted by the networks of the device as belonging to the background of the image.
- the overlapping segments are interpreted by the device/networks as not belonging to the object but to the background.
- the full object geometries can be rectified in the later completion stage by using the AND-mask, as described in detail below.
- the detection stage of the method of the present invention is, preferably, implemented in the computational device by using a supervised machine-learning method such as a fully convolutional neural network, e.g., a U-Net trained on the XOR-mask and the AND-mask as targets for an input image, wherein the input image is used by the trained U-Net to create the belongingness XOR image and the belongingness AND image.
- the target masks, i.e., the AND and XOR masks, used when training the supervised machine-learning method for the detection task can be created so that predictions by the network (the belongingness XOR image and the belongingness AND image) that deviate slightly from the exact desired target (as stipulated by the ground truth object segmentations) do not cause the object segmentations to become undesirably conjoined.
- A First Network Is Trained To Carry Out The Detection Stage
- Geometrically correct segmentations (the ground truth object segmentations) of each individual object are, preferably, used together with the input image when the networks of the device are trained to create the XOR-mask and the AND-mask from the input image.
- the ground truth object segmentations may be created by using a suitable detection method in order to correctly outline or segment all individual objects in the image despite the objects overlapping one another. In other words, the position and shape of the outer edge of each object in the overlapping segment are known so that the objects are correctly segmented despite any overlap between objects.
- ground truth segmentations are only used during the training of the networks to make sure the networks are correctly trained since it is known what each object looks like in the ground truth object segmentations image despite the objects overlapping one another.
- the target masks (the AND and XOR masks) are created from the image of the ground truth object segmentations by the following steps:
- a provisional XOR-mask is created based on the image of the ground truth object segmentations by setting pixel values in the mask to 0 when the pixel positions of the pixels are located not inside an object to be analyzed in the image or when the pixel positions of the pixels are part of or belong to two objects or more.
- the pixel values are set to 1. This means the pixels located inside an object are set to 1 while the pixels located in the overlapping segments and the background are set to 0.
- an AND-mask is created based on the image of the ground truth object segmentations by setting the pixel values in this mask to 1 when the pixel positions of the pixels are located inside or in close proximity to (nearly inside) two or more objects. This makes the overlapping segments depicted in the AND-mask slightly larger than the overlapping segments depicted in the image of the ground truth object segmentations. Otherwise, the pixel values are set to 0.
- the proximity criterion can, for example, be within a certain distance, to any pixel that is inside two or more objects.
- This expansion of the overlapping segments used in the AND-mask is an important step of the method of the present invention, during the training stage of the first and the second networks, because it makes the method less sensitive to flaws or inaccuracies in the predictions made by the networks that are being trained. It should be noted that the expansion of the overlapping segments in the AND mask is only done during the training stage but not when the fully trained network is creating AND masks when analyzing a new unknown image (at inference). Instead of expanding the overlapping segments, it would also be possible to reduce the size of the connected components relative to the overlapping segments so that a gap is formed between adjacent connected components.
- an XOR-mask is created by setting the pixel values to 0 when the corresponding pixel in the provisional XOR-mask (from step 1) has a pixel value of 0 or when the corresponding pixel (same location) in the AND-mask has a pixel value of 1.
- the other pixels are set to 1. This means that a connected component for an overlapping object in the XOR mask is slightly smaller than the corresponding connected component in the provisional XOR mask.
- overlapping segments of the objects render slightly bigger areas of the background disposed between connected components in the XOR-mask of step 3 than would be seen in the provisional XOR- mask of step 1.
- the expanded overlapping segments mean the connected components in the XOR-masks are rendered slightly smaller than the connected components in the original image because the overlapping segments in the original image have not been expanded.
- a proper gap between two detected overlapping objects is guaranteed because the AND-mask has been expanded to also include "nearly inside" pixels. This gap ensures that the connected components in the XOR-mask are separated from one another while each connected component only belongs to and/or depicts one object.
- the connected components corresponding to different objects are thus distinctly separated from one another by the gaps created by the expanded overlapping segments in the AND-mask.
- the size of the gap corresponds to the distance in the proximity criterion of step 2.
- the gap between the connected components in the XOR-mask used as the target mask when training the network for the detection stage prevents object segmentations from becoming undesirably conjoined when the network prediction deviates slightly from a target result, as specified by the ground truth object segmentations.
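Steps 1-3 above translate directly into code. The following is a minimal sketch under the assumption that the ground truth object segmentations are given as an (n_objects, H, W) boolean stack; the `gap` value (the proximity distance of step 2) is an assumed example.

```python
import numpy as np
from scipy import ndimage

def make_training_targets(object_masks, gap=3):
    coverage = object_masks.sum(axis=0)
    provisional_xor = coverage == 1            # step 1: inside exactly one object
    overlap = coverage >= 2
    # Step 2: the AND-mask also includes "nearly inside" pixels; this
    # expansion by dilation is done during training only, not at inference.
    and_mask = ndimage.binary_dilation(overlap, iterations=gap)
    # Step 3: removing the expanded overlap from the provisional XOR-mask
    # guarantees a background gap between adjacent connected components.
    xor_mask = provisional_xor & ~and_mask
    return and_mask, xor_mask
```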
- the trained first network is preferably used in the detection stage of a new input image with unknown object segmentations of overlapping objects.
- the image of the ground truth object segmentations is not used during the "real" detection stage i.e., when the trained first network is using its knowledge and applies it to the detection stage of a new image.
- the predictions made by the first network in the detection stage are preferably interpreted as the belongingness XOR-image and the belongingness AND-image.
- the connected components and overlapping segments of the belongingness images, preferably, have pixels with different shades ranging from white to gray that indicate the likelihood of the pixels belonging or not. Figs. 10B-10C show examples of XOR and AND belongingness images, respectively.
- the belongingness images are preferably the output of the detection stage of the first network.
- the belongingness images can be converted to binary segmentation masks by, for example, applying a threshold value to the images, as mentioned above.
- the threshold value can be fixed (pre-determined and fixed in the computational device) or manually set (given to the computational device by the user) or automatically determined. Commonly used and suitable automatic thresholding methods are, for example, the Otsu thresholding method or the minimum error thresholding method.
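As a usage sketch, the Otsu method named above is available in, for example, scikit-image (an assumed implementation choice):

```python
import numpy as np
from skimage.filters import threshold_otsu

belongingness = np.random.rand(256, 256)     # stand-in for a network output
t = threshold_otsu(belongingness)            # automatically determined threshold
binary_mask = (belongingness > t).astype(np.uint8)
```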
- a Second Network Is Preferably Trained To Carry Out The Completion Stage
- a different second or completion network preferably combines the connected components of the XOR-mask with the segments in the AND-mask and the input image. In this way, a collection of pixel regions that represent the full geometry of each of the objects in the input image is generated. For each connected component in the XOR-mask, a pixel region surrounding it is determined. The corresponding regions in the input image and in the AND-mask are then used together with the connected component to reconstruct the full geometry of the object.
- the completion stage may be implemented by using a supervised machine-learning method such as a second fully convolutional neural network.
- the second or completion network could, for example, be a U-Net that is trained on full geometry objects (ground truth object segmentations) as prediction targets for input images in the form of a region around each connected component (representing a part of an object in a 1-to-1 fashion) in the XOR-mask, and the corresponding regions in the original image and the AND-mask.
- the second network is thus trained to carry out the completion stage.
- the first network is preferably trained to carry out the detection stage from which the XOR-mask and AND-mask are derived while the second network is trained to carry out the completion stage for each object.
- the AND-mask and XOR-mask of the above steps 2 and 3 in the training of the detection stage are preferably re-used as the training data when training the second network to carry out the completion stage of the present invention.
- training data to train the second network to carry out the completion stage can be created by the following steps (a code sketch follows the list):
- the padding box should be sized to ensure that the entire object to be detected and analyzed is included inside the padding box before being sent into or submitted to the second network.
- the padding box is positioned such that the center of the padding box aligns with the centroid or center of the connected component in consideration.
- the bounding box is the smallest box that can be fitted around the connected component and it is automatically determined by the computational device by investigating the (x,y) coordinates of all pixels belonging to the connected component.
- a target mask region is created by rendering the same object in the ground truth object segmentations image with 1-valued pixels in a 0-valued rectangle of the same dimensions as the padding box.
- An XOR-mask region is created by cropping the XOR-mask used during the training of the first network of the detection stage, with the padding box so that the XOR-mask region has the same size as the padding box. Any pixel value of a pixel for which the corresponding pixel value in the target mask is 0 is set to 0.
- An AND-mask region is created by cropping the AND-mask, used during the training of the first network of the detection stage, with the padding box so that the AND- mask region and the padding box have the same size.
- An input image region is created by cropping the input image from step 1 with the padding box so that the input image region and the padding box have the same size.
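The steps above can be sketched as follows; `half` (half the padding-box side), the function name, and the single-object arguments are illustrative assumptions, and the padding box is simply clipped at the image border.

```python
import numpy as np
from scipy import ndimage

def completion_training_example(image, xor_mask, and_mask, gt_object,
                                component, half=48):
    # Padding box centred on the centroid of the selected connected component;
    # `half` must be large enough for the whole object to fit inside the box.
    cy, cx = ndimage.center_of_mass(component)
    y0, x0 = max(int(cy) - half, 0), max(int(cx) - half, 0)
    box = (slice(y0, y0 + 2 * half), slice(x0, x0 + 2 * half))

    target = gt_object[box]                      # full-geometry target mask
    xor_region = xor_mask[box] & gt_object[box]  # zero pixels outside the target
    and_region = and_mask[box]                   # (expanded) overlapping segments
    image_region = image[box]                    # raw input image content
    # The triple (xor_region, and_region, image_region) forms the input to the
    # completion network; `target` is its desired output.
    return (xor_region, and_region, image_region), target
```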
- the padding box for input creation cannot, of course, be constructed from the ground truth object geometry, since that geometry is the "a posteriori" result that the network is meant to produce.
- the image of the ground truth object segmentations is only used as input during the training stage of the networks to learn the detection stage and the completion stage.
- the padding box is instead derived from or based on the XOR-mask which is the output from the detection stage of the new image.
- the XOR-mask (which includes a connected component) created during the detection stage of objects in the new image serves as the basis for creating the bounding box and the padding box for the objects in the new image analyzed during the completion stage.
- the input to the trained second network for the completion stage of the new image for full object geometry prediction is created as follows:
- the padding box is preferably positioned such that the center of the padding box aligns with the centroid of the connected component (example shown in Fig. 4A).
- An XOR-mask region is created by cropping the XOR-mask, created during the detection stage of the new input image, with the padding box. Any pixel that does not belong to the connected component is set to have a pixel value 0 (example shown in Fig. 4B).
- An input image region is created by cropping the new input image with the padding box (example shown in Fig. 4C).
- Fig. 4E shows the segmentation set used as input to the completion network.
- the prediction result of the second network used in the completion stage is interpreted as a belongingness image.
- this belongingness image is converted to a segmentation mask (example shown in Fig. 4F), such that the segmentations are interpreted as representing the full geometries of the object in the cropped new input image.
- the conversion from the belongingness image to the segmentation mask could, for example, be achieved by applying a threshold to the values. Any suitable thresholding method could be used such as the commonly used Otsu thresholding method (used in the illustrative examples in this application) or the minimum error thresholding method.
- Another approach could be to apply the watershed segmentation method to a gradient magnitude image derived from the belongingness image.
- Yet another approach could be to use the thresholded belongingness mask as a starting guess for a deformable or active shape model applied to the underlying original image.
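One way to realize the watershed alternative mentioned above is sketched below using scikit-image; the marker construction (seeding confident background and object pixels from the belongingness values) is an illustrative assumption, not a procedure specified in the patent.

```python
import numpy as np
from skimage.filters import sobel, threshold_otsu
from skimage.segmentation import watershed

def object_mask_from_belongingness(belongingness):
    gradient = sobel(belongingness)          # gradient magnitude image
    t = threshold_otsu(belongingness)
    markers = np.zeros(belongingness.shape, dtype=np.int32)
    markers[belongingness < 0.5 * t] = 1     # confident background seeds
    markers[belongingness > t] = 2           # confident object seeds
    labels = watershed(gradient, markers)    # flood from the seeds
    return labels == 2                       # binary full-object mask
```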
- Figs. 10A-K illustrate all the steps of the present invention at inference, that is, when the method is applied to a new image with overlapping objects.
- the first feature (training phase) of the present invention is the creation of an XOR-mask and an AND-mask from an image with prepared (ground truth) object segmentations that shows the correct segmentation of the objects although some of the objects are overlapping one another.
- the image of the ground truth object segmentations is used as the target image to train the first network.
- XOR may stand for "exclusive or", meaning the mask depicts or shows pixel positions that belong to one object or another object (but not both) in the image.
- AND may stand for "and", meaning that the mask depicts or shows pixel positions of overlapping segments that belong to at least two objects in the image, i.e., the pixel positions belong to one object and another object.
- Fig. 1A shows an original input image 100 that contains overlapping or very close objects 102, 104, 106, 108, 110, 112, 114, 116 and 118.
- Fig. 1B is very similar to Fig. 1A but shows "ground truth" or ideal object segmentations overlaid on the original input image so that all the objects in the image can be clearly distinguished from one another despite the overlap. It should be understood that the image in Fig. 1B is merely used to illustrate that the outlines of each object are fully known by using another suitable detection method to detect and outline each object in the original input image. As explained in more detail below, the image of the ground truth object segmentations is merely created and used as training data when training the first network to carry out the detection stage and when training the second network to carry out the completion stage.
- object 102 and object 104 overlap at an overlapping segment 120.
- Object 112 and object 106 overlap at a segment 126.
- Object 114 and object 106 overlap at a segment 128.
- Object 106 and object 102 overlap at a segment 130.
- Object 106 and object 104 overlap at a segment 132.
- Object 104 and object 116 overlap at a segment 134.
- Object 114 and object 116 overlap at a segment 138.
- Object 118 and object 116 overlap at a segment 140.
- Object 110 is very close to object 102 and object 108 is very close to object 102 but there is no clear overlap between the objects.
- a mask may be used to emphasize, mark or render/display certain sections in an image.
- the masks are preferably, but not necessarily, binary.
- a binary AND-mask 144 and a binary XOR mask 146 can be created in the device, as shown below:
- a binary mask or total object mask 142 of all objects 102, 104, 106, 108, 110, 112, 114, 116 and 118 is first created, as shown in Fig. 1C. That is, all pixels belonging to any object 102..118 of image 100 in Figs. 1A and 1B are set to 1 (white color) in the total object mask 142.
- Fig. 1B shows image 100 overlaid with object overlapping regions as segments 120..140. All pixels not belonging to any object 102..118 in Fig. 1B are set to 0 (black color) in the mask 142. The black color thus shows the background of the image or mask 142.
- Image 100' in Fig. 1D shows a new set of dilated overlapping segments 120'..140', created by dilating each object 102..118 in Fig. 1A by some amount, such as a specific distance or 10% or any other suitable percentage, to create expanded objects 102'..118' and corresponding expanded overlapping segments 120'..140' (best shown in Fig. 1D). For example, overlapping segment 128' shown in Fig. 1D has been expanded from overlapping segment 128, overlapping segment 130' has been expanded from overlapping segment 130, and so on.
- the binary AND-mask 144 is created from the overlapping dilated or expanded segments 120'..140', shown in Fig. 1D, in which each pixel that belongs to only one dilated object 102'..118' in Fig. 1D is set to background, i.e., 0 or black color in Fig. 1E. All pixels in Fig. 1D that belong to two or more objects 102'..118' (AND operation) are set to foreground, i.e., 1 or white color in Fig. 1E. The result is that only the expanded overlapping segments 120'..140' are shown in white color in Fig. 1E and everything else is shown in black color.
- the binary XOR-mask 146, as shown in Fig. 1F, is created from the set difference, i.e., by subtracting the AND-mask 144, shown in Fig. 1E, from the total object mask 142, shown in Fig. 1C.
- One important feature or advantage of dilating or expanding the object segments 102..118 to create the overlapping object region segments 120'..140' prior to creating the AND-mask 144 is that it makes sure that there is a gap or margin between the connected components of the XOR-mask 146 so that no object or connected component (i.e., partial object) in the XOR-mask 146 is touching or in contact with another object or connected component. All the connected components are distinctly separated from one another so that no white area in the XOR-mask 146 is touching another white area. In this way, there is, for example, a gap 148 between the connected component associated with object 106 and the connected component associated with object 114, as shown in Fig. 1F.
- each white area (connected component) in the XOR-mask 146 is distinctly outlined and is associated with or representing only one object.
- a second important feature of the present invention is the step of training a fully convolutional neural network in the computational device or distributed devices, on a set of images that display ground truth object segmentations.
- the network such as the first or detection network, is first trained to create the basis (belongingness images) for the corresponding AND-mask and XOR-mask as output based on the image of the ground truth object segmentations.
- the first network creates an AND-belongingness image and an XOR-belongingness image prior to binarizing them to the AND-mask and XOR-mask, respectively.
- the idea is thus to train the first network to be able to carry out the same steps of creating the corresponding AND- mask and XOR-mask based on a new image as input where the correct object segmentation is not known.
- the trained network applies the same principles to the new image as the network learned, during the training of the detection stage, to apply to the image of the ground truth object segmentations.
- a central concept of the present invention is thus to first train the system on known images that include separate and overlapping objects in order to be able to apply it to unknown/new images that also include separate and overlapping objects.
- the networks embody the machine learning methods of the present invention.
- the networks are, preferably, computational structures in the family of machine learning methods.
- the networks are thus first trained on known examples of images to learn a method that they then can apply to unknown/new examples of images.
- Fig. 2A shows an original input image 200a that includes overlapping objects.
- the image 200a with overlapping objects 202, 204, 206, 208, and 210 cannot be segmented correctly right away with a fully convolutional neural network which has been trained to outline complete objects that have the full and correct geometry.
- although only objects 202..210 have been marked in Fig. 2A, the principles that apply to objects 202..210 also apply to all the other objects in image 200a and other images.
- Objects 202..210 merely serve as examples to illustrate the steps and principles of the detection and completion stages of the present invention.
- Fig. 2B shows the same view as in Fig. 2A overlaid with an incorrectly segmented image 200b of overlapping objects 202..210. The objects are identical to those in image 200a, but image 200b emphasizes or illustrates as an overlay that the objects are incorrectly outlined as large undesirable conjoined objects, wherein the individual objects of the conjoined objects are difficult to distinguish from one another due to the overlapping regions.
- This undesired and incorrect segmentation, corresponding to the typical result from a fully convolutional neural network which has been trained to outline complete objects (the full and correct geometry) in one go or step, makes the overlapping objects 202, 204, 206, 208 and 210 look like one large, conjoined object 199.
- the two-step approach of the method of the present invention is a critical feature in order for the networks to be able to reliably create correct segmentations of overlapping objects in an image.
- Fig. 2C depicts the same view as shown in Fig. 2A overlaid with a ground truth object segmentations image 200c of individually outlined objects 202..210 with correct outlines of the objects.
- any suitable detection method may be used to correctly detect and outline the objects in order to create the ground truth object segmentations image 200c shown in Fig. 2C used when training the networks of the present invention.
- Fig. 2D shows the same view as in Fig. 2A overlaid with outlines of the connected components of an XOR image 200d of overlapping objects 202..210 with incorrect outlining but individual detection of each connected component 201..209 separated by expanded overlapping segments 212', 214', 216' and 218'. It should be noted that the images 200a, 200b, 200c and 200d are based on the same view of objects.
- a feature of the present invention is the creation of an XOR-mask, such as the one whose outlines are shown overlaid on the original image 200d in Fig. 2D, as an output of the detection stage of the present invention.
- the XOR-mask has incorrect outlining of the objects (because the mask excludes the expanded or dilated overlapping object regions 212'..218' and depicts them in black as background) but illustrates the individual detection of the connected components 201..209 that are associated with the corresponding objects 202..210, respectively.
- the XOR mask depicted in image 200d overlaid on the original image is an example that shows that the first network has learnt not to include the overlapping segments 212'..218'.
- the XOR-mask in image 200d thus treats the expanded or dilated overlapping segments 212'..218', as explained in connection with Figs. 1A-1F, as belonging to the background 247 of the XOR-mask in image 200d. It should also be noticed that there is no overlap of segments in the XOR-mask shown in image 200d and each connected component 201..209 (i.e., the integral remaining portion of each object) corresponding to each object 202..210 are distinct and separated from the connected components of the other objects.
- the creation of the XOR-mask, shown overlaid on the original image in Fig. 2D, used when training the detection network is preferably done by first creating the total object mask (example shown in Fig. 1C) and then deducting the corresponding AND-mask, as explained in detail in connection with Figs. 1A-1F.
- a third important feature of the present invention is the creation and use of the padding box 252, best shown in Fig. 3A, which marks a region encompassing one connected component, such as the first connected component 201 corresponding to object 202 and some surroundings in one or more same-sized images.
- the padding box 252 is preferably constructed by enlarging (by some amount) the bounding box 254 that is the smallest sized box that surrounds or encompasses the selected connected component 201 in the XOR-mask image 200e.
- the XOR mask image 200e is the same mask as the one displayed overlaid on the original image in Fig. 2D. It shows the connected component 201 (marked in a gray shade) of the object 202.
- the bounding box 254 is placed and sized to have the minimum size that encompasses the connected component 201 (corresponding to object 202) in the XOR mask image 200e.
- the padding box 252 should be sized (i.e., be large enough) and placed (centered) so that it encompasses all overlapping regions belonging to the object to be reconstructed in the completion stage.
- Fig. 3A depicts the XOR mask image 200e that shows the first connected component 201 of the selected first object 202 and its bounding box 254 and padding box 252.
- the bounding box 254 is preferably the smallest sized box that encompasses the first connected component 201 of the first selected object 202.
- the larger size of the padding box 252 compared to the bounding box 254 ensures that the entire full-geometry object 202, including its overlapping segments, fits inside the padding box 252.
- the bounding box 254 is thus associated with an object such as object 202.
- the XOR mask image 200e shown in Fig. 3A is based on the original image 200a, shown in Fig. 2A. It should be understood that the bounding box 254 and padding box 252 are merely marked regions in the image to be analyzed.
- Fig. 3B shows the padding box 252 overlayed on the original input image 200d (that is virtually identical to image 200a) placed in the same area as the padding box 252 shown on the XOR mask image 200e.
- Fig. 3C shows a corresponding AND mask 200g with the padding box 252 marked.
- A fourth feature of the completion stage of the present invention is the creation of segmentation sets or masks; i.e., the padding-box regions are copied from the full-size images 200e, 200f and 200g and depicted as images 200e', 200f' and 200g', which have the same size as the padding box 252. More particularly, given the original input image 200f (shown in Fig. 3B) and its corresponding XOR-mask image 200e (shown in Fig. 3A) and AND-mask 200g (shown in Fig. 3C), a segmentation set is created for each connected component, such as connected component 201 in the XOR-mask 200e, by the following steps (a code sketch follows the steps below):
- The padding box 252 of the connected component 201 corresponding to object 202 is constructed.
- The connected component 201 in the padding-box region 252 of the XOR-mask image 200e' (shown in Fig. 4A) is drawn (or rather copied by the device) onto the XOR-mask 246 (shown in Fig. 4B) such that any pixel that belongs to the connected component 201 of object 202 is drawn white while any other region is drawn black.
- This image thus depicts the binary XOR-mask 246 of the connected component 201, sized according to the padding box 252.
- The content of the padding box 252 in image 200f (shown in Fig. 3B) is depicted in image 200f', as shown in Fig. 4C.
- The content of the padding box 252 of the AND-mask 200g (shown in Fig. 3C) is depicted as AND-mask 200g', as shown in Fig. 4D.
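A minimal sketch of these cropping steps, assuming the full-size images are NumPy arrays and the connected components of the XOR-mask have been labeled beforehand; the function and variable names are illustrative:

```python
import numpy as np

def crop_segmentation_set(image, xor_labels, and_mask, label, pad_box):
    """Crop the padding-box region from the input image, the labeled
    XOR-mask and the AND-mask, keeping only the selected component.

    pad_box is a (row_slice, col_slice) pair covering the padding box.
    """
    rs, cs = pad_box
    component = xor_labels[rs, cs] == label            # cf. mask 246 in Fig. 4B
    return component, image[rs, cs], and_mask[rs, cs]  # cf. 200f' and 200g'
```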
- The image 200e' in Fig. 4A is a zoomed in part of the XOR-mask image 200e (Fig. 3A).
- Fig. 4A displays the XOR-mask image 200e' zoomed in on the bounding box 254 and padding box 252 for the connected component 201, which is highlighted in gray.
- The zoomed in image 200f' in Fig. 4C is the same as the padding box 252 in image 200f shown in Fig. 3B.
- The AND-mask 200g' depicted in Fig. 4D is also shown inside the padding box 252 shown in Fig. 3C.
- The XOR-mask 246, the image 200f' and the AND-mask 200g' together constitute a segmentation set 258, illustrated jointly in Fig. 4E as a composition of the XOR-mask 246 (shown in Fig. 4B) and the AND-mask 200g' (shown in Fig. 4D) overlaid on image 200f' (shown in Fig. 4C). It should be noted that applying the method of the present invention to new images does not require the use of any views related to the view of the ground truth object segmentations.
- The ground truth object segmentations are only used during the learning stages of the first and second networks. Once the networks have been fully trained, they only use input from the new image to be analyzed. It should also be noted that the overlap segments (AND-regions) 212'..218' are expanded when creating the segmentation set 258 used for training, while the connected component 201 is not expanded and has the correct size.
- A fifth feature of the present invention is to train a supervised machine-learning method, such as a second network, to carry out the completion stage.
- The computational structure could be a fully convolutional neural network that, during the learning stages, is trained on a set of images including the image that shows the ground truth object segmentations.
- One task of the second network is to output the complete geometry of an object given a segmentation set, such as segmentation set 258, as input.
- The segmentation sets can be created from the XOR-mask and AND-mask according to the procedure described above. While training the network, the segmentation set and the view of the corresponding ground truth object segmentation are preferably used as input. Again, the ground truth object segmentations image is only used during the training stage to, preferably, train the first network to carry out the detection stage and the second network to carry out the completion stage.
- A sixth feature of the present invention is the creation of the input to the completion stage from the output of the detection stage, given only an image as input.
- Each connected component, such as the connected component 201, in the XOR-mask and the AND-mask output of the detection stage (typically provided by the trained fully convolutional first neural network) is, together with the input image, used to create a segmentation set (i.e., based on images 246, 200f', 200g'), displayed jointly as the segmentation set 258 in Fig. 4E. This set is supplied as input to the second network that performs the completion stage (typically a trained fully convolutional neural network), so that the final output (i.e., the resulting image) is a segmentation image 260, shown in Fig. 4F.
- One connected component can be used to produce one segmentation set, so there is, preferably, only one segmentation set for each connected component.
- The overlapping object segments 212..218 of the original input image 200f' (Fig. 4C) are interpreted by the device in the second network as being parts of the full geometry of the object 202 and are added to the connected component 201 to create the full geometry of object 202, as shown in the resulting image 260 (shown in Fig. 4F).
- The completion stage is preferably executed multiple times to complete all connected components in the XOR-mask created during the detection stage by the first network.
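A sketch of this per-component loop, assuming helpers along the lines of those sketched elsewhere in this description (padding_box, crop_segmentation_set, binarize) and the trained second network wrapped as a callable; all names are illustrative:

```python
from scipy import ndimage as ndi

# Label the XOR-mask once, then run the completion stage per component.
xor_labels, n_components = ndi.label(xor_mask)

full_objects = []
for k in range(1, n_components + 1):
    box = padding_box(xor_labels == k, alpha=3.0)     # hypothetical helper
    seg_set = crop_segmentation_set(image, xor_labels, and_mask, k, box)
    # completion_net: callable wrapping the trained second network, assumed
    # to accept the three same-sized crops (e.g., stacked as channels).
    belongingness = completion_net(seg_set)
    full_objects.append(binarize(belongingness))      # e.g., Otsu threshold
```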
- The process of completing five local connected components is illustrated in Figs. 5A-8F, and the combined final result with all five objects correctly outlined is shown in Fig. 9.
- A seventh feature of the present invention is that the completion stage only needs to be performed for objects that are overlapping. These can be identified directly from the connected components in the AND-mask, allowing for an efficient implementation and low computational cost. In the case of an image with objects with no overlaps, the AND-mask is completely black. The connected components of the XOR-mask derived from the output of the first network in the detection stage then correspond to the correctly outlined objects.
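This shortcut could be expressed as follows, again assuming boolean NumPy masks; the simplest case, an entirely empty AND-mask, is shown:

```python
from scipy import ndimage as ndi

xor_labels, n = ndi.label(xor_mask)
if not and_mask.any():
    # No overlaps anywhere: every XOR connected component is already a
    # complete, correctly outlined object.
    final_objects = [xor_labels == k for k in range(1, n + 1)]
else:
    ...  # run the completion stage for components affected by overlaps
```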
- An eighth feature of the present invention is that the XOR-mask can be used to evaluate the degenerate case where an object is completely covered by a larger object. Any covered object can be recovered by checking for holes in the connected component of the covering object in the XOR-mask.
- A correct outline of the covered object is then achieved by reversing the dilation (expansion), i.e., eroding (shrinking), the corresponding connected component in the AND-mask.
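A possible sketch of this recovery, assuming the erosion radius equals the dilation radius used when expanding the overlaps during training (an assumption, since the text does not fix a value); names are illustrative:

```python
from scipy import ndimage as ndi

def recover_covered_objects(component, and_mask, expand_radius):
    """Detect holes in a covering object's XOR connected component and
    undo the training-time expansion by erosion.

    component and and_mask are same-shaped boolean arrays; expand_radius
    is assumed to match the dilation used when expanding overlaps.
    """
    filled = ndi.binary_fill_holes(component)
    holes = filled & ~component          # candidate fully covered objects
    labels, n = ndi.label(holes & and_mask)
    return [ndi.binary_erosion(labels == k, iterations=expand_radius)
            for k in range(1, n + 1)]
```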
- Figs. 5A-5F show the same principle views as Figs. 4A-4F, but a bounding box 262 and padding box 264 encompass the second selected connected component 207 instead of the first selected connected component 201.
- Fig. 5A is a zoomed in part (image 266) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 262 and the padding box 264 for the connected component 207, highlighted in gray.
- Fig. 5B is a binary XOR-mask 268 of the second connected component 207, i.e., the second selected (marked in gray) connected component in Fig. 5A.
- Fig. 5C is a corresponding input image 270 of the padding box 264 region of image 200a for the second selected connected component 207 corresponding to object 208.
- Fig. 5D is the binary AND-mask 272 of the padding box 264 showing the expanded overlapping segment 216' and portions of the expanded segments 214' and 218'.
- Fig. 5E is a view of a second segmentation set 274 of the second selected connected component 207.
- Fig. 5F is a resulting image 276 of the second selected connected component 207 that clearly shows the entire object 208 with correct segmentation and outline.
- The resulting image 276 is the result of the completion step related to the second selected connected component 207.
- The corresponding detection and completion steps for the second selected connected component 207, corresponding to object 208, are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
- Figs. 6A-6F show the same principle views as Figs. 4A-4F and Figs. 5A-5F, but the bounding box 280 and padding box 282 encompass the third selected connected component 205 instead of the first selected connected component 201 or the second selected connected component 207.
- Fig. 6A is a zoomed in part (image 284) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 280 and the padding box 282 for the connected component 205, highlighted in gray.
- Fig. 6B is a binary XOR-mask 286 of the third connected component 205, i.e., the third selected connected component shown in gray in Fig. 6A.
- Fig. 6C is a corresponding input image 288 of the padding box 282 region of image 200f for the third selected connected component 205 corresponding to object 206.
- Fig. 6D is the binary AND-mask 290 of the padding box 282 showing the expanded overlapping segment 214' and portions of the expanded segment 216'.
- Fig. 6E is a view of a third segmentation set 292 of the third selected connected component 205.
- Fig. 6F is a resulting image 294 of the third selected connected component 205 that clearly shows the entire object 206 with correct segmentation and outline.
- The resulting image 294 is the result of the completion step related to the third selected connected component 205.
- The corresponding detection and completion steps for the third selected connected component 205 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
- Figs. 7A-7F show the same principle views as Figs. 4A-4F, 5A-5F and 6A-6F, but the bounding box 298 and padding box 300 encompass the fourth selected connected component 209 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 7A is a zoomed in part (image 302) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 298 and the padding box 300 for the connected component 209, highlighted in gray. Fig. 7B is a binary XOR-mask 304 of the fourth connected component 209, the fourth selected connected component shown in gray in Fig. 7A.
- Fig. 7C is a corresponding input image 306 of the padding box 300 region of image 200f for the fourth selected connected component 209 corresponding to object 210.
- Fig. 7D is a detailed binary AND-mask 308 of the padding box 300 showing the expanded overlapping segment 218'.
- Fig. 7E is a view of a fourth segmentation set 310 of the fourth selected connected component 209.
- Fig. 7F is a resulting image 312 of the fourth selected connected component 209 that clearly shows the entire object 210 with correct segmentation and outline.
- The resulting image 312 is the result of the completion step related to the fourth selected connected component 209 from the output of the detection stage.
- The corresponding detection and completion steps for the fourth selected connected component 209 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
- Figs. 8A-8F show the same principle views as Figs. 4A-7F, but the bounding box 320 and padding box 322 encompass the fifth selected connected component 203 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 8A is a zoomed in part (image 324) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 320 and the padding box 322 for the connected component 203, highlighted in gray. Fig. 8B is a binary XOR-mask 326 of the connected component 203, the fifth selected connected component shown in gray in Fig. 8A.
- Fig. 8C is a corresponding input image 328 of the padding box 322 region of image 200f for the fifth selected connected component 203 corresponding to object 204.
- Fig. 8D is the binary AND-mask 330 of the padding box 322 showing the expanded overlapping segment 212'.
- Fig. 8E is a view of a fifth segmentation set 332 of the fifth selected connected component 203.
- Fig. 8F is a resulting image 334 of the fifth selected connected component 203 that clearly shows the entire object 204 with correct segmentation and outline. The resulting image 334 is the result of the completion step related to the fifth selected connected component 203.
- Fig. 9 shows the final result image 336 that is a combination of image 260 (Fig. 4F), image 276 (Fig. 5F), image 294 (Fig. 6F), image 312 (Fig. 7F) and image 334 (Fig. 8F).
- The image 336 clearly shows and outlines the entire objects 202, 204, 206, 208 and 210 despite the overlapping regions described above. It should be noticed that image 336 shows the objects 202..210 as clearly as the ground truth image 200c, shown in Fig. 2C. This confirms that the detection and completion steps of the present invention produce the correct result.
- Figs. 10A-K illustrate all steps of the present invention at inference when used to segment and outline the objects of a new image 400, shown in Fig. 10A, that has not been analyzed before by any of the networks of the present invention.
- The original image 400 is fed as input to the first neural network of the detection stage, which has been trained to produce a grayscale (non-binary) XOR belongingness image 420 (shown in Fig. 10B) and a grayscale (non-binary) AND belongingness image 440 (shown in Fig. 10C).
- The belongingness images 420, 440 depict the pixels in white, gray or black, wherein the specific grayscale rendered indicates the probability of belonging to the XOR-mask and the AND-mask, respectively. The brighter the grayscale, the higher the probability of belonging. The shades of gray between white and black indicate probabilities between white (1) and black (0).
- The belongingness images 420, 440 are then each binarized, by e.g. the Otsu thresholding method or any other suitable method, to produce a binary XOR-mask 450 (shown in Fig. 10D) and a binary AND-mask 470 (shown in Fig. 10E), respectively.
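A minimal sketch of this binarization step, assuming the belongingness images are float arrays with values in [0, 1]; the function name is illustrative:

```python
import numpy as np
from skimage.filters import threshold_otsu

def binarize(belongingness: np.ndarray) -> np.ndarray:
    """Convert a grayscale belongingness image to a binary mask."""
    t = threshold_otsu(belongingness)  # automatic threshold selection
    return belongingness > t

# e.g., xor_mask = binarize(xor_belongingness)   # image 420 -> mask 450
#       and_mask = binarize(and_belongingness)   # image 440 -> mask 470
```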
- Fig. 10F shows an image 480 of the XOR-mask for a selected connected component 401 (highlighted in gray), together with its bounding box 482 and padding box 484.
- Fig. 10G shows the segmentation set 490 for the selected connected component 401, i.e., only the selected connected component 401 from the padding box 484 portion of the XOR-mask image 480, the corresponding part of the original image 400, and the overlapping segments of the AND-mask 470 disposed inside the padding box 484 of the selected connected component 401.
- Fig. 10H shows the resulting grayscale belongingness image 500 when the segmentation set is used as input to the second or completion neural network.
- The belongingness image 500 is thus the output from the second or completion network as a result of the analysis by the second network of the segmentation set 490.
- Fig. 10I shows the binarized belongingness image 510, based on view 500 in Fig. 10H, that has been binarized (by, for example, the Otsu thresholding method) and that constitutes the final object segmentation mask of the selected connected component 401.
- Fig. 10J shows an image 512 of the final object segmentation mask overlaid on its padding-box region of the original image 400.
- Fig. 10K shows the correct outlines of all objects detected by the method overlaid on the original image 400. As can be seen in Fig. 10K, each object is individually and correctly outlined despite the overlaps.
Abstract
The method is for training and automatically segmenting overlapping objects (102, 104) in images such as overlapping objects in images acquired with an imaging device such as a microscope. The overlapping objects are divided into non-overlapping connected components and overlapping segments. The method includes combinatorial set theory in a training scheme and at inference of a machine learning approach for automatic segmentation of overlapping objects (102, 104) imaged with an electron microscope.
Description
METHOD FOR MACHINE-LEARNING BASED TRAINING AND SEGMENTATION OF OVERLAPPING OBJECTS
Technical Field
The present invention relates to a method for training and automatically segmenting overlapping objects in images such as overlapping objects in images acquired with an imaging device such as a microscope. More particularly, the present invention includes combinatorial set theory in a training scheme and at inference of a machine learning approach for automatic segmentation of overlapping objects imaged with, for example, an electron microscope.
Background and Summary of the Invention
In the pharmaceutical industry, viruses, Virus-Like Particles (VLPs), liposomes and lipid nanoparticles (LNPs) are extensively used as carriers for drug and gene delivery. The assessment of the size, shape and content of these particles, that is, the type and number of therapeutic substances or genes, is of prime importance for drug development and quality control, as it is directly linked to the efficiency of the treatment as well as the stability (shelf life) and production costs.
Transmission electron microscopy (TEM) is a suitable and commonly used technique for characterizing these nano-sized particles. Different kinds of TEM are used for this purpose, of which cryogenic TEM (CryoTEM) and room-temperature TEM (rtTEM), often using negative stain (nsTEM) preparation techniques, are the most common. In CryoTEM, the sample is kept in its native hydrated form by instantly freezing it to a cryogenic temperature, to avoid the formation and effects of ice crystals, before imaging at cryogenic temperatures in the microscope. In rtTEM, the sample is kept at room temperature; for biological samples, this often means that a stain and/or some other kind of preservative agent must be added to embed the objects in order for important structures to be maintained and not destroyed by the drying process that occurs at room temperature. A negative stain is sometimes used because it has the effect of increasing the contrast in the sample so that certain types of details and sample constituent differences are enhanced.
Artificial Intelligence (AI) and machine learning, such as image-based deep learning, offer the ability to exclude the human's often poor capability of instructing the computer about what information to extract and how to measure and analyze it, by instead learning the relevant information from examples. A drawback is that information or "types" not represented in the examples are not incorporated into the deep-learning model. This type of information is, hence, not typically learnt by the machine/computer. Another problem is that the training models may also learn erroneous features when there are systematic errors in the training data.
Machine-learning techniques, more specifically deep learning and the computational structures known as convolutional neural networks, have lifted image processing and analysis to a whole new and different level. This is especially true for computer vision applications that are based on the enormous amounts of accessible natural-scene images available on the Internet. Biomedical applications, such as microscopy, face a different scenario with fewer accessible training images. Often, the purpose of the processing and analysis is slightly different and application specific.
In microscopy, a sample often contains multiple objects that should be detected, identified, and measured in different ways. The high number of different types of objects along with their measured characteristic features are in turn used for e.g., clinical diagnosis, disease pathway research and understanding, drug development and quality control. It is, thus, important that each object is segmented in its full size and shape so that the extracted measurements are not biased when objects are overlapping one another or when objects are partly or fully hidden behind other objects. Thanks to the shared weights of convolutional neural networks, these networks can very effectively learn to determine which pixels in an image belong to some type of object. A problem with this sharing of weights, however, is that when overlapping objects are present in the image, there is no obvious way of training a network to determine pixels that belong to the object type and simultaneously separate the segmentations of the individual overlapping objects.
The present invention provides an effective solution to the problem associated with overlapping objects. The method of the present invention relates to a sequence of smart set-theory combinatorial steps on object segments combined with machine-learning approaches. Using this simple, yet clever, combination allows for automatic, accurate and reliable training of such networks and, once the networks are trained, detection of the full object geometries of overlapping objects in complex images. An important aspect of the present invention is the idea and novel insight of splitting the segmentation task into two machine-learning tasks. The first task produces two mask images corresponding to non-overlapping and overlapping object segments, and the second task combines the masks into full objects with correct geometry. Splitting the segmentation into two steps also allows for the inclusion of a simple, yet effective, modification of the overlapping segments during the training stages that makes the segmentation less sensitive to incorrect predictions made by the machine-learning methods. The overall segmentation approach is thereby made more robust to handle complex images and images of varying appearances. More particularly, the method of the present invention relates to the steps listed below:
1) The training part of the machine-learning approach of the present invention which makes sure that individual objects are detected, and their full geometry can be segmented; and
2) The design of the machine-learning model of the present invention which performs automated segmentation of overlapping objects.
It should be noted that for both steps it is most important that overlapping objects are included and represented in order not to bias the final results and the subsequently extracted measurements. In order to detect overlapping objects correctly in new images when applying the method steps of the present invention, so-called "at inference," the handling of overlapping objects must have been included in the training step.
Although exemplified here as applied to TEM images of liposomes, the method of the present invention is applicable to other datatypes and contents, such as lipid nanoparticles (LNPs), gene therapy and viral biological vectors (i.e., the particles that contain or carry the genes), cellular organelles such as exosomes and ribosomes, and other gene/drug delivery particles as well as overlapping cells or subcellular organelles (such as nuclei) in various types of light and fluorescence microscopy images.
More particularly, the method of the present invention is for analyzing an image having overlapping objects. An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment. The first object comprising the first overlapping segment and a first non-overlapping connected component. The second object comprising the first overlapping segment and a second non-overlapping connected component. A first network or first computational structure receiving the input image from the imaging device. The first network or first computational structure calculating a first image containing only the first and second overlapping segments as a first output image and a second image containing only the first and second non-overlapping connected components as a second output image. A separate second network extracting and processing each non-overlapping connected component in the second output image together with the first output image and the input image. The second network or second computational structure receiving the first non-overlapping connected component of the second output image, the first output image with overlapping segments and the input image. The second network or second computational structure calculating a resulting image of the first object based on the first non-overlapping connected component, the overlapping segments and the input image. The first object comprising the first non-overlapping connected component and the first overlapping segment.
In an alternative embodiment, the method of the present invention is for creating a training set for machine learning, for determining parameters of a first computational structure or first network having an improved tolerance to uncertain predictions made by the first computational structure or network. An imaging device providing an input image having a first object overlapping a second object at a first overlapping segment. The first object comprising the first overlapping segment and a first non-overlapping connected component. The second object comprising the first overlapping segment and a second non-overlapping connected component. The first non-overlapping and the second non-overlapping connected component having a first gap defined therebetween. Enlarging the first gap to a second gap wherein the second gap is greater than the first gap. Providing ground truth object information for a first and a second object. Setting pixel values of pixels in a first target image representing overlapping segments to one when the pixels belong to more than one ground truth object or when the pixels are located within a predetermined distance extending from the overlapping segments, otherwise setting the pixel values to zero. Setting the pixel values of pixels in a second target image representing non-overlapping connected components to one when the pixels belong to only one ground truth object and when a corresponding position is zero in the first target image, otherwise setting the pixel values to zero.
In another embodiment, the method of the present invention further comprises the step of providing the first network or first computational structure with a training input image or training input images having corresponding non-overlapping connected components and ground truth object segmentations of overlapping objects and training the first network or first computational structure by updating parameters based on the training input image.
In yet another embodiment, the method of the present invention further comprises the steps of making the objects in a ground truth object segmentations image having segmentations in overlapping segments known to the first network or first computational structure in a second training scheme, and the first overlapping segment separating the first connected component from the second connected component so that the first connected component is distinctly separated from the second connected component.
In yet another embodiment, the method of the present invention further comprises the steps of a second network or second computational structure learning in the second training scheme to reconstruct the first object based on the first non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.
In another embodiment, the method of the present invention further comprises the steps of the second network or second computational structure learning in the training scheme to reconstruct the second object based on the second non-overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.
In yet an alternative embodiment, the method of the present invention further comprises the step of creating a padding box encompassing the first non-overlapping connected component.
In an alternative embodiment, the method of the present invention further comprises the step of preparing one padding box for each connected component in the input image.
In another embodiment, the method of the present invention further comprises the step of creating a segmentation set by combining a region of the input image corresponding to the padding box with the non-overlapping connected component and with an expanded overlapping segment.
Brief Description of Drawings
Fig. 1A is an original image showing overlapping objects of liposomes depicted in a transmission electron microscope;
Fig. 1B is an image of ground truth object segmentations overlaid on the original image shown in Fig. 1A;
Fig. 1C is an image of a total object mask of the present invention;
Fig. 1D is an image of expanded objects of the image in Fig. 1B of the present invention overlaid on the original image shown in Fig. 1A;
Fig. 1E is an image of a corresponding AND-mask of the present invention;
Fig. 1F is an image of a corresponding XOR-mask of the present invention;
Fig. 2A is an original image depicting overlapping objects to be analyzed by using the method of the present invention;
Fig. 2B is an image depicting overlapping objects which are incorrectly segmented and outlined as large undesirable conjoined objects;
Fig. 2C is an image of individually outlined objects with correct segmentations and the contours of each object are clearly shown (ground truth object segmentations);
Fig. 2D is an XOR-mask image of the present invention overlaid on the original image in Fig. 2A, depicting non-overlapping connected components of all the objects;
Fig. 3A is a binary XOR-mask image of the present invention including a bounding box and a padding box of one connected component highlighted in gray;
Fig. 3B is the original image shown in Fig. 2A including the padding box of the present invention;
Fig. 3C is a binary AND-mask image disposed inside the padding box of the present invention of the images shown in Figs. 3A-3B;
Fig. 4A is a zoomed in portion of the binary XOR-mask image shown in Fig. 3A including a bounding box and a padding box encompassing the first selected connected component (highlighted in gray) of the present invention;
Fig. 4B is a binary image of the first selected connected component in the XOR-mask image shown in Fig. 4A sized as the padding box of the present invention;
Fig. 4C is a detailed zoomed in view of the input image (shown in Fig. 3B) showing the region corresponding to the padding box for the first selected connected component of the present invention;
Fig. 4D is a detailed zoomed in view of the AND-mask image shown in Fig. 3C corresponding to the region of the padding box of the present invention;
Fig. 4E is a view of a segmentation set of the first selected connected component of the present invention;
Fig. 4F is a resulting image of the first correctly segmented object corresponding to the first selected connected component analyzed by using the method of the present invention;
Fig. 5A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the second selected connected component (highlighted in gray) of the present invention;
Fig. 5B is a binary image of the second selected connected component in the XOR-mask shown in Fig. 5A of the present invention;
Fig. 5C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the second selected connected component of the present invention;
Fig. 5D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the second selected connected component of the present invention;
Fig. 5E is a view of a segmentation set of the second selected connected component of the present invention;
Fig. 5F is a resulting image of the second correctly segmented object analyzed by using the method of the present invention;
Fig. 6A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the third selected connected component (highlighted in gray) of the present invention;
Fig. 6B is a binary image of the third selected connected component of the XOR-mask shown in Fig. 6A of the present invention;
Fig. 6C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the third selected connected component of the present invention;
Fig. 6D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the third selected connected component of the present invention;
Fig. 6E is a view of a segmentation set of the third selected connected component of the present invention;
Fig. 6F is a resulting image of the third correctly segmented object analyzed by using the method of the present invention;
Fig. 7A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the fourth selected connected component (highlighted in gray) of the present invention;
Fig. 7B is a binary image of the fourth selected connected component of the XOR-mask shown in Fig. 7A of the present invention;
Fig. 7C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fourth selected connected component of the present invention;
Fig. 7D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fourth selected connected component of the present invention;
Fig. 7E is a view of a segmentation set of the fourth selected connected component of the present invention;
Fig. 7F is a resulting image of the fourth correctly segmented object analyzed by using the method of the present invention;
Fig. 8A is a detailed zoomed in part of the binary XOR-mask having a bounding box and padding box encompassing the fifth selected connected component (highlighted in gray) of the present invention;
Fig. 8B is a binary image of the fifth selected connected component of the XOR-mask shown in Fig. 8A of the present invention;
Fig. 8C is a detailed zoomed in view of the input image showing the region corresponding to the padding box for the fifth selected connected component of the present invention;
Fig. 8D is a detailed zoomed in view of the AND-mask in Fig. 3C corresponding to the region of the padding box of the fifth selected connected component of the present invention;
Fig. 8E is a view of a segmentation set of the fifth selected connected component of the present invention;
Fig. 8F is a resulting image of the fifth correctly segmented object analyzed by using the method of the present invention;
Fig. 9 is a final resulting image of all the correctly segmented objects analyzed by using the method of the present invention;
Fig. 10A is an original image to be analyzed by the detection or first network of the present invention;
Fig. 10B is the resulting non-binary XOR belongingness image of the detection network of the present invention;
Fig. 10C is the resulting non-binary AND belongingness image of the detection network of the present invention;
Fig. 10D is the binary XOR mask of the present invention resulting from a binarization of the XOR belongingness image shown in Fig. 10B;
Fig. 10E is the binary AND mask of the present invention resulting from a binarization of the AND belongingness image shown in Fig. 10C;
Fig. 10F is a detailed zoomed in part of the binary XOR- mask having a bounding box and padding box encompassing a selected connected component (highlighted in gray) of the present invention;
Fig. 10G is a view of a segmentation set of the selected connected component of the present invention;
Fig. 10H is the resulting belongingness image of the segmentation set in Fig. 10G given as input and analyzed by the completion or second network of the present invention;
Fig. 10I is the resulting binary object mask of the completion stage of the present invention;
Fig. 10J is an image with the final result, the correctly segmented object, for the selected connected component analyzed by the completion stage of the method of the present invention overlaid on the original input image; and
Fig. 10K is a final result image of all the objects analyzed by using the method of the present invention overlaid on the original input image (shown in Fig. 10A).
Detailed Description
The method of the present invention is preferably used in conjunction with any suitable detection method that produces a class belongingness image, where each pixel value of each pixel in the image is interpreted as the certainty or probability of the corresponding pixel of the input image belonging to the considered class. For example, a pixel value close to 1 indicates an extremely high probability of the pixel belonging to the class, such as belonging to an object in the image that is being investigated. Similarly, a pixel value close to 0 indicates an extremely low probability of the pixel belonging to the class. Preferably, a computerized or computational device is used when training and applying the method of the present invention. More particularly, the device preferably has one or several computational structures or convolutional neural networks implemented that are first trained to carry out the detection and completion stages of the present invention. As indicated above, it should be understood that the principles or steps of the method of the present invention work with any suitable detection method used to detect objects in an image, such as an image depicted with a microscope. For instance, during the training stage, a suitable detection method may be used to detect and outline objects for the ground truth object segmentations image that is used for training, i.e., for determining the parameters of the computational structures or networks, as explained in detail below.
In the non-binary class belongingness image, the highest possible value is preferably interpreted by the computational device as the image pixel belonging to an object of the segmentation class with full certainty. As the values decrease, the certainty also decreases, such that the lowest possible value, 0, is interpreted by the device to mean that the image pixel definitely does not belong to an object of the class. Values are typically floating-point values ranging from 1 (highest) to 0 (lowest) but can also be of other types and in other ranges. It is, for example, not uncommon to have images represented by integers in the range 0-255.
The belongingness image can be converted to a binary segmentation mask by letting the computational device change each pixel value to either 1 or 0 according to a preferred procedure. The preferred procedure could, for instance, be a simple thresholding method where belongingness values above a certain value (i.e., the threshold) are converted to 1 and values equal to or below the threshold are converted to 0.
The binarization can also consider or combine multiple features or belongingness images according to pre-determined rules. Preferably, the binary segmentation mask is created in which all pixels are either 1 or 0.
The binary segmentation masks can be used to determine the connected components in the mask. For example, all the pixels in a connected component have the value 1 and form one connected, continuous area that is not split up or separated by pixels that have the value 0. Preferably, any two neighboring pixels of the same binary pixel value (either 1 or 0) belong to the same connected component. In this way, connected components correspond to an intuitive notion of isolated groups or "islands" of 1-valued class pixels surrounded by 0-valued background pixels. For clarity, the 1-valued class pixels could be depicted in white or in a certain pattern, while the 0-valued class pixels could be depicted in black or in another pattern than the one used to depict 1-valued class pixels.
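For illustration, connected-component labeling of such a binary mask is available in standard libraries; the following sketch uses SciPy's default 4-connectivity, which matches the "islands" intuition described above:

```python
import numpy as np
from scipy import ndimage as ndi

mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)

labels, num = ndi.label(mask)  # 4-connectivity by default
# labels == [[1, 1, 0, 0],
#            [0, 1, 0, 2],
#            [0, 0, 0, 2]]  and num == 2: two "islands" of 1-valued pixels
```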
Given a class ontology where individual objects are present in the input image, when the objects are all well separated (i.e., when there is no overlap between the objects), each complete object is represented by a connected component in the binary segmentation mask. However, when the objects are overlapping in the image, individual entire objects of a class cannot be distinguished from one another in the binary segmentation mask. Instead, overlapping objects are conjoined into a single connected component (that includes many objects) so that the geometry of the individual objects cannot unequivocally be determined from the single conjoined connected component. In other words, the relationship between objects and single conjoined connected components is then not 1-to-1. This means that the individual objects are not the same as each conjoined connected component shown in the image because the conjoined connected component includes two or more objects that are conjoined and not easily separable from each other.
However, a 1-to-l relationship between objects and connected components can be achieved by using the method of the present invention that takes special consideration of regions in the input image where the objects overlap.
Instead of producing a single class belongingness image corresponding to the certainty of the presence of the object, the method of the present invention uses two separate images. The first image corresponds to the certainty of pixels belonging to a non-overlapping region or segment of an object in the image. The second image corresponds to the certainty of pixels belonging to regions or segments of the objects where there is overlap with another object or objects. In other words, pixels located in an overlapping segment belong to more than one object. The two different images are henceforth called the belongingness XOR-image and the belongingness AND-image, respectively. The belongingness images that are produced as output by the detection of the first computational structure or network are not binary. Instead, the belongingness images produced by the first network indicate the probability of each pixel belonging to the class by varying the shade of the gray scale anywhere between white (100% probability) and black (0% probability). Preferably, the probabilities according to the detection or first network are based on how the detection network has been trained to set the probabilities. Their conversions to binary images may simply be called the XOR-mask and the AND-mask. This simple way of keeping track of overlapping object regions and using them in the deep learning framework (instead of just using one "object mask") is a core feature of the present invention. It allows for a successful final segmentation of full-geometry objects despite the objects being overlapped by other objects, while being computationally efficient.
The method of the present invention includes two separate consecutive stages, i.e., a detection stage (first stage) followed by a completion stage (second stage). However, as explained in detail below, before the device can carry out the detection and completion stages on a new image that contains overlapping objects, the networks in the device must first be trained to do the detection and completion stages. In the preferred embodiment, the detection and completion stages preferably include a combination of at least two machine-learning methods or networks.
In the detection stage, when the XOR- and AND-masks are created, the connected components of the XOR-mask map to individual objects of the input image in a 1-to-1 fashion. This means the pixels in the pixel positions located in each connected component of the XOR-mask belong to only one object. However, the connected components in the XOR-masks do not generally represent the full object geometries in the original input image. Often, a connected component only represents a portion of an object. This is because overlapping objects in the input image are represented by partial segmentations, i.e., the non-overlapping regions or segments, because the overlapping segments of the objects are interpreted by the networks of the device as belonging to the background of the image. For example, in a binary XOR-mask, the overlapping segments are interpreted by the device/networks as not belonging to the object but to the background. The full object geometries can be rectified in the later completion stage by using the AND-mask, as described in detail below.
The detection stage of the method of the present invention is preferably implemented in the computational device by using a supervised machine-learning method, such as a fully convolutional neural network, e.g., a U-Net, trained on the XOR-mask and the AND-mask as targets for an input image, wherein the input image is used by the trained U-Net to create the belongingness XOR-image and the belongingness AND-image. The target masks (i.e., the AND- and XOR-masks), used when training the supervised machine-learning method for the detection task, can be created so that predictions by the network (the belongingness XOR-image and the belongingness AND-image) that deviate from the exact desired target (as stipulated by the binary ground truth masks) are better tolerated.
A First Network Is Trained To Carry Out The Detection Stage

Geometrically correct segmentations (the ground truth object segmentations) of each individual object are preferably used together with the input image when the networks of the device are trained to create the XOR-mask and the AND-mask from the input image. The ground truth object segmentations may be created by using a suitable detection method in order to correctly outline or segment all individual objects in the image despite the objects overlapping one another. In other words, the position and shape of the outer edge of each object in the overlapping segment are known, so that the objects are correctly segmented despite any overlap between objects. The image and information of the ground truth segmentations (geometries) of all the objects are only used during the training of the networks to make sure the networks are correctly trained, since it is known what each object looks like in the ground truth object segmentations image despite the objects overlapping one another. During the training stage of the networks, target masks (AND- and XOR-masks) with an incorporated tolerance for errors are created according to the steps below:
1. A provisional XOR-mask is created based on the image of the ground truth object segmentations by setting pixel values in the mask to 0 when the pixel positions of the pixels are not located inside an object to be analyzed in the image or when the pixel positions of the pixels are part of or belong to two or more objects. Otherwise, the pixel values are set to 1. This means the pixels located inside exactly one object are set to 1, while the pixels located in the overlapping segments and the background are set to 0.

2. As a target when training the first network, an AND-mask is created based on the image of the ground truth object segmentations by setting the pixel values in this mask to 1 when the pixel positions of the pixels are located inside or in close proximity to (nearly inside) two or more objects. This makes the overlapping segments depicted in the AND-mask slightly larger than the overlapping segments depicted in the image of the ground truth object segmentations. Otherwise, the pixel values are set to 0. The proximity criterion can, for example, be being within a certain distance of any pixel that is inside two or more objects. This expansion of the overlapping segments used in the AND-mask is an important step of the method of the present invention during the training stage of the first and second networks, because it makes the method less sensitive to flaws or inaccuracies in the predictions made by the networks that are being trained. It should be noted that the expansion of the overlapping segments in the AND-mask is only done during the training stage, not when the fully trained network is creating AND-masks when analyzing a new unknown image (at inference). Instead of expanding the overlapping segments, it would also be possible to reduce the size of the connected components relative to the overlapping segments so that a gap is formed between adjacent connected components.

3. To serve as an additional or second target when training the network, an XOR-mask is created by setting the pixel values to 0 when the corresponding pixel in the provisional XOR-mask (from step 1) has a pixel value of 0 or when the corresponding pixel (same location) in the AND-mask has a pixel value of 1. The other pixels are set to 1. This means that a connected component for an overlapping object in the XOR-mask is slightly smaller than the corresponding connected component in the provisional XOR-mask.
In this way, the overlapping segments of the objects (depicted in the AND-mask) render slightly bigger areas of background disposed between connected components in the XOR-mask of step 3 than would be seen in the provisional XOR-mask of step 1. The expanded overlapping segments mean that the connected components in the XOR-masks are rendered slightly smaller than the connected components in the original image, because the overlapping segments in the original image have not been expanded. Crucially, a proper gap between two detected overlapping objects is guaranteed because the AND-mask has been expanded to also include "nearly inside" pixels. This gap ensures that the connected components in the XOR-mask are separated from one another, while each connected component only belongs to and/or depicts one object. During the training stage, the connected components corresponding to different objects are thus distinctly separated from one another by the gaps created by the expanded overlapping segments in the AND-mask. Preferably, the size of the gap corresponds to the distance in the proximity criterion of step 2. The gap between the connected components in the XOR-mask used as the target mask when training the network for the detection stage prevents object segmentations from becoming undesirably conjoined when the network prediction deviates slightly from a target result, as specified by the ground truth object segmentations. This means that the gaps between the connected components in the XOR-mask when training the network allow for some errors in the network prediction when the method is used to create the XOR-mask and the AND-mask on new unseen images (i.e., when using the trained network at inference), wherein the object segmentations are not known to the networks.
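Steps 1-3 above could be sketched as follows, assuming the ground truth is available as one boolean layer per object so that overlaps can be counted per pixel; the expansion radius is an illustrative value:

```python
import numpy as np
from scipy import ndimage as ndi

def make_training_targets(gt_objects, expand_radius=3):
    """Create the XOR- and AND-target masks from ground truth objects.

    gt_objects: (num_objects, H, W) boolean stack, one layer per ground
    truth object (overlaps allowed).
    """
    count = gt_objects.sum(axis=0)            # objects covering each pixel
    provisional_xor = count == 1              # step 1: inside exactly one object
    overlap = count >= 2                      # pixels inside two or more objects
    and_mask = ndi.binary_dilation(overlap, iterations=expand_radius)  # step 2
    xor_mask = provisional_xor & ~and_mask    # step 3: carve out the gaps
    return xor_mask, and_mask
```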
It should be understood that once the first or detection network has been trained by using a large number of images with objects and their corresponding ground truth masks, the trained first network is preferably used in the detection stage on a new input image with unknown object segmentations of overlapping objects. The image of the ground truth object segmentations is not used during the "real" detection stage, i.e., when the trained first network is using its knowledge and applying it to the detection stage of a new image. The predictions made by the first network in the detection stage are preferably interpreted as the belongingness XOR-image and the belongingness AND-image. The connected components and overlapping segments of the belongingness images preferably have pixels with different shades ranging from white to gray that indicate the likelihood of the pixels belonging or not. Figs. 10B-10C show examples of XOR and AND belongingness images, respectively. The belongingness images are preferably the output of the detection stage of the first network. The belongingness images can be converted to binary segmentation masks by, for example, applying a threshold value to the images, as mentioned above. The threshold value can be fixed (pre-determined and fixed in the computational device), manually set (given to the computational device by the user) or automatically determined. Commonly used and suitable automatic thresholding methods are, for example, the Otsu thresholding method and the minimum error thresholding method.
A Second Network Is Preferably Trained To Carry Out The Completion Stage
To complete the connected components of the XOR-mask (determined or created during the detection stage) in the completion stage, a different second or completion network preferably combines the connected components of the XOR-mask with the segments in the AND-mask and the input image. In this way, a collection of pixel regions that represent the full geometry of each of the objects in the input image is generated. For each connected component in the XOR-mask, a pixel region surrounding it is determined. The corresponding regions in the input image and in the AND-mask are then used together with the connected component to reconstruct the full geometry of the object.
The completion stage may be implemented by using a supervised machine-learning method such as a second fully convolutional neural network. The second or completion network could, for example, be a U-Net that is trained on full geometry objects (ground truth object segmentations) as prediction targets for input images in the form of a region around each connected component (representing a part of an object in a 1-1 fashion) in the XOR-mask, and the corresponding regions in the original image and the AND-mask. The second network is thus trained to carry out the completion stage. In other words, the first network is preferably trained to carry out the detection stage from which the XOR-mask and AND-mask are derived while the second network is trained to carry out the completion stage for each object. The AND-mask and XOR-mask of the above steps 2 and 3 in the training of the detection stage are preferably re-used as the training data when training the second network to carry out the completion stage of the present invention.
Given an object from the input image (i.e., a ground truth object segmentation), training data to train the second network to carry out the completion stage can be created by the following steps:
1. By using the same ground truth object segmentations as were used when training the first network for the detection stage, a surrounding rectangle, a padding box, is determined and created by the computational device with sides a factor alpha, e.g., alpha = 3, larger than the sides of the bounding box of the connected component in the input image. The padding box should be sized to ensure that the entire object to be detected and analyzed is included inside the padding box before being sent into or submitted to the second network. The padding box is positioned such that the center of the padding box aligns with the centroid or center of the connected component in consideration. The bounding box is the smallest box that can be fitted around the connected component, and it is automatically determined by the computational device by investigating the locations ((x, y) coordinates) of all pixels belonging to the connected component. The smallest and largest x- and y-values encountered correspond to the corner coordinates of the bounding box.

2. A target mask region is created by rendering the same object in the ground truth object segmentations image with 1-valued pixels in a 0-valued rectangle of the same dimensions as the padding box.

3. An XOR-mask region is created by cropping the XOR-mask used during the training of the first network of the detection stage with the padding box, so that the XOR-mask region has the same size as the padding box. Any pixel value of a pixel for which the corresponding pixel value in the target mask is 0 is set to 0.

4. An AND-mask region is created by cropping the AND-mask used during the training of the first network of the detection stage with the padding box, so that the AND-mask region and the padding box have the same size.

5. An input image region is created by cropping the input image from step 1 with the padding box, so that the input image region and the padding box have the same size.
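Step 1 above, the padding-box construction, might be sketched as follows; clipping the box to the image border is a practical detail assumed here rather than stated in the text:

```python
import numpy as np

def padding_box(component, alpha=3.0):
    """Return (row_slice, col_slice) of a box with sides alpha times the
    bounding box of the component, centered on the component's centroid."""
    ys, xs = np.nonzero(component)
    h = ys.max() - ys.min() + 1               # bounding-box height
    w = xs.max() - xs.min() + 1               # bounding-box width
    cy, cx = int(ys.mean()), int(xs.mean())   # centroid of the component
    half_h, half_w = int(alpha * h) // 2, int(alpha * w) // 2
    H, W = component.shape
    return (slice(max(0, cy - half_h), min(H, cy + half_h + 1)),
            slice(max(0, cx - half_w), min(W, cx + half_w + 1)))
```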
Training the second network to carry out the completion stage by using the same XOR-mask and AND-mask, as were used during the training of the first network to carry out the detection stage, enables the second network to produce full object geometries by taking as input the XOR-mask and the AND-mask produced during the detection stage.
In an inference scenario, when a trained network carries out the completion stage on a new image to complete objects in that image, the padding box for input creation cannot, of course, be constructed from the ground truth object geometry, which is the "a posteriori" result the network is meant to produce. The image of the ground truth object segmentations is only used as input during the training stage of the networks to learn the detection stage and the completion stage. When the completion stage is carried out by the trained network on a new image, the padding box is instead derived from the XOR-mask that is output by the detection stage for the new image. In other words, the XOR-mask (which includes a connected component) created during the detection stage of objects in the new image serves as the basis for creating the bounding box and the padding box for the objects in the new image analyzed during the completion stage.

The Trained Second Network Carries Out The Completion Stage Of A New Image By Using The Connected Component In The XOR-Mask From The Detection Stage As Input
Given a connected component, such as a connected component from the XOR-mask created during the detection stage, the input to the trained second network for the
completion stage of the new image for full object geometry prediction is created as follows:
1. A surrounding rectangle, the padding box, is created with sides a factor alpha (e.g., alpha = 3) larger than the bounding box of the selected connected component in the XOR-mask. The padding box is preferably positioned such that the center of the padding box aligns with the centroid of the connected component (example shown in Fig. 4A; see also the code sketch after this list).
2. An XOR-mask region is created by cropping the XOR-mask, created during the detection stage of the new input image, with the padding box. Any pixel that does not belong to the connected component is set to pixel value 0 (example shown in Fig. 4B).
3. An AND-mask region is created by cropping the AND-mask, created during the detection stage of the new input image, with the padding box (example shown in Fig. 4D). In other words, the same AND-mask is used as was created during the detection stage of the new input image.
4. An input image region is created by cropping the new input image with the padding box (example shown in Fig. 4C).
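A minimal sketch of this input creation, assuming the hypothetical padding_box and crop helpers from the earlier sketch and using scipy.ndimage.label to enumerate the connected components of the XOR-mask:

```python
import numpy as np
from scipy import ndimage

def segmentation_sets(image, xor_mask, and_mask, alpha=3.0):
    """Build one (XOR region, image region, AND region) triple per
    connected component in the XOR-mask, following steps 1-4 above.
    Assumes padding_box() and crop() from the earlier sketch."""
    labels, n = ndimage.label(xor_mask)
    sets = []
    for k in range(1, n + 1):
        component = labels == k                       # step 2: this component only
        box = padding_box(component, alpha)           # step 1
        sets.append((crop(component.astype(np.uint8), box),   # XOR-mask region
                     crop(image, box),                         # step 4: image region
                     crop(and_mask.astype(np.uint8), box)))    # step 3: AND-mask region
    return sets
```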
Fig. 4E shows the segmentation set used as input to the completion network. The prediction result of the second network used in the completion stage is interpreted as a belongingness image. Preferably, this belongingness image is converted to a segmentation mask (example shown in Fig. 4F), such that the segmentation is interpreted as representing the full geometry of the object in the cropped new input
image. The conversion from the belongingness image to the segmentation mask could, for example, be achieved by applying a threshold to the values. Any suitable thresholding method could be used such as the commonly used Otsu thresholding method (used in the illustrative examples in this application) or the minimum error thresholding method.
Another approach could be to apply the watershed segmentation method to a gradient magnitude image derived from the belongingness image. Yet another approach could be to use the thresholded belongingness mask as a starting guess for a deformable or active shape model applied to the underlying original image.
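For illustration, binarization with Otsu's method (the option used in the illustrative examples here) might look as follows; the function name belongingness_to_mask is a hypothetical label, not terminology from the specification:

```python
import numpy as np
from skimage.filters import threshold_otsu

def belongingness_to_mask(belongingness: np.ndarray) -> np.ndarray:
    """Binarize a belongingness image with Otsu's threshold.
    Minimum-error thresholding, or watershed applied to the gradient
    magnitude of the belongingness image, could be substituted."""
    return belongingness > threshold_otsu(belongingness)
```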
Figs. 10A-K (described in detail below) illustrate all the steps of the present invention at inference, that is, when the method is applied to a new image with overlapping objects.
Example
The first feature (training phase) of the present invention is the creation of an XOR-mask and an AND-mask from an image with prepared (ground truth) object segmentations that shows the correct segmentation of the objects even though some of the objects overlap one another. Preferably, the image of the ground truth object segmentations is used as the target image to train the first network. XOR may stand for "exclusive or," meaning the mask depicts or shows pixel positions that belong to one object or another object (but not both) in the image. AND may stand for "and," meaning that the mask depicts or shows pixel positions of overlapping segments that belong to at least two objects in the image, i.e., the pixel positions belong to one object and another object.
Fig. 1A shows an original input image 100 that contains overlapping or very close objects 102, 104, 106, 108, 110,
112, 114, 116 and 118. The objects may be liposomes depicted in a transmission electron microscopy (TEM) image. Fig. 1B is very similar to Fig. 1A but shows "ground truth" or ideal object segmentations overlaid on the original input image so that all the objects in the image can be clearly distinguished from one another despite the overlap. It should be understood that the image in Fig. 1B is merely used to illustrate that the outlines of each object are fully known by using another suitable detection method to detect and outline each object in the original input image. As explained in more detail below, the image of the ground truth object segmentations is merely created and used as training data when training the first network to carry out the detection stage and when training the second network to carry out the completion stage.
The method steps of the present invention can be used regardless of which method was used to find and identify the objects in the TEM image 100 in order to create the ground truth object segmentations image. As best shown in Fig. 1B, object 102 and object 104 overlap at an overlapping segment 120. Object 112 and object 106 overlap at a segment 126. Object 114 and object 106 overlap at a segment 128. Object 106 and object 102 overlap at a segment 130. Object 106 and object 104 overlap at a segment 132. Object 104 and object 116 overlap at a segment 134. Object 114 and object 116 overlap at a segment 138. Object 118 and object 116 overlap
at a segment 140. Object 110 is very close to object 102 and object 108 is very close to object 102 but there is no clear overlap between the objects.
As explained above, a mask may be used to emphasize, mark or render/display certain sections in an image. For better clarity, the masks are preferably, but not necessarily, binary. A binary AND-mask 144 and a binary XOR-mask 146 can be created in the device, as shown below:
1. A binary mask or total object mask 142 of all objects 102, 104, 106, 108, 110, 112, 114, 116 and 118 is first created, as shown in Fig. 1C. That is, all pixels belonging to any object 102..118 of image 100 in Figs. 1A and 1B are set to 1 (white color) in the total object mask 142. Fig. 1B shows image 100 overlaid with object overlapping regions as segments 120..140. All pixels not belonging to any object 102..118 in Fig. 1B are set to 0 (black color) in the mask 142. The black color thus shows the background of the image or mask 142.
2. Image 100' in Fig. 1D shows a new set of dilated overlapping segments 120', 122', 124', 126', 128', 130', 132', 134', 136', 138' and 140' that are created by dilating (growing or expanding) each object 102..118 in Fig. 1A by some amount, such as a specific distance or 10% or any other suitable percentage, to create expanded objects 102'..118' and corresponding expanded overlapping segments 120'..140' (best shown in Fig. 1D). The result is that, for example, overlapping segment 128', shown in Fig. 1D, has been expanded from overlapping segment 128 shown in Fig. 1B. Similarly, overlapping segment 130' has been expanded from overlapping segment 130 and so on.
3. The binary AND-mask 144, as shown in Fig. 1E, is created from the overlapping dilated or expanded segments 120'..140', shown in Fig. 1D, in which each pixel that belongs to only one dilated object 102'..118' in Fig. 1D is set to background, i.e., 0 or black color in Fig. 1E. All pixels in Fig. 1D that belong to two or more objects 102'..118' (AND operation) are set to foreground, i.e., 1 or white color in Fig. 1E. The result is that only the expanded overlapping segments 120'..140' are shown in white color in Fig. 1E and everything else is shown in black color.
4. The binary XOR-mask 146, as shown in Fig. 1F, is created from the set difference, i.e., by subtracting the AND-mask 144, shown in Fig. 1E, from the total object mask 142, shown in Fig. 1C. This means the expanded overlapping segments shown in white color in Fig. 1E are shown in black color in Fig. 1F and that the remaining connected component of each non-expanded object, shown in white in Fig. 1C, is shown as white in Fig. 1F. (A code sketch of these four steps follows.)
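A minimal sketch of steps 1-4, assuming each ground truth object is available as its own binary mask (overlaps permitted) and using a fixed pixel dilation to stand in for the "some amount" of expansion named in step 2; the function and parameter names are hypothetical:

```python
import numpy as np
from scipy import ndimage

def make_and_xor_masks(object_masks, dilate_px=3):
    """Steps 1-4 above: object_masks is a list of binary arrays, one per
    ground truth object. dilate_px is an assumed stand-in for the 'some
    amount' of dilation (the text also mentions e.g. 10% growth)."""
    total = np.zeros_like(object_masks[0], dtype=bool)
    count = np.zeros(object_masks[0].shape, dtype=np.int32)
    structure = ndimage.generate_binary_structure(2, 2)   # 8-connected element
    for m in object_masks:
        total |= m.astype(bool)                           # step 1: total object mask
        grown = ndimage.binary_dilation(m, structure, iterations=dilate_px)
        count += grown.astype(np.int32)                   # step 2: dilated objects
    and_mask = count >= 2                                 # step 3: in >= 2 dilated objects
    xor_mask = total & ~and_mask                          # step 4: set difference
    return and_mask, xor_mask
```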
One important feature or advantage of dilating or expanding the object segments 102..118 to create the overlapping object region segments 120'..140' prior to creating the AND-mask 144 is that it ensures a gap or margin between the connected components of the XOR-mask 146, so that no object or connected component (i.e., partial object) in the XOR-mask 146 touches or is in contact with another object or connected component. All the connected components are distinctly separated from one another, so that no white area in the XOR-mask 146 touches another white area. In this way, there is, for example, a gap 148 between the connected component associated with object 106 and the connected component associated with object 114, as shown in Fig. 1F. It should be noted that there would be no gap 148, or only a 1-pixel wide one, if, for example, the non-expanded segment 128, instead of the expanded segment 128', were used to depict the overlap between object 114 and object 106. Similarly, there is another gap 150 between the connected component associated with object 106 and the connected component associated with object 112, and a gap 152 between the connected component associated with object 116 and the connected component associated with object 104. As a result of the expansion, each white area (connected component) in the XOR-mask 146 is distinctly outlined and is associated with, or represents, only one object. Some of the gaps between the connected components of the objects are represented by the expanded overlapping regions or segments. Because the connected components are not in contact with one another, the machine-learning method of the present invention, described in detail below, is very effective and robust and tolerates some prediction errors of the networks while maintaining the accuracy of the detection and completion stages, both during the learning stages and when the trained networks are applied to new input images.
A second important feature of the present invention is the step of training a fully convolutional neural network, in the computational device or distributed devices, on a set of images that display ground truth object segmentations. In this way, the network, such as the first or detection network, is first trained to create the basis (belongingness images) for the corresponding AND-mask and XOR-mask as output based on the image of the ground truth object segmentations. More particularly, the first network creates an AND-belongingness image and an XOR-belongingness image prior to binarizing them to the AND-mask and XOR-mask, respectively. The idea is thus to train the first network to be able to carry out the same steps of creating the corresponding AND-mask and XOR-mask based on a new image as input where the correct object segmentation is not known. In other words, the trained network applies the same principles to the new image as the network learned, during the training of the detection stage, to apply to the image of the ground truth object segmentations. A central concept of the present invention is thus to first train the system on known images that include separate and overlapping objects in order to be able to apply it to unknown/new images that also include separate and overlapping objects. It should be understood that the networks embody the machine-learning methods of the present invention. The networks are, preferably, computational structures in the family of machine-learning methods. In other words, the networks are thus first trained on known examples of images to learn a method that they then can apply to unknown/new examples of images.
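As an illustration only, one training step for such a detection network might look as follows in PyTorch; the two-channel model, the tensor shapes, and the use of binary cross-entropy are assumptions for the sketch, not prescriptions of the specification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def detection_training_step(model: nn.Module,
                            optimizer: torch.optim.Optimizer,
                            image: torch.Tensor,       # (B, 1, H, W) input image
                            xor_target: torch.Tensor,  # (B, 1, H, W) XOR-mask, float 0/1
                            and_target: torch.Tensor   # (B, 1, H, W) AND-mask, float 0/1
                            ) -> float:
    # One head per mask: channel 0 predicts XOR belongingness,
    # channel 1 predicts AND belongingness.
    logits = model(image)                               # (B, 2, H, W)
    targets = torch.cat([xor_target, and_target], dim=1)
    loss = F.binary_cross_entropy_with_logits(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```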
Fig. 2A shows an original input image 200a that includes
overlapping objects. Normally, the image 200a with overlapping objects 202, 204, 206, 208, and 210 cannot be segmented correctly right away with a fully convolutional neural network which has been trained to outline complete objects that have the full and correct geometry. Although only objects 202..210 have been marked in Fig. 2A the principles that apply to objects 202..210 also apply to all the other objects in image 200a and other images. Objects 202..210 merely serve as examples to illustrate the steps and principles of the detection and completion stages of the present invention.
Fig. 2B shows the same view as in Fig. 2A overlaid with an incorrectly segmented image 200b of overlapping objects 202..210. The objects are identical to those in image 200a, but image 200b illustrates, as an overlay, that the objects are incorrectly outlined as large, undesirable conjoined objects in which the individual objects are difficult to distinguish from one another due to the overlapping regions. This undesired and incorrect segmentation, corresponding to the typical result from a fully convolutional neural network trained to outline complete objects (the full and correct geometry) in one step, makes the overlapping objects 202, 204, 206, 208 and 210 look like one large, conjoined object 199. As explained in detail below, the two-step approach of the method of the present invention is a critical feature that enables the networks to reliably create correct segmentations of overlapping objects in an image.
Fig. 2C depicts the same view as shown in Fig. 2A
overlaid with a ground truth object segmentations image 200c of individually outlined objects 202..210 with correct outlines of the objects. This means the entire objects 202..210 are individually outlined with correct segmentation despite being overlapped by another object. More particularly, all objects 202..210 are individually outlined and overlapping regions 212, 214, 216, and 218 are correctly acknowledged as belonging to at least two objects. It is to be understood that any suitable detection method may be used to correctly detect and outline the objects in order to create the ground truth object segmentations image 200c shown in Fig. 2C used when training the networks of the present invention.
Fig. 2D shows the same view as in Fig. 2A overlaid with outlines of the connected components of an XOR image 200d of overlapping objects 202..210 with incorrect outlining but individual detection of each connected component 201..209 separated by expanded overlapping segments 212', 214', 216' and 218'. It should be noted that the images 200a, 200b,
200c and 200d are based on the same view of objects.
A feature of the present invention, as a first step towards correct segmentations, is the creation of an XOR-mask such as the one whose outlines are shown overlaid on the original image in Fig. 2D (image 200d) as an output of the detection stage of the present invention. The XOR-mask has incorrect outlining of the objects (because the mask excludes the expanded or dilated overlapping object regions 212'..218' and depicts them in black as background) but illustrates the individual detection of the connected components 201..209 that are associated with the corresponding objects 202..210, respectively. The XOR-mask depicted in image 200d overlaid on the original image is an example showing that the first network has learned not to include the overlapping segments 212'..218'. More particularly, the XOR-mask in image 200d thus treats the expanded or dilated overlapping segments 212'..218', as explained in connection with Figs. 1A-1F, as belonging to the background 247 of the XOR-mask in image 200d. It should also be noted that there is no overlap of segments in the XOR-mask shown in image 200d and that each connected component 201..209 (i.e., the integral remaining portion of each object) corresponding to each object 202..210 is distinct and separated from the connected components of the other objects. The creation of the XOR-mask, shown overlaid on the original image in Fig. 2D, used when training the detection network, is preferably done by first creating the total object mask (example shown in Fig. 1C) and then deducting the corresponding AND-mask, as explained in detail in connection with Figs. 1A-1F.
A third important feature of the present invention, which relates to the completion stage, is the creation and use of the padding box 252, best shown in Fig. 3A, which marks a region encompassing one connected component, such as the first connected component 201 corresponding to object 202, together with some of its surroundings, in one or more same-sized images. The padding box 252 is preferably constructed by enlarging (by some amount) the bounding box 254, which is the smallest box that surrounds or encompasses the selected connected component 201 in the XOR-mask image 200e. The XOR-mask image 200e is the same mask as the one displayed overlaid on the original image in Fig. 2D. It shows the connected component
201 (marked in a gray shade) of the object 202. The bounding box 254 is placed and sized to have the minimum size that encompasses the connected component 201 (corresponding to object 202) in the XOR mask image 200e. Preferably, the padding box 252 should be sized (i.e., be large enough) and placed (centered) so that it encompasses all overlapping regions belonging to the object to be reconstructed in the completion stage. In practice, such as when the objects depicted correspond to gene carriers, it is often important that each object, that is to be analyzed in a later stage, is correctly outlined, despite being overlapped by another object, to ensure that the entire object is then being analyzed. This is one reason for constructing one padding box for each connected component (that is associated with an object).
More particularly, Fig. 3A depicts the XOR-mask image 200e, which shows the first connected component 201 of the selected first object 202 and its bounding box 254 and padding box 252. As mentioned earlier, the bounding box 254 is preferably the smallest box that encompasses the first connected component 201 of the first selected object 202. The larger size of the padding box 252 compared to the bounding box 254 ensures that the entire full geometry of object 202 to be detected and analyzed is located inside the padding box 252. The bounding box 254 is thus associated with an object such as object 202. The XOR-mask image 200e, shown in Fig. 3A, is based on the original image 200a, shown in Fig. 2A. It should be understood that the bounding box 254 and padding box 252 are merely marked regions in the image to be analyzed.
Fig. 3B shows the padding box 252 overlaid on the original input image 200f (which is virtually identical to image 200a), placed in the same area as the padding box 252 shown on the XOR-mask image 200e. Fig. 3C shows a corresponding AND-mask 200g with the padding box 252 marked.
A fourth feature of the completion stage of the present invention is the creation of segmentation sets or masks i.e., the padding box regions are copied from the full-size images 200e, 200f, 200g and depicted as images 200e', 200f', 200g' that have the same size as the padding box 252. More particularly, given the original input image 200f (shown in Fig. 3B) and its corresponding XOR-mask image 200e (shown in Fig. 3A) and AND-mask 200g (shown in Fig. 3C), a segmentation set is created for each connected component, such as connected component 201 in the XOR-mask 200e, by the following steps:
1. The padding box 252 of the connected component 201 corresponding to object 202 is constructed.
2. Three empty images, with the same size as the padding box 252, are created.
3. The connected component 201 in padding box region 252 of the XOR mask image 200e' (shown in Fig. 4A) is drawn (or rather copied by the device) onto XOR mask 246 (shown in Fig. 4B) such that any pixel that belongs to the connected component 201 of object 202 is drawn white while any other region is drawn black. This image thus depicts the binary XOR-mask 246 of the connected component 201 that is sized according to the padding box 252.
4. The content of the padding box 252 in image 200f (shown in Fig. 3B) is depicted in image 200f', as shown in Fig. 4C.
5. The content of the padding box 252 of the AND mask 200g (shown in Fig. 3C) is depicted as AND-mask 200g', as shown in Fig. 4D.
The image 200e' in Fig. 4A is a zoomed in part of the XOR mask image 200e (Fig. 3A). Just as Fig. 3A, Fig. 4A displays the XOR mask image 200e' zoomed in on the bounding box 254 and padding box 252 for the connected component 201 which is highlighted in gray. The zoomed in image 200f' in Fig. 4C is the same as the padding box 252 in image 200f shown in Fig. 3B. The AND mask 200g' depicted in Fig. 4D is also shown inside the padding box 252 shown in Fig. 3C.
After this procedure, the XOR mask 246, the image 200f', and the AND mask 200g' together constitute a segmentation set 258, illustrated jointly in Fig. 4E, as a composition of the XOR mask 246 (shown in Fig. 4B) and the AND mask 200g' (shown in Fig. 4D), overlaid on image 200f' (shown in Fig. 4C). It should be noted that applying the method of the present invention to new images does not require the use of any views related to the view of the ground truth object segmentations.
As mentioned above, the ground truth object segmentations are only used during the learning stages of the first and second networks. Once the networks have been fully trained, they only use input from the new image to be analyzed. It should also be noted that the overlap segments (AND-regions) 212'..218' are expanded when creating the segmentation set 258 used for training, while the connected
component 201 is not expanded and has the correct size.
A fifth feature of the present invention is to train a supervised machine-learning method, such as the second network, to carry out the completion stage. For example, the computational structure could be a fully convolutional neural network that, during the learning stages, is trained on a set of images including the image that shows the ground truth object segmentations. One task of the second network is to output the complete geometry of an object given a segmentation set, such as segmentation set 258, as input.
The segmentation sets can be created from the XOR-mask and AND-mask according to the procedure described above. While training the network, the segmentation set is preferably used as input and the view of the corresponding ground truth object segmentation as the prediction target. Again, the ground truth object segmentations image is only used during the training stage to, preferably, train the first network to carry out the detection stage and the second network to carry out the completion stage.
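A corresponding hypothetical training step for the completion network, again assuming PyTorch, a stacked three-channel segmentation set as input, and binary cross-entropy against the ground truth full-geometry object:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def completion_training_step(model: nn.Module,
                             optimizer: torch.optim.Optimizer,
                             xor_region: torch.Tensor,   # (B, 1, h, w) connected component only
                             img_region: torch.Tensor,   # (B, 1, h, w) input image crop
                             and_region: torch.Tensor,   # (B, 1, h, w) overlap segments crop
                             target: torch.Tensor        # (B, 1, h, w) full-geometry object
                             ) -> float:
    x = torch.cat([xor_region, img_region, and_region], dim=1)  # 3-channel segmentation set
    logits = model(x)                                           # (B, 1, h, w) belongingness
    loss = F.binary_cross_entropy_with_logits(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```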
A sixth feature of the present invention is the creation of the input to the completion stage from the output of the detection stage, given only an image as input. Each connected component, such as the connected component 201, in the XOR-mask and AND-mask output of the detection stage (typically provided by the trained fully convolutional first network) is, together with the input image, used to create a segmentation set (i.e., based on images 246, 200f', 200g'), displayed jointly as the segmentation set 258 in Fig. 4E. The segmentation set is supplied as input to the second network that performs the completion stage (typically a trained fully convolutional neural network), so that the final output (i.e., the resulting image) is a segmentation image 260, shown in Fig. 4F, which correctly outlines the object 202. One connected component can be used to produce one segmentation set, so there is, preferably, only one segmentation set for each connected component. The overlapping object segments 212..218 of the original input image 200f' (Fig. 4C) are interpreted by the second network as being parts of the full geometry of the object 202 and are added to the connected component 201 to create the full geometry of object 202, as shown in the resulting image 260 (shown in Fig. 4F).
The completion stage is preferably executed multiple times to complete all connected components in the XOR-mask created during the detection stage by the first network. The process of completing five local connected components is illustrated in Figs. 5A-8F, and the combined final result with all five objects correctly outlined is shown in Fig. 9.
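This per-component loop might be sketched as follows, reusing the earlier hypothetical helpers; completion_net stands in for the trained second network and is assumed to map a segmentation set to a full-object mask of padding-box size:

```python
import numpy as np
from scipy import ndimage

def complete_all(image, xor_mask, and_mask, completion_net, alpha=3.0):
    """Run the completion stage once per connected component in the
    XOR-mask and collect one (padding box, full-object mask) pair per
    object. Assumes padding_box() and crop() from the earlier sketch."""
    labels, n = ndimage.label(xor_mask)
    results = []
    for k in range(1, n + 1):
        component = labels == k
        box = padding_box(component, alpha)
        seg_set = (crop(component.astype(np.uint8), box),   # XOR region
                   crop(image, box),                        # image region
                   crop(and_mask.astype(np.uint8), box))    # AND region
        results.append((box, completion_net(*seg_set)))     # one mask per object
    return results
```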
A seventh feature of the present invention is that the completion stage only needs to be performed for objects that are overlapping. These can be directly identified from the connected components in the AND-mask, allowing for efficient implementation and low computational cost. In the case of an image with objects with no overlaps, the AND-mask is completely black. The connected components of the XOR-mask derived from the output of the first network in the detection stage then correspond to the correctly outlined objects.
When the components in the AND-mask are sparse and far apart, computational optimization can be achieved by performing the completion stage only on the XOR-mask connected components in close proximity to the connected components in the AND-mask.
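One possible implementation of this proximity filter, with margin as an assumed tuning parameter for what counts as "near":

```python
import numpy as np
from scipy import ndimage

def components_needing_completion(xor_mask, and_mask, margin=5):
    """Keep only XOR-mask components that come within `margin` pixels of
    some AND-mask component; isolated components are already complete
    objects and can skip the completion stage."""
    near_and = ndimage.binary_dilation(and_mask, iterations=margin)
    labels, n = ndimage.label(xor_mask)
    needs = [k for k in range(1, n + 1) if np.any(near_and[labels == k])]
    return labels, needs
```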
An eighth feature of the present invention is that the XOR-mask can be used to evaluate the degenerate case where an object is completely covered by a larger object. Any covered object can be recovered by checking for holes in the connected component of the covering object in the XOR-mask.
A correct outline of the covered object is then achieved by reversing the dilation (expansion), i.e., by eroding (shrinking) the corresponding connected component in the AND-mask.
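A sketch of this recovery, under the assumption that a hole in the covering object's XOR component coincides with the dilated AND-mask region of the covered object and that dilate_px matches the expansion used when the masks were created:

```python
import numpy as np
from scipy import ndimage

def recover_covered_object(xor_component, dilate_px=3):
    """If the covering object's XOR component contains a hole, the hole
    marks a fully covered object; eroding it reverses the dilation."""
    filled = ndimage.binary_fill_holes(xor_component)
    hole = filled & ~xor_component      # region of the covered object's AND component
    if not hole.any():
        return None                     # no fully covered object here
    return ndimage.binary_erosion(hole, iterations=dilate_px)
```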
Figs. 5A-5F show the same principle views as Figs. 4A-4F, but a bounding box 262 and padding box 264 encompass the second selected connected component 207 instead of the first selected connected component 201. The steps that have been described in connection with the connected component 201, as shown in Figs. 4A-4F, also apply to all the other connected components in the image, such as connected components 203, 205, 207, and 209. More particularly, Fig.
5A is a zoomed-in part (image 266) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 262 and the padding box 264 for the connected component 207 highlighted in gray. Fig. 5B is a binary XOR-mask 268 of the second selected connected component 207, i.e., the connected component marked in gray in Fig. 5A. Fig. 5C is a corresponding input image 270 of the padding box 264 region of image 200a for the second selected connected component 207 corresponding to object 208. Fig. 5D is the binary AND-mask 272 of the padding box 264 showing the expanded overlapping segment 216' and portions of the expanded segments 214' and 218'. Fig. 5E is a view of a second segmentation set 274 of the second selected connected component 207. Fig. 5F is a resulting image 276 of the second selected connected component 207 that clearly shows the entire object 208 with correct segmentation and outline. The resulting image 276 is the result of the completion step related to the second selected connected component 207. The corresponding detection steps and completion steps for the second selected connected component 207 corresponding to object 208 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
Figs. 6A-6F show the same principle views as Figs. 4A-4F and Figs. 5A-5F, but the bounding box 280 and padding box 282 encompass the third selected connected component 205 instead of the first selected connected component 201 or the second selected connected component 207. More particularly, Fig. 6A is a zoomed-in part (image 284) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 280 and the padding box 282 for the connected component 205 highlighted in gray. Fig. 6B is a binary XOR-mask 286 of the third connected component 205, i.e., the third selected connected component shown in gray in Fig. 6A. Fig. 6C is a corresponding input image 288 of the padding box 282 region of image 200f for the third selected connected component 205 corresponding to object 206. Fig. 6D is the binary AND-mask 290 of the padding box 282 showing the expanded overlapping segment 214' and portions of the expanded segment 216'. Fig. 6E is a view of a third segmentation set 292 of the third selected connected component 205. Fig. 6F is a resulting image 294 of the third selected connected component 205 that clearly shows the entire object 206 with correct segmentation and outline. The resulting image 294 is the result of the completion step related to the third selected connected component 205. The corresponding detection steps and completion steps for the third selected connected component 205 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
Figs. 7A-7F show the same principle views as Figs. 4A-4F, 5A-5F and 6A-6F, but the bounding box 298 and padding box 300 encompass the fourth selected connected component 209 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 7A is a zoomed-in part (image 302) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 298 and the padding box 300 for the connected component 209 highlighted in gray. Fig. 7B is a binary XOR-mask 304 of the fourth selected connected component 209, shown in gray in Fig. 7A. Fig. 7C is a corresponding input image 306 of the padding box 300 region of image 200f for the fourth selected connected component 209 corresponding to object 210. Fig. 7D is a detailed binary AND-mask 308 of the padding box 300 showing the expanded overlapping segment 218'. Fig. 7E is a view of a fourth segmentation set 310 of the fourth selected connected component 209. Fig. 7F is a resulting image 312 of the fourth selected connected component 209 that clearly shows the entire object 210 with correct segmentation and outline. The resulting image 312 is the result of the completion step related to the fourth selected connected component 209 from the output of the detection stage. The corresponding detection steps and completion steps for the fourth selected connected component 209 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
Figs. 8A-8F show the same principle views as Figs. 4A-7F, but the bounding box 320 and padding box 322 encompass the fifth selected connected component 203 instead of the earlier selected connected components, as mentioned above. More particularly, Fig. 8A is a zoomed-in part (image 324) of the binary XOR-mask 200e (shown in Fig. 3A) showing the bounding box 320 and the padding box 322 for the connected component 203 highlighted in gray. Fig. 8B is a binary XOR-mask 326 of the fifth selected connected component 203, shown in gray in Fig. 8A. Fig. 8C is a corresponding input image 328 of the padding box 322 region of image 200f for the fifth selected connected component 203 corresponding to object 204. Fig. 8D is the binary AND-mask 330 of the padding box 322 showing the expanded overlapping segment 212'. Fig. 8E is a view of a fifth segmentation set 332 of the fifth selected connected component 203. Fig. 8F is a resulting image 334 of the fifth selected connected component 203 that clearly shows the entire object 204 with correct segmentation and outline. The resulting image 334 is the result of the completion step related to the fifth selected connected component 203. The corresponding detection steps and completion steps for the fifth selected connected component 203 are identical to those described in detail above regarding the first selected connected component 201 and shown in Figs. 4A-4F.
Fig. 9 shows the final result image 336, which is a combination of image 260 (Fig. 4F), image 276 (Fig. 5F), image 294 (Fig. 6F), image 312 (Fig. 7F) and image 334 (Fig. 8F). The image 336 clearly shows and outlines the entire objects 202, 204, 206, 208 and 210 despite the overlapping regions described above. It should be noted that image 336 shows the objects 202..210 as clearly as the ground truth image 200c, shown in Fig. 2C. This confirms that the detection and completion steps of the present invention produce the correct result.
Figs. 10A-K illustrate all steps of the present invention at inference when used to segment and outline the objects of a new image 400, shown in Fig. 10A, that has not been analyzed before by any of the networks of the present invention. The original image 400 is fed as input to the first neural network of the detection stage which has been trained to produce a grayscale (non-binary) XOR belongingness image 420 (shown in Fig. 10B) and a grayscale (non-binary)
AND belongingness image 440 (shown in Fig. 10C) as the results of the detection network. The belongingness images 420, 440 depict the pixels in white, gray, or black, wherein the specific grayscale rendered indicates the probability of belonging to the XOR-mask and AND-mask, respectively. The brighter the grayscale, the higher the probability of belonging. The shades of gray between white and black indicate probabilities between white (1) and black (0). The belongingness images 420, 440 are then each binarized by, e.g., the Otsu thresholding method or any other suitable method, to produce a binary XOR-mask 450 (shown in Fig. 10D) and a binary AND-mask 470 (shown in Fig. 10E), respectively.
Fig. 10F shows an image 480 of the XOR-mask for a selected connected component 401 (highlighted in gray), together with its bounding box 482 and padding box 484. Fig. 10G shows the segmentation set 490 for the selected connected component 401, i.e., only the selected connected component 401 from the padding box 484 portion of the XOR-mask image 480, the corresponding part of the original image 400, and the overlapping segments of the AND-mask 470 disposed inside the padding box 484 of the selected connected component 401. Preferably, there is only one padding box per selected connected component. Fig. 10H shows the resulting grayscale belongingness image 500 when using the segmentation set as input to the second or completion neural network. The belongingness image 500 is thus the output from the second or completion network as a result of the analysis by the second network of the segmentation set 490. Fig. 10I shows the binarized belongingness image 510, produced from image 500 in Fig. 10H by binarization (by, for example, the Otsu thresholding method), which is the final object segmentation mask of the selected connected component 401. Fig. 10J shows an image 512 of the final object segmentation mask overlaid on its padding box region of the original image 400. Fig. 10K shows the correct outlines of all objects detected by the method overlaid on the original image 400. As can be seen in Fig. 10K, the overlapping segments are correctly segmented and clearly shown so that the outlines of the objects 404, 406, 408, 410 can be seen despite overlapping object 402. Similarly, the outline of object 402 can be clearly seen despite the object 402 overlapping or being overlapped by objects 404, 406, 408 and 410.
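The whole inference pipeline of Figs. 10A-K can then be summarized in a few lines, with detection_net and completion_net as stand-ins for the trained first and second networks and the earlier hypothetical helpers in scope:

```python
def segment_overlapping_objects(image, detection_net, completion_net):
    """End-to-end inference sketch: detection_net is assumed to return
    the XOR and AND belongingness images for the input image."""
    xor_belong, and_belong = detection_net(image)        # Figs. 10B and 10C
    xor_mask = belongingness_to_mask(xor_belong)         # Fig. 10D
    and_mask = belongingness_to_mask(and_belong)         # Fig. 10E
    return complete_all(image, xor_mask, and_mask,
                        completion_net)                  # Figs. 10F-10K: one mask per object
```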
An alternative way of creating the gap between connected components in the present invention is to erode or shrink the ground truth objects (used during the training phase) or the connected components (at inference) prior to subtracting the unexpanded overlapping regions when creating the XOR-mask, i.e., the opposite of what is described above, wherein the connected components (XOR-mask) are kept fixed while the overlapping regions are expanded when creating the training data. Everything else is preferably kept the same.

While the present invention has been described in accordance with preferred compositions and embodiments, it is to be understood that certain substitutions and alterations may be made thereto without departing from the spirit and scope of the following claims.
Claims
1. A method of analyzing an image having overlapping objects, comprising: an imaging device providing an input image (100) having a first object (102) overlapping a second object (104) at a first overlapping segment (120), the first object (102) comprising the first overlapping segment (120) and a first non-overlapping connected component, the second object (104) comprising the first overlapping segment (120) and a second non-overlapping connected component; a first network or first computational structure receiving the input image (100) from the imaging device, the first network or first computational structure calculating a first image containing only the first overlapping segment (120) as a first output image and a second image containing only the first and second non-overlapping connected components as a second output image; a separate second network or second computational structure extracting and processing each non-overlapping connected component in the second output image together with the first output image and input image (100); the second network or second computational structure
receiving the first non-overlapping connected component of the second output image, the first output image with overlapping segments and the input image (100); and the second network calculating a resulting image of the first object (102) based on the first non-overlapping connected component and the overlapping segments (120) and the input image (100).
2. A method of creating a training set for machine learning, for adjusting parameters of a first computational structure or first network having an improved tolerance to uncertain predictions made by the first computational structure or first network, comprising: an imaging device providing an input image (100) having a first object (102) overlapping a second object (104) at a first overlapping segment (120), the first object (102) comprising the first overlapping segment (120) and a first non-overlapping connected component, the second object (104) comprising the first overlapping segment (120) and a second non-overlapping connected component, the first non-overlapping and the second non-overlapping connected component having a first gap (148) defined therebetween; enlarging the first gap (148) to a second gap wherein the
second gap is greater than the first gap (148); providing ground truth object information for a first and a second object; setting pixel values of pixels in a first target image representing overlapping segments to one when the pixels belong to more than one ground truth object or when the pixels are located within a predetermined distance extending from the overlapping segments, otherwise setting the pixel values to zero; and setting the pixel values of pixels in a second target image representing non-overlapping connected components to one when the pixels belong to only one ground truth object and when a corresponding position is zero in the first target image, otherwise setting the pixel values to zero.
3. The method of claim 1 wherein the method further comprises the step of providing the first network or first computational structure with a training input image having corresponding non-overlapping connected components and ground truth object segmentations of overlapping objects and training the first network or first computational structure by updating parameters based on the training input image.
4. The method of claim 3 wherein the method further comprises the steps of making the objects in a ground truth object segmentations image having segmentations in overlapping segments known to the first network or first computational structure in a training scheme, and the first overlapping segment separating the first connected component from the second connected component so that the first connected component is distinctly separated from the second connected component.
5. The method according to claim 4 wherein the method further comprises the steps of a second network or second computational structure learning in a second training scheme to reconstruct the first object (102) based on the first non- overlapping connected component, the expanded first overlapping segment and the ground truth object segmentations image.
6. The method according to claim 5 wherein the method further comprises the steps of the second network or second computational structure learning in the second training scheme to reconstruct the second object (104) based on the second non-overlapping connected component, the expanded
first overlapping segment and the ground truth object segmentations image.
7. The method according to claim 1 wherein the method further comprises the step of creating a padding box (252) encompassing the first non-overlapping connected component.
8. The method according to claim 7 wherein the method further comprises the step of preparing one padding box for each connected component in the input image (100).
9. The method according to claim 7 wherein the method further comprises the step of creating a segmentation set by combining a region of the input image (100) corresponding to the padding box (252) with the non-overlapping connected component and with an expanded overlapping segment.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| US202163219582P | 2021-07-08 | 2021-07-08 | |
| US63/219,582 | 2021-07-08 | | |
Publications (3)

| Publication Number | Publication Date |
| --- | --- |
| WO2023283411A2 (en) | 2023-01-12 |
| WO2023283411A3 (en) | 2023-02-16 |
| WO2023283411A4 (en) | 2023-05-04 |
Family

ID=84800972

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/US2022/036470 (WO2023283411A2) | Method for machine-learning based training and segmentation of overlapping objects | 2021-07-08 | 2022-07-08 |

Country Status (1)

| Country | Link |
| --- | --- |
| WO (1) | WO2023283411A2 (en) |
Family Cites Families (4)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| GB2398379A | 2003-02-11 | 2004-08-18 | Qinetiq Ltd | Automated digital image analysis |
| US9292933B2 | 2011-01-10 | 2016-03-22 | Anant Madabhushi | Method and apparatus for shape based deformable segmentation of multiple overlapping objects |
| EP3798968B1 | 2018-06-06 | 2024-07-03 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image processing method and device, electronic device, computer apparatus, and storage medium |
| US11210554B2 | 2019-03-21 | 2021-12-28 | Illumina, Inc. | Artificial intelligence-based generation of sequencing metadata |

2022-07-08: PCT application PCT/US2022/036470 (WO2023283411A2) filed; active Application Filing.
Also Published As

| Publication number | Publication date |
| --- | --- |
| WO2023283411A4 (en) | 2023-05-04 |
| WO2023283411A3 (en) | 2023-02-16 |
Legal Events

| Code | Title | Description |
| --- | --- | --- |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22838460; Country of ref document: EP; Kind code of ref document: A2 |
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22838460; Country of ref document: EP; Kind code of ref document: A2 |