WO2023091180A1 - System and method for automated microscope image acquisition and 3d analysis - Google Patents

System and method for automated microscope image acquisition and 3d analysis

Info

Publication number
WO2023091180A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
cnn
microscope
sample
processor
Prior art date
Application number
PCT/US2022/023171
Other languages
French (fr)
Inventor
Lucas Martin GAGO
Martin Blasco
Horacio Claudio Acerbo
Original Assignee
Mycofood Us Llc
Priority date
2021-11-19 (filing date of U.S. Provisional Application No. 63/281,394)
Filing date
Publication date
Application filed by Mycofood Us Llc filed Critical Mycofood Us Llc
Publication of WO2023091180A1 publication Critical patent/WO2023091180A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/698 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure relates to light microscopy, and more specifically to identifying and extracting relevant information from images.
  • Well plates are trays with multiple wells that can be used as small test tubes. When working with a well plate (which regularly contains 96 samples), scientists must deal with the depth of the well, and the depth of the sample within the well, which can result in some parts of the microorganism being out of focus or occluded. Therefore, it becomes necessary to take images focused on different horizontal planes of the sample to extract as much information as possible. This multiplies the number of images that the scientist must take and evaluate, and renders the process infeasible if done manually.
  • Such methods belong to an area of bioinformatics called “bioimage informatics,” which aims to extract and compare the biological knowledge of an image.
  • An example method for performing the concepts disclosed herein can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
  • An example system for performing the concepts disclosed herein can include: at least one processor; a microscope; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations which include: for a sample under the microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via the at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
  • A non-transitory computer-readable storage medium having instructions stored which, when executed by at least one processor, cause the at least one processor to perform operations which can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via the at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
  • FIG. 1 illustrates an example process pipeline of the system
  • FIG. 2 illustrates an example visualization of occluded objects due to the depth of field of a microscope
  • FIG. 3 illustrates a representation of the phenomenon by which we can visualize occluded objects
  • FIG. 4 illustrates example results of a semantic segmentation model
  • FIG. 5 illustrates an example 2D projection of the 3D representation
  • FIG. 6 illustrates an exemplary method embodiment
  • FIG. 7 illustrates an example computer system.
  • an automated pipeline that captures images with a synchronized system of a well plate positioner and a digital camera.
  • the automation of the data gathering opens the possibility of capturing images at certain heights in a reproducible manner.
  • the system can extract morphological information from the 3D structure of the samples.
  • the system simultaneously solves a dense classification and a focus selection problem, which allows the samples to be analyzed as two-dimensional (2D) sections, then joins those two-dimensional sections to form a three-dimensional structure using a three-dimensional (3D) convolutional neural network (CNN).
  • the system can modify the identification code such that future instances where the same structure is detected result in a “Y” prediction, rather than an “X” prediction.
  • the system can utilize a reinforcement learning algorithm, where each time a prediction is confirmed the “score” for a prediction is incremented, while each time a prediction is modified or identified as incorrect the score is decreased. If a score decreases to a predetermined threshold, the system can change the identification code or weighting which resulted in the erroneous classification.
  • the system can also have scoring capacity for individual substructures and modify when/how those substructures are identified based on the feedback from scientists.
  • systems configured as disclosed herein can identify portions of images identified as “regions of interest,” to be reviewed and/or tagged by scientists.
  • the system can then incorporate the feedback from the scientists in identifying future substructures and/or microorganisms. If, for example, a region is flagged by the system as needing review, and the scientists identifies it as belonging to an already known substructure or microorganism, the system can add the additional 2D views and 3D structure to a database, such that the system can identify future instances of the region in the same manner as prescribed by the scientist.
  • the system can provide an automatic pipeline which classifies different classes of microorganisms and microscopic objects, while simultaneously discriminating which objects are in focus, and can do so without metadata or any extra domain knowledge associated with images. With the focus-segmentation results, the system can extract relevant variables about the morphology of microorganism(s) within the well, providing helpful information for the scientists, such as the shape of the organism, their size, and their number.
  • the system can operate by first automating data collection from the samples. To begin, each sample, containing for example an organism, is placed in the corresponding well of a well plate, and the system periodically takes images of the samples to analyze the growth of the microorganism. The objective is to repeat the data collection process as often as possible to have more data points to understand the microorganism development. To this end, the system can use an automatic process, improving the analysis and reducing human-induced errors.
  • the hardware for automated data collection can include:
  • A microscope system with a digital camera; and
  • A well plate positioner with a stepper motor.
  • the well plate positioner allows the system to move the experiment plate while the stepper motor controls the focus knob of the microscope. Both items can also be controlled by the same software controlling the digital camera, thereby synchronizing the plate’s movement and the image capturing.
  • the well plate positioner moves the plate to the target well so that it lies below the microscope objective
  • the stepper motor moves the focusing knob of the microscope to the lowest vertical position
  • the stepper motor moves the focusing knob by a small amount
  • Steps 3 and 4 are repeated until the highest vertical position is reached.
  • Repeat step 1, changing the target well to be the next one in the list of filled wells.
  • multiple photographs/images of the sample are taken, ranging between the lowest vertical position of the microscope to the highest vertical position of the microscope, for each sample/well within the well plate.
  • the images taken can be converted into a video to make the visualization and transferring of files more accessible.
  • the videos can be saved locally, in the computer where the software is running, in a storage server, and/or in the cloud.
  • the well plate positioner may need to be calibrated with the position of the corners of the plate to calculate the coordinates of each of the filled wells. Such calibration may be done by calculating the vertical and horizontal distances between these positions and dividing them by the number of wells in each row or column.
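  • For illustration, a minimal sketch of this calibration arithmetic (the corner positions, units, and plate dimensions below are assumptions; the disclosure does not specify a positioner API):

```python
import numpy as np

def well_coordinates(top_left, top_right, bottom_left, rows=8, cols=12):
    """Interpolate stage coordinates for each well of a rows x cols plate
    (e.g., 8 x 12 = 96 wells) from three calibrated corner positions."""
    top_left, top_right, bottom_left = map(np.asarray, (top_left, top_right, bottom_left))
    col_step = (top_right - top_left) / (cols - 1)    # horizontal well-to-well distance
    row_step = (bottom_left - top_left) / (rows - 1)  # vertical well-to-well distance
    return {(r, c): tuple(top_left + r * row_step + c * col_step)
            for r in range(rows) for c in range(cols)}

# Corner positions measured once during calibration (hypothetical values, in mm).
wells = well_coordinates((0.0, 0.0), (99.0, 0.0), (0.0, 63.0))
```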
  • Semantic segmentation and focus selection, from which the system creates a mask from the images to classify each item that appears within the images. A mask is created for each image taken of a well, resulting in a series of masks for each well of the plate.
  • the system aims to classify each pixel on the image and to select the regions in focus.
  • An exemplary system uses a CNN based on a U-Net (a convolutional neural network that was developed for biomedical image segmentation) with, for example, EfficientNet (a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient) B0 (the baseline network) as a lightweight feature extractor.
  • the CNN can, for example, be pre-trained on ImageNet, a large visual database designed for use in visual object recognition software, though the use of other or different databases is also possible.
  • the system has replaced the down-sampling component of the classical U-Net with an EfficientNet B0, while the bottleneck and the up-sampling maintain the original U-Net architecture.
  • the skip connections are sent from the first, second, third, and fifth blocks of the EfficientNet B0, while the output is connected to the bottleneck part of the U-Net.
  • the U-Net presents skip connections between the down-sampling and the up-sampling paths, improving the quality of the segmentation mask by providing local information to the encoded global information in the up-sampling process.
  • the network can have three parts: the down-sampling, the bottleneck, and the up-sampling.
  • the down-sampling can use four blocks containing 3 x 3 convolutional layers with batch normalization, followed by 2 x 2 max-pooling layers.
  • a skip connection can be sent to the symmetric up-sampling module.
  • the bottleneck can be built from two convolutional layers with batch normalization, and dropout to reduce the overfitting.
  • the up-sampling path can include four blocks, consisting of transposed convolutions with stride 2, a concatenation with the corresponding feature map from the down-sampling (skip connection), and 3x3 convolutional layers with batch normalization.
  • the system can normalize the input, use batch normalization layers, and deploy regularization during training.
  • the dataset used can be split into training, validation, and test sets, and the best performing model in the validation set can be tested against the test set to avoid over-fitting on selection.
  • Aggressive data augmentation can also be used to avoid over-fitting.
  • For each pair of training image and mask, Algorithm 1 can be performed each time the pair passes through the model.
  • data augmentation operations can include but are not limited to:
  • stage 2 data augmentation operations can include but are not limited to:
  • the learning rate can be dynamically reduced to help the training process converge to an optimum rate.
  • the model can be deployed.
  • the system can use the dice coefficient (DC), defined as DC = 2TP / (2TP + FP + FN), where TP, FP, and FN stand for true positives, false positives, and false negatives.
  • the present system extracts relevant patches from some of the images. These patches can be determined by randomly sampling images and classifying them into ‘relevant’ and ‘not relevant.’
  • Such a model can be trained with very little training data, using an EfficientNet B0 as a lightweight feature extractor, pre-trained on ImageNet, and training only a 64 unit fully connected layer with a binary sigmoid activation layer at the end.
  • the system can select the same number of images from the lower quartile of the focus distribution and use them as full black masks, as nothing in them is in focus.
  • Tagging an image consists of drawing a mask on top of said image, in which the tagging scientist identifies the structures appearing in it. These structures can be the microorganism itself, image artifacts, or crystal formations, among others. Next, each of these classes is tagged by the scientist painting it in a different color. Finally, the scientist compares each image with the ones immediately above and below to determine whether the item of interest is in focus. With this small database produced by the experts, the system can create a much larger one by introducing variations on the images and combining said variations on the masks. This allows the system to increase the amount of data with which the segmentation model is trained.
  • the team of experts can analyze the masks to determine if they are correctly tagged or not. If there is an error, the expert will tag that image again, and the result will be added to the training database. This iterative process will periodically improve the segmentation results up to the point where the masks produced by the system and the experts are equivalent.
  • the masks obtained in the automatic analysis using focus-segmentation usually present a problem: a lack of resolution in z prevents sufficient connectivity between elements in different focuses. Even though focus stacking can be a suitable method for some applications of microscope imaging, in this case, it is not robust enough to create an accurate 3D reconstruction, as it sometimes creates image artifacts that make it unusable. To solve this issue, the system creates another neural network that takes both a series of masks and the original images as input and creates the 3D reconstruction of those images.
  • the system starts by using a three-dimensional section of the samples and the masks created by the focus-segmentation network.
  • the size of these images should be relatively small due to computational limitations.
  • the system can then use a CNN that filters errors and improves the connectivity between the results of different horizontal planes, as dictated by the 2D images.
  • This additional CNN can, for example, have five convolution layers followed by a fully connected layer.
  • the output of this model is a 3D voxel reconstruction, represented by a 3D matrix of zeros and ones, where a one represents the presence of the microorganism at that 3D point.
  • the system can also project the 3D reconstruction onto a 2D image for easier visualization.
  • the system can identify the number of different microorganisms in a particular well. This is easily identifiable in the 3D reconstruction, as the system only needs to identify/detect the number of individual connected components.
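  • As a sketch, counting connected components in the binary voxel matrix is nearly a one-liner with SciPy (the 26-connectivity structuring element is an assumption; the disclosure does not fix a connectivity rule):

```python
import numpy as np
from scipy import ndimage

# voxels: the 3D matrix of zeros and ones produced by the reconstruction step.
voxels = np.random.rand(64, 64, 16) > 0.95        # stand-in data for illustration
structure = np.ones((3, 3, 3), dtype=int)         # 26-connectivity (assumption)
labels, n_organisms = ndimage.label(voxels, structure=structure)
print(f"{n_organisms} connected components (candidate individual organisms)")
```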
  • the variables are more dependent on the physical nature of the microorganism, such as the radius if the microorganism is spherical, the median length if the microorganism has a rod or spiral shape, the total area covered if the sample contains colonies, etc.
  • Special considerations can be taken when the microorganism being studied has a filamentous morphology, which happens with some microscopic fungi and bacteria. In this case, many relevant parameters can be extracted, but the system must first parameterize the 3D reconstruction in a mathematical structure. To this end, the system can transform the obtained mask into a mathematical graph.
  • a weighted graph or a network, is a graph in which a number (a weight) is assigned to each edge.
  • weights might represent, for instance, costs, lengths, or capacities, depending on the problem at hand.
  • the nodes of the microorganism graph representation created by the system will be the ramifications of the network, while the edges will be the connections, and the weights the lengths of those connections.
  • the system preprocesses the 3D representation with a method called skeletonization. Once the system has the skeleton of each frame of the 3D representation, the system can create a 3D matrix with it and use an algorithm to detect the nodes and edges of the graph that represents said matrix.
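  • One possible realization of this skeleton-to-graph step, sketched with scikit-image and the skan library (both are illustrative choices, not named in this disclosure):

```python
import numpy as np
from skimage.morphology import skeletonize   # accepts 3D arrays in recent releases
from skan import Skeleton, summarize

# voxels: 3D binary matrix (a one marks the presence of the microorganism).
voxels = np.random.rand(64, 64, 16) > 0.97   # stand-in data for illustration
skel = skeletonize(voxels)                   # thin the mask to a 1-voxel skeleton

graph = Skeleton(skel)        # nodes: ramifications/endpoints; edges: connections
branches = summarize(graph)   # DataFrame with one row (and a path length) per edge
print(branches.head())
```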
  • the system can calculate many relevant growth variables (see the sketch after this list):
  • Edges: The total number of connections. In some microorganisms, such as filamentous fungi, these are the connections between ramifications.
  • Node degree: The number of links attached to a node, often used as a measure of the connectedness of a network, especially as a frequency distribution.
  • Spatial extent: The area covered by the microorganism, calculated as the area of the convex hull of the node positions in space, or by segmentation of the colony outline.
  • Node density: The number of junctions per unit area of space covered by the microorganism; a measure of the branching/fusing density.
  • Total length: The total length of links in the network, calculated by summing the lengths of all links in the network.
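  • A sketch of how these variables could be computed from a weighted graph, using networkx and SciPy as illustrative tools (a 2D projection of node positions is assumed for the convex hull; all nodes are counted as junctions here as a simplification):

```python
import networkx as nx
import numpy as np
from scipy.spatial import ConvexHull

def growth_variables(g: nx.Graph, pos: dict) -> dict:
    """g: weighted graph of the skeleton (edge weight = connection length).
    pos: node -> (x, y) position in the 2D projection (assumption)."""
    pts = np.array([pos[n] for n in g.nodes])
    extent = ConvexHull(pts).volume     # for 2D points, .volume is the enclosed area
    return {
        "edges": g.number_of_edges(),                   # total number of connections
        "degree_distribution": nx.degree_histogram(g),  # connectedness measure
        "spatial_extent": extent,                       # convex hull of node positions
        "node_density": g.number_of_nodes() / extent,   # junctions per unit area
        "total_length": g.size(weight="weight"),        # sum of all link lengths
    }

g = nx.Graph([(0, 1, {"weight": 5.0}), (1, 2, {"weight": 3.0}), (1, 3, {"weight": 4.0})])
print(growth_variables(g, {0: (0, 0), 1: (5, 0), 2: (8, 1), 3: (5, 4)}))
```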
  • the data extraction process can give the researcher very valuable information that can be used to analyze the growth of the microorganism under different conditions. For example, it allows the researcher to obtain the growth rates of the microorganism or its maximum growth under certain conditions.
  • FIG. 1 illustrates an example process pipeline of the system, representing the complete pipeline 102 from raw images at different heights to 3D graphs and 2D projections.
  • the input is the raw images 106 at different heights, and the outputs are 3D graphs 116 and 2D projections 118.
  • the system automatically captures images 104 on many horizontal planes of the sample 106. These images are then processed into a mask 110 with the semantic segmentation 108 and focus selection CNN 112, as described above.
  • the CNN 112 is trained with data labeled by an expert, with patches of images suggested by another network. For example, the system can identify specific patches of the images and provide suggestions 120 for what those patches contain.
  • An expert can then annotate 124 the patches, accepting the suggested description or updating the patch with a new description. These annotations 124 can then be used to train, or retrain, the models 126.
  • With the masks 110 of the different frames and the CNN 112, the system creates a final model to filter the results and create a 3D representation 114, which can include the 2D projection 118 and/or the 3D graph 116.
  • the system can include a correction tool for users 122, where the users viewing the images 128 can provide feedback which is then used as a further suggestion 120, reviewed by experts 124, and ultimately incorporated into model training 126.
  • Exemplary innovative qualities of the developed technologies include: 1) Joining the automatic data acquisition with image analysis to obtain data about microorganism growth.
  • Currently, scientists cannot relate quantitative data describing the growth of microorganisms to the image data obtained with classical techniques, due to the enormous amount of time involved in processing all the data; and 2) Visualization of occluded objects due to the depth of field:
  • optical microscopy allows detection of details partially occluded by other microorganisms by selecting different foci, as shown in Figure 3. Using this effect, the system can obtain a faithful reproduction of the entire content of a 3D container rather than a projection showing only the 'surface' of the container.
  • 3) Obtaining faithful images from a set of partially focused images.
  • FIG. 2 shows an example of the visualization of these occluded objects.
  • FIG. 2 illustrates an example visualization of occluded objects due to the depth of field of a microscope.
  • the microorganism featured is a filamentous fungus (Fusarium venenatum). Shown are three frames 202, 204, 206 of the same sample at different heights, with the left frame 202 being the lowest point and the right frame 206 the highest.
  • In the first frame 202, there is an in-focus vertical hypha and other structures that are out of focus.
  • In the last frame 206, there is a horizontal hypha that is no longer occluded by the vertical hypha above it, as the latter is now out of focus.
  • FIG. 3 illustrates a representation of the phenomenon by which occluded objects are visualized.
  • When the microscope camera is focused on the far end of the plate, light rays from the right point 302 that strike any part of the front of the lens 306 are refracted to a point on the camera's 308 sensor.
  • light rays from the left point 304 that strike different points on the front of the lens 306 are being focused to a point well behind the sensor plane 308.
  • the light from the left point 304 of the plate is being spread out over a blur circle as it strikes the sensor.
  • FIG. 4 illustrates example results of a semantic segmentation model.
  • the column to the left represents the original images
  • the column in the center is the mask labeled by the domain experts (ground truth)
  • the images to the right are the masks predicted by the system.
  • the system solves the semantic segmentation problem and the focus selection problem simultaneously.
  • The advantage this solution provides over traditional techniques is that these two problems can be processed in a single step, taking into account the information of both, which opens a unique alternative for data generation. This also provides a 3D reconstruction platform that obtains more information than traditional techniques in a shorter time.
  • the system also trains with a low amount of data via database enhancement as discussed above, and using data augmentation techniques such as adjustments in brightness, contrast, color enhancement, saturation, crops, flips, rotations, translations, etc.
  • This training is complemented by the feedback loop with the scientists, which can significantly reduce the number of iterations involved in this process.
  • FIG. 5 illustrates an example 2D projection of the 3D representation, created by filtering the semantic segmentation results with a CNN model configured as disclosed herein.
  • the system uses a CNN which filters errors and improves the connectivity between the results on different horizontal planes.
  • This CNN can have five convolution layers followed by a fully connected layer.
  • the output of this model is a 3D voxel reconstruction, represented by a 3D matrix of zeros and ones, where a one represents the presence of the microorganism at that 3D point.
  • the system can also project the 3D reconstruction onto a 2D image for easier visualization, which is what is seen in Figure 5.
  • FIG. 6 illustrates an exemplary method embodiment.
  • a method for practicing the concepts disclosed herein can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample (602); processing, via at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images (604); constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure (606); processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
  • the system can move the sample up and down.
  • the system can move the well plate vertically, rather than the microscope lens.
  • both the sample and the microscope can change positions.
  • the setting of the microscope, the capturing of the plurality of images, the processing of the plurality of images using the first CNN, the constructing of the 3D structure, the processing of the 3D structure, and the outputting of the at least one candidate microorganism occur for each sample within a plurality of samples.
  • the plurality of samples can be located within a well plate.
  • the illustrated method can further include modifying at least one of the first CNN and the second CNN based on feedback regarding the at least one candidate microorganism.
  • the modifying of the at least one of the first CNN and the second CNN can include retraining at least one of the first CNN and the second CNN.
  • the modifying of the at least one of the first CNN and the second CNN can include modifying a weight associated with a portion of at least one of the first CNN and the second CNN.
  • the stepper motor raises the microscope a predetermined amount for each image capture, the predetermined amount being a sub-portion of a total distance between the starting location and the end location, such that the plurality of images comprises at least three images.
  • at least one of the at least three images can be out of focus.
  • Another exemplary method can be performed by a system which identifies a respective set of data features for each of a plurality of images of microorganisms and inorganic structures, the respective set of data features comprising at least one of a physiological feature, an appearance, or a focus.
  • the system can then execute a computer-implemented feature extraction process on the respective set of data features for each of the plurality of microorganisms and inorganic structures, thereby determining or obtaining a compact representation of the respective set of data features for each microorganism within the plurality of microorganisms.
  • an exemplary system includes a general-purpose computing device 700, including a processing unit (CPU or processor) 720 and a system bus 710 that couples various system components including the system memory 730 such as read-only memory (ROM) 740 and random access memory (RAM) 750 to the processor 720.
  • the system 700 can include a cache of high-speed memory connected directly with, in close proximity to, or integrated as part of the processor 720.
  • the system 700 copies data from the memory 730 and/or the storage device 760 to the cache for quick access by the processor 720. In this way, the cache provides a performance boost that avoids processor 720 delays while waiting for data.
  • These and other modules can control or be configured to control the processor 720 to perform various actions.
  • the memory 730 can include multiple different types of memory with different performance characteristics. It can be appreciated that the disclosure may operate on a computing device 700 with more than one processor 720 or on a group or cluster of computing devices networked together to provide greater processing capability.
  • the processor 720 can include any general purpose processor and a hardware module or software module, such as module 1 762, module 2 764, and module 3 766 stored in storage device 760, configured to control the processor 720 as well as a special-purpose processor where software instructions are incorporated into the actual processor design.
  • the processor 720 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc.
  • a multi-core processor may be symmetric or asymmetric.
  • the system bus 710 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • a basic input/output system (BIOS) stored in ROM 740 or the like may provide the basic routine that helps to transfer information between elements within the computing device 700, such as during start-up.
  • the computing device 700 further includes storage devices 760 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like.
  • the storage device 760 can include software modules 762, 764, 766 for controlling the processor 720. Other hardware or software modules are contemplated.
  • the storage device 760 is connected to the system bus 710 by a drive interface.
  • the drives and the associated computer-readable storage media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computing device 700.
  • a hardware module that performs a particular function includes the software component stored in a tangible computer-readable storage medium in connection with the necessary hardware components, such as the processor 720, bus 710, display 770, and so forth, to carry out the function.
  • the system can use a processor and computer-readable storage medium to store instructions which, when executed by the processor, cause the processor to perform a method or other specific actions.
  • the basic components and appropriate variations are contemplated depending on the type of device, such as whether the device 700 is a small, handheld computing device, a desktop computer, or a computer server.
  • tangible computer-readable storage media, computer-readable storage devices, or computer-readable memory devices expressly exclude media such as transitory waves, energy, carrier signals, electromagnetic waves, and signals per se.
  • an input device 790 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, speech, and so forth.
  • An output device 770 can also be one or more of a number of output mechanisms known to those of skill in the art.
  • multimodal systems enable a user to provide multiple types of input to communicate with the computing device 700.
  • the communications interface 780 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Image Processing (AREA)

Abstract

Systems, methods, and computer-readable storage media for identifying and extracting relevant information from images. The system sets the microscope at a starting location, then raises or lowers the microscope while capturing images of a sample. The system then uses a first Convolutional Neural Network (CNN) to tag known objects within the images as microorganisms or inorganic structures, then uses the tagged images to construct a 3D structure. This 3D structure is compared to known microorganisms and/or inorganic structures, and based on that comparison a candidate microorganism or inorganic structure is sent to a user for review.

Description

PATENT APPLICATION
SYSTEM AND METHOD FOR AUTOMATED MICROSCOPE IMAGE ACQUISITION AND 3D ANALYSIS
PRIORITY
[0001] The present application claims priority to U.S. Provisional Patent Application No. 63/281,394, filed November 19, 2021, the contents of which are incorporated herein in their entirety.
BACKGROUND
1. Technical Field
[0002] The present disclosure relates to light microscopy, and more specifically to identifying and extracting relevant information from images.
2. Introduction
[0003] Since the appearance of light microscopy in the 17th century, many improvements have been made to increase the quality and quantity of data that can be obtained with microscopes. However, with the development of automatic tools for image capturing, the vast number of pictures that need to be processed in a regular experiment has skyrocketed. For example, when analyzing images of a sample, scientists usually only study a 2D section of the three-dimensional space that the sample occupies. At times, this can be enough to characterize the microorganism, especially if the sample is on a slide.
[0004] Well plates are trays with multiple wells that can be used as small test tubes. When working with a well plate (which regularly contains 96 samples), scientists must deal with the depth of the well, and the depth of the sample within the well, which can result in some parts of the microorganism being out of focus or occluded. Therefore, it becomes necessary to take images focused on different horizontal planes of the sample to extract as much information as possible. This multiplies the number of images that the scientist must take and evaluate, and renders the process infeasible if done manually. To tackle this, scientists have begun to apply computerized techniques. Such methods belong to an area of bioinformatics called “bioimage informatics,” which aims to extract and compare the biological knowledge of an image.
SUMMARY
[0005] Additional features and advantages of the disclosure will be set forth in the description that follows, and in part will be understood from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.
[0006] Disclosed are systems, methods, and non-transitory computer-readable storage media which provide a technical solution to the technical problem described. An example method for performing the concepts disclosed herein can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
[0007] An example system for performing the concepts disclosed herein can include: at least one processor; a microscope; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations which include: for a sample under the microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via the at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
[0008] A non-transitory computer-readable storage medium having instructions stored which, when executed by at least one processor, cause the at least one processor to perform operations which can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via the at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 illustrates an example process pipeline of the system;
[0010] FIG. 2 illustrates an example visualization of occluded objects due to the depth of field of a microscope;
[0011] FIG. 3 illustrates a representation of the phenomenon by which we can visualize occluded objects;
[0012] FIG. 4 illustrates example results of a semantic segmentation model;
[0013] FIG. 5 illustrates an example 2D projection of the 3D representation;
[0014] FIG. 6 illustrates an exemplary method embodiment; and
[0015] FIG. 7 illustrates an example computer system.
DETAILED DESCRIPTION
[0016] Various embodiments of the disclosure are described in detail below. While specific implementations are described, it should be understood that this is done for illustration purposes only. Other components and configurations may be used without departing from the spirit and scope of the disclosure.
[0017] Disclosed herein is an automated pipeline that captures images with a synchronized system of a well plate positioner and a digital camera. The automation of the data gathering opens the possibility of capturing images at certain heights in a reproducible manner. With this stack of images, the system can extract morphological information from the 3D structure of the samples. To this end, the system simultaneously solves a dense classification and a focus selection problem, which allows the samples to be analyzed as two-dimensional (2D) sections, then joins those two-dimensional sections to form a three-dimensional structure using a three-dimensional (3D) convolutional neural network (CNN). These 3D results can then be filtered based on characteristics identified in the 2D sections or 3D structures, resulting in improved filtering and consistency identifying microorganisms within well plates.
[0018] Traditional software that attempts to solve the problems mentioned above does not take focus selection into account. This limitation, coupled with the fact that such software uses conventional filters instead of CNNs, translates into an overestimation of the 3D structures, and renders such traditional software solutions useless for microorganisms with filamentous or thin morphologies. For the above reasons, traditional systems obtain erroneous results in tracking the organisms' growth and correctly identifying 3D morphological formations.
[0019] Systems configured as disclosed herein can also implement machine learning, with the ability to learn from the scientist when the scientist corrects the predictions created by the model. For example, if a 3D structure is initially identified as “X” and upon review by a scientist, that identification is changed to “Y,” the system can modify the identification code such that future instances where the same structure is detected result in a “Y” prediction, rather than an “X” prediction. In making such modification, the system can utilize a reinforcement learning algorithm, where each time a prediction is confirmed the “score” for a prediction is incremented, while each time a prediction is modified or identified as incorrect the score is decreased. If a score decreases to a predetermined threshold, the system can change the identification code or weighting which resulted in the erroneous classification. Because identifying a given microorganism may rely on correctly identifying multiple substructures, and those substructures may be present in multiple microorganisms, the system can also have scoring capacity for individual substructures and modify when/how those substructures are identified based on the feedback from scientists.
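As a minimal sketch of the scoring scheme just described (the class, threshold value, and method names are hypothetical; the disclosure does not fix a concrete data structure):

```python
class PredictionScore:
    """Tracks confirmations and corrections for one structure label."""

    def __init__(self, label: str, threshold: int = -3):  # threshold is an assumption
        self.label = label
        self.score = 0
        self.threshold = threshold

    def confirm(self) -> None:
        self.score += 1  # scientist confirmed the prediction

    def correct(self, new_label: str) -> bool:
        """Scientist overrode the prediction; returns True when the score has
        fallen to the threshold and the identification code should change."""
        self.score -= 1
        self.last_correction = new_label
        return self.score <= self.threshold

entry = PredictionScore("X")
entry.confirm()
needs_change = entry.correct("Y")  # e.g., a scientist relabels "X" as "Y"
```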
[0020] In addition, systems configured as disclosed herein can identify portions of images identified as “regions of interest,” to be reviewed and/or tagged by scientists. The system can then incorporate the feedback from the scientists in identifying future substructures and/or microorganisms. If, for example, a region is flagged by the system as needing review, and the scientists identifies it as belonging to an already known substructure or microorganism, the system can add the additional 2D views and 3D structure to a database, such that the system can identify future instances of the region in the same manner as prescribed by the scientist.
[0021] The system can provide an automatic pipeline which classifies different classes of microorganisms and microscopic objects, while simultaneously discriminating which objects are in focus, and can do so without metadata or any extra domain knowledge associated with images. With the focus-segmentation results, the system can extract relevant variables about the morphology of microorganism(s) within the well, providing helpful information for the scientists, such as the shape of the organism, their size, and their number.
[0022] The system can operate by first automating data collection from the samples. To begin, each sample, containing for example an organism, is placed in the corresponding well of a well plate, and the system periodically takes images of the samples to analyze the growth of the microorganism. The objective is to repeat the data collection process as often as possible to have more data points to understand the microorganism development. To this end, the system can use an automatic process, improving the analysis and reducing human- induced errors.
[0023] The hardware for automated data collection can include:
- A microscope system with a digital camera; and
- A well plate positioner with a stepper motor.
[0024] The well plate positioner allows the system to move the experiment plate while the stepper motor controls the focus knob of the microscope. Both items can also be controlled by the same software controlling the digital camera, thereby synchronizing the plate’s movement and the image capturing.
[0025] Because the system performs 3D virtual reconstruction of the samples, it is preferable to take pictures in multiple horizontal planes for each well. The process for the data collection can be summarized as:
1) The well plate positioner moves the plate to the target well so that it lies below the microscope objective;
2) The stepper motor moves the focusing knob of the microscope to the lowest vertical position;
3) An image is captured with the digital camera;
4) The stepper motor moves the focusing knob by a small amount;
5) Steps 3 and 4 are repeated until the highest vertical position is reached; and
6) Repeat step 1, changing the target well to be the next one in the list of filled wells.
[0026] In other words, multiple photographs/images of the sample are taken, ranging from the lowest vertical position of the microscope to the highest vertical position of the microscope, for each sample/well within the well plate.
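The loop can be summarized in Python; every device object below (positioner, focus_motor, camera) is hypothetical, since the disclosure does not name a specific hardware API:

```python
def acquire_plate(positioner, focus_motor, camera, filled_wells,
                  z_start, z_end, z_step):
    """Capture a z-stack for every filled well (steps 1-6 above)."""
    stacks = {}
    for well in filled_wells:
        positioner.move_to(well)          # step 1: center well under the objective
        focus_motor.move_to(z_start)      # step 2: lowest vertical position
        z, images = z_start, []
        while z <= z_end:                 # steps 3-5: capture, then nudge the focus
            images.append(camera.capture())
            focus_motor.move_by(z_step)
            z += z_step
        stacks[well] = images             # step 6 is the outer loop
    return stacks
```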
[0027] For each of the wells, the images taken can be converted into a video to make the visualization and transferring of files more accessible. The videos can be saved locally, in the computer where the software is running, in a storage server, and/or in the cloud.
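A sketch of the frames-to-video conversion using OpenCV (the codec and frame rate are arbitrary illustrative choices):

```python
import cv2

def frames_to_video(frames, path, fps=10):
    """frames: list of equally sized BGR images, one per vertical position."""
    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    for frame in frames:
        writer.write(frame)
    writer.release()

frames_to_video(stacks[(0, 0)], "well_A1.mp4")  # e.g., a stack from the loop above
```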
[0028] With a graphical user interface (GUI), users of the system can control any parameter related to the performance of the data collection system, such as:
• Starting height, ending height, and the step size/distance of the stepper motor, allowing the user to control the size of the vertical sampling of the well, and the distance between the collected images;
• Experimental distribution of the well. Depending on the experiment being performed, not every well may be filled, and the user may need to inform the well plate positioner where the filled wells are located to identify the pictures correctly;
• Calibration positions of the well. The well plate positioner may need to be calibrated with the position of the corners of the plate to calculate the coordinates of each of the filled wells. Such calibration may be done by calculating the vertical and horizontal distances between these positions and dividing them by the number of wells in each row or column.
[0029] Once the images have been obtained, the system processes the images to obtain relevant data about the conditions of the microorganism. This process can be divided into three techniques, preferably applied sequentially:
1) Semantic segmentation and focus selection, from which the system creates a mask from the images to classify each item that appears within the images. A mask for each image taken of a well is created, resulting in a series of masks for each well of the plate.
2) 3D reconstruction, by which a 3D representation of each well is created using the masks of the wells created in the previous step.
3) Data extraction, by which relevant information about the conditions and current state of the microorganism is obtained using the 3D reconstruction as an input.
[0030] Each of these respective techniques is described in further detail.
[0031] Starting with the semantic segmentation and focus selection, the system aims to classify each pixel on the image and to select the regions in focus. An exemplary system uses a CNN based on a U-Net (a convolutional neural network that was developed for biomedical image segmentation) with, for example, EfficientNet (a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient) B0 (the baseline network) as a lightweight feature extractor. However, other configurations may configure the CNN without using the U-Net or EfficientNet examples. The CNN can, for example, be pre-trained on ImageNet, a large visual database designed for use in visual object recognition software, though the use of other or different databases is also possible.
[0032] Due to the difficulty of obtaining properly annotated data for training, robust data augmentation is preferably used. To avoid impeding focus selection learning, this data augmentation can be precisely calibrated. Early model results from sparse training data can be used to annotate more data more quickly. Training can, for example, begin in one instance with only twenty labeled images, seven of which lack concentrated material. The database can then be expanded to sixty-seven images, with the annotation process sped up by using the predictions from the first model. The CNN network can then be trained using an embedded dynamic learning rate, which accelerates the convergence process. When overfitting is detected via a callback, the training process can be automatically halted.
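One way to realize the dynamic learning rate and the overfitting callback, sketched in PyTorch (an illustrative framework choice; the patience values and the helpers train_one_epoch and evaluate are assumptions):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)  # dynamic learning rate

best_val, bad_epochs = float("inf"), 0
for epoch in range(100):
    train_one_epoch(model, optimizer)  # hypothetical training helper
    val_loss = evaluate(model)         # hypothetical validation helper
    scheduler.step(val_loss)           # reduce the LR when validation stalls
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= 8:            # the overfitting "callback": halt training
            break
```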
[0033] In one example, the system has replaced the down-sampling component of the classical U-Net with an EfficientNet B0, while the bottleneck and the up-sampling maintain the original U-Net architecture. The skip connections are sent from the first, second, third, and fifth blocks of the EfficientNet B0, while the output is connected to the bottleneck part of the U-Net. The U-Net presents skip connections between the down-sampling and the up-sampling paths, improving the quality of the segmentation mask by providing local information to the encoded global information in the up-sampling process.
[0034] Continuing with the example, the network can have three parts: the down-sampling, the bottleneck, and the up-sampling. The down-sampling can use four blocks containing 3 x 3 convolutional layers with batch normalization, followed by 2 x 2 max-pooling layers. At the end of each block, a skip connection can be sent to the symmetric up-sampling module. The bottleneck can be built from two convolutional layers with batch normalization, and dropout to reduce the overfitting. The up-sampling path can include four blocks, consisting of transposed convolutions with stride 2, a concatenation with the corresponding feature map from the down-sampling (skip connection), and 3x3 convolutional layers with batch normalization.
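This encoder swap maps directly onto off-the-shelf tooling; below is a sketch with the segmentation_models_pytorch library (an illustrative choice, not named in the disclosure; its skip-connection wiring may differ in detail from the arrangement described above):

```python
import torch
import segmentation_models_pytorch as smp

# U-Net decoder (bottleneck + up-sampling) with an EfficientNet-B0 encoder
# replacing the classical down-sampling path, pre-trained on ImageNet.
model = smp.Unet(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=4,  # assumption: e.g., background, organism, artifact, crystal
)

masks = model(torch.randn(1, 3, 256, 256))  # -> (1, 4, 256, 256) mask logits
```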
[0035] To avoid over-fitting, the system can normalize the input, use batch normalization layers, and deploy regularization during training. The dataset used can be split into training, validation, and test sets, and the best performing model in the validation set can be tested against the test set to avoid over-fitting on selection.
[0036] Aggressive data augmentation can also be used to avoid over-fitting. For each pair of training image and mask, Algorithm 1 can be performed each time the pair passes through the model (a Python sketch follows the stage lists below):
[0037] Algorithm 1 Data Augmentation
1: Select an image/mask pair;
2: while fewer than 5 augmentations have been applied do
3:     Select a data augmentation operation from stage 1;
4:     if random number > threshold for selected operation then
5:         apply augmentation operation to the image and the mask;
6: for all data augmentation operations from stage 2 do
7:     if random number > threshold for selected operation then
8:         apply augmentation operation to the image and the mask;
[0038] Where stage 1 data augmentation operations can include, but are not limited to:
• Random elastic transformations to simulate distortion.
• Randomly adding values to the pixels to simulate imperfections.
• Random JPEG compression to simulate different image qualities.
• Random Motion Blur to simulate bad camera focus.
• Random variations in Hue and Saturation for better generalization.
• Random perspective transformation to simulate different angles.
• Random crop and pad.
[0039] And stage 2 data augmentation operations can include, but are not limited to:
• Random scaling in both axes.
• Random translation in both axes.
• Random shear.
• Random flips in both axes.
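Algorithm 1, together with the two stage lists above, can be written compactly as follows (the per-operation thresholds and the operation implementations are assumptions to be supplied by any augmentation library; "fewer than 5 augmentations" is read here as five random draws):

```python
import random

def augment(image, mask, stage1_ops, stage2_ops, thresholds):
    """stage*_ops: dicts mapping an operation name to a callable that takes
    (image, mask) and returns the transformed pair."""
    for _ in range(5):                        # stage 1: five gated random draws
        name = random.choice(list(stage1_ops))
        if random.random() > thresholds[name]:
            image, mask = stage1_ops[name](image, mask)
    for name, op in stage2_ops.items():       # stage 2: every operation, gated
        if random.random() > thresholds[name]:
            image, mask = op(image, mask)
    return image, mask
```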
[0040] The learning rate can be dynamically reduced to help the training process converge to an optimum rate.
[0041] Once pre-established accuracy criteria are met, the model can be deployed. Regarding the loss function of the neural network, the system can use the dice coefficient (DC) defined as follows:
DC = 2TP / (2TP + FP + FN),
where TP, FP, and FN stand for true positives, false positives, and false negatives.
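The coefficient translates directly into code; a sketch in PyTorch (the smoothing epsilon is an assumption to avoid division by zero, and 1 − DC is a common choice of loss term):

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """pred, target: binary masks (or soft predictions in [0, 1]) of equal shape."""
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return (2 * tp) / (2 * tp + fp + fn + eps)

pred = torch.tensor([[1., 1., 0.], [0., 1., 0.]])
target = torch.tensor([[1., 0., 0.], [0., 1., 1.]])
loss = 1 - dice_coefficient(pred, target)  # 1 - (2*2)/(2*2 + 1 + 1) = 1/3
```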
[0042] Labeling an image is a lengthy process that requires substantial domain-expert effort; scientists can spend more than 40 minutes per image. Therefore, the images tagged by the experts and used for training should be carefully selected to maximize the knowledge that the neural network can extract from them. To this end, the present system extracts relevant patches from some of the images. These patches can be determined by randomly sampling images and classifying them as ‘relevant’ or ‘not relevant.’ Such a model can be trained with very little training data, using an EfficientNet B0 as a lightweight feature extractor pre-trained on ImageNet, and training only a 64-unit fully connected layer with a binary sigmoid activation layer at the end. Additionally, the system can take the same number of images as were selected in the previous step, drawn from the lower quartile of the focus distribution, to use as full black masks, since nothing in them is in focus.
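A hedged Keras sketch of such a relevance classifier follows: a frozen, ImageNet-pre-trained EfficientNet B0 as the lightweight feature extractor, with only the 64-unit fully connected layer and sigmoid output trained. The pooling mode, hidden activation, optimizer, and loss are assumptions.

import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # fixed feature extractor; only the head is trained

classifier = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # relevant / not relevant
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])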
[0043] Once the system has the relevant patches, an expert can tag them. Tagging an image consists of drawing a mask on top of the image, in which the tagging scientist identifies the structures appearing in it. These structures can be the microorganism itself, image artifacts, or crystal formations, among others. The scientist tags each of these classes by painting it in a different color. Finally, the scientist compares each image with the ones immediately above and below it to determine whether the item of interest is in focus. With this small database produced by the experts, the system can create a much larger one by introducing variations on the images and applying the same variations to the masks. This increases the amount of data with which the segmentation model is trained.

[0044] Once the segmentation model can produce the masks, the team of experts can analyze the masks to determine whether they are correctly tagged. If there is an error, the expert tags that image again, and the result is added to the training database. This iterative process periodically improves the segmentation results up to the point where the masks produced by the system and by the experts are equivalent.
[0045] Regarding 3D reconstruction, the masks obtained in the automatic focus-segmentation analysis usually present a problem: a lack of resolution in z prevents sufficient connectivity between elements in different focal planes. Even though focus stacking can be a suitable method for some microscope imaging applications, in this case it is not robust enough to create an accurate 3D reconstruction, as it sometimes creates image artifacts that render the result unusable. To solve this issue, the system uses another neural network that takes both a series of masks and the original images as input and creates the 3D reconstruction of those images.
[0046] The system starts by using a three-dimensional section of the samples and the masks created by the focus-segmentation network. The size of these sections should be relatively small due to computational limitations. The system can then use a CNN that filters errors and improves the connectivity between the results of different horizontal planes, as dictated by the 2D images. This additional CNN can, for example, have five convolution layers followed by a fully connected layer. The output of this model is a 3D voxel reconstruction, represented by a 3D matrix of zeroes and ones, where a one represents the presence of the microorganism at that 3D point. The system can also project the 3D reconstruction onto a 2D image for easier visualization.
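A minimal sketch of such a filtering CNN, assuming Keras, a small input patch of stacked image and mask channels, and arbitrary filter counts (none of which the disclosure specifies); the fully connected layer dominates the parameter count, which is one reason the input sections are kept small.

import tensorflow as tf
from tensorflow.keras import layers

def build_voxel_filter(patch=(8, 16, 16, 2)):  # (z, y, x, image + mask)
    inputs = layers.Input(patch)
    x = inputs
    for filters in (16, 16, 32, 32, 64):       # five convolution layers
        x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    # Fully connected layer emitting one occupancy value per voxel,
    # reshaped into the 3D matrix (thresholded to zeroes and ones).
    n_voxels = patch[0] * patch[1] * patch[2]
    x = layers.Dense(n_voxels, activation="sigmoid")(x)
    outputs = layers.Reshape(patch[:3])(x)
    return tf.keras.Model(inputs, outputs)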
[0047] Regarding data extraction, there are many different parameters that the system can extract from the 3D reconstruction. For example, the system can identify the number of different microorganisms in a particular well. This is easily identifiable in the 3D reconstruction, as the system only needs to detect the number of individual connected components.
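For example, this count can be sketched with SciPy's connected-component labeling; the placeholder volume stands in for the voxel reconstruction produced above.

import numpy as np
from scipy import ndimage

voxels = (np.random.rand(32, 128, 128) > 0.95).astype(np.uint8)  # placeholder

labeled, n_components = ndimage.label(voxels)  # default 6-connectivity in 3D
print(f"individual connected components (microorganisms): {n_components}")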
[0048] From a morphological point of view, the variables are more dependent on the physical nature of the microorganism, such as the radius if the microorganism is spherical, the average length if the microorganism has a rod or spiral shape, the total area covered if the sample contains colonies, etc. Special considerations can be taken when the microorganism being studied has a filamentous morphology, as happens with some microscopic fungi and bacteria. In this case, many relevant parameters can be extracted, but the system must first parameterize the 3D reconstruction as a mathematical structure. To this end, the system can transform the obtained mask into a mathematical graph.
[0049] Specifically, the system can create weighted graphs. A weighted graph, or a network, is a graph in which a number (a weight) is assigned to each edge. Such weights might represent, for instance, costs, lengths, or capacities, depending on the problem at hand. In the microorganism graph representation created by the system, the nodes will be the ramifications of the network, the edges will be the connections, and the weights will be the lengths of those connections.
[0050] To convert the 3D representation to a graph, the system preprocesses the 3D representation with a method called skeletonization. Once the system has the skeleton of each frame of the 3D representation, the system can create a 3D matrix with it and use an algorithm to detect the nodes and edges of the graph that represents said matrix.
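A sketch of this preprocessing step, assuming scikit-image for skeletonization (recent releases accept 3D input in skeletonize; older ones expose skeletonize_3d) and a simple neighbor-count heuristic standing in for the unspecified node and edge detection algorithm:

import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

voxels = (np.random.rand(32, 128, 128) > 0.95)  # placeholder reconstruction
skeleton = skeletonize(voxels).astype(bool)     # one-voxel-wide skeleton

# Count each skeleton voxel's 26-connected skeleton neighbors.
kernel = np.ones((3, 3, 3), dtype=int)
kernel[1, 1, 1] = 0
neighbors = ndimage.convolve(skeleton.astype(int), kernel, mode="constant")

tips = skeleton & (neighbors == 1)     # endpoints of the network
nodes = skeleton & (neighbors >= 3)    # ramification (branch) points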
[0051] As a byproduct of parameterizing the microorganism as a graph, the system can calculate many relevant growth variables (a sketch computing these variables follows the list below):
• Nodes: The total number of connected entities. These are the branch points or the tips.
• Edges: The total number of connections. In some microorganisms, such as filamentous fungi, these are the connections between ramifications.
• Node Degree: The number of links attached to a node, often used as a measure of the connectedness of networks, especially if it is a frequency distribution.
• Subgraphs: If the network is broken up, the number of loose parts.
• Spatial extent: Refers to the area covered by the microorganism, calculated as the area of the convex hull of the node positions in space, or by segmentation of the colony outline.
• Node density: The number of junctions per unit area of space covered by the microorganism. This is a measure of the branching/fusing density.
• Total length: The total length of links in the network, calculated by summing the lengths of all links in the network.
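As referenced above, these variables can be computed from a weighted graph, for instance with networkx; the pos and weight attribute names are assumptions about how node coordinates and branch lengths are stored.

import networkx as nx
import numpy as np
from scipy.spatial import ConvexHull

def growth_metrics(G: nx.Graph) -> dict:
    pts = np.array([G.nodes[n]["pos"] for n in G.nodes])  # (z, y, x) per node
    return {
        "nodes": G.number_of_nodes(),                 # branch points and tips
        "edges": G.number_of_edges(),                 # connections
        "node_degree": dict(G.degree()),              # links attached per node
        "subgraphs": nx.number_connected_components(G),
        # Area covered: convex hull of node positions in the image plane
        # (for 2D input, ConvexHull.volume is the enclosed area).
        "spatial_extent": ConvexHull(pts[:, 1:]).volume,
        "total_length": G.size(weight="weight"),      # sum of edge lengths
    }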
[0052] Regardless of the morphology of the microorganism, the data extraction process can give the researcher very valuable information that can be used to analyze the growth of the microorganism under different conditions. For example, it allows the researcher to obtain the growth rates of the microorganism or its maximum growth under certain conditions.
[0053] FIG. 1 illustrates an example process pipeline of the system, representing the complete pipeline 102 from raw images at different heights to 3D graphs and 2D projections.

[0054] The input is the raw images 106 at different heights, and the outputs are 3D graphs 116 and 2D projections 118. The system automatically captures images 104 on many horizontal planes of the sample 106. These images are then processed into a mask 110, with the semantic segmentation 108 followed by the focus selection CNN 112, as described above. The CNN 112 is trained with data labeled by an expert, with patches of images suggested by another network. For example, the system can identify specific patches of the images and provide suggestions 120 for what those patches contain. An expert can then annotate 124 the patches, accepting the suggested description or updating the patch with a new description. These annotations 124 can then be used to train, or retrain, the models 126. With the masks 110 of the different frames and the CNN 112, the system creates a final model to filter the results and create a 3D representation 114, which can include the 2D projection 118 and/or the 3D graph 116. Once the 2D projection 118 and/or 3D graph 116 are generated, they can be provided to users via a visualization platform 128. In some configurations, the system can include a correction tool for users 122, where the users viewing the images 128 can provide feedback which is then used as a further suggestion 120, reviewed by experts 124, and ultimately incorporated into model training 126.
[0055] Exemplary innovative qualities of the developed technologies include: 1) Joining automatic data acquisition with image analysis to obtain data about microorganism growth. Scientists cannot relate quantitative data describing the growth of microorganisms to the image data obtained with classical techniques due to the enormous amount of time involved in processing all the data; 2) Visualization of objects occluded due to the depth of field. As an alternative to other techniques such as SEM, optical microscopy allows detection of details partially occluded by other microorganisms by selecting different foci, as shown in Figure 3. Using this effect, the system can obtain a faithful reproduction of the entire content of a 3D container rather than a projection showing only the 'surface' of the container; and 3) Obtaining faithful images from a set of partially focused images. Scientists cannot currently analyze three-dimensional photos due to the large amount of data and processing involved, as well as the problem of dealing with defocusing. Systems configured as disclosed herein can resolve both problems simultaneously, without generating loss of information in the image.

[0056] Figure 2 shows an example of the visualization of these occluded objects. FIG. 2 illustrates an example visualization of objects occluded due to the depth of field of a microscope. The microorganism featured is a filamentous fungus (Fusarium venenatum). Shown are three frames 202, 204, 206 of the same sample at different heights, with the left frame 202 being the lowest point and the right frame 206 the highest. In the first frame 202, there is an in-focus vertical hypha along with other structures that are out of focus. In the last frame 206, there is a horizontal hypha that is no longer occluded by the vertical hypha above it, as that hypha is now out of focus.
[0057] FIG. 3 illustrates a representation of the phenomenon by which occluded objects are visualized. When the microscope camera is focused on the far end of the plate, light rays from the right point 302 that strike any part of the front of the lens 306 are refracted to a point on the sensor of the camera 308. At the same focus distance, light rays from the left point 304 that strike different points on the front of the lens 306 are focused to a point well behind the sensor plane 308. The light from the left point 304 of the plate is therefore spread out over a blur circle as it strikes the sensor.
[0058] FIG. 4 illustrates example results of a semantic segmentation model. The column to the left contains the original images, the column in the center contains the masks labeled by the domain experts (ground truth), and the images to the right are the masks predicted by the system. As illustrated, the system solves the semantic segmentation problem and the focus selection problem simultaneously. The advantage this solution provides over traditional techniques is that these two problems can be processed in a single step, taking into account the information of both, which opens a unique alternative for data generation. This also provides a 3D reconstruction platform that obtains more information than traditional techniques in a shorter time.
[0059] The system also trains with a small amount of data via the database enhancement discussed above, using data augmentation techniques such as adjustments in brightness, contrast, color enhancement, saturation, crops, flips, rotations, translations, etc. This training is complemented by the feedback loop with the scientists, which can significantly reduce the number of iterations involved in this process.
[0060] FIG. 5 illustrates an example 2D projection of the 3D representation, created by filtering the semantic segmentation results with a CNN model configured as disclosed herein. As discussed above, the system uses a CNN which filters errors and improves the connectivity between the results on different horizontal planes. This CNN can have five convolution layers followed by a fully connected layer. The output of this model is a 3D voxel reconstruction, represented by a 3D matrix of zeros and ones, where a one represents the presence of the microorganism at that 3D point. The system can also project the 3D reconstruction onto a 2D image for easier visualization, which is what is seen in Figure 5.

[0061] FIG. 6 illustrates an exemplary method embodiment. As illustrated, a method for practicing the concepts disclosed herein can include: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample (602); processing, via at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images (604); constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure (606); processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to known microorganisms, resulting in a 3D comparison (608); and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison (610).
[0062] In some configurations, rather than moving the microscope lens vertically to vary the distance between the sample and the microscope, the system can move the sample up and down. For example, the system can move the well plate vertically, rather than the microscope lens. In yet other configurations, both the sample and the microscope can change positions.
[0063] In some configurations, the setting of the microscope, the capturing of the plurality of images, the processing of the plurality of images using the first CNN, the constructing of the 3D structure, the processing of the 3D structure, and the outputting of the at least one candidate microorganism occur for each sample within a plurality of samples. In such configurations, the plurality of samples can be located within a well plate.
[0064] In some configurations, the illustrated method can further include modifying at least one of the first CNN and the second CNN based on feedback regarding the at least one candidate microorganism. In such configurations, the modifying of the at least one of the first CNN and the second CNN can include retraining at least one of the first CNN and the second CNN. In other configurations, the modifying of the at least one of the first CNN and the second CNN can include modifying a weight associated with a portion of at least one of the first CNN and the second CNN.
[0065] In some configurations, the stepper motor raises the microscope a predetermined amount for each image capture, the predetermined amount being a sub-portion of a total distance between the starting location and the end location, such that the plurality of images comprises at least three images. In such configurations, at least one of the at least three images can be out of focus.
[0066] Another exemplary embodiment, which is not illustrated, is a system which identifies a respective set of data features for each of a plurality of images of microorganisms and inorganic structures, the respective set of data features comprising at least one of a physiological feature, an appearance, or a focus. The system can then execute a computer-implemented feature extraction process on the respective set of data features for each of the plurality of microorganisms and inorganic structures, thereby determining or obtaining a compact representation of the respective set of data features for each microorganism within the plurality of microorganisms.
[0067] With reference to FIG. 7, an exemplary system includes a general-purpose computing device 700, including a processing unit (CPU or processor) 720 and a system bus 710 that couples various system components including the system memory 730 such as read-only memory (ROM) 740 and random access memory (RAM) 750 to the processor 720. The system 700 can include a cache of high-speed memory connected directly with, in close proximity to, or integrated as part of the processor 720. The system 700 copies data from the memory 730 and/or the storage device 760 to the cache for quick access by the processor 720. In this way, the cache provides a performance boost that avoids processor 720 delays while waiting for data. These and other modules can control or be configured to control the processor 720 to perform various actions. Other system memory 730 may be available for use as well. The memory 730 can include multiple different types of memory with different performance characteristics. It can be appreciated that the disclosure may operate on a computing device 700 with more than one processor 720 or on a group or cluster of computing devices networked together to provide greater processing capability. The processor 720 can include any general purpose processor and a hardware module or software module, such as module 1 762, module 2 764, and module 3 766 stored in storage device 760, configured to control the processor 720 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. The processor 720 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.
[0068] The system bus 710 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS), stored in ROM 740 or the like, may provide the basic routine that helps to transfer information between elements within the computing device 700, such as during start-up. The computing device 700 further includes storage devices 760 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 760 can include software modules 762, 764, 766 for controlling the processor 720. Other hardware or software modules are contemplated. The storage device 760 is connected to the system bus 710 by a drive interface. The drives and the associated computer-readable storage media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computing device 700. In one aspect, a hardware module that performs a particular function includes the software component stored in a tangible computer-readable storage medium in connection with the necessary hardware components, such as the processor 720, bus 710, display 770, and so forth, to carry out the function. In another aspect, the system can use a processor and computer-readable storage medium to store instructions which, when executed by the processor, cause the processor to perform a method or other specific actions. The basic components and appropriate variations are contemplated depending on the type of device, such as whether the device 700 is a small, handheld computing device, a desktop computer, or a computer server.
[0069] Although the exemplary embodiment described herein employs the hard disk 760, other types of computer-readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs) 750, and read-only memory (ROM) 740, may also be used in the exemplary operating environment. Tangible computer-readable storage media, computer-readable storage devices, or computer-readable memory devices, expressly exclude media such as transitory waves, energy, carrier signals, electromagnetic waves, and signals per se.
[0070] To enable user interaction with the computing device 700, an input device 790 represents any number of input mechanisms, such as a microphone for speech, a touch- sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. An output device 770 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 700. The communications interface 780 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.
[0071] Use of language such as “at least one of X, Y, and Z,” “at least one of X, Y, or Z,” “at least one or more of X, Y, and Z,” “at least one or more of X, Y, or Z,” “at least one or more of X, Y, and/or Z,” or “at least one of X, Y, and/or Z,” are intended to be inclusive of both a single item (e.g., just X, or just Y, or just Z) and multiple items (e.g., {X and Y}, {X and Z}, {Y and Z}, or {X, Y, and Z}). The phrase “at least one of” and similar phrases are not intended to convey a requirement that each possible item must be present, although each possible item may be present.
[0072] The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. Various modifications and changes may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.

CLAIMS

We claim:
1. A method comprising: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera connected to the microscope, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing, via at least one processor, the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, via the at least one processor using the tagged versions of the plurality of 2D images, a 3D structure; processing, via the at least one processor using a second CNN, the 3D structure to compare the 3D structure to at least one of known microorganisms and known inorganic structures, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
2. The method of claim 1, wherein the setting of the microscope, the capturing of the plurality of images, the processing of the plurality of images using the first CNN, the constructing of the 3D structure, the processing of the 3D structure, and the outputting of the at least one candidate microorganism or inorganic structure occur for each sample within a plurality of samples.
3. The method of claim 2, wherein the plurality of samples are located within a well plate.
4. The method of claim 1, further comprising modifying at least one of the first CNN and the second CNN based on feedback regarding the at least one candidate microorganism or inorganic structure.
5. The method of claim 4, wherein the modifying of the at least one of the first CNN and the second CNN comprises retraining at least one of the first CNN and the second CNN.
6. The method of claim 4, wherein the modifying of the at least one of the first CNN and the second CNN comprises modifying a weight associated with a portion of at least one of the first CNN and the second CNN.
7. The method of claim 1, wherein the stepper motor raises the microscope a predetermined amount for each image capture, the predetermined amount being a sub-portion of a total distance between the starting location and the end location, such that the plurality of images comprises at least three images.
8. The method of claim 7, wherein at least one of the at least three images is out of focus.
9. A system comprising: a microscope controlled with a stepper motor; a camera connected to the microscope; at least one processor; and a non-transitory computer-readable storage medium having instructions stored which, when executed by the at least one processor, cause the at least one processor to perform operations comprising: for a sample under the microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via the camera, with the stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, using the tagged versions of the plurality of 2D images, a 3D structure; processing, using a second CNN, the 3D structure to compare the 3D structure to at least one of known microorganisms and known inorganic structures, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
10. The system of claim 9, wherein the setting of the microscope, the capturing of the plurality of images, the processing of the plurality of images using the first CNN, the constructing of the 3D structure, the processing of the 3D structure, and the outputting of the at least one candidate microorganism or inorganic structure occur for each sample within a plurality of samples.
11. The system of claim 10, wherein the plurality of samples are located within a well plate.
12. The system of claim 9, the non-transitory computer-readable storage medium having additional instructions stored which, when executed by the at least one processor, cause the at least one processor to perform operations comprising: modifying at least one of the first CNN and the second CNN based on feedback regarding the at least one candidate microorganism or inorganic structure.
13. The system of claim 12, wherein the modifying of the at least one of the first CNN and the second CNN comprises retraining at least one of the first CNN and the second CNN.
14. The system of claim 12, wherein the modifying of the at least one of the first CNN and the second CNN comprises modifying a weight associated with a portion of at least one of the first CNN and the second CNN.
15. The system of claim 9, wherein the stepper motor raises the microscope a predetermined amount for each image capture, the predetermined amount being a sub-portion of a total distance between the starting location and the end location, such that the plurality of images comprises at least three images.
16. The system of claim 15, wherein at least one of the at least three images is out of focus.
17. A non-transitory computer-readable storage medium having instructions stored which, when executed by at least one processor, cause the at least one processor to perform operations comprising: for a sample under a microscope: setting the microscope at a starting location; and capturing a plurality of images of the sample via a camera, with a stepper motor performing one of increasing or decreasing a vertical distance between a microscope lens and a sample while moving the microscope from the starting location a set amount for each image capture until reaching an end location, resulting in a plurality of 2D images for the sample; processing the plurality of images using a first Convolutional Neural Network (CNN) to tag known objects within the plurality of images associated with at least one of microorganisms or inorganic structures, resulting in tagged versions of the plurality of 2D images; constructing, using the tagged versions of the plurality of 2D images, a 3D structure; processing, using a second CNN, the 3D structure to compare the 3D structure to at least one of known microorganisms and known inorganic structures, resulting in a 3D comparison; and outputting at least one candidate microorganism or inorganic structure to a user based on the 3D comparison.
18. The non-transitory computer-readable storage medium of claim 17, wherein the setting of the microscope, the capturing of the plurality of images, the processing of the plurality of images using the first CNN, the constructing of the 3D structure, the processing of the 3D structure, and the outputting of the at least one candidate microorganism or inorganic structure occur for each sample within a plurality of samples.
19. The non-transitory computer-readable storage medium of claim 18, wherein the plurality of samples are located within a well plate.
20. The non-transitory computer-readable storage medium of claim 17, having additional instructions stored which, when executed by the at least one processor, cause the at least one processor to perform operations comprising: modifying at least one of the first CNN and the second CNN based on feedback regarding the at least one candidate microorganism or inorganic structure.
PCT/US2022/023171 2021-11-19 2022-04-01 System and method for automated microscope image acquisition and 3d analysis WO2023091180A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163281394P 2021-11-19 2021-11-19
US63/281,394 2021-11-19

Publications (1)

Publication Number Publication Date
WO2023091180A1 (en)

Family

ID=86397653


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210285884A1 (en) * 2016-02-18 2021-09-16 Optofluidics Inc. Methods for identification of particles in a fluid sample
US20200302144A1 (en) * 2017-10-19 2020-09-24 Scopio Labs Ltd. Adaptive sensing based on depth
