WO2019140428A1

WO2019140428A1 - Thresholding in pattern detection at low signal-to-noise ratio

Info

Publication number: WO2019140428A1
Application number: PCT/US2019/013598
Authority: WO
Inventors: Nancy E. KLECKNER; Frederick S. CHANG
Original assignee: President And Fellows Of Harvard College
Priority date: 2018-01-15
Filing date: 2019-01-15
Publication date: 2019-07-18

Abstract

Methods and systems for detecting and characterizing a pattern (or patterns) of interest in a low signal-to-noise ratio (SNR) data set are disclosed. One method is a two-stage Likelihood pipeline analysis that takes advantage of the benefits of a full Likelihood analysis while providing computational tractability. The two-stage Likelihood pipeline may include, at either stage, the calculation and use of one or more of a false positive rate, a standard error, and a multi-factor landscape involving a gradient and/or Hessian of a Likelihood ratio.

Description

THRESHOLDING IN PATTERN DETECTION AT LOW SIGNAL-TO-NOISE

RATIO

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of priority to U.S. provisional application no. 62/617,571, filed January 15, 2018, currently pending, which is hereby incorporate by reference in its entirety.

STATEMENT OF GOVERNMENT SUPPORT

[0002] This invention was made with government support under ROl GM025326, ROl GM044794, and T32 GM007598 awarded by the National Institutes through the National Institute of General Medical Sciences (NIGMS), and under PHY-1219334 awarded by the National Science Foundation. The government has certain rights in the invention.

TECHNICAL FIELD

[0003] This disclosure is generally directed to pattern detection in data sets, in particular pattern detection including Likelihood analysis.

BACKGROUND

[0004] Robust detection of a pattern at low signal-to-noise ratio (“SNR”) represents a fundamental challenge for many types of data analysis. Robustness comprises both the reliable detection of a pattern when it is present (i.e., minimized false negatives) and the failure to falsely identify a pattern as being present when, in fact, it is not (i.e.. minimized false positives). Biological imaging provides a prominent example: super-resolution imaging of fluorescent point sources (“spots”), e.g., single molecules, is currently possible only in high-SNR regimes, i.e. where the“spot” is very bright.

[0005] Powerful methodologies have recently emerged which permit the reliable detection and precise localization of fluorescent objects, both spots and fluorescent objects with more complex shapes, with resolution below the optical limit set by the wavelength of light. However, thus far, known single molecule/point source methodologies (e.g., STORM, double helical point spread function analysis) require imaging in a high SNR regime. Also, methodologies for imaging objects with more complex shapes that are not based on imaging of individual point sources (for example Structured Illumination Microscopy or“SIM”), are based on imaging multiple instances of the sample of interest under different illumination conditions. A super-resolution image of the object that surpasses the optical limit is then reconstructed from this battery of images. Each component image must be obtained in a high-SNR regime for successful reconstruction.

[0006] These high SNR imaging methods require high excitation energy in order to achieve a signal that is detectable above the background, which comprises flourescence that emanates from sources other than the spot, and despite the presence of noise. High excitation energy results in photobleaching and, in living samples, phototoxicity. Because of these effects, super-resolution spot detection and localization is currently limited to the acquisition of 2D data, with 3D information extracted indirectly by modification of the optics plus appropriate data analysis (thus avoiding the need to take z-slices for each image). Moreover, time-lapse analysis in living cells is limited to a relatively small number of time points. Conversely, an unique approach has made it possible to detect fluorescent point sources and/or objects with more complex shapes in low SNR regimes (and thus low excitation energies), thereby enabling imaging of living cells with many images collected over very long time periods. (See, e.g., (1) Carlton, Peter M., et al.“Fast live simultaneous

multiwavelength four-dimensional optical microscopy” Proceedings of the National Academy of Sciences 107.37 (2010): 16016-16022; (2) Arigovindan, Muthuvel, et al. "High- resolution restoration of 3D structures from widefield images with extreme low signal-to- noise-ratio. " Proceedings of the National Academy of Sciences 110.43 (2013): 17344-17349). However, this approach does not provide super-resolution precision of localization for fluorescent point sources or for visualization of objects with more complex shapes and, moreover, is computationally -intractable for large datasets.

SUMMARY

[0007] Methods and systems for detecting and characterizing a pattern (or patterns) of interest in a low Signal-to-Noise Ratio (SNR) data set are disclosed herein.

[0008] An example method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, may include calculating an approximate Likelihood Ratio (LR) for one or more of the plurality of segments and calculating a false positive rate for the approximate LR. The method may further include designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on the false positive rate, applying a full Likelihood analysis to each of the candidate segments, and designating one or more of the candidate segments as including the pattern according to the result of the full Likelihood analysis.

[0009] An example method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, may include calculating an approximate maximum likelihood estimate (MLE) or an approximate likelihood value for one or more of the plurality of segments, the MLE comprising one or more values. The method may further include calculating a standard error of the approximate MLE or the approximate likelihood value, designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to the standard error, applying a full Likelihood analysis to each of the candidate segments, and designating one or more of the candidate segments as including the pattern according to the result of the full Likelihood analysis.

[0010] A method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, may include calculating an approximate Likelihood ratio for one or more of the plurality of segments. The method may further include calculating a gradient of the approximate Likelihood ratio, calculating a Hessian of the approximate Likelihood ratio, designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on two or more of the approximate Likelihood ratio, the gradient, and the Hessian, calculating a full Likelihood for each of the candidate segments, and designating one or more of the candidate segments as including the pattern according to the full

Likelihood.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a flow chart illustrating an embodiment of a method of characterizing a pattern of interest in a data set.

[0012] FIG. 2 is a diagrammatic view of an example z-stack of two-dimensional images that may comprise an example three-dimensional data set.

[0013] FIG. 3 is a flow chart illustrating an embodiment of a first stage in a two-stage likelihood pipeline analysis.

[0014] FIG. 4 is a flow chart illustrating an embodiment of a second stage in a two-stage likelihood pipeline analysis.

[0015] FIG. 5 is a diagrammatic view of an embodiment of a system for acquiring a data set and identifying and localizing a pattern of interest in a data set.

[0016] FIG. 6 is a flow chart illustrating an embodiment of a method of conducting portions of a stage I or a stage II analysis in a two-stage Likelihood pipeline.

DETAILED DESCRIPTION

[0017] The instant disclosure provides an algorithm that may be applied to reliably detect a pattern of interest (e.g., one or more fluorescent spots) in a low-SNR data set. When applied to detect fluorescent spots in biological bodies with an optical image detector (e.g., a microscope), the algorithm of the present disclosure may provide localization precision beyond the optical diffraction limit, may provide optimal resolution of overlapping spots, and may further provide accurate quantification. The algorithm of the present disclosure may enable super-resolution visualization of single molecules in three-dimensions (3D) at frequent time intervals for long, biologically-relevant time periods. The algorithm of the present disclosure may also enable super-resolution time-lapse imaging of whole objects.

[0018] The instant disclosure builds on an algorithm, the“two-stage Likelihood pipeline,” originally disclosed in U.S. app. nos. 62/211,950 and 62/211,994 and PCT app. no.

PCT/US2016/049706, each of which is hereby incorporated by reference in its entirety. The instand disclosure provides numerous additions to the two-stage likelihood pipeline that, individually and collectively, may provide increased precision and/or accuracy in pattern detection and characterization.

[0019] The two-stage Likelihood pipeline makes use of the Likelihood approach, which is a well-documented method for finding patterns in noisy data sets that is understood by a person of ordinary skill in the art. The Likelihood approach, in order to identify and characterize a pattern of interest, can take into account multiple components that contribute to the values of the data set. In embodiments, the Likelihood approach may take into account all components that contribute to the values of the data set. For example, in an embodiment, as described further below, a Likelihood approach may take into account the ideal theoretical natures of a pattern of interest and of the background as well as the effects of multiple types of noise, with each of these contributors to the values in the data set expressed

mathematically as a function of particular parameters.

[0020] A full Likelihood approach, in its purest form (which may also be referred to herein as a“single stage” form of Likelihood analysis), would involve consideration of all possible combinations of all possible values of all of parameters across the entire data set in order to determine which particular combination of parameter values would best fit the data set. This theoretical ideal is rarely, if ever, achieved in practice because it requires a computational workload that cannot be achieved with current technology in an acceptable amount of time. As a result, various strategies have been developed that simplify the computational task, thus ameliorating the challenge of computational intractability. For example, the Markov Chain Monte Carlo method takes the approach of intelligent sampling of particular subsets of parameter values. However, this approach is slow and inefficient, and thus only applicable if the position of the pattern of interest in the data set is first approximately defined and if the number of data sets to be analyzed is restricted. A similar situation exists for the specific case of fluorescent spot detection. In one known approach, the data set is scanned by eye, or by ad hoc criteria, to determine the presence of a spot, and a full Likelihood approach is then implemented on the limited portion of the data set that includes the spot to more stringently define the presence of a spot and to provide its precise and accurate localization. See Sage, Daniel, et al.“Quantitative evaluation of software packages for single-molecule localization microscopy,” Nature methods 12.8 (2015): 717-724. This method cannot be applied to low- SNR data sets because, in such regimes, a spot may be missed by visual or“ad-hoc” criteria and thus may not actually be evaluated by the Likelihood approach. In other words, initial inspection of a data set through visual inspection or ad hoc criteria is particularly ineffective in low-SNR data sets because of an unacceptably high chance of false negatives. A general, computationally -tractable algorithm for robustly detecting and precisely localizing a pattern of interest in a large, noisy, low-SNR data set (e.g., a data set generated by 3D time lapse analysis of a fluorescent spot with imaging at frequent intervals for long time periods) does not currently exist.

[0021] The two-stage Likelihood pipeline modifies the Likelihood approach in order to both detect and localize a pattern of interest (e.g., in a large data set) in a computationally - efficient (and therefore feasible) manner. The two-stage Likelihood pipeline may provide robust spot detection and may enable both accurate (e.g., super-resolution) localization and precise localization and quantitation of detected spots. Moreover, the two-stage Likelihood pipeline may enable pattern detection and localization in low-SNR data sets.

[0022] When applied to imaging of fluorescent bodies, the two-stage Likelihood pipeline enables the collection of data at lower excitation energy relative to known methods, and therefore enables imaging and following of fluorescent bodies over longer periods of time (through capture of more images at more frequent intervals) than known methods.

[0023] In some embodiments described in this disclosure, the pattern of interest is a 3D photon distribution produced by a fluorescent point source. Along with the fluorescent point source of interest, photons also emanate from other sources in the measured environment; these photons comprise“background.” In embodiments, the data set may be a digitized set of images of the pattern of interest and the background. The images may be captured in 3D in a vertical series of planar sections. Each image in such a data set may represent the output of the detection units (i.e., pixels) of a camera and the camera may be a part of a microscope, in an embodiment. Photons impinging on these pixels from both the spot and the background are converted to electrons and then to analog-to-digital units (“ADUs”).

[0024] Where the pattern of interest is a 3D photon distribution produced by a fluorescent point source, noise in the data set may arise from two sources: (1)“photon noise,” which comprises fluctuations in the number of photons impinging on each pixel per time unit (i.e. image capture time), from both spot and background sources; and (2)“measurement noise,” which arises during the conversion of photons to ADUs. In the case of digital imaging, this measurement noise may vary from pixel-to-pixel due to mechanical defects of the detector, (e.g. broken or unreliable pixels). Additionally, edge effects may occur when the fluorescent point source is located within the data set, but so near an edge that the image does not capture the entire“spot” or when the point source is located outside of the data set, with only part of the corresponding spot located within the imaged data set. [0025] To provide context for the two-stage Likelihood pipeline, the instant disclosure will first provide a brief description of Likelihood analysis. Next, the instant disclosure will provide a description of the two-stage Likelihood pipeline. The instant disclosure will then discuss three novel additions to the two-stage Likelihood pipeline: (i) calculation and use of a false positive rate, or a related metric such as true positive rate, false negative rate, or true negative rate, for a Likelihood ratio; (ii) calculation and use of a standard error associated with one or more values of a maximum likelihood estimate or likelihood value; and (iii) calculation and use of a multi-factor landscape value. The instant disclosure will then describe example methods that apply the two-stage Likelihood pipeline, and the novel additions thereto, to the detection and localization of fluorescent spots in images of biological bodies.

[0026] Brief Description of Likelihood Analysis.

[0027] As noted above, the Likelihood approach may be applied to characterize (i.e., identify, define the position of, and determine an amplitude for) a pattern of interest in a data set. In one form, the Likelihood approach compares two hypotheses to each other. First, the “signal hypothesis” hypothesizes that both the pattern of interest and the background are represented in the data set and therefore mathematically models the data set as a sum of the pattern of interest and the background. An example model of the signal hypothesis is given in equation (1) below:

signal hypothesis º fA, B, pos_A, pos_B) = Affposf₎ + B g_L(pos_B)

(Eq. 1)

where A_L is the mean value of the data set for index /. pos_A is a position (e.g., (x, y, z) in an embodiment in which the data set exists in a three-dimensional Cartesian coordinate system) respective of the pattern of interest (e.g., the center of the pattern of interest), f_L is the distribution function of the pattern of interest at point /. A is the amplitude of the pattern of interest, g_L is the distribution function of the background pattern at point /. B is the amplitude of the background pattern, and pos_B is a position respective of the background (e.g, the center of the background pattern). The signal hypothesis supposes that the mean value A_L at a point i is the sum of the pattern of interest at that point and the background pattern at that point.

[0028] For the mathematical model of the signal hypothesis, the background is itself a pattern, but one that comprises“interference” for the pattern of interest. Accordingly, in this disclosure, both a“pattern of interest” and a“background pattern” may be referenced. When used in isolation, the term“pattern” in this disclosure refers to the pattern of interest, not the background pattern.

[0029] The second hypothesis, the“null hypothesis,” hypothesizes that the pattern of interest is not present in the data set, only the background pattern, and can be conceptualized according to equation (2) below with terms defined as described above:

null hypothesis º AJB, pos_B) = Bg_L pos_B)

(Eq. 2)

[0030] The Likelihood approach compares the signal hypothesis with the null hypotheses to determine which has a higher likelihood of describing the data set. The more accurate the signal hypothesis is (i.e., the higher the likelihood associated with the signal hypothesis), and the less accurate the null hypothesis is (i.e., the lower the likelihood associated with the null hypothesis), the more likely it is that the pattern of interest is present in the data set. As will be described in further detail below, a comparison of the likelihoods of the two hypotheses may be referred to as the“Likelihood ratio” of the given data set.

[0031] Although the instant disclosure may refer to the use of the signal hypothesis and the null hypothesis with respect to a single pattern of interest and a single background pattern, a person of skill in the art will appreciate that the teachings of the instant disclosure may readily be extended to more than one pattern of interest, such as two overlapping fluorescent spots of different emission wavelengths, for example only, and/or more than one background pattern.

[0032] For example, the instant disclosure may also make reference to the use of the signal hypothesis and the null hypothesis with respect to multiple patterns of interest, collected with multiple data collection regimes, and resulting in multiple discrete subsets of a data set. Multi-spectral imaging is one example of the use of multiple data capture regimes that may result in a data set having multiple discrete subsets. Multi-spectral imaging may be used, for example, for differential detection of the positions and dynamics of different molecular entities. In such a context, multi-spectral imaging may involve labeling different entities with fluorophores of different fluorescent colors— e.g., having different excitation wavelengths and/or different output wavelengths— to separately locate and/or track those entities. Such fluorophores may be imaged by: (i) providing, for each fluorophore color, a respective excitation signal (e.g., excitation light of a wavelength known to cause output by the fluorophore), which is achieved by an appropriate combination of a light source and a suitable wavelength filter; (ii) filtering the overall light output of the sample with a filter, for each fluorophore, that passes the output light wavelength of the fluorophore to the detector; and (iii) detecting and recording the photons of the filtered output. The use of a given excitation wavelength and filtered collection of the corresponding output wavelength may comprise a data collection regime specific to a given color of fluorophore. The data for different colors may be collected separately, with the collected data for a given color and regime resulting in a subset of a data set. The complete data set, therefore, may include respective subsets corresponding to the respective colors and/or regimes. [0033] In multi-spectral fluorescence imaging, the fluorophore output collected for a given data collection regime primarily comprises light output by fluorophores specifically targeted by the data collection regime. However, it is common to encounter output from fluorophores not targeted by the data collection regime— /. e.. fluorophores of a different color— in the corresponding data subset. In other words, for a sample comprising“red” and“green” fluorophores, the data respective of the red regime may include some output from the green fluorophores, and vice-versa. This feature is known in multi-spectral fluorescence imaging as “spectral bleedthrough.” In low-SNR conditions, such bleedthrough may increase the chances of false positive detection of a pattern of interest. In addition, because of bleedthrough, data corresponding to a given color fluorophore may be spread across the data subsets from different data collection regimes, and thus multiple subsets can be used to define the presence and positions of a single fluorophore color spot. Unlike known fluorescence imaging methods and procedures, the pattern detection and localization algorithm described herein takes advantage of this extra information in multi-spectral fluorescence imaging.

[0034] More generally, different patterns of interest that are collected using separate data collection regimes, resulting in separate data subsets, may be present to some degree in one or more of those data subsets. The degree to which a given pattern of interest is present in a data subset corresponding to a discrete data collection regime may be referred to herein as “penetrance.” For example,“bleedthrough” of a fluorophore’ s output into a different color regime may be considered penetrance of that fluorophore’ s output into the other color regime. The fluorophore may also have penetrance in its own regime.

[0035] As also discussed above, one or more components of the system of interest as manifested in a data set may be characterized by noise (e.g., (i) measurement noise, (ii) pattern noise, and/or (iii) background noise). Measurement noise may result from the processes involved in detecting or measuring the pattern of interest and background and/or in converting such entities to numerical form. In the case of an image, measurement noise may include, e.g., noise intrinsic to the image capture device. Pattern noise may be an intrinsic feature of a pattern to be detected. For example, in an image of a fluorophore, the pattern of interest may be noisy because of quantum variation in the number of photons emitted by the fluorophore (so-called“quantum noise”). Background noise may be an intrinsic feature of the background pattern. In an image of a fluorophore, the background pattern may include light emissions from the measurement environment and thus may also be characterized by quantum variation.

[0036] Where the pattern of interest is a fluorescent spot, both pattern noise and background noise may be described by a Poisson distribution. As will be described in greater detail later in this disclosure, these two noise effects (i.e. the effects of the fluctuations in the two different noise sources) may be additive. Accordingly, pattern noise and background noise may be jointly represented by a single Poisson distribution (as the sum of two Poisson distributions is a single new Poisson distribution). Measurement noise may be described by a Gaussian distribution with a mean and a variance that are pixel-specific parameters.

[0037] A Likelihood analysis includes respective functions for both the signal hypothesis and the null hypothesis, which may be solved to determine optimal values of A, B, pos_A. and pos_B (i.e., the parameter values that best describe the data set), as well as the likelihoods associated with those optimal values. It should be noted that the signal hypothesis depends on A, B, pos_A. and pos_B (or the multi-pattern, multi-regime equivalent of those values), and the null hypothesis depends only B and pos_B (or the multi-pattern, multi-regime equivalent of those values). This process— determining optimal values for the parameters of the hypotheses and the likelihoods associated with those values— may be carried out independently for, first, a function corresponding to the signal hypothesis and, second, a function corresponding to the noise hypothesis. These functions are known as“Likelihood functions.” Each Likelihood function describes the likelihood that a given data set arose from the model that implements the corresponding hypothesis (e.g., signal or null) as a function of the parameters of the corresponding model. The Likelihood function defines the likelihood (£) that given data (e.g., a data point d_j ) arose from a particular model (signal or null) and is proportional (k_j) to the probability ( P ) of observing the data given the values of the parameters of the model. Equation (3) below sets forth the general form of a Likelihood function:

(Eq. 3)

Proportionality constant k_j s a data dependent constant, which means that each given dataset j is assigned its own constant k_j in the Likelihood function. Since this constant is a data dependent term, different Likelihood functions, e.g. from different hypotheses, that share the same data will also share the same constant. Thus, for a Likelihood ratio based on two different hypotheses operating on the same dataset, this same constant will be present in both the numerator and the denominator and will cancel out.

[0038] The Likelihood analysis, and functions involved in the analysis, will be described in this disclosure with reference to a data set d, in which each data point d_j is an «-dimensional value having the form given in equation (4) below:

(Eq. 4)

[0039] In an example form of a Likelihood analysis, the Likelihood function for the signal hypothesis is given and expanded in equation (5) below:

(Eq. 5)

[0040] The Likelihood function for the signal hypothesis can be solved to identify the values of the parameters (A. B, pos_A. and pos_B) that give the best fit of the model to the data set of interest d and to define the likelihood that the values d_j in the data set of interest d would occur according to the model, given those parameter values. This exercise comprises a “Maximum Likelihood Estimation.” After solving the above equation (5) (an example process for which is provided below), the identified optimal parameter values (A, B, pos_A. and posg ) comprise the“Maximum Likelihood Estimate” (MLE). The Likelihood value (LV) at the MLE is related to how probable it is to see the given data d_j for the given model having parameters (A, B, pos_A. and pos_B).

[0041] In the same example Likelihood analysis, the Likelihood function for the null hypothesis can be represented in the same general form as equation (5) above and solved to find the optimal values of B and pos_B and the value of its corresponding likelihood.

[0042] Once the Likelihood functions for the signal hypothesis and the null hypothesis are solved, the Likelihood values associated with the two hypotheses can be compared with each other in a ratio. The ratio of the two Likelihood values may be referred to as the“Likelihood Ratio.” The Likelihood Ratio provides a quantitative measure of the relative probabilities that the experimental data set would have arisen according to the signal hypothesis model or the null hypothesis model. Because the only difference between the two hypotheses is the presence or absence of the pattern of interest, the Likelihood ratio gives a measure of the probability that the pattern of interest is present in the data set. [0043] A threshold can be defined for the Likelihood ratio to define the minimum value that may be considered to define the presence of the pattern of interest, with the value of the threshold at the discretion of the user. In an embodiment, the Likelihood Ratio threshold can be defined through experimentation to determine an appropriate threshold that results in an accurate determination of the presence of the pattern of interest. In embodiments, the threshold may be applied directly to the numerical form of the Likelihood Ratio. In other embodiments, the threshold may be applied to a mathematical manipulation of the Likelihood Ratio, which is also considered application of the threshold to the Likelihood Ratio for the purposes of this disclosure. Such mathematical manipulations may include, for example, defining local maxima through H-dome transformation or other local maxima approaches, all of which will be understood by a person of skill in the art.

[0044] If the Likelihood ratio indicates that the pattern of interest is present, the parameter values of the signal hypothesis model (i.e.. the amplitude (A) and position (pos_A) of the pattern of interest and the amplitude ( B ) and position (pos_B) of the background) define the optimally -likely values of those parameters for that pattern of interest and background.

[0045] The Likelihood Approach, via determination of a Likelihood ratio, is a powerful tool to analyze patterns in data sets characterized by high background intensities and high levels of system noise from all sources, relative to the intensity of the pattern of interest.

[0046] Part of the power of likelihood analysis derives from the fact that it incorporates and accommodates different sources of information available in the data set. The Likelihood functions based on the signal hypothesis model and the null hypothesis model take into account not only the nature of the pattern of interest and the background pattern (e.g., for the case of a fluorescent spot, a 3D Gaussian photon distribution and, for example, a constant average level of photons that emanate from sources other than the spot, across the data set) but also the fact that the values present in any individual data set will fluctuate from one sampling of the system to another. This fluctuation comprises“noise.”

[0047] To restate the above point in another way: in principle, the value of the datum predicted to occur at each position in the data set could be a specific number that would be the same in every sampling of the system (e.g., every fluorescence imaging data set of a particular sample). However, if there is noise in the system, from any or all of the above- noted sources, that predicted value fluctuates as described by some appropriate statistical probability distribution(s) (e.g. Poisson, Gaussian, or an empirically-determined distribution), and the probability that the observed datum will be any particular value is predicted by that distribution. Conversely, given such a distribution(s), the probability that the data value actually observed would have arisen from the relevant model at the defined parameter values can be specifically defined. A unique advantage of the Likelihood approach is that it incorporates such“noise distributions” and thus can consider the effects of noise fluctuations on the probability that a given datum in a data set will a particular value.

[0048] Modeling Fluorescence Spot Image Data for Likelihood Analysis Problems With

Full Likelihood Analysis.

[0049] Likelihood analysis can find particular use in the identification and characterization of fluorescent spots in images of biological bodies. To apply a likelihood analysis for such a case, sources of noise may be modeled in the Likelihood functions for a signal hypothesis model and for a null hypothesis model. For the particular case of fluorescence spot image data, the following considerations apply.“Photon noise” (which is characteristic of both the pattern of interest and background pattern) can be described by a Poisson distribution whose mean value is the average number of photons impinging on a given pixel (L_έ). This Poisson distribution pertains to either (i) photons from both the pattern of interest and the background pattern (in the signal hypothesis) or (ii) photons from the background pattern only (in the null hypothesis). Equation (6) below sets forth an example probability function setting forth the probability of observing a given pixel Y_L (/. e. , the combined contributions of noise from the pattern of interest and the background pattern) in a Poisson distribution parameterized by the mean number of photons _L:

(Eq. 6)

[0050] Another source of noise in fluorescence spot image data is (iii) measurement noise. In a particular image, a respective number of photons impinges on each pixel and is converted by the camera to some number of electrons, each of which is stored as an“analog- to-digital unit” (ADU). This conversion process is characterized by noise. This so-called “camera noise” (a form of measurement noise) may be described by a Gaussian distribution whose variance is given by the read noise of the particular pixel and whose mean value is given by the magnitude of the user-selected or system-selected“offset” used to eliminate negative ADU values. Equation (7) below sets forth an example probability function setting forth the probability of observing a given data set X_L (i.e.. measurement noise) in a Gaussian distribution parameterized by the read noise variance of the camera (a ) and a mean offset (m_έ) used to eliminate negative ADU values:

(Eq. 7)

[0051] The two types of noise, i.e. photon noise (from the pattem(s) of interest and/or the background pattem(s)) and camera noise may be independent from each other.

Consequently, their effects may add to give the distribution that describes the fluctuation in

ADUs. Mathematically, this additive distribution therefore may be described by convolving the distributions. In the particular case described above, this implies convolving the Poisson distribution describing the photon noise with the Gaussian distribution describing the camera noise. Equation (8) below illustrates that convolution:

(Eq. 8)

[0052] Given the distribution of equation (8) above, the probability of occurrence of a particular number of ADUs present at a given point in the data set (in the case of 3D fluorescence images, at a given pixel or voxel) can be determined. The mean value of the Poisson distribution may be defined by the predicted photon level value at that pixel, whereas the camera noise Gaussian distribution may be defined by empirical calibration

measurements for each pixel in a particular camera/imaging setup.

[0053] The standard Likelihood approach (including noise modeled according to equation (8) above) cannot practically be applied if the data set is very large, if the models involve multiple parameters, and/or for complex spatial and/or numerical distributions of noise, because it would be computationally intractable. This intractability arises from the computational complexity required to determine an MLE for the signal hypothesis (or any hypotheses) in those situations.

[0054] The computational intractability of a full Likelihood analysis applies to fluorescent spot analysis— specifically, to initial detection of fluorescent spots in image data. As a result, existing approaches include the use of ad hoc computational criteria and/or human visual inspection of images to initially identify the presence of a spot. After identifying the region of the data set containing a spot, existing approaches then may use a full Likelihood approach to determine the specific location of the spot. For example, some existing approaches use an iterative“hill-climbing” exercise that defines the MLE for a small region of the data that is defined as containing a spot based on ad hoc computational criteria and/or human visual examination. A hill-climbing exercise starts from a particular position in parameter space of the Likelihood function, evaluates the slope of the Likelihood function at that point, and follows that slope in the upward direction in parameter space to find a maximum Likelihood value at the position where the derivative of the Likelihood function is zero.

[0055] A hill-climbing exercise cannot practically be used to initially identify a spot because, if the wrong starting point were chosen, the MLE may be determined for a local maximum in the data set that does not, in fact, represent a spot. The result would be loss of robustness, i.e. detection of a spot where none is present or failure to detect a spot when one is present, or detection of a spot with incorrect parameters specified. Thus, to robustly define the presence of a spot by a hill-climbing Likelihood approach, it would be necessary to carry out the hill-climbing exercise beginning from every position in the parameter set. The iterative nature of the hill-climbing approach is very computationally expensive, and the computational load required to carry out hill-climbing starting at every position is a parameter set is prohibitive. Thus, as noted above, known approaches utilize ad hoc criteria and/or human visual inspection detect the presence of a spot at an approximate position in the data set.

[0056] As an improvement on the above approaches— e.g., a full, single-stage Likelihood analysis to identify and locate a pattern of interest, or a manual or computational

identification of a pattern through ad-hoc criteria followed by a Likelihood analysis to localize and further characterize the pattern of interest— the two-stage Likelihood pipeline may be used to take advantage of the benefits of a Likelihood analysis while reducing computational workload to a practical level. In embodiment, the two-stage pipeline may be modified according to the additions of this disclosure. In this conceptual framework, the Likelihood approach, or a version thereof, may be used at both stages of the analysis. This contrasts with other known methods, in which the Likelihood approach, when applied, is reserved for the final stage, thus losing out on its potential benefits during initial stages.

[0057] Improvement to a Full Likelihood Analysis Two Stage Likelihood Pipeline.

[0058] The two-stage Likelihood pipeline includes two stages, described in turn below. First, in Stage I, a modified Likelihood analysis may be performed to determine the presence of zero, one or more spot candidate locations. Second, in Stage II, a full Likelihood analysis may be carried out at one or more of the spot candidate locations identified in Stage I to verify the presence of (i.e., to formally detect) and to localize and determine the amplitude of spots at the candidate locations. Either stage, or both stages, of the pipeline may be modified with one or more of the novel advances of the instant disclosure, namely: (i) calculation and use of a false positive rate respective of a Likelihood ratio; (ii) calculation and use of a standard error of one or more values of a maximum likelihood estimate or a likelihood value, and the use of that standard error; and (iii) calculation and use of a multi-factor likelihood landscape.

[0059] Two-Stage Likelihood Pipeline - Stage I.

[0060] In a pipeline analysis, the full data set may be analyzed in small units that may be referred to in this disclosure as“patches” or“segments.” In an embodiment, the size of a patch or segment may be slightly larger than that expected for a fluorescent spot. In embodiments in which the pattern of interest is something other than a fluorescent spot, patch size may be selected as appropriate. In embodiments in which each data point is a pixel or voxel, a patch or segment may be as small as a single pixel or voxel, or may be a set of pixels or voxels that contain a pattern of interest. In embodiments, a patch or segment may include a contiguous set of adjacent pixels or voxels. Additionally or alternatively, a patch or segment may include pixels or voxels that are non-contiguous or non-adjacent. Patches may be defined for every position in the data set. In an embodiment, every data point in the data set may be included in at least one patch. A Likelihood-type analysis may be carried out for each patch with three simplifying modifications relative to a full Likelihood analysis: (i) the 3D Gaussian distribution corresponding to the pattern of interest (i.e.. the spot) and the distribution corresponding to the background may be assumed to be positioned at some respective specified positions within the patch; (ii) the measurement noise may be described as a Poisson distribution, rather than as a Gaussian distribution; and (iii) the intensity of the mean photon values may be assumed to be low relative to measurement noise. This disclosure will discuss embodiments in which the specified location of the pattern of interest and/or the background pattern within a patch or segment is the center of the patch or segment. However, the two-stage Likelihood pipeline is not so limited. Rather, it should be understood that the specified position of the pattern of interest within a patch or segment may

alternatively be some position other than the center, in embodiments.

[0061] A signal hypothesis model (incorporating both the pattern of interest and background) and a null hypothesis model (incorporating only background) based on the above assumptions offer two simplifications over a standard, full Likelihood approach as described above. First, the number of variable parameters is decreased because the positions of the pattern of interest and of the background pattern are specified, rather than variable. Thus, in the signal hypothesis model, there are only two variable parameters, A and B. For the same reason, in the null hypothesis model, there is just one variable parameter, B. As a result, at stage I, example signal hypothesis and null hypothesis models may be described according to equations (9) and (10) below:

signal hypothesis

(Eq. 9)

null hypothesis º T;(B) = B (Eq. 10)

where ceri represents a specified position at the center of the patch or segment.

[0062] For a fluorescent spot existing in a three-dimensional data set, /; is a 3D Gaussian distribution, corresponding to the distribution of photons emanating from a fluorescent point source, A is the amplitude of that Gaussian distribution, pos (in x, y, z) defines the position of the center of the Gaussian within the 3D data set, and B is the amplitude of the background.

[0063] A second simplification offered by the above assumptions is that, because the effects of noise can be reasonably approximated by describing the camera noise as a Poisson distribution, all noise components (i.e., (i) photon noise for the pattern and/or background and (ii) camera noise can be incorporated into a single Poisson distribution that is the sum of the two Poisson components, as set forth in the progression of equations (11) - (14) below:

(Eq. 13)

Pp_oisspoissid^i = O, s , l ) = Poisson{af + l ) [άi + of]

(Eq. 14)

[0064] Equation (14), stated as a probability function, can also be described as a

Likelihood function, as in equation (15) below:

(Eq. 15) [0065] Taking the logarithm of equation (15) yields equation (16) below:

(Eq. 16)

[0066] The derivative of equation (16) with respect to L_έ, which may be used to derive the MLE for (A, B ), as described below, is shown in equation (17) below:

(Eq. 17)

[0067] To derive the approximate MLE for (A. B ) at a given pos from equation (16), the following process may be followed, in an embodiment: (1) assume the intensity L_έ (i.e.. the data representing the pattern of interest) is low relative to the camera noise af , such that the additive data and camera noise, represented as af + L_έ, can be reduced to af (as in equation (18) below):

(Eq. 18)

(2) take the derivatives of the B_PoissPoiss with respect to A. B, and (3) set each derivative as equal to zero (as in equations (19) and (20) below, which represent the A, B derivatives of equation (18) as applied to the signal hypothesis model):

The summations can be treated as scalars, and thus {A, B} can be solved by matrix inversion. Equations (21) and (22) below illustrate this derivation as applied to equations (19) and (20), respectively, and equation (23) below illustrates the setup of the matrix for inversion:

[0068] Equation (23) above may be solved for _^4 and 5 to derive the stage I MLE of those parameters for a single patch. To extend the solution to the entire data set, the dot product of, e.g., equation (21) (å _{=1 ma}y be generalized to convolution for the entire data set,

fk(cen)

yielding å_fceS to complete the algebra for total dataset analysis by the approximate

MLE of Stage I. If f_k is not symmetric, then the orientation may be flipped in all dimensions to solve equation (23), as set forth in equation (24) below:

(Eq. 24)

[0069] Computational tractability may be provided through the use of a Fast Fourier Transform analog of the convolution operator, or for the case of a 3D Gaussian, by separable convolution, in embodiments. [0070] Solving equation (23) for A gives the approximate MLE of A_L (with the assumption that the pattern of interest is centered at point /) for the signal hypothesis, as given by equation (25) below:

(Eq. 25)

[0071] Solving equation (23) for B gives the approximate MLE of B_L (once again, with the assumption that the pattern of interest is centered at point /) for the signal hypothesis, as given by equation (26) below:

(Eq. 26)

[0072] Equations (24) and (25) above may be solved for A and B to determine the MLE for the signal hypothesis at a single patch, or segment, of the data set. Thus, equations (25) and (26) may be applied to all patches in the dataset, solving for {A^ B_L \ for all positions i and assuming that the spot pattern is centered in the patch at position cen. As noted above, in other embodiments, the spot pattern may be assumed to be centered or located at some point in each patch or segment other than the center.

[0073] To determine the approximate MLE for the null hypothesis model, a similar analysis to that set forth with respect to equations (19) - (23) above may be carried out, but with equation (18) applied to the null hypothesis model instead of the signal hypothesis model, yielding the following equation (27) for the approximate MLE of B_L for the null hypothesis model:

(Eq. 27)

[0074] For each given patch, the MLEs for the signal hypothesis model and the null hypothesis model may be calculated and used to determine the Likelihood ratio for that patch and a resulting Likelihood ratio landscape with equations (28) - (30) below, where equation (28) is the result of integrating equation (18), equation (29) is the application of equation (28) to the signal hypothesis model, and equation (30) is the application of equation (28) to the null hypothesis:

(Eq. 30)

[0075] In an embodiment, a Likelihood ratio based on equations (29) and (30) is given by equation (31) below:

[0076] Likelihood ratios may be determined for each patch, or segment, in the data set according to the equations above. In the case of fluorescence image data, where a patch is present centered on each pixel or voxel in the data set, a Likelihood ratio may be determined for each such position. The Likelihood ratios for each segment patch may be compared to a threshold such that, above a minimum value of the Likelihood ratio, a patch is determined to have a reasonable probability of including a spot. These patches may be termed“candidate spot regions,”“candidate spot patches,” or“candidate spot segments” in this disclosure. The threshold may be selected or determined experimentally, in embodiments, to achieve an appropriate balance between sensitivity and over-inclusiveness (i.e., to minimize false positives and false negatives). Additionally or alternatively, one or more other values may be compared to a threshold, such as a false positive rate calculated according to the present disclosure, or a multi-factor landscape value calculated according to the present disclosure.

[0077] The stage I analysis set forth above is set forth with respect to identification of fluorescent spots in a data set comprising 3D image data, but a person of skill in the art will appreciate that the approach is readily applicable to many other types of patterns and backgrounds in many other types of data sets. To that end, the approximate MLE of A_L for the signal hypothesis model for a general case is given in equation (32) below:

(Eq. 32)

[0078] The approximate MLE of B; for the signal hypothesis model for a general case is given in equation (33) below:

(Eq. 33)

[0079] As a result of the above analysis, determination of MLEs for all patches in a large data set, for each model, and the corresponding Likelihood Ratios, are computationally tractable. For example, in an embodiment given present-day computing resources, the determination of MLEs for all patches in an example fluorescence image data set may take approximately a minute per large 3D image data set.

[0080] In addition, although the instant disclosure discusses embodiments in which the signal hypothesis includes one linear parameter (A) associated with the signal pattern and one linear parameter ( B ) associated with the background pattern, the techniques and methods of this disclosure may be readily applied to any number of linear parameters associated with the signal pattern and/or the background pattern. In the general case above, for the

signal hypothesis º li(A, B, pos_A, pos_B) = Af_L(pos_A) + Bg_L(pos_B). the parameters A and B are linearly related to X_L and for a given data set and specified positions, A and B may be solved algebraically by the using the notation described in equation (23). Consequently, all linear parameters for any given hypothesis can be solved for any n number of patterns since these linear parameters can be manipulated into the matrix notation described in equation

(23). Ultimately, for a hypothesis º a₁f₁ ⁽ c ^') + a₂f₂ (x^) d - f < /_nC ) a set of amplitude parameters {a_t, a₂, ... a_n } can be solved directly by the approximate Likelihood for each pattern {f , f₂, ... , f_n } given each position { , x , ... , ¾} for a given data set.

[0081] In embodiments, the result of Stage I of the two-stage Likelihood pipeline is definition of one or more candidate patches or segments. As set forth below, those candidate patches or segments may then be subjected to a full Likelihood analysis to characterize (i.e.. confirm (or not) the existence of, and determine the intensity and exact location of) the corresponding patterns of interest (i.e.. spots) that may be present in those candidate patches or segments.

[0082] Stage I analysis has another important feature: it can be used to reverse systematic distortions that arise during data collection. This is achieved if, during Stage I analysis, via the signal hypothesis, the pattern of interest fi is chosen so as to match the shape of the systematic distortion. This can be illustrated for the case of image analysis, where an imaged object is systematically distorted by physical limitations of the optics. Correction for such distortion is called deconvolution. For the application of fluorescent imaging, the systematic distortion for the imaging system involved is defined by the Point Spread Function of the lens. By Stage I analysis, via the signal hypothesis, deconvolution of a fluorescence image may be achieved by defining as the Point Spread Function of the lens.

[0083] It should also be noted that, in fluorescence imaging, Stage I analysis can be applied not only to a spot, but to a fluorescent object with a more complex shape ( e.g ., a bacterial nucleoid). Such an object may be illuminated by many individual fluorophores and thus may comprise the superposition of many spots. Such an image comprises the summed output of the large number of point sources decorating that complex object, i.e. is one continuum of different spot densities. Upon application of the Stage I signal hypothesis, the output of the parameter A at all positions in the data set may provide an image of such an object. Moreover, when this exercise is carried out with f, equal to the distortion (i.e. the Point Spread Function), the output of the parameter A at all positions in the data set may provide the true, undistorted version of the object.

[0084] Two-Stage Likelihood Pipeline - Stage II.

[0085] An example Likelihood function for the signal hypothesis model having a full noise model for a fluorescent image data set is set forth in equation (34) below:

k_j P?₌₁₍ Gaussian(pi, af) * Poisson(Ai(A, B, pos)))[di]

(Eq. 34)

[0086] A person of skill in the art will appreciate how to construct a similar full Likelihood function for the null hypothesis model, as the Likelihood approach is well documented and understood. Similarly, a person of skill in the art will appreciate how to solve for A and B and pos to determine the Likelihood ratio respective of each candidate segment or patch as a function of the values of those parameters.

[0087] The data set at stage II comprises the data from the original data set d that is within or around the candidate spot segments or patches. A Likelihood ratio may be determined for each candidate spot segment, and each ratio may be compared to a second threshold. This second threshold may be separately determined or selected from the first threshold, in embodiments. This Likelihood ratio may provide the final definition of whether a spot is present or not.

[0088] The full Likelihood analysis at Stage II may differ from the approximated

Likelihood analysis off Stage I. First, the Likelihood functions used in stage II may be fully- detailed Likelihood functions, thus providing optimal definition of MLEs in Stage II. For example, in the case of spot detection and characterization, the noise per pixel may be represented as a (Poisson*Gaussian) distribution. Second, the values of x, y, and z (i.e.. the components of pos ) may vary throughout the analyzed region in Stage II, rather than being fixed at the center of a patch as in Stage 1. As a result of these two features, for a region defined as containing a spot (i.e.. a candidate segment), analysis of the MLE for the signal hypothesis model for that region will yield not only the values of parameters A and B ( i . e. , intensities of the spot itself and of the background), but also the position of the spot in the three dimensions at sub-pixel values of x, y and z. MLE determinations may be made in Stage II through a hill-climbing exercise or other appropriate methods, in embodiments.

[0089] As noted above, a full Likelihood analysis for fluorescent spot detection is limited in known methods by the need to initially identify candidate spot locations manually and/or by ad-hoc manual or computational criteria. The two-stage Likelihood pipeline overcomes this limitation through the use of minimally -invasive approximations of the full Likelihood functions at Stage I. Furthermore, the estimates of A and B and x, y, z provided at Stage I may generally be similar to the precisely-defined global maxima provided by the fully detailed Likelihood analysis of Stage II. Thus, if a hill climbing exercise is applied in Stage II to each candidate spot region, it can be seeded by (i.e. begin with) the parameter values of the signal hypothesis and the null hypothesis defined at Stage I. For application according to the signal hypothesis, the starting point for this exercise may be provided by the values of A, B, x, y and z defined by the Stage I MLE according to that hypothesis. For application according to the null hypothesis, this starting point may be provided by the value of B defined by the Stage I MLE, where the value of B is the same at every position, thus removing x, y and z as variables. Seeding the hill-climbing exercise provides computational tractability without the risks of (i) climbing an irrelevant hill and thus detecting a spot where none is present, (ii) failing to detect a spot when one is present, or (iii) detecting of a spot with incorrect parameters specified. The outcome of the hill-climbing exercise is, for each hypothesis, a Likelihood landscape in the corresponding 5-parameter space (signal hypothesis) or 1 -parameter space (null hypothesis). In each case, the position in the parameter space with the highest Likelihood comprises the MLE; and the ratio of the Likelihood values at the MLEs for the two hypotheses comprises the Likelihood Ratio for that candidate spot region. The value of the Likelihood Ratio provides a measure of the probability that a spot is present; and the corresponding values of all parameters

corresponding to the MLE of the signal hypothesis yield the intensity of the spot (A) and of the background (B) and the location of the spot (in (x, y, z), which may be at sub-pixel resolution) in the data set. In summary, overall, the effects of combining Stage I and Stage II in the Two-stage Likelihood Pipeline confer the advantages of a full Likelihood approach with respect to robust spot detection, precise and accurate spot localization and quantification of spot and background intensities without the unmanageable computational complexity of the standard Likelihood approach.

[0090] Two Stage Likelihood Pipeline False Positive Rate

[0091] As noted above, one aspect of both Stage I and Stage II is comparing one or more values, such as one or more values in an MLE (approximate or full), or one or more values based on an MLE, such as a Likelihood value or Likelihood ratio, for a segment to a threshold and designating the segment as a candidate segment for the pattern of interest (Stage I) or as containing the pattern of interest (Stage II) when the value exceeds the threshold. For example, in some embodiments, a Likelihood ratio based on the difference between the likelihood value of the signal hypothesis and the likelihood value of the null hypothesis may be compared to a threshold.

[0092] In some data set types, an appropriate threshold for comparison to a Likelihood ratio or other MLE-based value may depend on the influence of noise on the amplitude and/or distribution of the background pattern. Accordingly, in some embodiments, a value that is less dependent on the background amplitude, and that is therefore simpler to set a threshold for, may be used. For example, as detailed below, a false positive rate, false negative rate, true positive rate, or true negative rate respective of the Likelihood ratio may be calculated and compared to a threshold to determine whether or not a segment is designated as a candidate segment for (Stage I) or as the location of (Stage II) the pattern of interest.

[0093] On example method of calculating a false positive rate for a Likelihood ratio may include, first, determining fluctuations in the Likelihood ratios of the segments of a data set over time, to determine the effects of changing amplitudes and distributions of noise on the signal of interest and on the background. In some embodiments, such fluctuations may be measured. In other embodiments, fluctuations in the Likelihood ratios of the segments of the data set may be simulated. In an embodiment, fluctuations may be simulated specifically in segments that do not include the pattern of interest, such that the effects of noise fluctuations on the background amplitude and distribution are simulated.

[0094] After Likelihood ratio fluctuations in the data set are determined, a probability density function may be fit to the Likelihood ratio fluctuations. One probability density function that may be used is a gamma function, for the reasons given below. Equation (16), which is used to solve for the values of the MLE at Stage I, is composed of a difference between two mean squared error components, with one mean squared error component corresponding to the signal hypothesis, and the other to the null hypothesis. That is, one component in equation (16) represents the squared error between the data and the best fit intensity model of the signal hypothesis. It is known that the distribution of a random variable that is composed of a squared error is given by the Chi-squared distribution, which is one specific parameterization within the Gamma distribution family. Thus, because equation (16) is composed of a difference of two squared error components, how the value of equation

(16) (i.e., L(hypothesis \ d_j)_pOLSsPoLSS) is distributed is determined by a difference of two Gamma random variables. For computational simplicity and the fact that a difference of two Gamma random variables may be another Gamma distribution, a single Gamma distribution may be used as the probability density function for determining a false positive ratio, in some embodiments.

[0095] Accordingly, in an embodiment, a gamma function may be fit to the Likelihood ratio fluctuations. The output of that probability density function (e.g., gamma function) may be predictive of the correlation between the Likelihood ratio and the existence of a pattern of interest as to the data set in question based on a given background value, thus allowing a false positive rate to be calculated. Accordingly, the false positive rate of the likelihood value may be defined according to equation (35) below:

FPR(LRatiOi) = 1— CDF {Gamma(a, b))[LRatioi]

(Eq. 35)

where FPR is the false positive rate, LRatio is the Likelihood ratio at point /. CDF indicates that the Gamma function is a cumulative distribution function, a is a shape or scale parameter, and b is a rate parameter.

[0096] The values of a and b can be calculated by plotting or fitting a function (e.g. a polynomial function) that relates the values of a and b to the different effects of changing amplitudes and distributions of noise on the pattern of interest and on the background. In an embodiment, only the background distributions may be changed. In such an embodiment, a may be a constant and b may be a function of the background intensity.

[0097] For a particular data set, a false positive rate of the likelihood value, based on a particular background that determines the rate parameter of the Gamma function, may be defined according to equation (36) below.

(Eq. 36)

[0098] As known in the art, a true negative rate may be calculated instead of or in addition to a false positive rate, based on the same underlying data and calculations.

[0099] In an embodiment, a false positive rate (e.g., calculated as set forth above) may be determined based on measured or otherwise known fluctuations in the distribution of the Likelihood Ratio for portions (e.g., segments) of the data set known not to contain a pattern of interest. In contrast, in an embodiment, a true positive rate (TPR) may be calculated based on fluctuations in the distribution of the Likelihood Ratio in portions of the data set (e.g., segments) that do include a pattern of interest. After Likelihood ratio fluctuations in the portions of the data set containing a pattern of interest are determined, a probability density function may be fit to the Likelihood ratio fluctuations to determine the true positive rate, in substantially the manner set forth above with respect to the false positive rate.

[0100] As known in the art, a false negative rate may be calculated instead of or in addition to a true positive rate, based on the same underlying data and calculations.

[0101] With an appropriately-defined formulation for the false positive rate according to equation (35) (generically) or (36) (specifically) above, the value of the false positive rate for a given segment can be compared to a threshold to determine if that segment should be designated as a pattem-of-interest candidate segment (Stage I) or as a location of a pattern of interest (Stage II). In addition, as noted above, one or more other values associated with an MLE of the segment, such as a Likelihood value or ratio, may also be compared to a threshold to determine if that segment should be designated as a pattem-of-interest candidate segment or is the location of a pattern of interest.

[0102] Two-Stage Likelihood Pipeline Multi-Factor Landscape [0103] The calculation of respective Likelihood ratios for a plurality of segments within a data set may result in a landscape of Likelihood ratios (which may be referred to herein as a “likelihood landscape”), the peaks of which are most likely to correspond with the locations of a pattem-of-interest. In some embodiments, such peaks may appear as“mountain tops”— that is, with the values in adjacent segments appearing as“slopes” up to a“peak” segment for each instance of the pattern of interest. In such embodiments, additional or alternative values or calculations may be considered to locate the peaks in the landscape. For example, as described below, a gradient (i.e., first derivative) of the likelihood landscape may be calculated and used. Additionally or alternatively, a Hessian (i.e., second derivative) of the likelihood landscape may be calculated and used.

[0104] An example formulation of the gradient of the likelihood landscape is provided in equation (37) below, which is a gradient equation of the likelihood landscape with respect to the position (x, y, z) of the pattern of interest f_L.

(Eq. 37)

where A, B, L, and s are as defined above, and u is the position of the segment in the data set. [0105] In some embodiments, a two-stage Likelihood pipeline analysis may be applied to a data set having multiple discrete subparts resulting from data capture with multiple discrete data capture regimes. As noted above, one example of a multi-regime data set may be an image or set of images containing fluorescent spots different respective colors (which may be considered different patterns of interest) captured using different imaging regimes (e.g., separate excitation wavelengths and separate wavelength filters for data collection). In such multi-regime data sets, a given pattern of interest may have output in multiple data subsets, i.e., may have penetrance in multiple data subsets. A signal hypothesis for a multi-regime, multi-pattern data set may include the sum of the various patterns of interest in the various data capture regimes, appropriately weighted by penetrance coefficients, as shown in equation (38) below.

two pattern, two regime signal hypothesis = b₁₁l_{1 ί} + b_ΐ2^·2_,ί> bii^-i_,ί + bii^-i_,ί

(Eq. 38)

where L_{1 έ} and A_2,i ^are the output of the first and second pattern intensity signals, respectively, at data point index i.

[0106] A logarithm of the likelihood value for the two pattern, two regime signal hypothesis may take the form given in equation 39 below.

(Eq. 39)

L(two signal hypothesis |d)

[0107] An example formulation of the gradient of the likelihood landscape at a segment for an example two-regime, two pattern data set is given in equation 40 below.

(Eq. 40)

where cl₁ is the data respective of the first data subset corresponding to the first data capture regime, A_x is the measured intensity of the pattern of interest intended to be captured in the first regime, and 9_t is the parameter with respect to which the derivative is taken (e.g., position of the pattern of interest), d₂ is the data respective of the second data subset corresponding to the first data capture regime, A₂ is the measured intensity of the second pattern of interest intended to be captured in the second regime, and B_L and B_j are the amplitudes of the background at points i and j, respectively.

[0108] The gradient of the likelihood landscape with respect to 9_t in equation (40) above is a weighted sum of the gradients in each regime (where B_L å

is the gradient in the

first regime, is the gradient in the second regime) and each regime

gradient is a dot product between

and

[0109] In a two-regime data set (e.g. , having two discrete subparts respectively associated with the two regimes), the Hessian of the likelihood landscape is similarly a weighted sum of the values in the two regimes, as shown in equation (41) below.

(Eq. 41) where 9_m is the parameter with respect to which the second derivative is taken. In an embodiment, both q_Ίh and 9_t may be the position of the pattern of interest. In another embodiment, 9_m and 9_t may be set to another parameter, and/or may be set to different parameters from each other.

[0110] In a single-regime data set, on the other hand, the Hessian of the likelihood landscape may be given by equation (42) below.

(Eq. 42)

[0111] In some embodiments, one or more of these additional values or calculations (i.e., gradient and/or Hessian) may be combined with Likelihood ratios at the various data points in a data set to create a“multi-factor landscape” that may provide more precise localization of patterns than the likelihood landscape alone. For example, in an embodiment, the Hessian, gradient, and Likelihood ratio at each segment may be arithmetically combined (e.g., multiplied together) to create a three-factor landscape value for that segment. The three- factor landscape value for a given segment may be compared to a threshold to determine if that segment should be designated as a pattem-of-interest candidate segment (Stage I) or as the location of the pattern of interest (Stage II). [0112] In some embodiments, instead of or in addition to a three-factor landscape, a two- factor landscape may be calculated. In such embodiments, two of the Hessian, gradient, and Likelihood ratio at each segment may be combined (e.g. , arithmetically, such as by multiplication) to create a two-factor landscape value for that segment. The two-factor landscape value for a given segment may be compared to a threshold to determine if that segment should be designated as a pattem-of-interest candidate segment.

[0113] In some embodiments, a Likelihood ratio raised to a power greater than one may correlate strongly with a multi-factor landscape value, and thus may provide more precise localization of a pattern than the likelihood value itself. Accordingly, in some embodiments, the likelihood ratio for a particular segment may be raised to a power greater than one to calculate a scaled likelihood ratio, and the scaled likelihood ratio may be compared to a threshold to determine if that segment should be designated as a pattem-of-interest candidate segment. In an embodiment, the power may be three (3).

[0114] In addition to calculating a multi-factor landscape value, the gradient and Hessian may also find use in hill-climbing. As noted above in the description of Stage II, a hill- climbing exercise may be conducted in Stage II to determine amplitude and position for the pattem(s) of interest. In some embodiments, that hill-climbing exercise may be based on the gradient and/or Hessian of the likelihood landscape. For example, in an embodiment, the gradient of the likelihood landscape may be considered for hill climbing. Equation (43) below provides an example formulation of an advance from a first position, 6_prev, to an adjacent position, 6_next (where, in this example, Q is selected in the gradient to be the position of the pattern of interest, as described above), along the“hillside” of the gradient.

(Eq. 43) [0115] In another example embodiment, the Hessian of the likelihood landscape may be considered for hill-climbing at Stage II, and adjacent data points in a“hill” may be considered for hill climbing. Equation (44) below provides an example formulation of an advance from a first position, d_prev, to an adjacent position, 6_next (where, in this example, Q is selected in the gradient to be the position of the pattern of interest, as described above), along the“hillside” of the Hessian.

(Eq. 44)

[0116] Two-Stage Likelihood Pipeline Standard Error.

[0117] Like any calculation, the MLE, likelihood values, and likelihood ratios at Stage I and Stage II may be associated with a standard error. Proper calculation of such error my enable more accurate designations of candidate segments at Stage I and more accurate designations of pattern locations and amplitudes at Stage II.

[0118] The standard error in the parameter values in the MLE is related to the Expected Fisher Information Matrix. For a single-regime data set, the Fisher Information Matrix for a given parameter Q in a logarithm of a Likelihood equation (e.g., equation (16) or an equivalent thereof) may be defined according to equation (45) below.

(Eq. 45)

where E is the expected value, L is the likelihood value, and 9_t and 6_m are selected values with which the first and second derivatives of the logarithm of the Likelihood equation are taken.

[0119] In an example two-regime data set, the Fisher Information /(0) is additive across the two regimes because, as shown in equation (39), the logarithms of the Likelihoods are additive across the two regimes. A two-regime version of the Fisher Information is given in equation (46) below.

(Eq. 46)

[0120] Given the form of the logarithm of the Likelihood in equation (16) (e.g., in which the Likelihood is based on a Poisson distribution approximation of noise), the Fisher Information in equation (46) above may be expressed according to equation (47) below.

(Eq. 47)

[0121] A covariance matrix can be defined based on the inverse of the Fisher Information, as shown in equation (48) below.

CoviB^') = Iie ¹

(Eq. 48)

[0122] The diagonal entries of the covariance matrix defines how the variables in the

Likelihood correlate with themselves— i.e., the variance. Accordingly, the square root of the diagonal gives the standard error, and the standard error of Q can be solved as the square root of the diagonal of the covariance matrix Covie).

[0123] Conceptually, the Fisher Information of a Likelihood value may be thought of as the curvature of the Likelihood at the MLE. The“sharper” the curvature is, the more reliable the MLE estimate may be, because a sharper peak may indicate that any parameter estimates deviating from peak values result in a likelihood value that reduces more quickly as the values move away from the peak. In contrast, a lower curvature may indicate that many parameter estimates near the MLE result in a likelihood value that is closer to the MLE. In other words, the standard error will go up as the sharpness of the peak of the curvature of the Likelihood increases.

[0124] Example Methods Applying the Two-Stage Likelihood Pipeline.

[0125] As noted above, the two-stage Likelihood pipeline may find particular use with data sets having a low signal-to-noise ratio. One such type of data set is a data set including one or more images of a biological body under study to characterize one or more fluorophores. The photon output of such fluorophores is proportional to the intensity of the excitation energy applied to the fluorophores. By enabling low-SNR characterization of fluorophores, the two-stage Likelihood pipeline enables the use of low excitation energy, thereby reducing cell toxicity from excitation energy. Low-SNR regimes also minimize destruction of the fluorophores that occurs due to excitation (known as“photobleaching”). The following methods are generally directed to characterization of fluorophores using the two-stage Likelihood pipeline, but it will be appreciated that the two-stage Likelihood pipeline may find use with many types of data sets.

[0126] FIG. 1 is a flow chart illustrating an embodiment of a method 10 of identifying and characterizing a pattern of interest in a data set. The method may be or may include one or more aspects of the two-stage Likelihood pipeline, described above. The method may begin with a step 12 of acquiring an N-dimensional data set. The step 12 of acquiring the N- dimensional data set may include acquiring (e.g.. by electronic transmission) one or more pre-captured images. Additionally or alternatively, the step 12 of acquiring the N- dimensional data set may include capturing one or more images with an image capture device and/or controlling or otherwise communicating with an image capture device to cause the image capture device to capture one or more images. The data set may include 3D imaging data captured using fluorescence microscopy, in an embodiment. The data set may include data respective of a single data capture regime (e.g., a single subset), or data respective of multiple regimes having respective associated data subsets. For ease of reference, the method 10 will be described with respect to an embodiment in which the data set includes 3D imaging data captured using fluorescence microscopy, where each data point has the form given in equation (4) of this disclosure. It should be understood, however, that in other embodiments, the data set may include another type of imaging data and/or non-imaging data. The method 10 will also be described with reference to a fluorescent spot as the pattern of interest. It should be understood, however, that the method is more broadly applicable to other patterns.

[0127] In an embodiment, the 3D dataset may include multiple images captured at multiple respective 2D focal planes using a microscope, with each of the 2D focal plane images having x- and y-dimensions and the third z-dimension corresponding to a depth dimension along the different focal plane images. FIG. 2 is a diagrammatic illustration of a 3D dataset 20 that may be captured, acquired, and/or processed in accordance with some embodiments of the method 10. As shown in FIG. 2, the z-dimension of the 3D dataset 20 may include a plurality of images 22 captured at different focal planes.

[0128] Although ten different focal plane images 22 are illustrated in FIG. 2, it should be understood that any suitable number of images at any suitable number of focal planes may be included in the 3D dataset (also colloquially referred to herein as a“z-stack” of images or a “z-series”).

[0129] In some embodiments, the 3D dataset may be acquired using a conventional epi- fluorescence illumination microscope in which images from multiple focal planes are acquired sequentially by physically moving the position of the microscope stage up/down (e.g., in the z-direction). The position of the stage may be manually operated or

automatically controlled by a controller including, but not limited to, a computer processor or one or more circuits configured to provide command control signals to the microscope to position the stage. Reducing the amount of time needed to acquire a complete set of focal plane images by using a hardware controller circuit may enable the acquired data to more closely resemble simultaneous acquisition of the data, which facilitates spot detection by reducing the effect of motion over time on the spot detection process, as discussed in more detail below. However, it should be appreciated that capturing a z-stack of images may include any suitable number of focal-plane images.

[0130] Rather than sequentially obtaining a z-stack of images by physically moving the microscope stage, as discussed above, the 3D dataset may be captured with a microscope having multiple cameras, each of which simultaneously acquires data in a unique focal plane, which enables instantaneous collection of a 3D dataset, thereby removing the obscuring effect of object motion between capture of images at different focal planes. In one illustrative embodiment, a microscope having nine cameras and associated optics may be used to simultaneously acquire nine focal plane images. However, any suitable number of cameras (including two cameras) may be used to simultaneously acquire a z-stack of focal plane images, and embodiments are not limited in this respect. For example, in some embodiments, at least three cameras may be used.

[0131] In yet further embodiments, the 3D dataset may be acquired using a combination of multiple cameras and physically moving the microscope stage. Using multiple cameras reduces the time required to acquire a z-stack of images compared to single camera microscope embodiments. Using fewer cameras than would be required to simultaneously acquire all images in a z-stack (e.g., nine focal plane images) and combining the multi camera microscope with stage repositioning may provide for a lower cost microscope compared to fully-simultaneous image capture microscope embodiments. For example, some embodiments may acquire the 3D dataset using a microscope having three cameras and use three different stage positions to acquire a nine focal-plane image 3D dataset. Any suitable number of cameras and physical positioning of the microscope stage may be used to acquire a 3D dataset, and embodiments are not limited in this respect.

[0132] In an embodiment involving the capture of images of one or more fluorescent spots, step 12 may include controlling a microscope, as noted above, and/or controlling a source of excitation radiation to activate the fluorophores to be imaged as fluorescent spots.

[0133] The method may further include a step 14 of applying an approximate Likelihood analysis to the data set to identify one or more pattern candidate segments. Applying an approximate Likelihood analysis to the data set may proceed according to stage I of the two- stage Likelihood pipeline described herein, in an embodiment. An example method that may be applied in step 14 will be described with respect to FIG. 3.

[0134] With continued reference to FIG. 1, step 14 may include dividing the data set into a plurality of segments and applying an approximate Likelihood analysis to each segment, in an embodiment. As noted above in this disclosure, a segment may include a set of adjacent, contiguous voxels, in an embodiment. In other embodiments, a segment may include non- adjacent or non-contiguous voxels or pixels or other portions of the data set. A result of the step may be, for each segment, a likelihood that each segment includes the pattern of interest (e.g. , a fluorescent spot). If the likelihood that a given segment includes the pattern of interest is sufficiently high, the segment may be designated a“pattern candidate segment” for further processing.

[0135] The method 10 may further include a step 16 of applying a full Likelihood analysis to the pattern candidate segments identified in step 14 to characterize the pattern of interest. Step 16 may generally proceed according to stage II of the two-stage Likelihood pipeline described above. A result of step 16 may be one or more characterized pattems-of-interest.

In an embodiment, a result of step 16 may be a location and amplitude of one or more patterns of interest, such as one or more fluorescent spots, as well as further confirmation of the existence of one or more patterns. Of course, in embodiments, no pattern of interest may actually be present in the data set, and the result of step 16 may be zero detected instances of the pattern of interest.

[0136] As noted above, the reduction in excitation enabled by the two-stage Likelihood pipeline approach consequently may reduce biological toxicity of the excitation energy and atomic degradation of fluorophores by the excitation energy, and therefore may allow for more frequent capture of more images over longer periods of time of imaging. Accordingly, embodiments that employ the spot detection techniques described herein also allow for acquisition of a larger number of images and, correspondingly, of imaging data with image capture at more frequent intervals over substantially longer timescales, which opens new possibilities for observing in vivo biological processes that unfold dynamically via rapid modulations over such longer timescales.

[0137] The steps 12, 14, 16 of the method 10 may be repeated over a period of time to track one or more patterns of interest over a plurality of data sets, with each data set comprising a 3D image of the same subject captured at a respective given point in time. In an embodiment, a visualization (e.g., a snapshot image, movie, etc.) of the characterized pattern of interest (e.g., of the characterized fluorescent spot) may be created.

[0138] FIG. 3 is a flow chart illustrating a method 30 of identifying one or more pattern candidate segments (i.e.. segments that may contain a pattern of interest) in an N-dimensional data set. The method 30 may encompass an embodiment of stage I in a two-stage Likelihood pipeline analysis. Accordingly, as noted above, the method 30 may be applied at step 14 of the method 10 of FIG. 1. The method 30 may be applied to a data set that has been divided into segments (the nature of which is described in detail above) in order to identify one or more pattern candidate segments. The data set may be a 3D image data set and the pattern of interest may be, in an embodiment, one or more fluorescent spots. [0139] The method 30 is illustrated and will be described with respect to its application to a single segment. Thus, the illustrated and below-described steps of the method may be applied to each of a plurality of segments in the data set, and the method may be repeated for each segment. Repetitions of the method 30 and/or steps of the method 30 may be performed serially or in parallel. In addition, the method 30 will be described with reference to data of a single regime represented in the data set. A person of skill in the art will appreciate that the method 30 may be applicable to data sets having multiple subsets that correspond to numerous respective data capture regimes.

[0140] For a given segment, the method may include a step 32 of defining the segment as having the pattern of interest and background at respective specified positions within the segment. In an embodiment, the specified positions of the pattern of interest and background may be the same as each other. In other embodiments, the specified positions of the pattern of interest and background may be different from each other. In an embodiment, step 32 may include formulating a signal hypothesis and a null hypothesis having the form set forth in equations (9) and (10), respectively, for the segment. As noted above with respect to equations (9) and (10), formulating the signal and null hypotheses may include assuming that both the pattern of interest and the background are at respective specified positions within the segment, such as the center of the segment, for example. In embodiments, the pattern of interest and background may be assumed to be at the same specified position within the segment. In other embodiments, the pattern of interest and background may be assumed to be at different specified positions within the segment.

[0141] The method may further include a step 34 of calculating a first approximate Maximum Likelihood Estimate (MLE) with respect to a model of the pattern of interest and the background (i.e.. the signal hypothesis model). Calculating the approximate MLE with respect to the signal hypothesis model may include formulating an approximate Likelihood function with respect to the signal hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the approximate Likelihood function for the signal hypothesis at step 34 may represent measurement noise (e.g., camera noise) as a Poisson distribution, may represent background noise as a Poisson distribution, and may represent pattern noise as a Poisson distribution. In an embodiment, step 34 may include formulating an approximate Likelihood function having the form in equation (23) and solving that approximate Likelihood function to determine optimal values of the approximate Likelihood function for the signal hypothesis at the segment, /. e.. optimal values of the amplitude of the pattern of interest and the amplitude of the background at the segment. In this disclosure, the MLE of an approximate Likelihood function may be referred to as an approximate MLE.

The approximate Likelihood function may also be solved to calculate an approximate Likelihood value (LV) for the segment, /. e.. the likelihood that the data actually present at the segment arose from the signal hypothesis having the calculated optimal values.

[0142] The method may further include a step 36 of calculating a second approximate Maximum Likelihood Estimate (MLE) with respect to a model of the background, i.e., the null hypothesis model. Calculating the approximate MLE with respect to the null hypothesis model may include formulating an approximate Likelihood function with respect to the null hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the approximate Likelihood function for the null hypothesis at step 36 may represent measurement noise (e.g., camera noise) as a Poisson distribution and may represent background noise as a Poisson distribution. In an embodiment, step 34 may include formulating an approximate Likelihood function having the form in equation (23) (which, as noted above, can readily be modified by a person of skill in the art so as to apply to the null hypothesis) and solving that approximate Likelihood function to determine optimal values of the approximate Likelihood function for the null hypothesis at the segment, i.e., the optimal value of the amplitude of the background at the segment. The approximate Likelihood function may also be solved to calculate a Likelihood value (LV) for the segment, /. e.. the likelihood that the data actually present at the segment arose from the null hypothesis having the calculated optimal background amplitude value.

[0143] The method may further include a step 38 of calculating an approximate Likelihood ratio. In an embodiment, the approximate Likelihood ratio may be the ratio of the Likelihood value associated with the first MLE (i.e.. the MLE with respect to the signal hypothesis model) to the Likelihood value associated with the second MLE (i.e., the MLE with respect to the null hypothesis model).

[0144] The method may further include applying a threshold to the approximate

Likelihood ratio to determine if the segment is a pattern candidate segment. The threshold may be applied to the approximate Likelihood ratio directly, in an embodiment. In other embodiments, the threshold may be applied to a derivation of the approximate Likelihood ratio, i.e., one or more values derived from or based on the approximate Likelihood ratio. If the approximate Likelihood ratio meets the threshold, the segment under examination may be designated as a candidate segment for further processing.

[0145] The method may further include a query step 42 at which it may be determined if further segments remain in the data set for initial examination according to the method 30. If there are additional segments, the method begins anew at step 32 with a new segment. If not, the method ends.

[0146] FIG. 4 is a flow chart illustrating a method 50 of detecting and characterizing a pattern of interest in an N-dimensional data set. The method 50 may encompass an embodiment of stage II in a two-stage Likelihood pipeline analysis. Accordingly, as noted above, the method 50 may be applied at step 16 of the method 10 of FIG. 1. The method 50 may be applied to one or more pattern candidate segments in a data set that have been identified according to, for example, the method 30 of FIG. 3. The data set may be a 3D image data set and the pattern of interest may be, in an embodiment, one or more fluorescent spots.

[0147] Like the method 30 of FIG. 3, the method 50 of FIG. 4 will be described with reference to a single data capture regime represented in the data set. A person of skill in the art will appreciate that the method 50 may be applicable to data sets having multiple subsets that correspond to numerous respective regimes.

[0148] The method may include a step 52 of selecting a pattern candidate segment from a set of one or more pattern candidate segments. The remaining steps of the method 50 are illustrated and will be described with respect to its application to a single selected pattern candidate segment. Thus, the illustrated and below-described steps of the method 50 may be applied to each of one or more pattern candidate segments in the data set, and the method 50 may be repeated for each segment. Repetitions of the method 50 and/or steps of the method 50 may be performed serially or in parallel.

[0149] The method 50 may further include a step 54 of calculating a first full Maximum Likelihood Estimate (MLE) with respect to a model of the pattern of interest and the background (i.e.. the signal hypothesis model). Calculating the full MLE with respect to the signal hypothesis model may include formulating a full Likelihood function with respect to the signal hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the full Likelihood function for the signal hypothesis at step 54 may represent measurement noise ( e.g ., camera noise) as a Gaussian distribution, may represent background noise as a Poisson distribution, and may represent pattern noise as a Poisson distribution. In an embodiment, step 54 may include formulating a full Likelihood function according to equation (34) and solving that full Likelihood function (e.g., through a hill-climbing exercise) to determine optimal values of the full Likelihood function for the signal hypothesis at the segment, /. e.. optimal values of the amplitude of the pattern of interest, the location (e.g., the center of the distribution) of the pattern of interest, the amplitude of the background, and the location (e.g., the center of the distribution) of the background at the segment. In this disclosure, the MLE of a full Likelihood function may be referred to as a full MLE. The full Likelihood function may also be solved to calculate a Likelihood value for the segment, /. e.. the likelihood that the data actually present at the segment arose from the signal hypothesis having the calculated optimal values.

[0150] The method may further include a step 56 of calculating a second full Maximum Likelihood Estimate (MLE) with respect to a model of the background, i.e., the null hypothesis model. Calculating the full MLE with respect to the null hypothesis model may include formulating a full Likelihood function with respect to the null hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the full

Likelihood function for the null hypothesis at step 56 may represent measurement noise (e.g., camera noise) as a Gaussian distribution and may represent background noise as a Poisson distribution. In an embodiment, step 56 may include formulating an full Likelihood function according to equation (34) (which, as noted above, can readily be modified by a person of skill in the art so as to apply to the null hypothesis) and solving that full Likelihood function to determine optimal values of the full Likelihood function for the null hypothesis at the segment, i.e., the optimal value of the amplitude and position of the background at the segment. The full Likelihood function may also be solved to calculate a Likelihood value for the segment, i.e., the likelihood that the data actually present at the segment arose from the null hypothesis having the calculated optimal background amplitude value and position.

[0151] The method may further include a step 58 of calculating a full Likelihood ratio. In an embodiment, the full Likelihood ratio may be the ratio of the Likelihood value associated with the first MLE (i.e., the full MLE with respect to the signal hypothesis model) to the Likelihood value associated with the second MLE (i.e., the full MLE with respect to the null hypothesis model).

[0152] The method 50 may further include a step 60 of applying a threshold to the full Likelihood ratio to determine if the pattern is present in the segment. The threshold may be applied to the full Likelihood ratio directly, in an embodiment. In other embodiments, the threshold may be applied to a derivation of the full Likelihood ratio, i.e., one or more values derived from or based on the full Likelihood ratio. If the full Likelihood ratio meets the threshold, the pattern of interest may be considered detected in the candidate segment under examination, and the optimal values of the first full MLE (i.e.. the full MLE respective of the signal hypothesis) may be considered the characteristics of the pattern of interest and the background at the segment.

[0153] The method may further include a query step 62 at which it may be determined if further pattern candidate segments remain in the data set for further examination according to the method 50. If there are additional segments, the method begins anew at step 62. If not, the method ends.

[0154] Embodiments that address detecting and characterizing fluorescent spots may enable direct 3D spot-based super-resolution time-lapse imaging, with spots of two or more fluorescence colors, with unprecedented temporal resolution and duration, in living samples. The defining feature of spot-based super-resolution imaging is very precise specification of the position of a spot, i.e. its“localization.” However, imaged entities move due to thermal forces or, in living cells, more dramatically due to energy-driven effects. Super-resolution 3D imaging may involve acquisition 2D images in each of multiple focal planes. If the different focal planes are imaged sequentially, an imaged entity may move during the process of 3D data collection and such movement will compromise the precision with which a spot can be localized. This effect may be eliminated if 2D datasets are captured in all focal planes simultaneously, which in turn can be accomplished by a microscope having multiple cameras, one per focal plane, which capture images in perfect coordination.

[0155] Super-resolution imaging involves acquisition of images in multiple focal planes, as discussed above. Although these images are obtained in rapid succession, the elapsed time between images may be a significant fraction of the total time involved. Since effective super-resolution time-lapse imaging requires minimization of total excitation energy, and thus total illumination time, it is desirable for the sample to be excited only when an image is actually being captured and not during the intervening periods. This outcome can be accomplished by a suitable combination of hardware and software in which the camera and the light source are in direct communication, without intervening steps involving signals to and from a computer, such that the sample is excited by light only at the same instant that the camera is taking a picture. For simultaneous imaging in multiple focal planes, this direct communication between camera and light source must occur synchronously for all of the multiple cameras responsible for imaging at the multiple focal planes as described above.

[0156] FIG. 5 is a diagrammatic view of an embodiment of a system 70 for acquiring a data set and identifying and localizing a pattern of interest in a data set. As discussed above, a non-limiting 3D dataset that may be analyzed in accordance with the techniques described herein using 3D pattern matching may be acquired using any suitable fluorescence imaging microscope configured to acquire a plurality of 2D focal plane images in a z-stack. The system 70 of FIG. 5 includes a microscope 72 that may be used to acquire such a 3D dataset in accordance with some embodiments. Microscope 72 may include optics 74, which may include lenses, mirrors, or any other suitable optics components needed to receive magnified images of biological specimens under study. In some embodiments, optics 74 may include optics configured to correct for distortions (e.g., spherical aberration). [0157] Microscope 70 also includes stage 78 on which one or more biological specimens under study may be placed. For example, stage 78 may include component(s) configured to secure a microscope slide including the biological specimen(s) for observation using the microscope. In some embodiments, stage 78 may be mechanically controllable such that the stage may be moved in the z-direction to obtain images at different focal planes, as discussed in more detail below.

[0158] Microscope 70 also includes a light source 80. The light source 80 may be configured to provide excitation energy to illuminate a biological sample placed on stage 78 to activate fluorophores attached to biological structures in the sample. In an embodiment, the light source may be a laser. In some embodiments, the light source 80 may be configured to illuminate the biological sample using light of a wavelength different than that used to acquire images of photons released by the fluorophores. For example, some fluorescent imaging techniques such as stochastic optical reconstruction microscopy (STORM) and photoactivated location microscopy (PALM), employ different fluorophores to mark different locations in a biological structure, and the different fluorophores may be activated at different times based on the characteristics (e.g., wavelength) of the light produced by the light source used to illuminate the sample. In such instances, the light source 80 may include at least two light sources, each of which is configured to generate light having different characteristics for use in STORM or PALM-based imaging. In other embodiments, a single tunable light source may be used.

[0159] The microscope 72 may also include a camera 76 configured to detect photons emitted from the fluorophores and to construct 2D images. Any suitable camera(s) may be used including but not limited to, CMOS or CCD-based cameras. As discussed above, some embodiments may include a single camera with a controllable microscope stage to time sequentially acquire images in a z-stack as the stage moves positions, whereas other embodiments may include multiple cameras, each of which is configured such that the multiple cameras simultaneously acquire 2D images in appropriate different focal planes, thus creating a z-stack instantaneously without any time delay between the 2D images throughout the stack.

[0160] The microscope 72 may also include a processor 82 programmed to control the operation of one or more of stage 78, light source 80, and camera 76. The processor 82 may be implemented as a general- or special-purpose processor programmed with instructions to control the operation of one or more components of the microscope 72. Alternatively, the processor 82 may be implemented, at least in part, by hardware circuit components arranged to control operation of one or more components of the microscope.

[0161] The microscope 72 may further include a non-transitory, computer-readable memory 84 which may be or may include a volatile or non-volatile computer-readable medium. The memory 84 may temporarily or permanently store one or more images captured by the microscope 72. The memory 84 may additionally or alternatively store one or more instructions for execution by the processor 82. The instructions may encompass or embody one or more of the methods of this disclosure (e.g., one or more of methods 10, 30, 50, 90, any portions thereof, and/or any portions of the two-stage Likelihood pipeline disclosed herein).

[0162] In addition to or instead of a processor 82 and memory 84, the microscope may include one or more additional computing devices. For example, in embodiments, the microscope 72 may include one or more of an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and/or another type of processing device.

[0163] Although the components of the microscope 72 (i.e., the optics 74, camera 76, stage 78, light source 80, processor 82, and memory 84) are generally described above as singular, it should be understood that any of the components of the microscope 72 may be provided in multiple. That is, the microscope may include multiple optics 74, cameras 76, stages 78, light sources 80, processors 82, and/or memories 84.

[0164] In an embodiment, the microscope may include multiple optics 74 and multiple cameras 76, with each set of optics 74 paired with a respective camera 76. Each paired optics 74 and camera 76 may be configured for imaging in a specific focal plane, with the focal plane of each paired optics 74 and camera 76 different from each other pair. In such an embodiment, the system 70 may enable simultaneous imaging in multiple focal planes for, e.g., capture of a z-stack of images at a single point in time.

[0165] In an embodiment, the processor 82 may control the light source 80 and camera 76 so as to enable simultaneous application of excitation energy from the light source 80 and imaging with the camera 76. As noted above, in an embodiment, the microscope may include multiple cameras 76. Accordingly, in an embodiment, the processor 82 may control the light source 80 and multiple cameras 76 configured to image different respective imaging planes so as to simultaneously image in multiple focal planes with simultaneous application of excitation energy from the light source 80. Such an arrangement, in conjunction with the techniques for processing the subsequent images of this disclosure, may enable super resolution imaging for long periods of time of the same biological sample.

[0166] In addition to the microscope 72, the system 70 may further include a computing device 86 and a storage device 88, both in electronic communication with the microscope 72. The storage device 88 may be configured to store image data acquired by the camera 76. The storage device 88 may be integrated with or directly connected to the microscope 72 as local storage and/or the storage device 88 may be located remote to microscope 72 as remote storage in communication with microscope 72 using one or more networks. In some embodiments, the storage device 88 may be configured to store a plurality of 3D images of a fluorescent spot. [0167] The computing device 86 may be in communication with microscope 72 using one or more wired or wireless networks. The computing device 86 may be or may include, for example only, a laptop computer, a desktop computer, a tablet computer, a smartphone, a smart watch, or some other electronic computing device. In some embodiments, the computing device 86 may be configured to control one or more operating parameters of the microscope 72 using applications installed on the computing device. In some embodiments, the computing device 86 may be configured to receive imaging data captured using the microscope 72.

[0168] The computing device 86 may include its own respective processor and memory and/or other processing devices for storage and execution of one or more methods or techniques of this disclosure. For example, the computing device 86 may store and execute one or more instructions. The instructions may encompass or embody one or more of the methods of this disclosure (e.g., one or more of methods 10, 30, 50, any portions thereof, and/or any portions of the two-stage Likelihood pipeline disclosed herein). The computing device may be in electronic communication with the storage device 88, in embodiments, in order to acquire one or more data sets from the storage device 88 for processing.

[0169] In some embodiments, a time-lapse visualization (e.g., a movie) may be created (e.g., by the computing device 86) to visualize the tracked location of an imaged entity as identified by processing the 4D dataset. In one implementation, the time-lapse visualization may be created based, at least in part, on a plurality of point-in-time visualizations created in accordance with the techniques described above. In an embodiment, such a visualization may include a plot of one or more of the MLE parameters of a model (e.g., a signal hypothesis model). For example, plotting the value of A (see equations (1) and (9) above) would provide a visualization of the amplitude of a pattern of interest. [0170] FIG. 6 is a flow chart illustrating an example method 90 of conducting a stage of a two-stage likelihood pipeline analysis. The method 90 describes the application of a standard error calculation, a false positive rate calculation, and determination of a multi-factor landscape to a two-stage Likelihood pipeline analysis. One or more steps of the method 90 may be performed at either stage of the pipeline analysis, as discussed further below. The method 90 will be described with reference to a single segment of the data set. In practice, one or more steps of the method 90 may be applied to each segment in a data set in a stage of a two-stage Likelihood pipeline analysis.

[0171] The method 90 may include a step 92 of calculating one or more maximum likelihood estimates (MLEs) and a likelihood value (LVs) with respect to a segment in a data set. The MLEs and LVs may be approximate (e.g, calculated according to stage I of a two- stage Likelihood pipeline analysis), in an embodiment. For example, the MLEs and LVs may be calculated according to the steps of the method 30 of FIG. 3. In another embodiment, the MLEs and LVs may be full values (e.g., calculated according to stage II of a two-stage Likelihood pipeline analysis). For example, the MLEs and LVs may be calculated according to the steps of the method 50 of FIG. 4.

[0172] The method 90 may further include a step 94 of calculating a standard error associated with one or more values of an MLE, or with a LV, calculated in step 92 for the segment. The standard error may be calculated according to a Fisher Information matrix, as described above with respect to equations (45)-(48).

[0173] The method 90 may further include a step 96 of calculating a Likelihood ratio for the segment. The Likelihood ratio calculation step 96 may be performed according to step 38 of the method 30 (if performed at stage I), or according to step 58 of the method 50 (if performed at stage II), in an embodiment. [0174] The method may further include a step 98 of calculating a false positive rate for the Likelihood ratio calculated in step 96. The false positive rate may be calculated according to equations (35) and (36), above. In addition to or instead of a false positive rate, a false negative rate, true positive rate, or true negative rate may be calculated and used later in the method 90.

[0175] The method 90 may further include a step 100 of calculating a multi-factor landscape value. The multi-factor landscape value may include an arithmetic (e.g., multiplicative) combination of two or more of a Likelihood ratio, a gradient of the Likelihood ratio, and a Hessian of the Likelihood ratio, each respective of the segment. Additionally or alternatively, the step may include multiplying a Likelihood ratio of the segment with a scalar.

[0176] The method 90 may further include steps 102, 104, 106 of comparing the standard error, the false positive rate, and the multi-factor landscape value to respective thresholds. Each threshold may be set as appropriate for the compared value.

[0177] The standard error comparison step 102 may include comparing a combination of the standard error and another value with the threshold, in an embodiment. For example, the step 102 may include calculating the standard error associated with a likelihood value, and comparing the likelihood value plus (or minus) the standard error with the threshold.

[0178] The method 90 may further include a step 108 of designating the segment as a pattern candidate segment (if the method is being performed at stage I) or as the location of the pattern of interest (if at stage II) based on the threshold comparisons. In some embodiments, the step 108 may include accounting for numerous threshold comparisons. For example, a segment may be designated as a pattern candidate segment at stage I if both the false positive rate is below a threshold at step 104 and if the likelihood value for the segment is above its threshold (as discussed with respect to step 40 in the method 30 of FIG. 3). [0179] While this disclosure has described certain embodiments, it will be understood that the claims are not intended to be limited to these embodiments except as explicitly recited in the claims. On the contrary, the instant disclosure is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the disclosure. Furthermore, in the detailed description of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, it will be obvious to one of ordinary skill in the art that systems and methods consistent with this disclosure may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure various aspects of the present disclosure.

[0180] Some portions of the detailed descriptions of this disclosure have been presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer or digital system memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, logic block, process, etc., is herein, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these physical manipulations take the form of electrical or magnetic data capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system or similar electronic computing device. For reasons of convenience, and with reference to common usage, such data is referred to as bits, values, elements, symbols, characters, terms, numbers, or the like, with reference to various presently disclosed embodiments.

[0181] It should be borne in mind, however, that these terms are to be interpreted as referencing physical manipulations and quantities and are merely convenient labels that should be interpreted further in view of terms commonly used in the art. Unless specifically stated otherwise, as apparent from the discussion herein, it is understood that throughout discussions of the present embodiment, discussions utilizing terms such as“determining” or “outputting” or“transmitting” or“recording” or“locating” or“storing” or“displaying” or “receiving” or“recognizing” or“utilizing” or“generating” or“providing” or“accessing” or “checking” or“notifying” or“delivering” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data. The data is represented as physical (electronic) quantities within the computer system’s registers and memories and is transformed into other data similarly represented as physical quantities within the computer system memories or registers, or other such information storage, transmission, or display devices as described herein or otherwise understood to one of ordinary skill in the art.

Claims

CLAIMS What is claimed is:

1. A method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, the method comprising:

calculating an approximate Likelihood Ratio (LR) for one or more of the plurality of segments;

calculating a false positive rate for the approximate LR;

designating the one or more of the plurality of segments as a pattem-of-interest

candidate segments based on the false positive rate; and

applying a full Likelihood analysis to each of the candidate segments; and designating one or more of the candidate segments as including the pattern according to the result of the full Likelihood analysis.

2. The method of claim 1, wherein designating the one or more of the plurality of segments as pattem-of-interest candidate segments based on the false positive rate comprises: comparing the false positive rate to a threshold; and

designating the one or more of the plurality of segments as pattem-of-interest

candidate segments when the false positive rate is less than the threshold.

3. The method of claim 1, wherein designating the one or more of the plurality of segments as pattem-of-interest candidate segments based on the false positive rate comprises: calculating the inverse logarithm of the false positive rate;

comparing the inverse logarithm of the false positive rate to a threshold; and designating the one or more of the plurality of segments as pattem-of-interest

candidate segments when the inverse logarithm of the false positive rate is greater than the threshold.

4. The method of claim 1, wherein the data set comprises one or more images.

5. The method of claim 1, wherein calculating the false positive rate for the approximate LR comprises determining fluctuations in the LR over time.

6. The method of claim 5, wherein calculating the false positive rate for the approximate LR further comprises fitting a probability density function to the fluctuations in the LR.

7. The method of claim 6, wherein the probability density function is a gamma function.

8. The method of claim 1, further comprising:

calculating a true positive rate for the approximate LR; and

candidate segments based on the false positive rate and the true positive rate.

9. A method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, the method comprising:

calculating an approximate maximum likelihood estimate (MLE) or an approximate likelihood value for one or more of the plurality of segments, the MLE comprising one or more values;

calculating a standard error of the approximate MLE or the approximate likelihood value;

candidate segment according to the standard error;

10. The method of claim 9, wherein calculating a standard error of the approximate MLE or the approximate likelihood value comprises calculating the standard error according to a Fisher information matrix.

11. The method of claim 9, wherein the data set comprises one or more images.

12. The method of claim 9, wherein designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to the standard error comprises comparing the standard error to a threshold and designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to the threshold comparison.

13. The method of claim 9, wherein designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to the standard error comprises designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to a combination of a value of the MLE and the standard error.

14. The method of claim 13, wherein designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to a combination of a value of the MLE and the standard error comprises comparing the combination of the value of the MLE and the standard error to a threshold and designating the one or more of the plurality of segments as a pattem-of-interest candidate segment according to the threshold comparison.

15. A method of detecting a pattern of interest in a data set, the data set comprising a plurality of segments, the method comprising: calculating an approximate Likelihood ratio for one or more of the plurality of segments;

calculating a gradient of the approximate Likelihood ratio;

calculating a Hessian of the approximate Likelihood ratio;

candidate segments based on two or more of the approximate Likelihood ratio, the gradient, and the Hessian;

calculating a full Likelihood for each of the candidate segments; and

designating one or more of the candidate segments as including the pattern according to the full Likelihood.

16. The method of claim 15, wherein the data set comprises one or more images.

17. The method of claim 15, wherein designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on two or more of the approximate Likelihood ratio, the gradient, and the Hessian comprises designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on a mathematical combination of two of the gradient, the Hessian, and the approximate Likelihood ratio.

18. The method of claim 17, wherein the mathematical combination comprises a

multiplicative combination.

19. The method of claim 15, wherein designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on two or more of the approximate Likelihood ratio, the gradient, and the Hessian comprises designating the one or more of the plurality of segments as a pattem-of-interest candidate segments based on a mathematical combination of the gradient, the Hessian, and the approximate Likelihood ratio.

20. The method of claim 19, wherein the mathematical combination comprises a

multiplicative combination.