WO2019140434A2

WO2019140434A2 - Overlapping pattern differentiation at low signal-to-noise ratio

Info

Publication number: WO2019140434A2
Application number: PCT/US2019/013614
Authority: WO
Inventors: Nancy E. KLECKNER; Frederick S. CHANG
Original assignee: President And Fellows Of Harvard College
Priority date: 2018-01-15
Filing date: 2019-01-15
Publication date: 2019-07-18
Also published as: WO2019140434A3

Abstract

Methods and systems for detecting and characterizing a pattern (or patterns) of interest in a low signal-to-noise ratio (SNR) data set are disclosed. One method is a form of a two-stage Likelihood pipeline analysis for differentiating multiple closely-spaced spots that takes advantage of the benefits of a full Likelihood analysis while providing computational tractability. The two-stage pipeline may include a first stage including the application of approximate Likelihood functions. The second stage may include a full Likelihood analysis. Once a pattern of interest instance is characterized, it may be subtracted from the underlying data, and the two-stage analysis may be performed on the reduced data to detect a further pattern of interest proximate the characterized pattern.

Description

OVERLAPPING PATTERN DIFFERENTIATION AT LOW SIGNAL-TO-NOISE RATIO

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of priority to U.S. provisional application no. 62/617,567, filed January 15, 2018, now pending, which is hereby incorporated by reference in its entirety.

STATEMENT OF GOVERNMENT SUPPORT

[0002] This invention was made with government support under R01 GM025326, R01 GM044794, and T32 GM007598 awarded by the National Institutes of Health through the National Institute of General Medical Sciences (NIGMS), and under PHY-1219334 awarded by the National Science Foundation. The government has certain rights in the invention.

FIELD

[0003] This disclosure is generally directed to pattern detection in data sets, in particular detection and differentiation of multiple overlapping pattern instances, the detection including Likelihood analysis.

BACKGROUND

[0004] Robust detection of a pattern at low signal-to-noise ratio (“SNR”) represents a fundamental challenge for many types of data analysis. Robustness comprises both the reliable detection of a pattern when it is present (i.e., maximization of true positives, which maximizes sensitivity) and the failure to falsely identify a pattern as being present when, in fact, it is not (i.e., minimization of false positives, which maximizes selectivity).

[0005] Biological imaging provides a prominent example of this challenge: super-resolution imaging of fluorescent point sources (“spots”), e.g., representing single molecules, is currently possible only in high-SNR regimes, i.e., where the “spot” is very bright.

[0006] Powerful methodologies have recently emerged which permit the reliable detection and precise localization of fluorescent objects, both spots and fluorescent objects with more complex shapes, with resolution below the optical limit set by the wavelength of light. However, thus far, known single molecule/point source methodologies (e.g., STORM, double helical point spread function analysis) require imaging in a high-SNR regime. Also, methodologies for imaging objects with more complex shapes that are not based on imaging of individual point sources (for example, Structured Illumination Microscopy or “SIM”) are based on imaging multiple instances of the sample of interest under different illumination conditions. A super-resolution image of the object that surpasses the optical limit is then reconstructed from this battery of images. Each component image must be obtained in a high-SNR regime for successful reconstruction.

[0007] These high-SNR imaging methods require high excitation energy in order to achieve a signal that is detectable above the background, which comprises fluorescence that emanates from sources other than the spot, and despite the presence of noise. High excitation energy results in photobleaching and, in living samples, phototoxicity. Because of these effects, super-resolution spot detection and localization is currently limited to the acquisition of 2D data, with 3D information extracted indirectly by modification of the optics plus appropriate data analysis (thus avoiding the need to take images in multiple focal planes (“z-slices”) for each image). Moreover, time-lapse analysis in living cells is limited to a relatively small number of time points. Conversely, an alternative approach has made it possible to detect fluorescent point sources and/or objects with more complex shapes in low-SNR regimes (and thus low excitation energies), thereby enabling imaging of living cells with many images collected over very long time periods. (See, e.g., (1) Carlton, Peter M., et al. “Fast live simultaneous multiwavelength four-dimensional optical microscopy.” Proceedings of the National Academy of Sciences 107.37 (2010): 16016-16022; (2) Arigovindan, Muthuvel, et al. “High-resolution restoration of 3D structures from widefield images with extreme low signal-to-noise-ratio.” Proceedings of the National Academy of Sciences 110.43 (2013): 17344-17349). However, these approaches do not provide super-resolution precision of localization for fluorescent point sources or for visualization of objects with more complex shapes and, moreover, are computationally intractable for large datasets.

SUMMARY

[0008] Methods and systems for detecting, characterizing, and differentiating multiple patterns of interest in a low Signal-to-Noise Ratio (SNR) data set are disclosed herein.

[0009] A method of detecting two or more pattern of interest instances in a segment of a data set may include identifying a first pattern of interest instance in the segment, subtracting the amplitude of the first pattern of interest instance from data in the data set respective of the segment to create a reduced data set portion, and calculating an approximate second pattern MLE for the reduced data set portion to identify a second pattern instance. The method may further include applying a full Likelihood analysis to the segment according to the approximate second pattern MLE, and designating the candidate segment as including the second pattern according to the result of the full Likelihood analysis.
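The subtract-and-redetect loop summarized in paragraph [0009] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the data are one-dimensional, the pattern is a Gaussian of known width, the background level is known, and a least-squares matched filter stands in for the approximate MLE. The full Likelihood confirmation step is reduced to a score threshold. All function names and parameter values are hypothetical.

```python
import math

def gaussian_profile(n, center, width):
    """Unit-amplitude Gaussian pattern sampled at positions 0..n-1."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

def best_single_pattern(data, width, background):
    """Scan every candidate center and return (score, center, amplitude)
    for the best least-squares fit of one pattern above the background."""
    n = len(data)
    best = (0.0, None, 0.0)
    for center in range(n):
        f = gaussian_profile(n, center, width)
        overlap = sum(fi * (d - background) for fi, d in zip(f, data))
        norm = sum(fi * fi for fi in f)
        amplitude = overlap / norm
        score = overlap * overlap / norm  # drop in squared error from adding the pattern
        if amplitude > 0 and score > best[0]:
            best = (score, center, amplitude)
    return best

def detect_patterns(data, width, background, threshold, max_patterns=5):
    """Detect a pattern, subtract it, and search the reduced data again."""
    reduced = list(data)
    found = []
    for _ in range(max_patterns):
        score, center, amplitude = best_single_pattern(reduced, width, background)
        if center is None or score < threshold:
            break
        found.append((center, amplitude))
        f = gaussian_profile(len(reduced), center, width)
        reduced = [d - amplitude * fi for d, fi in zip(reduced, f)]
    return found

# Two noise-free spots whose tails overlap, on a flat background of 2.0.
n, width, background = 24, 2.0, 2.0
spot_a = gaussian_profile(n, 6, width)
spot_b = gaussian_profile(n, 14, width)
data = [background + 10.0 * a + 6.0 * b for a, b in zip(spot_a, spot_b)]
found = detect_patterns(data, width, background, threshold=1.0)
```

Because the first fit absorbs a small part of the neighboring spot, the recovered amplitudes are slightly biased. With spots closer together than in this example the greedy single-pattern fit degrades quickly, which is the situation a full Likelihood analysis of the segment is intended to handle.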

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 is a flow chart illustrating an example embodiment of a method of characterizing patterns of interest in a data set.

[0011] FIG. 2 is a diagrammatic view of an example z-stack of two-dimensional images that may comprise an example three-dimensional data set.

[0012] FIG. 3 is a flow chart illustrating an example embodiment of a first stage in a two-stage likelihood pipeline analysis for detecting and characterizing multiple patterns of interest in a data set.

[0013] FIG. 4 is a flow chart illustrating an example embodiment of a second stage in a two-stage likelihood pipeline analysis for detecting and characterizing multiple patterns of interest in a data set.

[0014] FIG. 5 is a chart illustrating example penetrance of two patterns of interest into data sets associated with two different data capture regimes.

[0015] FIG. 6 is an example plot illustrating an example empirical determination of penetrance coefficients for two patterns of interest into two data sets associated with two data capture regimes.

[0016] FIG. 7 is a diagrammatic view of an example embodiment of a system for acquiring a data set and identifying and localizing one or more instances of one or more patterns of interest in the data set.

[0017] FIG. 8 is a flow chart illustrating an example method of detecting and differentiating overlapping instances of one or more patterns of interest.

DETAILED DESCRIPTION

[0018] The instant disclosure provides an algorithm that may be applied to detect zero, one or more than one pattern(s) of interest (e.g., zero, one or more fluorescent spots of zero, one, or more colors) in a low-SNR data set. In particular, the teachings of the instant disclosure may be applied to detect zero, one or more than one pattern(s) of interest that at least partially overlap. For example, the teachings of the instant disclosure may be applied to detect and differentiate zero, one or more than one pattern(s) of interest comprising fluorescent colors in a data set comprising one or more images.

[0019] When applied to detect fluorescent spots in biological entities with an optical image detector (e.g., a microscope), the algorithm of the present disclosure may provide localization precision beyond the optical diffraction limit, may provide optimal resolution of overlapping spots, and may further provide accurate quantification of the intensity of the signal emanating from the source of the spot. The algorithm of the present disclosure may enable super-resolution visualization of single molecules in three dimensions (3D) at frequent time intervals for long, biologically-relevant time periods. The algorithm of the present disclosure may also enable super-resolution time-lapse imaging of whole objects.

[0020] The instant disclosure builds on an algorithm, the “two-stage Likelihood pipeline,” originally disclosed in U.S. app. nos. 62/211,950 and 62/211,994 and PCT app. no. PCT/US2016/049706, each of which is hereby incorporated by reference in its entirety. The two-stage Likelihood pipeline makes use of the Likelihood approach, which is a well-documented method for finding patterns in noisy data sets that is understood by a person of ordinary skill in the art. The Likelihood approach, in order to identify and characterize a pattern of interest, can take into account multiple components that contribute to the values of the data set. In embodiments, the Likelihood approach may take into account all components that contribute to the values of the data set. For example, in an embodiment, as described further below, a Likelihood approach may take into account the ideal theoretical natures of one or more patterns of interest and of the background as well as the effects of multiple types of noise, with each of these contributors to the values in the data set expressed mathematically as a function of particular parameters.

[0021] A full Likelihood approach, in its purest form (which may also be referred to herein as a “single stage” form of Likelihood analysis), would involve consideration of all possible combinations of all possible values of all parameters across an entire data set in order to determine which particular combination of parameter values would best fit the data set. This theoretical ideal is rarely, if ever, achieved in practice because it requires a computational workload that cannot be achieved with current technology in an acceptable amount of time. As a result, various strategies have been developed that simplify the computational task, thus ameliorating the challenge of computational intractability. For example, the Markov Chain Monte Carlo method takes the approach of intelligent sampling of particular subsets of parameter values. However, this approach is slow and inefficient, and thus only applicable if the position of the pattern of interest in the data set is first approximately defined and if the number of data sets to be analyzed is restricted. A similar situation exists for the specific case of fluorescent spot detection. In one known approach, the data set is scanned by eye, or by ad hoc criteria, to determine the presence of a spot, and a full Likelihood approach is then implemented on the limited portion of the data set that includes the spot to more stringently define the presence of a spot and to provide its precise and accurate localization. Alternatively, custom algorithms are used to initially detect candidate spots. However, these custom algorithms all use “ad hoc” criteria to define the presence of a spot and thus cannot, in principle, perform as well as pure likelihood-derived methods. They are also difficult to use. See Sage, Daniel, et al. “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nature Methods 12.8 (2015): 717-724. These approaches cannot be applied to low-SNR data sets because, in such regimes, a spot may be missed by visual or “ad hoc” criteria and thus may not actually be evaluated by the Likelihood approach. In other words, initial inspection of a data set through visual inspection or ad hoc criteria is particularly ineffective in low-SNR data sets because of an unacceptably high chance of false negatives. In addition, such criteria may wrongly detect a spot where none is present, giving an unacceptably high chance of false positives. Other than the two-stage Likelihood pipeline, a general, computationally tractable algorithm for robustly detecting and precisely localizing a pattern of interest in a large, noisy, low-SNR data set (e.g., a data set generated by 3D time-lapse analysis of a fluorescent spot with imaging at frequent intervals for long time periods) does not currently exist.

[0022] The two-stage Likelihood pipeline modifies the Likelihood approach in order to both detect and localize a pattern of interest (e.g., in a large data set) in a computationally efficient (and therefore feasible) manner. The two-stage Likelihood pipeline may provide robust spot detection and may enable both accurate (e.g., super-resolution) localization and precise localization and quantitation of detected spots. Moreover, the two-stage Likelihood pipeline may enable pattern detection and localization in low-SNR data sets. Still further, some embodiments of the two-stage Likelihood pipeline, such as those disclosed herein, may enable characterization and differentiation of multiple overlapping patterns in low-SNR data sets.

[0023] The teachings of the instant disclosure provide an algorithm for applying the two-stage Likelihood pipeline to detect and characterize zero, one or more than one pattern(s). Although the methods and processes disclosed herein may be used to detect zero, one, or more patterns of interest, the instant disclosure provides particular improvement when applied to two or more closely-spaced patterns of interest.

[0024] Multi-spectral imaging is one example of the use of multiple data capture regimes that may result in a data set having multiple discrete subsets. Multi-spectral imaging may be used, for example, for differential detection of the positions and dynamics of different molecular entities. In such a context, multi-spectral imaging may involve labeling different entities with fluorophores of different fluorescent colors (e.g., having different excitation wavelengths and/or different output wavelengths) to separately locate and/or track those entities. Such fluorophores may be imaged by: (i) providing, for each fluorophore color, a respective excitation signal (e.g., excitation light of a wavelength known to cause output by the fluorophore), which is achieved by an appropriate combination of a light source and a suitable wavelength filter; (ii) filtering the overall light output of the sample with a filter, for each fluorophore, that passes the output light wavelength of the fluorophore to the detector; and (iii) detecting and recording the photons of the filtered output. The use of a given excitation wavelength and filtered collection of the corresponding output wavelength may comprise a data collection regime specific to a given color of fluorophore. The data for different colors may be collected separately, with the collected data for a given color and regime resulting in a subset of a data set. The complete data set, therefore, may include respective subsets corresponding to the respective colors and/or regimes.

[0025] In multi-spectral fluorescence imaging, the fluorophore output collected for a given data collection regime primarily comprises light output by fluorophores specifically targeted by the data collection regime. However, it is common to encounter output from fluorophores not targeted by the data collection regime (i.e., fluorophores of a different color) in the corresponding data subset. In other words, for a sample comprising “red” and “green” fluorophores, the data respective of the red regime may include some output from the green fluorophores, and vice-versa. This feature is known in multi-spectral fluorescence imaging as “spectral bleedthrough.” In low-SNR conditions, such bleedthrough may increase the chances of false positive detection of a pattern of interest. In addition, because of bleedthrough, data corresponding to a given color fluorophore may be spread across the data subsets from different data collection regimes, and thus multiple subsets can be used to define the presence and positions of a single fluorophore color spot. Unlike known fluorescence imaging methods and procedures, the pattern detection and localization algorithm described herein takes advantage of this extra information in multi-spectral fluorescence imaging.
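The way bleedthrough spreads one fluorophore's output across several data subsets can be illustrated with a small linear mixing sketch. The coefficient values, color names, and function name below are hypothetical, and the assumption that contributions add linearly is a simplification made for illustration.

```python
def expected_regime_signals(amplitudes, fraction):
    """Expected signal collected in each regime.

    amplitudes: {color: emitted amplitude}
    fraction:   {regime: {color: share of that color's output seen in the regime}}
    """
    return {
        regime: sum(shares[color] * amplitudes[color] for color in amplitudes)
        for regime, shares in fraction.items()
    }

# Hypothetical values: 10% of green output leaks into the red regime,
# and 5% of red output leaks into the green regime.
amplitudes = {"red": 100.0, "green": 50.0}
fraction = {
    "red_regime": {"red": 1.0, "green": 0.10},
    "green_regime": {"red": 0.05, "green": 1.0},
}
signals = expected_regime_signals(amplitudes, fraction)
```

In this toy case the red regime collects 105 units and the green regime 55, so each subset carries information about both colors.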

[0026] More generally, different patterns of interest that are collected using separate data collection regimes, resulting in separate data subsets, may be present to some degree in one or more of those data subsets. The degree to which a given pattern of interest is present in a data subset corresponding to a discrete data collection regime may be referred to herein as “penetrance.” For example, “bleedthrough” of a fluorophore’s output into a different color regime may be considered penetrance of that fluorophore’s output into the other color regime. The fluorophore may also have penetrance in its own regime.

[0027] In some embodiments described in this disclosure, patterns of interest are 3D photon distributions produced by fluorescent point sources. Along with photons emanating from the fluorescent point sources of interest, photons also emanate from other sources in the measured environment; these photons comprise “background.” In embodiments, the data set may be a digitized set of images that comprises the totality of any pattern(s) of interest plus the background. The images may be captured in 3D in a vertical series of planar sections (a “z-stack”), in some embodiments. Each image in such a data set may represent the output of the detection units (i.e., pixels) of a camera and the camera may be a part of a microscope, in an embodiment. Photons impinging on these pixels from both the spot and the background are converted to electrons and then to analog-to-digital units (“ADUs”).

[0028] Where the patterns of interest are 3D photon distributions produced by fluorescent point sources, noise in the data set may arise from two sources: (1) “photon noise,” which comprises fluctuations in the number of photons impinging on each pixel per time unit (i.e., image capture time), from both spot and background sources; and (2) “measurement noise,” which arises during the conversion of photons to ADUs. In the case of digital imaging, this measurement noise may vary from pixel to pixel due to mechanical defects of the detector (e.g., broken or unreliable pixels or just natural manufacturing variation). In addition to noise, edge effects may occur when the fluorescent point source is located within the data set, but so near an edge that the image does not capture the entire “spot,” or when the point source is located outside of the data set, with only part of the corresponding spot located within the imaged data set.
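The two noise sources can be simulated per pixel as a Poisson draw for the photon count followed by an additive Gaussian draw for the camera. The sketch below assumes a gain of one ADU per detected photon and uses hypothetical values for the offset and read noise; Knuth's multiplication method is used for the Poisson draw only because the Python standard library has no Poisson sampler.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's method: count how many uniform draws can be multiplied
    together before the running product falls to exp(-mean) or below."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_pixel(rng, mean_photons, offset, read_noise_sd):
    """One pixel value in ADU: photon noise plus camera (measurement) noise."""
    photons = poisson_draw(rng, mean_photons)    # photon noise
    camera = rng.gauss(offset, read_noise_sd)    # measurement noise
    return photons + camera                      # assumes 1 ADU per photon

rng = random.Random(0)
samples = [simulate_pixel(rng, 20.0, 100.0, 2.0) for _ in range(5000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
# sample_mean should be near offset + mean_photons = 120, and
# sample_var near mean_photons + read_noise_sd**2 = 24.
```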

[0029] To provide context for characterization of patterns with the two-stage Likelihood pipeline, the instant disclosure will first provide a brief description of Likelihood analysis. Next, the instant disclosure will provide a description of the two-stage Likelihood pipeline for detecting and characterizing patterns of interest in a data set resulting from a single data capture regime. After, the instant disclosure will describe an embodiment of the two-stage Likelihood pipeline for detecting and characterizing patterns of interest in a data set comprising multiple discrete subsets resulting from multiple data collection regimes. The instant disclosure will then describe example methods that apply the two-stage Likelihood pipeline to the detection and localization of fluorescent spots of multiple colors (and thus multiple patterns of interest) in images of biological entities.

[0030] Brief Description of Likelihood Analysis.

[0031] As noted above, the Likelihood approach may be applied to characterize (e.g., identify, define the position of, and determine an amplitude for) a pattern of interest in a data set. In one form, the Likelihood approach compares two hypotheses to each other. First, the “signal hypothesis” hypothesizes that both the pattern of interest and the background are represented in the data set and therefore mathematically models the data set as a sum of the pattern of interest and the background. An example model of the signal hypothesis is given in equation (1) below:

$$\text{signal hypothesis} = \lambda_i(A, B, \mathrm{pos}_A, \mathrm{pos}_B) = A\, f_i(\mathrm{pos}_A) + B\, g_i(\mathrm{pos}_B)$$

(Eq. 1)

where λ_i is the mean value of the data set for index i, pos_A is a position (e.g., (x, y, z) in an embodiment in which the data set exists in a three-dimensional Cartesian coordinate system) respective of the pattern of interest (e.g., the center of the pattern of interest), f_i is the distribution function of the pattern of interest at point i, A is the amplitude of the pattern of interest, g_i is the distribution function of the background pattern at point i, B is the amplitude of the background pattern, and pos_B is a position respective of the background (e.g., the center of the background pattern). The signal hypothesis supposes that the mean value λ_i at a point i is the sum of the pattern of interest at that point and the background pattern at that point.

[0032] For the mathematical model of the signal hypothesis, the background is itself a pattern, but one that comprises “interference” for the pattern of interest. Accordingly, in this disclosure, both a “pattern of interest” and a “background pattern” may be referenced. When used in isolation, the term “pattern” in this disclosure refers to the pattern of interest, not the background pattern. When used in isolation, the term “background” refers to intensity components of the data that emanate from sources other than the “pattern of interest” as described above.

[0033] The second hypothesis, the“null hypothesis,” hypothesizes that the pattern of interest is not present in the data set, only the background pattern, and can be conceptualized according to equation (2) below with terms defined as described above:

$$\text{null hypothesis} = \lambda_i(B, \mathrm{pos}_B) = B\, g_i(\mathrm{pos}_B)$$

(Eq. 2)
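Equations (1) and (2) can be made concrete with a small sketch. It assumes, for illustration only, a two-dimensional isotropic Gaussian spot for the pattern distribution function and a constant background distribution function (so pos_B has no effect); the function names and the spot width are hypothetical.

```python
import math

def spot_shape(pixel, pos_a, width=1.5):
    """Pattern distribution: unit-peak Gaussian centered on pos_a, at one pixel."""
    dx, dy = pixel[0] - pos_a[0], pixel[1] - pos_a[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * width * width))

def signal_mean(pixel, A, B, pos_a):
    """Eq. 1 with a flat background: pattern plus background."""
    return A * spot_shape(pixel, pos_a) + B

def null_mean(pixel, B):
    """Eq. 2 with a flat background: background only."""
    return B

center = (4, 4)
at_center = signal_mean(center, A=30.0, B=5.0, pos_a=center)    # A + B
far_away = signal_mean((40, 40), A=30.0, B=5.0, pos_a=center)   # approaches B
```

Far from the spot the two hypotheses predict the same mean, so only pixels near pos_A help discriminate between them.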

[0034] The Likelihood approach compares the signal hypothesis with the null hypothesis to determine which has a higher likelihood of describing the data set. The more accurate the signal hypothesis is (i.e., the higher the likelihood associated with the signal hypothesis), and the less accurate the null hypothesis is (i.e., the lower the likelihood associated with the null hypothesis), the more likely it is that the pattern of interest is present in the data set. As will be described in further detail below, a comparison of the likelihoods of the two hypotheses may be referred to as the“Likelihood ratio” of the given data set.

[0035] As also discussed above, one or more components of the system of interest as manifested in a data set may be characterized by noise (e.g., (i) measurement noise, (ii) pattern noise, and/or (iii) background noise). Measurement noise may result from the processes involved in detecting or measuring the pattern of interest and background and/or in converting such entities to numerical form. In the case of an image, measurement noise may include, e.g., noise intrinsic to the image capture device. Pattern noise may be an intrinsic feature of a pattern to be detected. For example, in an image of a fluorophore, the pattern of interest may be noisy because of quantum variation in the number of photons emitted by the fluorophore (so-called “quantum noise”). Background noise may be an intrinsic feature of the background pattern. In an image of a fluorophore, the background pattern may include light emissions from the measurement environment and thus may also be characterized by quantum variation.

[0036] Where the pattern of interest is a fluorescent spot, both pattern noise and background noise may be described by a Poisson distribution. As will be described in greater detail later in this disclosure, these two noise effects (i.e., the effects of the fluctuations in the two different noise sources) may be additive. Accordingly, pattern noise and background noise may be jointly represented by a single Poisson distribution (as the sum of two Poisson distributions is a single new Poisson distribution). In a full Likelihood analysis, measurement noise may be described by a Gaussian distribution with a mean and a variance that are pixel-specific parameters.
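The additivity claim follows from a short calculation. Assuming the pattern photon count X and the background photon count Y at a pixel are independent Poisson variables with means λ_1 and λ_2 (symbols introduced here for illustration only), their sum is Poisson with mean λ_1 + λ_2:

```latex
\begin{aligned}
P(X + Y = n) &= \sum_{k=0}^{n} \frac{\lambda_1^{k} e^{-\lambda_1}}{k!} \cdot \frac{\lambda_2^{\,n-k} e^{-\lambda_2}}{(n-k)!} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \binom{n}{k} \lambda_1^{k} \lambda_2^{\,n-k} \\
&= \frac{(\lambda_1 + \lambda_2)^{n} e^{-(\lambda_1 + \lambda_2)}}{n!}
\end{aligned}
```

The last step is the binomial theorem. The result relies on the independence of the two photon sources.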

[0037] A Likelihood analysis includes respective functions for both the signal hypothesis and the null hypothesis, which may be solved to determine optimal values of A, B, pos_A, and pos_B (i.e., the parameter values that best describe the data set), as well as the likelihoods associated with those optimal values. It should be noted that the signal hypothesis depends on A, B, pos_A, and pos_B, and the null hypothesis depends only on B and pos_B. This process (determining optimal values for the parameters of the hypotheses and the likelihoods associated with those values) may be carried out independently for, first, a function corresponding to the signal hypothesis and, second, a function corresponding to the null hypothesis. These functions are known as “Likelihood functions.” Each Likelihood function describes the likelihood that a given data set arose from the model that implements the corresponding hypothesis (e.g., signal or null) as a function of the parameters of the corresponding model. The Likelihood function defines the likelihood (ℒ) that given data (e.g., a data point d_j) arose from a particular model (signal or null) and is proportional (with proportionality constant k_j) to the probability (P) of observing the data given the values of the parameters of the model.

Equation (3) below sets forth the general form of a Likelihood function:

$$\mathcal{L}(\text{model} \mid d_j) = k_j \, P(d_j \mid \text{model})$$

(Eq. 3)

Proportionality constant k_j is a data-dependent constant, which means that each given dataset j is assigned its own constant k_j in the Likelihood function. Since this constant is a data-dependent term, different Likelihood functions, e.g., from different hypotheses, that share the same data will also share the same constant. Thus, for a Likelihood ratio based on two different hypotheses operating on the same dataset, this same constant will be present in both the numerator and the denominator and will cancel out.

[0038] The Likelihood analysis, and functions of the analysis, will be described in this disclosure with reference to a data set d, in which each data point d_j is an n-dimensional value having the form given in equation (4) below:

(Eq. 4)

[0039] In an example form of a Likelihood analysis, the Likelihood function for the signal hypothesis is given and expanded in equation (5) below:

(Eq. 5)

[0040] The Likelihood function for the signal hypothesis can be solved to identify the values of the parameters (A, B, pos_A, and pos_B) that give the best fit of the model to the data set of interest d and to define the likelihood that the values d_j in the data set of interest d would occur according to the model, given those parameter values. This exercise comprises a “Maximum Likelihood Estimation” (MLE). After solving the above equation (5) (an example process for which is provided below), the identified optimal parameter values (A, B, pos_A, and pos_B) comprise the MLE. The Likelihood Value (LV) at the MLE is related to how probable it is to see the given data d_j for the given model having parameters (A, B, pos_A, and pos_B).
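A brute-force version of this estimation can be sketched as a grid search that maximizes a Poisson log-likelihood over A, B, and pos_A. The sketch assumes one-dimensional data, a Gaussian pattern of known width, a flat background, and Poisson noise only; the grids and all names are hypothetical, and a practical implementation would more likely use a numerical optimizer than an exhaustive scan.

```python
import math

def model_mean(i, A, B, pos_a, width=1.5):
    """Signal-hypothesis mean at index i: Gaussian pattern on a flat background."""
    return A * math.exp(-0.5 * ((i - pos_a) / width) ** 2) + B

def log_likelihood(data, A, B, pos_a):
    """Poisson log-likelihood, dropping the data-only term log(d_i!)."""
    total = 0.0
    for i, d in enumerate(data):
        lam = model_mean(i, A, B, pos_a)
        total += d * math.log(lam) - lam
    return total

def grid_mle(data, A_grid, B_grid, pos_grid):
    """Return (best log-likelihood, A, B, pos_A) over the parameter grids."""
    best = None
    for A in A_grid:
        for B in B_grid:
            for pos_a in pos_grid:
                ll = log_likelihood(data, A, B, pos_a)
                if best is None or ll > best[0]:
                    best = (ll, A, B, pos_a)
    return best

# Noise-free data generated from known parameters, so the search should recover them.
data = [model_mean(i, A=8.0, B=2.0, pos_a=5) for i in range(11)]
A_grid = [2.0 * k for k in range(1, 8)]    # 2, 4, ..., 14
B_grid = [0.5 * k for k in range(1, 9)]    # 0.5, 1.0, ..., 4.0
best_ll, A_hat, B_hat, pos_hat = grid_mle(data, A_grid, B_grid, range(11))
```

The cost of the scan grows with the product of the grid sizes, which illustrates why an exhaustive full Likelihood search over an entire data set is computationally expensive.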

[0041] In the same example Likelihood analysis, the Likelihood function for the null hypothesis can be represented in the same general form as equation (5) above and solved to find the optimal values of B and pos_B and the value of its corresponding likelihood.

[0042] Once the Likelihood functions for the signal hypothesis and the null hypothesis are solved, the Likelihood values associated with the two hypotheses can be compared with each other in a ratio. The ratio of the two likelihood values may be referred to as the“Likelihood Ratio” (LR). The Likelihood Ratio provides a quantitative measure of the relative probabilities that the experimental data set would have arisen according to the signal hypothesis model or the null hypothesis model. Because the only difference between the two hypotheses is the presence or absence of the pattern of interest, the Likelihood ratio gives a measure of the probability that the pattern of interest is present in the data set.
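For Poisson-distributed data the comparison is conveniently carried out on the logarithm of the ratio, where the data-only factorial terms cancel between numerator and denominator. The sketch below assumes the fitted mean values under each hypothesis are already available; the function name and example values are hypothetical.

```python
import math

def log_likelihood_ratio(data, signal_means, null_means):
    """log LR = sum_i [ d_i * log(s_i / n_i) - (s_i - n_i) ] for Poisson noise."""
    return sum(
        d * math.log(s / n) - (s - n)
        for d, s, n in zip(data, signal_means, null_means)
    )

data = [2.0, 2.0, 10.0, 2.0, 2.0]          # one bright pixel on a flat background
signal_fit = [2.0, 2.0, 10.0, 2.0, 2.0]    # signal hypothesis: background plus a spot
null_fit = [3.6, 3.6, 3.6, 3.6, 3.6]       # null hypothesis: best flat background (the data mean)
log_lr = log_likelihood_ratio(data, signal_fit, null_fit)   # positive: signal favored
```

A log ratio of zero means the two hypotheses explain the data equally well; larger positive values favor the signal hypothesis.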

[0043] A threshold can be defined for the Likelihood ratio to define the minimum value that may be considered to define the presence of the pattern of interest, with the value of the threshold at the discretion of the user. In an embodiment, the Likelihood Ratio threshold can be defined through experimentation to determine an appropriate threshold that results in an accurate determination of the presence of the pattern of interest. In embodiments, the threshold may be applied directly to the numerical form of the Likelihood Ratio. In other embodiments, the threshold may be applied to a mathematical manipulation of the Likelihood Ratio, which is also considered application of the threshold to the Likelihood Ratio for the purposes of this disclosure. Such mathematical manipulations may include, for example, defining local maxima through H-dome transformation or other local maxima approaches, all of which will be understood by a person of skill in the art.
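One simple way to apply such a threshold is to keep only positions whose Likelihood Ratio exceeds the threshold and is also a local maximum. The sketch below uses a plain neighbor comparison on a one-dimensional map of log Likelihood Ratio values rather than an H-dome transformation; the function name, the map values, and the threshold are hypothetical.

```python
def candidate_positions(lr_map, threshold):
    """Indices whose value exceeds the threshold, is at least as large as the
    left neighbor, and is strictly larger than the right neighbor, so that a
    flat plateau yields a single hit. Missing neighbors count as -infinity."""
    hits = []
    for i, value in enumerate(lr_map):
        left = lr_map[i - 1] if i > 0 else float("-inf")
        right = lr_map[i + 1] if i + 1 < len(lr_map) else float("-inf")
        if value > threshold and value >= left and value > right:
            hits.append(i)
    return hits

lr_map = [0.1, 0.4, 6.2, 3.0, 0.2, 0.3, 4.8, 4.9, 1.0]
hits = candidate_positions(lr_map, threshold=4.0)
```

Index 6 exceeds the threshold but is not reported because its right neighbor is larger, which avoids counting one peak twice.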

[0044] If the value of the Likelihood ratio indicates that the pattern of interest is present, the parameter values of the signal hypothesis model (i.e., the amplitude (A) and position (pos_A) of the pattern of interest and the amplitude (B) and position (pos_B) of the background) define the optimally-likely values of those parameters for that pattern of interest and for the background.

[0045] The Likelihood Approach, via determination of a Likelihood ratio, is a powerful tool to analyze patterns in data sets characterized by high background intensities and high levels of system noise from all sources, relative to the intensity of the pattern of interest.

[0046] Part of the power of likelihood analysis derives from the fact that it incorporates and accommodates different sources of information available in the data set. The Likelihood functions based on the signal hypothesis model and the null hypothesis model take into account not only the nature of the pattern of interest and the background pattern (e.g., for the case of a fluorescent spot, a 3D Gaussian photon distribution and, for example, a constant average level of photons that emanate from sources other than the spot, across the data set) but also the fact that the values present in any individual data set will fluctuate from one sampling of the system to another. This fluctuation comprises “noise.”

[0047] To restate the above point in another way: in principle, the value of the datum predicted to occur at each position in the data set could be a specific number that would be the same in every sampling of the system (e.g., every fluorescence imaging data set of a particular sample). However, if there is noise in the system, from any or all of the above-noted sources, that predicted value fluctuates as described by some appropriate statistical probability distribution(s) (e.g., Poisson, Gaussian, or an empirically-determined distribution), and the probability that the observed datum will be any particular value is predicted by that distribution. Conversely, given such a distribution(s), the probability that the data value actually observed would have arisen from the relevant model at the defined parameter values can be specifically defined. A unique advantage of the Likelihood approach is that it incorporates such “noise distributions” and thus can consider the effects of noise fluctuations on the probability that a given datum in a data set will have a particular value.

[0048] Modeling Fluorescence Spot Image Data for Likelihood Analysis; Problems With Full Likelihood Analysis.

[0049] Likelihood analysis can find particular use in the identification and characterization of fluorescent spots in images of biological entities. To apply a likelihood analysis for such a case, sources of noise may be modeled in the Likelihood functions for a signal hypothesis model and for a null hypothesis model. For the particular case of fluorescence spot image data, the following considerations apply. “Photon noise” (which is characteristic of both the pattern of interest and background pattern) can be described by a Poisson distribution whose mean value is the average number of photons impinging on a given pixel ($\lambda_i$). This Poisson distribution pertains to either: (i) photons from both the pattern of interest and the background pattern (in the signal hypothesis); or (ii) photons from the background pattern only (in the null hypothesis). Equation (6) below sets forth an example probability function setting forth the probability of observing a given pixel value $Y_i$ (i.e., the combined contributions of noise from the pattern of interest and the background pattern) in a Poisson distribution parameterized by the mean number of photons $\lambda_i$:

$$P(Y_i \mid \lambda_i) = \frac{\lambda_i^{Y_i}\, e^{-\lambda_i}}{Y_i!}$$

(Eq. 6)

[0050] Another source of noise in fluorescence spot image data is: (iii) measurement noise. In a particular image, a respective number of photons impinges on each pixel and is converted by the camera to some number of electrons, each of which is stored as an “analog-to-digital unit” (ADU). This conversion process is characterized by noise. This so-called “camera noise” (a form of measurement noise) may be described by a Gaussian distribution whose variance is given by the read noise of the particular pixel and whose mean value is given by the magnitude of the user-selected or system-selected “offset” used to eliminate negative ADU values. Equation (7) below sets forth an example probability function setting forth the probability of observing a given value $X_i$ (i.e., measurement noise) in a Gaussian distribution parameterized by the read noise variance of the camera ($\sigma^2$) and a mean offset ($\mu$) used to eliminate negative ADU values:

$$P(X_i \mid \sigma^2, \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right)$$

(Eq. 7)

[0051] The two types of noise, i.e., photon noise (from the pattern(s) of interest and/or the background pattern(s)) and camera noise, may be independent from each other. Consequently, their effects may add to give the distribution that describes the fluctuation in ADUs. Mathematically, this additive distribution therefore may be described by convolving the distributions. In the particular case described above, this implies convolving the Poisson distribution describing the photon noise with the Gaussian distribution describing the camera noise. Equation (8) below illustrates that convolution:

$$\left(\mathrm{Poisson}(\lambda_i) * \mathrm{Gaussian}(\sigma^2, \mu)\right)[d_i]$$

(Eq. 8)

[0052] Given the distribution of equation (8) above, the probability of occurrence of a particular number of ADUs present at a given point in the data set (in the case of 3D fluorescence images, at a given pixel or voxel) can be determined. The mean value of the Poisson distribution may be defined by the predicted photon level value at that pixel, whereas the camera noise Gaussian distribution may be defined by empirical calibration measurements for each pixel in a particular camera/imaging setup.
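By way of illustration only, the convolution of the photon-noise Poisson distribution with the camera-noise Gaussian distribution may be evaluated numerically as a sum over possible photon counts. The sketch below is an assumption-laden example rather than the disclosed implementation: the function name `adu_probability`, its arguments, the truncation rule for the Poisson sum, and a gain of one ADU per photon are all hypothetical choices made for brevity.

```python
import math
import numpy as np

def adu_probability(adu, mean_photons, read_sigma, offset, max_photons=None):
    """Density of observing `adu` when a Poisson photon count (mean
    `mean_photons` > 0) is added to Gaussian camera noise (std `read_sigma`,
    mean `offset`). Assumes, for simplicity, a gain of 1 ADU per photon."""
    if max_photons is None:
        # Truncate the Poisson sum well past the bulk of the distribution.
        max_photons = int(mean_photons + 10.0 * math.sqrt(mean_photons) + 20)
    n = np.arange(max_photons + 1)
    # Poisson pmf evaluated in log space to avoid overflow in n!.
    log_fact = np.array([math.lgamma(k + 1.0) for k in n])
    poisson = np.exp(n * math.log(mean_photons) - mean_photons - log_fact)
    # Gaussian density of the remaining (camera) part of the observed value.
    gauss = np.exp(-0.5 * ((adu - offset - n) / read_sigma) ** 2) / (
        read_sigma * math.sqrt(2.0 * math.pi))
    return float(np.sum(poisson * gauss))
```

In such a sketch, the per-pixel read noise and offset would come from the empirical calibration measurements mentioned above, and the Poisson mean from the hypothesis being evaluated.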

[0053] The standard Likelihood approach (including noise modeled according to equation (8) above) cannot practically be applied if the data set is very large, if multiple patterns are sought, if the models involve multiple parameters, and/or for complex spatial and/or numerical distributions of noise, because it would be computationally intractable. This intractability arises from the computational complexity required to determine an MLE for the signal hypothesis (or any hypotheses) in those situations.

[0054] The computational intractability of a full Likelihood analysis applies to fluorescent spot analysis, specifically to initial detection of fluorescent spots in image data. As a result, existing approaches include the use of ad hoc computational criteria and/or human visual inspection of images to initially identify the presence of a spot. After identifying the region of the data set containing a spot, existing approaches then may use a full Likelihood approach to determine the specific location of the spot. For example, some existing approaches use an iterative “hill-climbing” exercise that defines the MLE for a small region of the data that is defined as containing a spot based on ad hoc computational criteria and/or human visual examination. A hill-climbing exercise starts from a particular position in parameter space of the Likelihood function, evaluates the slope of the Likelihood function at that point, and follows that slope in the upward direction in parameter space to find a maximum Likelihood value at the position where the derivative of the Likelihood function is zero.

[0055] A hill-climbing exercise cannot practically be used to initially identify a spot accurately because, if the wrong starting point were chosen, the MLE may be determined for a local maximum in the data set that does not, in fact, represent a spot. The result would be loss of robustness, i.e., detection of a spot where none is present, failure to detect a spot when one is present, or detection of a spot with incorrect parameters specified. Thus, to robustly define the presence of a spot by a hill-climbing Likelihood approach, it would be necessary to carry out the hill-climbing exercise beginning from every position in the parameter set. The iterative nature of the hill-climbing approach is very computationally expensive, and the computational load required to carry out hill-climbing starting at every position in a parameter set is prohibitive. Thus, as noted above, known approaches utilize ad hoc criteria and/or human visual inspection to detect the presence of a spot at an approximate position in the data set.
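The dependence of hill-climbing on its starting point may be illustrated with a toy example. The sketch below is not drawn from the disclosure: `toy_landscape` is a hypothetical one-dimensional stand-in for a Likelihood function with one tall peak and one low bump, and `hill_climb` is a simple fixed-step gradient ascent chosen for illustration.

```python
import numpy as np

def hill_climb(func, x0, step=0.1, eps=1e-4, max_iter=10000):
    """Follow the numerical slope of `func` uphill from `x0` until it flattens."""
    x = float(x0)
    for _ in range(max_iter):
        slope = (func(x + eps) - func(x - eps)) / (2.0 * eps)
        if abs(slope) < 1e-6:
            break
        x += step * slope
    return x

def toy_landscape(x):
    """Hypothetical landscape: a tall peak near x = 2 and a low bump near x = -2."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.3 * np.exp(-0.5 * (x + 2.0) ** 2)

# The same procedure ends on different maxima depending on where it starts.
from_right = hill_climb(toy_landscape, x0=1.0)   # climbs the tall peak
from_left = hill_climb(toy_landscape, x0=-3.0)   # stalls on the low bump
```

Only the first start reaches the global maximum, which is why an exhaustive set of starting points would be needed for robust detection by this route.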

[0056] A single-stage likelihood analysis also may not adequately differentiate between numerous patterns of interest that may be present in a data set. Specifically, a single-stage likelihood analysis may not account for penetrance of a single pattern of interest into multiple data set subsets, and/or penetrance of multiple patterns of interest into a single data set subset.

[0057] As an improvement on the above approaches (e.g., a full, single-stage Likelihood analysis to identify and locate one or more patterns of interest, or a manual or computational identification of a pattern through ad hoc criteria followed by a Likelihood analysis to localize and further characterize the pattern (or patterns) of interest), the two-stage Likelihood pipeline may be used to take advantage of the benefits of a Likelihood analysis while reducing computational workload to a practical level to identify and characterize multiple patterns of interest. The Likelihood approach, or a version thereof, may be used at both stages of the analysis. This contrasts with other known methods, in which the Likelihood approach, when applied, is reserved for the final stage, thus losing out on its potential benefits during initial stages.

[0058] Improvement to a Full Likelihood Analysis: Two-Stage Likelihood Pipeline for Detection and Characterization of Multiple Patterns.

[0059] The two-stage Likelihood pipeline includes two stages, described in turn below. First, in Stage I, a modified (e.g., approximate) Likelihood analysis may be performed to determine the presence of zero, one or more spot candidate locations for one or more patterns of interest. Below, a version of Stage I for a data set resulting from a single data capture regime will first be set forth, followed by a version of Stage I for a data set having multiple discrete subparts resulting from multiple data capture regimes. Second, in Stage II, a full Likelihood analysis may be carried out at one or more of the spot candidate locations identified in Stage I to verify the presence of (e.g., to formally detect) and to localize and determine the amplitude of patterns at the candidate locations.

[0060] Two-Stage Likelihood Pipeline: Stage I, Single Data Capture Regime

[0061] In the two-stage Likelihood pipeline, the full data set may be divided into small units for separate analysis that may be referred to in this disclosure as “patches” or “segments”. In an embodiment, the size of a patch or segment may be slightly larger than that expected for a fluorescent spot. In embodiments in which a pattern of interest is something other than a fluorescent spot, patch size may be selected as appropriate. For example, a segment may be defined to be slightly larger than the largest pattern of interest, in an embodiment. In embodiments in which each data point in the data set is a pixel or voxel, a patch or segment may be as small as a single pixel or voxel, or may be a set of pixels or voxels that contain a pattern of interest. In embodiments, a patch or segment may include a contiguous set of adjacent pixels or voxels. Additionally or alternatively, a patch or segment may include pixels or voxels that are non-contiguous or non-adjacent. Patches may be defined for every position in the data set. In an embodiment, every data point in the data set may be included in at least one patch. In the single-regime version of Stage I, a Likelihood-type analysis may be carried out for each patch with three simplifying modifications: (i) the 3D Gaussian distribution corresponding to the pattern of interest (e.g., the fluorescent spot) and the distribution corresponding to the background may be assumed to be positioned at some respective specified positions within the patch; (ii) the measurement noise may be described as a Poisson distribution, rather than as a Gaussian distribution; and (iii) the intensity of the mean photon values may be assumed to be low relative to measurement noise. This disclosure will discuss embodiments in which the specified location of a pattern of interest and/or the background pattern within a patch or segment is the center of the patch or segment. However, the two-stage Likelihood pipeline is not so limited. Rather, it should be understood that the specified position of the pattern of interest within a patch or segment may alternatively be some position other than the center, in embodiments.

[0062] A Stage I analysis, in which a signal hypothesis model (incorporating both the pattern of interest and background) and a null hypothesis model (incorporating only background) incorporate the above assumptions, offers two simplifications over a standard, full Likelihood approach as described above. First, the number of variable parameters is decreased because the positions of the pattern of interest and of the background pattern are specified, rather than variable. Thus, in the signal hypothesis model, there are only two variable parameters, A and B. For the same reason, in the null hypothesis model, there is just one variable parameter, B. As a result, at Stage I, example signal hypothesis and null hypothesis models for a single pattern of interest may be described according to the general form of equations (9) and (10) below:

$$\text{signal hypothesis} \equiv \lambda_i(A, B, cen) = A f_i(cen) + B$$

(Eq. 9)

$$\text{null hypothesis} \equiv \lambda_i(B) = B$$

(Eq. 10)

where cen represents a specified position at the center of the patch or segment.

[0063] For a fluorescent spot existing in a three-dimensional data set, $f_i$ may be a 3D Gaussian distribution, corresponding to the distribution of photons emanating from a fluorescent point source, A is the amplitude of that Gaussian distribution, cen (in x, y, z) defines the position of the center of the Gaussian within the 3D data set as the center of a given segment, and B is the amplitude of the background.

[0064] A second simplification offered by the above assumptions regarding Stage I is that, because the effects of noise can be reasonably approximated by describing the camera noise as a Poisson distribution, all noise components (e.g., (i) photon noise for the pattern and/or background and (ii) camera noise) can be incorporated into a single Poisson distribution that is the sum of the two Poisson components, as set forth in the progression of equations (11) - (14) below:

$$\left(\mathrm{Poisson}(\sigma_i^2) * \mathrm{Poisson}(\lambda_i)\right)[d_i]$$

(Eq. 13)

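$$\frac{(\lambda_i + \sigma_i^2)^{d_i + \sigma_i^2}\, e^{-(\lambda_i + \sigma_i^2)}}{(d_i + \sigma_i^2)!}$$

(Eq. 14)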

[0065] Equation (14), stated as a probability function, can also be described as a Likelihood function, as in equation (15) below:

$$L_{PoissPoiss} = \prod_{i=1}^{n} \frac{(\lambda_i + \sigma_i^2)^{d_i + \sigma_i^2}\, e^{-(\lambda_i + \sigma_i^2)}}{(d_i + \sigma_i^2)!}$$

(Eq. 15)

[0066] Taking the logarithm of equation (15) yields equation (16) below:

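$$\mathcal{L}(\text{hypothesis} \mid d_i)_{PoissPoiss} = \operatorname{Log}\left(L_{PoissPoiss}\right) = \sum_{i=1}^{n} \left[ (d_i + \sigma_i^2) \log(\lambda_i + \sigma_i^2) - (\lambda_i + \sigma_i^2) - \log\big((d_i + \sigma_i^2)!\big) \right]$$

(Eq. 16)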
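As a hedged numerical illustration of the logarithm of the combined-Poisson Likelihood described above, the sketch below treats each offset-subtracted datum, shifted by the read-noise variance, as Poisson-distributed with mean equal to the predicted intensity plus that variance. The function name `log_likelihood_poisspoiss` and its arguments are hypothetical, and the use of the log-gamma function in place of the factorial (so that non-integer shifted data can be handled) is an assumption of this sketch.

```python
import math
import numpy as np

def log_likelihood_poisspoiss(data, lam, read_var):
    """Log of the shifted-Poisson Likelihood: each value d_i + var_i is treated
    as Poisson with mean lam_i + var_i (camera offset assumed already removed,
    and d_i + var_i assumed to be greater than -1)."""
    data = np.asarray(data, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), data.shape)
    var = np.broadcast_to(np.asarray(read_var, dtype=float), data.shape)
    shifted = data + var      # observed value shifted by the read variance
    mean = lam + var          # mean of the combined Poisson distribution
    # lgamma(x + 1) generalises log(x!) to non-integer shifted data.
    log_fact = np.array([math.lgamma(s + 1.0) for s in shifted.ravel()])
    log_fact = log_fact.reshape(data.shape)
    return float(np.sum(shifted * np.log(mean) - mean - log_fact))
```

For a constant predicted intensity, this quantity is largest when that intensity equals the mean of the data, which is the behavior an MLE derivation relies on.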

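[0067] The derivative of equation (16) with respect to $\lambda_i$, which may be used to derive the MLE for (A, B), as described below, is shown in equation (17) below:

$$\frac{\partial \mathcal{L}_{PoissPoiss}}{\partial \lambda_i} = \frac{d_i + \sigma_i^2}{\lambda_i + \sigma_i^2} - 1 = \frac{d_i - \lambda_i}{\lambda_i + \sigma_i^2}$$

(Eq. 17)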

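[0068] To derive the approximate MLE for (A, B) at a given pos, such as cen, from equation (16), the following process may be followed, in an embodiment: (1) assume the intensity $\lambda_i$ (i.e., the data representing the pattern of interest) is low relative to the camera noise $\sigma_i^2$, such that the additive data and camera noise, represented as $\sigma_i^2 + \lambda_i$, can be reduced to $\sigma_i^2$ (as in equation (18) below):

$$\frac{\partial \mathcal{L}_{PoissPoiss}}{\partial \lambda_i} \approx \frac{d_i - \lambda_i}{\sigma_i^2}$$

(Eq. 18)

(2) take the derivatives of $\mathcal{L}_{PoissPoiss}$ with respect to A and B, and (3) set each derivative equal to zero (as in equations (19) and (20) below, which represent the A, B derivatives of equation (18) as applied to the signal hypothesis model):

$$\frac{\partial \mathcal{L}_{PoissPoiss}}{\partial A} = \sum_{i} \frac{f_i(cen)\,\big(d_i - A f_i(cen) - B\big)}{\sigma_i^2} = 0$$

(Eq. 19)

$$\frac{\partial \mathcal{L}_{PoissPoiss}}{\partial B} = \sum_{i} \frac{d_i - A f_i(cen) - B}{\sigma_i^2} = 0$$

(Eq. 20)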

The summations can be treated as scalars, and thus {A, B} can be solved by matrix inversion. Equations (21) and (22) below illustrate this derivation as applied to equations (19) and (20), respectively, and equation (23) below illustrates the setup of the matrix for inversion:
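As a worked sketch of the matrix setup (a reconstruction that follows from the low-intensity approximation and the signal hypothesis $\lambda_i = A f_i(cen) + B$, and not a reproduction of the original equations), setting the A and B derivatives to zero and collecting the summations over the data points of a patch as scalars gives two linear equations in A and B, which may be written in matrix form and inverted. The symbols $S_{ff}$, $S_f$, $S_1$, $S_{fd}$ and $S_d$ are shorthand introduced here for illustration only.

```latex
\begin{aligned}
S_{ff} &= \sum_i \frac{f_i(cen)^2}{\sigma_i^2}, &
S_{f}  &= \sum_i \frac{f_i(cen)}{\sigma_i^2}, &
S_{1}  &= \sum_i \frac{1}{\sigma_i^2}, \\
S_{fd} &= \sum_i \frac{f_i(cen)\, d_i}{\sigma_i^2}, &
S_{d}  &= \sum_i \frac{d_i}{\sigma_i^2}
\end{aligned}

\begin{aligned}
A\, S_{ff} + B\, S_{f} &= S_{fd} \\
A\, S_{f} + B\, S_{1} &= S_{d}
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{bmatrix} S_{ff} & S_{f} \\ S_{f} & S_{1} \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
=
\begin{bmatrix} S_{fd} \\ S_{d} \end{bmatrix}

\begin{bmatrix} A \\ B \end{bmatrix}
= \frac{1}{S_{ff} S_{1} - S_{f}^{2}}
\begin{bmatrix} S_{1} & -S_{f} \\ -S_{f} & S_{ff} \end{bmatrix}
\begin{bmatrix} S_{fd} \\ S_{d} \end{bmatrix}
```

The determinant $S_{ff} S_1 - S_f^2$ is positive whenever the pattern is not constant over the patch, so the inversion is well defined in that case.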

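[0069] Equation (23) above may be solved for A and B to derive the Stage I MLE of those parameters for a single patch. To extend the solution to the entire data set, the dot product of, e.g., equation (21) may be generalized to convolution for the entire data set, yielding $\sum_{k \in S} f_k(cen)\, d_{i-k} / \sigma_{i-k}^2$, to complete the algebra for total dataset analysis by the approximate MLE of Stage I. If $f_k$ is not symmetric, then the orientation may be flipped in all dimensions to solve equation (23), as set forth in equation (24) below:

$$\begin{bmatrix} \sum_{k \in S} \dfrac{f_k(cen)^2}{\sigma_{i-k}^2} & \sum_{k \in S} \dfrac{f_k(cen)}{\sigma_{i-k}^2} \\ \sum_{k \in S} \dfrac{f_k(cen)}{\sigma_{i-k}^2} & \sum_{k \in S} \dfrac{1}{\sigma_{i-k}^2} \end{bmatrix} \begin{bmatrix} A_i \\ B_i \end{bmatrix} = \begin{bmatrix} \sum_{k \in S} \dfrac{f_k(cen)\, d_{i-k}}{\sigma_{i-k}^2} \\ \sum_{k \in S} \dfrac{d_{i-k}}{\sigma_{i-k}^2} \end{bmatrix}$$

(Eq. 24)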

[0070] Computational tractability may be provided through the use of a Fast Fourier Transform analog of the convolution operator, or for the case of a 3D Gaussian, by separable convolution, in embodiments.
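The two routes to tractability mentioned above may be compared on a small example. The sketch below is illustrative only: it uses a 2D array rather than a 3D one for brevity, and the function names `fft_convolve2d` and `separable_convolve2d` are hypothetical. It shows that, for a Gaussian kernel that factors into one-dimensional profiles, one 1D pass per axis gives the same full convolution as a zero-padded Fast Fourier Transform product.

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Full linear 2D convolution computed with zero-padded FFTs."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    return np.fft.irfft2(spectrum, shape)

def separable_convolve2d(image, profile):
    """Same result when the kernel equals outer(profile, profile):
    one 1D full convolution along each axis in turn."""
    rows = np.apply_along_axis(np.convolve, 0, image, profile)
    return np.apply_along_axis(np.convolve, 1, rows, profile)

rng = np.random.default_rng(0)
image = rng.random((8, 9))
profile = np.exp(-0.5 * np.arange(-2, 3) ** 2)   # 1D Gaussian profile
kernel = np.outer(profile, profile)              # separable 2D Gaussian
same = np.allclose(fft_convolve2d(image, kernel),
                   separable_convolve2d(image, profile))
```

The separable route scales with the kernel width per axis rather than with the full kernel volume, which is the reason it may be attractive for a 3D Gaussian.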

[0071] Solving equation (23) for A gives the approximate MLE of $A_i$ (with the assumption that the pattern of interest is centered at point i) for the signal hypothesis, as given by equation (25) below:

(Eq. 25)

[0072] Solving equation (23) for B gives the approximate MLE of $B_i$ (once again, with the assumption that the pattern of interest is centered at point i) for the signal hypothesis, as given by equation (26) below:

(Eq. 26)

[0073] Equations (24) and (25) above may be solved for A and B to determine the MLE for the signal hypothesis for a single pattern at a single patch, or segment, of the data set. Thus, to characterize a single pattern, equations (25) and (26) may be applied to all patches in the dataset, solving for {$A_i$, $B_i$} for all positions i and assuming that the spot pattern is centered in the patch at position cen. As noted above, in other embodiments, the spot pattern may be assumed to be centered or located at some point in each patch or segment other than the center.
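A minimal sketch of applying the per-patch solution at every position is given below. It is not the disclosed implementation: it assumes a one-dimensional data set for brevity, offset-subtracted data, an odd-length template centered in its patch, and variance weights of one over the read-noise variance; the function name `stage1_amplitudes` and its arguments are hypothetical. Each weighted summation is computed for all positions at once as a sliding dot product, and the two-by-two system is then solved in closed form at each position.

```python
import numpy as np

def stage1_amplitudes(data, template, read_var):
    """Approximate per-position estimates (A_i, B_i) for the model A*f + B,
    with the template centred on every sample in turn (1D for brevity)."""
    data = np.asarray(data, dtype=float)
    f = np.asarray(template, dtype=float)      # odd length, centred in patch
    w = 1.0 / np.broadcast_to(np.asarray(read_var, dtype=float), data.shape)
    ones = np.ones_like(f)

    def slide(values, kernel):
        # Sliding dot product over each patch; samples beyond the edges
        # contribute zero weight.
        return np.correlate(values, kernel, mode="same")

    s_ff = slide(w, f * f)          # sum of f^2 / var over each patch
    s_f = slide(w, f)               # sum of f / var
    s_1 = slide(w, ones)            # sum of 1 / var
    s_fd = slide(w * data, f)       # sum of f * d / var
    s_d = slide(w * data, ones)     # sum of d / var

    det = s_ff * s_1 - s_f ** 2
    amplitude = (s_1 * s_fd - s_f * s_d) / det
    background = (s_ff * s_d - s_f * s_fd) / det
    return amplitude, background

# Noise-free toy data: background 3, one spot of amplitude 7 at index 20.
x = np.arange(41.0)
template = np.exp(-0.5 * (np.arange(-4, 5) / 1.5) ** 2)
data = 3.0 + 7.0 * np.exp(-0.5 * ((x - 20.0) / 1.5) ** 2)
amp_map, bkg_map = stage1_amplitudes(data, template, read_var=4.0)
```

In this toy case the amplitude map peaks at the spot position and reproduces the amplitude and background used to build the data; with real data the maps would be noisy and would feed the Likelihood ratio test rather than being read directly.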

[0074] To determine the approximate MLE for the null hypothesis model, a similar analysis to that set forth with respect to equations (19) - (23) above may be carried out, but with equation (18) applied to the null hypothesis model instead of the signal hypothesis model, yielding the following equation (27) for the approximate MLE of $B_i$ for the null hypothesis model:

(Eq. 27)

[0075] For each given patch, the MLEs for the signal hypothesis model and the null hypothesis model may be calculated and used to determine the Likelihood ratio for that patch and a resulting Likelihood ratio landscape with equations (28) - (30) below, where equation (28) is the result of integrating equation (18), equation (29) is the application of equation (28) to the signal hypothesis model, and equation (30) is the application of equation (28) to the null hypothesis:

(Eq. 28)

(Eq. 29)

(Eq. 30)

[0076] In an embodiment, a Likelihood ratio based on equations (29) and (30) is given by equation (31) below:

(Eq. 31)

[0077] Likelihood ratios may be determined for each patch, or segment, in the data set according to the equations above. In the case of fluorescence image data, where a patch is present centered on each pixel or voxel in the data set, a Likelihood ratio may be determined for each such position. The Likelihood ratios for each segment may be compared to a threshold such that, above a minimum value of the Likelihood ratio, a patch is determined to have a reasonable probability of including an instance of the pattern of interest (e.g., a spot). These patches may be termed “candidate regions,” “candidate patches,” or “candidate segments” in this disclosure. The threshold may be selected or determined experimentally, in embodiments, to achieve an appropriate balance between sensitivity and over-inclusiveness (e.g., to minimize false positives and false negatives).
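The thresholding step may be sketched as follows. This example rests on stated assumptions rather than on the disclosed equations: under the low-intensity approximation the log-Likelihood is taken to be, up to a constant, minus one half of the variance-weighted sum of squared residuals, so the log of the Likelihood ratio for a patch is one half of the reduction in weighted residual when the pattern term is added to the background-only model. The names `patch_log_likelihood_ratio` and `candidate_positions`, the one-dimensional data, and the restriction to interior positions are illustrative choices.

```python
import numpy as np

def patch_log_likelihood_ratio(patch, template, read_var):
    """Log-Likelihood ratio (signal vs. null) for one 1D patch, computed as a
    difference of variance-weighted least-squares residuals."""
    d = np.asarray(patch, dtype=float)
    f = np.asarray(template, dtype=float)
    w = 1.0 / np.broadcast_to(np.asarray(read_var, dtype=float), d.shape)
    # Signal hypothesis: weighted fit of A*f + B.
    design = np.stack([f, np.ones_like(f)], axis=1)
    sw = np.sqrt(w)
    (a_hat, b_hat), *_ = np.linalg.lstsq(design * sw[:, None], d * sw, rcond=None)
    # Null hypothesis: background only, i.e. the weighted mean of the patch.
    b_null = np.sum(w * d) / np.sum(w)
    resid_signal = np.sum(w * (d - a_hat * f - b_hat) ** 2)
    resid_null = np.sum(w * (d - b_null) ** 2)
    return 0.5 * (resid_null - resid_signal)

def candidate_positions(data, template, read_var, threshold):
    """Interior indices whose centred patch exceeds `threshold`."""
    data = np.asarray(data, dtype=float)
    var = np.broadcast_to(np.asarray(read_var, dtype=float), data.shape)
    half = len(template) // 2
    hits = []
    for i in range(half, len(data) - half):
        window = slice(i - half, i + half + 1)
        ratio = patch_log_likelihood_ratio(data[window], template, var[window])
        if ratio > threshold:
            hits.append(i)
    return hits
```

Because the null model is nested inside the signal model, this ratio is never meaningfully negative; the threshold value itself would be chosen experimentally, as described above.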

[0078] The above considerations apply to Stage I analysis. These considerations are set forth here with respect to identification of fluorescent spots in a data set comprising 3D image data, but a person of skill in the art will appreciate that the approach is readily applicable to many other types of patterns and backgrounds in many other types of data sets. To that end, the approximate MLE of $A_i$ for the signal hypothesis model for a general case is given in equation (32) below:

(Eq. 32)

[0079] The approximate MLE of $B_i$ for the signal hypothesis model for a general case is given in equation (33) below:

(Eq. 33)

[0080] As a result of the above analysis, determination of MLEs for all patches in a large data set, for each model, and the corresponding Likelihood Ratios, are computationally tractable. For example, in an embodiment given present-day computing resources, the determination of MLEs for all patches in an example fluorescence image data set may take approximately one minute per large 3D image data set.

[0081] In addition, although the instant disclosure discusses embodiments in which the signal hypothesis includes one linear parameter (A) associated with the signal pattern and one linear parameter (B) associated with the background pattern, the techniques and methods of this disclosure may be readily applied to any number of linear parameters associated with the signal pattern and/or the background pattern. In the general case above, for the signal hypothesis $\equiv \lambda_i(A, B, pos_A, pos_B) = A f_i(pos_A) + B g_i(pos_B)$, the parameters A and B are linearly related to $\lambda_i$, and for a given data set and specified positions, A and B may be solved algebraically by using the notation described in equation (23). Consequently, all linear parameters for any given hypothesis can be solved for any number n of patterns, since these linear parameters can be manipulated into the matrix notation described in equation (23). Ultimately, for a hypothesis $\equiv a_1 f_1(x_1) + a_2 f_2(x_2) + \dots + a_n f_n(x_n)$, a set of amplitude parameters $\{a_1, a_2, \dots, a_n\}$ can be solved directly by the approximate Likelihood for each pattern $\{f_1, f_2, \dots, f_n\}$ given each position $\{x_1, x_2, \dots, x_n\}$ for a given data set.
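The extension to any number of linear amplitude parameters may be sketched as a single linear solve. The example below is a hedged illustration under the same low-intensity approximation, with the patterns already evaluated at their specified positions; the function name `solve_linear_amplitudes` and its arguments are hypothetical, and a constant background may be included simply as one more pattern (an array of ones).

```python
import numpy as np

def solve_linear_amplitudes(data, patterns, read_var):
    """Amplitudes {a_1..a_n} for the model sum_j a_j * pattern_j, found by
    solving the variance-weighted normal equations."""
    data = np.asarray(data, dtype=float)
    d = data.ravel()
    # One column per pattern, each flattened to match the data.
    basis = np.stack([np.asarray(p, dtype=float).ravel() for p in patterns], axis=1)
    var = np.broadcast_to(np.asarray(read_var, dtype=float), data.shape)
    w = (1.0 / var).ravel()
    normal = basis.T @ (basis * w[:, None])    # n x n matrix of weighted sums
    rhs = basis.T @ (w * d)                    # n-vector of weighted sums
    return np.linalg.solve(normal, rhs)

# Toy usage: two overlapping Gaussian patterns plus a constant background.
x = np.arange(30.0)
p1 = np.exp(-0.5 * ((x - 10.0) / 1.5) ** 2)
p2 = np.exp(-0.5 * ((x - 14.0) / 1.5) ** 2)
bg = np.ones_like(x)
amps = solve_linear_amplitudes(5.0 * p1 + 2.0 * p2 + 3.0 * bg, [p1, p2, bg], 4.0)
```

The solve requires the patterns to be linearly independent over the data points used; two patterns placed at the same position with the same shape would make the matrix singular.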

[0082] In embodiments, the result of Stage I of the two-stage Likelihood pipeline is definition of one or more candidate patches or segments. As set forth below, those candidate patches or segments may then be subjected to a full Likelihood analysis (e.g., as described in the Stage II section below) to characterize (i.e., confirm (or not) the existence of, and determine the intensity and exact location of) the respective instances of a pattern of interest (i.e., spots) that may be present in those candidate patches or segments.

[0083] Stage I analysis has another important feature: it can be used to reverse systematic distortions that arise during data collection. This may be achieved if, during Stage I analysis, via the signal hypothesis, the pattern of interest $f_i$ is chosen so as to match the shape of the systematic distortion. This can be illustrated for the case of image analysis, where an imaged object is systematically distorted by physical limitations of the optics. Correction for such distortion is called deconvolution. For the application of fluorescent imaging, the systematic distortion for the imaging system involved is defined by the Point Spread Function of the lens. By Stage I analysis, via the signal hypothesis, deconvolution of a fluorescence image may be achieved by defining $f_i$ as the Point Spread Function of the lens.

[0084] It should also be noted that, in fluorescence imaging, Stage I analysis can be applied not only to a spot, but to a fluorescent object with a more complex shape (e.g., a bacterial nucleoid). Such an object may be illuminated by many individual fluorophores and thus may comprise the superposition of many spots. Such an image comprises the summed output of the large number of point sources decorating that complex object, i.e., is one continuum of different spot densities. Upon application of the Stage I signal hypothesis, the output of the parameter A at all positions in the data set may provide an image of such an object. Moreover, when this exercise is carried out with $f_i$ equal to the distortion (i.e., the Point Spread Function), the output of the parameter A at all positions in the data set may provide the true, undistorted version of the object.

[0085] Two-Stage Pipeline: Stage I, Multiple Data Collection Regimes

[0086] Stage I is described above with respect to a data set resulting from a single data collection regime targeted to a single pattern of interest. Hence, the signal hypothesis above in equation (9) includes a single pattern of interest amplitude A, a single pattern of interest distribution $f_i$, and a single background amplitude B. Stage I may be extended and applied to a data set having multiple discrete subparts, or subsets resulting from multiple data collection regimes, by formulating the signal hypothesis and null hypothesis to account for multiple patterns of interest (for each of which there may be multiple instances in the data set) and background patterns, and to account for penetrance of each pattern of interest into each data collection regime. One example of such a data set is an image or set of images resulting from capture of two or more patterns of interest with two or more data collection regimes, such as fluorescent spots of two or more fluorophores with different excitation and output wavelengths (i.e., different “colors”) and captured with respective filtering and image captures (which may be referred to herein as imaging or data capture regimes). Other types of data sets that have multiple subparts resulting from multiple data collection regimes for multiple patterns of interest include multi-spectral LIDAR and multi-spectral astronomical imaging. The remainder of this disclosure will discuss the example of multi-spectral fluorescence imaging, but such discussion is by way of example only.

[0087] This disclosure will refer to an image data set having two discrete subsets resulting from two discrete data collection regimes. Specifically, this disclosure will refer to an image data set comprising images of “red” fluorophores and “green” fluorophores, with “red” fluorophores having a separate excitation and output wavelength from the “green” fluorophores, both the red and green fluorophores being associated with separate images (and thus separate data collection regimes, each regime targeted to detection of red or green) and both the red and green fluorophores having penetrance in both data collection regimes. The “red” fluorophores and data subset will be referred to as the first fluorophores and data subset, and thus will be referenced with a subscript or postscript 1. The “green” fluorophores and data subset will be referred to as the second fluorophores and data subset, and thus will be referenced with a subscript or postscript 2. As shown in equation (34) below, the two-color example data set d may include components $d_1$, $d_2$ that respectively correspond to the first and second data capture regimes.

$$d = \{d_1, d_2\}$$

(Eq. 34)

[0088] As noted above, output of a fluorophore of one color may penetrate into multiple data collection regimes. Penetrance of a fluorophore’s color within a given data collection regime (and thus within a given data subset) may be represented herein by penetrance coefficients $\beta$. For example, as illustrated in FIG. 5, the intensity of the red fluorophores, $\lambda_1$, may penetrate the data collection regime respective of the green fluorophores, and thus may be present in the data subset resulting from the green data collection regime, $d_2$, which penetrance may be represented by coefficient $\beta_{12}$. Similarly, the intensity of the green fluorophores, $\lambda_2$, may penetrate the data collection regime respective of the red fluorophores, and thus may be present in the data subset resulting from the red data collection regime, $d_1$, which penetrance may be represented by coefficient $\beta_{21}$. Further, the penetrance of the intensity of the red fluorophores, $\lambda_1$, in the data respective of the red data collection regime, $d_1$, may be represented by coefficient $\beta_{11}$, and the penetrance of the intensity of the green fluorophores, $\lambda_2$, in the data respective of the green data collection regime, $d_2$, may be represented by coefficient $\beta_{22}$.

[0089] The signal hypothesis for the two-regime, two-data-subset example includes two patterns of interest (red and green, in this example) and the above-noted penetrance coefficients, so that the effects of penetrance can later be accounted for to determine the true signal amplitude for each color. As shown in equation 35 below, a signal hypothesis may include the sum of the various signals in the various sub-data-sets, appropriately weighted by penetrance coefficients.

$$\text{two regime, two pattern signal hypothesis} = \{\beta_{11}\lambda_{1,i} + \beta_{21}\lambda_{2,i},\ \ \beta_{12}\lambda_{1,i} + \beta_{22}\lambda_{2,i}\}$$

(Eq. 35)

where $\lambda_{1,i}$ and $\lambda_{2,i}$ are the output of the red and green intensity signals, respectively, at data point index i.

[0090] As noted above in equation (1), the measured intensity at a given data point is a sum of the amplitude of the pattern of interest, A, and the amplitude of the background, B, multiplied by their respective pattern functions, f and g. Accordingly, the above two-pattern hypothesis expands to equation (36) below, which is an example two-pattern, two-regime version of equation (1).

(Eq. 36)

where $A_1$ and $A_2$ are the amplitudes of the first and second (i.e., red and green) patterns, respectively, $B_1$ and $B_2$ are the amplitudes of the first and second backgrounds, respectively, $f_{1,i}$ and $f_{2,i}$ are the distribution functions for the first and second patterns, respectively, at point i, and $g_{1,i}$ and $g_{2,i}$ are the distribution functions of the first and second background patterns, respectively, at point i.
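The way penetrance coefficients mix the two patterns into the two data subsets may be illustrated numerically. The sketch below is an assumption-based example, not the disclosed equation: it indexes each coefficient as (pattern, regime), consistent with the description of the coefficients above, and forms the expected intensity in each regime as the penetrance-weighted sum of each pattern-plus-background term. The function name `two_regime_expected` and its arguments are hypothetical.

```python
import numpy as np

def two_regime_expected(amps, bkgs, patterns, backgrounds, beta):
    """Expected intensity in each of two regimes.
    beta[p][r] is the penetrance of pattern p into regime r (0-based)."""
    beta = np.asarray(beta, dtype=float)
    # Intensity of each pattern before mixing: A_p * f_p + B_p * g_p.
    lam = [amps[p] * np.asarray(patterns[p], dtype=float)
           + bkgs[p] * np.asarray(backgrounds[p], dtype=float)
           for p in range(2)]
    # Each regime records a weighted sum of both pattern intensities.
    return [beta[0, r] * lam[0] + beta[1, r] * lam[1] for r in range(2)]

# Toy usage: "red" spot at index 1, "green" spot at index 2, flat backgrounds.
regime_1, regime_2 = two_regime_expected(
    amps=(10.0, 20.0), bkgs=(1.0, 2.0),
    patterns=([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]),
    backgrounds=([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]),
    beta=[[1.0, 0.2], [0.1, 1.0]])
```

In this toy case a fraction of the green spot appears in the first regime and a fraction of the red spot in the second, which is the cross-talk the penetrance coefficients are meant to account for.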

[0091] Similarly, a null hypothesis for the data set having two subsets resulting from two data collection regimes may have the form given in equation (37) below:

$$\text{two regime null hypothesis} = \{\beta_{11} B_1 g_{1,i}(pos_{1,B}) + \beta_{21} B_2 g_{2,i}(pos_{2,B}),\ \ \beta_{12} B_1 g_{1,i}(pos_{1,B}) + \beta_{22} B_2 g_{2,i}(pos_{2,B})\}$$

(Eq. 37)

[0092] Like the single-regime Stage I analysis, the multi-regime Stage I analysis may involve three assumptions: (i) the 3D Gaussian distributions corresponding to the patterns of interest (i.e., the red and green patterns of interest) and the distributions corresponding to the background may be assumed to be positioned at some respective specified positions within the patch ($pos_A$ and $pos_B$, respectively); (ii) the measurement noise may be described as a Poisson distribution, rather than as a Gaussian distribution; and (iii) the intensity of the mean photon values may be assumed to be low relative to measurement noise. Accordingly, like the single-regime case set forth above, all noise components (e.g., (i) photon noise for the pattern and/or background and (ii) camera noise) can be incorporated into a single Poisson distribution that is the sum of the two Poisson components, and the probability function of the two-regime, two-pattern signal hypothesis may take the form of equation (14).

[0093] Unlike the noise model, the likelihood function associated with the signal hypothesis does change as between a single-regime Stage I analysis and a multi-regime Stage I analysis. A likelihood function associated with the signal hypothesis for a two-regime analysis is provided in equation (38) below.

(Eq. 38)

A person of skill in the art will appreciate how to construct a similar likelihood function for a two-regime null hypothesis.

[0094] The logarithm of the two-regime likelihood function for the two-regime signal hypothesis - i.e., the two-pattern, two-regime variant of the left side of equation (16) - is composed of a sum of the contributions of the signals and noise in each data capture regime, weighted according to the penetrance coefficients, as shown in equation (39) below:

(Eq. 39)

A person of skill in the art will appreciate how to construct a similar logarithm of a likelihood function for a two-regime null hypothesis.

[0095] As in the single-regime embodiment of Stage I, the two-regime embodiment may involve assuming that the centers of the pattern of interest distribution and the background distribution, $pos_A$ and $pos_B$, are at the center of the segment, cen. The above equation (39) can be expanded, like equation (16), into a sum of summations containing variables for the measured intensities $\lambda_{1,i}$, $\lambda_{2,i}$ and the camera noise $\sigma_i^2$. And, like the single-regime variant of Stage I, the derivative of the logarithm of the likelihood may be taken with respect to each intensity amplitude value ($A_1$, $A_2$, $B_1$, and $B_2$), and those derivatives may be set to zero, as shown in equations (40), (41), (42), and (43) below, to derive an approximate MLE for ($A_1$, $A_2$, $B_1$, $B_2$).

(Eq. 40)

(Eq. 41)

(Eq. 42)

(Eq. 43)

[0096] The derivatives shown in equations (40), (41), (42), and (43) result in a matrix formulation with the general form of equation 44 below:

$$\beta_i^{T} M \beta_i = \beta_i^{T} b$$

(Eq. 44)

M is a matrix comprising equations equivalent to the various values of A and B, $\beta_i$ is a matrix of values of penetrance coefficients at data point i, and b is a linear vector comprising values of A and B. For a case involving more than two patterns of interest, more than two data capture regimes, and/or more than two data set subsets, the various data structures included in equation (44) may be of a higher order than those explicitly described above.

[0097] After expansion of equation (44) and some manipulation and reduction, M and $\beta_i$ become matrices in block diagonal form, shown in equation (45) below, which is a two-regime version of equation (24).

(Eq. 45)

[0098] The transpose of the penetrance coefficient matrices cancels from both sides, leaving equation (46) below:

(Eq. 46)

[0099] In order to solve for the MLE of $A_{1,i}$, $A_{2,i}$, $B_{1,i}$, and $B_{2,i}$ from equation (46) above, a three-step solution process may be followed. First, the values of $A_{1,i}$, $A_{2,i}$, $B_{1,i}$, and $B_{2,i}$ may be solved for without regard to the penetrance coefficients. A person of skill in the art will appreciate how to solve the values of $A_{1,i}$, $A_{2,i}$, $B_{1,i}$, and $B_{2,i}$ by extension of the solution process described above in the single-regime version of Stage I to equation (46) above. In this intermediate solution, the values of $A_{1,i}$, $A_{2,i}$, $B_{1,i}$, and $B_{2,i}$ may include contributions (i.e., penetrance) from multiple patterns of interest and multiple background patterns. This intermediate solution may be considered “solving” the likelihood equations without respect to the penetrance coefficients.

[0100] Second, the values of the penetrance coefficients may be determined. In some embodiments, penetrance coefficients may be determined for each combination of a pattern of interest and a data capture regime. Determining a value of a penetrance coefficient may involve a two-step sub-process. The first step in the sub-process may include selecting a set of data points in the data set from which to determine the penetrance coefficient. In an embodiment, data points may be selected from the data subset corresponding to a data capture regime targeted to the pattern of interest (referred to, for ease of description, as the “target” regime in this discussion) and from the data subset corresponding to the data capture regime for which a degree of penetrance is to be calculated (referred to, for ease of description, as the “incidental” regime in this discussion). For example, in an embodiment, data points known to contain only the particular pattern of interest (e.g., only “green” fluorophores, when determining penetrance of the green pattern of interest into another regime, or only “red” fluorophores, when considering penetrance of the red pattern into another regime) may be selected. Further, in an embodiment, data points from both data subsets that correspond to the same point in space may be used. Still further, data points may be selected in which the amplitude in both the target regime and the incidental regime is high, thus indicating penetrance of the pattern of interest into both regimes. The respective amplitudes in the two data subsets may be compared to a threshold, in embodiments, and data points with amplitudes that meet the threshold in both subsets may be selected for use in determining penetrance coefficient values. The data respective of the two data capture protocols (e.g., d₁ and d₂) at a particular data point may be set as the two coordinates of a point (d_{1,i}, d_{2,i}), and a set of points {d_{1,i}, d_{2,i}} may be defined.

[0101] The second step in the sub-process may include plotting the selected data points {d_{1,i}, d_{2,i}} and determining the slope of the plot (e.g., the slope of a trend line). In an embodiment, the slope of the plot (e.g., the slope of the trend line of the plot) may be set as the penetrance coefficient of the pattern of interest in the data capture protocol used in the numerator of the slope calculation. Accordingly, in an embodiment, to determine the value of the penetrance coefficient for a pattern of interest into an incidental regime, the incidental regime may be placed in the numerator of the slope calculation, and the target regime in the denominator. An example of the two-step sub-process for determining penetrance coefficients will be described in conjunction with FIG. 3 and FIG. 6.
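By way of illustration only, the slope determination described above might be sketched as follows. The function name, the use of an ordinary least-squares trend line with a free intercept, and the example amplitudes are assumptions made for this sketch and are not taken from the disclosure.

```python
def penetrance_coefficient(points):
    """Least-squares slope of incidental vs. target amplitudes.

    points: iterable of (target_amplitude, incidental_amplitude) pairs.
    The incidental regime is the numerator of the slope and the target
    regime the denominator. Fitting an ordinary least-squares trend line
    with a free intercept is an assumption of this sketch.
    """
    pts = list(points)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    mean_t = sum(t for t, _ in pts) / n
    mean_c = sum(c for _, c in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("target amplitudes are all identical")
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in pts)
    return sxy / sxx


# Hypothetical amplitudes lying on the line incidental = 0.2 * target + 1.
example_points = [(100.0, 21.0), (200.0, 41.0), (300.0, 61.0), (400.0, 81.0)]
beta = penetrance_coefficient(example_points)
```

In this hypothetical data the incidental regime records about one fifth of the target-regime amplitude, so the fitted slope serves as the penetrance coefficient of the pattern into the incidental regime.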

[0102] In embodiments in which more than two pattern of interest types exist, the above process may be repeated for each combination of a pattern of interest and a data capture regime to determine all penetrance coefficients. In an embodiment in which data capture regimes are specifically tailored to respective pattern of interest types, the penetrance coefficient for a pattern of interest in a data capture regime targeted to that pattern of interest may be set at a full value (e.g., 1).

[0103] The third step in the process for determining an MLE may be to determine the MLE values according to the penetrance coefficient values. In an embodiment, this may involve the use of the amplitude values calculated in the intermediate likelihood equation solution. Because those values are calculated without regard to the penetrance coefficients, the values may include penetrance from more than one pattern of interest. Accordingly, a further operation may be performed on the intermediate values that accounts for penetrance of multiple pattern of interest types to determine the “true” value of the amplitudes of the signals of interest and the backgrounds for the MLE. In an embodiment, in that further operation, the inverse of the penetrance coefficients may be applied to each respective {A, B}, which is equivalent to linearly unmixing the As and the Bs. The result of this operation is an approximate MLE for each pattern of interest type at a given data point; that is, in a two-regime, two-pattern case, the values of A_{1,i} and B_{1,i} without penetrance from the second pattern of interest, and the values of A_{2,i} and B_{2,i} without penetrance from the first pattern of interest. An approximate MLE may be calculated in this manner for each segment. Further, a likelihood value, based on the MLE values, may be calculated for both the signal hypothesis and for the null hypothesis for each pattern of interest at each segment. In an embodiment, a single Likelihood ratio may then be calculated for each segment, and any segment with a Likelihood ratio above a selected threshold may be designated as a candidate segment for further processing at Stage II. In another embodiment, a separate Likelihood ratio may be calculated as to each data subset (e.g., corresponding to each data capture regime), and segments may be designated as candidate segments on a regime-by-regime basis.
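As a non-limiting illustration of applying the inverse of the penetrance coefficients, a two-regime, two-pattern unmixing might be sketched as follows. The function name, the layout of the penetrance matrix, and the numerical values are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def unmix_two_regimes(mixed, penetrance):
    """Invert a 2x2 penetrance matrix to unmix two amplitudes.

    mixed: (m1, m2), intermediate amplitudes found in regimes 1 and 2.
    penetrance: ((p11, p12), (p21, p22)), where p_rk is the penetrance of
    pattern k into regime r (this layout is an assumption of the sketch).
    """
    (p11, p12), (p21, p22) = penetrance
    det = p11 * p22 - p12 * p21
    if abs(det) < 1e-12:
        raise ValueError("penetrance matrix is singular; cannot unmix")
    m1, m2 = mixed
    a1 = (p22 * m1 - p12 * m2) / det
    a2 = (p11 * m2 - p21 * m1) / det
    return a1, a2


# Hypothetical case: true amplitudes (50, 30); pattern 2 leaks 10% into
# regime 1 and pattern 1 leaks 20% into regime 2, so the intermediate
# (mixed) amplitudes are 50 + 3 = 53 and 10 + 30 = 40.
a1, a2 = unmix_two_regimes((53.0, 40.0), ((1.0, 0.1), (0.2, 1.0)))
```

The same operation may be applied separately to the intermediate background values. With more than two regimes, a general linear solve would take the place of the closed-form 2x2 inverse.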

[0104] Two-Stage Likelihood Pipeline Stage II.

[0105] Stage II of the two-stage likelihood pipeline may include a full Likelihood analysis. An example single-regime Likelihood function for the signal hypothesis model having a full noise model for a fluorescent image data set is set forth in equation (47) below:

\mathcal{L} = \prod_{i} \big( \mathrm{Gaussian}(\mu_i, \sigma_i^2) * \mathrm{Poisson}(\lambda_i(A, B, pos)) \big)[d_i]

(Eq. 47)
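For illustration, a per-pixel likelihood of the general form of equation (47), a Gaussian convolved with a Poisson and evaluated at the measured value, might be computed as sketched below. Unit camera gain, the truncation of the sum over photon counts, and all function and parameter names are assumptions of this sketch. The intensity model λ_i(A, B, pos) is not reproduced here; the expected counts are passed in directly.

```python
import math


def pixel_log_likelihood(d, lam, mu, sigma, k_max=None):
    """Log of (Gaussian(mu, sigma^2) * Poisson(lam)) evaluated at d.

    The convolution is a sum over photon counts k of
    Poisson(k; lam) * Normal(d; mu + k, sigma^2). Unit camera gain and
    the truncation point k_max are assumptions of this sketch.
    """
    if lam <= 0 or sigma <= 0:
        raise ValueError("lam and sigma must be positive")
    if k_max is None:
        k_max = int(lam + 10.0 * math.sqrt(lam + 1.0) + 10.0)
    log_terms = []
    for k in range(k_max + 1):
        log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
        log_norm = (-0.5 * ((d - mu - k) / sigma) ** 2
                    - math.log(sigma * math.sqrt(2.0 * math.pi)))
        log_terms.append(log_pois + log_norm)
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def segment_log_likelihood(data, lams, mu, sigma):
    """Sum of per-pixel log-likelihoods (log of the product over pixels)."""
    return sum(pixel_log_likelihood(d, lam, mu, sigma)
               for d, lam in zip(data, lams))
```

Working with the logarithm turns the product over data points into a sum, which avoids numerical underflow when a segment contains many pixels.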

[0106] An example Likelihood function for the signal hypothesis model for a data set having two subsets resulting from two data capture regimes and having a full noise model for a fluorescent image data set is set forth in equation (48) below:

(Eq. 48)

[0107] A person of skill in the art will appreciate how to construct a similar full Likelihood function for the null hypothesis model, as the Likelihood approach is well documented and understood. Similarly, a person of skill in the art will appreciate how to solve for A₁, A₂, B₁, B₂, pos₁, and pos₂ to determine the Likelihood ratio respective of each candidate segment or patch as a function of the values of those parameters.

[0108] The data considered at Stage II may comprise the data from the original data set d that is within or around the candidate spot segments or patches. Accordingly, in some embodiments, the “segment” considered in Stage II may be a different size than the “segments” analyzed in Stage I. In other embodiments, the segments considered in Stage II may be the same size as the segments analyzed in Stage I.

[0109] In Stage II, a Likelihood ratio may be determined for each candidate spot segment, and each ratio may be compared to a second threshold. This second threshold may be separately determined or selected from the first threshold, in embodiments. This Likelihood ratio may provide the final definition of whether a spot is present or not. In an embodiment in which a candidate segment was determined as to all patterns (e.g., with a single Likelihood ratio at Stage I), this Stage II Likelihood ratio may indicate the presence (or lack thereof) of the most prominent pattern in the segment examined in Stage II. In an embodiment in which candidate segments are designated on a regime-specific basis, this Stage II Likelihood ratio may indicate the presence (or lack thereof) of a pattern in the regime for which the segment was designated.

[0110] The full Likelihood analysis at Stage II may differ from the approximated Likelihood analysis of Stage I. First, the Likelihood functions used in Stage II may be fully-detailed Likelihood functions, thus providing optimal definition of MLEs in Stage II. For example, in the case of spot detection and characterization, the noise per pixel may be represented as a (Poisson*Gaussian) distribution. Second, the values of x, y, and z (i.e., the components of pos) may vary throughout the analyzed region in Stage II, rather than being fixed at the center of a patch as in Stage I. As a result of these two features, for a region defined as containing a spot (i.e., a candidate segment), analysis of the MLE for the signal hypothesis model for that region will yield not only the values of the various parameters A and B (i.e., intensities of the patterns of interest and of the backgrounds), but also the position of the spot in the three dimensions at sub-pixel values of x, y and z. MLE determinations may be made in Stage II through a hill-climbing exercise or other appropriate methods, in embodiments.

[0111] As noted above, a full Likelihood analysis for fluorescent spot detection is limited in known methods by the need to initially identify candidate spot locations manually and/or by ad hoc manual or computational criteria. The two-stage Likelihood pipeline overcomes this limitation through the use of minimally-invasive approximations of the full Likelihood functions at Stage I. Furthermore, the estimates of A and B and x, y, z provided at Stage I for any given pattern or regime may generally be similar to the precisely-defined global maxima provided by the fully detailed Likelihood analysis of Stage II. Thus, if a hill-climbing exercise is applied in Stage II to each candidate spot region, it can be seeded by (i.e., begin with) the parameter values of the signal hypothesis and the null hypothesis defined at Stage I. For application according to the signal hypothesis, the starting point for this exercise may be provided by the values of A, B, x, y and z for a given pattern or regime defined by the Stage I MLE according to that hypothesis. For application according to the null hypothesis, this starting point may be provided by the value of B defined by the Stage I MLE, where the value of B is the same at every position, thus removing x, y and z as variables. Seeding the hill-climbing exercise provides computational tractability without the risks of (i) climbing an irrelevant hill and thus detecting a spot where none is present, (ii) failing to detect a spot when one is present, or (iii) detecting a spot with incorrect parameters specified. The outcome of the hill-climbing exercise is, for each hypothesis, a Likelihood landscape in the corresponding 5-parameter space (signal hypothesis) or 1-parameter space (null hypothesis). In each case, the position in the parameter space with the highest Likelihood comprises the MLE; and the ratio of the Likelihood values at the MLEs for the two hypotheses comprises the Likelihood Ratio for that candidate spot region.
The value of the Likelihood Ratio provides a measure of the probability that a spot is present; and the corresponding values of all parameters corresponding to the MLE of the signal hypothesis yield the intensity of the spot (A) and of the background (B) and the location of the spot (in (x, y, z), which may be at sub-pixel resolution) in the data set. In summary, overall, the effects of combining Stage I and Stage II in the Two-stage Likelihood Pipeline confer the advantages of a full Likelihood approach with respect to robust spot detection, precise and accurate spot localization and quantification of spot and background intensities without the unmanageable computational complexity of the standard Likelihood approach.
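One possible way to seed a hill-climbing exercise with Stage I estimates is sketched below for illustration. The coordinate-wise search with step halving, the toy quadratic log-likelihood, and all names and values are assumptions of this sketch; in practice the objective would be the full Stage II log-likelihood.

```python
def hill_climb(log_likelihood, seed, steps, min_step=1e-4, max_iter=10000):
    """Coordinate-wise hill climbing from a seed toward a local maximum.

    seed: starting parameters, e.g. the Stage I estimates (A, B, x, y, z).
    steps: initial step size per parameter; all steps are halved whenever
    no single-coordinate move improves the log-likelihood.
    """
    current = list(seed)
    step = list(steps)
    best = log_likelihood(current)
    for _ in range(max_iter):
        improved = False
        for j in range(len(current)):
            for direction in (1.0, -1.0):
                trial = list(current)
                trial[j] += direction * step[j]
                value = log_likelihood(trial)
                if value > best:
                    current, best, improved = trial, value, True
                    break
        if not improved:
            step = [s / 2.0 for s in step]
            if max(step) < min_step:
                break
    return current, best


# Toy stand-in for a Stage II log-likelihood with a single peak at
# hypothetical parameters (A, B, x, y, z).
TRUE_PARAMS = (50.0, 10.0, 3.2, 4.7, 1.1)


def toy_log_likelihood(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TRUE_PARAMS))


stage_one_seed = (48.0, 11.0, 3.0, 5.0, 1.0)  # hypothetical Stage I MLE
mle, peak = hill_climb(toy_log_likelihood, stage_one_seed,
                       steps=(1.0, 1.0, 0.5, 0.5, 0.5))
```

Running the same routine on the null hypothesis, with B as the only parameter, would give a second maximum. The difference of the two maximized log-likelihoods is then the logarithm of the Likelihood Ratio for the candidate region.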

[0112] Two-Stage Likelihood Pipeline Differentiation of Closely-Spaced Patterns of Interest

[0113] The two-stage likelihood pipeline, as described herein, is generally well-suited to detect one instance of a pattern of interest per segment at a time. That is, the LR calculated at Stage I may be Boolean in nature: indicative of the presence of a single instance of a pattern of interest, but not indicative of the quantity of instances of any pattern of interest, or the quantity of different patterns of interest. In some embodiments, the LR calculated at Stage I may be indicative of the most prominent instance of a pattern, but not additional instances of the same pattern, or the presence of instances of other patterns of interest. For example, if a segment contains two green fluorescent spots (i.e., two instances of the green spot pattern of interest) and two red fluorescent spots (i.e., two instances of the red spot pattern of interest), a single Stage I analysis may detect the brightest one of the four spots, but not the other three spots. Similarly, the full likelihood analysis at Stage II may generally locate and determine the amplitude of a single instance of a pattern of interest (e.g., the most prominent instance of a pattern of interest) for a segment. As a result, where there are multiple instances of a single pattern of interest and/or multiple patterns of interest within a segment, some instances of one or more patterns of interest in the segment may not be detected and characterized according to a single iteration of the two-stage Likelihood pipeline.

[0114] Accordingly, the instant disclosure introduces an iterative process for differentiating, detecting, and characterizing multiple closely-spaced instances of one or more patterns of interest (e.g., multiple patterns of interest and/or multiple instances of a pattern of interest within a single segment). This iterative process will be described below with reference to FIG. 8.

[0115] Example Methods Applying the Two-Stage Likelihood Pipeline.

[0116] As noted above, the multi-regime two-stage Likelihood pipeline may find particular use with data sets having a low signal-to-noise ratio, including data sets having two or more subsets corresponding to two or more data capture regimes. Similarly, versions of the two-stage Likelihood pipeline may find particular use with data sets having two or more pattern of interest instances that overlap with each other, and/or that are otherwise closely spaced. For example, the instant methods may find use with a data set including two or more images of a biological entity under study to characterize the fluorescence emanating from fluorophores of two or more colors as captured with different imaging regimes, and/or two or more closely-spaced fluorophores, whether captured with the same or different imaging regimes. The photon output of any such fluorophores is proportional to the intensity of the excitation energy applied to the fluorophores. By enabling low-SNR characterization of fluorophores, the two-stage Likelihood pipeline enables the use of low excitation energy, thereby reducing cell toxicity from excitation energy. Low-SNR regimes also minimize destruction of the fluorophores that occurs due to excitation (known as “photobleaching”). The following methods are generally directed to characterization of fluorophores using the two-stage Likelihood pipeline, but it will be appreciated that the two-stage Likelihood pipeline may find use in pattern identification and characterization with many types of data sets, especially those characterized by low SNR.

[0117] FIG. 1 is a flow chart illustrating an embodiment of a method 10 of identifying and characterizing one or more instances of one or more patterns of interest in a data set. In an embodiment, the method 10 may be applied to identify and characterize two or more instances of a single pattern of interest per segment of a data set and/or two or more patterns of interest per segment.
The method 10 may be or may include one or more aspects of the two-stage Likelihood pipeline, described above, including aspects of the two-stage Likelihood pipeline directed to pattern detection and characterization in a data set having multiple discrete subparts resulting from multiple data capture regimes. The method may begin with a step 12 of acquiring an N-dimensional data set containing two or more patterns of interest and/or two or more instances of any given pattern of interest. The patterns of interest, and instances thereof, may exist in a data set captured according to a single data capture regime, in some embodiments. In other embodiments, the patterns of interest, and instances thereof, may exist in two or more different subparts of the data set corresponding to two or more different data capture regimes. For example, in an embodiment in which the data set is one or more images, one or more instances of a first pattern of interest may have been captured with a first imaging regime, and one or more instances of a second pattern of interest may have been captured with a second imaging regime that is different from the first regime.

[0118] The step 12 of acquiring the N-dimensional data set may include acquiring (e.g., by electronic transmission) one or more pre-captured images. Additionally or alternatively, the step 12 of acquiring the N-dimensional data set may include capturing one or more images with an image capture device and/or controlling or otherwise communicating with an image capture device to cause the image capture device to capture one or more images. The data set may include 3D imaging data captured using fluorescence microscopy, in an embodiment.

For ease of reference, the method 10 will be described with respect to an embodiment in which the data set includes 3D imaging data captured using fluorescence microscopy, where each data point has the form given in equation (4) of this disclosure. It should be understood, however, that in other embodiments, the data set may include another type of imaging data and/or non-imaging data. The method 10 will also be described with reference to fluorescent spots as the patterns of interest. It should be understood, however, that the method is more broadly applicable to other patterns.

[0119] In an embodiment, at the data set acquisition step 12, multiple fluorescent spots may be imaged with different imaging regimes. Each regime may include a respective excitation wavelength intended to excite one or more fluorophores in a sample. Each regime may also include an image capture phase in which the output of the one or more fluorophores is imaged. In some embodiments, excitation signals in different regimes may be provided simultaneously. In some embodiments, different excitation wavelengths may be separated in time. Similarly, in some embodiments, data capture in different regimes may occur simultaneously, such as through multiple different image capture devices coupled to respective light filters, or through filtering images by wavelength after capture. In some embodiments, image capture of different fluorophores in different regimes may be separated in time (e.g., to coincide with a known output period of the fluorophores in each regime based on the time when excitation signals are provided).

[0120] In an embodiment, the 3D dataset may include multiple images captured at multiple respective 2D focal planes using a microscope, with each of the 2D focal plane images having x- and y-dimensions and the third z-dimension corresponding to a depth dimension along the different focal plane images. FIG. 2 is a diagrammatic illustration of a 3D dataset 20 that may be captured, acquired, and/or processed in accordance with some embodiments of the method 10. As shown in FIG. 2, the z-dimension of the 3D dataset 20 may include a plurality of images 22 captured at different focal planes.

[0121] Although ten different focal plane images 22 are illustrated in FIG. 2, it should be understood that any suitable number of images at any suitable number of focal planes may be included in the 3D dataset (also colloquially referred to herein as a “z-stack” of images or a “z-series”).

[0122] In some embodiments, the 3D dataset may be acquired using a conventional epi-fluorescence illumination microscope in which images from multiple focal planes are acquired sequentially by physically moving the position of the microscope stage up/down (e.g., in the z-direction). The position of the stage may be manually operated or automatically controlled by a controller including, but not limited to, a computer processor or one or more circuits configured to provide command control signals to the microscope to position the stage. Reducing the amount of time needed to acquire a complete set of focal plane images by using a hardware controller circuit may enable the acquired data to more closely resemble simultaneous acquisition of the data, which facilitates spot detection by reducing the effect of motion over time on the spot detection process, as discussed in more detail below. However, it should be appreciated that capturing a z-stack of images may include any suitable number of focal-plane images.

[0123] Rather than sequentially obtaining a z-stack of images by physically moving the microscope stage, as discussed above, the 3D dataset may be captured with a microscope having multiple cameras, each of which simultaneously acquires data in a unique focal plane, which enables instantaneous collection of a 3D dataset, thereby removing the obscuring effect of object motion between capture of images at different focal planes. In one illustrative embodiment, a microscope having nine cameras and associated optics may be used to simultaneously acquire nine focal plane images. However, any suitable number of cameras (including two cameras) may be used to simultaneously acquire a z-stack of focal plane images, and embodiments are not limited in this respect. For example, in some embodiments, at least three cameras may be used.

[0124] In yet further embodiments, the 3D dataset may be acquired using a combination of multiple cameras and physically moving the microscope stage. Using multiple cameras reduces the time required to acquire a z-stack of images compared to single camera microscope embodiments. Using fewer cameras than would be required to simultaneously acquire all images in a z-stack (e.g., nine focal plane images) and combining the multi-camera microscope with stage repositioning may provide for a lower cost microscope compared to fully-simultaneous image capture microscope embodiments. For example, some embodiments may acquire the 3D dataset using a microscope having three cameras and use three different stage positions to acquire a nine focal-plane image 3D dataset. Any suitable number of cameras and physical positioning of the microscope stage may be used to acquire a 3D dataset, and embodiments are not limited in this respect.

[0125] In an embodiment involving the capture of images of one or more fluorescent spots of one or more colors, step 12 may include controlling a microscope, as noted above, and/or controlling one or more sources of excitation radiation to activate the fluorophores to be imaged as fluorescent spots. Step 12 may additionally include sequential data capture for each regime, and/or simultaneous data capture for multiple regimes, in some embodiments. Step 12 may additionally include regime-specific filtering of captured data, in some embodiments.

[0126] The method 10 may further include a step 14 of applying an approximate multi-regime Likelihood analysis to the data set to identify one or more pattern candidate segments for each pattern. Applying an approximate multi-regime Likelihood analysis to the data set may proceed according to the multi-regime version of Stage I of the two-stage Likelihood pipeline described herein, in an embodiment. An example method that may be applied in step 14 will be described with respect to FIG. 3.

[0127] With continued reference to FIG. 1, step 14 may include dividing the data set into a plurality of segments and applying an approximate multi-regime Likelihood analysis to each segment, in an embodiment. As noted above in this disclosure, a segment may include a set of adjacent, contiguous voxels, in an embodiment. In other embodiments, a segment may include non-adjacent or non-contiguous voxels or pixels or other portions of the data set. A result of the step may be, for each segment, a likelihood that the segment includes one or more patterns of interest (e.g., one or more fluorescent spots). If the likelihood that a given segment includes one or more patterns of interest is sufficiently high, the segment may be designated a “pattern candidate segment” for a specific pattern (or for all patterns) for further processing.
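The segment-by-segment screening described above might, purely as an illustration, be organized as in the following sketch. Tiling the volume with non-overlapping cubic segments is an assumption of the sketch, and `score_fn` is a hypothetical stand-in for the Stage I approximate Likelihood ratio, which is not reproduced here.

```python
def candidate_segments(volume, size, score_fn, threshold):
    """Return origins (z, y, x) of cubic segments whose score meets threshold.

    volume: nested lists indexed as volume[z][y][x].
    score_fn: callable mapping a list of voxel values to a score; here a
    hypothetical stand-in for the Stage I approximate Likelihood ratio.
    Non-overlapping tiling is an assumption of this sketch.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    candidates = []
    for z0 in range(0, nz - size + 1, size):
        for y0 in range(0, ny - size + 1, size):
            for x0 in range(0, nx - size + 1, size):
                voxels = [volume[z][y][x]
                          for z in range(z0, z0 + size)
                          for y in range(y0, y0 + size)
                          for x in range(x0, x0 + size)]
                if score_fn(voxels) >= threshold:
                    candidates.append((z0, y0, x0))
    return candidates


# Hypothetical 4x4x4 volume of uniform background with one bright voxel.
volume = [[[1.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
volume[2][3][1] = 50.0


def contrast(voxels):
    # Placeholder score: peak value minus mean value of the segment.
    return max(voxels) - sum(voxels) / len(voxels)


found = candidate_segments(volume, size=2, score_fn=contrast, threshold=10.0)
```

Only the segment containing the bright voxel is returned as a candidate; each segment is scored independently, so the loop could equally be run in parallel.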

[0128] The method 10 may further include a step 16 of applying a full Likelihood analysis to the pattern candidate segments identified in step 14 to characterize the patterns of interest. Step 16 may generally proceed according to Stage II of the two-stage Likelihood pipeline described above. A result of step 16 may be one or more characterized patterns of interest.

In an embodiment, a result of step 16 may be a location and amplitude of one or more patterns of interest, such as one or more fluorescent spots, as well as further confirmation of the existence of one or more patterns. For example, one or more respective instances of two or more patterns respectively associated with two or more imaging regimes may be characterized. Of course, in embodiments, no instances of a pattern of interest may actually be present in the data set, and the result of step 16 may be zero detected instances of any given pattern of interest.

[0129] A single iteration of steps 14 and 16 may result in identification of up to a single instance of a pattern of interest per segment, in embodiments. Accordingly, the method may advance to a step 18 for distinguishing, detecting, and characterizing further instances of the same pattern of interest and/or instances of further patterns of interest that may have been in close proximity to the pattern of interest instances during the initial iteration of steps 14 and 16.

[0130] Step 18 may include distinguishing and identifying one or more further instances of a pattern of interest, and/or one or more further patterns of interest in or near the candidate segments identified in step 14. In an embodiment, step 18 may involve an iterative process of subtracting the amplitude respective of the most prominent (e.g., highest-amplitude) pattern of interest instance from a given segment, then re-applying an approximate likelihood analysis on the resulting data set portion (e.g., the data in and/or around the segment, less the subtracted amplitude). The iterative process may be applied within and around each segment having a confirmed pattern of interest instance until an appropriate number of iterations is reached. In other words, step 18 may involve generally repeatedly detecting an instance of a pattern of interest, subtracting an amplitude associated with that instance from the underlying data, and proceeding to attempt to detect a further pattern of interest instance nearby the detected and subtracted instance. An example process for performing step 18 will be described with reference to FIG. 8.
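The detect, subtract, and repeat loop of step 18 can be illustrated with a simplified one-dimensional sketch. Here a peak search stands in for the two-stage Likelihood analysis, the pattern is assumed to be a Gaussian profile of known width, and the data are noise-free; these simplifications, and all names and values, are assumptions of the sketch.

```python
import math


def gaussian_profile(length, center, sigma):
    """Unit-amplitude Gaussian pattern sampled at integer positions."""
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2)
            for i in range(length)]


def detect_iteratively(data, sigma, threshold, max_iterations=10):
    """Repeatedly detect the most prominent instance and subtract it.

    The peak search below is a stand-in for the two-stage Likelihood
    analysis; it is not the detection method of the disclosure.
    """
    residual = list(data)
    detections = []
    for _ in range(max_iterations):
        center = max(range(len(residual)), key=lambda i: residual[i])
        amplitude = residual[center]
        if amplitude < threshold:
            break  # nothing prominent remains in the reduced data
        detections.append((center, amplitude))
        profile = gaussian_profile(len(residual), center, sigma)
        residual = [r - amplitude * p for r, p in zip(residual, profile)]
    return detections


# Two hypothetical overlapping spots: amplitude 100 at 15, 60 at 22.
spot_a = gaussian_profile(41, 15, 2.0)
spot_b = gaussian_profile(41, 22, 2.0)
data = [100.0 * a + 60.0 * b for a, b in zip(spot_a, spot_b)]
found = detect_iteratively(data, sigma=2.0, threshold=5.0)
```

The brighter spot is found first; once its amplitude is subtracted, the dimmer neighbor becomes the most prominent feature of the reduced data and is found on the next pass.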

[0131] As noted above, the reduction in excitation energy enabled by the two-stage Likelihood pipeline approach consequently may reduce biological toxicity of the excitation energy and atomic degradation of fluorophores by the excitation energy, and therefore may allow for more frequent capture of more images over longer periods of time of imaging. Accordingly, embodiments that employ the spot detection techniques described herein also allow for acquisition of a larger number of images and, correspondingly, of imaging data with image capture at more frequent intervals over substantially longer timescales, which opens new possibilities for observing in vivo biological processes that unfold dynamically via rapid modulations over such longer timescales.

[0132] The steps 12, 14, 16, 18 of the method 10 may be repeated over a period of time to track one or more instances of one or more patterns of interest over a plurality of data sets, with each data set comprising a 3D image of the same subject captured at a respective given point in time. In an embodiment, a visualization (e.g., a snapshot image, movie, etc.) of the characterized pattern of interest (e.g., of the characterized fluorescent spot) may be created.

[0133] FIG. 3 is a flow chart illustrating a method 30 of identifying one or more pattern candidate segments (i.e., segments that may contain one or more instances of one or more patterns of interest in any of two or more subsets of the data set corresponding to two or more data capture regimes) in an N-dimensional data set. The method 30 may encompass a multi-regime embodiment of Stage I in a two-stage Likelihood pipeline analysis. Accordingly, as noted above, the method 30 may be applied at step 14 of the method 10 of FIG. 1. Portions of the method 30 may also be applied at step 18 of the method 10, as will be described with reference to FIG. 8 in this disclosure. The method 30, or portions thereof, may be applied to one or more segments of a data set in order to define one or more pattern candidate segments for two or more patterns of interest, and/or to determine an approximate maximum likelihood estimate associated with multiple instances of one or more patterns of interest in a segment.

[0134] The method 30 is illustrated and will be described with respect to its application to a single segment. Thus, the illustrated and below-described steps of the method may be applied to each of a plurality of segments in the data set, and the method may be repeated for each segment. Repetitions of the method 30 and/or steps of the method 30 may be performed serially or in parallel. Further, it should be understood that the data respective of a segment examined and analyzed in the method 30 may be either a “full” version of a data set segment (e.g., the data set segment as originally captured) or a “reduced” version of a data set segment, in which amplitude data respective of one or more instances of one or more patterns of interest have been subtracted from the underlying data as part of a process to detect further instances of one or more patterns of interest in and around the segment.

[0135] For a given segment, the method may include a step 32 of defining the segment as having the patterns of interest and background at respective specified positions within the segment. In an embodiment, the specified positions of the patterns of interest and background may be the same as each other. In other embodiments, the specified positions of the pattern of interest and background may be different from each other. Furthermore, in an embodiment, the specified position of each pattern of interest may be the same as the specified position of each other pattern of interest. For example, in some embodiments, the positions of the patterns of interest and of the background patterns may be assumed to be at the center of the segment.

[0136] In an embodiment in which the data set includes two subparts corresponding to two data capture regimes, step 32 may include formulating a signal hypothesis having the form set forth in equation (36). In other embodiments, step 32 may include formulating a signal hypothesis having an equivalent form to equation (36) that is appropriate for the number of data capture regimes represented in the data set. Step 32 may additionally include formulating a null hypothesis having the form set forth in equation (37), above, if a two-regime-derived data set, or an equivalent form appropriate for the number of regimes represented in the data set.

[0137] Formulating a signal hypothesis and a null hypothesis at step 32 may also include weighting components of each hypothesis with coefficients to account for penetrance of the various patterns of interest in the various data capture regimes. For example, each signal-background-regime combination may be weighted by a respective penetrance coefficient, as in equations (36) and (37).

[0138] The method may further include a step 34 that includes defining likelihood functions that include penetrance coefficients. For example, in an embodiment, respective likelihood functions may be defined for both the signal hypothesis and the null hypothesis formulated in step 32, and each of those likelihood functions may include penetrance coefficients respective of each combination of pattern of interest and data capture regime.

[0139] Defining likelihood functions at step 34 may include formulating an approximate Likelihood function with respect to a signal hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the approximate Likelihood function for the signal hypothesis at step 34 may represent measurement noise (e.g., camera noise) as a Poisson distribution, may represent background noise as a Poisson distribution, and may represent pattern noise as a Poisson distribution. In a two-regime embodiment, step 34 may include formulating an approximate Likelihood function having the form in equation (38). In another embodiment having three, four, or more subsets in the data set corresponding to three, four, or more data capture regimes, step 34 may include formulating an approximate Likelihood function having a form based on equation (38), but tailored to the required number of data subsets.

[0140] Defining likelihood functions at step 34 may further include defining an approximate Likelihood function with respect to the null hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the approximate Likelihood function for the null hypothesis at step 34 may represent measurement noise (e.g., camera noise) as a Poisson distribution and may represent one or more sources of background noise as a Poisson distribution. In an embodiment, step 34 may include formulating an approximate Likelihood function having the form in equation (38) (which, as noted above, can readily be modified by a person of skill in the art so as to apply to the null hypothesis).

[0141] The method 30 may further include a step 36 of solving the likelihood equations without respect to penetrance coefficients. For example, in an embodiment, the likelihood equations may be solved to determine the respective amplitudes of the signals of interest and background patterns as described above with respect to equations (38) and (39).

[0142] The method may further include steps 38, 40 for determining the values of the penetrance coefficients. At step 38, the method may include determining a set of penetrance data points. In an embodiment, penetrance data points may be determined for each pattern of interest. In an embodiment, penetrance data points may include points in the data subset corresponding to a data capture regime targeted to the pattern of interest (referred to, for ease of description, as the “target” regime in this discussion) and points in the data subset corresponding to the data capture regime for which a degree of penetrance is to be calculated (referred to, for ease of description, as the “incidental” regime in this discussion). For example, in an embodiment, data points known to contain only the particular pattern of interest (e.g., only “green” fluorophores, when determining penetrance of the green pattern of interest into another regime, or only “red” fluorophores, when considering penetrance of the red pattern into another regime) may be selected. Further, in an embodiment, data points from both data subsets that correspond to the same point in space may be used. Still further, data points may be selected in which the amplitude in both the target regime and the incidental regime is high, thus indicating penetrance of the pattern of interest into both regimes. The respective amplitudes at a given point in the two data subsets may be compared to a threshold, in embodiments, and data points with amplitudes that meet the threshold in both subsets may be selected as penetrance data points for use in determining penetrance coefficient values. The data respective of the two data capture protocols (e.g., d₁ and d₂) at a particular data point may be set as the two coordinates of a point (d₁,ᵢ, d₂,ᵢ), and a set of penetrance data points {d₁,ᵢ, d₂,ᵢ} may be defined for each combination of pattern of interest and incidental regime. Alternatively, the signal amplitudes in the intermediate MLE solutions respective of the two data subsets for segments containing only the particular pattern of interest may be set as the two coordinates of a point (A₁,ᵢ, A₂,ᵢ), and a set of penetrance data points {A₁,ᵢ, A₂,ᵢ} may be defined for each combination of pattern of interest and incidental regime.

[0143] It should be understood that the term “penetrance data point” is used in the present discussion to refer to data points containing only a single pattern of interest, but penetrance of a pattern of interest into multiple data capture regimes may also occur at points in the data set including multiple patterns of interest. However, points in the data set containing only a single pattern of interest may be used as penetrance data points, in an embodiment, in order to isolate the effects of that single pattern of interest from the effects of other patterns of interest.

[0144] At step 40, the method may include calculating the values of the penetrance coefficients according to the data points determined in step 38. For example, in an embodiment, step 40 may include plotting the penetrance data points. Each set of penetrance data points may be plotted (thus plotting each combination of pattern of interest and incidental regime) with a first regime amplitude on a first axis and a second regime amplitude on another axis. The slope of a given plot (e.g., the slope of a trend line) comprising penetrance data points may be set as the value of the penetrance coefficient for the pattern of interest into the incidental data capture regime. An example plot is shown in FIG. 6, which illustrates an example of the penetrance of a first pattern (where first regime intensity is on the vertical axis) on data respective of a second regime (where second regime intensity data is on the horizontal axis), and vice-versa. In other words, the plot of FIG. 6 includes two sets of penetrance data points. On the left, and generally oriented vertically, is a plot 64 of a set of penetrance data points for the pattern of interest targeted by the first regime. The slope of this plot 64 may be used to determine the value of the penetrance coefficient that describes the degree of penetrance of the first pattern into the second data capture regime. On the right, and generally oriented horizontally, is a plot 66 of a set of penetrance data points for the pattern of interest targeted by the second regime. As indicated in FIG. 6, the trend line slope of the first plot 64 is 0.004; accordingly, the penetrance coefficient for the first pattern of interest into the second regime may be set at 0.004. Also as indicated in FIG. 6, the slope of the trend line of the second plot 66 is 0.3144; accordingly, the penetrance coefficient for the second pattern of interest into the first regime may be set at 0.3144. Similar plots may be made, and slopes determined, for each combination of pattern of interest and regime in the data set. In an embodiment, the respective penetrance coefficient values of patterns of interest into their target regimes may be set at one (1).
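Steps 38 and 40 can be illustrated with a short Python sketch. It is only one plausible reading of the text: it assumes a single amplitude threshold applied to both regimes, an ordinary least-squares trend line, and a slope taken as incidental amplitude per unit of target amplitude. The function names and the sample values are hypothetical and do not come from FIG. 6.

```python
def select_penetrance_points(target, incidental, threshold):
    """Step 38 (sketch): keep co-located samples whose amplitude meets the
    threshold in both the target regime and the incidental regime."""
    return [(t, s) for t, s in zip(target, incidental)
            if t >= threshold and s >= threshold]


def trend_line_slope(points):
    """Step 40 (sketch): ordinary least-squares slope of incidental amplitude
    versus target amplitude. Using OLS is an assumption; the text only says
    'the slope of a trend line'."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two penetrance data points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("target amplitudes are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical amplitudes at the same spatial points in two regimes.
target_amplitudes = [100, 200, 300, 400, 5]
incidental_amplitudes = [32, 62, 92, 122, 1]
points = select_penetrance_points(target_amplitudes, incidental_amplitudes, 10)
coefficient = trend_line_slope(points)  # penetrance into the incidental regime
```

The last sample falls below the threshold in both regimes and is discarded, so the slope is fitted on the four remaining points.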

[0145] Steps 36 and 40 may include building one or more data structures, such as matrices or higher-order structures, of amplitudes of patterns of interest and backgrounds (step 36) and of penetrance coefficients (step 40). That is, one or more data structures collectively including amplitudes of the patterns of interest and the background patterns may be defined at step 36. Similarly, one or more data structures collectively including the values of the penetrance coefficients may be defined at step 40. Equation (46) provides one example of such data structures, in the form of matrices.

[0146] The method 30 may further include a step 42 of calculating a first approximate Maximum Likelihood Estimate (MLE) with respect to a model of the pattern of interest and the background (i.e., the signal hypothesis model). Calculating the approximate MLE with respect to the signal hypothesis may be performed according to the penetrance coefficients. The step 42 may include applying an inverse of a penetrance coefficient matrix (e.g., an inverse of a matrix defined in step 40) to a matrix of intermediate amplitude values (e.g., a matrix defined in step 36), in an embodiment. Examples of such matrices for a two-regime embodiment are set forth in equation (46). Based on the MLE, the approximate Likelihood function respective of the signal hypothesis may be solved to calculate an approximate Likelihood value (LV) for the segment, i.e., the likelihood that the data actually present at the segment arose from the signal hypothesis having the calculated optimal values. Likelihood values may be calculated for each data subset, in an embodiment, and those likelihood values may be summed to calculate a likelihood value for the multi-regime signal hypothesis.
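The matrix operation in step 42 can be sketched for the two-regime case as follows. The layout assumed here (row j, column k holds the penetrance of pattern k into regime j, with ones on the diagonal) is an illustrative assumption and is not taken from equation (46). The off-diagonal values reuse the two slopes quoted for FIG. 6, while the amplitudes and the function names are hypothetical.

```python
def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("penetrance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def unmix(penetrance, intermediate):
    """Apply the inverse penetrance matrix to the intermediate (per-regime)
    amplitudes to recover one amplitude per pattern of interest."""
    inv = invert_2x2(penetrance)
    return [sum(inv[r][c] * intermediate[c] for c in range(2)) for r in range(2)]


# Assumed layout: penetrance[j][k] = penetrance of pattern k into regime j.
penetrance = [[1.0, 0.3144],   # regime 1: pattern 1 (target), pattern 2 (incidental)
              [0.004, 1.0]]    # regime 2: pattern 1 (incidental), pattern 2 (target)

# Hypothetical true amplitudes, mixed forward to emulate the step 36 output.
true_amplitudes = [1000.0, 500.0]
intermediate = [sum(penetrance[j][k] * true_amplitudes[k] for k in range(2))
                for j in range(2)]

recovered = unmix(penetrance, intermediate)  # approximately [1000.0, 500.0]
```

For more than two regimes the same idea applies with a general linear solve in place of the closed-form 2x2 inverse.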

[0147] The method 30 may further include a step 44 of calculating a second approximate Maximum Likelihood Estimate (MLE) with respect to a model of the background (i.e., the null hypothesis model). Calculating the approximate MLE with respect to the null hypothesis may be performed according to the penetrance coefficients. The step 44 may include applying an inverse of a penetrance coefficient matrix (e.g., an inverse of a matrix defined in step 40) to a matrix of intermediate amplitude values (e.g., a matrix defined in step 36), in an embodiment. Examples of such matrices for a two-regime embodiment are set forth in equation (46). Based on the MLE, the approximate Likelihood function may be solved to calculate a Likelihood value (LV) for the segment, i.e., the likelihood that the data actually present at the segment arose from the null hypothesis having the calculated optimal background amplitude values. Likelihood values may be calculated for each data subset, in an embodiment, and those likelihood values may be summed to calculate a likelihood value for the multi-regime null hypothesis.

[0148] The method 30 may further include a step 46 of calculating an approximate Likelihood ratio. In an embodiment, the approximate Likelihood ratio may be the ratio of the Likelihood value associated with the first MLE (i.e., the MLE with respect to the signal hypothesis model) to the Likelihood value associated with the second MLE (i.e., the MLE with respect to the null hypothesis model).

[0149] The method 30 may further include a step 48 of applying a threshold to the approximate Likelihood ratio to determine if the segment is a pattern candidate segment. The threshold may be applied to the approximate Likelihood ratio directly, in an embodiment. In other embodiments, the threshold may be applied to a derivation of the approximate Likelihood ratio, i.e., one or more values derived from or based on the approximate Likelihood ratio. If the approximate Likelihood ratio meets the threshold, the segment under examination may be designated as a candidate segment for further processing.
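Steps 46 and 48 can be sketched as below, under the assumption of a purely Poisson model with the expected counts for each hypothesis already fixed at their MLE values. The sketch thresholds the logarithm of the ratio, which is one example of a value derived from the Likelihood ratio. The function names, counts, and threshold are hypothetical, and equation (38) is not reproduced.

```python
import math


def poisson_log_likelihood(counts, expected):
    """Log-likelihood of observed counts under independent Poisson
    distributions with the given expected values (all must be positive)."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1)
               for k, mu in zip(counts, expected))


def is_candidate(counts, signal_expected, null_expected, log_threshold):
    """Return (decision, log ratio) for one segment. The log of the
    signal-to-null Likelihood ratio is compared to a log-domain threshold."""
    log_ratio = (poisson_log_likelihood(counts, signal_expected)
                 - poisson_log_likelihood(counts, null_expected))
    return log_ratio >= log_threshold, log_ratio


# Hypothetical five-sample segment with a bright central sample.
counts = [10, 12, 30, 11, 9]
signal_expected = [10, 10, 30, 10, 10]   # background plus a spot at the center
null_expected = [14.4] * 5               # flat background at the sample mean
decision, log_ratio = is_candidate(counts, signal_expected, null_expected,
                                   math.log(100))
```

Working in the log domain avoids underflow when per-sample likelihoods are multiplied across a large segment.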

[0150] The method 30 may further include a query step 49 at which it may be determined if further segments remain in the data set for initial examination according to the method 30. If there are additional segments, the method begins anew at step 32 with a new segment. If not, the method 30 may terminate. A result of the method 30 may be a set of zero or more candidate segments for each pattern of interest.

[0151] FIG. 4 is a flow chart illustrating a method 50 of detecting and characterizing a pattern of interest in an N-dimensional data set. The method 50 may encompass an embodiment of stage II in a two-stage Likelihood pipeline analysis. Accordingly, as noted above, the method 50 may be applied at step 16 of the method 10 of FIG. 1. The method 50 may be applied to one or more pattern candidate segments in a data set that have been identified according to, for example, the method 30 of FIG. 3, and/or to portions of the data set surrounding such pattern candidate segments. The data set may be a 3D image data set and the pattern of interest may be, in an embodiment, one or more fluorescent spots.

[0152] Portions of the method 50 may also be applied at step 18 of the method 10, as will be described with reference to FIG. 8 in this disclosure. The method 50, or portions thereof, may be applied to one or more segments of a data set in order to define one or more pattern candidate segments for two or more patterns of interest, and/or to determine a maximum likelihood estimate and likelihood ratio associated with multiple instances of one or more patterns of interest in a segment.

[0153] The method may include a step 52 of selecting a pattern candidate segment from a set of one or more pattern candidate segments. The remaining steps of the method 50 are illustrated and will be described with respect to its application to a single selected pattern candidate segment. In an embodiment, application of the remaining steps of the method 50 may include analysis of a portion of the data set that is larger than the pattern candidate segment (e.g., the pattern candidate segment and adjacent portions of the data set). Thus, the illustrated and below-described steps of the method 50 may be applied to each of one or more pattern candidate segments in the data set, and the method 50 may be repeated for each segment. Repetitions of the method 50 and/or steps of the method 50 may be performed serially or in parallel.

[0154] The method 50 may further include a step 54 of calculating a first full Maximum Likelihood Estimate (MLE) with respect to a model of the patterns of interest and the background (i.e., the signal hypothesis model). Calculating the full MLE with respect to the signal hypothesis model may include formulating a full Likelihood function with respect to the signal hypothesis model that includes two or more patterns of interest, two or more background patterns, and penetrance coefficients, in an embodiment. Calculating the full MLE with respect to the signal hypothesis model may further include formulating a full Likelihood function with respect to the signal hypothesis model that accounts for one or more sources of noise, in an embodiment. For example, the full Likelihood function for the signal hypothesis at step 54 may represent measurement noise (e.g., camera noise) as a Gaussian distribution, may represent background noise as a Poisson distribution, and may represent pattern noise as a Poisson distribution. In an embodiment, step 54 may include formulating a full Likelihood function according to equation (47) and solving that full Likelihood function (e.g., through a hill-climbing exercise) to determine optimal values of the full Likelihood function for the signal hypothesis at the segment, i.e., optimal values of the respective amplitudes of the patterns of interest, the locations (e.g., the centers of the distributions) of the patterns of interest, the respective amplitudes of the backgrounds, and the locations (e.g., the centers of the distributions) of the backgrounds at the segment. In an embodiment, calculating a full MLE with respect to a signal hypothesis model may include using initial parameter estimates calculated in stage I (e.g., the parameter estimates in an MLE in the method 30). In this disclosure, the MLE of a full Likelihood function may be referred to as a full MLE. The full Likelihood function may also be solved to calculate a Likelihood value for the segment, i.e., the likelihood that the data actually present at the segment arose from the signal hypothesis having the calculated optimal values.
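The hill-climbing exercise mentioned for step 54 can be illustrated with a heavily simplified Python sketch. It is not the full Likelihood function of equation (47): it assumes a one-dimensional segment, a single Gaussian-shaped spot of known width on a flat background, and Gaussian noise only, so the Poisson terms described above are omitted. The names `model`, `log_likelihood`, `hill_climb`, `sigma`, and `noise_sd`, and all numeric values, are hypothetical.

```python
import math


def model(x, amp, center, background, sigma=1.5):
    """Expected value at sample x: flat background plus a Gaussian spot."""
    return background + amp * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def log_likelihood(data, params, noise_sd=1.0):
    """Gaussian-noise log-likelihood up to an additive constant
    (a simplification of the full Likelihood function)."""
    amp, center, background = params
    return -sum((d - model(x, amp, center, background)) ** 2
                for x, d in enumerate(data)) / (2 * noise_sd ** 2)


def hill_climb(data, start, steps=(10.0, 1.0, 10.0), min_scale=1e-6):
    """Coordinate-wise hill climbing: nudge each parameter up and down,
    keep any change that raises the likelihood, and halve the step
    scale whenever no nudge helps."""
    params = list(start)
    best = log_likelihood(data, params)
    scale = 1.0
    while scale > min_scale:
        improved = False
        for i, step in enumerate(steps):
            for direction in (1, -1):
                trial = params.copy()
                trial[i] += direction * step * scale
                value = log_likelihood(data, trial)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            scale /= 2
    return params, best


# Noiseless synthetic segment; the start point stands in for stage I estimates.
true_params = (100.0, 10.3, 20.0)           # amplitude, center, background
segment = [model(x, *true_params) for x in range(21)]
estimate, value = hill_climb(segment, start=(80.0, 9.0, 10.0))
```

Seeding the climb with the stage I estimates, as the paragraph above suggests, keeps the search inside the basin of the intended maximum; a poor starting point could converge to a different local maximum.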

[0155] The method 50 may further include a step 56 of calculating a second full Maximum Likelihood Estimate (MLE) with respect to a model of the background, i.e., the null hypothesis model. Calculating the full MLE with respect to the null hypothesis model may include formulating a full Likelihood function with respect to the null hypothesis model that accounts for two or more background patterns, penetrance of a pattern into multiple regimes, and one or more sources of noise, in an embodiment. For example, the full Likelihood function for the null hypothesis at step 56 may represent measurement noise (e.g., camera noise) as a Gaussian distribution and may represent background noise as a Poisson distribution. In an embodiment, step 56 may include formulating a full Likelihood function according to equation (47) (which, as noted above, can readily be modified by a person of skill in the art so as to apply to the null hypothesis) and solving that full Likelihood function to determine optimal values of the full Likelihood function for the null hypothesis at the segment, i.e., the optimal values of the amplitudes and positions of the background patterns at the segment. The full Likelihood function may also be solved to calculate a Likelihood value for the segment, i.e., the likelihood that the data actually present at the segment arose from the null hypothesis having the calculated optimal background amplitude value and position.

[0156] The method may further include a step 58 of calculating a full Likelihood ratio. In an embodiment, the full Likelihood ratio may be the ratio of the Likelihood value associated with the first MLE (i.e., the full MLE with respect to the signal hypothesis model) to the Likelihood value associated with the second MLE (i.e., the full MLE with respect to the null hypothesis model).

[0157] The method 50 may further include a step 60 of applying a threshold to the full Likelihood ratio to determine if the pattern is present in the segment. The threshold may be applied to the full Likelihood ratio directly, in an embodiment. In other embodiments, the threshold may be applied to a derivation of the full Likelihood ratio, i.e., one or more values derived from or based on the full Likelihood ratio. If the full Likelihood ratio meets the threshold, a pattern of interest may be considered detected in the candidate segment under examination, and the optimal values of the first full MLE (i.e., the full MLE respective of the signal hypothesis) may be considered the characteristics of the pattern of interest and the background at the segment.

[0158] The method may further include a query step 62 at which it may be determined if further pattern candidate segments remain in the data set for further examination according to the method 50. If there are additional segments, the method begins anew at step 52 with a new pattern candidate segment. If not, the method ends.

[0159] Embodiments that address detecting and characterizing fluorescent spots may enable direct 3D spot-based super-resolution time-lapse imaging, with spots of two or more fluorescence colors, with unprecedented temporal resolution and duration, in living samples. The defining feature of spot-based super-resolution imaging is very precise specification of the position of a spot, i.e., its “localization.” However, imaged entities move due to thermal forces or, in living cells, more dramatically due to energy-driven effects. Super-resolution 3D imaging may involve acquisition of 2D images in each of multiple focal planes. If the different focal planes are imaged sequentially, an imaged entity may move during the process of 3D data collection and such movement will compromise the precision with which a spot can be localized. This effect may be eliminated if 2D datasets are captured in all focal planes simultaneously, which in turn can be accomplished by a microscope having multiple cameras, one per focal plane, which capture images in perfect coordination.

[0160] Super-resolution imaging involves acquisition of images in multiple focal planes, as discussed above. Although these images are obtained in rapid succession, the elapsed time between images may be a significant fraction of the total time involved. Since effective super-resolution time-lapse imaging requires minimization of total excitation energy, and thus total illumination time, it is desirable for the sample to be excited only when an image is actually being captured and not during the intervening periods. This outcome can be accomplished by a suitable combination of hardware and software in which the camera and the light source are in direct communication, without intervening steps involving signals to and from a computer, such that the sample is excited by light only at the same instant that the camera is taking a picture. For simultaneous imaging in multiple focal planes, this direct communication between camera and light source must occur synchronously for all of the multiple cameras responsible for imaging at the multiple focal planes as described above.

[0161] FIG. 7 is a diagrammatic view of an embodiment of a system 70 for acquiring a data set and identifying and localizing a pattern of interest in a data set. As discussed above, a non-limiting 3D dataset that may be analyzed in accordance with the techniques described herein using 3D pattern matching may be acquired using any suitable fluorescence imaging microscope configured to acquire a plurality of 2D focal plane images in a z-stack. The system 70 of FIG. 7 includes a microscope 72 that may be used to acquire such a 3D dataset in accordance with some embodiments. Microscope 72 may include optics 74, which may include lenses, mirrors, or any other suitable optics components needed to receive magnified images of biological specimens under study. In some embodiments, optics 74 may include optics configured to correct for distortions (e.g., spherical aberration).

[0162] Microscope 72 also includes a stage 78 on which one or more biological specimens under study may be placed. For example, stage 78 may include component(s) configured to secure a microscope slide including the biological specimen(s) for observation using the microscope. In some embodiments, stage 78 may be mechanically controllable such that the stage may be moved in the z-direction to obtain images at different focal planes, as discussed in more detail below.

[0163] Microscope 72 also includes a light source 80. The light source 80 may be configured to provide excitation energy to illuminate a biological sample placed on stage 78 to activate fluorophores attached to biological structures in the sample. In an embodiment, the light source may be a laser. In some embodiments, the light source 80 may be configured to illuminate the biological sample using light of a wavelength different than that used to acquire images of photons released by the fluorophores. For example, some fluorescent imaging techniques, such as stochastic optical reconstruction microscopy (STORM) and photoactivated location microscopy (PALM), employ different fluorophores to mark different locations in a biological structure, and the different fluorophores may be activated at different times based on the characteristics (e.g., wavelength) of the light produced by the light source used to illuminate the sample. In such instances, the light source 80 may include at least two light sources, each of which is configured to generate light having different characteristics for use in STORM or PALM-based imaging. In other embodiments, a single tunable light source may be used.

[0164] The microscope 72 may also include a camera 76 configured to detect photons emitted from the fluorophores and to construct 2D images. Any suitable camera(s) may be used including, but not limited to, CMOS or CCD-based cameras. As discussed above, some embodiments may include a single camera with a controllable microscope stage to time-sequentially acquire images in a z-stack as the stage moves positions, whereas other embodiments may include multiple cameras, each of which is configured such that the multiple cameras simultaneously acquire 2D images in appropriate different focal planes, thus creating a z-stack instantaneously without any time delay between the 2D images throughout the stack.

[0165] The microscope 72 may also include a processor 82 programmed to control the operation of one or more of stage 78, light source 80, and camera 76. The processor 82 may be implemented as a general- or special-purpose processor programmed with instructions to control the operation of one or more components of the microscope 72. Alternatively, the processor 82 may be implemented, at least in part, by hardware circuit components arranged to control operation of one or more components of the microscope.

[0166] The microscope 72 may further include a memory 84 which may be or may include a volatile or non-volatile computer-readable medium that is non-transitory. The memory 84 may temporarily or permanently store one or more images captured by the microscope 72. The memory 84 may additionally or alternatively store one or more instructions for execution by the processor 82. The instructions may encompass or embody one or more of the methods of this disclosure (e.g., one or more of methods 10, 30, 50, any portions thereof, and/or any portions of the two-stage Likelihood pipeline disclosed herein).

[0167] In addition to or instead of a processor 82 and memory 84, the microscope may include one or more additional computing devices. For example, in embodiments, the microscope 72 may include one or more of an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and/or another type of processing device.

[0168] Although the components of the microscope 72 (i.e., the optics 74, camera 76, stage 78, light source 80, processor 82, and memory 84) are generally described above as singular, it should be understood that any of the components of the microscope 72 may be provided in multiple. That is, the microscope may include multiple optics 74, cameras 76, stages 78, light sources 80, processors 82, and/or memories 84.

[0169] In an embodiment, the microscope may include multiple optics 74 and multiple cameras 76, with each set of optics 74 paired with a respective camera 76. Each paired optics 74 and camera 76 may be configured for imaging in a specific focal plane, with the focal plane of each paired optics 74 and camera 76 different from each other pair. In such an embodiment, the system 70 may enable simultaneous imaging in multiple focal planes for, e.g., capture of a z-stack of images at a single point in time.

[0170] In an embodiment, the processor 82 may control the light source 80 and camera 76 so as to enable simultaneous application of excitation energy from the light source 80 and imaging with the camera 76. As noted above, in an embodiment, the microscope may include multiple cameras 76. Accordingly, in an embodiment, the processor 82 may control the light source 80 and multiple cameras 76 configured to image different respective imaging planes so as to simultaneously image in multiple focal planes with simultaneous application of excitation energy from the light source 80. Such an arrangement, in conjunction with the techniques of this disclosure for processing the resulting images, may enable super-resolution imaging of the same biological sample for long periods of time.

[0171] In addition to the microscope 72, the system 70 may further include a computing device 86 and a storage device 88, both in electronic communication with the microscope 72. The storage device 88 may be configured to store image data acquired by the camera 76. The storage device 88 may be integrated with or directly connected to the microscope 72 as local storage and/or the storage device 88 may be located remote to microscope 72 as remote storage in communication with microscope 72 using one or more networks. In some embodiments, the storage device 88 may be configured to store a plurality of 3D images of a fluorescent spot.

[0172] The computing device 86 may be in communication with microscope 72 using one or more wired or wireless networks. The computing device 86 may be or may include, for example only, a laptop computer, a desktop computer, a tablet computer, a smartphone, a smart watch, or some other electronic computing device. In some embodiments, the computing device 86 may be configured to control one or more operating parameters of the microscope 72 using applications installed on the computing device. In some embodiments, the computing device 86 may be configured to receive imaging data captured using the microscope 72.

[0173] The computing device 86 may include its own respective processor and memory and/or other processing devices for storage and execution of one or more methods or techniques of this disclosure. For example, the computing device 86 may store and execute one or more instructions. The instructions may encompass or embody one or more of the methods of this disclosure (e.g., one or more of methods 10, 30, 50, any portions thereof, and/or any portions of the two-stage Likelihood pipeline disclosed herein). The computing device may be in electronic communication with the storage device 88, in embodiments, in order to acquire one or more data sets from the storage device 88 for processing.

[0174] In some embodiments, a time-lapse visualization (e.g., a movie) may be created (e.g., by the computing device 86) to visualize the tracked location of an imaged entity as identified by processing the 4D dataset. In one implementation, the time-lapse visualization may be created based, at least in part, on a plurality of point-in-time visualizations created in accordance with the techniques described above. In an embodiment, such a visualization may include a plot of one or more of the MLE parameters of a model (e.g., a signal hypothesis model). For example, plotting the value of A (see equations (1) and (9) above) would provide a visualization of the amplitude of a pattern of interest.

[0175] FIG. 8 is a flow chart illustrating an example embodiment of a method 90 for identifying and distinguishing further patterns of interest in a data set. The method 90 may be applied, in embodiments, after a respective primary instance of a pattern of interest has been identified in one or more respective segments of a data set (e.g., after initial iterations of Stage I and Stage II of a two-stage likelihood pipeline analysis). The method 90 may find use as step 18 in the method of FIG. 1, in an embodiment. As will be described below, the method 90 may generally include an iterative analysis in which each iteration identifies a further pattern of interest.

[0176] The method may include a step 92 of selecting a pattern candidate segment with a confirmed pattern of interest instance. The step 92 may include selecting a segment for which a pattern of interest instance has been identified and characterized according to a two-stage likelihood pipeline analysis, for example, according to methods 30 and 50. The pattern of interest instance initially identified and characterized may have been the “most prominent” pattern of interest instance in the segment, in an embodiment. For example, in a fluorescent spot embodiment, the pattern of interest instance initially identified and characterized may have been the spot having the greatest intensity amplitude in the segment.

[0177] The method may further include a step 94 that includes subtracting data respective of the most recent full signal hypothesis for the segment from the data set at and/or around the segment. In other words, a signal hypothesis may have been formulated and confirmed at a maximum likelihood estimate in a full likelihood analysis for the segment (e.g., in Stage II of a two-stage likelihood pipeline analysis), and that signal hypothesis may include one or more instances of one or more patterns of interest. The data respective of each of those one or more instances, as theorized by the signal hypothesis at the full MLE, may be subtracted from the original data set in an embodiment of step 94. In embodiments, subtraction may involve subtracting known, assumed, or theoretical amplitudes respective of the pattern of interest instances from the data in the data set. For example, in an embodiment in which a pattern of interest is a fluorescent spot, step 94 may include subtracting the respective amplitudes (as hypothesized in the signal hypothesis) of one or more instances of the fluorescent spot from the data points that are specified as the locations of those fluorescent spots in the signal hypothesis. Further, in an embodiment in which a pattern of interest is a fluorescent spot, step 94 may further include, for each spot included in the signal hypothesis, subtracting amplitudes from data points surrounding the center point of the pattern of interest instance according to a distribution function. For example, the pattern of interest may be assumed to follow a Gaussian distribution, and amplitudes may be subtracted from the data points surrounding the center of the fluorescent spot according to a Gaussian distribution. A result of the subtraction of step 94 may be a reduced data set portion at and/or around the segment.

[0178] To provide a further example: in an embodiment in which an initial instance of a pattern of interest has been identified in the segment, and two additional instances of a pattern of interest (either the same pattern of interest, or different patterns of interest) have been identified in two iterations of steps 94-102, a full signal hypothesis model may have been developed that theorizes a respective amplitude and a respective position for each of the three instances. At a third iteration of step 94, the three pattern of interest instances included in that full signal hypothesis model may be subtracted from the full data set according to the characteristics (e.g., amplitude and position) theorized in the signal hypothesis, and the method 90 may continue to add a fourth instance of a pattern of interest to the model.

[0179] In an embodiment, “subtracting” from the data set at step 94 may include reducing the values at one or more data points in the data set. For example, if a data point calculated at the center of a pattern of interest instance had a value of three thousand, five hundred (3500) before subtraction, and the amplitude of the pattern of interest instance at that data point was calculated to be one thousand (1000), then a subtraction operation may reduce the value of that data point to two thousand, five hundred (2500).
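The arithmetic of this example may be written compactly, together with its extension to neighboring data points in an embodiment that assumes a Gaussian distribution. The symbols $d_i$ (value of data point $i$), $A$ (hypothesized amplitude), $\mathbf{r}_0$ (hypothesized center), and $\sigma$ (assumed width) are labels introduced here for illustration and do not appear in the disclosure.

```latex
d'_{c} = d_{c} - A = 3500 - 1000 = 2500,
\qquad
d'_{i} = d_{i} - A \exp\!\left( -\frac{\lVert \mathbf{r}_i - \mathbf{r}_0 \rVert^{2}}{2\sigma^{2}} \right)
```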

[0180] The method may further include a step 96 that includes applying an approximate likelihood analysis to the reduced data set portion created in step 94. The approximate likelihood analysis may be applied according to a single-regime or a multi-regime stage I of a two-stage likelihood pipeline analysis, in an embodiment, or portions thereof. For example, step 96 may be conducted according to steps 32-46 of the method 30. In embodiments, the step 96 may include calculating an approximate MLE and LV respective of a signal hypothesis for the segment and calculating an approximate MLE and LV respective of a null hypothesis for the segment. The MLEs and LVs calculated at step 96 may be respective of a further pattern of interest instance, i.e., a pattern of interest instance that has not been subtracted from the data set, or has not been characterized in previous iterations of the steps of the method 90. In embodiments, the step 96 may further include calculating a likelihood ratio (LR) for the segment. Further, in embodiments, the step 96 may include comparing the likelihood ratio to a threshold and, if the likelihood ratio meets the threshold, continuing to step 98. If the LR does not meet the threshold at step 96, in an embodiment, the method may advance to step 104.

[0181] In some embodiments, including the embodiment illustrated in FIG. 8, step 96 may not include any comparison of an LR to a threshold to determine if a candidate pattern instance is present (and, thus, whether to cease iterations as to a given segment). Rather, as will be described with respect to step 102, a predetermined number of iterations of steps 94-100 may be performed, in some embodiments.
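By way of non-limiting illustration, an approximate likelihood analysis of a reduced data set portion may be sketched as follows for a one-dimensional segment. This sketch is a stand-in only: it assumes Gaussian noise of known standard deviation, a flat background, and a Gaussian pattern of known width, and it replaces the approximate Likelihood functions of Stage I (defined elsewhere in this disclosure) with a coarse search over integer positions. All function and parameter names are hypothetical.

```python
import math

def gaussian_profile(n, x0, width):
    """Unit-amplitude Gaussian pattern sampled at integer positions 0..n-1."""
    return [math.exp(-((i - x0) ** 2) / (2.0 * width ** 2)) for i in range(n)]

def approximate_log_lr(data, width, noise_sigma):
    """Coarse-grid log likelihood ratio of 'one pattern + background' vs 'background only'.

    Returns (log_lr, position, amplitude, background) at the best grid position.
    """
    n = len(data)
    d_mean = sum(data) / n
    # Null hypothesis: flat background; its MLE is the mean of the data.
    rss_null = sum((d - d_mean) ** 2 for d in data)
    best = (0.0, None, 0.0, d_mean)
    for x0 in range(n):
        g = gaussian_profile(n, x0, width)
        g_mean = sum(g) / n
        sgg = sum((gi - g_mean) ** 2 for gi in g)
        sgd = sum((gi - g_mean) * (di - d_mean) for gi, di in zip(g, data))
        amplitude = sgd / sgg  # least-squares amplitude at this position
        if amplitude <= 0.0:
            continue  # only positive-amplitude patterns are considered here
        background = d_mean - amplitude * g_mean
        rss_signal = sum((di - background - amplitude * gi) ** 2
                         for gi, di in zip(g, data))
        # Under Gaussian noise the log LR is the scaled drop in residual sum of squares.
        log_lr = (rss_null - rss_signal) / (2.0 * noise_sigma ** 2)
        if log_lr > best[0]:
            best = (log_lr, x0, amplitude, background)
    return best

# Usage: reduced data holding one remaining pattern of amplitude 50 at position 6.
reduced = [100.0 + 50.0 * v for v in gaussian_profile(12, 6, 1.5)]
log_lr, position, amplitude, background = approximate_log_lr(reduced, 1.5, 5.0)
```

In embodiments that compare the LR to a threshold, the returned `log_lr` would be the quantity compared; in embodiments that perform a predetermined number of iterations, the returned position and amplitude would simply seed the next signal hypothesis.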

[0182] The method 90 may further include a step 98 of formulating a signal hypothesis for a full likelihood analysis that includes the further pattern of interest instance identified in step 96. In an embodiment, the signal hypothesis formulated at step 98 may additionally include each previously-identified pattern of interest instance. Accordingly, in an embodiment, a signal hypothesis formulated on the nth iteration of step 98 may include n+1 pattern of interest instances: one for the pattern of interest instance that was identified and characterized before execution of the method 90, and one per iteration of steps 94-100. The characteristics (i.e., amplitude and location) of pattern of interest instances included in the signal hypothesis formulated at step 98 may be as follows: the pattern of interest instance identified in the most recent iteration of step 96 may be given the characteristics included in the approximate signal hypothesis MLE determined at that iteration of step 96; other pattern of interest instances may be given the characteristics included in the most recent full signal hypothesis MLE. In an embodiment in which the signal hypothesis includes numerous patterns of interest, any pattern of interest that has not yet had an instance identified in the segment may be given an amplitude of zero in the full signal hypothesis of step 98.
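By way of non-limiting illustration, the formulation of the signal hypothesis at step 98 may be sketched as a simple data structure. The `Instance` class, the function name, and the placeholder position of zero for a pattern with no identified instance are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    pattern: str      # which pattern of interest this instance belongs to
    position: float   # theorized location within the segment
    amplitude: float  # theorized amplitude

def formulate_signal_hypothesis(previous_full_mle: List[Instance],
                                new_approx_mle: Instance,
                                all_patterns: List[str]) -> List[Instance]:
    """Combine earlier full-MLE instances with the newest approximate-MLE instance."""
    # Previously identified instances keep their most recent full-MLE characteristics.
    hypothesis = [Instance(i.pattern, i.position, i.amplitude) for i in previous_full_mle]
    # The newest instance enters with its approximate-MLE characteristics.
    hypothesis.append(new_approx_mle)
    seen = {i.pattern for i in hypothesis}
    for name in all_patterns:
        if name not in seen:
            # A pattern with no identified instance is carried with zero amplitude.
            hypothesis.append(Instance(name, position=0.0, amplitude=0.0))
    return hypothesis

# Usage: one previously characterized spot, one newly detected spot, one unused pattern.
previous = [Instance("spot", 5.0, 1000.0)]
newest = Instance("spot", 8.0, 350.0)
hypothesis = formulate_signal_hypothesis(previous, newest, ["spot", "filament"])
```

The resulting list would serve as the starting point handed to the full likelihood analysis.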

[0183] The method 90 may further include a step 100 of applying a full likelihood analysis based on the signal hypothesis developed in step 98. In embodiments, the values of the signal hypothesis and of the null hypothesis, and the approximate locations of the pattern of interest and background, may be used as a starting point for a full Likelihood analysis (which may include, for example, a hill-climbing exercise). In some embodiments, the step 100 may proceed according to steps 54-58 of the method 50 of FIG. 4. A result of the step 100 may be the characterization of a further pattern of interest instance in the segment, and/or of all pattern of interest instances included in the signal hypothesis formulated at step 98. Accordingly, after a first iteration of steps 92, 94, 96, 98, and 100, two pattern of interest instances may have been characterized in the particular segment of interest; after a second iteration of steps 92, 94, 96, 98, and 100, three pattern of interest instances may have been characterized in the particular segment, and so on. Further, each iteration of step 100 may result in a signal hypothesis model including a quantity of pattern of interest instances, with each further iteration resulting in a model with an additional pattern of interest instance, in some embodiments. Each such model may be stored for later use and/or comparison, in some embodiments.
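By way of non-limiting illustration, a hill-climbing exercise that starts from the formulated signal hypothesis may be sketched as follows for a one-dimensional segment. The sketch assumes Gaussian noise of known standard deviation and Gaussian patterns of known width, and it uses a simple coordinate-wise search with step halving; the disclosure does not specify a particular hill-climbing scheme, and all names here are hypothetical.

```python
import math

def model(params, n, width):
    """params = [background, amp_1, pos_1, amp_2, pos_2, ...]."""
    values = [params[0]] * n
    for k in range(1, len(params), 2):
        amp, pos = params[k], params[k + 1]
        for i in range(n):
            values[i] += amp * math.exp(-((i - pos) ** 2) / (2.0 * width ** 2))
    return values

def log_likelihood(params, data, width, noise_sigma):
    """Gaussian-noise log likelihood, up to an additive constant."""
    m = model(params, len(data), width)
    return -sum((d - v) ** 2 for d, v in zip(data, m)) / (2.0 * noise_sigma ** 2)

def hill_climb(start, data, width, noise_sigma,
               step=1.0, min_step=1e-4, max_rounds=20000):
    """Coordinate-wise hill climbing from `start`; halves the step when stuck."""
    params = list(start)
    best = log_likelihood(params, data, width, noise_sigma)
    for _ in range(max_rounds):
        if step <= min_step:
            break
        improved = False
        for j in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[j] += delta
                value = log_likelihood(trial, data, width, noise_sigma)
                if value > best:  # accept only moves that raise the likelihood
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return params, best

# Usage: two overlapping patterns; the start mimics earlier MLE estimates.
truth = [100.0, 1000.0, 5.0, 400.0, 7.5]
data = model(truth, 16, 1.5)
start = [95.0, 950.0, 5.0, 350.0, 8.0]
fitted, best = hill_climb(start, data, 1.5, 10.0)
```

Because all instances in the hypothesis are adjusted together against the full data at the segment, the characteristics of earlier instances may shift as a further instance is added.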

[0184] The method may further include a step 102 that includes querying whether a predetermined cutoff point for the method 90 as to the segment has been reached. For example, the cutoff may be a predetermined number of iterations, in some embodiments. In other embodiments, the cutoff may be an incremental likelihood ratio differential (e.g., if the likelihood ratio increases by less than a threshold from one iteration to the next, the cutoff may be reached). If the cutoff point is not reached at step 102, the method may return to step 94 to perform another iteration and identify a further pattern of interest instance in the segment.
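By way of non-limiting illustration, the cutoff query of step 102 may be sketched as follows, covering both the iteration-count cutoff and the incremental likelihood ratio cutoff. The function name, the parameter names, and the representation of the history as a list of likelihood ratios are assumptions made for this example.

```python
def cutoff_reached(likelihood_ratios, max_iterations=None, min_lr_gain=None):
    """Return True when iteration over the segment should stop.

    likelihood_ratios : one likelihood ratio per completed iteration, oldest first
    max_iterations    : predetermined number of iterations, or None to disable
    min_lr_gain       : minimum increase in LR between iterations, or None to disable
    """
    if max_iterations is not None and len(likelihood_ratios) >= max_iterations:
        return True
    if min_lr_gain is not None and len(likelihood_ratios) >= 2:
        # The LR increased by less than the threshold from one iteration to the next.
        if likelihood_ratios[-1] - likelihood_ratios[-2] < min_lr_gain:
            return True
    return False

# Usage: the third iteration adds little, so the differential cutoff is reached.
history = [120.0, 185.0, 186.5]
stop = cutoff_reached(history, min_lr_gain=5.0)
```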

[0185] If the cutoff has been reached at step 102, the method may advance to step 104, which may include selecting a model option to define and characterize the data at the segment, i.e., to define and characterize one or more instances of one or more patterns of interest. As noted above at step 100, each iteration of steps 94-100 as to a segment may result in a model with a quantity of pattern of interest instances and that is associated with a likelihood ratio. One of those models may be selected at step 104, and one or more instances of one or more patterns of interest may be defined and characterized according to the model. In an embodiment, the most recently-defined model (e.g., the model with the largest quantity of pattern of interest instances included) may be selected. In other embodiments, other criteria may be used for selecting a model.
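By way of non-limiting illustration, the model selection may be sketched as follows. The "most_recent" strategy corresponds to selecting the most recently-defined model; the "last_significant_gain" strategy is a purely hypothetical example of another criterion and is not taken from this disclosure. The tuple layout `(instance_count, likelihood_ratio)` is likewise an assumption.

```python
def select_model(models, strategy="most_recent", min_lr_gain=0.0):
    """Pick one stored model for the segment.

    models : list of (instance_count, likelihood_ratio) tuples in the order produced
    """
    if not models:
        raise ValueError("no models stored for this segment")
    if strategy == "most_recent":
        return models[-1]
    if strategy == "last_significant_gain":
        # Hypothetical criterion: keep adding instances while the LR gain is large enough.
        chosen = models[0]
        for previous, current in zip(models, models[1:]):
            if current[1] - previous[1] < min_lr_gain:
                break
            chosen = current
        return chosen
    raise ValueError("unknown strategy: " + strategy)

# Usage: three stored models with one, two, and three instances.
stored = [(1, 120.0), (2, 185.0), (3, 186.5)]
latest = select_model(stored)
parsimonious = select_model(stored, "last_significant_gain", min_lr_gain=5.0)
```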

[0186] The method may further include a step 106 of querying whether additional segments exist in the data set having a confirmed pattern of interest instance. If so, the method may return to step 92 to select another segment to detect and differentiate further patterns of interest for that segment. If not, the method 90 may terminate.

[0187] While this disclosure has described certain embodiments, it will be understood that the claims are not intended to be limited to these embodiments except as explicitly recited in the claims. On the contrary, the instant disclosure is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the disclosure. Furthermore, in the detailed description of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, it will be obvious to one of ordinary skill in the art that systems and methods consistent with this disclosure may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure various aspects of the present disclosure.

[0188] Some portions of the detailed descriptions of this disclosure have been presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer or digital system memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, logic block, process, etc., is herein, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these physical manipulations take the form of electrical or magnetic data capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system or similar electronic computing device. For reasons of convenience, and with reference to common usage, such data is referred to as bits, values, elements, symbols, characters, terms, numbers, or the like, with reference to various presently disclosed embodiments.

[0189] It should be borne in mind, however, that these terms are to be interpreted as referencing physical manipulations and quantities and are merely convenient labels that should be interpreted further in view of terms commonly used in the art. Unless specifically stated otherwise, as apparent from the discussion herein, it is understood that throughout discussions of the present embodiment, discussions utilizing terms such as “determining” or “outputting” or “transmitting” or “recording” or “locating” or “storing” or “displaying” or “receiving” or “recognizing” or “utilizing” or “generating” or “providing” or “accessing” or “checking” or “notifying” or “delivering” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data. The data is represented as physical (electronic) quantities within the computer system’s registers and memories and is transformed into other data similarly represented as physical quantities within the computer system memories or registers, or other such information storage, transmission, or display devices as described herein or otherwise understood to one of ordinary skill in the art.

Claims

CLAIMS

What is claimed is:

1. A method of detecting two or more pattern of interest instances in a segment of a data set, the method comprising:

identifying a first pattern of interest instance in the segment;

subtracting the amplitude of the first pattern of interest instance from data in the data set respective of the segment to create a reduced data set portion;

calculating an approximate second pattern MLE for the reduced data set portion to identify a second pattern instance;

applying a full Likelihood analysis to the segment according to the approximate second pattern MLE; and

designating the candidate segment as including the second pattern according to the result of the full Likelihood analysis.

2. The method of claim 1, wherein the data set comprises one or more images.

3. The method of claim 1, wherein applying a full Likelihood analysis to the candidate segment according to the approximate second pattern MLE comprises applying the full Likelihood analysis to the full data set at the segment.

4. The method of claim 1, wherein identifying the first pattern of interest instance comprises calculating an approximate first pattern MLE for the full data at the segment to identify the segment as a pattern of interest candidate segment.

5. The method of claim 4, wherein identifying the first pattern of interest instance comprises applying a full Likelihood analysis to the full data at the segment according to the first pattern MLE.

6. The method of claim 1, further comprising:

subtracting the amplitude of the second pattern of interest instance from data in the reduced data set portion to create a further reduced data set portion;

calculating an approximate third pattern MLE for the further reduced data set portion to identify a third pattern instance;

applying a full Likelihood analysis to the candidate segment according to the approximate third pattern MLE; and

designating the candidate segment as including the third pattern according to the result of the full Likelihood analysis.

7. The method of claim 6, wherein subtracting the amplitude of the first pattern of interest instance from data in the data set respective of the segment comprises reducing respective values of one or more data points in the data set respective of the segment.

8. The method of claim 1, further comprising iteratively:

subtracting the amplitude of each identified pattern of interest from the data set to create a further reduced data set portion;

calculating an approximate further pattern MLE for the further reduced data set portion to identify a further pattern instance;

applying a full Likelihood analysis to the segment according to the approximate further pattern MLE; and

designating the candidate segment as including a further identified pattern according to the result of the full Likelihood analysis.

9. A non-transitory, computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method of detecting two or more pattern of interest instances in a segment of a data set, the method comprising:

identifying a first pattern of interest instance in the segment;

applying a full Likelihood analysis to the segment according to the approximate second pattern MLE; and

10. The non-transitory, computer-readable medium of claim 9, wherein the data set comprises one or more images.

11. The non-transitory, computer-readable medium of claim 9, wherein applying a full Likelihood analysis to the candidate segment according to the approximate second pattern MLE comprises applying the full Likelihood analysis to the full data set at the segment.

12. The non-transitory, computer-readable medium of claim 9, wherein identifying the first pattern of interest instance comprises calculating an approximate first pattern MLE for the full data at the segment to identify the segment as a pattern of interest candidate segment.

13. The non-transitory, computer-readable medium of claim 12, wherein identifying the first pattern of interest instance comprises applying a full Likelihood analysis to the full data at the segment according to the first pattern MLE.

14. The non-transitory, computer-readable medium of claim 9, wherein the method further comprises:

applying a full Likelihood analysis to the candidate segment according to the approximate third pattern MLE; and

15. The non-transitory, computer-readable medium of claim 14, wherein subtracting the amplitude of the first pattern of interest instance from data in the data set respective of the segment comprises reducing respective values of one or more data points in the data set respective of the segment.

16. The non-transitory, computer-readable medium of claim 14, wherein the method further comprises iteratively:

calculating an approximate further pattern MLE for the further reduced data set portion to identify a further pattern instance;

applying a full Likelihood analysis to the segment according to the approximate further pattern MLE; and

designating the candidate segment as including a further identified pattern according to the result of the full Likelihood analysis.

17. A method of detecting two or more pattern of interest instances in a segment of a data set, the method comprising:

applying a Likelihood analysis to the segment to identify a first pattern of interest instance, the first pattern of interest comprising an amplitude;

applying a full Likelihood analysis to the segment according to the approximate second pattern MLE; and

18. The method of claim 17, wherein applying a full Likelihood analysis to the candidate segment according to the approximate second pattern MLE comprises applying the full Likelihood analysis to the full data set at the segment.

19. The method of claim 17, further comprising:

subtracting the amplitude of the second pattern of interest instance from data in the reduced data set portion to create a further reduced data set portion;

calculating an approximate third pattern MLE for the further reduced data set portion to identify a third pattern instance;

applying a full Likelihood analysis to the candidate segment according to the approximate third pattern MLE; and

20. The method of claim 19, wherein subtracting the amplitude of the first pattern of interest instance from data in the data set respective of the segment comprises reducing respective values of one or more data points in the data set respective of the segment.