WO2009135271A1 - A system and method for processing flow cytometry data - Google Patents

Info

Publication number
WO2009135271A1
Authority
WO
WIPO (PCT)
Prior art keywords
flow cytometry
subset
data
events
cytometry data
Application number
PCT/AU2009/000582
Other languages
French (fr)
Inventor
Dieter Gottwald
Vittorio Cordioli
Original Assignee
Inivai Technologies Pty Ltd
Application filed by Inivai Technologies Pty Ltd

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00Investigating characteristics of particles; Investigating permeability, pore-volume, or surface-area of porous materials
    • G01N15/10Investigating individual particles
    • G01N15/14Electro-optical investigation, e.g. flow cytometers
    • G01N15/1429Electro-optical investigation, e.g. flow cytometers using an analyser being characterised by its signal processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/40Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695Preprocessing, e.g. image segmentation
    • G01N15/1433
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00Investigating characteristics of particles; Investigating permeability, pore-volume, or surface-area of porous materials
    • G01N15/10Investigating individual particles
    • G01N15/14Electro-optical investigation, e.g. flow cytometers
    • G01N15/1468Electro-optical investigation, e.g. flow cytometers with spatial resolution of the texture or inner structure of the particle
    • G01N15/147Electro-optical investigation, e.g. flow cytometers with spatial resolution of the texture or inner structure of the particle the analysis being performed on a sample stream

Definitions

  • the present invention relates to a system and process for analysis of flow cytometry data.
  • Flow cytometry is a technique for measuring multiple properties (e.g. fluorescence) of microscopic particles (e.g. biological cells), and modern flow cytometry systems generate increasingly large volumes of measured data representing multiple properties of thousands or millions of particles.
  • a group, or set, of measurements for a particular particle is referred to in the art as an "event".
  • cytometry data typically represents a correspondingly large number of measurement "events".
  • a system for analysis of flow cytometry data representing parameter values of cytometry events including a generator for generating a display matrix of elements that represent event densities in the flow cytometry data, wherein the generator includes an accumulator that, for each event, increments a selected element's value, where the selected element has a position in the display matrix determined by a normalised value of at least one selected parameter.
  • the present invention also provides a system for analysis of flow cytometry data, including: a selector for selecting a subset of events in the flow cytometry data; and a linear array generator for generating a linear array that represents the subset, where each event is represented by one of two data representations.
  • the linear array is a bit string and the data representations are 0 and 1 bits.
  • the present invention also provides a system for analysis of flow cytometry data, including: a parser for generating a parse tree based on a Boolean expression relating to one or more operand bit arrays representing events in the flow cytometry data; a Boolean evaluator for generating an evaluation command for each node in the parse tree, including selecting the command to be a result-generation or an overwriting evaluation command, based on whether the node is respectively an intermediate or terminal node in the parse tree; and a bit string processor for respectively generating a result bit array, or overwriting an operand bit array, based on each evaluation command from the Boolean evaluator.
  • the present invention also provides a system for analysis of flow cytometry data representing parameter values of cytometry events, the system including: a user interface for generating a graphical representation of a flow cytometry display matrix of elements that represent event densities in the flow cytometry data; and a cluster selector for selecting a cluster of the elements based on a selected threshold density value and the event densities of the elements.
  • Figure 1 is a schematic diagram of one or more cytometers and an analysis system for analysis of flow cytometry data;
  • Figure 2 is a block diagram of software modules of the analysis system;
  • Figure 3 is a flowchart of a display matrix generation process for analysis of the flow cytometry data by the analysis system;
  • Figure 4 is a flowchart of a display image generation process performed by the analysis system;
  • Figure 5 is a flowchart of a subset selection process for selection of a subset of the flow cytometry data by the analysis system;
  • Figure 6 is a flowchart of a dynamic statistics generation and display process performed by the analysis system;
  • Figure 7 is a screen shot of a graphical user interface of the analysis system showing an example one-dimensional display image, a selected subset and dynamic statistics;
  • Figure 8 is a screen shot of the graphical user interface showing an example two-dimensional display image;
  • Figure 9 is a screen shot showing a display image of a first subset of the flow cytometry data in Figure 8;
  • Figure 10 is a screen shot showing a display image of a second subset of the flow cytometry data in Figure 8;
  • Figure 11 is a screen shot showing a display image of a third subset of the flow cytometry data in Figure 8;
  • Figure 12 is a flowchart of a subset bit string generation process performed by the analysis system;
  • Figure 13 is a flowchart of a Boolean expression evaluation process performed by the analysis system;
  • Figure 14 is a screen shot showing a display image with a subset resulting from evaluation of a Boolean expression;
  • Figure 15 is a flowchart of a cluster generation process, for a cluster of elements in the display matrix, performed by the analysis system;
  • Figure 16 is a screen shot showing a display image of a two-dimensional display matrix;
  • Figure 17 is a screen shot showing a display image of a first cluster selected from the display matrix of Figure 16;
  • Figure 18 is a screen shot showing a display image of a second cluster selected from the display matrix of Figure 16;
  • Figure 19 is a screen shot of the graphical user interface showing a report, including a list of display matrices, statistics and a Boolean expression;
  • Figure 20 is a screen shot of the user interface showing a display matrix with an x-axis logarithmic scale for positive values and a linear scale for negative values.
  • An analysis system 100 for the analysis of flow cytometry data, including processing, manipulation and display, is embodied in a computing device 102, which includes a processor 104, non-volatile memory storage 106 (e.g. hard disks) and Random Access Memory (RAM) 108.
  • the analysis system 100 receives flow cytometry data from one or more flow cytometry systems or cytometers 110.
  • the flow cytometry data is received either directly from the flow cytometer 110 during operation, or as data packets transmitted over a data network (e.g. a wired or wireless, open or closed network, including the Internet), or from a data storage device with prerecorded flow cytometry data (e.g. files stored in persistent memory, such as removable hard drives or flash memory).
  • the computing device 102 is a personal computer (PC) with an Intel IA32 based processor 104 and a Microsoft Windows operating system. In alternative embodiments, however, the computing device 102 may be based on a different processor 104, e.g. from Advanced Micro Devices (AMD), and the operating system may be a Macintosh operating system or a Linux operating system.
  • the computing device 102 may also be an application-specific computer, provided at least in part by one or more dedicated hardware components such as Application-Specific Integrated Circuits (ASICs), or Programmable Logic Devices (PLDs) such as Field-Programmable Gate Arrays (FPGAs).
  • the functions of the computing device 102 may be distributed over a variety of locations, e.g. using client-server computing, or alternatively the computing device 102 may be a portable device such as a mobile telephone or a Personal Digital Assistant (PDA).
  • the analysis system 100 includes a plurality of software modules 200, running in a Java virtual machine 202 executed by an operating system (O/S) 204 operating on the computing device 102, as shown in Figure 2.
  • the software modules 200 may be implemented using other computing languages such as C, C++, C#, or Ada, etc.
  • the software modules 200 include:
  • an import/export module 206 for importing raw flow cytometry data and exporting processed data and analysis reports;
  • a display matrix generator 208 for generating a display matrix (also referred to as a 'display array'), which may be one-dimensional or multi-dimensional, including multiple array elements (or data 'cells'), based on the raw cytometry data;
  • an image generator 210 for generating images based on the display matrix;
  • a statistics generator 212 for generating statistics relating to the display matrix;
  • a graphical user interface (GUI) module 214 for generating a graphical user interface that allows the user to view and manipulate the event data;
  • a Boolean evaluator 216 for generating and evaluating Boolean expressions based on user selections;
  • a subset selector 218 for selecting subsets of events, e.g. based on selected parameter values for the events, or for events fitting certain clusters in the display matrix;
  • a bit array generator 220 for generating a bit array, or bit string, based on the input flow cytometry data, a Boolean expression, and/or any selected clusters.
  • the analysis system 100 receives flow cytometry data from the cytometer(s) 110 and generates one or more graphical displays based on this data.
  • the raw cytometer data represents measurements made on a large number of 'events' (e.g. one event per detected particle), e.g. from measuring a large number of microscopic particles in the cytometer(s) 110.
  • a number of parameters are monitored in the cytometer(s) 110, such as the intensities or levels of light emitted by different fluorescent species (referred to as 'labels') attached to the microscopic particles, e.g. green ('FL1'), orange ('FL2'), red ('FL3') and infrared ('FL4').
  • This cytometry data may be in the form of a raw data array, with the measured values for each parameter listed for each event, provided as a binary data file with header information describing the file details, e.g. format, size, and total number of events N.
  • the cytometry data is displayed by the analysis system 100 in various display images. For example, a histogram of events for each of the measured parameters may be generated, such as shown in Figure 7, where the number of events (also referred to as the density of events) is plotted against the parameter "FSC-H".
  • An alternate display is a type of two-dimensional (bivariate) plot referred to in the art as a density plot, where both the X and Y axes relate to measured parameters and the vertical axis, or a colour-based intensity indicator, represents the density of events, such as shown in Figure 8 for the parameters "FSC-H" (X axis) and "FSC-A" (Y axis).
  • a user of the analysis system 100 uses the display images to analyse and manipulate the raw cytometer data.
  • a one- or multi-dimensional plot may display peaks in event density relating to different species, or types, of particles in the mixture of particles. The number of events (and thus particles) related to each peak allows a user to determine, for example, the relative proportions of particle types in the mixture.
  • the analysis system 100 generates the display matrix in a display matrix generation process 300, in Figure 3.
  • the analysis system 100 loads raw event data from the cytometer(s) 110, including event parameter data for each event, using the import/export module 206 (step 302).
  • the display matrix generator 208 receives the raw event data from the import/export module 206.
  • the raw event data is received in a binary data file, including a header that represents the format of the actual raw event data.
  • the raw event data may represent a table of recorded events, e.g.
  • the display matrix generator 208 will only process data relating to a subset of the raw event data: in this case, the display matrix generator 208 loads a subset bit array, or bit string, defining the subset of events (step 304). Subset bit arrays and bit strings are described below with reference to Figure 5. From the loaded event data, one or more event parameter arrays are generated with one array for each event parameter (step 306).
  • Each parameter array is a linear (i.e. one-dimensional) array with a value stored for each measured event: e.g. an array for the parameter "FSC-H" is of the general form [x1, x2, x3, ..., xi, ..., xN].
  • the display matrix generator 208 needs to know which parameters to display and therefore the display matrix generator 208 receives selection data from the GUI module 214 representative of a selection of parameters to be displayed, e.g. a user selection (step 308).
  • the selection of event parameters to be displayed in step 308 may also be automated, e.g. for generating pre-selected displays.
  • the image size, data resolution (e.g. number of display cells or points in the plot) and type (e.g. one-dimensional or multi-dimensional, having linear or logarithmic axes) are also selected for the display image (step 310): e.g. a one-dimensional display may be a histogram with 1024 'cells', and a two-dimensional display may be a density plot with linear-linear axes and a display of 256 x 256 cells.
  • the data resolution, i.e. number of 'cells' in the output image is independent of the device resolution (e.g. resolution of the screen in pixels).
  • the display matrix can be one-dimensional, or multi-dimensional, and may be of any practical size.
  • the display matrix generator 208 generates an event-to-image transformation, or normalisation, for each of the one or more selected event parameter arrays (step 312).
  • the role of the transformation is to transform, or normalise, each raw parameter value for the selected parameter(s) into a corresponding display value appropriate to the selected display image.
  • the transformation maps raw parameter values to display parameter values.
  • the transformation also applies any desired scaling, zooming and anti-aliasing operations. For example, for a one-dimensional histogram 701 (in Figure 7), the transformation scales the raw values of the parameter FSC-H (such as floating point values) to generate display values of the parameter (which are integers) based on the size and resolution selected for the display image.
  • the floating point values are scaled to fit the display axes (i.e. the desired data resolution), then rounded to the nearest integer values.
  • the transformation also accounts for translation of values, e.g. where some parameter values are less than or equal to zero for logarithmic-scale plots as shown on the mixed linear-logarithmic axis of Figure 20.
  • the values of the display matrix are generated, or populated, by an accumulator 209 in the display matrix generator 208 in an iterative process that is repeated for each event in the event parameter array(s), i.e. all events in the raw event data (or in any selected subset).
  • the image coordinates of the first event are generated as integers using the event-to-image transformation and the first value in each selected parameter array (step 314): for example, for a one-dimensional histogram with 256 histogram 'cells', the integer image coordinate of the first event is the scaled value, converted to an integer, indicative of where that event lies in the histogram.
  • the integer image coordinate determines into which element of the display matrix the event falls.
  • the display matrix generator 208 updates an initially empty display matrix (initialised in the import/export module 206) by incrementing by a single value (in this case, 1) the value in the array element corresponding to the image coordinates (step 316).
  • the value of one element in a display matrix is incremented for each event being processed in the loop, and thus an up-to-date event density is indicated by the updated display matrix.
  • the display matrix accumulates its values during iteration of step 316.
  • the display matrix generator 208 determines if the density, or accumulated value, of the newly updated array element is greater than a "max-density" value, indicative of the maximum density of events in the display matrix at any one element (e.g. the maximum peak in a histogram): if the density of the updated array element is greater, the value of max-density is increased or incremented by the single value, e.g. 1 (step 318).
  • the display matrix update loop of steps 314, 316 and 318 is repeated for each event in the raw event data, or the subset of event data if relevant (step 320). Once the display matrix is generated, it is stored by the display matrix generator 208 (step 322).
  • a display image is generated from the display matrix in a display image generation process 400, in Figure 4, performed by the image generator 210.
  • the image generator 210 accesses the display matrix, as generated by the display matrix generator 208 (step 402), and loops through the display matrix to generate an image display matrix (steps 404 and 406).
  • the image generator 210 determines whether the element has a non zero value (i.e. represents one or more events): if so, a value for the image display matrix is generated representing a height (e.g. for a one-dimensional histogram as in Figure 7), or a colour (for a multidimensional plot, e.g.
  • the max-density value provides an upper limit on the density to be displayed, and thus allows the values in the display matrix to be mapped to heights, or colours in a colour map, scaled between zero (no events) and the maximum (max-density).
  • the colour or height selection step 404 is repeated for each element in the display matrix (step 406), e.g. for all 65,536 elements in a 256x256 display matrix.
  • the image generator 210 then renders all the elements in the image display matrix in one step (step 408) to generate a display image for the GUI module 214 for viewing by a user, e.g. plots such as in Figures 7 and 8.
  • the GUI module 214 generates a graphical user interface which allows the user to view and manipulate the event data.
  • the display in Figure 7 shows the histogram 701 of events plotted against the parameter "FSC-H".
  • the GUI module 214 also generates user interface controls including subset selection controls 712, chart type selection controls 714 (e.g. histogram, two-dimensional, multi-dimensional, etc.), a clusterer control 716, cloning controls 718 (e.g. for cloning a gate or a window), file controls 720 (e.g. print, save, delete), and a Boolean function control 722.
  • the analysis system 100 allows for selection of subsets of the events in the raw events data, e.g. by a user interaction.
  • An example subset 702, in Figure 7, includes the events whose values for the parameter "FSC-H" lie on or between the values represented in the histogram by lines 704A and 704B.
  • Another subset in Figure 7 is subset 706, representing events with "FSC-H" parameter values lying between the values at lines 708A and 708B.
  • a subset may be selected by drawing two-dimensional shapes, e.g. polygon 802, or polygon 804, or polygon 806, which define corresponding subsets containing events represented by the points on the graph in Figure 8.
  • the subset selector 218 selects events represented by part of the display matrix, which has been generated by the display matrix generator 208.
  • the subset is selected (step 502) using one or more of the following methods:
  • the analysis system 100 performs a dynamic statistics generation and display process 600 (step 504) to dynamically generate statistics relating to the selected subset.
  • the number of events in the selected subset "M1" 706 in Figure 7 is 17.3% of the total number of events N, which is displayed as a number 710 in Figure 7.
  • the number of events in subset "P2" as a percentage of total events N in the raw event data is 67.2%, which is displayed as a number 810 in Figure 8.
  • the dynamic statistics generation and display process 600 is performed by the statistics generator 212.
  • To determine the statistics related to a selected subset, the statistics generator 212 initialises a counter in the form of a subset density counter to zero (step 602).
  • the statistics process 600 then loops through each element in the display matrix (e.g. 65,536 elements for a 2D 256x256 display matrix) and increments the subset density counter with each iteration of the loop, if a bit corresponding to that element in a subset bit string is set (steps 604 and 606).
  • the statistics generator 212 generates statistics (step 608) based on the total subset density, generated in the loop of steps 604 and 606, and the total number of events N in the raw event data, as determined by the import/export module 206.
  • the statistics generator 212 displays the statistics (e.g. as numbers 710, or 810) using the GUI module 214 (step 610).
  • the subset selection process 500 generates a linear subset array representing events within the selected subset (step 506).
  • the linear array is in the form of a bit array, or preferably a bit string, i.e. a one-dimensional array with a length in bits corresponding to the number of events N in the raw event data.
  • Each bit in the subset bit array represents a corresponding event in the raw event data, e.g. a binary string of size/length N bits.
  • the bit string defines a gate that indicates for each event whether it is included in the subset or not, e.g. a binary "1" indicates that the event is included in the subset, and a binary "0" indicates that the event is not included.
  • the generated subset bit strings for a given set of raw event data have the same size, N, as the raw event data itself.
  • when generating the integer image coordinates for each event in step 314, only events in the selected subset are included.
  • the integer image coordinates to be included in the subset display matrix are selected based on the bits in a subset bit string: in the update display matrix loop of steps 314, 316, 318 and 320, the value of the bit in the bit string corresponding to the event in the loop is used to include or exclude that event.
  • events not represented in the subset have 0 as a binary value at a position in the bit string corresponding to their position in the event parameter arrays.
  • Each event parameter array and each subset bit string is of length N.
  • the generation of subset display matrices, and their display by the image generator 210, is illustrated by the displayed subsets 902 in Figure 9, 1002 in Figure 10 and 1102 in Figure 11: these subsets 902, 1002 and 1102 correspond to the subsets selected by polygons 802, 804 and 806 in Figure 8, respectively.
  • the displayed subset images illustrate that the range of density values used to map the colours of the pixels has changed as the range of density values for the subset display matrix is typically different to that of the display matrix for the whole set of raw event data.
  • the display matrix generation process 300 relates to all events, and filters them using the subset bit string.
  • the parameters selected for the display matrix generation process 300 where a subset bit string has been selected may be different from those used to select the bit string.
  • a subset of events can be selected based on a first group of parameters, e.g. green and red fluorescence, and then this subset may then be displayed using different parameters, e.g. orange and infrared, which allows the events in the subset to be further distinguished by a user into multiple subsidiary subsets.
  • the bit string generator 220 loads a pre-initialised main bit array, e.g. as a one-dimensional bit string (e.g. initialised to an equal binary value for each point, and having a length N) and the raw event data (step 1202).
  • the bit string generator 220 receives subset parameter values from the subset selector 218 as part of the subset selection process 500 (step 1204): these are the selected values of the relevant parameters that define a subset, e.g. the values of the parameters "FSC-H" and "FSC-A" inside the polygon 802 in Figure 8.
  • the bit string generator 220 then loops through each event in the raw event data (steps 1206 and 1208).
  • the bit string generator 220 determines whether an event's relevant parameter values, i.e. relevant to the display matrix used for selection of the subset, lie within the subset parameter values (step 1206).
  • the bit in the bit string corresponding to this event is then set to reflect whether or not the event is in the subset; for example, the corresponding bit is set to binary 1 if the event is in the subset, and else it is left at binary 0.
  • the determination in step 1206 is performed for each event in the main bit string (or event data), i.e. N times (step 1208).
  • the Boolean evaluation process 1300, performed by the Boolean evaluator 216, is carried out in accordance with control selections made by the user via the GUI module 214, and uses existing bit arrays or bit strings representing the one or more subsets to generate a result from the expression.
  • An expression may be selected by the user using the Boolean control 722, which allows an expression to be generated using operands (referring to the one or more subsets), and Boolean operators.
  • the Boolean evaluation uses bit arrays or bit strings, representing all events. The Boolean expression is evaluated using only binary operations directly on the bit strings, which allows for a result to be generated with high speed / efficiency.
  • a Boolean expression is selected, including Boolean operators and bit array (or bit string) operands (step 1302).
  • the Boolean expression is parsed to generate a parse tree (step 1304), using for example a parse command in Java or another computer language.
  • Each operator and its bit string(s) are evaluated using the bit string processor 217 (step 1306).
  • the bit string processor 217 is a module or device for performing bit-wise operations on bit arrays or strings. All operations performed by the bit string processor 217 take bit arrays or strings as their operands and produce results of type bit string or of type integer.
  • the bit string processor 217 is implemented either in software (e.g. using the Java programming language) or in hardware (e.g. using an FPGA).
  • the operations are performed in a step-by-step traversal of the generated parse tree (step 1308), i.e. the loop of steps 1306 and 1308 is repeated for all expressions in the parse tree.
  • a resultant evaluation bit string for the subset resulting from the Boolean expression is generated at the end of the Boolean process 1300.
  • the evaluation process 1300 generates command data representing a form of machine language "program", or list of evaluation commands, for the bit string processor 217.
  • An evaluation command for the bit string processor 217 is generated for each node in the parse tree.
  • the corresponding plurality of evaluation commands comprises the generated "program".
  • the evaluation command corresponding to each node is selected, by the Boolean evaluator 216, based on the position of the node in the parse tree.
  • the Boolean evaluator 216 selects a result-generation operation, known as an "op" type, which causes the bit string processor 217 to generate data for a new resultant bit string, based on the Boolean operator and the one or more operand bit strings of that node.
  • the evaluation process 1300 is performed with reduced memory allocation/deallocation overheads and at a relatively high speed compared to at least some existing methods of Boolean expression evaluation.
  • the "op" type operations performed by the bit string processor 217 include:
  • This result generation operation returns a newly generated resultant bit string, with each bit being the logical opposite of the bit in the same position in the bit string "operand1".
  • a further "op" type operation performed by the bit string processor 217 is:
  • This overwriting operation overwrites each bit in "operand1" with its logical opposite.
  • the bit string processor 217 also performs a population count operation that generates data representing the total number of bits of a certain value in an input bit string (e.g. the total number of binary 1s in a bit string).
  • the population count operation is:
  • the bit string processor 217 also performs permute operations on bit strings of potentially different lengths, e.g. to expand, condense or re-order their individual bits.
  • An example Boolean result subset may be generated from the union of two subsets "R1" (1504) and "R2" (1506), as shown in Figure 14.
  • the resulting subset indicated by the area 1502 can be displayed in a new display matrix generation process 300 using alternate parameters, and/or used in further Boolean manipulations based on its corresponding subset bit string.
  • a further example result subset, represented by the area 1503 in Figure 14, is generated from the union of two subsets "M1" (702) and "M2" (706) in Figure 7: this illustrates that the subsets, including result subsets, can be used to generate a display matrix with parameters (e.g. "FSC-H" and "FSC-A" in Figure 14) that differ from the parameter(s) in the display matrix used to select the subset(s) (e.g. only "FSC-H" in Figure 7).
  • subset selection process 500 it is possible to select a subset of the display matrix using a contour-based or automatic "flood-fill" style, cluster generation process 1600, in Figure 15.
  • a portion, or cluster, of the elements of the display matrix is selected, by the user, based on a selected threshold density value (e.g. by selecting the density value of one of the elements).
  • the subset selector 218 first receives or accesses a display matrix that has been generated for selected parameters (step 1602), e.g. the display matrix represented by the display image of Figure 16.
  • a threshold density value, i.e. an integer between zero and max-density representing a number of events in an element of the display matrix, is selected by the user (step 1604).
  • the user may select the threshold density value by specifying a number, i.e. numerically, or by graphically identifying an element corresponding to a pixel (i.e.
  • the cluster generation process 1600 loops through each neighbouring element in the display matrix to determine which elements are differentiated by their event density value, compared to the threshold density value (steps 1606 and 1608). Neighbouring elements are those adjacent to each other in the display matrix.
  • the "flood-fill" process first determines the threshold density in the display matrix corresponding to that location. This process then adds that location to the cluster. Adding any new location to the cluster triggers the examination of its neighbours: up, down, left and right. The examination first checks whether the neighbour in question is already part of the cluster; only if it is not is its density compared to the threshold. If its density is greater than or equal to the threshold, the neighbour is added to the cluster, in turn triggering the examination of its own neighbours: up, down, left and right. This process is optimised to minimise the number of pixel examinations.
  • steps 1606 and 1608 may optionally be repeated for all local maxima (step 1609), i.e. all elements in the entire display matrix with density values relative to the threshold, e.g. greater than or equal to the threshold density. In this way, multiple, non-contiguous regions can be selected into a single cluster.
  • the cluster is displayed by graphically indicating the corresponding pixels in the cluster on the display image (step 1610). For example, for a selected element corresponding to pixel 1704, hovering the mouse cursor over that pixel highlights all pixels with element density values greater than or equal to that of pixel 1704 in a cluster display 1802 in Figure 17. The element corresponding to pixel 1704 is included in the cluster as an element corresponding to pixel 1804. In another example, if the element corresponding to pixel 1706 is selected, e.g. by hovering the cursor over it, a smaller cluster display 1902 is generated, as shown in Figure 18. The smaller cluster display 1902 excludes elements with a density less than the threshold value, e.g. the element corresponding to pixel 1904 (which corresponds to pixels 1704 and 1804), but includes the selected element, corresponding to pixels 1706 and 1906.
  • the "flood-fill" selection process of steps 1606 and 1608 is performed with variations in terms of which locations are considered "neighbours".
  • the four diagonally adjacent locations are included in the examination.
  • the examination is limited to the diagonally adjacent locations only.
  • the "in or out" evaluation rule is replaced with a selection rule based on a dynamically adjusting threshold value, e.g. taking into account the distance from the starting location.
  • the cluster is used to generate a new subset including all events represented by the selected elements in the cluster.
  • the cluster subset is selected by clicking the mouse cursor once the cluster display has been highlighted (step 1612).
  • a subset bit string generated by the cluster generation process 1600 may be used to generate a subsequent display matrix, for any selected parameters, in the display matrix generation process 300, i.e. a cluster subset is another subset that can be used for analysis, in Boolean expressions, for display, etc., as any other subset.
  • the flood-fill operates on the display matrix only.
  • the mouse button is clicked to select the cluster subset (known as "fixing" the cluster selection)
  • all events contributing to the respective density of the highlighted pixels in the display matrix are selected.
  • the computational complexity of the interactive "flood-fill" highlighting process is bounded by the dimensions of the display matrix.
  • the potentially very expensive selection of the set of events in the cluster subset is performed only when a user of the GUI 214 confirms their selection of the highlighted cluster by pressing the mouse button.
  • the GUI module 214 generates summary reports, e.g. as shown in Figure 19, that list the number of events 2002 in the various subsets of events 2004. For example, there are 1 million events in the total data set "F1", while only 173,293, or 17.3%, are in the set "M1" 2006, also shown as the "M1" group 704 in Figure 7.
  • the report data includes selected subsets and subsets generated by Boolean expressions, e.g. "G1" 2008 generated by the

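The cluster generation process 1600 described in the points above can be sketched in Java, the language the analysis modules are described as running in. This is a minimal illustration, not the described implementation. It makes the following assumptions:
- the display matrix is an int[][] of densities, and the threshold is taken from the seed element;
- an explicit stack replaces recursion;
- the neighbourhood is passed as a table of offsets, so the four-neighbour, eight-neighbour and diagonal-only variants can be compared;
- for the "fixing" step, each event's display-matrix element has already been recorded as a row-major index.

ClusterSelector, floodFill and fix are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

/** Hypothetical sketch of threshold-based flood-fill cluster selection on a 2D display matrix. */
final class ClusterSelector {
    // Neighbourhood variants as (row, column) offsets.
    static final int[][] FOUR = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
    static final int[][] DIAGONAL = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
    static final int[][] EIGHT = {
        {-1, 0}, {1, 0}, {0, -1}, {0, 1}, {-1, -1}, {-1, 1}, {1, -1}, {1, 1}};

    /**
     * Grows a cluster from (seedRow, seedCol). The threshold is the seed element's density;
     * every reachable element with density >= threshold joins. Returns a mask over elements.
     */
    static boolean[][] floodFill(int[][] density, int seedRow, int seedCol, int[][] neighbours) {
        int rows = density.length, cols = density[0].length;
        int threshold = density[seedRow][seedCol];
        boolean[][] inCluster = new boolean[rows][cols];
        ArrayDeque<int[]> pending = new ArrayDeque<>();
        inCluster[seedRow][seedCol] = true;
        pending.push(new int[] {seedRow, seedCol});
        while (!pending.isEmpty()) {
            int[] cell = pending.pop();
            for (int[] d : neighbours) {
                int r = cell[0] + d[0], c = cell[1] + d[1];
                if (r < 0 || r >= rows || c < 0 || c >= cols) continue; // outside the matrix
                if (inCluster[r][c]) continue;                         // already in the cluster
                if (density[r][c] >= threshold) {
                    inCluster[r][c] = true;                            // add, then examine its neighbours
                    pending.push(new int[] {r, c});
                }
            }
        }
        return inCluster;
    }

    /**
     * "Fixing" the selection: one pass over all events. elementOf[i] is the row-major index
     * (row * cols + col) of the display-matrix element that event i fell into.
     */
    static BitSet fix(boolean[][] inCluster, int[] elementOf) {
        int cols = inCluster[0].length;
        BitSet subset = new BitSet(elementOf.length);
        for (int i = 0; i < elementOf.length; i++) {
            int e = elementOf[i];
            if (inCluster[e / cols][e % cols]) subset.set(i);
        }
        return subset;
    }
}
```

The split mirrors the cost argument in the description. floodFill touches only display-matrix elements, so it can run on every mouse movement. fix makes the single pass over all N events only when the selection is confirmed.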
Abstract

The present invention relates to a system and process for analysis of flow cytometry data. The flow cytometry data comprises a set of one or more parameter values for each of many cytometry events, and each parameter has a value within a known range. A generator transforms the sets of parameter values into a display matrix comprising element values that represent event densities for each of a number of sub-ranges of each parameter's range. Each element in the matrix has co-ordinates which have a dimension for each parameter. Each element has co-ordinate values determined by the numbers of the respective sub-ranges of the corresponding parameters. In addition, the generator includes an accumulator that, for each event, increments the value for each element if and only if the values of all the parameters are in the corresponding sub-ranges. The generator also maintains an up-to-date count of the highest value accumulated in any of the elements, and uses this to normalise the data for display.

Description

"A SYSTEM AND METHOD FOR PROCESSING FLOW CYTOMETRY DATA"
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application claims priority from United States of America Provisional Patent Application No 61/071,619 filed on 8 May 2008, the content of which is incorporated herein by reference.
FIELD
The present invention relates to a system and process for analysis of flow cytometry data.
BACKGROUND
Flow cytometry is a technique for measuring multiple properties (e.g. fluorescence) of microscopic particles (e.g. biological cells), and modern flow cytometry systems generate increasingly large volumes of measured data representing multiple properties of thousands or millions of particles. A group, or set, of measurements for a particular particle is referred to in the art as an "event". Thus cytometry data typically represents a correspondingly large number of measurement "events".
Existing systems for displaying, analysing and manipulating flow cytometry data are typically inefficient and slow, and may be too slow for interactive and convenient processing, analysis, display and manipulation of large volumes of data.
In particular, existing tools for manipulating, and selecting subsets of, displayed flow cytometry data are typically slow and time-consuming to use, particularly when attempting to select a subset according to precise criteria.
It is desired to provide a system and process for analysis of flow cytometry data that alleviate one or more difficulties of the prior art, or at least provide a useful alternative.
SUMMARY
In accordance with the present invention, there is provided a system for analysis of flow cytometry data representing parameter values of cytometry events, the system including a generator for generating a display matrix of elements that represent event densities in the flow cytometry data, wherein the generator includes an accumulator that, for each event, increments a selected element's value, where the selected element has a position in the display matrix determined by a normalised value of at least one selected parameter.
The present invention also provides a system for analysis of flow cytometry data, including: a selector for selecting a subset of events in the flow cytometry data; and a linear array generator for generating a linear array that represents the subset, where each event is represented by one of two data representations.
Preferably the linear array is a bit string and the data representations are 0 and 1 bits.
The present invention also provides a system for analysis of flow cytometry data, including: a parser for generating a parse tree based on a Boolean expression relating to one or more operand bit arrays representing events in the flow cytometry data; a Boolean evaluator for generating an evaluation command for each node in the parse tree, including selecting the command to be a result-generation or an overwriting evaluation command, based on whether the node is respectively an intermediate or terminal node in the parse tree; and a bit string processor for respectively generating a result bit array, or overwriting an operand bit array, based on each evaluation command from the Boolean evaluator.
The present invention also provides a system for analysis of flow cytometry data representing parameter values of cytometry events, the system including: a user interface for generating a graphical representation of a flow cytometry display matrix of elements that represent event densities in the flow cytometry data; and a cluster selector for selecting a cluster of the elements based on a selected threshold density value and the event densities of the elements.
DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the present invention are hereinafter described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 is a schematic diagram of one or more cytometers and an analysis system for analysis of flow cytometry data;
Figure 2 is a block diagram of software modules of the analysis system;
Figure 3 is a flowchart of a display matrix generation process for analysis of the flow cytometry data by the analysis system;
Figure 4 is a flowchart of a display image generation process performed by the analysis system;
Figure 5 is a flowchart of a subset selection process for selection of a subset of the flow cytometry data by the analysis system;
Figure 6 is a flowchart of a dynamic statistics generation and display process performed by the analysis system;
Figure 7 is a screen shot of a graphical user interface of the analysis system showing an example one-dimensional display image, a selected subset and dynamic statistics;
Figure 8 is a screen shot of the graphical user interface showing an example two-dimensional display image;
Figure 9 is a screen shot showing a display image of a first subset of the flow cytometry data in Figure 8;
Figure 10 is a screen shot showing a display image of a second subset of the flow cytometry data in Figure 8;
Figure 11 is a screen shot showing a display image of a third subset of the flow cytometry data in Figure 8;
Figure 12 is a flowchart of a subset bit string generation process performed by the analysis system;
Figure 13 is a flowchart of a Boolean expression evaluation process performed by the analysis system;
Figure 14 is a screen shot showing a display image with a subset resulting from evaluation of a Boolean expression;
Figure 15 is a flowchart of a cluster generation process, for a cluster of elements in the display matrix, performed by the analysis system;
Figure 16 is a screen shot showing a display image of a two-dimensional display matrix;
Figure 17 is a screen shot showing a display image of a first cluster selected from the display matrix of Figure 16;
Figure 18 is a screen shot showing a display image of a second cluster selected from the display matrix of Figure 16;
Figure 19 is a screen shot of the graphical user interface showing a report, including a list of display matrices, statistics and a Boolean expression; and
Figure 20 is a screen shot of the user interface showing a display matrix with an x-axis logarithmic scale for positive and linear scale for negative values.
DETAILED DESCRIPTION
An analysis system 100, as shown in Figure 1, for the analysis of flow cytometry data, including processing, manipulation and display, is embodied in a computing device 102, which includes a processor 104, non-volatile memory storage 106 (e.g. hard disks) and Random Access Memory (RAM) 108. The analysis system 100 receives flow cytometry data from one or more flow cytometry systems or cytometers 110. The flow cytometry data is received directly from the flow cytometer 110 during operation, as data packets transmitted over a data network (e.g. a wired or wireless, open or closed network, including the Internet), or from a data storage device with prerecorded flow cytometry data (e.g. files stored in persistent memory, such as removable hard drives or flash memory). The computing device 102 is a personal computer (PC) with an Intel IA32 based processor 104 and a Microsoft Windows operating system. In alternative embodiments, however, the computing device 102 may be based on a different processor 104, e.g. from Advanced Micro Devices (AMD), and the operating system may be a Macintosh operating system or a Linux operating system. The computing device 102 may also be an application-specific computer, provided at least in part by one or more dedicated hardware components such as Application-Specific Integrated Circuits (ASICs), or Programmable Logic Devices (PLDs) such as Field-Programmable Gate Arrays (FPGAs). The functions of the computing device 102 may be distributed over a variety of locations, e.g. using client-server computing, or alternatively the computing device 102 may be a mobile telephone or a Personal Digital Assistant (PDA).
The analysis system 100 includes a plurality of software modules 200, running in a Java virtual machine 202 executed by an operating system (O/S) 204 operating on the computing device 102, as shown in Figure 2. Alternatively, the software modules 200 may be implemented using other computing languages such as C, C++, C#, or Ada, etc. The analysis modules 200 include:
1. an import/export module 206 for importing raw flow cytometry data and exporting processed data and analysis reports;
2. a display matrix generator 208 for generating a display matrix (also referred to as a 'display array'), which may be one-dimensional or multi-dimensional, including multiple array elements (or data 'cells'), based on the raw cytometry data;
3. an image generator 210 for generating images based on the display matrix;
4. a statistics generator 212 for generating statistics relating to the display matrix;
5. a graphical user interface (GUI) module 214 for displaying statistics and image data, and for receiving inputs/selections from a user;
6. a Boolean evaluator 216 for generating and evaluating Boolean expressions based on user selections;
7. a bit string processor 217 for performing bit-wise operations on bit arrays (or bit strings) representing events or aspects of events in the cytometry data;
8. a subset selector 218 for selecting subsets of events, e.g. based on selected parameter values for the events, or for events fitting certain clusters in the display matrix; and
9. a bit array generator 220 for generating a bit array, or bit string, based on the input flow cytometry data, a Boolean expression, and/or any selected clusters.
The analysis system 100 receives flow cytometry data from the cytometer(s) 110 and generates one or more graphical displays based on this data. The raw cytometer data represents measurements made on a large number of 'events' (e.g. one event per detected particle), e.g. from measuring a large number of microscopic particles in the cytometer(s) 110. For each event, a number of parameters are monitored in the cytometer(s) 110, such as the intensities or levels of light emitted by different fluorescent species (referred to as 'labels') attached to the microscopic particles, e.g. green ('FL1'), orange ('FL2'), red ('FL3') and infrared ('FL4'). This cytometry data may be in the form of a raw data array, with the measured values for each parameter listed for each event. It is provided as a binary data file with header information describing the file details, e.g. format, size, and total number of events N.
The cytometry data is displayed by the analysis system 100 in various display images. For example, a histogram of events for each of the measured parameters may be generated, such as shown in Figure 7, where the number of events (also referred to as the density of events) is plotted against the parameter "FSC-H". An alternate display is a type of two-dimensional (bivariate) plot referred to in the art as a density plot. In a density plot the X and Y axes relate to measured parameters, and the vertical axis, or a colour-based intensity indicator, represents the density of events, such as shown in Figure 8 for the parameters "FSC-H" (X axis) and "FSC-A" (Y axis). A user of the analysis system 100 uses the display images to analyse and manipulate the raw cytometer data. For example, for a mixed sample of particles in the cytometer(s) 110, a one- or multi-dimensional plot may display peaks in event density relating to different species, or types, of particles in the mixture. The number of events (and thus particles) related to each different peak allows a user to determine, for example, the relative proportions of particle types in the mixture.
The analysis system 100 generates the display matrix in a display matrix generation process 300, in Figure 3. The analysis system 100 loads raw event data from the cytometer(s) 110, including event parameter data for each event, using the import/export module 206 (step 302). The display matrix generator 208 receives the raw event data from the import/export module 206. For example, the raw event data is received in a binary data file, including a header that represents the format of the actual raw event data. The raw event data may represent a table of recorded events, e.g.
EVENTS     PARAMETER A   PARAMETER B   ...   PARAMETER Ω
event 1    value 1A      value 1B      ...   value 1Ω
event 2    value 2A      value 2B      ...   value 2Ω
event 3    value 3A      value 3B      ...   value 3Ω
...
event i    value iA      value iB      ...   value iΩ
...
event N    value NA      value NB      ...   value NΩ
Under certain circumstances, the display matrix generator 208 will only process data relating to a subset of the raw event data. In this case, the display matrix generator 208 loads a subset bit array, or bit string, defining the subset of events (step 304). Subset bit arrays and bit strings are described below with reference to Figure 5. From the loaded event data, one or more event parameter arrays are generated, with one array for each event parameter (step 306). Each parameter array is a linear (i.e. one-dimensional) array with a value stored for each measured event. For example, an array for the parameter "FSC-H" is of the general form $[x_1, x_2, x_3, \ldots, x_i, \ldots, x_N]$, where N is the total number of events and $x_i$, for $i = 1, \ldots, N$, are the values of this parameter for each event. To generate the display matrix, the display matrix generator 208 needs to know which parameters to display. It therefore receives selection data from the GUI module 214 representative of a selection of parameters to be displayed, e.g. a user selection (step 308). The selection of event parameters to be displayed in step 308 may also be automated, e.g. for generating pre-selected displays.
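The following is a minimal sketch of step 306. It assumes the raw events have already been decoded from the binary file into a row-per-event double[][]; header parsing is not shown. Under that assumption, the per-parameter linear arrays are simply the columns of the table. ParameterArrays and split are hypothetical names.

```java
/** Hypothetical sketch of step 306: split a row-per-event table into one linear array per parameter. */
final class ParameterArrays {
    /** events[i][p] holds parameter p of event i; the result holds columns[p][i]. */
    static double[][] split(double[][] events, int parameterCount) {
        double[][] columns = new double[parameterCount][events.length];
        for (int i = 0; i < events.length; i++) {
            for (int p = 0; p < parameterCount; p++) {
                columns[p][i] = events[i][p];
            }
        }
        return columns;
    }
}
```

With one array per parameter, the accumulation loop only needs to read the selected parameters, each in sequential order.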
The image size, data resolution (e.g. number of display cells or points in the plot) and type (e.g. one-dimensional or multi-dimensional, having linear or logarithmic axes) are also selected for the display image (step 310): e.g. a one-dimensional display may be a histogram with 1024 'cells', and a two-dimensional display may be a density plot with linear-linear axes and a display of 256 x 256 cells. The data resolution, i.e. number of 'cells' in the output image, is independent of the device resolution (e.g. resolution of the screen in pixels).
The display matrix can be one-dimensional, or multi-dimensional, and may be of any practical size.
Once the input event data has been received, and the form of the output display image has been selected, the display matrix generator 208 generates an event-to-image transformation, or normalisation, for each of the one or more selected event parameter arrays (step 312). The role of the transformation is to transform, or normalise, each raw parameter value for the selected parameter(s) into a corresponding display value appropriate to the selected display image. The transformation maps raw parameter values to display parameter values, and also applies any desired scaling, zooming and anti-aliasing operations. For example, for a one-dimensional histogram 701 (in Figure 7), the transformation scales the raw values of the parameter FSC-H (such as floating point values) to generate display values of the parameter (which are integers) based on the size and resolution selected for the display image. The floating point values are scaled to fit the display axes (the desired data resolution), then rounded to the nearest integer values. The transformation also accounts for translation of values, e.g. where some parameter values are less than or equal to zero for logarithmic-scale plots, as shown on the mixed linear-logarithmic axis of Figure 20.
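The event-to-image transformation of step 312 can be sketched as a function from a raw value to an integer cell index. This is an illustration under stated assumptions, not the described implementation:
- the axis range [min, max] is assumed known, e.g. from the file header or a user selection;
- values are rounded to the nearest cell, as the text describes;
- out-of-range values are clamped to the first or last cell;
- on the logarithmic axis, non-positive values are placed in cell 0 as a placeholder, rather than reproducing the mixed linear-logarithmic axis of Figure 20.

Class and method names are hypothetical.

```java
/** Hypothetical event-to-image transformation: raw parameter value -> integer cell index. */
final class AxisTransform {
    /** Linear axis: maps [min, max] onto cells 0..cells-1, rounding to the nearest cell. */
    static int linear(double value, double min, double max, int cells) {
        double scaled = (value - min) / (max - min) * (cells - 1);
        return clamp((int) Math.round(scaled), cells);
    }

    /** Log axis for 0 < min < max; non-positive values go to cell 0 (placeholder only). */
    static int log10(double value, double min, double max, int cells) {
        if (value <= 0) return 0;
        double lo = Math.log10(min), hi = Math.log10(max);
        double scaled = (Math.log10(value) - lo) / (hi - lo) * (cells - 1);
        return clamp((int) Math.round(scaled), cells);
    }

    /** Keeps out-of-range values inside the display matrix. */
    private static int clamp(int cell, int cells) {
        return Math.max(0, Math.min(cells - 1, cell));
    }
}
```

For example, AxisTransform.linear(value, 0, 1023, 1024) maps a raw value onto one of 1024 histogram cells.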
The values of the display matrix are generated, or populated, by an accumulator 209 in the display matrix generator 208 in an iterative process that is repeated for each event in the event parameter array(s), i.e. all events in the raw event data (or in any selected subset). In the first iteration, the image coordinates of the first event are generated as integers using the event-to-image transformation and the first value in each selected parameter array (step 314): for example, for a one-dimensional histogram with 256 histogram 'cells', the integer image coordinate of the first event is the scaled value, converted to an integer, indicative of where that event lies in the histogram. The integer image coordinate, or coordinates for a multi-dimensional display, determines into which element of the display matrix the event falls. Using the integer image coordinates, the display matrix generator 208 updates an initially empty display matrix (initialised in the import/export module 206) by incrementing by a single value (in this case, 1) the value in the array element corresponding to the image coordinates (step 316). The value of one element in a display matrix is incremented for each event being processed in the loop, and thus an up-to-date event density is indicated by the updated display matrix. The display matrix accumulates its values during iteration of step 316. The display matrix generator 208 determines if the density, or accumulated value, of the newly updated array element is greater than a "max-density" value, indicative of the maximum density of events in the display matrix at any one element (e.g. the maximum peak in a histogram): if the density of the updated array element is greater, the value of max-density is increased or incremented by the single value, e.g. 1 (step 318). The display matrix update loop of steps 314, 316 and 318 is repeated for each event in the raw event data, or the subset of event data if relevant (step 320). 
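The accumulation loop of steps 314 to 320 can be sketched as follows for a two-dimensional matrix, including the optional subset filter loaded in step 304. The sketch assumes the integer image coordinates have already been computed for every event and lie inside the matrix. It represents the subset bit string as a java.util.BitSet; the class name and method signature are hypothetical. Each event increments exactly one element by 1, so the running maximum needs only one comparison per event, and max-density is available as soon as the loop finishes.

```java
import java.util.BitSet;

/** Hypothetical sketch of the display-matrix accumulator (steps 304 and 314 to 320), 2D case. */
final class DisplayMatrixAccumulator {
    int maxDensity = 0; // running "max-density" over all elements

    /**
     * xCell[i] and yCell[i] are event i's integer image coordinates.
     * subset may be null (all events); otherwise only events whose bit is set are counted.
     */
    int[][] accumulate(int[] xCell, int[] yCell, int cols, int rows, BitSet subset) {
        int[][] matrix = new int[rows][cols]; // initially empty display matrix
        maxDensity = 0;
        for (int i = 0; i < xCell.length; i++) {
            if (subset != null && !subset.get(i)) continue; // excluded by the subset bit string
            int value = ++matrix[yCell[i]][xCell[i]];       // increment by a single value, 1
            if (value > maxDensity) maxDensity = value;     // grows by at most 1 per event
        }
        return matrix;
    }
}
```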
Once the display matrix is generated, it is stored by the display matrix generator 208 (step 322).
A display image is generated from the display matrix in a display image generation process 400, in Figure 4, performed by the image generator 210. The image generator 210 accesses the display matrix, as generated by the display matrix generator 208 (step 402), and loops through the display matrix to generate an image display matrix (steps 404 and 406). For each element in the display matrix, the image generator 210 determines whether the element has a non-zero value (i.e. represents one or more events). If so, a value for the image display matrix is generated representing a height (e.g. for a one-dimensional histogram as in Figure 7), or a colour (for a multi-dimensional plot, e.g. as shown in Figure 8), based on the value of that element, scaled by the max-density value (step 404). The max-density value provides an upper limit on the density to be displayed. This allows the values in the display matrix to be mapped to heights, or colours in a colour map, scaled between zero (no events) and the maximum (max-density). The colour or height selection step 404 is repeated for each element in the display matrix (step 406), e.g. for all 65,536 elements in a 256x256 display matrix. The image generator 210 then renders all the elements in the image display matrix in one step (step 408) to generate a display image for the GUI module 214 for viewing by a user, e.g. plots such as in Figures 7 and 8.
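Step 404 can be sketched as a linear mapping of each element's density onto a colour-map index, with max-density as the top of the scale. The following are assumptions for illustration:
- the number of colour levels;
- the use of -1 to mark empty (background) elements;
- giving every non-empty element at least index 1, so that sparse elements stay distinguishable from empty ones.

ImageMapper is a hypothetical name.

```java
/** Hypothetical sketch of step 404: map element densities to colour-map indices. */
final class ImageMapper {
    /** Returns -1 for empty elements and 1..levels-1 for elements holding events. */
    static int[][] toColourIndices(int[][] matrix, int maxDensity, int levels) {
        int[][] image = new int[matrix.length][matrix[0].length];
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                int v = matrix[r][c];
                if (v == 0) {
                    image[r][c] = -1; // no events: left as background
                    continue;
                }
                // Linear scaling: a density equal to maxDensity maps to the top colour, levels - 1.
                int index = (int) Math.round((double) v / maxDensity * (levels - 1));
                image[r][c] = Math.max(1, index);
            }
        }
        return image;
    }
}
```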
The GUI module 214 generates a graphical user interface which allows the user to view and manipulate the event data. A display, in Figure 7, shows the histogram 701 of events plotted by the parameter "FSC-H". The GUI module 214 also generates user interface controls including subset selection controls 712, chart type selection controls 714 (e.g. histogram, two-dimensional, multi-dimensional, etc.), a clusterer control 716, cloning controls 718 (e.g. for cloning a gate or a window), file controls 720 (e.g. print, save, delete), and a Boolean function control 722.
The analysis system 100 allows for selection of subsets of the events in the raw events data, e.g. by user interaction. An example subset 702, in Figure 7, includes the events whose values for the parameter "FSC-H" lie on or between the values represented in the histogram by lines 704A and 704B. Another subset in Figure 7 is subset 706, representing events with "FSC-H" parameter values lying between the values at lines 708A and 708B. For a density plot display such as that shown in Figure 8, a subset may be selected by drawing two-dimensional shapes, e.g. polygon 802, 804 or 806. Each shape defines a corresponding subset containing the events represented by the points it encloses on the graph in Figure 8.
In a subset selection process 500, in Figure 5, the subset selector 218 selects events represented by part of the display matrix, which has been generated by the display matrix generator 208. The subset is selected (step 502) using one or more of the following methods:
1. defining a range for each of the selected event parameters in the display matrix (e.g. a range 702 or 706 in Figure 7), or by defining values numerically;
2. graphically selecting an area or volume in the displayed image, such as graphically positioned lines 704A and 704B, or the graphically created polygons 802, 804 and 806; and
3. using a contour-based clusterer, as described below with reference to Figures 15, 16, 17 and 18, which automatically selects the events in the subset based on values of elements in the display matrix.
In the selection process 500, the analysis system 100 performs a dynamic statistics generation and display process 600 (step 504) to dynamically generate statistics relating to the selected subset. For example, the number of events in the selected subset "M1" 706 in Figure 7 is 17.3% of the total number of events N, which is displayed as a number 710 in Figure 7. Similarly, the number of events in subset "P2" as a percentage of total events N in the raw event data is 67.2%, which is displayed as a number 810 in Figure 8.
The dynamic statistics generation and display process 600, in Figure 6, is performed by the statistics generator 212. To determine the statistics related to a selected subset, the statistics generator 212 initialises a counter in the form of a subset density counter to zero (step 602). The statistics process 600 then loops through each element in the display matrix (e.g. 65,536 elements for a 2D 256x256 display matrix). With each iteration of the loop, it increments the subset density counter if a bit corresponding to that element in a subset bit string is set (steps 604 and 606). (Subset bit string generation is described below with reference to Figure 5.) The statistics generator 212 generates statistics (step 608) based on two values: the total subset density, generated in the loop of steps 604 and 606, and the total number of events N in the raw event data, as determined by the import/export module 206. The statistics generator 212 displays the statistics (e.g. as numbers 710, or 810) using the GUI module 214 (step 610).
The subset selection process 500 generates a linear subset array representing events within the selected subset (step 506). The linear array is in the form of a bit array, or preferably a bit string, i.e. a one-dimensional array with a length in bits corresponding to the number of events N in the raw event data. Each bit in the subset bit array represents a corresponding event in the raw event data, e.g. a binary string of size/length N bits. The bit string defines a gate that indicates for each event whether it is included in the subset or not, e.g. a binary "1" indicates that the event is included in the subset, and a binary "0" indicates that the event is not included. The generated subset bit strings for a given set of raw event data have the same size, N, as the raw event data itself.
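The following is a minimal sketch of the statistics loop of steps 602 to 608. The description says the counter is incremented for each element whose bit is set. This sketch assumes it is incremented by that element's density, so that the total equals the number of events in the selected elements. It represents the per-element selection as a java.util.BitSet indexed in row-major order. SubsetStatistics and its methods are hypothetical names.

```java
import java.util.BitSet;

/** Hypothetical sketch of the dynamic statistics of process 600. */
final class SubsetStatistics {
    /** Sums the densities of display-matrix elements whose row-major bit is set. */
    static long subsetDensity(int[][] matrix, BitSet selectedElements) {
        int cols = matrix[0].length;
        long count = 0; // subset density counter, initialised to zero (step 602)
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < cols; c++) {
                if (selectedElements.get(r * cols + c)) count += matrix[r][c];
            }
        }
        return count;
    }

    /** Percentage of the total number of events N that falls in the subset. */
    static double percentOfTotal(long subsetCount, long totalEvents) {
        return 100.0 * subsetCount / totalEvents;
    }
}
```

Using the summary-report figures quoted earlier, 173,293 events out of N = 1,000,000 gives percentOfTotal(173293, 1000000) of about 17.33. This is displayed as 17.3%.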
Once a subset has been selected in the subset selection process 500, that subset can then be displayed in a separate image display by generating a new display matrix for that subset. In the display matrix generation process 300, a subset, e.g. "P2" 802 in Figure 8, is represented by a subset bit string loaded in step 304. When the integer image coordinates are generated for each event in step 314, only events in the selected subset are included. As the subset is represented by a subset bit string, the integer image coordinates to be included in the subset display matrix are selected based on the bits in that bit string. In the update display matrix loop of steps 314, 316, 318 and 320, the value of the bit corresponding to the event in the loop is used to include or exclude that event. For example, events not represented in the subset have 0 as a binary value at a position in the bit string corresponding to their position in the event parameter arrays. Each event parameter array and each subset bit string is of length N. Generation of subset display matrices, and their display by the image generator 210, is illustrated by the displayed subsets 902 in Figure 9, 1002 in Figure 10 and 1102 in Figure 11. These subsets 902, 1002 and 1102 correspond to the subsets selected by polygons 802, 804 and 806 in Figure 8, respectively. The displayed subset images show that the range of density values used to map the colours of the pixels has changed. This is because the range of density values for the subset display matrix typically differs from that of the display matrix for the whole set of raw event data.
A subset display matrix, as generated by the display matrix generation process 300, need not use the same selected parameters as were used to generate that subset. For example, a subset of events may be selected using the two parameters "FSC-H" and "FSC-A", as in Figure 8 with polygon 802; however, the subset selection process 500 generates a subset bit string based on the underlying events in the subset, regardless of which parameters were used to select it. The display matrix generation process 300 operates on all events and filters them using the subset bit string, so the parameters selected for process 300 when a subset bit string is applied may differ from those used to select that bit string. In this way, a subset of events can be selected based on a first group of parameters, e.g. green and red fluorescence, and then displayed using different parameters, e.g. orange and infrared, which allows a user to further distinguish the events in the subset into multiple subsidiary subsets.
In the subset bit string generation process 1200, shown in Figure 12, the bit string generator 220 loads a pre-initialised main bit array, e.g. a one-dimensional bit string of length N with every bit initialised to the same binary value, together with the raw event data (step 1202). The bit string generator 220 receives subset parameter values from the subset selector 218 as part of the subset selection process 500 (step 1204): these are the selected values of the relevant parameters that define a subset, e.g. the values of the parameters "FSC-H" and "FSC-A" inside the polygon 802 in Figure 8. The bit string generator 220 then loops through each event in the raw event data (steps 1206 and 1208), determining whether the event's relevant parameter values, i.e. those relevant to the display matrix used to select the subset, lie within the subset parameter values (step 1206). The bit corresponding to the event is then set to reflect whether or not the event is in the subset; for example, the bit is set to binary 1 if the event is in the subset, and is otherwise left at binary 0. The determination in step 1206 is performed for each event in the main bit string and/or event data, i.e. N times (step 1208).
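Process 1200 can be sketched for the case where the subset parameter values are a polygon gate on two parameters. The specification does not say how "lie within" is tested, so the even-odd (ray casting) point-in-polygon rule used here is an assumption. The function names are hypothetical, and the bit string is again a Python integer with bit i for event i.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon [(x0, y0), (x1, y1), ...]?"""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does a horizontal ray to the right of (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_bit_string(x_values, y_values, polygon):
    """Return an int whose bit i is 1 when event i lies inside the polygon gate."""
    n = len(x_values)
    digits = bytearray(b"0" * n)  # pre-initialised main bit string: all N bits are 0
    for i in range(n):  # one determination per event, i.e. N times
        if point_in_polygon(x_values[i], y_values[i], polygon):
            digits[n - 1 - i] = ord("1")  # bit i is the (n-1-i)-th digit from the left
    return int(digits, 2) if n else 0
```

Building the digits first and converting once avoids repeatedly reallocating a large integer inside the loop.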
Once one or more subsets have been defined, they can be evaluated individually, or in combination with each other, using a Boolean expression evaluation process 1300. The Boolean evaluation process 1300 is performed by the Boolean evaluator 216 in accordance with control selections made by the user of the GUI module 214. It uses the existing bit arrays or bit strings representing the one or more subsets to generate a result from the expression. The user may build an expression with the Boolean control 722, which allows an expression to be composed from operands (referring to the one or more subsets) and Boolean operators. Each bit array or bit string used in the evaluation covers all events, i.e. it has one bit per event. The Boolean expression is evaluated using only binary operations applied directly to the bit strings, which allows the result to be generated quickly and efficiently.
In the Boolean expression evaluation process 1300, a Boolean expression is selected, including Boolean operators and bit array (or bit string) operands (step 1302). The Boolean expression is parsed to generate a parse tree (step 1304), for example using a parse routine in Java or another computer language. Each operator and its bit string(s) are evaluated using the bit string processor 217 (step 1306). The bit string processor 217 is a module or device for performing bit-wise operations on bit arrays or strings. All operations performed by the bit string processor 217 take bit arrays or strings as their operands and produce results of type bit string or of type integer. While the operations are defined in terms of bit-wise Boolean logic, the implementation uses the widest integer operation available on the execution platform. The bit string processor 217 is implemented either in software (e.g. using the Java programming language) or in hardware (e.g. using an FPGA). The operations are performed in a step-by-step traversal of the generated parse tree (step 1308), i.e. the loop of steps 1306 and 1308 is repeated for all subexpressions in the parse tree. A resultant evaluation bit string, representing the subset defined by the Boolean expression, is generated at the end of the Boolean process 1300.
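Steps 1302 to 1308 can be sketched as parsing an expression such as `(R1 OR R2) AND NOT M1` into a parse tree and then evaluating it by a post-order traversal. This is a minimal recursive-descent sketch under stated assumptions: operator precedence NOT > AND > OR/XOR, subset bit strings held as Python integers (bit i for event i), and hypothetical function names. It evaluates the tree directly rather than reproducing the "op"/"op=" command program described in the following paragraph.

```python
import re

# Hypothetical tokenizer: parentheses and names; AND/OR/XOR/NOT are reserved words.
_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")
_KEYWORDS = {"AND", "OR", "XOR", "NOT"}


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive-descent parser producing nested tuples; assumed precedence NOT > AND > OR/XOR."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expression():
        node = term()
        while peek() in ("OR", "XOR"):
            node = (take(), node, term())
        return node

    def term():
        node = factor()
        while peek() == "AND":
            take()
            node = ("AND", node, factor())
        return node

    def factor():
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        take()
        if token == "NOT":
            return ("NOT", factor())
        if token == "(":
            node = expression()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        if token in _KEYWORDS or token == ")":
            raise ValueError(f"unexpected token {token!r}")
        return ("SUBSET", token)

    tree = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(node, subsets, n_events):
    """Post-order traversal of the parse tree using whole-bit-string operations."""
    kind = node[0]
    if kind == "SUBSET":
        return subsets[node[1]]
    if kind == "NOT":
        mask = (1 << n_events) - 1  # keep only the N event bits
        return ~evaluate(node[1], subsets, n_events) & mask
    left = evaluate(node[1], subsets, n_events)
    right = evaluate(node[2], subsets, n_events)
    if kind == "AND":
        return left & right
    if kind == "OR":
        return left | right
    return left ^ right  # XOR
```

Python integers apply `&`, `|` and `^` across their internal machine-word digits, which loosely mirrors the use of the widest available integer operation.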
The evaluation process 1300 generates command data representing a form of machine-language "program", or list of evaluation commands, for the bit string processor 217. An evaluation command for the bit string processor 217 is generated for each node in the parse tree, so for a parse tree with a plurality of nodes, the corresponding plurality of evaluation commands makes up the generated "program". The evaluation command corresponding to each node is selected by the Boolean evaluator 216 based on the position of the node in the parse tree. For an intermediate node, the Boolean evaluator 216 selects a result-generation operation, known as an "op" type, which causes the bit string processor 217 to generate data for a new resultant bit string based on the Boolean operator and the one or more operand bit strings of that node. For a terminal node, the Boolean evaluator 216 selects an overwriting operation, known as an "op=" type, which causes the bit string processor 217 to overwrite one of the operand bit strings of that node based on the Boolean operator and the operand bit string(s). As a result, the evaluation process 1300 is performed with reduced memory allocation/deallocation overheads and at a relatively high speed compared to at least some existing methods of Boolean expression evaluation. The "op" type operations performed by the bit string processor 217 include:
(1) "bit_string_result = operand1 AND operand2";
(2) "bit_string_result = operand1 OR operand2"; and
(3) "bit_string_result = operand1 XOR operand2".
Each of these result-generation operations returns a newly generated resultant bit string, each bit of which is the result of a logical AND, OR or XOR (respectively) between the bits in the same position in the operand bit strings "operand1" and "operand2".
A further "op" type operation performed by the bit string processor 217 is:
(4) "bit_string_result = NOT operand1".
This result-generation operation returns a newly generated resultant bit string, each bit of which is the logical opposite of the bit in the same position in the bit string "operand1".
The "op=" type operations performed by the bit string processor 217 include:
(1) "operand1 AND= operand2";
(2) "operand1 OR= operand2"; and
(3) "operand1 XOR= operand2".
These overwriting operations overwrite each bit in "operand1" with the logical AND, OR or XOR (respectively) of that bit and the bit in the same position in "operand2".
A further "op=" type operation performed by the bit string processor 217 is:
(4) "operand1 NOT=".
This overwriting operation overwrites each bit in "operand1" with its logical opposite.
The bit string processor 217 also performs a population count operation that generates data representing the total number of bits of a certain value in an input bit string (e.g. the total number of binary 1s in a bit string). The population count operation is:
"integer_result = POP_COUNT operand1". This operation returns the number of bits set to 1 in "operand1". The bit string processor 217 also performs permute operations on bit strings of potentially different lengths, e.g. to expand, condense or re-order their individual bits.
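The operation set listed above can be sketched as a small bit string class. It stores each bit string as an array of 64-bit words, so every operation works one word at a time, in the spirit of using the widest integer operation available. This is an illustrative assumption rather than the processor's actual design: the class name `BitString`, the 64-bit word size and the helper methods are hypothetical. Python's `&`, `|`, `^` and `~` operators stand in for the "op" type, and the in-place `&=`, `|=` and `^=` operators (plus `invert_in_place`) stand in for the "op=" type. The permute operations are not shown.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


class BitString:
    """Fixed-length bit string stored as 64-bit words; bit i lives in word i // 64."""

    def __init__(self, n_bits, words=None):
        self.n_bits = n_bits
        n_words = (n_bits + WORD_BITS - 1) // WORD_BITS
        self.words = list(words) if words is not None else [0] * n_words

    @classmethod
    def from_indices(cls, n_bits, indices):
        bs = cls(n_bits)
        for i in indices:
            bs.words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
        return bs

    def _check(self, other):
        if self.n_bits != other.n_bits:
            raise ValueError("operand bit strings must have the same length")

    def _clear_padding(self):
        # Keep the unused high bits of the last word at 0 so POP_COUNT stays exact.
        rem = self.n_bits % WORD_BITS
        if self.words and rem:
            self.words[-1] &= (1 << rem) - 1

    # "op" type: each returns a newly allocated result bit string.
    def __and__(self, other):
        self._check(other)
        return BitString(self.n_bits, [a & b for a, b in zip(self.words, other.words)])

    def __or__(self, other):
        self._check(other)
        return BitString(self.n_bits, [a | b for a, b in zip(self.words, other.words)])

    def __xor__(self, other):
        self._check(other)
        return BitString(self.n_bits, [a ^ b for a, b in zip(self.words, other.words)])

    def __invert__(self):
        result = BitString(self.n_bits, [~a & WORD_MASK for a in self.words])
        result._clear_padding()
        return result

    # "op=" type: each overwrites self (operand1) in place without allocating a result.
    def __iand__(self, other):
        self._check(other)
        for k, b in enumerate(other.words):
            self.words[k] &= b
        return self

    def __ior__(self, other):
        self._check(other)
        for k, b in enumerate(other.words):
            self.words[k] |= b
        return self

    def __ixor__(self, other):
        self._check(other)
        for k, b in enumerate(other.words):
            self.words[k] ^= b
        return self

    def invert_in_place(self):
        for k, a in enumerate(self.words):
            self.words[k] = ~a & WORD_MASK
        self._clear_padding()
        return self

    def pop_count(self):
        # POP_COUNT: number of bits set to 1.
        return sum(bin(w).count("1") for w in self.words)
```

In use, `a & b` allocates a new bit string, while `a &= b` reuses the storage of `a`. That reuse is the allocation saving that motivates the "op=" commands.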
An example Boolean result subset, represented by an area 1502 in Figure 14, may be generated from the union of two subsets "R1" (1504) and "R2" (1506). The resulting subset indicated by the area 1502 can be displayed by a new run of the display matrix generation process 300 using alternative parameters, and/or used in further Boolean manipulations based on its corresponding subset bit string. A further example result subset, represented by the area 1503 in Figure 14, is generated from the union of two subsets "M1" (702) and "M2" (706) in Figure 7. This illustrates that subsets, including result subsets, can be used to generate a display matrix with parameters (e.g. "FSC-H" and "FSC-A" in Figure 14) that differ from the parameter(s) of the display matrix used to select the subset(s) (e.g. only "FSC-H" in Figure 7).
In the subset selection process 500, it is also possible to select a subset of the display matrix using a contour-based or automatic "flood-fill" style cluster generation process 1600, shown in Figure 15.
In the cluster generation process 1600, a portion, or cluster, of the elements of the display matrix is selected by the user based on a selected threshold density value (e.g. by selecting the density value of one of the elements). The subset selector 218 first receives or accesses a display matrix that has been generated for selected parameters (step 1602), e.g. the display matrix represented by the display image of Figure 16. In conjunction with the GUI module 214, the user selects a threshold density value (step 1604). This value is an integer, between zero and max-density, representing a number of events in an element of the display matrix. The user may select the threshold density value numerically, by specifying a number, or graphically, by identifying a pixel in the display image that corresponds to an element of the display matrix, thereby setting the threshold to the density value of that element. For example, the threshold value may be selected by hovering a mouse cursor over one of the circled pixels 1702, 1704 and 1706 in Figure 16. Once a threshold density value has been selected in step 1604, the cluster generation process 1600 loops through neighbouring elements in the display matrix, comparing the event density value of each with the threshold density value to determine which elements belong to the cluster (steps 1606 and 1608). Neighbouring elements are those adjacent to each other in the display matrix.
Starting at the pixel location indicated, or hovered over, by the mouse cursor, the "flood-fill" process first determines the threshold density from the element of the display matrix corresponding to that location. The process then adds that location to the cluster. The addition of any new location to the cluster triggers the examination of its neighbours: up, down, left and right. The examination first checks whether or not the neighbour in question is already part of the cluster; only if it is not is its density compared to the threshold. If its density is greater than or equal to the threshold, the neighbour is added to the cluster, in turn triggering the examination of its own neighbours (up, down, left and right). The process is optimised to minimise the number of pixel examinations. Examination extends outward from the starting location, "flowing" around "holes" in the cluster as a flood flows around obstacles. All elements selected into the cluster are contiguous, i.e. they adjoin one another, which ensures that the cluster defines a single contiguous region as displayed. The analysis of whether each element is included in the cluster is repeated for all elements neighbouring the selected element (e.g. the element corresponding to the selected pixel) (step 1608). Requiring a contiguous cluster gives the selection its "flood-fill" effect.
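The four-neighbour flood-fill described above can be sketched with an explicit stack instead of recursion. This is a minimal sketch: the display matrix is assumed to be a list of rows of integer densities, and the function name `flood_fill_cluster` is hypothetical. As in the text, a neighbour is first checked against the existing cluster, and its density is compared only if it is not already part of the cluster.

```python
def flood_fill_cluster(matrix, start_row, start_col):
    """Return the set of (row, col) elements in the contiguous cluster grown from the start element."""
    rows, cols = len(matrix), len(matrix[0])
    threshold = matrix[start_row][start_col]  # threshold = density at the hovered element
    cluster = {(start_row, start_col)}
    stack = [(start_row, start_col)]  # elements whose neighbours still need examining
    while stack:
        r, c = stack.pop()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in cluster:  # already in the cluster: skip the density comparison
                continue
            if matrix[nr][nc] >= threshold:
                cluster.add((nr, nc))
                stack.append((nr, nc))  # its neighbours are examined in turn
    return cluster
```

The cost depends only on the size of the display matrix, not on the number of events, which suits the interactive hover-to-highlight use described later.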
The "flood-fill" selection process of steps 1606 and 1608 may optionally be repeated for all local maxima (step 1609), i.e. all elements in the entire display matrix whose density values satisfy the threshold criterion, e.g. are greater than or equal to the threshold density. In this way, multiple non-contiguous regions can be selected into a single cluster.
The cluster is displayed by graphically indicating the corresponding pixels of the cluster on the display image (step 1610). For example, when the element corresponding to pixel 1704 is selected by hovering the mouse cursor over that pixel, all pixels with element density values greater than or equal to that of pixel 1704 are highlighted in a cluster display 1802 in Figure 17. The element corresponding to pixel 1704 is itself included in the cluster, shown as pixel 1804. In another example, if the element corresponding to pixel 1706 is selected, e.g. by hovering the cursor over it, a smaller cluster display 1902 is generated, as shown in Figure 18. The smaller cluster display 1902 excludes elements with a density less than the threshold value, e.g. the element shown as pixel 1904, which corresponds to pixels 1704 and 1804, but includes the selected element corresponding to pixels 1706 and 1906.
In certain embodiments, the "flood-fill" selection process of steps 1606 and 1608 is performed with variations in which locations are considered "neighbours". In one variation, the four diagonally adjacent locations are also included in the examination. In another variation, the examination is limited to the diagonally adjacent locations only. In other variations, the "in or out" evaluation rule is replaced with a selection rule based on a dynamically adjusting threshold value, e.g. one that takes into account the distance from the starting location.
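The neighbourhood variations can be sketched by passing the set of neighbour offsets as a parameter. The constant and function names are hypothetical, and the dynamically adjusting threshold variant is not shown. The example matrix has a diagonal ridge, which the four-neighbour rule cannot follow but the eight-neighbour and diagonal-only rules can.

```python
FOUR_NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # up, down, left, right
DIAGONAL_NEIGHBOURS = ((-1, -1), (-1, 1), (1, -1), (1, 1))
EIGHT_NEIGHBOURS = FOUR_NEIGHBOURS + DIAGONAL_NEIGHBOURS


def flood_fill(matrix, start, neighbours=FOUR_NEIGHBOURS):
    """Grow a cluster from `start` = (row, col) using the given neighbour offsets."""
    rows, cols = len(matrix), len(matrix[0])
    threshold = matrix[start[0]][start[1]]  # fixed threshold taken from the start element
    cluster = {start}
    stack = [start]
    while stack:
        r, c = stack.pop()
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in cluster:
                if matrix[nr][nc] >= threshold:
                    cluster.add((nr, nc))
                    stack.append((nr, nc))
    return cluster


diagonal_ridge = [
    [5, 0, 0],
    [0, 5, 0],
    [0, 0, 5],
]
four = flood_fill(diagonal_ridge, (0, 0))                         # only the start element
eight = flood_fill(diagonal_ridge, (0, 0), EIGHT_NEIGHBOURS)      # whole ridge
diag = flood_fill(diagonal_ridge, (0, 0), DIAGONAL_NEIGHBOURS)    # whole ridge
```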
The cluster is used to generate a new subset including all events represented by the selected elements in the cluster. The cluster subset is selected by clicking the mouse button once the cluster display has been highlighted (step 1612). A subset bit string generated by the cluster generation process 1600 may be used to generate a subsequent display matrix, for any selected parameters, in the display matrix generation process 300; i.e. a cluster subset can be used for analysis, in Boolean expressions, for display, etc., like any other subset.
While the mouse is being moved to select elements in the display matrix by hovering, the flood-fill operates on the display matrix only. When the mouse button is clicked to select the cluster subset (known as "fixing" the cluster selection), all events contributing to the densities of the highlighted pixels in the display matrix are selected. In this way, the computational complexity of the interactive "flood-fill" highlighting process is bounded by the dimensions of the display matrix rather than by the number of events. The potentially very expensive selection of the set of events in the cluster subset is performed only when a user of the GUI module 214 confirms their selection of the highlighted cluster by pressing the mouse button.
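"Fixing" the cluster, the one step that touches every event, can be sketched as a single pass over the events that sets each event's bit when its display-matrix element is in the cluster. This assumes the (row, col) element of each event has already been computed with the same normalisation used to build the display matrix. The function name `fix_cluster` is hypothetical, and the bit string is a Python integer with bit i for event i.

```python
def fix_cluster(event_rows, event_cols, cluster):
    """Return a subset bit string (int) with bit i set when event i falls in a cluster element."""
    n = len(event_rows)
    digits = bytearray(b"0" * n)  # all events start outside the cluster subset
    for i in range(n):
        if (event_rows[i], event_cols[i]) in cluster:
            digits[n - 1 - i] = ord("1")  # bit i is the (n-1-i)-th digit from the left
    return int(digits, 2) if n else 0
```

The resulting integer can then be used like any other subset bit string: for display, in Boolean expressions, or for counting.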
The GUI module 214 generates summary reports, e.g. as shown in Figure 19, that list the number of events 2002 in the various subsets of events 2004. For example, there are 1 million events in the total data set "F1", while only 173,293, or 17.3%, are in the set "M1" 2006, also shown as the "M1" group 704 in Figure 7. The report data includes selected subsets and subsets generated by Boolean expressions, e.g. "G1" 2008 generated by the Boolean expression 2010, corresponding to the shaded areas 1502 and 1506 in Figure 14.
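The counts in such a report can be sketched as a population count of each subset bit string, divided by the total number of events. The function name is hypothetical, and bit strings are Python integers with bit i for event i. With the figures quoted above, 173,293 of 1,000,000 events gives 17.3293%, which rounds to the 17.3% shown.

```python
def summary_report(subsets, n_events):
    """Return one (name, count, percent) row per subset bit string."""
    rows = []
    for name, bits in subsets.items():
        count = bin(bits).count("1")  # POP_COUNT of the subset bit string
        percent = 100.0 * count / n_events if n_events else 0.0
        rows.append((name, count, round(percent, 1)))
    return rows
```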
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

Claims

CLAIMS:
1. A system for analysis of flow cytometry data representing parameter values of cytometry events, the system including a generator for generating a display matrix of elements that represent event densities in the flow cytometry data, wherein the generator includes an accumulator that, for each event, increments a selected element's value, where the selected element has a position in the display matrix determined by a normalised value of at least one selected parameter.
2. A system for analysis of flow cytometry data, including: a selector for selecting a subset of events in the flow cytometry data; and a linear array generator for generating a linear array that represents the subset, where each event is represented by a corresponding one of two data representations.
3. A system as claimed in claim 2, wherein the linear array is a bit string and the data representations are 0 and 1 bits.
4. A system for analysis of flow cytometry data, including: a parser for generating a parse tree based on a Boolean expression relating to one or more operand bit arrays representing events in the flow cytometry data; a Boolean evaluator for generating an evaluation command for each node in the parse tree, including selecting the command to be a result-generation or an overwriting evaluation command, based on whether the node is respectively an intermediate or terminal node in the parse tree; and a bit string processor for respectively generating a result bit array, or overwriting an operand bit array, based on each evaluation command from the Boolean evaluator.
5. A system for analysis of flow cytometry data representing parameter values of cytometry events, the system including: a user interface for generating a graphical representation of a flow cytometry display matrix of elements that represent event densities in the flow cytometry data; and a cluster selector for selecting a cluster of the elements based on a selected threshold density value and the event densities of the elements.
6. A system as claimed in claim 5, wherein the threshold density value is selected by selecting one of the elements.
7. A system as claimed in any one of claims 5 to 6, wherein the user interface displays the selected cluster by highlighting a corresponding portion of the graphical representation of the display matrix.
8. A system as claimed in any one of preceding claims 5 to 7, including a subset selector, for selecting a subset of the events, based on the selected cluster.
9. A method for analysis of flow cytometry data representing parameter values of cytometry events, the system including a generator for generating a display matrix of elements that represent event densities in the flow cytometry data, wherein the generator includes an accumulator that, for each event, increments a selected element's value, where the selected element has a position in the display matrix determined by a normalised value of at least one selected parameter.
10. A method for analysis of flow cytometry data, including: a selector for selecting a subset of events in the flow cytometry data; and a bit array generator for generating a bit array that represents the subset, where each event is represented by a corresponding data bit.
11. A method as claimed in claim 10, wherein the bit array is a bit string.
12. A method for analysis of flow cytometry data, including: a parser for generating a parse tree based on a Boolean expression relating to one or more operand bit arrays representing events in the flow cytometry data; a Boolean evaluator for generating an evaluation command for each node in the parse tree, including selecting the command to be a result-generation or an overwriting evaluation command, based on whether the node is respectively an intermediate or terminal node in the parse tree; and a bit string processor for respectively generating a result bit array, or overwriting an operand bit array, based on each evaluation command from the Boolean evaluator.
13. A method for analysis of flow cytometry data representing parameter values of cytometry events, the system including: a user interface for generating a graphical representation of a flow cytometry display matrix of elements that represent event densities in the flow cytometry data; and a cluster selector for selecting a cluster of the elements based on a selected threshold density value and the event densities of the elements.
14. A method as claimed in claim 13, wherein the threshold density value is selected by selecting one of the elements.
15. A method as claimed in any one of claims 13 to 14, wherein the user interface displays the selected cluster by highlighting a corresponding portion of the graphical representation of the display matrix.
16. A method as claimed in any one of preceding claims 13 to 15, including a subset selector, for selecting a subset of the events, based on the selected cluster.
17. A flow cytometry data analysis process, including: accessing flow cytometry data representing values of one or more parameters measured by flow cytometry; processing the flow cytometry data to generate display data representing a density plot of values of a selected one of said parameters; receiving user selection data representing a user selection of a first region of said density plot; processing the user selection data to determine a value of the selected parameter corresponding to the selected region; and selecting one or more portions of the flow cytometry data based on comparisons of the determined value of said parameter with values of said parameter corresponding to adjacent regions of said density plot.
18. The process of claim 17, including generating second region display data representing selection of a second region of said density plot corresponding to the one or more selected portions of the flow cytometry data; wherein said second region is a contiguous region that includes said first region.
19. The process of claim 17 or 18, wherein a portion of the flow cytometry data is selected if the determined value of said parameter is less than a value of said parameter corresponding to the set.
20. A system having components configured to execute the process of any one of claims 17 to 19.
21. A computer-readable storage medium having stored thereon programming instructions for executing the process of any one of claims 17 to 19.
PCT/AU2009/000582 2008-05-08 2009-05-08 A system and method for processing flow cytometry data WO2009135271A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US7161908P 2008-05-08 2008-05-08
US61/071,619 2008-05-08

Publications (1)

Publication Number Publication Date
WO2009135271A1 (en) 2009-11-12

Family

ID=41264348

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2009/000582 WO2009135271A1 (en) 2008-05-08 2009-05-08 A system and method for processing flow cytometry data

Country Status (1)

Country Link
WO (1) WO2009135271A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4661913A (en) * 1984-09-11 1987-04-28 Becton, Dickinson And Company Apparatus and method for the detection and classification of articles using flow cytometry techniques
WO2005031357A2 (en) * 2003-09-24 2005-04-07 Ucl Biomedica Plc. Cell sample analysis
US20070118297A1 (en) * 2005-11-10 2007-05-24 Idexx Laboratories, Inc. Methods for identifying discrete populations (e.g., clusters) of data within a flow cytometer multi-dimensional data set
WO2008052258A1 (en) * 2006-10-31 2008-05-08 Inivai Technologies Pty Ltd A system and method for processing flow cytometry data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TREE STAR, INC.: "FlowJo manual for windows", FLOWJO V7.2.5, 17 September 2007 (2007-09-17), Retrieved from the Internet <URL:http://offsite.treestar.com/downloads/flowjo_v7_reference.pdf> [retrieved on 20090616] *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11047791B2 (en) 2011-06-17 2021-06-29 Roche Diagnostics Hematology, Inc. Systems and methods for sample display and review
US11933711B2 (en) 2011-06-17 2024-03-19 Roche Diagnostics Hematology, Inc. Systems and methods for sample display and review
WO2013016038A1 (en) * 2011-07-22 2013-01-31 Constitution Medical, Inc. Blood analyzer calibration and assessment
US9459196B2 (en) 2011-07-22 2016-10-04 Roche Diagnostics Hematology, Inc. Blood analyzer calibration and assessment

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 09741600; Country of ref document: EP; Kind code of ref document: A1
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 09741600; Country of ref document: EP; Kind code of ref document: A1