EP2105863B1 - Method and system for the automatic classification of events acquired by a flow cytometer - Google Patents

Method and system for the automatic classification of events acquired by a flow cytometer

Info

Publication number
EP2105863B1
Authority
EP
European Patent Office
Prior art keywords
event
events
clusters
flow cytometer
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Not-in-force
Application number
EP08103082.7A
Other languages
German (de)
English (en)
Other versions
EP2105863A1 (fr
Inventor
Miguel CASTAÑO GRACIA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cytognos SL
Original Assignee
Cytognos SL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cytognos SL
Priority to EP08103082.7A (EP2105863B1)
Priority to PCT/EP2009/053605 (WO2009118385A1)
Publication of EP2105863A1
Application granted
Publication of EP2105863B1
Legal status: Not-in-force
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N15/1456 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals
    • G01N15/1459 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals the analysis being performed on a sample stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/698 Matching; Classification
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N2015/1006 Investigating individual particles for cytology
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N2015/1477 Multiparameters

Definitions

  • the invention relates to the field of techniques for classifying sets of multidimensional data so as to distinguish inside them different groups or populations of elements identifiable by certain characteristics of their associated measured parameters.
  • These techniques are useful in a variety of domains, including medicine and biology, and in particular in the field of cellular analysis of blood or other biological samples by flow cytometry, a technique by which several characteristics of microscopic components of a biological sample, such as blood cells, are measured and recorded inside an apparatus called flow cytometer that drives them one by one through a region equipped with one or more lasers and appropriate light detectors.
  • Flow cytometry is a domain where the use of computer-assisted techniques for classification of multidimensional data is particularly useful if not essential.
  • The aim of this technology is to classify the cells or other particles detected by the flow cytometer into categories relevant to the diagnostic or other purpose for which the biological analysis is performed.
  • Doctors must know the respective concentrations in the sample of the basic cell types: red blood cells (RBC), platelets, and white blood cells (WBC), which, in turn, can be classified into sub-groups such as granulocytes, monocytes or lymphocytes.
  • T or B lymphocytes, which in turn may be of many different kinds.
  • Progress in this field is associated with a continuous increase in the number of cell types that can and must be distinguished. For example, it is increasingly frequent that a form of cancer considered a single disease several years ago is now found to have several variants, caused by mutations of different genes, for which a given treatment may not produce the same results.
  • The problem with this method is that, as the number of dimensions of the problem grows, the technicians must identify clusters of cells in an increasing number of graphs, and with increasing frequency the clusters that can be seen in these graphs are in fact a superposition of several populations that must be painstakingly separated in other graphs representing other pairs of coordinates, making the analysis process increasingly time-consuming, tedious and prone to errors.
  • The invention deals with a method according to claim 1 and a modular system according to claim 6, designed to analyze sets of data elements that can be represented as points of a multidimensional space.
  • These points or data elements represent physical events such as the passage of cells or other microscopic particles through a certain region of a device equipped with suitable detectors, such as a flow cytometer.
  • These points or data elements will generally be called here “events”, and their associated detected values, represented as coordinates in a multidimensional space, will be called “parameters”.
  • The device equipped with detectors will also be designated here as “acquisition device”.
  • the modular analytical system receives event information from the acquisition device by whatever means, for example through a real-time software interface, or a hardware transmission link, or through storage and later retrieval of data in files located in computer storage devices.
  • the software integrated into the invention may reside in the same computer as the software controlling the acquisition device or in a separate one, and treats the above information by means of the following modules:
  • the clustering module includes a first step of preparation of data structures that may be useful to accelerate subsequent steps of the identification process even if the step of clustering proper is not performed.
  • This module may include an optional manual step through which, if necessary, the end-user can override all or part of the identifications performed by the computer.
  • the invention comprises a set of preferred modules provided by the inventor and self-sufficient for most applications, but it allows the inclusion of additional modules developed by third parties, as a complement or a replacement for some of the preferred modules of the invention.
  • the aim of the invention consists in classifying into appropriate categories the elements of some group or sample for each of which an appropriate device or system equipped with suitable detectors or measuring instruments provides a set of n measures reflected in real numbers.
  • each of these information elements corresponds to a physical event such as the passage of a microscopic object extracted from the analysed sample through a region of the device equipped with detectors. For this reason, these information elements will be typically called here “events”.
  • the n numbers associated to each event will be called “parameters”, and the process by which the said device or system extracts elements from a sample and feeds them successively through a sensitive region will be called here “acquisition”, and therefore the device or system performing it will be called “acquisition device” or “acquisition system”.
  • each of the n numbers associated with each event represents a different property of the object associated with this event, but the i-th number associated with any event always represents the same characteristic for all the events. Since each information element (event) is associated to a set of n numbers, it can be modelled as a point in an n-dimensional space where its coordinates are the n numbers measured by the acquisition device. In this model, each dimension of this multidimensional space corresponds to a parameter in the sense defined above.
  • The invention is a modular analytical system comprising one or more of the modules described hereafter: (A) data entry interface, (B) clustering module, (C) identification module, (D) acquisition control link. It is designed so that modules provided by the inventor can easily be combined with others developed by third parties. A schematic representation of this system is given in Fig. 1.
  • a typical application of this invention may be, for example, the automatic or assisted identification and counting of cells in a biological sample, such as a blood sample, the acquisition device being a flow cytometer or similar apparatus.
  • the data may be sent in real time by the acquisition system to the analytical system, or stored on a computer storage device for later analysis by this system.
  • the data may be fed directly to the analytical system through a software interface if the control software of the acquisition device and the analytical system are installed in the same computer, or through a communications link if they are in separate computers.
  • If the analysis is not performed in real time during the acquisition, it is irrelevant whether the two software systems are located on the same computer or not.
  • A very useful function of the invention is the classification of the events into groups within which the average or overall difference between the parameters of the events is significantly lower than in the whole dataset.
  • Clustering is implicit in the manual identification process currently performed with the help of graphical representations of the events and described under [B3. Manual Modifications]. Essentially, this method consists in projecting the event space onto one- or two-dimensional views such as the one represented in Fig. 2, where three main clusters of events can immediately be distinguished, and defining each visible population individually by surrounding it with a closed curve such as the ellipse of the figure.
  • This manual process is currently the preferred technique of flow cytometrists for problems of medium and high complexity, because of the lack of automated systems sufficiently fast and convenient.
  • the invention includes its own preferred clustering method, but can work with other clustering methods provided they are efficient enough and use the same interface and rules as those provided with the invention.
  • this invention does not use a single grid, but a set of grids of different sizes, so that for each event its neighbours are searched for only in the grid of the most appropriate size.
  • the algorithm builds several grids of decreasing sizes and it associates each event with the grid element to which it belongs in each of the defined grids.
  • In view (C) of Fig. 3, two grids of different sizes are represented on the same two-dimensional space. This representation is given exclusively for explanatory purposes. In any application of the present invention to real problems, the space typically has many more dimensions and there are several more levels of grids, which are treated differently than in any published algorithm.
  • Figs. 4a-d disclose the method of the invention when applied to a bi-dimensional space, that is, when only two parameters are acquired for each event. Nevertheless, it is clear that the method is most advantageous when applied to the classification of events characterised by a large number of parameters, since an important advantage with respect to other known methods is the improved calculation time.
  • the preferred preparation algorithm works as follows:
  • the events of the dataset are read one by one and, when each event is loaded, the software calculates to which hypercube it belongs in each of the n-dimensional grids, and it associates the event with each of these hypercubes, updating the total number of events per hypercube.
  • The software does not update the coarsest levels of grids for each event: it is more efficient to build the lowest (finest) layers of grids first and then to build the coarser ones by adding lower-level hypercubes instead of individual events.
  • the program stores information only for the hypercubes that contain events, so that the possible number of hypercubes in memory is at most equal to the number of events multiplied by the number of layers of grids, which is far lower than the potential number of hypercubes of all the grids.
  • the program can use any efficient "sparse array" technique, for example based on keys stored in hash tables or indexes.
  • Event 1 belongs to Squares I, II and III.
  • The software has created two lists, namely a list of events recording the squares each event belongs to, and a list of squares recording the number of events in each square together with a list of these events, as also shown in Fig. 4c.
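The preparation step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_grids`, the cell sizes and the sample events are hypothetical, and for brevity every grid level is updated for each event, whereas the text notes that building the coarser levels from the finer ones is more efficient. Only non-empty hypercubes are stored, using hash-table keys as a sparse array.

```python
from collections import defaultdict


def build_grids(events, cell_sizes):
    """Associate each event with its hypercube in every grid level.

    Returns two structures mirroring the two lists of the text:
    - cells[level][key]: indices of the events inside that hypercube
    - membership[i][level]: key of the hypercube that holds event i
    """
    cells = [defaultdict(list) for _ in cell_sizes]
    membership = []
    for idx, point in enumerate(events):
        keys = []
        for level, size in enumerate(cell_sizes):
            # Integer cell index along each dimension (sparse-array key).
            key = tuple(int(x // size) for x in point)
            cells[level][key].append(idx)
            keys.append(key)
        membership.append(keys)
    return cells, membership


# Hypothetical two-dimensional events and three grid levels (coarse to fine).
events = [(1.0, 1.5), (1.2, 1.7), (6.5, 7.0), (3.9, 0.2)]
cells, membership = build_grids(events, cell_sizes=[8.0, 4.0, 2.0])
stored_cells = sum(len(level) for level in cells)
```

Because empty hypercubes are never created, `stored_cells` cannot exceed the number of events multiplied by the number of grid levels, whatever the dimensionality of the space.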
  • Each event is assigned a different grid size depending on how isolated the event is. This has the advantage that only the minimum number of operations required is performed for each event, thus saving calculation time.
  • the program looks for the closest neighbours of the reference event inside its hypercube of the selected grid level and, optionally, in one or a few of the contiguous hypercubes.
  • the software efficiently associates to each event a list of its p closest neighbours, p being a number predefined in the software, and calculates an associated local density, either based on the number of events of the selected hypercube, or on the average distances to the p closest neighbours.
  • Fig. 4d shows the selected 10 events. Then, the local density corresponding to Event 1 is calculated depending, in this example, on the average distance to these 10 events.
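The neighbour search and density estimate can be sketched as below. Several points are assumptions made only for illustration: the helper names, the choice of the finest grid level whose hypercube holds at least k events, the use of the inverse of the mean distance as "local density", and a brute-force scan standing in for the hash-table lookup of a real implementation. Contiguous hypercubes and the maximal distance d are left out.

```python
def chebyshev(a, b):
    """Maximum absolute coordinate difference between two points."""
    return max(abs(x - y) for x, y in zip(a, b))


def cell_key(point, size):
    """Integer index of the hypercube of edge `size` that contains `point`."""
    return tuple(int(x // size) for x in point)


def neighbours_and_density(ref, events, cell_sizes, k, p):
    """Return (p closest neighbours, local density) for event `ref`.

    `cell_sizes` must be ordered from finest to coarsest.
    """
    members = []
    for size in cell_sizes:
        key = cell_key(events[ref], size)
        # Brute-force scan standing in for a lookup of the stored hypercube.
        members = [i for i, e in enumerate(events)
                   if i != ref and cell_key(e, size) == key]
        if len(members) + 1 >= k:  # this hypercube holds at least k events
            break
    ranked = sorted(members, key=lambda i: chebyshev(events[ref], events[i]))
    closest = ranked[:p]
    if not closest:
        return closest, 0.0
    mean_dist = sum(chebyshev(events[ref], events[i]) for i in closest) / len(closest)
    # Assumed density measure: inverse of the mean distance to the neighbours.
    density = float("inf") if mean_dist == 0 else 1.0 / mean_dist
    return closest, density


events = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.2), (1.9, 1.9), (7.0, 7.0)]
near, dense_value = neighbours_and_density(0, events, [1.0, 2.0, 8.0], k=3, p=2)
far, sparse_value = neighbours_and_density(4, events, [1.0, 2.0, 8.0], k=3, p=2)
```

In this toy dataset the crowded event 0 is resolved in the finest grid, while the isolated event 4 has to climb to the coarsest grid before any neighbour is found, which is the behaviour the text describes.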
  • the program also builds a structure through which the events of the space can be accessed efficiently by descending order of local density, and it records group values for each parameter such as mean, maximum and minimum values, percentiles such as 5% and 95%, and other relevant information that can later be used to scale or adjust the expected values of the positions of the main populations that should be identified in the sample.
  • the number of grids, their separation, the sizes of the coarsest and finest one, the minimum number k of events of the selected hypercubes, and the minimum number p of neighbours taken into account for density evaluations and cluster building are configured empirically so as to provide the lowest calculation times compatible with a suitable construction of clusters plus a security margin, for the applications that will be treated with the invention. It is also almost mandatory to configure a maximal distance d beyond which no neighbour is taken into account for the reference event, even if there is no other available neighbour. In sophisticated applications, it is possible to define one value of d for each dimension of the event space or special values for specific regions of this space.
  • the algorithm will work as long as there is at least a "coarse" enough grid.
  • If the minimum numbers k of events per hypercube, p of closest neighbours stored per event, or d of maximal separation between neighbouring events of a cluster are too small, the algorithm will perform no suitable clustering at all.
  • the ratios between the sizes of consecutive grids need not necessarily be integers, nor uniform, although in these cases the calculation of densities as a number of events per unit of hypervolume may be complicated. It is necessary, however, that the grids are explored by the program by order of size, either ascending or descending.
  • the distance typically used by the program for all calculations is the maximum of the absolute values of the differences between the respective coordinates of the two points under comparison, although in some parts of the invented system the use of other distances, for example Euclidean, may be considered.
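The distance described above is the Chebyshev (maximum) distance. A short sketch comparing it with the Euclidean distance follows; the function names and sample points are illustrative only.

```python
import math


def chebyshev_distance(a, b):
    """Maximum of the absolute differences between respective coordinates."""
    return max(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Ordinary straight-line distance, shown for comparison."""
    return math.dist(a, b)


a = (1.0, 5.0, 2.0)
b = (4.0, 1.0, 2.0)
d_max = chebyshev_distance(a, b)   # largest single-coordinate difference
d_euc = euclidean_distance(a, b)
```

One practical reason the maximum distance pairs well with a grid is that all points within a given Chebyshev distance of an event form a hypercube, the same shape as the grid elements, and the calculation needs no squares or square roots.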
  • the preparation step may perform some mathematical transformations on some of the coordinates if with the raw values of these fed by the acquisition device the average geometric characteristics of the population clusters are not sufficiently homogeneous between the different regions of the event space. For example, some coordinates may be best viewed in logarithmic scale, and the logarithmic transformation may or may not have been applied inside the acquisition device.
  • the preparation method described here allows, depending on the applications, either to perform a mathematical transformation on some coordinates just after loading them and before any treatment, to perform some of such transformations for some purposes and use the original data for others, or to configure the parameters controlling the search of the closest neighbours of each event with different values depending on the region of the event space where they are located.
  • the program processes the events by descending order of local density and, for each one (the reference event), determines whether there is among its p closest neighbours any event with a greater or equal local density. If there is none, the program starts building a new cluster with the reference event as its first one. If there are neighbours with equal or greater densities, the program identifies the cluster containing the event with greatest local density among the closest ones, or the closest event with greater local density than the reference one, and associates the reference event to this cluster. If the initially selected neighbour is associated to no cluster, the program explores the next one, and so on.
  • The algorithm may be configured so that the reference event is either considered "isolated", that is, not associated with any cluster, or starts its own cluster as if it had been located at a density maximum.
  • Figs. 5a-b represent the abovementioned clustering process in the bi-dimensional example of Figs. 4a-d .
  • the process implies an exploration of the global list of events by descending order of local density. Let us suppose that at a given moment this exploration arrives to Event 1 of Figs 4c-d . Then, a search is performed to determine if there is an event among the 10 closest events with a higher local density. Since there isn't, Event 1 is the first event of a new cluster, named here Cluster 1. Next, Event 2 is selected. Is there an event among its closest 10 events with a higher local density? Yes, the software finds Event 1, and accordingly it assigns Event 2 to Cluster 1. As this process continues, a different cluster will originate in each local density maximum event.
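The cluster-building pass described above can be sketched as follows, assuming the local densities and neighbour lists have already been computed. The function name and the toy data are hypothetical, and the sketch picks one of the configurable behaviours: an event whose denser neighbours have no cluster yet starts its own cluster. Merging of adjacent clusters is not handled here.

```python
def cluster_by_density(density, neighbours):
    """Assign a cluster label to each event, visiting events by descending density."""
    order = sorted(range(len(density)), key=lambda i: -density[i])
    label = [None] * len(density)
    n_clusters = 0
    for ref in order:
        # Neighbours at least as dense as the reference and already clustered.
        candidates = [j for j in neighbours[ref]
                      if density[j] >= density[ref] and label[j] is not None]
        if candidates:
            best = max(candidates, key=lambda j: density[j])
            label[ref] = label[best]
        else:
            # Local density maximum: the event seeds a new cluster.
            label[ref] = n_clusters
            n_clusters += 1
    return label


# Toy one-dimensional chain of 7 events with two density maxima (events 0 and 6).
density = [9.0, 7.0, 4.0, 1.0, 3.0, 6.0, 8.0]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
labels = cluster_by_density(density, neighbours)
```

Event 3 lies in the valley between the two maxima and is attached to the cluster of its densest neighbour, which is exactly the situation in which a merge decision becomes necessary.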
  • the program will find some event at the frontier or "valley" between two clusters, as is the case of Event 12 in Fig. 5a .
  • the program must know whether the two adjacent clusters should be merged into one or be left separated, depending on the depth of their separation.
  • This aspect is a further one that must be configured empirically for each application of the invention. For example, in the bi-dimensional example of Fig. 5a , a high local density of Event 12 indicates that Cluster 1 and Cluster 2 must be merged. As the process continues, the same thing occurs for Event 67. Nevertheless, in this case, due to the low local density of Event 67, the program decides that the two clusters must remain separated, and further, it decides that Event 67 must not be taken into account.
  • The same thing is represented for a one-dimensional example in Fig. 6.
  • In view (1) of Fig. 6 it seems clear that there are two clusters and in view (2) that there is one, but in view (3) things are not so obvious. Furthermore, most often it will not suffice to give a value to a single number, because cases like (4) or (5) must be taken into account.
  • all the views of Fig. 6 reflect the presence of two different "bell-shaped" distributions of events indicating the presence of at least two sub-types of these, but this distinction may not necessarily be significant in practice in all applications.
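One possible form of the merge decision is sketched below. The criterion is entirely an assumption made for illustration: the text only says that the decision depends on the depth of the separation and must be configured empirically, and it warns that a single number is often not enough. Here the density of the frontier event is compared with the lower of the two cluster peaks through a hypothetical `merge_ratio` threshold.

```python
def valley_decision(valley_density, peak_a, peak_b, merge_ratio=0.5):
    """Decide what to do with two adjacent clusters meeting at a frontier event.

    A shallow valley (frontier density close to the lower peak) suggests one
    population; a deep valley suggests two distinct populations.
    """
    lower_peak = min(peak_a, peak_b)
    if valley_density >= merge_ratio * lower_peak:
        return "merge"
    return "keep_separate"


# Hypothetical densities: a dense frontier event and a sparse one.
shallow = valley_decision(valley_density=8.0, peak_a=10.0, peak_b=9.0)
deep = valley_decision(valley_density=1.0, peak_a=10.0, peak_b=9.0)
```

A real configuration would probably combine several such measures, for example the relative sizes of the two clusters, to handle the ambiguous cases the text mentions.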
  • the method described here in combination with the data structures prepared in the previous step is very fast even for problems with high numbers of parameters (dimensions) but there is often a price to pay for this speed: the resulting clusters are usually incomplete, containing often no more than 90% of the events that really form the cluster under objective criteria.
  • This problem results from at least two facts: the list of closest neighbours built for each event is not always complete, and the estimates of local density around each event cannot be perfectly regular because of inevitable statistical irregularities. It is possible to increase the percentage of events per cluster given by the program to virtually 100% by appropriately adjusting the configurable values described before, but the price to pay in terms of speed is too high. Just increasing this percentage from about 90% to 99% can multiply the processing time several times over.
  • the speed of the clustering algorithm described above can be further improved by limiting the construction of clusters to some proportion of the events, for example 10% or 5%, which would multiply the speed respectively by 10 or 20.
  • This approach may be especially useful in problems involving great numbers of events, such as diagnostic tests of minimal residual disease performed by flow cytometry, where it is necessary to acquire millions of events to be reasonably sure of acquiring also at least a few dozen events of some populations whose presence is critical for the diagnostic and whose concentration in the sample is very low.
  • the basic clustering method described here can be completed, if necessary or useful depending on the application, by diverse rules reflecting particular situations, either based only on the data provided by the cluster construction procedure or on feedback from the identification module described below.
  • each of the populations of interest is identified by a set of conditions applied to the coordinates of the events.
  • An event is considered to belong to a certain population if its coordinates fulfil all the conditions associated to this population. If the problem has been correctly modelled, these sets of conditions should be mutually exclusive, because it should not be possible for an event to belong to several populations simultaneously, unless one is a subpopulation of the other, a situation that will not be considered here because it suffices to identify the lowest-level subpopulations and those of higher levels can be deduced immediately.
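Identifying a population by a set of conditions on coordinates can be sketched as below. The representation is an assumption: each population is a dictionary mapping a coordinate index to a half-open interval, and the population names and thresholds are invented for the example. The check for multiple matches reflects the requirement that the sets of conditions be mutually exclusive.

```python
INF = float("inf")


def matches(event, conditions):
    """True if every coordinate lies inside its (low, high) half-open interval."""
    return all(low <= event[axis] < high
               for axis, (low, high) in conditions.items())


def classify(event, populations):
    """Return the single population whose conditions the event fulfils, or None."""
    hits = [name for name, cond in populations.items() if matches(event, cond)]
    if len(hits) > 1:
        # The model is wrong if conditions are not mutually exclusive.
        raise ValueError(f"event matches several populations: {hits}")
    return hits[0] if hits else None


# Hypothetical lowest-level populations defined on coordinates 0 and 1.
populations = {
    "low_Xi_low_Xj": {0: (-INF, 50.0), 1: (-INF, 30.0)},
    "high_Xi_low_Xj": {0: (50.0, INF), 1: (-INF, 30.0)},
}
first = classify((20.0, 10.0), populations)
second = classify((80.0, 10.0), populations)
unknown = classify((80.0, 90.0), populations)
```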
  • Fig. 7 represents two clusters with "low" values of the X_j coordinate and different values of the X_i coordinate. It is understood that, in a manual analysis process, such a representation would have required previously identifying and hiding other populations that could appear superposed on the above in this representation, but a multidimensional analysis program can operate directly on each coordinate without needing to hide other populations projecting onto the same region of the two-dimensional graph.
  • The two populations are identified according to their respective positions on the X_i axis in relation to the standard separation line, which is vertical for the horizontal axis.
  • The clusters are found approximately at their normal expected positions and therefore the separation is left at its default position. It will be used later to classify small clusters and events belonging to no cluster.
  • Both clusters occupy regions with values of the X_i coordinate lower than expected.
  • The program still finds two clusters, and still considers them to belong to different populations because the X_i coordinates of the rightmost one are too high to classify this cluster as a second cluster of the "low X_i" population, but the program must correct the position of the frontier, which will be used later to separate small clusters and isolated events.
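A frontier correction of the kind described above might look like the following sketch. The rule is an assumption made for illustration, since the text does not say how the corrected position is computed: the default separation is kept when it already lies between the two clusters, and otherwise it is moved to the midpoint of the gap between them. The function name and values are hypothetical.

```python
def adjust_frontier(default, left_cluster, right_cluster):
    """Return the separation value on one axis for two identified clusters.

    `left_cluster` and `right_cluster` are the coordinates of their events
    on that axis.
    """
    left_max = max(left_cluster)
    right_min = min(right_cluster)
    if left_max < default < right_min:
        return default  # clusters sit at their expected positions
    # Assumed correction: place the frontier in the middle of the gap.
    return (left_max + right_min) / 2.0


# Clusters at their normal positions: the default frontier is kept.
kept = adjust_frontier(50.0, [10.0, 20.0, 25.0], [60.0, 70.0])
# Both clusters shifted towards low values: the frontier is corrected.
moved = adjust_frontier(50.0, [5.0, 10.0, 15.0], [30.0, 35.0, 45.0])
```

The corrected value can then be used to classify the small clusters and isolated events that the clustering step left unassigned.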
  • the preferred process of classification of the main clusters detected in the previous clustering step is described below. It is normally restricted to clusters over a certain size, but it is sufficiently flexible to be used also, depending on the applications, for small clusters or isolated events, or for isolated events only, either if there has been a previous clustering step or if there has been none.
  • The program determines the best classification for each cluster by calculating for it a certain coefficient for each possible population, on the basis of the characteristics of this cluster and possibly of other clusters of the sample. These characteristics may include the location, dimensions, density, shape, or any other characteristic that can be calculated from the properties of the clusters or events of the sample. Shapes will normally be evaluated on two-dimensional projections because, on the one hand, it is nearly impossible for the human mind to conceive shape patterns in more than 3 dimensions, and very difficult in 3, and, on the other hand, with present-day computers, shape analysis in several dimensions would require prohibitive calculation times.
  • The calculated coefficient is considered to reflect the "probability", "likeliness" or "presumption" that the said cluster belongs to the population for which the coefficient is calculated.
  • the program is totally flexible and allows the implementation of practically any known model of the domains of standard mathematics or artificial intelligence. For example:
  • The system presents the further advantage of allowing an easy integration of modules developed by third parties, as long as they are capable of reading the values calculated and stored by previous steps for events and clusters, and their result is a "likeliness" or a "likeliness correction" coefficient for which the developer provides enough explanations and scaling instructions to allow the integration of the module into the system.
  • One of the characteristics used most frequently to determine the population to which a group of events belongs is the location of its projection on the coordinate axes.
  • the simplest distinctions are between “low” and “high” values, or between “low”, “medium”, and “high” values when three different populations can be resolved by the values of a single coordinate. Therefore, it is very frequent that the likeliness coefficients are calculated, at least in part, as some additive or multiplicative combination of several other coefficients that reflect the respective positions of the projections of the event group on a certain number of axes.
  • Each of these partial one-dimensional likeliness coefficients is normally calculated as a one-variable function of its associated coordinate, and each of these functions can typically be represented by a curve similar to those shown in Fig. 8 .
  • This figure shows two different possibilities for the function used to calculate the individual likeliness coefficient which, for the coordinate X_i, corresponds to a population characterized as having a "high" value of this coordinate.
  • The dotted curve (A) corresponds to a Bayesian approach and should represent the probability density that the centre of the said population has the value X_i for the i-th coordinate. In principle, this distribution should be obtained by recording the frequencies of each possible position of the population in a sufficient number of real samples analysed by conventional means.
  • the likeliness function defined by the (B) "curve" composed of three straight segments is an artificial variant that most often provides acceptable results with shorter calculation times than the (A) function.
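A three-segment likeliness function and its multiplicative combination over several axes can be sketched as follows. The exact shape is an assumption, since Fig. 8 is not reproduced here: the sketch uses a flat segment at 0, a linear ramp, and a flat segment at 1 for a "high" population. The function names and ramp limits are hypothetical.

```python
import math


def high_likeliness(x, ramp_start, ramp_end):
    """Piecewise-linear coefficient for a population with a "high" coordinate."""
    if x <= ramp_start:
        return 0.0
    if x >= ramp_end:
        return 1.0
    return (x - ramp_start) / (ramp_end - ramp_start)


def low_likeliness(x, ramp_start, ramp_end):
    """Mirror image, for a population with a "low" coordinate."""
    return 1.0 - high_likeliness(x, ramp_start, ramp_end)


def combined_likeliness(partials):
    """Multiplicative combination of one-dimensional partial coefficients."""
    return math.prod(partials)


# Cluster centre at (50, 80), tested against a "high X_i, high X_j" population.
centre = (50.0, 80.0)
partials = [high_likeliness(centre[0], 40.0, 60.0),
            high_likeliness(centre[1], 60.0, 100.0)]
score = combined_likeliness(partials)
```

Such a function needs only comparisons and one division per coordinate, which is consistent with the shorter calculation times attributed to curve (B).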
  • Normal implementations of the identification module of the invention should in principle provide this kind of basic one-dimensional partial likeliness functions, so that the end-user can configure full solutions for simple problems by just defining a set of such functions without need of custom programming.
  • A good number of applications of flow cytometry can be configured by adding to a set of the above one-dimensional functions a specialized module capable of recognizing granulocytes, monocytes and lymphocytes in a two-dimensional forward scatter / side scatter (FSC / SSC) representation like that of Fig. 2.
  • One such module provided with this invention would be a generalization to two dimensions, and to two or more separator lines of configurable orientations, of the simple one-dimensional separation problem represented in Fig. 7 , complemented by optional additional rules. It may be worth noting that, in comparison to other possible existing treatments of this problem, the one provided with this invention should normally work in n dimensions and use all the available coordinates, in addition to FSC and SSC, to identify the clusters. In other words, in our treatment the cluster that can be seen at the bottom of Fig.
  • When the program has calculated for a given cluster its likeliness coefficients for all the populations according to the configured model, it assigns the cluster to the population for which the coefficient is highest. Once this operation has been performed for all clusters above a certain size (or for all clusters or events, depending on the configuration), the program can either end this phase and initiate the next one, or iterate the step of coefficient calculation and population assignment. Obviously, iteration can be useful only if certain coefficients depend on which populations have been assigned to certain clusters. If the program is configured to allow iteration, for reasons of stability it is strongly recommended to base the calculations of the coefficients of step i only on the results of step i-1 and not on any of those of the ongoing step i. It is also strongly recommended to configure a maximal number of steps with a reasonable value, so that the program stops unconditionally after at most this number of steps, avoiding possible endless loops.
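The two recommendations above, using only the results of step i-1 and capping the number of steps, can be sketched as follows. The function `iterate_assignments`, the toy coefficient model and the cluster and population names are all hypothetical; the penalty rule exists only to make the coefficients depend on earlier assignments.

```python
def iterate_assignments(clusters, populations, coefficient, max_steps=10):
    """Repeat coefficient calculation and assignment until stable or capped.

    `coefficient(cluster, population, previous)` sees only the assignments
    of the previous step, never those of the ongoing one.
    """
    previous = {c: None for c in clusters}
    for step in range(1, max_steps + 1):
        current = {}
        for c in clusters:
            current[c] = max(populations,
                             key=lambda pop: coefficient(c, pop, previous))
        if current == previous:
            return current, step  # no assignment changed: stop early
        previous = current
    return previous, max_steps  # unconditional stop, avoids endless loops


# Toy model: base scores, with a penalty when a better-matching cluster
# was already assigned to the same population in the previous step.
BASE = {"c1": {"lymph": 0.9, "mono": 0.1},
        "c2": {"lymph": 0.5, "mono": 0.45}}


def toy_coefficient(cluster, pop, previous):
    score = BASE[cluster][pop]
    for other, assigned in previous.items():
        if other != cluster and assigned == pop and BASE[other][pop] > score:
            return score - 0.2
    return score


final, steps = iterate_assignments(["c1", "c2"], ["lymph", "mono"], toy_coefficient)
capped, capped_steps = iterate_assignments(["c1", "c2"], ["lymph", "mono"],
                                           toy_coefficient, max_steps=1)
```

In this toy run the second cluster is first assigned to the same population as the first, then reassigned once the previous step's results are taken into account, and the loop stops when a step produces no change.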
  • From step 2 onward, it is possible to configure the module so as to allow only those assignment changes that increase a global likeliness coefficient calculated from the individual coefficients of each population.
  • In one variant, the assignment change can occur only if the likeliness coefficient of the group whose classification changes increases.
  • In the other variant, the assignment change is allowed even if this likeliness coefficient decreases, as long as the global one increases. Both variants may be time-consuming, because all the possible consequences of each change must be evaluated. Practice can determine whether such refinements are appropriate for a given application.
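The iteration scheme described above (coefficients of step i computed only from the results of step i-1, with an unconditional cap on the number of steps) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function `assign_populations`, the `coefficient` callback and its signature, and the default `max_steps` are all hypothetical, and stopping early when no assignment changes is an assumed stop condition.

```python
from typing import Callable, Dict, Hashable, List

# Hypothetical callback: coefficient(cluster, population, previous_assignment) -> float
Coefficient = Callable[[Hashable, str, Dict[Hashable, str]], float]


def assign_populations(
    clusters: List[Hashable],
    populations: List[str],
    coefficient: Coefficient,
    max_steps: int = 10,
) -> Dict[Hashable, str]:
    """Assign each cluster to the population with the highest likeliness coefficient."""
    previous: Dict[Hashable, str] = {}  # results of step i-1 (empty before step 1)
    for _ in range(max_steps):  # unconditional upper bound on the number of steps
        current = {
            # Coefficients of step i read only `previous`, never `current`.
            c: max(populations, key=lambda p: coefficient(c, p, previous))
            for c in clusters
        }
        if current == previous:  # assumed stop condition: no assignment changed
            break
        previous = current
    return previous
```

Because every coefficient of a step reads the same frozen snapshot, the result does not depend on the order in which clusters are visited, which is one way to obtain the stability the text asks for.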
  • The program can be configured to stop iterating when one or more of the following conditions occur:
  • The results of the identification algorithm can be completed by deducing a global likeliness coefficient from those associated with each population.
  • The exact calculations depend on the configured model (Bayesian, neural, etc.). Essentially, a value of the global coefficient near the maximum means that all the populations in the processed sample are clearly recognizable, whereas low values indicate that a significant number of characteristics of the analysed sample do not correspond to any expected or recognizable combination of the populations identifiable by the system. This may occur, for example, if the sample is analysed with a program configured for samples of another type, or, in the analysis of biological samples, if there are pathological or deviant populations whose characteristics have not been configured in the system, or if the sample contains large quantities of debris.
  • The invention may optionally include a manual analysis module so that the user can manually override the results of the automatic analysis procedure.
  • The method is already well known in the art and essentially consists in identifying each population by surrounding its group of points in several complementary two-dimensional projections of the event space, as illustrated in Fig. 2, where it is possible to distinguish three clusters corresponding to different populations of blood cells, and the one corresponding to monocytes has been surrounded by the user with an ellipse so as to identify it to the computer.
  • This option requires a real-time bi-directional communication channel between the acquisition device and the identification module of the analytical system, which may consist, for example, of direct subprogram calls or an internal or external TCP/IP connection. Normally, this link will be the same one through which the acquisition device feeds data to the clustering module, but it must be bi-directional and work in real time.
  • The acquisition device sends information in real time to the analysis module about the parameters of each acquired event.
  • The analysis module analyses groups of events and looks for patterns that it can identify. If it determines that these patterns are inadequate because some settings of the acquisition device are maladjusted, it can send a real-time request to the acquisition program to modify some of these settings and continue the acquisition. After several such cycles, when the analysis module considers that it has received enough well-acquired events to identify all the populations relevant to the kind of analysis being performed, it instructs the acquisition device to stop the acquisition and automatically provides the results of the analysis.
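The acquisition and feedback cycle described above can be sketched with a simulated device. Everything in this sketch is an assumption made for illustration: the `SimulatedCytometer` class, its single `gain` setting, the one-parameter events, and the thresholds are hypothetical stand-ins for a real acquisition program and its settings.

```python
import random
from statistics import mean
from typing import List


class SimulatedCytometer:
    """Hypothetical stand-in for an acquisition device with one adjustable gain."""

    def __init__(self, gain: float, seed: int = 0) -> None:
        self.gain = gain
        self.running = True
        self._rng = random.Random(seed)

    def acquire(self, n: int) -> List[float]:
        # Each simulated event has a single parameter centred on 100 * gain.
        return [self._rng.gauss(100.0 * self.gain, 5.0) for _ in range(n)]

    def set_gain(self, gain: float) -> None:
        self.gain = gain

    def stop(self) -> None:
        self.running = False


def acquire_with_feedback(device, target=100.0, tolerance=10.0,
                          batch=200, needed=1000, max_cycles=50) -> List[float]:
    """Collect `needed` well-acquired events, adjusting the device when a batch looks off."""
    good: List[float] = []
    for _ in range(max_cycles):
        events = device.acquire(batch)        # device -> analysis module
        centre = mean(events)
        if abs(centre - target) > tolerance:  # pattern misplaced: settings maladjusted
            device.set_gain(device.gain * target / centre)  # analysis module -> device
            continue                          # discard this batch and keep acquiring
        good.extend(events)
        if len(good) >= needed:
            break
    device.stop()                             # enough events (or cycle cap reached)
    return good
```

A real system would replace the single mean-based check with the pattern analysis of the identification module, but the control flow (analyse a batch, request a settings change, continue, then stop) follows the cycle described in the text.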
  • The analysis of input data for the purpose of adjusting the settings of the acquisition device is performed by functions that may or may not operate independently of one another, and that analyse aspects of said data related to the characteristics of the identified populations, aspects independent of these, or both.
  • The modules of this kind included in the system and provided by the inventor may be complemented by others provided by third parties.
  • Although the feedback module described here can be used with the clustering and identification procedures described above, and these are already very fast, the overall performance can be further improved for real-time analysis by taking into account only a fraction of the acquired events. This will usually be acceptable for initial acquisition control, though it may not be for the subsequent complete analysis once the settings of the acquisition device have been correctly adjusted, because some applications require acquiring large numbers of events, typically several million, in order to detect populations present only in proportions of a few events per million. This is the case, for example, of minimal residual disease analysis in flow cytometry.
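A short worked calculation shows why subsampling suits acquisition control but not the final analysis of rare populations. The helper `expected_rare_events` and the figures used below (3 million events, 5 events per million, a 1% subsample) are illustrative assumptions, chosen only to match the orders of magnitude mentioned in the text.

```python
def expected_rare_events(total_events: int, per_million: float,
                         fraction: float = 1.0) -> float:
    """Expected number of rare-population events when only `fraction` of the acquisition is kept."""
    return total_events * fraction * per_million / 1_000_000


full = expected_rare_events(3_000_000, per_million=5)                    # whole acquisition
sampled = expected_rare_events(3_000_000, per_million=5, fraction=0.01)  # 1% subsample
```

Under these assumptions the full acquisition is expected to contain about 15 events of the rare population, whereas the 1% subsample is expected to contain about 0.15, so the population would usually be absent from the subsample altogether.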
  • The dialogue between the acquisition program and the analysis module must be standardized and include at least the following instructions:

Claims (6)

  1. A method for the automatic classification of events obtained by a flow cytometer, wherein each event is characterized by a set of n parameters, the method comprising the steps of calculating the local density corresponding to each event, finding clusters of events in the n-dimensional parameter space and identifying each cluster with a population, characterized in that the step of calculating the local density corresponding to each event comprises the following steps:
    a) creating a set of n-dimensional grids of different hypercube sizes in the n-dimensional parameter space;
    b) choosing a hypercube size from said set of n-dimensional grids for each event, the hypercube size being the size of the least populated hypercube containing said event and a minimum number k of events; and
    c) associating with each event a list of its p nearest neighbours located inside the hypercube assigned to each event and within a maximum distance d from said event, and
    d) calculating the local density in the neighbourhood of each event on the basis of the mean distances to the p nearest neighbours of said event.
  2. The method for the automatic classification of events obtained by a flow cytometer according to claim 1, wherein step c) further comprises searching for the p nearest neighbours in one or a few of the hypercubes contiguous to the hypercube assigned to the event.
  3. The method for the automatic classification of events obtained by a flow cytometer according to any one of claims 1-2, further comprising the step of:
    e) sorting the events according to their local densities in order to perform the step of finding clusters of events in descending order of local density.
  4. The method for the automatic classification of events obtained by a flow cytometer according to any one of claims 1-3, wherein, after a step of building clusters of events, the method further comprises the step of:
    f) correcting the calibration parameters of the flow cytometer on the basis of a misplacement of clusters with respect to the expected populations.
  5. The method according to claim 4, wherein the calibration parameters comprise the multiplier voltages and the compensation values.
  6. A system for the automatic classification of events obtained by a flow cytometer, wherein each event is characterized by a set of n parameters, the system comprising:
    a flow cytometer, which obtains a set of n parameters corresponding to the physical properties of the events;
    a clustering module (B), connected to the flow cytometer by means of an interface module (A), configured to calculate the local density corresponding to each event according to steps a) to d) of claim 1, and optionally according to claim 2, and configured to find clusters among the event data received from the flow cytometer; and
    an identification module (C), connected to the clustering module, which identifies clusters with populations,
    the system further comprising a connection (D) between the identification module (C) and the interface module (A) for correcting the calibration parameters of the flow cytometer on the basis of a misplacement of clusters with respect to the expected populations.
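Steps a) to d) of claim 1 can be sketched in a few lines of NumPy. This is a simplified reading made under stated assumptions, not the patented implementation: the function name `local_densities`, the use of Euclidean distance, counting the event itself towards the minimum k, falling back to the most populated hypercube when none reaches k, and taking the density as the inverse of the mean neighbour distance are all choices made here for illustration. The neighbour search stays inside the assigned hypercube (claim 1 only, without the contiguous hypercubes of claim 2).

```python
from collections import defaultdict

import numpy as np


def local_densities(events, cell_sizes, k=5, p=4, d=np.inf):
    """Estimate one local density per event from grids of several hypercube sizes."""
    events = np.asarray(events, dtype=float)
    origin = events.min(axis=0)

    # a) one n-dimensional grid per hypercube size: map each cell to its member events
    grids = []
    for size in cell_sizes:
        keys = [tuple(c) for c in np.floor((events - origin) / size).astype(int)]
        members = defaultdict(list)
        for i, key in enumerate(keys):
            members[key].append(i)
        grids.append((keys, members))

    densities = np.zeros(len(events))
    for i, point in enumerate(events):
        # b) least populated hypercube containing the event and at least k events
        cubes = [members[keys[i]] for keys, members in grids]
        eligible = [cube for cube in cubes if len(cube) >= k]
        cube = min(eligible, key=len) if eligible else max(cubes, key=len)

        # c) up to p nearest neighbours inside that hypercube, within distance d
        others = np.array([j for j in cube if j != i], dtype=int)
        dist = np.sort(np.linalg.norm(events[others] - point, axis=1))
        dist = dist[dist <= d][:p]

        # d) density from the mean neighbour distance (inverse mean: an assumption)
        if dist.size:
            densities[i] = 1.0 / max(dist.mean(), np.finfo(float).tiny)
    return densities
```

The ordering required by step e) of claim 3 then follows directly, for example with `order = np.argsort(-densities)`, which lists event indices in descending order of local density.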
EP08103082.7A 2008-03-28 2008-03-28 Method and system for the automatic classification of events acquired by a flow cytometer Not-in-force EP2105863B1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP08103082.7A EP2105863B1 (fr) 2008-03-28 2008-03-28 Method and system for the automatic classification of events acquired by a flow cytometer
PCT/EP2009/053605 WO2009118385A1 (fr) 2008-03-28 2009-03-26 Method and system for the automatic classification of events acquired by a flow cytometer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08103082.7A EP2105863B1 (fr) 2008-03-28 2008-03-28 Method and system for the automatic classification of events acquired by a flow cytometer

Publications (2)

Publication Number Publication Date
EP2105863A1 (fr) 2009-09-30
EP2105863B1 (fr) 2017-09-13

Family

ID=39683945

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08103082.7A Not-in-force EP2105863B1 (fr) 2008-03-28 2008-03-28 Method and system for the automatic classification of events acquired by a flow cytometer

Country Status (2)

Country Link
EP (1) EP2105863B1 (fr)
WO (1) WO2009118385A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012119206A1 (fr) * 2011-03-10 2012-09-13 Newsouth Innovations Pty Limited Multidimensional cluster analysis

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4661913A (en) * 1984-09-11 1987-04-28 Becton, Dickinson And Company Apparatus and method for the detection and classification of articles using flow cytometry techniques
US6014904A (en) 1996-05-09 2000-01-18 Becton, Dickinson And Company Method for classifying multi-parameter data
EP2565826B1 (fr) * 2000-05-11 2019-11-06 Becton Dickinson and Company System for identifying clusters in scatter plots using smoothed polygons with optimal boundaries
US7697764B2 (en) * 2003-11-21 2010-04-13 National University Corporation Kochi University Similar pattern searching apparatus, method of similar pattern searching, program for similar pattern searching, and fractionation apparatus
US7299135B2 (en) * 2005-11-10 2007-11-20 Idexx Laboratories, Inc. Methods for identifying discrete populations (e.g., clusters) of data within a flow cytometer multi-dimensional data set

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
WO2009118385A1 (fr) 2009-10-01
EP2105863A1 (fr) 2009-09-30

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA MK RS

17P Request for examination filed

Effective date: 20100304

R17P Request for examination filed (corrected)

Effective date: 20100303

AKX Designation fees paid

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602008052080

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G06K0009000000

Ipc: G01N0015140000

RIC1 Information provided on ipc code assigned before grant

Ipc: G06K 9/62 20060101ALI20170420BHEP

Ipc: G01N 15/14 20060101AFI20170420BHEP

Ipc: G06K 9/00 20060101ALI20170420BHEP

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIN1 Information on inventor provided before grant (corrected)

Inventor name: CASTANO GRACIA, MIGUEL

INTG Intention to grant announced

Effective date: 20170616

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 928632

Country of ref document: AT

Kind code of ref document: T

Effective date: 20171015

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602008052080

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20170913

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 928632

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170913

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171214

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 11

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20180327

Year of fee payment: 11

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180113

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20180326

Year of fee payment: 11

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602008052080

Country of ref document: DE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

26N No opposition filed

Effective date: 20180614

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602008052080

Country of ref document: DE

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20180331

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180328

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181002

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180328

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180331

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180331

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180331

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20190328

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180328

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20190328

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20190331

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20080328

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170913