CN107389536B - Flow cell particle classification counting method based on density-distance center algorithm - Google Patents


Info

Publication number
CN107389536B
Authority
CN
China
Prior art keywords
data
density
particle
distance
algorithm
Legal status
Active
Application number
CN201710641341.0A
Other languages
Chinese (zh)
Other versions
CN107389536A (en
Inventor
陶靖
Current Assignee
Shanghai Nano Derivatives Technology Co Ltd
Original Assignee
Shanghai Nano Derivatives Technology Co Ltd
Application filed by Shanghai Nano Derivatives Technology Co Ltd
Priority to CN201710641341.0A
Publication of CN107389536A
Application granted
Publication of CN107389536B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N2015/1486 Counting the particles

Landscapes

  • Chemical & Material Sciences (AREA)
  • Dispersion Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a flow cytometry particle classification counting method based on a density-distance center algorithm, comprising the following steps: 1) acquiring, with a flow cytometer, a flow data set of the cell particles to be classified and counted, the flow data set containing multi-dimensional data of the particles; 2) obtaining the local density and distance parameters of each particle in the flow data set with a density-distance center algorithm, then filtering and sorting the particles to obtain the initial cluster centers; 3) taking the initial cluster centers as the initial values of a mixture model algorithm, clustering the particle population according to the mixture model to obtain a plurality of classified particle clusters, and counting them. Compared with the prior art, the method offers high accuracy, good stability, adaptation to the distribution of flow data, suitability for classifying small-sample particle populations, and fast computation.

Description

Flow cell particle classification counting method based on density-distance center algorithm
Technical Field
The invention relates to the field of cell particle classification measurement, in particular to a flow cell particle classification counting method based on a density-distance center algorithm.
Background
Flow cytometry (FCM) is a quantitative analysis technique performed with a flow cytometer. Using the principle of hydrodynamic focusing, the cells or microparticles to be analyzed are aligned in single file and pass rapidly through the detection light beam one by one; a high-precision optical system, electronic signal processing and computer data analysis then measure the multi-angle scattered light and multicolor fluorescence they produce, so that physical and chemical characteristics such as the size, internal structure, nucleic acid and protein content of tens of thousands of cells or microparticles are obtained in a short time. Owing to its speed, accuracy, high throughput and multi-parameter analysis, flow cytometry is an important basic instrument for advanced research in biomedical science; it is also an important clinical examination instrument.
Multi-angle scattered light and multicolor fluorescence caused by each cell or particle are collected by an optical system and converted into electric signals by a photoelectric sensor, the electric signals are processed and sampled into digital signals, and the digital signals are stored and analyzed by a computer; the characteristic data of all cells or particles acquired by the flow cytometer is called flow data.
Traditionally, flow data are analyzed by experienced personnel who project the data onto two-dimensional scattergrams and then use region gates to classify and count the populations of interest, a practice known as manual gating. As flow cytometry develops, the volume of flow data has multiplied, and automated data analysis has become a main direction for the future development of the technology. A number of automated methods for the cluster analysis of flow data have been proposed; they fall mainly into clustering methods based on probability distributions and clustering methods based on spatial information.
Clustering methods based on probability distributions are mainly finite mixture model clustering algorithms. For example, the Gaussian mixture model algorithm based on the Bayesian information criterion handles cell populations consisting of normally or near-normally distributed data well; the t-distribution mixture model algorithm transforms non-normally distributed data toward a near-normal distribution and replaces the Gaussian mixture model in the cluster analysis of flow data; and the skew t-distribution mixture model algorithm handles asymmetrically distributed data better. These mixture model clustering algorithms continue to develop, steadily improving their adaptability to different data distributions. However, the solutions found by mixture models such as the Gaussian, t and skew t models are local optima, so clustering algorithms based on finite mixture models depend on the position of the initial points (i.e., the cluster centers). Because real data are often complex, for example when there are many noise points, mixture model clustering algorithms misclassify data and their stability is low.
Clustering methods based on spatial information, such as the K-means and DBSCAN algorithms, are the other main approach to analyzing flow data, but their ability to cluster flow data is limited. Clustering algorithms based on finite mixture models are better suited to flow data and are more widely applied. Because they depend on the position of the initial points (i.e., the cluster centers), they are sensitive to the initial values of the model. K-means and mixture model clustering algorithms usually choose the initial cluster centers at random, and practitioners customarily place the initial centers as far apart as possible; but K-means itself only reaches a local optimum, so a random initial value can still lead to a local optimum. The initial values of the model are therefore hard to select stably, and neither the accuracy nor the stability of the result can be guaranteed.
In practice, flow data are often complex, and various adverse conditions pose great challenges to their cluster analysis. For example, when there are many noise points, existing methods sometimes misclassify the noise points as a separate cluster. In addition, existing methods do not handle clusters with small sample sizes and sparse distributions well. For example, in the classification of leukocytes in human peripheral blood, monocytes usually account for 2% to 10% of total leukocytes and eosinophils for 1% to 6%, while lymphocytes account for about 40% and granulocytes, the most predominant population, for about 50%. In such multi-class cluster analysis, the large and small classes differ greatly in size yet lie close to each other, and the difficulty lies in locating and distinguishing the small classes. Because a small population has few samples and a sparse distribution, it is easily disturbed by the adjacent dominant population and wrongly merged into other populations, which places high demands on the discrimination and stability of the algorithm.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a flow cytometry particle classification counting method based on a density-distance center algorithm.
The purpose of the invention can be realized by the following technical scheme:
A flow cytometry particle classification counting method based on a density-distance center algorithm comprises the following steps:
1) acquiring, with a flow cytometer, a flow data set of the cell particles to be classified and counted, the flow data set containing multi-dimensional data of the particles;
2) obtaining the local density and distance parameters of each particle in the flow data set with a density-distance center algorithm, and filtering and sorting the particles to obtain the initial cluster centers;
3) taking the initial cluster centers as the initial values of a mixture model algorithm, clustering the particle population according to the mixture model to obtain a plurality of classified particle clusters, and counting them.
In the step 1), when the data in the streaming data set is two-dimensional data, taking the data of a forward scattering light channel as a y-axis and the data of a side scattering light channel as an x-axis to form a two-dimensional scattergram; or taking the data of the side scattered light channel as a y axis and the data of the fluorescence channel as an x axis to form a two-dimensional scatter diagram; when the data in the streaming data set is three-dimensional data, the data of the forward scattering light channel is taken as an x-axis, the data of the side scattering light channel is taken as a y-axis, and the data of the fluorescence channel is taken as a z-axis to form a three-dimensional scattergram.
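The axis assignments above can be expressed as a small lookup. This is a hypothetical sketch: the function name `to_scatter_points` and the channel keys `"FSC"`, `"SSC"` and `"FL"` are illustrative assumptions, and each channel is assumed to be a list holding one reading per particle.

```python
def to_scatter_points(channels):
    """Arrange per-particle channel readings into scatter-plot coordinates.

    channels: dict mapping hypothetical channel keys ("FSC", "SSC", "FL")
    to equal-length lists of readings, one value per particle.
    """
    # Axis order (x, y[, z]) for each channel combination described in step 1).
    layouts = {
        frozenset({"FSC", "SSC"}): ("SSC", "FSC"),              # x = SSC, y = FSC
        frozenset({"SSC", "FL"}): ("FL", "SSC"),                # x = FL, y = SSC
        frozenset({"FSC", "SSC", "FL"}): ("FSC", "SSC", "FL"),  # x, y, z
    }
    axes = layouts.get(frozenset(channels))
    if axes is None:
        raise ValueError(f"unsupported channel combination: {sorted(channels)}")
    # One coordinate tuple per particle.
    return list(zip(*(channels[a] for a in axes)))
```

For example, `to_scatter_points({"FSC": [10, 20], "SSC": [1, 2]})` returns `[(1, 10), (2, 20)]`, with SSC on the x-axis and FSC on the y-axis.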
The step 2) specifically comprises the following steps:
21) for the flow data set $S=\{x_1,x_2,\dots,x_i,\dots,x_n\}$, define the local density $\rho_i$ and the distance $\delta_i$ of the i-th particle $x_i$ as:
$$\rho_i=\sum_{j\neq i}\chi(d_{ij}-d_c)$$
$$\chi(x)=\begin{cases}1, & x<0\\ 0, & x\geq 0\end{cases}$$
$$\delta_i=\min_{j:\,\rho_j>\rho_i}d_{ij}$$
where $d_{ij}$ is the Euclidean distance from $x_i$ to $x_j$, $d_c$ is the truncation distance, and $\chi(x)$ is the indicator function defined above;
22) setting a local density threshold $\rho_0$ and excluding particles whose local density is less than the threshold;
23) arranging all remaining particles into a sequence in descending order of distance $\delta_i$;
24) setting the number of clusters k and selecting the first k particles of the sequence as the initial cluster centers to be clustered.
In step 21), when the i-th particle is the point with the highest local density, $\delta_i$ takes the maximum of its distances to all points:
$$\delta_i=\max_{j}d_{ij}$$
In step 21), when several particles have the same local density, an increment approaching 0 is added to their local densities, and the local density and distance parameters of each particle are then recalculated.
In step 24), when the Euclidean distance between two cluster centers is smaller than a set threshold, they are regarded as belonging to the same cluster, and either of the two centers, or the one with the higher local density, is taken as the new cluster center.
In step 3), the mixture model algorithm may be a Gaussian mixture model, a t-distribution mixture model or a skew t-distribution mixture model.
Compared with the prior art, the invention has the following advantages:
First, high accuracy and good stability: the density-distance center algorithm is used to find the initial center of each particle population, so the clustering is accurate and stable, and misclassification caused by locally optimal solutions is avoided.
Second, adaptation to the distribution of flow data: clustering is performed with a mixture model (e.g., a Gaussian mixture model, a t-distribution mixture model or a skew t-distribution mixture model), which adapts effectively to the distribution characteristics of flow data.
Third, suitability for classifying small-sample particle populations: the method handles small-sample particle populations effectively and locates and classifies them with high accuracy.
Fourth, fast computation: the initial cluster centers determined by the density-distance center algorithm serve as the initial center values of the mixture model clustering algorithm, which speeds up computation.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of Example I of the present invention, in which FIG. 2a is a distance-density distribution diagram, FIG. 2b is a two-dimensional scattergram, and FIG. 2c is the clustering result.
FIG. 3 is a schematic diagram of Example II of the present invention, in which FIG. 3a is a distance-density distribution diagram, FIG. 3b is a two-dimensional scattergram, and FIG. 3c is the clustering result.
FIG. 4 is a schematic diagram of Example III of the present invention, in which FIG. 4a is a distance-density distribution diagram, FIG. 4b is a two-dimensional scattergram, and FIG. 4c is the clustering result.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
The invention provides a mixture-model clustering method for flow data based on density-distance centers. A density-distance center algorithm is applied to locate the initial cluster centers of the flow data, which ensures the stability and accuracy of the finite mixture model result. Because the method combines approaches based on probability distributions and on spatial information (density and distance), it handles the discrimination of small-sample populations well, and it is also robust to noise, stable and accurate.
Fig. 1 shows the specific flow of the clustering method for processing flow data according to the present invention. The clustering steps are described in detail below with reference to Fig. 1:
In step 401, a flow cytometer is used to obtain the flow data set to be analyzed, such as characteristic data of cell particles, including the detected intensities of multi-angle scattered light and multicolor fluorescence. The flow data set to be analyzed contains multi-dimensional data of the particles. When the data are two-dimensional, for example forward scattered light channel data and side scattered light channel data, a two-dimensional scattergram can be formed with the forward scattered light channel data as the y-axis and the side scattered light channel data as the x-axis, as shown in Fig. 2(b); if the data comprise side scattered light channel data and fluorescence channel data, a two-dimensional scattergram can be formed with the side scattered light channel data as the y-axis and the fluorescence channel data as the x-axis. When the data are three-dimensional, for example forward scattered light, side scattered light and fluorescence channel data, a three-dimensional scattergram can be formed with the forward scattered light channel data as the x-axis, the side scattered light channel data as the y-axis and the fluorescence channel data as the z-axis.
In step 402, the local density and distance parameters of each particle in the flow data set to be analyzed are obtained with the density-distance center algorithm and displayed in a distance-density distribution diagram, as shown in Fig. 2(a).
For a data set to be clustered, $S=\{x_1,x_2,\dots,x_n\}$, define two parameters for its i-th particle $x_i$: the local density $\rho_i$ and the distance $\delta_i$ ($i\in[1,n]$). The local density reflects how densely the data are packed within a certain range and is defined as follows:
$$\rho_i=\sum_{j\neq i}\chi(d_{ij}-d_c) \tag{1}$$
where the function
$$\chi(x)=\begin{cases}1, & x<0\\ 0, & x\geq 0\end{cases} \tag{2}$$
The parameter $d_{ij}$ denotes the distance from $x_i$ to $x_j$, such as the spatial Euclidean distance. The parameter $d_c>0$ is the truncation distance, preset according to the actual sample data, e.g., $d_c=5$. By formula (1), the local density $\rho_i$ is the number of data points in the data set (excluding $x_i$ itself) whose distance to $x_i$ is less than $d_c$.
The distance $\delta_i$ of a point is defined as the minimum of its distances to all points whose local density is larger than its own:
$$\delta_i=\min_{j:\,\rho_j>\rho_i}(d_{ij}) \tag{3}$$
If the point is already the point of maximum local density, $\delta_i$ is assigned the maximum of its distances to all points:
$$\delta_i=\max_{j}(d_{ij}) \tag{4}$$
According to formulas (1) to (4), each point $x_i$ obtains a local density $\rho_i$ and a distance value $\delta_i$.
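Formulas (1) to (4) translate almost line for line into code. The following is a minimal O(n²) sketch under stated assumptions: Euclidean distance via `math.dist` (Python 3.8+), the strict inequalities of the definitions, and hypothetical function names. Realistic flow data sets with tens of thousands of events would call for vectorized distances or a spatial index.

```python
import math

def local_density(points, d_c):
    """Formulas (1)-(2): rho_i counts the other points closer to x_i than d_c."""
    n = len(points)
    return [sum(1 for j in range(n)
                if j != i and math.dist(points[i], points[j]) < d_c)
            for i in range(n)]

def density_distance(points, rho):
    """Formulas (3)-(4): delta_i is the distance from x_i to the nearest point
    of strictly higher density; a point with no denser point (the densest
    point, or any point tied with it) gets its maximum distance instead."""
    n = len(points)
    delta = []
    for i in range(n):
        denser = [math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if denser:
            delta.append(min(denser))
        else:
            delta.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return delta
```

With `d_c = 1.1`, the point `(0, 0)` in `[(0, 0), (0, 1), (1, 0), (0, -1), (10, 10)]` has three neighbors within the cutoff, so its ρ is 3, and as the densest point its δ is its largest distance, √200.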
Specifically, if there are a plurality of particle points having the same local density, an increment approaching 0 is added to the local density, and then the local density and distance parameters of each particle are recalculated.
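One way to realize the "increment approaching 0" is to add distinct tiny offsets within each group of tied densities, after which δ is recomputed from the adjusted densities. The offset scheme below (the m-th tied particle, in index order, gets m·eps) and the name `break_density_ties` are assumptions of this sketch; the text does not specify how the increments are assigned.

```python
from collections import defaultdict

def break_density_ties(rho, eps=1e-6):
    """Return a copy of rho in which tied densities are made distinct.

    Within each group of equal values, the m-th member in index order gets
    m * eps added (the first keeps its value). eps times the largest group
    size should stay below the smallest gap between distinct densities
    (1 for integer counts), so the original ordering is preserved.
    """
    groups = defaultdict(list)
    for i, r in enumerate(rho):
        groups[r].append(i)
    adjusted = [float(r) for r in rho]
    for members in groups.values():
        for m, i in enumerate(members):
            adjusted[i] += m * eps
    return adjusted
```

The adjusted densities would then be passed back to the δ computation, so that exactly one point per former tie group can be "densest" among its peers.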
In step 403, a local density threshold $\rho_0$ is set and each particle is checked against it: if the local density of a particle is less than $\rho_0$, the particle is deleted from the data set.
In step 404, all remaining particles are arranged into a sequence in descending order of distance $\delta_i$.
In step 405, the number of clusters k is set, and the first k particles of the sequence are selected as the initial cluster centers to be clustered.
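Steps 403 to 405 amount to a filter, a sort and a slice. In this hypothetical sketch, `select_initial_centers` returns the indices c_1 to c_k of the chosen centers; particles whose ρ_i equals ρ_0 are kept, since only those below the threshold are deleted.

```python
def select_initial_centers(rho, delta, rho_0, k):
    """Steps 403-405: drop particles with rho_i < rho_0, sort the survivors by
    delta_i in descending order, and return the indices of the first k."""
    kept = [i for i, r in enumerate(rho) if r >= rho_0]   # step 403
    kept.sort(key=lambda i: delta[i], reverse=True)        # step 404
    if len(kept) < k:
        raise ValueError("fewer than k particles pass the density threshold")
    return kept[:k]                                        # step 405
```

Filtering before sorting matters: an isolated noise point has a large δ but a small ρ, so without the threshold it would be picked as a center.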
For a given kind of flow data to be analyzed, the number of clusters into which experimental samples of the same type are to be classified is known a priori and is the same for every sample, so the number of clusters is preset to a fixed value k, e.g., k = 4.
Denote the cluster centers by $x_{c_j}$ ($j\in[1,k]$), where $c_j$ is the index of the center point of the j-th cluster (i.e., the index i of the j-th selected $\delta_i$), and D is the set of indices of the selected cluster center points:
$$D=\{c_1,c_2,\dots,c_k\}$$
In particular, if the spatial Euclidean distance between two cluster centers is smaller than a predetermined threshold, the two centers are regarded as belonging to the same cluster, and either of them, or the one with the larger local density, is taken as the new cluster center.
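The merge rule can be sketched by visiting the candidate centers from highest to lowest local density and keeping a candidate only if it lies at least the threshold distance from every center already kept, which implements the "higher local density wins" option. The name `merge_close_centers` is hypothetical, the result is ordered by density rather than by δ, and because the text does not say whether a replacement center is then drawn from the sequence, this sketch simply returns fewer than k centers when a merge occurs.

```python
import math

def merge_close_centers(points, centers, rho, min_sep):
    """Merge initial centers closer than min_sep, keeping the denser one.

    points : particle coordinates; centers: candidate center indices;
    rho    : local densities;      min_sep: the merge threshold distance.
    """
    kept = []
    # Highest density first, so the denser of two close centers survives.
    for c in sorted(centers, key=lambda i: rho[i], reverse=True):
        if all(math.dist(points[c], points[m]) >= min_sep for m in kept):
            kept.append(c)
    return kept
```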
In step 406, the initial cluster centers are taken as the initial values of the mixture model algorithm, namely the location parameters $\mu_j$ of the component t-distribution density functions, and the particle population is clustered according to the mixture model, with the parameters estimated by maximum likelihood.
Considering the distribution characteristics of flow data, clustering algorithms based on finite mixture models are the more suitable choice. The Gaussian mixture model algorithm handles cell populations consisting of normally or near-normally distributed data well; the t-distribution mixture model algorithm can accommodate non-normally distributed data; and the skew t-distribution mixture model algorithm handles asymmetrically distributed data better. Mixture model clustering algorithms continue to develop, and their adaptability to different data distributions keeps improving. Obtaining the initial cluster centers with the density-distance center algorithm applies to all of these mixture models (Gaussian, t-distribution and skew t-distribution). Here, however, based on the distribution characteristics of blood cell flow data and considering implementation complexity and computational efficiency, a t-distribution mixture model is used for the cluster analysis.
The specific algorithm of the mixture model is described below:
1) Mixture model
Let X be a p-dimensional random vector and let $x_1,x_2,\dots,x_n$ be n mutually independent p-dimensional observations of X. The probability density function of the k-component multivariate mixture model generated by X is defined as:
$$f(x;\Theta)=\sum_{i=1}^{k}\pi_i f(x;\theta_i) \tag{5}$$
where k is the number of components of the mixture model; $\Theta=(\pi_1,\dots,\pi_{k-1},\theta_1,\dots,\theta_k)$ is the vector of unknown parameters; $f(x;\theta_i)$ is the probability density function of the i-th component and $\theta_i$ its unknown parameter vector; $\pi_i$ is the mixing proportion, i.e., the weight of the i-th component density in the mixture, which satisfies
$$0\leq\pi_i\leq 1,\qquad \sum_{i=1}^{k}\pi_i=1$$
2) t-mixture model
If $f(x;\theta_i)$ in formula (5) is a t-distribution, then $f(x;\Theta)$ is a t-mixture model. The probability density function of the p-dimensional t-distribution has the form:
$$f(x;\mu,\Sigma,\upsilon)=\frac{\Gamma\left(\frac{\upsilon+p}{2}\right)|\Sigma|^{-1/2}}{(\pi\upsilon)^{p/2}\,\Gamma\left(\frac{\upsilon}{2}\right)\left[1+\frac{\delta(x;\mu,\Sigma)}{\upsilon}\right]^{(\upsilon+p)/2}}$$
where μ is the location parameter, Σ is a positive definite scale matrix, υ is the degrees of freedom, $\delta(x;\mu,\Sigma)=(x-\mu)^T\Sigma^{-1}(x-\mu)$ is the squared Mahalanobis distance between x and μ, and Γ(x) is the Gamma function, defined as
$$\Gamma(x)=\int_0^{\infty}t^{x-1}e^{-t}\,dt$$
For the t-mixture model, each component density is a p-dimensional t-distribution density, and the mixture model is:
$$f(x;\Theta)=\sum_{j=1}^{k}\pi_j f(x;\mu_j,\Sigma_j,\upsilon_j)$$
For flow data that can be divided into k classes, the t-mixture model assumes that the data consist of k t-distributions, and the final clustering result is the k flow cell populations corresponding to those k t-distributions. The mixture parameters are obtained by maximum likelihood estimation on the flow data samples, using the EM algorithm. Let $X_i=(x_{i1},x_{i2},\dots,x_{ip})^T$ be a p-dimensional sample value in the flow data, and introduce the component-label vector $Z_i=(z_{i1},z_{i2},\dots,z_{ik})^T$ of $X_i$, where $z_{ij}=1$ if $X_i$ belongs to the j-th t-distribution and $z_{ij}=0$ otherwise; that is, $Z_i$ indicates which t-distribution the sample value $X_i$ belongs to. The complete data vector is then $X_C=(X^T,Z_1^T,Z_2^T,\dots,Z_n^T)^T$, where $X=(X_1^T,X_2^T,\dots,X_n^T)^T$. The corresponding complete-data log-likelihood function can be written as:
$$\ln L_c(\Theta\mid X_C)=\sum_{i=1}^{n}\sum_{j=1}^{k}z_{ij}\left[\ln\pi_j+\ln f(X_i;\mu_j,\Sigma_j,\upsilon_j)\right] \tag{8}$$
3) EM algorithm estimation
For the t-mixture model, the parameters are estimated with the EM algorithm as follows:
(1) E-step: let $\Theta^{(t)}$ be the estimate at the t-th iteration; the conditional expectation of the complete-data log-likelihood given $\Theta^{(t)}$ is
$$Q(\Theta;\Theta^{(t)})=E\left[\ln L_c(\Theta\mid X_C)\mid X;\Theta^{(t)}\right] \tag{9}$$
(2) M-step: using the expectation (9) of the log-likelihood (8), compute $\Theta^{(t+1)}$ as the value that maximizes $Q(\Theta;\Theta^{(t)})$, i.e.
$$\Theta^{(t+1)}=\arg\max_{\Theta}Q(\Theta;\Theta^{(t)}) \tag{10}$$
(3) Iterate formulas (9) and (10) until the parameters converge, giving the estimate of Θ.
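The iterative formulas for the parameters obtained by the EM algorithm are:
$$\pi_j^{(t+1)}=\frac{1}{n}\sum_{i=1}^{n}\tau_{ij}^{(t)}$$
$$\mu_j^{(t+1)}=\frac{\sum_{i=1}^{n}\tau_{ij}^{(t)}u_{ij}^{(t)}X_i}{\sum_{i=1}^{n}\tau_{ij}^{(t)}u_{ij}^{(t)}}$$
$$\Sigma_j^{(t+1)}=\frac{\sum_{i=1}^{n}\tau_{ij}^{(t)}u_{ij}^{(t)}\left(X_i-\mu_j^{(t+1)}\right)\left(X_i-\mu_j^{(t+1)}\right)^T}{\sum_{i=1}^{n}\tau_{ij}^{(t)}}$$
The degrees of freedom $\upsilon_j^{(t+1)}$ is the solution of the nonlinear equation
$$1-\psi\left(\frac{\upsilon_j}{2}\right)+\ln\frac{\upsilon_j}{2}+\frac{1}{n_j^{(t)}}\sum_{i=1}^{n}\tau_{ij}^{(t)}\left(\ln u_{ij}^{(t)}-u_{ij}^{(t)}\right)+\psi\left(\frac{\upsilon_j^{(t)}+p}{2}\right)-\ln\frac{\upsilon_j^{(t)}+p}{2}=0$$
where
$$\tau_{ij}^{(t)}=\frac{\pi_j^{(t)}f(X_i;\mu_j^{(t)},\Sigma_j^{(t)},\upsilon_j^{(t)})}{\sum_{h=1}^{k}\pi_h^{(t)}f(X_i;\mu_h^{(t)},\Sigma_h^{(t)},\upsilon_h^{(t)})},\qquad u_{ij}^{(t)}=\frac{\upsilon_j^{(t)}+p}{\upsilon_j^{(t)}+\delta(X_i;\mu_j^{(t)},\Sigma_j^{(t)})},\qquad n_j^{(t)}=\sum_{i=1}^{n}\tau_{ij}^{(t)}$$
$\tau_{ij}^{(t)}$ is the conditional expectation of $z_{ij}$, and ψ is the digamma function.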
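To show how the density-distance centers seed the EM iterations, here is a compact sketch of EM for a t-mixture, implementing the standard updates for π_j, μ_j and Σ_j via posterior memberships τ_ij and scale weights u_ij. It assumes NumPy and makes several simplifications that are choices of this sketch, not of the source: the degrees of freedom υ is held fixed (the nonlinear υ equation is skipped), each Σ_j starts as the pooled sample covariance, each π_j starts at 1/k, and a tiny ridge is added to Σ_j for numerical safety. The name `t_mixture_em` is hypothetical.

```python
import math
import numpy as np

def t_mixture_em(X, init_means, nu=4.0, n_iter=200, tol=1e-8):
    """EM for a k-component multivariate t-mixture with fixed degrees of freedom.

    init_means: k x p initial location parameters mu_j, e.g. the
    density-distance cluster centers. Returns (labels, pi, mu, sigma).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.array(init_means, dtype=float)
    k = mu.shape[0]
    pooled = np.atleast_2d(np.cov(X, rowvar=False))
    sigma = np.repeat(pooled[None, :, :], k, axis=0)   # every Sigma_j starts pooled
    pi = np.full(k, 1.0 / k)
    const = math.lgamma((nu + p) / 2) - math.lgamma(nu / 2) - 0.5 * p * math.log(math.pi * nu)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log[pi_j f(X_i; mu_j, Sigma_j, nu)] and squared Mahalanobis distances.
        log_w = np.empty((n, k))
        maha = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            maha[:, j] = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma[j], diff.T).T)
            logdet = np.linalg.slogdet(sigma[j])[1]
            log_w[:, j] = (np.log(pi[j]) + const - 0.5 * logdet
                           - 0.5 * (nu + p) * np.log1p(maha[:, j] / nu))
        top = log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w - top)
        row_sum = tau.sum(axis=1, keepdims=True)
        loglik = float(np.sum(top + np.log(row_sum)))   # observed-data log-likelihood
        tau /= row_sum                                   # posterior memberships tau_ij
        if loglik - prev < tol:                          # stop once the gain is negligible
            break
        prev = loglik
        u = (nu + p) / (nu + maha)                       # latent scale weights u_ij
        # M-step: closed-form updates for pi_j, mu_j and Sigma_j.
        for j in range(k):
            w = tau[:, j] * u[:, j]
            pi[j] = tau[:, j].mean()
            mu[j] = w @ X / w.sum()
            diff = X - mu[j]
            sigma[j] = (w[:, None] * diff).T @ diff / tau[:, j].sum() + 1e-9 * np.eye(p)
    return tau.argmax(axis=1), pi, mu, sigma
```

Passing the density-distance centers as `init_means` is what replaces random initialization. The u_ij weights shrink toward zero for points far from μ_j, which is why the t-mixture tolerates noise points better than a Gaussian mixture.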
In step 407, the clustering yields a plurality of particle clusters, which can be marked in different colors, and the particles in each class are counted, as shown in Fig. 2c.
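Counting is a tally over the cluster labels. In this sketch, `count_clusters` and the optional `names` list, which maps label indices to population names in a hypothetical order, are assumptions; it returns each population's count and its percentage of all particles.

```python
from collections import Counter

def count_clusters(labels, names=None):
    """Return {cluster: (count, percent of all particles)} for the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return {(names[c] if names else c): (n, 100.0 * n / total)
            for c, n in sorted(counts.items())}
```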
Example I:
Fig. 2 shows Example I of the method of the present invention. For the flow data sample to be processed, a two-dimensional scattergram is created from the measurements of the forward scattered light channel (FSC) and the side scattered light channel (SSC), as shown in Fig. 2b (side scattered light on the horizontal axis, forward scattered light on the vertical axis). The sample is a normal sample in which the monocyte population accounts for about 5% and the populations are clearly separated: the lymphocyte population is at the upper left, red blood cell debris at the lower left, the monocyte population at the upper middle, and the granulocyte population on the right.
The distance and local density parameters of each particle obtained by the density-distance center algorithm are shown in the distance-density distribution diagram of Fig. 2a, where the horizontal axis is the local density and the vertical axis is the distance.
A local density threshold is set and particles with local density below it are excluded; all remaining particles are sorted in descending order of distance; the number of clusters is set to k = 4, and the first k particles of the sequence are selected as the initial cluster centers. The selected cluster centers are marked in Fig. 2b with "o", "+", "Δ" and "□", respectively.
The 1st initial cluster center selected is X2719 in the data set, denoted Xc1;
the 2nd initial cluster center selected is X102 in the data set, denoted Xc2;
the 3rd initial cluster center selected is X3546 in the data set, denoted Xc3;
the 4th initial cluster center selected is X1568 in the data set, denoted Xc4.
The initial cluster centers obtained are used as the initial values of the mixture model, the flow data are solved iteratively according to the mixture model, and the parameters are estimated by maximum likelihood. The result of cluster analysis with the t-distribution mixture model is shown in Fig. 2c, where each particle population is marked in a different color and counted. Fig. 2 contains many noise points; solving with the mixture model alone easily leads to misclassification and a locally optimal solution. Determining the initial cluster centers with the density-distance center algorithm ensures the stability and accuracy of the finite mixture model result.
Taking the manual gating result as the reference, the algorithm divides the sample into 4 populations: red blood cell debris, lymphocytes, monocytes and granulocytes. Compared with the manual gating result, the error of the algorithm for the monocytes, a population with few particles, is 0.33%.
Example II:
Fig. 3 shows Example II of the method. For the flow data sample to be processed, a two-dimensional scattergram is created from the forward scattered light channel (FSC) and side scattered light channel (SSC) data, as shown in Fig. 3b (side scattered light on the horizontal axis, forward scattered light on the vertical axis). The monocyte population of this sample is very small, about 2%, as seen in disease or other extreme conditions.
The distance and local density parameters of each particle obtained by the density-distance center algorithm are shown in the distance-density distribution diagram of Fig. 3a, where the horizontal axis is the local density and the vertical axis is the distance.
A local density threshold is set and particles with local density below it are excluded; all remaining particles are sorted in descending order of distance; the number of clusters is set to k = 4, and the first k particles of the sequence are selected as the initial cluster centers. The selected cluster centers are marked in Fig. 3b with "o", "+", "Δ" and "□", respectively.
The initial cluster centers obtained are used as the initial values of the mixture model, the flow data are solved iteratively according to the mixture model, and the parameters are estimated by maximum likelihood. The result of cluster analysis with the t-distribution mixture model is shown in Fig. 3c, where each particle population is marked in a different color and counted. The monocyte population of this sample is small and sparsely distributed, so it is easily disturbed by the adjacent dominant population and wrongly assigned to other populations. Determining the initial cluster centers with the density-distance center algorithm ensures the stability and accuracy of the finite mixture model result.
Taking the manual gating result as the reference, the algorithm divides the sample into 4 populations: red blood cell debris, lymphocytes, monocytes and granulocytes. Compared with the manual gating result, the error of the algorithm for the monocytes, a population with few particles, is 0.19%.
Example III:
fig. 4 shows embodiment III of the method herein. A two-dimensional scattergram is created from the data of the forward scattered light channel (FSC) and the side scattered light channel (SSC) for the streaming data sample to be processed, as shown in figure (4b) (side scattered light on the horizontal axis and forward scattered light on the vertical axis). The mononuclear cell population of the sample is not only small in sample size (about 2%), but also close to the lymphocyte population, and is partially mixed.
The distance and local density parameters obtained for each particle by the density-distance center algorithm are plotted in the distance-density distribution diagram of FIG. 4a, with local density on the horizontal axis and distance on the vertical axis.
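The per-particle quantities plotted in FIG. 4a can be sketched as follows. This is a minimal, hypothetical implementation assuming the cutoff-kernel definitions of step 21) in claim 1 (local density counts the other particles within the truncation distance d_c; distance is measured to the nearest denser particle, or is the maximum distance for the densest particle as in claim 3). The tie-breaking scheme, in which the k-th particle sharing a density value receives k·eps, is an assumed reading of "adding an increment approaching 0"; the function name and `eps` are illustrative, and the O(n²) distance matrix would need a faster neighbor search on large flow data sets.

```python
import math


def local_density_and_distance(points, d_c, eps=1e-6):
    """Per-particle local density rho and distance delta (illustrative sketch).

    rho[i]   -- number of other particles closer than the cutoff distance d_c,
                i.e. the sum over j != i of chi(d_ij - d_c), chi(x) = 1 if x < 0 else 0.
    ties     -- hypothetical tie-break: the k-th particle (in input order) that
                shares a density value gets k * eps added; k * eps must stay
                below 1, the smallest gap between distinct integer counts.
    delta[i] -- distance to the nearest particle of higher density; for the
                densest particle, its maximum distance to any other particle.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]  # O(n^2) Euclidean distances
    counts = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]

    seen = {}
    rho = []
    for c in counts:
        k = seen.get(c, 0)
        seen[c] = k + 1
        rho.append(c + k * eps)  # all densities become distinct

    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:  # densest particle: use its largest distance (claim 3)
            delta.append(max((d[i][j] for j in range(n) if j != i), default=0.0))
    return rho, delta


# Two tight groups of particles; with d_c = 0.5 each group yields exactly one
# particle with a large delta (indices 2 and 4 here).
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5)]
rho, delta = local_density_and_distance(pts, d_c=0.5)
```

Particles that combine a high local density with a large distance are the ones that stand out in the distance-density diagram and become center candidates.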
A local density threshold is set, and particles whose local density is below it are excluded; the remaining particles are sorted in descending order of distance. With the number of clusters k set to 4, the first k particles in this order are taken as the initial cluster centers. The selected cluster centers are marked in FIG. 4b by "o", "+", "Δ" and "□", respectively.
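The screening and selection just described (density threshold, descending sort by distance, first k particles) can be sketched as below, taking the local density and distance values as already computed. The optional merge of nearby centers follows step 24) of claim 1, using its "keep the denser particle" option; the function and parameter names are hypothetical, and since the claim does not say whether merged-away centers are replaced, this sketch simply returns fewer than k centers in that case.

```python
import math


def select_initial_centers(points, rho, delta, rho0, k, merge_dist=None):
    """Choose initial cluster centers from precomputed rho and delta (sketch).

    1. exclude particles whose local density is below rho0;
    2. sort the remaining particles by delta, largest first;
    3. take the first k of them as initial centers.
    Optional merge (step 24 of claim 1): a candidate closer than merge_dist to
    an already chosen center is treated as the same cluster and the denser of
    the two is kept, so fewer than k centers may be returned.
    """
    kept = [i for i in range(len(points)) if rho[i] >= rho0]
    kept.sort(key=lambda i: delta[i], reverse=True)
    centers = []
    for i in kept[:k]:
        clash = None
        if merge_dist is not None:
            clash = next((c for c in centers
                          if math.dist(points[i], points[c]) < merge_dist), None)
        if clash is None:
            centers.append(i)
        elif rho[i] > rho[clash]:
            centers[centers.index(clash)] = i  # keep the denser particle
    return centers
```

The function returns particle indices, so the center coordinates are `[points[i] for i in centers]`; in Examples II and III k would be 4.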
The initial cluster centers obtained in this way are used as the initial values of the mixture model, which is then fitted to the flow data iteratively, with the parameters estimated by maximum likelihood. The results of clustering with the t-distribution mixture model are shown in FIG. 4c; each particle population is marked in a different color and counted. In this sample the monocyte population is small and lies close to, and partially overlaps, the lymphocyte population, so it is easily disturbed by that neighboring dominant population, and part of it tends to be misassigned to the lymphocytes. Determining the initial cluster centers with the density-distance center algorithm keeps the result of the finite mixture model stable and accurate.
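Fitting a t-distribution mixture model by maximum likelihood from given starting centers can be sketched with expectation-maximization on bivariate data such as (SSC, FSC). This is not the patent's implementation: holding the degrees of freedom ν fixed, starting every component from the pooled data covariance with equal weights, and the function name are all assumptions made for a short, self-contained example.

```python
import math


def t_mixture_em(points, init_means, nu=4.0, n_iter=100, reg=1e-6):
    """EM for a mixture of bivariate t distributions, started from given centers.

    Assumptions (not fixed by the patent): nu is held constant, every component
    starts from the pooled data covariance, and initial weights are equal.
    Returns (labels, counts): each particle goes to its most probable component.
    """
    n, k, p = len(points), len(init_means), 2
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    pooled = (sum((x - mx) ** 2 for x, _ in points) / n + reg,
              sum((x - mx) * (y - my) for x, y in points) / n,
              sum((y - my) ** 2 for _, y in points) / n + reg)
    means = [tuple(m) for m in init_means]
    covs = [pooled] * k  # each covariance stored as (var_x, cov_xy, var_y)
    weights = [1.0 / k] * k
    # log normalizing constant of the bivariate t density, excluding |Sigma|
    const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - (p / 2) * math.log(nu * math.pi))

    def e_step():
        tau, u = [], []
        for x, y in points:
            logs, qs = [], []
            for w, (m1, m2), (a, b, d) in zip(weights, means, covs):
                det = a * d - b * b
                dx, dy = x - m1, y - m2
                q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det  # squared Mahalanobis distance
                logs.append(math.log(w) + const - 0.5 * math.log(det)
                            - (nu + p) / 2 * math.log1p(q / nu))
                qs.append(q)
            top = max(logs)  # log-sum-exp for numerical stability
            r = [math.exp(v - top) for v in logs]
            s = sum(r)
            tau.append([v / s for v in r])  # posterior component probabilities
            u.append([(nu + p) / (nu + q) for q in qs])  # far-out particles get small weights
        return tau, u

    for _ in range(n_iter):
        tau, u = e_step()
        for j in range(k):
            tj = sum(t[j] for t in tau)
            if tj < 1e-12:  # empty component: keep its previous parameters
                continue
            c = [t[j] * ui[j] for t, ui in zip(tau, u)]
            cu = sum(c)
            m1 = sum(ci * x for ci, (x, _) in zip(c, points)) / cu
            m2 = sum(ci * y for ci, (_, y) in zip(c, points)) / cu
            a = sum(ci * (x - m1) ** 2 for ci, (x, _) in zip(c, points)) / tj + reg
            b = sum(ci * (x - m1) * (y - m2) for ci, (x, y) in zip(c, points)) / tj
            d = sum(ci * (y - m2) ** 2 for ci, (_, y) in zip(c, points)) / tj + reg
            weights[j], means[j], covs[j] = tj / n, (m1, m2), (a, b, d)

    tau, _ = e_step()
    labels = [max(range(k), key=lambda j: t[j]) for t in tau]
    return labels, [labels.count(j) for j in range(k)]
```

Setting every weight `(nu + p) / (nu + q)` to 1 and replacing the t log-density with the Gaussian one gives a Gaussian mixture instead; as ν grows, the t weights approach 1, while for finite ν they shrink for particles far from a component's center, which makes the fit less sensitive to outlying particles.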
With the manual gating classification as the reference, the particles clustered by the algorithm are assigned to 4 populations: red blood cell fragments, lymphocytes, monocytes and granulocytes. Compared with manual gating, the error of the algorithm for the monocytes, the population with the fewest particles, is 0.27%.
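The patent does not define how the 0.19% and 0.27% errors are computed. One plausible reading, assumed here, is the absolute difference in a population's share of all events between the algorithm and manual gating; the helper names below are hypothetical.

```python
def population_percentage(labels, target):
    """Share of events, in percent, that carry the label `target`."""
    if not labels:
        raise ValueError("no events")
    return 100.0 * sum(1 for lab in labels if lab == target) / len(labels)


def counting_error(algo_labels, manual_labels, target):
    """Assumed error measure: absolute difference, in percentage points,
    between the algorithm's and manual gating's share of `target`."""
    return abs(population_percentage(algo_labels, target)
               - population_percentage(manual_labels, target))
```

For instance, with made-up numbers, 22 monocytes out of 1,000 events from the algorithm versus 20 from manual gating gives an error of 0.2 percentage points under this definition; a relative error would instead divide the difference by the manual-gating share.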
Taken together, the above examples show that the density-distance center algorithm gives stable results under a variety of unfavorable distributions, such as small populations and populations lying close to each other. Determining the initial cluster centers with this algorithm therefore yields accurate and reliable centers, handles the localization and classification of small clusters well, effectively eliminates the interference of various noise points, and keeps the result of the finite mixture model stable and accurate; using these centers as the initial values of the mixture-model clustering algorithm also speeds up the computation.

Claims (4)

1. A flow cytometry particle classification and counting method based on a density-distance center algorithm, characterized by comprising the following steps:
1) acquiring, with a flow cytometer, a flow data set of the cell particles to be classified and counted, the flow data set comprising multi-dimensional data of the particles;
2) obtaining the local density and distance parameters of each particle in the flow data set according to the density-distance center algorithm, and screening and sorting the particles to obtain the initial cluster centers to be clustered, specifically comprising the following steps:
21) for the flow data set $S = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, defining the local density $\rho_i$ and the distance $\delta_i$ of the $i$-th particle $x_i$ respectively as:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$$

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$

where $d_{ij}$ is the Euclidean distance from $x_i$ to $x_j$, $d_c$ is the truncation distance, and $\chi(x)$ is the function defined above;
when several particle points have the same local density, adding an increment approaching 0 to the local density and then recalculating the local density and distance parameters of each particle;
22) setting a local density threshold $\rho_0$ and excluding the particles whose local density is less than $\rho_0$;
23) sorting all the remaining particles in descending order of distance;
24) setting a cluster number k and taking the first k particles in this order as the initial cluster centers to be clustered; when the Euclidean distance between two cluster centers is smaller than a set threshold, the two are regarded as the same cluster, and either one of them, or the one with the higher local density, is taken as the new cluster center;
3) taking the initial cluster centers as the initial values of a mixture model algorithm, clustering the particle population according to the mixture model to obtain a plurality of classified particle clusters, and counting them.
2. The flow cytometry particle classification and counting method based on the density-distance center algorithm according to claim 1, wherein in step 1), when the data in the flow data set are two-dimensional, a two-dimensional scattergram is formed with the forward scattered light channel data on the y-axis and the side scattered light channel data on the x-axis, or with the side scattered light channel data on the y-axis and the fluorescence channel data on the x-axis; when the data in the flow data set are three-dimensional, a three-dimensional scattergram is formed with the forward scattered light channel data on the x-axis, the side scattered light channel data on the y-axis and the fluorescence channel data on the z-axis.
3. The flow cytometry particle classification and counting method based on the density-distance center algorithm according to claim 1, wherein in step 21), when the $i$-th particle is the point with the highest local density, $\delta_i$ is taken as the maximum of the distances from the $i$-th particle to all points, namely:

$$\delta_i = \max_{j} d_{ij}$$
4. The flow cytometry particle classification and counting method based on the density-distance center algorithm according to claim 1, wherein in step 3), the mixture model algorithm comprises a Gaussian mixture model, a t-distribution mixture model and a skew t-distribution mixture model.
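Claim 2 fixes which light-scatter or fluorescence channel goes on which axis. A small helper that arranges per-event channel values accordingly might look like the sketch below; the function name, the `view` labels and the input format (one list of values per channel) are hypothetical.

```python
def scatter_coordinates(fsc, ssc, fl=None, view="ssc_fsc"):
    """Arrange per-event channel values as scattergram coordinates (claim 2).

    view="ssc_fsc": 2-D, x = side scatter, y = forward scatter
    view="fl_ssc":  2-D, x = fluorescence, y = side scatter
    view="3d":      3-D, x = forward scatter, y = side scatter, z = fluorescence
    """
    if view not in ("ssc_fsc", "fl_ssc", "3d"):
        raise ValueError(f"unknown view {view!r}")
    if view == "ssc_fsc":
        return list(zip(ssc, fsc))
    if fl is None:
        raise ValueError(f"view {view!r} needs the fluorescence channel")
    if view == "fl_ssc":
        return list(zip(fl, ssc))
    return list(zip(fsc, ssc, fl))
```

The first view matches the scattergram used in Example III (side scatter horizontal, forward scatter vertical).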
CN201710641341.0A 2017-07-31 2017-07-31 Flow cell particle classification counting method based on density-distance center algorithm Active CN107389536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710641341.0A CN107389536B (en) 2017-07-31 2017-07-31 Flow cell particle classification counting method based on density-distance center algorithm

Publications (2)

Publication Number Publication Date
CN107389536A CN107389536A (en) 2017-11-24
CN107389536B true CN107389536B (en) 2020-03-31

Family

ID=60343087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710641341.0A Active CN107389536B (en) 2017-07-31 2017-07-31 Flow cell particle classification counting method based on density-distance center algorithm

Country Status (1)

Country Link
CN (1) CN107389536B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7201297B2 (en) * 2018-09-26 2023-01-10 シスメックス株式会社 Flow cytometer, data transmission method and information processing system
CN110516584B (en) * 2019-08-22 2021-10-08 杭州图谱光电科技有限公司 Cell automatic counting method based on dynamic learning for microscope
CN112507991B (en) * 2021-02-04 2021-06-04 季华实验室 Method and system for setting gate of flow cytometer data, storage medium and electronic equipment
CN113380318B (en) * 2021-06-07 2023-04-07 天津金域医学检验实验室有限公司 Artificial intelligence assisted flow cytometry 40CD immunophenotyping detection method and system
CN114136868B (en) * 2021-12-03 2022-07-15 浙江博真生物科技有限公司 Flow cytometry full-automatic grouping method based on density and nonparametric clustering
CN116401567B (en) * 2023-06-02 2023-09-08 支付宝(杭州)信息技术有限公司 Clustering model training, user clustering and information pushing method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102680379A (en) * 2012-05-31 2012-09-19 长春迪瑞医疗科技股份有限公司 Device for classifying and counting white cells by using even high-order aspherical laser shaping system
CN103562920A (en) * 2011-03-21 2014-02-05 贝克顿迪金森公司 Neighborhood thresholding in mixed model density gating
CN103942415A (en) * 2014-03-31 2014-07-23 中国人民解放军军事医学科学院卫生装备研究所 Automatic data analysis method of flow cytometer
CN105424560A (en) * 2015-11-24 2016-03-23 苏州创继生物科技有限公司 Automatic quantitative analysis method for data of flow-type particle instrument

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20130226469A1 (en) * 2008-04-01 2013-08-29 Purdue Research Foundation Gate-free flow cytometry data analysis

Non-Patent Citations (1)

Title
Clustering by fast search and find of density peaks, Alex Rodriguez et al., Science, 2014-06-27, Vol. 344, No. 6191, pp. 1492-1496 *

Also Published As

Publication number Publication date
CN107389536A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN107389536B (en) Flow cell particle classification counting method based on density-distance center algorithm
JP7465914B2 (en) Systems and methods for biological particle classification
JP4521490B2 (en) Similar pattern search device, similar pattern search method, similar pattern search program, and fraction separation device
US20080172185A1 (en) Automatic classifying method, device and system for flow cytometry
Quiñones et al. Leukocyte segmentation and counting based on microscopic blood images using HSV saturation component with blob analysis
EP4003596A1 (en) System and method for immune activity determination
CN115270874A (en) Method and system for flow cytometry classification and counting based on density estimation
Johnsson Structures in high-dimensional data: Intrinsic dimension and cluster analysis
Fitri et al. A comparison of platelets classification from digitalization microscopic peripheral blood smear
US20210406272A1 (en) Methods and systems for supervised template-guided uniform manifold approximation and projection for parameter reduction of high dimensional data, identification of subsets of populations, and determination of accuracy of identified subsets
CN111274949B (en) Blood disease white blood cell scatter diagram similarity analysis method based on structural analysis
WO2023186051A1 (en) Auxiliary diagnosis method and apparatus, and construction apparatus, analysis apparatus and related product
EP2920573B1 (en) Particle data segmentation result evaluation methods and flow cytometer
Nithya et al. Detection of Anaemia using Image Processing Techniques from microscopy blood smear images
Wei et al. Automatic counting method for complex overlapping erythrocytes based on seed prediction in microscopic imaging
Isnanto et al. Size-based feature extraction on blood cells calculation process using k-means clustering
US11964281B2 (en) System and method for correcting patient index
CN108169105B (en) Leukocyte classification processing method applied to hematology analyzer
Abdallah et al. Using Bayesian inference to measure the proximity of flow cytometry data
Wallin et al. Latent modeling of flow cytometry cell populations
JabbarKaram Detection of White Blood Cells in Smear Image Using Morphological Operations
MATTON Automating flow cytometry data analysis using clustering techniques
CN117686411A (en) Flow cytometer detection data analysis method, medium and system
WO2024083853A1 (en) Detection of abnormality in specimen image
CN117491259A (en) Flow lymphocyte analysis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant