CN106548204A - Fast automatic grouping method for flow cytometry data - Google Patents

Fast automatic grouping method for flow cytometry data

Info

Publication number
CN106548204A
Authority
CN
China
Prior art keywords
principal component
matrix
data
point
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610943348.3A
Other languages
Chinese (zh)
Inventor
张文昌
祝连庆
娄小平
潘志康
孟晓辰
刘超
董明利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN201610943348.3A
Publication of CN106548204A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a method for fast automatic grouping of flow cytometry data, comprising the following steps. Step 1: process the flow cytometry data using PCA, with the following sub-steps: 1) standardize the sample matrix X to obtain the standardized matrix X*; 2) compute its correlation coefficient matrix and perform eigendecomposition to obtain the eigenvalues (λ1 ≥ λ2 ≥ … ≥ λp) and the corresponding eigenvectors a1, a2, …, ap; 3) determine the number k of principal components according to the variance contribution rates of the principal components; 4) form the matrix U = [a1, a2, …, ak] from the eigenvectors corresponding to the first k principal components, and obtain the matrix W = X* U (the product of X* and U), which holds the sample data projected onto the k principal component vectors. Step 2: cluster the flow cytometry data using the improved K-means algorithm to obtain group labels. Step 3: set the principal components with the largest contribution rates as the coordinate axes and draw a scatter plot. Step 4: achieve automatic grouping.

Description

Fast automatic grouping method for flow cytometry data
Technical field
The present invention relates to the field of biomedicine, and in particular to a method for fast automatic grouping of flow cytometry data.
Background
The flow cytometer has become one of the most important instruments for biological research and clinical diagnosis. Flow cytometry is a technique for rapid, multi-parameter analysis or sorting of suspended cells or other particles. A flow cytometer can measure multiple physical and chemical properties of individual cells, simultaneously obtaining from each cell the scattered light signals (SC) that represent cell volume and granularity and the fluorescence pulse signals (FL) that represent the content of each antigen, and extracting characteristic parameters of these signals such as peak value, pulse width, and area. The scattered light and fluorescence signals obtained from each cell are recorded as an individual event, and all events together make up the complete flow cytometry data of the measured cell population.
Flow cytometry data analysis is one of the difficult aspects of flow cytometry; its main purpose is to identify and separate the cell subsets in a sample. The acquired data are usually analyzed visually with two-dimensional scatter plots that display the parameters of two measurement channels, where the parameters can be forward scattered light (FSC), side scattered light (SSC), or fluorescence signals. However, a two-dimensional scatter plot can only analyze two parameters at a time. Multi-parameter flow cytometry data have high dimensionality and large volume: if the data have n parameters, the number of scatter plots that can be drawn by choosing two of them as the horizontal and vertical axes is C(n, 2) = n(n − 1)/2. In general, cell subsets are not clearly separated in a scatter plot drawn with arbitrarily chosen axis parameters, so the operator needs a high level of expertise to choose specific parameter combinations in order to obtain a satisfactory grouping result, and the process is tedious and time-consuming.
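To make the size of the manual search concrete, the following minimal sketch enumerates the candidate axis pairs, assuming each unordered pair of parameters counts as one scatter plot. The 11 channel names are those of the embodiment described later in this document; the function name `axis_pairs` is hypothetical.

```python
from itertools import combinations

# The 11 channel names are taken from the embodiment in this document.
PARAMETERS = [
    "FITC-H", "PE-H", "APC-H",
    "FSC-A", "SSC-A", "FITC-A", "PE-A", "APC-A",
    "FITC-W", "PE-W", "APC-W",
]


def axis_pairs(parameters):
    """Return every unordered pair of parameters usable as scatter-plot axes."""
    return list(combinations(parameters, 2))


pairs = axis_pairs(PARAMETERS)
n = len(PARAMETERS)
# Both numbers agree: the enumeration has n(n - 1)/2 entries.
print(len(pairs), n * (n - 1) // 2)
```

Under this assumption, an operator working with the 11 parameters would have 55 candidate plots to inspect, which is the burden the PCA step is meant to remove.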
Summary of the invention
To solve the above problems, the object of the present invention is to provide a method for fast automatic grouping of flow cytometry data, comprising the following steps. Step 1: process the flow cytometry data using PCA, with the following sub-steps: 1) standardize the sample matrix X to obtain the standardized matrix X*; 2) compute its correlation coefficient matrix and perform eigendecomposition to obtain the eigenvalues (λ1 ≥ λ2 ≥ … ≥ λp) and the corresponding eigenvectors a1, a2, …, ap; 3) determine the number k of principal components according to the variance contribution rates of the principal components; 4) form the matrix U = [a1, a2, …, ak] from the eigenvectors corresponding to the first k principal components, and obtain the matrix W = X* U (the product of X* and U), which holds the sample data projected onto the k principal component vectors. Step 2: cluster the flow cytometry data using the improved K-means algorithm to obtain group labels. Step 3: set the principal components with the largest contribution rates as the coordinate axes and draw a scatter plot. Step 4: achieve automatic grouping.
Preferably, step 2 specifically comprises: determining one data point as the first initial cluster center; choosing the data point farthest from the first cluster center as the second cluster center; choosing the data point farthest from the first two cluster centers as the third cluster center; and so on, until n initial cluster centers have been determined; and finally performing clustering by iterating on the distance from each data point to the cluster centers.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory, and should not be taken as limiting the claimed content of the invention.
Brief description of the drawings
Further objects, functions, and advantages of the present invention will be illustrated by the following description of embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of the method of the present invention for fast automatic grouping of flow cytometry data;
Fig. 2 is a schematic diagram of the result obtained by drawing two-dimensional scatter plots with the traditional manual grouping method;
Fig. 3 shows the contribution rates and cumulative contribution rates of the principal components obtained after processing with the PCA method of the present invention;
Fig. 4 is a schematic diagram of the grouping result obtained with the method of the present invention.
Detailed description
The objects and functions of the present invention, and the methods for achieving them, will be explained with reference to exemplary embodiments. However, the present invention is not limited to the exemplary embodiments disclosed below; it can be implemented in different forms. The description is intended only to help those skilled in the relevant art gain a comprehensive understanding of the specific details of the invention.
Hereinafter, embodiments of the present invention are described with reference to the accompanying drawings. In the drawings, identical reference numerals denote identical or similar parts, or identical or similar steps.
The present invention proposes applying principal component analysis (PCA) to the analysis of multi-parameter flow cytometry data. The data are reduced in dimension and their features extracted, and the two principal component variables that best reflect the differences between cell subsets are used as the horizontal and vertical axes of a two-dimensional scatter plot, on which cluster analysis of the sample is carried out.
PCA is a commonly used multivariate statistical analysis technique. Following the maximum-variance principle, it uses a linear transformation to select a smaller number of significant variables in place of the original variables, reducing the dimensionality of the data while preserving as much of its useful information as possible. The PCA algorithm first standardizes the sample matrix X to obtain the standardized matrix X*; it then computes the correlation coefficient matrix and performs eigendecomposition to obtain the eigenvalues (λ1 ≥ λ2 ≥ … ≥ λp) and the corresponding eigenvectors a1, a2, …, ap; next, it determines the number k of principal components according to their variance contribution rates; finally, it forms the matrix U = [a1, a2, …, ak] from the eigenvectors corresponding to the first k principal components and obtains the matrix W = X* U (the product of X* and U), which holds the sample data projected onto the k principal component vectors. Multi-parameter flow cytometry data are large in volume and high in dimension. The PCA method reduces the dimensionality and redundancy of the data; the principal component variables are then selected as new feature variables, the coordinate axes are set automatically, the scatter plot is drawn, and automatic grouping is achieved.
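The four PCA sub-steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `pca_reduce`, the 0.85 cumulative-contribution threshold used to pick k, and the synthetic demo data are all hypothetical, and U is taken to be the matrix whose columns are the eigenvectors a1 … ak.

```python
import numpy as np


def pca_reduce(X, min_cumulative_contribution=0.85):
    """PCA on the correlation matrix of X (rows = events, columns = parameters).

    The threshold for choosing k is an assumption; the document only says that
    k is determined from the variance contribution rates.
    """
    X = np.asarray(X, dtype=float)
    # 1) Standardize every column to zero mean and unit variance (gives X*).
    #    Assumes no column is constant, otherwise the division fails.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2) Correlation coefficient matrix and its eigendecomposition,
    #    sorted so that the eigenvalues are in descending order.
    R = np.corrcoef(X_std, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 3) Contribution rate of each component; k is the smallest number of
    #    components whose cumulative contribution reaches the threshold.
    contribution = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(contribution)
    k = int(np.searchsorted(cumulative, min_cumulative_contribution)) + 1
    k = min(k, len(eigenvalues))
    # 4) U holds the first k eigenvectors as columns; W = X* U.
    U = eigenvectors[:, :k]
    W = X_std @ U
    return W, contribution, k


# Synthetic demo data (hypothetical): 200 events, 5 parameters, two of them
# strongly correlated so that fewer than 5 components carry the variance.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)
W, contribution, k = pca_reduce(X_demo)
print(k, np.round(contribution, 3))
```

The first two columns of `W` are the coordinates that would serve as the scatter-plot axes. `numpy.linalg.eigh` is used because the correlation matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sorting.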
K-means is a typical distance-based clustering algorithm that is fast, simple, and efficient. The present method uses an improved K-means algorithm to gate cells automatically. The improvement mainly concerns how the initial cluster centers are determined: the traditional K-means algorithm usually selects n values at random as the initial cluster centers, which makes the clustering result unstable. In the present method, one data point is first determined as the first initial cluster center; the data point farthest from the first cluster center is then chosen as the second cluster center; next, the data point farthest from the first two cluster centers is chosen as the third cluster center; and so on, until n initial cluster centers have been determined. Finally, clustering is carried out by iterating on the distance from each data point to the cluster centers.
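The initialization described above can be sketched as a farthest-point selection followed by ordinary K-means iterations. Two details are assumptions, because the text does not fix them: the first center is simply the data point at `first_index`, and "farthest from the centers already chosen" is read as the point whose distance to its nearest chosen center is largest. The function names and the demo data are hypothetical.

```python
import numpy as np


def farthest_point_centers(points, n_clusters, first_index=0):
    """Pick initial centers one by one, each as far as possible from the rest."""
    points = np.asarray(points, dtype=float)
    chosen = [first_index]
    # Distance from every point to its nearest already-chosen center.
    nearest = np.linalg.norm(points - points[first_index], axis=1)
    for _ in range(1, n_clusters):
        next_index = int(np.argmax(nearest))
        chosen.append(next_index)
        to_new = np.linalg.norm(points - points[next_index], axis=1)
        nearest = np.minimum(nearest, to_new)
    return points[chosen].copy()


def kmeans(points, n_clusters, max_iter=100):
    """Standard K-means iterations started from the deterministic centers."""
    points = np.asarray(points, dtype=float)
    centers = farthest_point_centers(points, n_clusters)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        new_centers = centers.copy()
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical demo: three well-separated groups of 50 points each.
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
blobs = np.vstack([c + 0.3 * rng.normal(size=(50, 2)) for c in true_centers])
labels, centers = kmeans(blobs, 3)
print(np.bincount(labels))
```

Because no random choice is involved, repeated runs on the same data return the same labels, which is the stability the improvement aims for. In the method of this document the input points would be the rows of the PCA score matrix W.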
The method provided by the present invention achieves automatic grouping of flow cytometry data without manually setting the coordinate parameters of the scatter plot: the first two or three principal components with the largest contribution rates obtained from the processing are automatically set as the coordinate axes. In addition, the improved K-means clustering algorithm is used to cluster the processed data, giving a classification label for each event and thereby gating the different cell subsets. Fig. 1 is a flow chart of the method of the present invention for fast automatic grouping of flow cytometry data. The automatic grouping results of the method are consistent with those of traditional manual grouping, while the analysis time is far shorter than that of manual analysis, which improves both the efficiency of cell grouping and the reliability of the grouping results. The method has good application prospects in the analysis of multi-parameter flow cytometry data and can also be applied to other areas of biomedical data analysis. Fig. 2 is a schematic diagram of the result obtained by drawing two-dimensional scatter plots with the traditional manual grouping method. Fig. 3 shows the contribution rates and cumulative contribution rates of the principal components obtained after processing with the PCA method of the present invention. Fig. 4 is a schematic diagram of the grouping result obtained with the present invention. Comparing Fig. 2 with Fig. 4, the grouping obtained with the present invention is superior to that of the traditional grouping method.
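One way to quantify how consistent the automatic labels are with manual gating is sketched below. The document does not state how its accuracy figures were computed, so this is only an assumed measure: each cluster is mapped to the manual label most common among its events, and the fraction of events that agree with that mapping is reported. The function name `agreement_rate` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def agreement_rate(cluster_labels, manual_labels):
    """Fraction of events whose manual label is the majority label of their cluster."""
    if len(cluster_labels) != len(manual_labels) or not manual_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    # Count manual labels inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, manual in zip(cluster_labels, manual_labels):
        by_cluster[cluster][manual] += 1
    # Events that carry the majority manual label of their cluster count as matches.
    matched = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return matched / len(manual_labels)


# Toy example (not data from the patent): 8 events, 3 clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
manual = ["CD3+", "CD3+", "CD19+", "CD19+", "CD19+", "CD56+", "CD56+", "CD56+"]
rate = agreement_rate(clusters, manual)
print(rate)
```

This measure allows two clusters to map to the same manual label, so it is best read together with a check that the number of clusters equals the number of expected cell subsets.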
Flow cytometry experimental data from human peripheral blood lymphocytes were used as the object of processing. The sample comprises 4811 cells and 3 lymphocyte surface differentiation antigens (CD3+, CD19+, and CD56+). The flow cytometry data of each cell comprise 11 parameters: pulse height (FITC-H, PE-H, APC-H), pulse area (FSC-A, SSC-A, FITC-A, PE-A, APC-A), and pulse width (FITC-W, PE-W, APC-W).
Table 1: Eigenvalues and eigenvectors of the principal components with the largest contribution rates, PC1 and PC2
Table 2: Accuracy of the PCA grouping results
Other embodiments of the present invention will be readily apparent to those skilled in the art from the description and practice of the invention disclosed here. The description and embodiments are to be regarded as exemplary only; the true scope and spirit of the invention are defined by the claims.

Claims (2)

1. A method for fast automatic grouping of flow cytometry data, the method comprising the following steps:
Step 1: process the flow cytometry data using PCA, with the following sub-steps:
1) standardize the sample matrix X to obtain the standardized matrix X*;
2) compute its correlation coefficient matrix and perform eigendecomposition to obtain the eigenvalues (λ1 ≥ λ2 ≥ … ≥ λp) and the corresponding eigenvectors a1, a2, …, ap;
3) determine the number k of principal components according to the variance contribution rates of the principal components;
4) form the matrix U = [a1, a2, …, ak] from the eigenvectors corresponding to the first k principal components, and obtain the matrix W = X* U (the product of X* and U), which holds the sample data projected onto the k principal component vectors;
Step 2: cluster the flow cytometry data using the improved K-means algorithm to obtain group labels;
Step 3: set the principal components with the largest contribution rates as the coordinate axes and draw a scatter plot;
Step 4: achieve automatic grouping.
2. The method according to claim 1, wherein step 2 specifically comprises: determining one data point as the first initial cluster center; choosing the data point farthest from the first cluster center as the second cluster center; choosing the data point farthest from the first two cluster centers as the third cluster center; and so on, until n initial cluster centers have been determined; and finally performing clustering by iterating on the distance from each data point to the cluster centers.
Application CN201610943348.3A, priority date 2016-11-01, filing date 2016-11-01: Fast automatic grouping method for flow cytometry data. Status: Pending. Publication: CN106548204A (en)

Publications (1)

Publication Number: CN106548204A (en)
Publication Date: 2017-03-29

Family

ID=58393603

Country Status (1)

Country Link
CN (1) CN106548204A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 2017-03-29