CN113588524A - Method for removing gadolinium isotope channel pollution in mass spectrum flow data - Google Patents
- Publication number: CN113588524A (application number CN202110904965.3A)
- Authority: CN (China)
- Legal status: Granted (status as listed by Google Patents, which notes that it is an assumption and not a legal conclusion)
Classifications
- G01N15/14: Optical investigation techniques, e.g. flow cytometry (G01N15/00 Investigating characteristics of particles; investigating permeability, pore-volume or surface-area of porous materials > G01N15/10 Investigating individual particles)
- G01N1/34: Purifying; Cleaning (G01N1/00 Sampling; preparing specimens for investigation > G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q)
- G01N27/62: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode (G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means)
Abstract
The invention discloses a method for removing gadolinium isotope channel contamination from mass spectrum flow (mass cytometry) data. The method exploits the collinearity of the contamination signals in the gadolinium isotope channels (¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd). Provided that the panel is designed in advance so that the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed on the same cell, the method calculates an estimate of each single cell's contamination signal in the gadolinium isotope channels and obtains corrected data from which the gadolinium isotope channel contamination has been removed. Because it estimates and removes only the contamination signal instead of discarding contaminated cells, the method avoids the bias caused by artificially changing the cellular composition of the tested sample. It improves the quality of mass cytometry data and has important application value for subsequent data analysis.
Description
Technical Field
The invention relates to the technical field of mass cytometry (mass spectrum flow cytometry), and in particular to a method for removing gadolinium isotope channel contamination from mass cytometry data.
Background
Gadolinium (Gd) isotope channel contamination is common in mass spectrometry flow data obtained by detecting local tissue samples of patients using mass cytometry.
Mass cytometry (CyTOF) is a single-cell, high-throughput, high-dimensional detection technology. Protein molecular markers on the cell surface and inside cells are labeled with specific antibodies coupled to rare metals, and the amount of each labeling metal is quantified precisely on the principle of mass spectrometry. Compared with conventional flow cytometry, it offers higher throughput and more precise detection signals. More than 40 rare-metal channels are currently available for coupling antibodies, and the gadolinium isotope channels (¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd) are among those commonly used in CyTOF detection.
Gadolinium (Gd) is a metal element with the isotopes ¹⁵²Gd, ¹⁵⁴Gd to ¹⁵⁸Gd, and ¹⁶⁰Gd.
Paramagnetic gadolinium chelates are currently the most commonly used contrast agents for magnetic resonance imaging (MRI); they improve imaging quality, increase image contrast and improve the diagnosis of clinical conditions. In the late 1980s, after extensive animal experiments, gadolinium diethylenetriaminepentaacetate (Gd-DTPA) was approved by the U.S. Food and Drug Administration (FDA) for clinical diagnosis. Gadolinium-based contrast agents (GBCA) are given by intravenous injection, rapidly reach a concentration equilibrium in the blood vessels and extracellular fluid, and also enter cells (including liver and kidney tissue cells) by passive diffusion or through particular transport channels. At low injected doses, clinically approved GBCA essentially do not enter human cells, apart from small amounts that may remain in hepatocytes, and are rapidly and completely excreted by people with normal renal function. At higher injected doses, or in people with defective or impaired renal function, GBCA metabolism slows markedly: the half-life extends from hours to days, and the agent may even be retained in multiple organs (including skin and bone). Patients who have received GBCA may therefore retain gadolinium isotopes in local tissue cells to varying degrees; the amount retained, how long it is retained and how it is distributed among cells may vary with the individual and with the GBCA dose.
When local tissue from a patient who has received GBCA is surgically excised and analyzed by CyTOF, the gadolinium isotope channels of the resulting data may contain two kinds of detection signal. One comes from the gadolinium isotopes coupled to antibodies; this is the target signal of the detection. The other comes from gadolinium isotopes retained in cells after recent GBCA use; it is not a target signal, it interferes with the precision and accuracy of target signal detection, and it is referred to here as the contamination signal. In the CyTOF analysis of renal tissue samples from patients with clear cell renal cell carcinoma published by Bernd Bodenmiller et al. in 2017 (Cell, vol. 169, pp. 736-749), cells carrying gadolinium isotope contamination signals were clearly observed in the renal tissue samples. Their 2019 CyTOF study of breast tissue from breast cancer patients (Cell, vol. 177, pp. 1-16) again clearly showed cells carrying gadolinium isotope contamination signals. To remove these contamination signals, they discarded all data detected from cells carrying gadolinium isotope contamination signals, which is equivalent to removing those cells entirely. This approach directly changes the cellular composition of the tested sample and easily leads to biased analytical conclusions. Effectively estimating and removing the gadolinium isotope channel contamination signal in CyTOF data is therefore important for ensuring the accuracy of CyTOF detection, the quality of CyTOF data and the validity of conclusions drawn from subsequent data analysis.
Disclosure of Invention
The object of the present invention is to provide a method for estimating and removing gadolinium isotope contamination in mass spectrum flow (mass cytometry) data. The method exploits the collinearity of the contamination signals in the gadolinium isotope channels (¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd) and, on the condition that the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed on the same cell, calculates an estimate of each single cell's contamination signal in the gadolinium isotope channels and obtains corrected data from which the gadolinium isotope channel contamination has been removed.
The invention adopts the following technical scheme:
a method for removing gadolinium isotope channel pollution in mass spectrum flow data comprises the following steps:
1) For CyTOF detection, at least two and at most five gadolinium-isotope-coupled antibodies are designed such that the biomarkers they specifically label cannot be co-expressed in the same cell, the gadolinium isotopes being selected from ¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd;
2) estimate the contamination signal intensity ratio $R_{\mathrm{Gd}}$ between the gadolinium isotope channels;
3) estimate the single-cell contamination degree coefficient $k$;
4) based on the estimated $R_{\mathrm{Gd}}$ and $k$, calculate the estimated contamination signal of the gadolinium isotope channels and obtain the corrected data, with the gadolinium isotope channel contamination removed, by the following formula:

$$\mathrm{Data}_{\mathrm{corrected}} = \max\left(0,\; \mathrm{Data}_{\mathrm{observed}} - k\,R_{\mathrm{Gd}}\right) + \mathrm{noise}$$

where $\mathrm{Data}_{\mathrm{observed}}$ denotes the data-preprocessed CyTOF data of the gadolinium isotope channels (if the sample contains gadolinium contamination, these are the data before the contamination is removed); $\mathrm{Data}_{\mathrm{corrected}}$ denotes the corrected CyTOF data of the gadolinium isotope channels with the gadolinium contamination removed; and noise denotes the background noise signal of the gadolinium isotope channels.
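Step 4) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the patent's implementation: the function name `correct_gd_channels`, the list-of-rows data layout and the uniform background-noise model are assumptions, because the text does not say how the noise term is generated. The product $k\,R_{\mathrm{Gd}}$ is formed per cell, so each row is corrected by its own coefficient scaled by the shared channel ratios.

```python
import random


def correct_gd_channels(data_observed, k, r_gd, noise_scale=0.0, seed=0):
    """Sketch of Data_corrected = max(0, Data_observed - k * R_Gd) + noise.

    data_observed: n cells, each a list of 5 signals (155Gd..158Gd, 160Gd).
    k:             n per-cell contamination coefficients.
    r_gd:          5 channel ratios relative to the reference channel.
    noise_scale:   width of an ASSUMED uniform background noise; the text
                   does not specify a noise model, so 0 disables it here.
    """
    rng = random.Random(seed)
    corrected = []
    for row, k_cell in zip(data_observed, k):
        new_row = []
        for x, r in zip(row, r_gd):
            # Remove this cell's estimated contamination and clip at zero.
            value = max(0.0, x - k_cell * r)
            if noise_scale > 0:
                value += rng.uniform(0.0, noise_scale)  # hypothetical noise term
            new_row.append(value)
        corrected.append(new_row)
    return corrected


# One toy cell: contamination with k = 2 plus antibody signal 5.0 in the 160Gd channel.
print(correct_gd_channels([[2.0, 2.6, 2.1, 3.4, 8.0]], [2.0], [1.0, 1.3, 1.05, 1.7, 1.5]))
# [[0.0, 0.0, 0.0, 0.0, 5.0]]
```

In the toy cell the four contamination-only channels are cleared and only the antibody signal survives in the ¹⁶⁰Gd channel; the `max(0, ...)` clip keeps over-subtracted channels from going negative.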
Further, step 2) estimates $R_{\mathrm{Gd}}$ using the collinearity of the gadolinium isotope channel contamination signals:
Select the cells of $\mathrm{Data}_{\mathrm{observed}}$ that rank in the top 5% to 10% by average signal intensity, and calculate the contamination signal intensity ratios of the different gadolinium isotope channels with a linear regression model. Specifically, one gadolinium isotope channel is taken as the gadolinium isotope reference channel, and the intensity ratio of the contamination signal of each of the ¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd channels to that of the reference channel is calculated with the following formula:

$$\mathrm{Data}_{\mathrm{observed}}^{(i)} = R_{\mathrm{Gd}}^{(i)}\,\mathrm{Data}_{\mathrm{observed}}^{(\mathrm{ref})} + b + \varepsilon, \qquad i = 1, 2, \ldots, 5$$

where each value of $i$ represents a gadolinium isotope channel, $i = 1, 2, \ldots, 5$ corresponding to ¹⁵⁵Gd, ¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd and ¹⁶⁰Gd respectively; $R_{\mathrm{Gd}}^{(i)}$ is the intensity ratio of the contamination signal of channel $i$ to that of the reference channel; $\mathrm{Data}_{\mathrm{observed}}^{(\mathrm{ref})}$ is the detection signal of the reference channel in the data-preprocessed CyTOF data; $b$ is the constant term of the univariate linear regression; $\varepsilon$ is the deviation of the data from linearity; and $\mathrm{Data}_{\mathrm{observed}}^{(i)}$ is the detection signal of channel $i$ in the data-preprocessed CyTOF data;
or step 2) estimates $R_{\mathrm{Gd}}$ from the natural abundance ratios of the gadolinium isotopes.
Preferably, step 2) estimates $R_{\mathrm{Gd}}$ using the collinearity of the gadolinium isotope channel contamination signals, by the linear regression described above.
Further, the contamination signal intensity ratios of the different gadolinium isotope channels are calculated by the linear regression model using the cells of $\mathrm{Data}_{\mathrm{observed}}$ that rank in the top 5% by average signal intensity.
Further, step 3) estimates $k$ for each cell by an L1-norm optimization method:

$$k = \arg\min_{k} \lVert \mathrm{Data}_{\mathrm{Ab}} \rVert_1 = \arg\min_{k} \sum_{i=1}^{5} \left| \mathrm{Data}_{\mathrm{observed}}^{(i)} - k\,R_{\mathrm{Gd}}^{(i)} \right|$$

or step 3) estimates $k$ by an L2-norm optimization method:

$$k = \arg\min_{k} \lVert \mathrm{Data}_{\mathrm{Ab}} \rVert_2 = \arg\min_{k} \sqrt{\sum_{i=1}^{5} \left( \mathrm{Data}_{\mathrm{observed}}^{(i)} - k\,R_{\mathrm{Gd}}^{(i)} \right)^2}$$

or step 3) calculates, for each single gadolinium isotope channel, an estimate $k^{(i)}$ of the single-cell contamination degree coefficient and takes its minimum as $k$:

$$k^{(i)} = \frac{\mathrm{Data}_{\mathrm{observed}}^{(i)}}{R_{\mathrm{Gd}}^{(i)}}, \qquad k = \min_{i} k^{(i)}$$

or step 3) calculates the per-channel estimates $k^{(i)}$ and takes their mean as $k$:

$$k = \frac{1}{5} \sum_{i=1}^{5} k^{(i)}$$

or step 3) calculates the per-channel estimates $k^{(i)}$ and takes their median as $k$:

$$k = \operatorname{median}_{i}\, k^{(i)}$$

where $\mathrm{Data}_{\mathrm{Ab}}$ is the detection signal, in the data-preprocessed gadolinium isotope channel CyTOF data, that comes from the biomarkers specifically labeled by the coupled antibodies ($\mathrm{Data}_{\mathrm{Ab}} = \mathrm{Data}_{\mathrm{observed}} - k\,R_{\mathrm{Gd}}$ for each cell); each value of $i$ represents a gadolinium isotope channel, $i = 1, 2, \ldots, 5$ corresponding to ¹⁵⁵Gd, ¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd and ¹⁶⁰Gd respectively; $\mathrm{Data}_{\mathrm{observed}}^{(i)}$ is the detection signal of channel $i$ in the data-preprocessed CyTOF data; $R_{\mathrm{Gd}}^{(i)}$ is the intensity ratio of the contamination signal of channel $i$ to that of the gadolinium isotope reference channel; and $k^{(i)}$ is the estimate of the single cell's contamination degree coefficient in channel $i$.
Preferably, step 3) estimates $k$ by the L1-norm optimization method.
The beneficial effects of the invention are as follows:
The invention discloses a method for estimating and removing the gadolinium isotope channel contamination signal in mass cytometry data, which avoids the bias introduced when contaminated cells are removed outright and the cellular composition of the tested sample is thereby artificially changed. The technique is mainly intended for mass cytometry analysis of local tissue samples from patients who were injected with gadolinium-based contrast agents for magnetic resonance imaging, and it has important application value for improving the quality of mass cytometry data and of subsequent data analysis.
Drawings
FIG. 1 illustrates the collinearity of the gadolinium isotope channel contamination signals:
A. Pairwise scatter plots of the gadolinium isotope channels. Left: sample data without gadolinium contamination signal; right: sample data with gadolinium contamination signal.
B. Pairwise correlation coefficients of the gadolinium isotope channels. Left: correlation coefficients between pairs of gadolinium isotope channel detection signals for a sample without gadolinium contamination signal; right: the same for a sample with gadolinium contamination signal.
C. Pairwise correlation coefficients of the corresponding channel detection signals for the same gadolinium-contaminated sample with and without antibodies coupled to the gadolinium isotope channels. Left: no antibodies coupled to the gadolinium isotope channels, so all detection signals come from gadolinium contamination; right: antibodies coupled to the gadolinium isotope channels.
D. Multiple linear regression analysis of antibody-coupled gadolinium isotope channel signals. The detection signals of the ¹⁵⁶Gd, ¹⁵⁸Gd and ¹⁶⁰Gd channels, coupled to antibodies against γδTCR, CD19 and CD33 respectively, were each regressed on the four channels shown on the vertical axis, and the resulting regression coefficients are shown. Taking the γδTCR-coupled ¹⁵⁶Gd channel as an example, its detection signal is linearly related mainly to the ¹⁵⁵Gd contamination signal (a channel with no coupled antibody) and to the non-Gd channel ¹⁴²Nd-γδTCR, which carries the same γδTCR antibody, and is independent (coefficient 0) of the metal channels coupled to other antibodies, ¹⁴⁸Nd-CD19 and ¹⁴⁹Sm-CD33. This indicates that the signal of a contaminated Gd channel is a linear superposition of the Gd contamination signal and the coupled-antibody signal.
E. With ¹⁵⁵Gd as the reference channel, $R_{\mathrm{Gd}}$ was calculated by linear regression with formula (4); its correlation with the natural abundance ratios of the gadolinium isotopes reaches a correlation coefficient of 0.96.
FIG. 2 shows the calculation of the contamination signal intensity ratio $R_{\mathrm{Gd}}$ between the gadolinium isotope channels:
For a gadolinium-contaminated sample detected with no antibodies coupled to the gadolinium isotope channels, the gadolinium isotope contamination signals were analyzed with ¹⁵⁵Gd as the reference channel. Linear regression was performed on the cells ranking in the top n% by average gadolinium isotope channel signal intensity (n increasing stepwise from 5% to 95%), and the resulting signal ratios of the other gadolinium isotope channels relative to the ¹⁵⁵Gd channel (blue line) are compared with the natural abundance ratios of the gadolinium isotopes (red line).
FIG. 3 illustrates Example 1:
A. Gd channel scatter plots of the synthetic gadolinium-contaminated CyTOF data;
B. Gd channel correlation coefficients of the synthetic gadolinium-contaminated CyTOF data;
C. Proportions of the 25 cell subsets obtained by clustering the 7 datasets;
D. Correlation, in terms of the proportions of the 25 cell subsets, between the contamination-free data (raw) and both the synthetic gadolinium-contaminated data and the corrected data obtained with formulas (5) to (9).
FIG. 4 illustrates Example 1 (continued):
A. Heatmap of the 25 cell subsets obtained by cluster analysis of the 7 datasets: each row is a cell subset and each column the expression of one protein molecular marker;
B. Comparison of the average expression intensity of the gadolinium isotope channels across the 7 datasets;
C. Correlation coefficients of the 25 cell subsets between the contamination-free data (raw) and both the synthetic gadolinium-contaminated data and the corrected data obtained with formulas (5) to (9);
D. Performance indices of the similarity between the contamination-free data (raw) and both the synthetic gadolinium-contaminated data and the corrected data obtained with formulas (5) to (9);
E. Multiple linear regression coefficient analysis of the antibody-coupled gadolinium isotope channels for the synthetic gadolinium-contaminated data and the 1DNorm-corrected data.
Detailed Description
The invention is explained in more detail below with reference to exemplary embodiments and the accompanying drawings. The following examples are provided only for illustrating the present invention and are not intended to limit the scope of the present invention.
The basic principle of the invention is as follows:
First, the data-preprocessed CyTOF data of the gadolinium isotope channels (¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd) are defined as $\mathrm{Data}_{\mathrm{observed}}$. Off-machine preprocessing of CyTOF data is a conventional operation that usually includes barcode decoding, quality control and the like. If the sample contains gadolinium contamination, $\mathrm{Data}_{\mathrm{observed}}$ is the gadolinium isotope channel data before the contamination is removed. It is an n × 5 numerical matrix in which each row is a valid single cell, each column is a gadolinium isotope channel, and n is the number of valid single cells in the tested sample. The part of the detection signal that comes from the biomarkers (protein molecules) specifically labeled by the coupled antibodies is $\mathrm{Data}_{\mathrm{Ab}}$; this is the normal signal to be detected. The contamination signal from gadolinium isotope residues caused by GBCA use is $\mathrm{Data}_{\mathrm{Gd}}$; this is the contamination signal to be removed. The three are related by formula (1):

$$\mathrm{Data}_{\mathrm{observed}} = \mathrm{Data}_{\mathrm{Ab}} + \mathrm{Data}_{\mathrm{Gd}} \qquad (1)$$

where $\mathrm{Data}_{\mathrm{Ab}}$ and $\mathrm{Data}_{\mathrm{Gd}}$ are also n × 5 numerical matrices, with rows and columns defined as for $\mathrm{Data}_{\mathrm{observed}}$.
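Formula (1), together with the design condition that the Gd-labelled markers are never co-expressed, can be illustrated with a small simulation. Everything in it is a hypothetical assumption made for illustration: the function names, the marker frequency, the uniform antibody signal and the exponential contamination coefficient are invented, and $R_{\mathrm{Gd}}$ is taken from the natural-abundance ratios given later in this description. Without contamination the channels are mutually exclusive and therefore negatively correlated; adding $k\,R_{\mathrm{Gd}}$ makes them positively correlated, which is the collinearity the method relies on.

```python
import random

# Channel order: 155Gd, 156Gd, 157Gd, 158Gd, 160Gd.
# ASSUMED ratios: natural-abundance values relative to 155Gd.
R_GD = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]


def simulate_cells(n, seed=0, contaminated=True):
    """Hypothetical generator for Data_observed = Data_Ab + Data_Gd.

    Each cell expresses at most one of the five Gd-labelled markers
    (the non-co-expression design condition). Contamination is rank one:
    Data_Gd = k * R_Gd with one coefficient k per cell.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        data_ab = [0.0] * 5
        marker = rng.randrange(6)  # 0-4: that marker is expressed; 5: none
        if marker < 5:
            data_ab[marker] = rng.uniform(5.0, 20.0)
        k = rng.expovariate(0.2) if contaminated else 0.0  # mean k = 5
        rows.append([ab + k * r for ab, r in zip(data_ab, R_GD)])
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def column(rows, j):
    """Extract channel j from a list of cells."""
    return [row[j] for row in rows]


clean = simulate_cells(3000, seed=1, contaminated=False)
dirty = simulate_cells(3000, seed=1, contaminated=True)
print(round(pearson(column(clean, 0), column(clean, 1)), 2))  # negative
print(round(pearson(column(dirty, 0), column(dirty, 1)), 2))  # clearly positive
```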
When cells contain residual gadolinium isotope signals, $\mathrm{Data}_{\mathrm{observed}}$ shows pairwise collinearity between different channels, and cells with higher signal intensity show multicollinearity (FIG. 1). Under the preset experimental condition that the biomarkers (protein molecules) specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed in the same cell, the pairwise collinearity between channels of $\mathrm{Data}_{\mathrm{observed}}$ must be caused by the gadolinium contamination signal. This collinearity can therefore be used to estimate the contamination signal intensity ratio between the gadolinium isotope channels, $R_{\mathrm{Gd}}$ (a 1 × 5 numerical vector shared by all cells), and the single-cell contamination degree coefficient $k$ (an n × 1 numerical vector), and the estimated contamination signal of the gadolinium isotope channels is calculated with formula (2):

$$\widehat{\mathrm{Data}}_{\mathrm{Gd}} = k\,R_{\mathrm{Gd}} \qquad (2)$$

where $\widehat{\mathrm{Data}}_{\mathrm{Gd}}$ (an n × 5 matrix) is the estimate, in the data-preprocessed gadolinium isotope channel CyTOF data, of the contamination signal from gadolinium isotope residues caused by GBCA use. Subtracting this estimate from the detection signal gives the estimated target detection signal, that is, the corrected data with gadolinium isotope channel contamination removed, by formula (3):

$$\mathrm{Data}_{\mathrm{corrected}} = \max\left(0,\; \mathrm{Data}_{\mathrm{observed}} - \widehat{\mathrm{Data}}_{\mathrm{Gd}}\right) + \mathrm{noise} \qquad (3)$$
where $\mathrm{Data}_{\mathrm{corrected}}$ is the corrected CyTOF data of the gadolinium isotope channels with gadolinium contamination removed, and noise is the background noise signal of the gadolinium isotope channels. The noise term is added so that the correction does not produce exact zero values. Artificially adding background noise does not change the "positive" or "negative" expression pattern of the target molecules and therefore does not affect the results of subsequent cluster analysis.
Specifically, $R_{\mathrm{Gd}}$, the ratio of the contamination signal intensities between the gadolinium isotope channels, is estimated by one of the following methods:
(1) Estimate $R_{\mathrm{Gd}}$ using the collinearity of the gadolinium isotope channel contamination signals (FIG. 2). Select the cells of $\mathrm{Data}_{\mathrm{observed}}$ ranking in the top 5% to 10% by average signal intensity, preferably the top 5%, and calculate the contamination signal intensity ratios of the different gadolinium isotope channels with a linear regression model. Taking the ¹⁵⁵Gd channel as the gadolinium isotope reference channel (¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd or ¹⁶⁰Gd may also serve as the reference channel), the intensity ratio of the contamination signal of each of the ¹⁵⁵Gd to ¹⁵⁸Gd and ¹⁶⁰Gd channels to that of the ¹⁵⁵Gd channel is calculated with formula (4):

$$\mathrm{Data}_{\mathrm{observed}}^{(i)} = R_{\mathrm{Gd}}^{(i)}\,\mathrm{Data}_{\mathrm{observed}}^{(\mathrm{ref})} + b + \varepsilon, \qquad i = 1, 2, \ldots, 5 \qquad (4)$$

where each value of $i$ represents a gadolinium isotope channel, $i = 1, 2, \ldots, 5$ corresponding to ¹⁵⁵Gd, ¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd and ¹⁶⁰Gd respectively; $R_{\mathrm{Gd}}^{(i)}$ is the intensity ratio of the contamination signal of channel $i$ to that of the ¹⁵⁵Gd channel; $\mathrm{Data}_{\mathrm{observed}}^{(\mathrm{ref})}$ is the detection signal of the reference channel (here ¹⁵⁵Gd) in the data-preprocessed CyTOF data; $b$ is the constant term of the univariate linear regression; $\varepsilon$ is the deviation of the data from linearity; and $\mathrm{Data}_{\mathrm{observed}}^{(i)}$ is the detection signal of channel $i$ in the data-preprocessed CyTOF data.
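Formula (4) amounts to one univariate least-squares fit per channel on the highest-intensity cells. The sketch below is a plain-Python illustration under stated assumptions: the helper names are hypothetical, cells are ranked by their mean signal over the five Gd channels, and the top 5% are kept. Only the slope is returned as $R_{\mathrm{Gd}}^{(i)}$; the intercept $b$ absorbs any constant offset.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = slope * x + b (b is the intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) * (xi - mx) for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def estimate_r_gd(data_observed, ref=0, top_fraction=0.05):
    """Sketch of formula (4): regress each Gd channel on the reference channel.

    data_observed: list of cells, each a list of 5 Gd channel signals.
    ref:           index of the reference channel (0 = 155Gd).
    top_fraction:  share of cells kept, ranked by mean Gd signal (5% to 10%).
    """
    n_top = max(2, int(len(data_observed) * top_fraction))
    ranked = sorted(data_observed, key=lambda row: sum(row) / len(row), reverse=True)
    top = ranked[:n_top]
    x_ref = [row[ref] for row in top]
    return [ols_slope(x_ref, [row[i] for row in top]) for i in range(len(top[0]))]


# Toy data: pure contamination k * R_Gd for k = 1..200 (no antibody signal).
true_r = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
cells = [[k * r for r in true_r] for k in range(1, 201)]
print([round(v, 4) for v in estimate_r_gd(cells)])
# [1.0, 1.3831, 1.0574, 1.6784, 1.477]
```

Restricting the fit to the brightest cells follows the text's rationale: there the contamination term dominates, so the slopes are driven mainly by $k\,R_{\mathrm{Gd}}$ rather than by antibody signal or background.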
(2) Alternatively, estimate $R_{\mathrm{Gd}}$ from the natural abundance ratios of the gadolinium isotopes. This empirical, direct estimation method avoids the linear regression and other calculations required in (1) (FIG. 1E). With the ¹⁵⁵Gd channel as the reference channel (¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd or ¹⁶⁰Gd may also be used as the reference channel), $R_{\mathrm{Gd}}$ = [1, 1.3831, 1.0574, 1.6784, 1.4770].
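The abundance-based vector can be reproduced from tabulated natural isotopic abundances of gadolinium. The sketch assumes the IUPAC representative abundances (¹⁵⁵Gd 14.80%, ¹⁵⁶Gd 20.47%, ¹⁵⁷Gd 15.65%, ¹⁵⁸Gd 24.84%, ¹⁶⁰Gd 21.86%); dividing by the reference channel's abundance and rounding to four decimals gives the values quoted in method (2). The function name is illustrative.

```python
# ASSUMED natural isotopic abundances of gadolinium (atom %), IUPAC values.
GD_ABUNDANCE = {
    "155Gd": 14.80,
    "156Gd": 20.47,
    "157Gd": 15.65,
    "158Gd": 24.84,
    "160Gd": 21.86,
}
CHANNELS = ["155Gd", "156Gd", "157Gd", "158Gd", "160Gd"]


def r_gd_from_abundance(reference="155Gd"):
    """Contamination intensity ratios predicted by natural abundance alone."""
    base = GD_ABUNDANCE[reference]
    return [round(GD_ABUNDANCE[c] / base, 4) for c in CHANNELS]


print(r_gd_from_abundance())  # [1.0, 1.3831, 1.0574, 1.6784, 1.477]
```

Passing a different `reference` rescales the vector so that the chosen reference channel equals 1, matching the option of using another Gd channel as the reference.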
Specifically, $k$ is estimated by one of the following methods:
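(1) Estimate the single-cell contamination degree coefficient $k$ by optimization. The principle is to obtain the largest plausible estimate of a single cell's contamination level by minimizing the detection signal attributed to the coupled antibodies, $\mathrm{Data}_{\mathrm{Ab}} = \mathrm{Data}_{\mathrm{observed}} - k\,R_{\mathrm{Gd}}$. For each cell, the minimization can use the L1-norm, formula (5), or the L2-norm, formula (6):

$$k = \arg\min_{k} \lVert \mathrm{Data}_{\mathrm{Ab}} \rVert_1 = \arg\min_{k} \sum_{i=1}^{5} \left| \mathrm{Data}_{\mathrm{observed}}^{(i)} - k\,R_{\mathrm{Gd}}^{(i)} \right| \qquad (5)$$

$$k = \arg\min_{k} \lVert \mathrm{Data}_{\mathrm{Ab}} \rVert_2 = \arg\min_{k} \sqrt{\sum_{i=1}^{5} \left( \mathrm{Data}_{\mathrm{observed}}^{(i)} - k\,R_{\mathrm{Gd}}^{(i)} \right)^2} \qquad (6)$$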
(2) Under the preset experimental condition that the biomarkers (protein molecules) specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed in the same cell, a contaminated single cell should have at least one gadolinium isotope channel whose signal comes mainly from contamination, that is, a channel whose antibody-labeled protein is not expressed on that cell. Therefore, dividing the detection signal of each channel by the corresponding $R_{\mathrm{Gd}}^{(i)}$ gives an estimate $k^{(i)}$ of the cell's contamination degree coefficient in each gadolinium isotope channel, and $k$ is taken as the minimum, mean or median of the $k^{(i)}$, according to formulas (7) to (9) respectively:

$$k^{(i)} = \frac{\mathrm{Data}_{\mathrm{observed}}^{(i)}}{R_{\mathrm{Gd}}^{(i)}}, \qquad k = \min_{i} k^{(i)} \qquad (7)$$

$$k = \frac{1}{5} \sum_{i=1}^{5} k^{(i)} \qquad (8)$$

$$k = \operatorname{median}_{i}\, k^{(i)} \qquad (9)$$
where each value of $i$ represents a gadolinium isotope channel, $i = 1, 2, \ldots, 5$ corresponding to ¹⁵⁵Gd, ¹⁵⁶Gd, ¹⁵⁷Gd, ¹⁵⁸Gd and ¹⁶⁰Gd respectively; $\mathrm{Data}_{\mathrm{observed}}^{(i)}$ is the detection signal of channel $i$ in the data-preprocessed CyTOF data; $R_{\mathrm{Gd}}^{(i)}$ is the intensity ratio of the contamination signal of channel $i$ to that of the gadolinium isotope reference channel; and $k^{(i)}$ is the estimate of the single cell's contamination degree coefficient in channel $i$.
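The five $k$ estimators are all per-cell computations and can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: the function names and the toy $R_{\mathrm{Gd}}$ vector are hypothetical, and the L1 problem is solved with the standard identity $\sum_i |x_i - k r_i| = \sum_i r_i\,|x_i/r_i - k|$, whose minimizer is a weighted median of the ratios $x_i/r_i$ with weights $r_i$ (when several minimizers exist, the sketch returns the lowest). The L2 problem has the closed form $k = \sum_i x_i r_i / \sum_i r_i^2$.

```python
import statistics


def k_l1(x, r):
    """L1-norm sketch: k minimising sum_i |x_i - k * r_i| for one cell.

    Since sum_i |x_i - k r_i| = sum_i r_i |x_i / r_i - k| (r_i > 0), a
    minimiser is the weighted median of x_i / r_i with weights r_i.
    """
    pairs = sorted((xi / ri, ri) for xi, ri in zip(x, r))
    half = sum(r) / 2.0
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return ratio
    return pairs[-1][0]  # defensive; not reached for positive weights


def k_l2(x, r):
    """L2-norm sketch: least-squares k = sum(x_i r_i) / sum(r_i^2)."""
    return sum(xi * ri for xi, ri in zip(x, r)) / sum(ri * ri for ri in r)


def per_channel_k(x, r):
    """Per-channel estimates k^(i) = x_i / R_Gd^(i)."""
    return [xi / ri for xi, ri in zip(x, r)]


def k_min(x, r):
    return min(per_channel_k(x, r))


def k_mean(x, r):
    return statistics.mean(per_channel_k(x, r))


def k_median(x, r):
    return statistics.median(per_channel_k(x, r))


# Toy cell: contamination k = 2 in all channels plus antibody signal 5.0 on 160Gd.
r_gd = [1.0, 1.3, 1.05, 1.7, 1.5]  # hypothetical ratios chosen for easy arithmetic
cell = [2.0, 2.6, 2.1, 3.4, 8.0]
for name, fn in [("L1", k_l1), ("L2", k_l2), ("Min", k_min),
                 ("Mean", k_mean), ("Median", k_median)]:
    print(name, round(fn(cell, r_gd), 3))
```

For this toy cell the L1, Min and Median estimates recover $k = 2$, while the L2 and Mean estimates are pulled upward (to about 2.84 and 2.67) by the antibody-positive ¹⁶⁰Gd channel. The Min estimate is exact here but would be dragged down by any channel whose observed signal happens to be low, which is one plausible reading of why a robust criterion such as the L1-norm can be preferable.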
Example 1
First, a gadolinium-contaminated sample was stained without any antibodies coupled to the gadolinium isotope channels and acquired on the CyTOF. The resulting ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd channel data, which therefore reflect only the gadolinium contamination, were superimposed as contamination signals onto sample data free of gadolinium channel contamination (raw, without gadolinium isotope contamination), generating synthetic simulation data (simulate, with gadolinium isotope contamination). The simulated data show the collinearity of the gadolinium isotope channels and strong pairwise correlation between channels (FIGS. 3A and 3B).
Using the $R_{\mathrm{Gd}}$ estimation method provided by the invention (formula (4)), the following estimate of $R_{\mathrm{Gd}}$ was obtained:
$R_{\mathrm{Gd}}$ = [1.000000, 1.308478, 1.004611, 1.764121, 1.714435].
Next, k was estimated with each of formulas (5) to (9) and the contaminated data were corrected with formula (3), yielding one corrected dataset per estimation method, named 1DNorm, 2DNorm, Min, Mean and Median for formulas (5) to (9), respectively. The seven datasets (raw, simulate, 1DNorm, 2DNorm, Min, Mean and Median) were clustered together and their clustering results compared. Clustering yielded 25 cell subsets (C1–C25) (FIG. 4A). Cells in subsets C13 and C14 came mainly from the simulate data (FIG. 3C), suggesting that these two are spurious subsets created by gadolinium contamination of the detection signal. Correlation analysis of the proportions of the 25 subsets in each dataset (FIG. 3D and FIG. 4C) showed that the simulate data and the Min-corrected data correlate poorly with the raw data, whereas the 1DNorm data correlate most strongly with the raw data (correlation coefficient 0.95), indicating that the 1DNorm-corrected data are closest to the original data without gadolinium contamination. Comparing the mean expression intensities of the Gd isotope channels showed that the simulate data and the Min-corrected data differ most from the raw data, whereas the data corrected by the other methods are similar to the raw data. In addition, F1 score, recall, precision, homogeneity and ARI were calculated between each dataset and the raw data, and 1DNorm performed best on these metrics. Estimating k with 1DNorm (formula (5)) therefore gives corrected data closest to the data without gadolinium isotope contamination, i.e., the best correction performance, and 1DNorm is the recommended estimation method.
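The correction step, formula (3) (the correction formula stated in claim 1), is applied cell by cell once k and $R_{\mathrm{Gd}}$ are known. A minimal Python sketch; the text does not say how the background-noise term is generated, so the uniform draw on [0, `noise_scale`] below is an assumption, as is the name `correct_cell`.

```python
import random


def correct_cell(data_observed, r_gd, k, noise_scale=0.0, rng=None):
    """Formula (3) for one cell: max(0, Data_observed - k * R_Gd) + noise.

    data_observed: preprocessed Gd-channel signals of one cell.
    r_gd: contamination ratio vector R_Gd (same channel order).
    k: this cell's contamination coefficient.
    noise_scale: ASSUMPTION, upper bound of a uniform background-noise draw;
        0 disables the noise term.
    """
    if len(data_observed) != len(r_gd):
        raise ValueError("data_observed and r_gd must have the same length")
    if rng is None:
        rng = random.Random()
    corrected = []
    for x, r in zip(data_observed, r_gd):
        residual = max(0.0, x - k * r)  # subtract estimated contamination, floor at 0
        noise = rng.uniform(0.0, noise_scale) if noise_scale > 0 else 0.0
        corrected.append(residual + noise)
    return corrected


if __name__ == "__main__":
    r_gd = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
    cell = [3.0 * r for r in r_gd]
    cell[1] += 10.0  # hypothetical antibody signal on 156Gd
    print(correct_cell(cell, r_gd, k=3.0))
```

The `max(0, …)` floor keeps over-subtraction from producing negative intensities, which CyTOF ion counts cannot have.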
The sources of the detection signals in the ¹⁵⁶Gd, ¹⁵⁸Gd and ¹⁶⁰Gd channels were further analyzed in the samples before correction (simulate data) and after 1DNorm correction (1DNorm). For the ¹⁵⁶Gd channel, coupled to anti-γδ TCR, the pre-correction detection signal was linearly correlated with both ¹⁵⁵Gd and ¹⁴²Nd-γδ TCR (red dots in the scatter plot of FIG. 4E). After 1DNorm correction, the signal was linearly correlated only with ¹⁴²Nd-γδ TCR, and its correlation coefficient with the ¹⁵⁵Gd channel was 0 (blue dots in the scatter plot of FIG. 4E), indicating that 1DNorm correction effectively removed the gadolinium contamination signal. The same was observed for the other two gadolinium channels, ¹⁵⁸Gd-CD19 and ¹⁶⁰Gd-CD33: 1DNorm correction effectively removed the influence of the gadolinium contamination signal on these channels.
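The before/after check in FIG. 4E is a channel-to-channel correlation. The sketch below assumes Pearson correlation and uses hypothetical toy cells, not the patent's data: a ¹⁵⁵Gd channel carrying only contamination (ratio 1 as the reference) and a ¹⁵⁶Gd channel carrying marker signal plus contamination scaled by 1.3831, the abundance-based ratio given in the text.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(va * vb)


# Hypothetical toy cells: per-cell contamination level k, a 155Gd channel with
# no antibody signal, and a 156Gd channel whose marker is on in cells 1 and 4.
k = [1.0, 2.0, 3.0, 4.0]
marker_156 = [5.0, 0.0, 0.0, 5.0]
gd155 = [kk * 1.0 for kk in k]
observed_156 = [m + kk * 1.3831 for m, kk in zip(marker_156, k)]
corrected_156 = [max(0.0, o - kk * 1.3831) for o, kk in zip(observed_156, k)]

if __name__ == "__main__":
    print(round(pearson(observed_156, gd155), 3))   # contamination couples channels
    print(round(pearson(corrected_156, gd155), 3))  # coupling removed after correction
```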
Claims (6)
1. A method for removing gadolinium isotope channel contamination in mass cytometry data, characterized by comprising the following steps:
1) in CyTOF detection, the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies must not be co-expressed in the same cell, and at least two and at most five gadolinium-isotope-coupled antibodies are designed, the gadolinium isotopes being selected from ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd;
2) estimating the ratio $R_{\mathrm{Gd}}$ of the contamination signal intensities between the gadolinium isotope channels;
3) estimating the single-cell contamination coefficient k;
4) based on the estimated $R_{\mathrm{Gd}}$ and k, calculating the estimated contamination signal of the gadolinium isotope channels, and obtaining corrected data free of gadolinium isotope channel contamination by the following formula:
$$\mathrm{Data}_{\mathrm{corrected}} = \max\left(0,\; \mathrm{Data}_{\mathrm{observed}} - k \cdot R_{\mathrm{Gd}}\right) + \mathrm{noise}$$
where $\mathrm{Data}_{\mathrm{observed}}$ denotes the preprocessed CyTOF data of the gadolinium isotope channels, i.e., for a sample containing gadolinium contamination, the gadolinium-channel CyTOF data before the contamination is removed; $\mathrm{Data}_{\mathrm{corrected}}$ denotes the corrected CyTOF data of the gadolinium isotope channels with the gadolinium contamination removed; and noise denotes the background noise signal of the gadolinium isotope channels.
2. The method for removing gadolinium isotope channel contamination in mass cytometry data according to claim 1, wherein in step 2) $R_{\mathrm{Gd}}$ is estimated using the collinearity of the gadolinium isotope channel contamination signals:
the cells in $\mathrm{Data}_{\mathrm{observed}}$ whose average signal intensity is in the top 5% to 10% are taken, and a linear regression model is used to calculate the contamination signal intensity ratios of the different gadolinium isotope channels; specifically, one gadolinium isotope channel is taken as the gadolinium reference channel, and the ratio of the contamination signal intensity of each of the ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd channels to that of the reference channel is calculated by the following formula:

$$\mathrm{Data}_{\mathrm{Gd}_i} = R_{\mathrm{Gd}_i}\,\mathrm{Data}_{\mathrm{Gd}_{\mathrm{ref}}} + b + \varepsilon$$

wherein i = 1, 2, …, 5 indexes the gadolinium isotope channels ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd, respectively; $R_{\mathrm{Gd}_i}$ is the ratio of the contamination signal intensity of channel i to that of the gadolinium reference channel; $\mathrm{Data}_{\mathrm{Gd}_{\mathrm{ref}}}$ is the detection signal of the gadolinium reference channel in the preprocessed CyTOF data; b is the constant term of the univariate linear regression; ε is the deviation of the data from linearity; and $\mathrm{Data}_{\mathrm{Gd}_i}$ is the detection signal of channel i in the preprocessed CyTOF data;
or in step 2) $R_{\mathrm{Gd}}$ is estimated from the natural isotopic abundance ratios of gadolinium.
3. The method for removing gadolinium isotope channel contamination in mass cytometry data according to claim 2, wherein in step 2) $R_{\mathrm{Gd}}$ is estimated using the collinearity of the gadolinium isotope channel contamination signals:
the cells in $\mathrm{Data}_{\mathrm{observed}}$ whose average signal intensity is in the top 5% to 10% are taken, and a linear regression model is used to calculate the contamination signal intensity ratios of the different gadolinium isotope channels; specifically, one gadolinium isotope channel is taken as the gadolinium reference channel, and the ratio of the contamination signal intensity of each of the ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd channels to that of the reference channel is calculated by the following formula:

$$\mathrm{Data}_{\mathrm{Gd}_i} = R_{\mathrm{Gd}_i}\,\mathrm{Data}_{\mathrm{Gd}_{\mathrm{ref}}} + b + \varepsilon$$

wherein i = 1, 2, …, 5 indexes the gadolinium isotope channels ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd, respectively; $R_{\mathrm{Gd}_i}$ is the ratio of the contamination signal intensity of channel i to that of the gadolinium reference channel; $\mathrm{Data}_{\mathrm{Gd}_{\mathrm{ref}}}$ is the detection signal of the gadolinium reference channel in the preprocessed CyTOF data; b is the constant term of the univariate linear regression; ε is the deviation of the data from linearity; and $\mathrm{Data}_{\mathrm{Gd}_i}$ is the detection signal of channel i in the preprocessed CyTOF data.
4. The method for removing gadolinium isotope channel contamination in mass cytometry data according to claim 2 or 3, wherein the cells in $\mathrm{Data}_{\mathrm{observed}}$ whose average signal intensity is in the top 5% are taken, and a linear regression model is used to calculate the contamination signal intensity ratios of the different gadolinium isotope channels.
5. The method for removing gadolinium isotope channel contamination in mass cytometry data according to claim 1, wherein in step 3) k is estimated by the L1-norm optimization method (formula (5));
or in step 3) k is estimated by the L2-norm optimization method (formula (6));
or in step 3) the single-channel contamination coefficient estimates $k_i = \mathrm{Data}_{\mathrm{Gd}_i}/R_{\mathrm{Gd}_i}$ are calculated and k is estimated by their minimum: $k = \min_i k_i$;
or in step 3) the single-channel contamination coefficient estimates $k_i$ are calculated and k is estimated by their mean: $k = \frac{1}{5}\sum_{i=1}^{5} k_i$;
or in step 3) the single-channel contamination coefficient estimates $k_i$ are calculated and k is estimated by their median: $k = \operatorname{median}_i k_i$;
wherein $\mathrm{Data}_{\mathrm{Ab}}$ is the detection signal from the biomarkers specifically labeled by the coupled antibodies in the preprocessed CyTOF data of the gadolinium isotope channels; i = 1, 2, …, 5 indexes the gadolinium isotope channels ¹⁵⁵Gd–¹⁵⁸Gd and ¹⁶⁰Gd, respectively; $\mathrm{Data}_{\mathrm{Gd}_i}$ is the detection signal of channel i in the preprocessed CyTOF data; $R_{\mathrm{Gd}_i}$ is the ratio of the contamination signal intensity of channel i to that of the gadolinium reference channel; and $k_i$ is the estimated contamination coefficient of the single cell in channel i.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110476288X | 2021-04-29 | ||
CN202110476288 | 2021-04-29 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113588524A true CN113588524A (en) | 2021-11-02 |
CN113588524B CN113588524B (en) | 2022-06-07 |
Family
ID=78256120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110904965.3A Active CN113588524B (en) | 2021-04-29 | 2021-08-07 | Method for removing gadolinium isotope channel pollution in mass spectrum flow data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113588524B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016024020A1 (en) * | 2014-08-14 | 2016-02-18 | Universität Zürich | Highly multiplexed absolute quantification of molecules on the single cell level |
CN107255722A (en) * | 2017-04-26 | 2017-10-17 | Ma'anshan Yiting Biotechnology Co., Ltd. | Single-cell protein detection method based on metal isotope labeling combined with flow ICP-MS |
US20190331689A1 (en) * | 2018-04-27 | 2019-10-31 | Deutsches Rheuma-Forschungszentrum Berlin | Functionalized metal-labeled beads for mass cytometry |
CN110412287A (en) * | 2019-07-11 | 2019-11-05 | Shanghai Chen'an Biotechnology Co., Ltd. | Single-cell-based quantitative analysis method for immune cell typing |
CN110412286A (en) * | 2019-07-11 | 2019-11-05 | Shanghai Chen'an Biotechnology Co., Ltd. | Method for single-cell analysis of tumor samples using a mass cytometry system |
CN111982789A (en) * | 2020-08-21 | 2020-11-24 | Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences | High-throughput detection method of metal ions and metal nanoparticles based on single-cell enrichment and single-cell mass spectrometry |
2021-08-07: CN202110904965.3A (CN113588524B), CN, Active
Non-Patent Citations (1)
Title |
---|
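Zeng Xun et al.: "Study of regional immune characteristics of early-stage liver cancer based on single-cell mass cytometry", The 13th National Congress of Immunology * |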
Also Published As
Publication number | Publication date |
---|---|
CN113588524B (en) | 2022-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||