CN113588524A - Method for removing gadolinium isotope channel pollution in mass spectrum flow data - Google Patents

Info

Publication number
CN113588524A
Authority
CN
China
Prior art keywords
gadolinium
data
channel
gadolinium isotope
isotope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110904965.3A
Other languages
Chinese (zh)
Other versions
CN113588524B (en
Inventor
尹巍巍
刘俊伟
陈伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Publication of CN113588524A publication Critical patent/CN113588524A/en
Application granted granted Critical
Publication of CN113588524B publication Critical patent/CN113588524B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/34 Purifying; Cleaning
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode

Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Dispersion Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a method for removing gadolinium isotope channel contamination from mass cytometry (CyTOF) data. The method exploits the collinearity of the contamination signals across the gadolinium isotope channels (155Gd to 158Gd and 160Gd). Provided that the panel is designed in advance so that the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed on one cell at the same time, an estimate of each single cell's contamination signal in the gadolinium isotope channels is calculated, and corrected data with the gadolinium isotope channel contamination removed are obtained. Because the contamination signal is estimated and subtracted rather than the contaminated cells being discarded, the method avoids the bias introduced by artificially changing the cellular composition of the sample, improves the quality of mass cytometry data, and is valuable for subsequent data analysis.

Description

Method for removing gadolinium isotope channel pollution in mass spectrum flow data
Technical Field
The invention relates to the technical field of mass cytometry, in particular to a method for removing gadolinium isotope channel contamination from mass cytometry data.
Background
Gadolinium (Gd) isotope channel contamination is common in mass cytometry data obtained from local tissue samples of patients.
Mass cytometry (CyTOF) is a single-cell, high-throughput, high-dimensional detection technology. Protein markers on the surface of and inside cells are specifically labeled with antibodies coupled to rare metals, and the amount of each rare-metal label is then quantified accurately by mass spectrometry. Compared with conventional flow cytometry, it offers higher throughput and higher signal precision. More than 40 rare-metal channels are currently available for antibody coupling, and the gadolinium isotope channels (155Gd to 158Gd and 160Gd) are commonly used in CyTOF detection.
Gadolinium (Gd) is a metallic element whose isotopes are 152Gd, 154Gd to 158Gd, and 160Gd.
Paramagnetic gadolinium chelates are currently the most commonly used contrast agents for magnetic resonance imaging; they improve image quality and contrast and thereby aid clinical diagnosis. In the late 1980s, after numerous animal experiments, gadolinium diethylenetriaminepentaacetate (Gd-DTPA) was approved by the U.S. Food and Drug Administration (FDA) for clinical diagnosis. Gadolinium-based contrast agents (GBCAs) are administered by intravenous injection, rapidly reach concentration equilibrium in the blood vessels and extracellular fluid, and can also enter cells (including liver and kidney tissue cells) by passive diffusion or through certain specific transport channels. At low doses, clinically approved GBCAs essentially do not enter human cells, apart from small amounts that may remain in hepatocytes, and are rapidly and completely excreted in people with normal renal function. At higher doses, or in people with deficient or impaired renal function, GBCA clearance slows markedly: the half-life lengthens from hours to days, and gadolinium can even be retained in multiple organs (including skin and bone). Patients who have received GBCAs may therefore retain gadolinium isotopes in local tissue cells to varying degrees; the amount retained, the duration of retention, and the distribution among different cells vary with the individual and with the GBCA dose.
When local tissue from a patient who has received a GBCA is surgically excised and analyzed by CyTOF, the gadolinium isotope channels of the resulting data may contain two kinds of signal. One part comes from the gadolinium isotopes coupled to antibodies and is the target signal. The other part comes from gadolinium isotopes retained in the cells after recent GBCA use; it is not a target signal, it degrades the precision and accuracy with which the target signal is measured, and it is referred to here as the contamination signal. In a CyTOF study of kidney tissue samples from patients with clear cell renal cell carcinoma, published in 2017 in Cell, vol. 169, pp. 736-749, Bernd Bodenmiller et al. explicitly noted that the samples contained cells carrying gadolinium isotope contamination signals. Their 2019 CyTOF study of breast tissue from breast cancer patients, published in Cell, vol. 177, pp. 1-16, again noted the presence of such cells. To remove the contamination, they discarded all data from the cells carrying gadolinium isotope contamination signals, which is equivalent to removing the whole cells. This approach directly changes the cellular composition of the sample and can bias the analytical conclusions. Effectively estimating and removing the contamination signal of the gadolinium isotope channels in CyTOF data is therefore important for the accuracy of CyTOF detection, the quality of CyTOF data, and the validity of conclusions drawn from later data analysis.
Disclosure of Invention
The object of the present invention is to provide a method for estimating and removing gadolinium isotope contamination in mass cytometry data. The method exploits the collinearity of the contamination signals across the gadolinium isotope channels (155Gd to 158Gd and 160Gd). On the condition that the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed on one cell at the same time, an estimate of each single cell's contamination signal in the gadolinium isotope channels is calculated, and corrected data with the gadolinium isotope channel contamination removed are obtained.
The invention adopts the following technical scheme:
a method for removing gadolinium isotope channel pollution in mass spectrum flow data comprises the following steps:
1) designing the CyTOF panel so that the biomarkers specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed in one cell at the same time, using at least two and at most five gadolinium-isotope-coupled antibodies, wherein the gadolinium isotopes are selected from 155Gd to 158Gd and 160Gd;
2) estimating the contamination signal intensity ratio $R_{Gd}$ between the gadolinium isotope channels;
3) estimating the contamination degree coefficient $k$ of each single cell;
4) based on the estimated $R_{Gd}$ and $k$, calculating an estimate of the contamination signal of the gadolinium isotope channels, and obtaining corrected data with the gadolinium isotope channel contamination removed by the following formula:
$$Data_{corrected} = \max(0,\ Data_{observed} - k \cdot R_{Gd}) + noise$$
in the formula, $Data_{observed}$ denotes the preprocessed CyTOF data of the gadolinium isotope channels, that is, if the sample contains gadolinium contamination, the data before the contamination is removed; $Data_{corrected}$ denotes the corrected CyTOF data of the gadolinium isotope channels with the gadolinium contamination removed; and $noise$ denotes the background noise signal of the gadolinium isotope channels.
Further, step 2) estimates $R_{Gd}$ using the collinearity of the contamination signals of the gadolinium isotope channels:
the cells in the top 5% to 10% of $Data_{observed}$ ranked by mean signal intensity are taken, and the intensity ratios of the contamination signals of the different gadolinium isotope channels are calculated with a linear regression model. Specifically, one gadolinium isotope channel is taken as the gadolinium isotope reference channel, and the intensity ratio $R_{Gd}^{i}$ of the contamination signal of each of the 155Gd to 158Gd and 160Gd channels to that of the reference channel is calculated by the following formula:
$$Data_{observed}^{i} = R_{Gd}^{i} \cdot Data_{observed}^{ref} + b + \varepsilon$$
wherein each value of $i$ denotes one gadolinium isotope channel, with $i = 1, 2, \ldots, 5$ corresponding to 155Gd to 158Gd and 160Gd respectively; $R_{Gd}^{i}$ is the intensity ratio of the contamination signal of gadolinium isotope channel $i$ to that of the gadolinium isotope reference channel; $Data_{observed}^{ref}$ is the detection signal of the gadolinium isotope reference channel in the preprocessed CyTOF data; $b$ is the constant term of the simple linear regression; $\varepsilon$ is the deviation of the data from linearity; and $Data_{observed}^{i}$ is the detection signal of gadolinium isotope channel $i$ in the preprocessed CyTOF data;
or step 2) estimates $R_{Gd}$ from the natural abundance ratios of the gadolinium isotopes.
Further, step 2) estimates $R_{Gd}$ using the collinearity of the contamination signals of the gadolinium isotope channels only, that is, by the linear regression on the top 5% to 10% of cells described above, with the same reference channel and the same definitions of $i$, $R_{Gd}^{i}$, $b$ and $\varepsilon$.
Further, the cells in the top 5% of $Data_{observed}$ ranked by mean signal intensity are taken, and the intensity ratios of the contamination signals of the different gadolinium isotope channels are calculated with a linear regression model.
Further, step 3) estimates $k$ by L1-norm optimization:
$$k = \arg\min_{k} \lVert Data_{Ab} \rVert_{1} = \arg\min_{k} \sum_{i=1}^{5} \left| Data_{observed}^{i} - k \cdot R_{Gd}^{i} \right|$$
or step 3) estimates $k$ by L2-norm optimization:
$$k = \arg\min_{k} \lVert Data_{Ab} \rVert_{2} = \arg\min_{k} \sqrt{\sum_{i=1}^{5} \left( Data_{observed}^{i} - k \cdot R_{Gd}^{i} \right)^{2}}$$
or step 3) calculates the estimate $\hat{k}^{i} = Data_{observed}^{i} / R_{Gd}^{i}$ of the contamination degree coefficient in each single gadolinium isotope channel and estimates $k$ by its minimum:
$$k = \min_{i} \hat{k}^{i}$$
or step 3) calculates the same per-channel estimate $\hat{k}^{i}$ and estimates $k$ by its mean:
$$k = \frac{1}{5} \sum_{i=1}^{5} \hat{k}^{i}$$
or step 3) calculates the same per-channel estimate $\hat{k}^{i}$ and estimates $k$ by its median:
$$k = \operatorname{median}_{i}\ \hat{k}^{i}$$
wherein $Data_{Ab}$ is the part of the preprocessed CyTOF data of the gadolinium isotope channels that comes from the biomarkers specifically labeled by the coupled antibodies; each value of $i$ denotes one gadolinium isotope channel, with $i = 1, 2, \ldots, 5$ corresponding to 155Gd to 158Gd and 160Gd respectively; $Data_{observed}^{i}$ is the detection signal of gadolinium isotope channel $i$ in the preprocessed CyTOF data; $R_{Gd}^{i}$ is the intensity ratio of the contamination signal of gadolinium isotope channel $i$ to that of the gadolinium isotope reference channel; and $\hat{k}^{i}$ is the estimate of the contamination degree coefficient of a single cell in gadolinium isotope channel $i$.
Further, step 3) estimates $k$ by L1-norm optimization:
$$k = \arg\min_{k} \sum_{i=1}^{5} \left| Data_{observed}^{i} - k \cdot R_{Gd}^{i} \right|$$
The invention has the following beneficial effects:
The invention provides a method for estimating and removing the contamination signal of the gadolinium isotope channels in mass cytometry data, which avoids the bias introduced by artificially changing the cellular composition of the sample when contaminated cells are removed outright. The technique is mainly intended for mass cytometry of local tissue samples from patients who were injected with gadolinium-based contrast agents for magnetic resonance imaging, and it is valuable for improving the quality of mass cytometry data and of subsequent data analysis.
Drawings
FIG. 1 illustrates the collinearity of the contamination signals of the gadolinium isotope channels:
A: pairwise scatter plots of the gadolinium isotope channels. Left: data from a sample without gadolinium contamination signals; right: data from a sample with gadolinium contamination signals;
B: pairwise correlation coefficients of the gadolinium isotope channel detection signals. Left: sample without gadolinium contamination signals; right: sample with gadolinium contamination signals;
C: pairwise correlation coefficients of the detection signals of the corresponding channels for the same gadolinium-contaminated sample, with and without antibodies coupled to the gadolinium isotope channels. Left: no antibodies coupled to the gadolinium isotope channels, so all detection signals come from gadolinium contamination; right: antibodies coupled to the gadolinium isotope channels;
D: multiple linear regression analysis of the signals of the antibody-coupled gadolinium isotope channels. The detection signals of the 156Gd, 158Gd and 160Gd channels, coupled to γδTCR, CD19 and CD33 respectively, are each regressed on the four channels shown on the vertical axis; the resulting regression coefficients are shown in the figure. Taking the 156Gd channel coupled to γδTCR as an example, its detection signal is linearly correlated mainly with the contamination signal of 155Gd (which carries no coupled antibody) and with the non-Gd channel 142Nd-γδTCR, which is also coupled to γδTCR, and is independent of the other metal channels coupled to other antibodies, 148Nd-CD19 and 149Sm-CD33 (correlation coefficient 0). This indicates that the signal of a contaminated Gd channel is a linear superposition of the Gd contamination signal and the coupled-antibody signal;
E: correlation analysis between $R_{Gd}$, obtained by linear regression with formula (4) using 155Gd as the reference channel, and the natural abundance ratios of the gadolinium isotopes; the correlation coefficient reaches 0.96.
FIG. 2 shows the calculation of the contamination signal intensity ratio $R_{Gd}$ between the gadolinium isotope channels:
for a gadolinium-contaminated sample measured with no antibodies coupled to the gadolinium isotope channels, the detected gadolinium isotope contamination signals are analyzed with 155Gd as the reference channel. Linear regression is performed on the top n% of cells ranked by mean signal intensity of the gadolinium isotope channels (n increasing stepwise from 5% to 95%), and the resulting signal ratios of the other gadolinium isotope channels relative to the 155Gd channel (blue line) are compared with the natural abundance ratios of the gadolinium isotopes (red line).
FIG. 3 illustrates Example 1 (part 1):
A: Gd channel scatter plots of the synthetic CyTOF data contaminated with gadolinium isotopes;
B: Gd channel correlation coefficients of the synthetic CyTOF data contaminated with gadolinium isotopes;
C: proportions of the 25 cell subsets obtained by clustering the 7 groups of data;
D: correlation analysis of the proportions of the 25 cell subsets, comparing the synthetic gadolinium-contaminated data and the corrected data obtained with the methods of formulas (5) to (9) against the contamination-free data (raw).
FIG. 4 illustrates Example 1 (part 2):
A: heat map of the 25 cell subsets obtained by cluster analysis of the 7 groups of data; each row represents one cell subset and each column the expression of one protein marker;
B: comparison of the mean expression intensities of the gadolinium isotope channels across the 7 groups of data;
C: correlation coefficients of the 25 cell subsets, comparing the synthetic gadolinium-contaminated data and the corrected data obtained with the methods of formulas (5) to (9) against the contamination-free data (raw);
D: performance indices for the similarity between the contamination-free data (raw) and the synthetic gadolinium-contaminated data and the corrected data obtained with the methods of formulas (5) to (9);
E: multiple linear regression coefficient analysis of the antibody-coupled gadolinium isotope channels for the synthetic gadolinium-contaminated data and the 1DNorm-corrected data.
Detailed Description
The invention is explained in more detail below with reference to exemplary embodiments and the accompanying drawings. The following examples are provided only for illustrating the present invention and are not intended to limit the scope of the present invention.
The basic principle of the invention is as follows:
First, the preprocessed CyTOF data of the gadolinium isotope channels (155Gd to 158Gd and 160Gd) are defined as $Data_{observed}$. (Offline preprocessing of CyTOF data usually includes barcode decoding, quality control and the like, and is a routine operation.) If the sample contains gadolinium contamination, $Data_{observed}$ is the CyTOF data of the gadolinium isotope channels before the contamination is removed. It is an n x 5 numerical matrix in which each row represents one valid single cell, each column represents one gadolinium isotope channel, and n is the number of valid single cells in the sample. Within $Data_{observed}$, the detection signal from the biomarkers (protein molecules) specifically labeled by the coupled antibodies is $Data_{Ab}$, the normal signal to be measured; the contamination signal from gadolinium isotope residues left by GBCA use is $Data_{Gd}$, the signal to be removed. The relationship among the three is expressed by formula (1):
$$Data_{observed} = Data_{Ab} + Data_{Gd} \tag{1}$$
where $Data_{Ab}$ and $Data_{Gd}$ are n x 5 numerical matrices whose rows and columns are defined as for $Data_{observed}$.
When cells contain residual gadolinium isotope signals, the different channels of $Data_{observed}$ are pairwise collinear, most clearly in cells with higher signal intensity, so that the channels exhibit multicollinearity (FIG. 1). Under the preset experimental condition that the biomarkers (protein molecules) specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed in one cell at the same time, the pairwise collinearity between the channels of $Data_{observed}$ must be caused by the gadolinium contamination signal. The collinearity can therefore be used to estimate the contamination signal intensity ratio $R_{Gd}$ between the gadolinium isotope channels (an n x 5 numerical matrix) and the single-cell contamination degree coefficient $k$ (an n x 1 numerical matrix), and the estimate of the contamination signal of the gadolinium isotope channels is calculated by formula (2):
$$\widehat{Data}_{Gd} = k \cdot R_{Gd} \tag{2}$$
where $\widehat{Data}_{Gd}$ denotes the estimate of the contamination signal, arising from gadolinium isotope residues left by GBCA use, in the preprocessed CyTOF data of the gadolinium isotope channels. Removing this estimate from the detection signal by formula (3) gives the corrected data with the gadolinium isotope channel contamination removed, that is, the estimate of the target detection signal:
$$Data_{corrected} = \max(0,\ Data_{observed} - \widehat{Data}_{Gd}) + noise \tag{3}$$
where $Data_{corrected}$ denotes the corrected CyTOF data of the gadolinium isotope channels with the gadolinium contamination removed, and $noise$ denotes the background noise signal of the gadolinium isotope channels; this term is added to avoid the zero values produced by the correction. Artificially adding a background noise signal does not change whether a target molecule is expressed as "positive" or "negative", and therefore does not affect the results of subsequent cluster analysis.
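The correction step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `correct_gd_channels`, the list-of-rows data layout, and the uniform background noise controlled by `noise_scale` are all hypothetical (the text does not specify the noise distribution), and `r_gd` is taken as one ratio per channel applied to every cell.

```python
import random

def correct_gd_channels(observed, k, r_gd, noise_scale=0.0, rng=None):
    """Apply Data_corrected = max(0, Data_observed - k * R_Gd) + noise.

    observed    : list of rows, one per cell, one value per Gd channel
    k           : per-cell contamination coefficients (same length as observed)
    r_gd        : per-channel contamination ratios relative to the reference
    noise_scale : upper bound of a uniform background noise term; the choice
                  of a uniform distribution is an assumption of this sketch
    """
    rng = rng or random.Random(0)
    corrected = []
    for row, k_cell in zip(observed, k):
        corrected.append([
            max(0.0, x - k_cell * r) + rng.uniform(0.0, noise_scale)
            for x, r in zip(row, r_gd)
        ])
    return corrected

# Toy input: ratios relative to 155Gd, one contaminated and one clean cell.
r_gd = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
observed = [
    [10.0, 63.831, 10.574, 16.784, 14.770],  # k = 10 plus antibody signal 50 in channel 2
    [0.0, 0.0, 0.0, 30.0, 0.0],              # uncontaminated cell
]
corrected = correct_gd_channels(observed, k=[10.0, 0.0], r_gd=r_gd)
```

With `noise_scale=0.0` the contaminated cell keeps only its antibody signal in the second channel, and the clean cell is returned unchanged.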
Specifically, the contamination signal intensity ratio $R_{Gd}$ between the gadolinium isotope channels is estimated by one of the following methods:
(1) Estimating $R_{Gd}$ from the collinearity of the gadolinium isotope channel contamination signals (FIG. 2). The cells in the top 5% to 10% of $Data_{observed}$ ranked by mean signal intensity, preferably the top 5%, are taken, and the intensity ratios of the contamination signals of the different gadolinium isotope channels are calculated with a linear regression model. With the 155Gd channel as the gadolinium isotope reference channel (156Gd, 157Gd, 158Gd or 160Gd may also serve as the reference channel), the intensity ratio $R_{Gd}^{i}$ of the contamination signal of each of the 155Gd to 158Gd and 160Gd channels to that of the 155Gd channel is calculated by formula (4):
$$Data_{observed}^{i} = R_{Gd}^{i} \cdot Data_{observed}^{155Gd} + b + \varepsilon \tag{4}$$
where each value of $i$ denotes one gadolinium isotope channel, with $i = 1, 2, \ldots, 5$ corresponding to 155Gd to 158Gd and 160Gd respectively; $R_{Gd}^{i}$ is the intensity ratio of the contamination signal of gadolinium isotope channel $i$ to that of the 155Gd channel; $Data_{observed}^{155Gd}$ is the detection signal of the 155Gd channel in the preprocessed CyTOF data; $b$ is the constant term of the simple linear regression; $\varepsilon$ is the deviation of the data from linearity; and $Data_{observed}^{i}$ is the detection signal of gadolinium isotope channel $i$ in the preprocessed CyTOF data.
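The regression-based estimate of the ratios can be sketched as follows. This is an illustrative reading of the procedure, assuming that each channel is regressed on the reference channel by ordinary least squares and that the slope is taken as the ratio; the helper names `ols_slope_intercept` and `estimate_r_gd` and the noiseless toy data are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Simple least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_r_gd(observed, ref=0, top_fraction=0.05):
    """Regress every Gd channel on the reference channel, using only the
    cells in the top `top_fraction` ranked by mean Gd signal."""
    ranked = sorted(observed, key=lambda row: sum(row) / len(row), reverse=True)
    n_top = max(2, int(round(len(ranked) * top_fraction)))
    top = ranked[:n_top]
    x = [row[ref] for row in top]
    ratios = []
    for i in range(len(observed[0])):
        y = [row[i] for row in top]
        slope, _intercept = ols_slope_intercept(x, y)
        ratios.append(slope)
    return ratios

# Noiseless toy data: 200 cells carrying contamination only, k = 1..200.
true_r = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
cells = [[k * r for r in true_r] for k in range(1, 201)]
estimated = estimate_r_gd(cells, ref=0, top_fraction=0.05)
```

On this idealized input the slopes reproduce `true_r`; real data would add antibody signal and measurement scatter, which is why the text restricts the fit to the most strongly contaminated cells.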
(2) Alternatively, $R_{Gd}$ is estimated from the natural abundance ratios of the gadolinium isotopes, which avoids the linear regression required in (1); this is a direct, empirically grounded estimate (FIG. 1E). With the 155Gd channel as the reference channel (156Gd, 157Gd, 158Gd or 160Gd may also serve as the reference channel), $R_{Gd}$ = [1, 1.3831, 1.0574, 1.6784, 1.4770].
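The abundance-based ratios can be reproduced with a short calculation. The natural abundance percentages below are commonly tabulated values and are an assumption of this sketch, since the text gives only the resulting ratios; the names `NATURAL_ABUNDANCE_PERCENT` and `abundance_ratios` are hypothetical.

```python
# Commonly tabulated natural abundances (percent); not taken from the text.
NATURAL_ABUNDANCE_PERCENT = {
    "155Gd": 14.80,
    "156Gd": 20.47,
    "157Gd": 15.65,
    "158Gd": 24.84,
    "160Gd": 21.86,
}

def abundance_ratios(reference="155Gd"):
    """Return each isotope's abundance divided by the reference isotope's."""
    ref = NATURAL_ABUNDANCE_PERCENT[reference]
    return {iso: value / ref for iso, value in NATURAL_ABUNDANCE_PERCENT.items()}

ratios = abundance_ratios("155Gd")
rounded = [round(v, 4) for v in ratios.values()]
```

Rounded to four decimals, these ratios agree with the vector quoted in the text; passing a different `reference` gives the ratios for another reference channel.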
Specifically, $k$ is estimated by one of the following methods:
(1) Estimating the single-cell contamination degree coefficient $k$ by optimization. The basic principle is to obtain the largest plausible estimate of a single cell's contamination by minimizing the detection signal attributed to the coupled antibodies. The minimization can be carried out with the L1-norm objective of formula (5) or the L2-norm objective of formula (6):
$$k = \arg\min_{k} \lVert Data_{Ab} \rVert_{1} = \arg\min_{k} \sum_{i=1}^{5} \left| Data_{observed}^{i} - k \cdot R_{Gd}^{i} \right| \tag{5}$$
$$k = \arg\min_{k} \lVert Data_{Ab} \rVert_{2} = \arg\min_{k} \sqrt{\sum_{i=1}^{5} \left( Data_{observed}^{i} - k \cdot R_{Gd}^{i} \right)^{2}} \tag{6}$$
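Both objectives have simple per-cell solutions, sketched below under the assumption that formulas (5) and (6) minimize the L1 and L2 norms of the residual `Data_observed - k * R_Gd` for one cell with no additional constraints. The L1 minimizer is then a weighted median of the per-channel ratios, and the L2 minimizer is the least-squares closed form. The function names are hypothetical.

```python
def estimate_k_l1(row, r_gd):
    """Minimize sum_i |row[i] - k * r_gd[i]| over k.

    Since |x - k*r| = r * |x/r - k| for r > 0, a minimizer is the weighted
    median of the ratios x/r with weights r.
    """
    pairs = sorted((x / r, r) for x, r in zip(row, r_gd))
    half = sum(r_gd) / 2.0
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return ratio
    return pairs[-1][0]

def estimate_k_l2(row, r_gd):
    """Minimize sum_i (row[i] - k * r_gd[i])**2 over k (closed form)."""
    return sum(x * r for x, r in zip(row, r_gd)) / sum(r * r for r in r_gd)

r_gd = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
# Toy cell: contamination k = 10 in every channel, plus antibody signal 50
# in the second channel only.
cell = [10.0, 63.831, 10.574, 16.784, 14.770]
k_l1 = estimate_k_l1(cell, r_gd)
k_l2 = estimate_k_l2(cell, r_gd)
```

For this toy cell the L1 estimate recovers the contamination level of 10, while the L2 estimate is pulled up to about 17.7 by the antibody-positive channel, which would over-subtract that channel's true signal.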
(2) Under the preset experimental condition that the biomarkers (protein molecules) specifically labeled by the gadolinium-isotope-coupled antibodies cannot be co-expressed in one cell at the same time, a single contaminated cell should have at least one channel whose signal comes mainly from contamination, that is, the protein labeled by that channel's coupled antibody is not expressed on the cell. Dividing the detection signal of each channel by the corresponding $R_{Gd}^{i}$ therefore gives the pseudo-contamination coefficient $\hat{k}^{i} = Data_{observed}^{i} / R_{Gd}^{i}$ of a single cell in each gadolinium isotope channel, and $k$ is estimated as the minimum, mean or median of the $\hat{k}^{i}$, as expressed by formulas (7) to (9) respectively:
$$k = \min_{i} \hat{k}^{i} \tag{7}$$
$$k = \frac{1}{5} \sum_{i=1}^{5} \hat{k}^{i} \tag{8}$$
$$k = \operatorname{median}_{i}\ \hat{k}^{i} \tag{9}$$
where each value of $i$ denotes one gadolinium isotope channel, with $i = 1, 2, \ldots, 5$ corresponding to 155Gd to 158Gd and 160Gd respectively; $Data_{observed}^{i}$ is the detection signal of gadolinium isotope channel $i$ in the preprocessed CyTOF data; $R_{Gd}^{i}$ is the intensity ratio of the contamination signal of gadolinium isotope channel $i$ to that of the gadolinium isotope reference channel; and $\hat{k}^{i}$ is the estimate of the contamination degree coefficient of a single cell in gadolinium isotope channel $i$.
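The three ratio-based estimators can be sketched for a single cell as follows. The function names and the toy cell are hypothetical; the sketch assumes every entry of `r_gd` is positive.

```python
from statistics import fmean, median

def pseudo_coefficients(row, r_gd):
    """Per-channel pseudo-contamination coefficients: signal / ratio."""
    return [x / r for x, r in zip(row, r_gd)]

def estimate_k(row, r_gd, method="min"):
    """Summarize the per-channel coefficients by min, mean or median."""
    k_hat = pseudo_coefficients(row, r_gd)
    if method == "min":
        return min(k_hat)
    if method == "mean":
        return fmean(k_hat)
    if method == "median":
        return median(k_hat)
    raise ValueError(f"unknown method: {method!r}")

r_gd = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]
# Toy cell: contamination k = 10 plus antibody signal 50 in the second channel.
cell = [10.0, 63.831, 10.574, 16.784, 14.770]
k_min = estimate_k(cell, r_gd, "min")
k_mean = estimate_k(cell, r_gd, "mean")
k_median = estimate_k(cell, r_gd, "median")
```

In this toy cell the minimum and the median both recover the contamination level, while the mean is inflated by the one antibody-positive channel. With measurement scatter the minimum would tend to track the lowest channel instead, which may be one reason the methods behave differently on real data.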
Example 1
First, a gadolinium-contaminated sample was stained with no antibodies coupled to the gadolinium isotope channels and measured on the CyTOF instrument. The resulting 155Gd to 158Gd and 160Gd channel data were taken as gadolinium contamination signals and superimposed on the data of a sample free of gadolinium isotope contamination (raw), producing synthetic simulated data (simulate, with gadolinium isotope contamination) that exhibit the collinearity of the gadolinium isotope channels and strong pairwise correlation between channels (FIGS. 3A and 3B).
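The effect of superimposing contamination on clean data can be illustrated with fully synthetic numbers. Everything in this sketch is a hypothetical stand-in: the example in the text superimposes measured contamination signals, whereas here both the clean data and the contamination are drawn from a random number generator, with each cell expressing exactly one gadolinium-tagged marker.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def channel(data, i):
    """Extract column i from a list of rows."""
    return [row[i] for row in data]

rng = random.Random(0)
R_GD = [1.0, 1.3831, 1.0574, 1.6784, 1.4770]

# Clean data: small background everywhere, one expressed marker per cell.
raw = []
for _ in range(500):
    row = [rng.uniform(0.0, 1.0) for _ in R_GD]
    row[rng.randrange(len(R_GD))] += rng.uniform(20.0, 60.0)
    raw.append(row)

# Contamination: a per-cell coefficient scaled by the channel ratios.
k_true = [rng.uniform(0.0, 200.0) for _ in raw]
simulate = [[x + k * r for x, r in zip(row, R_GD)]
            for row, k in zip(raw, k_true)]

corr_raw = pearson(channel(raw, 0), channel(raw, 1))
corr_sim = pearson(channel(simulate, 0), channel(simulate, 1))
```

In the clean data the two channels are not positively correlated, because no cell expresses both markers; after contamination is added they become strongly correlated, which is the collinearity signature the method relies on.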
Applying the $R_{Gd}$ estimation method provided by the invention (formula (4)) gives the following estimate:
$R_{Gd}$ = [1.000000, 1.308478, 1.004611, 1.764121, 1.714435].
Then $k$ was estimated with each of formulas (5) to (9), and the contaminated data were corrected with formula (3), giving corrected data sets named after the estimation method: 1DNorm, 2DNorm, Min, Mean and Median, corresponding to formulas (5) to (9) respectively. The 7 groups of data (raw, simulate, 1DNorm, 2DNorm, Min, Mean and Median) were clustered together and the clustering results compared. Cluster analysis yielded 25 cell subsets (C1 to C25) (FIG. 4A). The cells of subsets C13 and C14 came mainly from the simulate data (FIG. 3C), suggesting that these two subsets are spurious and arise from gadolinium contamination of the detection signal. Correlation analysis of the proportions of the 25 cell subsets in each data set (FIG. 3D and FIG. 4C) showed that the simulate data and the Min-corrected data correlate poorly with the raw data, whereas the 1DNorm data correlate best with the raw data (correlation coefficient 0.95), indicating that the corrected data produced by 1DNorm are closest to the original data without gadolinium contamination. Comparison of the mean expression intensities of the Gd isotope channels showed that the simulate data and the Min-corrected data differ most from the raw data, while the data corrected by the other methods are similar to the raw data. In addition, the F1 score, recall, precision, homogeneity and ARI computed between each group of data and the raw data showed that 1DNorm performs best on these indices. Estimating $k$ with 1DNorm (formula (5)) therefore yields corrected data closest to the data without gadolinium isotope contamination; it has the best correction performance and is the recommended estimation method.
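The subset-proportion comparison can be sketched as a correlation between per-subset cell fractions. The cluster labels below are invented purely for illustration, with four subsets instead of 25 and `C4` standing in for a spurious contamination-driven subset; the helper names are hypothetical and the numbers are not results from the example.

```python
import math
from collections import Counter

def subset_proportions(labels, subsets):
    """Fraction of cells assigned to each subset, in a fixed subset order."""
    counts = Counter(labels)
    return [counts[s] / len(labels) for s in subsets]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

SUBSETS = ["C1", "C2", "C3", "C4"]
# Invented cluster assignments from one shared clustering of all data sets.
raw_labels = ["C1"] * 40 + ["C2"] * 30 + ["C3"] * 30
simulate_labels = ["C1"] * 20 + ["C2"] * 15 + ["C3"] * 15 + ["C4"] * 50
corrected_labels = ["C1"] * 38 + ["C2"] * 31 + ["C3"] * 29 + ["C4"] * 2

p_raw = subset_proportions(raw_labels, SUBSETS)
r_simulate = pearson(p_raw, subset_proportions(simulate_labels, SUBSETS))
r_corrected = pearson(p_raw, subset_proportions(corrected_labels, SUBSETS))
```

A correction that restores the original subset composition gives a correlation near 1 with the clean data, whereas the contaminated data, dominated by the spurious subset, correlate poorly.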
The sources of the detection signals in the 156Gd, 158Gd and 160Gd channels were further analyzed in the samples before correction (simulate data) and after 1DNorm correction (1DNorm). In the 156Gd channel, which is coupled to γδ TCR, the detection signal before correction was linearly correlated with both 155Gd and 142Nd-γδ TCR (red dots in the scatter plot of FIG. 4E). After 1DNorm correction, the detection signal was linearly correlated only with 142Nd-γδ TCR, and its correlation coefficient with the 155Gd channel was 0 (blue dots in the scatter plot of FIG. 4E), indicating that the 1DNorm correction effectively removed the contamination signal from the gadolinium isotope. The same phenomenon was found in the other two gadolinium isotope channels, 158Gd-CD19 and 160Gd-CD33: the 1DNorm correction effectively removed the influence of the gadolinium isotope contamination signal on the 158Gd-CD19 and 160Gd-CD33 channels.

Claims (6)

1. A method for removing gadolinium isotope channel pollution in mass spectrum flow data is characterized by comprising the following steps:
1) for CyTOF detection, designing at least two and at most five gadolinium isotope-conjugated antibodies whose specifically labeled biomarkers are not co-expressed in the same cell, wherein the gadolinium isotopes are selected from 155Gd to 158Gd and 160Gd;
2) estimating the intensity ratio $R_{Gd}$ of the contamination signals between the gadolinium isotope channels;
3) estimating the contamination degree coefficient k of each single cell;
4) based on the estimated $R_{Gd}$ and k, calculating the estimated contamination signal of the gadolinium isotope channels, and obtaining corrected data with the gadolinium isotope channel contamination removed by the following formula:
$$\mathrm{Data}_{corrected} = \max\left(0,\ \mathrm{Data}_{observed} - k \cdot R_{Gd}\right) + \mathrm{noise}$$
where $\mathrm{Data}_{observed}$ denotes the preprocessed CyTOF data of the gadolinium isotope channels, that is, if the sample contains gadolinium contamination, the CyTOF data of the gadolinium isotope channels before the contamination is removed; $\mathrm{Data}_{corrected}$ denotes the corrected CyTOF data of the gadolinium isotope channels with the gadolinium contamination removed; and noise denotes the background noise signal of the gadolinium isotope channels.
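As an illustration of the correction formula in step 4), the following minimal sketch applies it to a single cell. The function name, the toy cell and the handling of noise are assumptions: the text does not specify here how the background noise is generated, so it is passed in as a per-channel list and defaults to zero. The R_Gd values are those reported in the description for the channels 155Gd, 156Gd, 157Gd, 158Gd and 160Gd.

```python
def correct_cell(observed, r_gd, k, noise=None):
    """Apply Data_corrected = max(0, Data_observed - k * R_Gd) + noise to one cell.

    observed, r_gd and noise are per-channel lists of equal length.
    noise defaults to zero (an assumption made for this sketch).
    """
    if noise is None:
        noise = [0.0] * len(observed)
    return [max(0.0, d - k * r) + n for d, r, n in zip(observed, r_gd, noise)]


# R_Gd for 155Gd, 156Gd, 157Gd, 158Gd, 160Gd as reported in the description.
R_GD = [1.000000, 1.308478, 1.004611, 1.764121, 1.714435]

# Toy cell (hypothetical): only the 156Gd channel carries real antibody signal,
# the other channels contain roughly k * R_Gd of contamination with k = 10.
observed = [10.0, 63.0, 10.0, 17.6, 17.1]
corrected = correct_cell(observed, R_GD, k=10.0)
print(corrected)
```

In this toy cell the four channels that hold only contamination are clipped to zero, while the 156Gd channel keeps its signal minus the estimated contamination.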
2. The method for removing gadolinium isotope channel contamination in mass spectrometry flow data as claimed in claim 1, wherein step 2) estimates $R_{Gd}$ using the collinearity of the contamination signals in the gadolinium isotope channels:
taking the cells in the top 5-10% of $\mathrm{Data}_{observed}$ by average signal intensity, and calculating the intensity ratios of the contamination signals of the different gadolinium isotope channels with a linear regression model; specifically, taking one gadolinium isotope channel as the gadolinium isotope reference channel, and calculating the intensity ratio of the contamination signal of each of the 155Gd to 158Gd and 160Gd channels to the contamination signal of the gadolinium isotope reference channel by the following formulas:
Figure FDA0003201292640000011
Figure FDA0003201292640000012
wherein each value of i denotes a gadolinium isotope channel, i = 1, 2, ..., 5 corresponding respectively to the gadolinium isotope channels 155Gd to 158Gd and 160Gd;
Figure FDA0003201292640000015
with i = 1, 2, ..., 5, are the intensity ratios of the contamination signals of the 155Gd to 158Gd and 160Gd channels, respectively, to the contamination signal of the gadolinium isotope reference channel;
Figure FDA0003201292640000013
is the detection signal of the gadolinium isotope reference channel in the preprocessed CyTOF data; b is the constant term of the unary linear regression; ε is the error of the data from linearity;
Figure FDA0003201292640000014
are the detection signals of the 155Gd to 158Gd and 160Gd channels, respectively, in the preprocessed CyTOF data;
or step 2) estimates $R_{Gd}$ using the abundance ratios of the gadolinium isotopes as they occur in nature.
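The two estimation routes for R_Gd in this claim can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names and toy cells are hypothetical, the regression is an ordinary least-squares fit of each channel against the reference channel (the claim's formula images are not reproduced here), and the natural abundances are standard reference values for gadolinium rather than figures taken from this patent.

```python
def top_fraction_by_mean(cells, fraction=0.05):
    """Cells whose mean Gd-channel intensity lies in the top `fraction`."""
    n_keep = max(2, int(round(len(cells) * fraction)))
    ranked = sorted(cells, key=lambda c: sum(c) / len(c), reverse=True)
    return ranked[:n_keep]


def ols_slope(x, y):
    """Slope of the unary least-squares fit y = slope * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def estimate_r_gd(cells, ref=0, fraction=0.05):
    """Regress every Gd channel on the reference channel over the brightest cells."""
    bright = top_fraction_by_mean(cells, fraction)
    x = [c[ref] for c in bright]
    return [ols_slope(x, [c[i] for c in bright]) for i in range(len(cells[0]))]


# Toy data (hypothetical): 36 dim cells and 4 heavily contaminated cells whose
# channels follow k * R + b with a known R.
true_r = [1.0, 1.3, 1.0, 1.75, 1.7]
cells = [[1.0] * 5 for _ in range(36)]
cells += [[k * r + 2.0 for r in true_r] for k in (100.0, 200.0, 300.0, 400.0)]
r_est = estimate_r_gd(cells, ref=0, fraction=0.10)

# Alternative route: natural abundances (%) of 155Gd, 156Gd, 157Gd, 158Gd, 160Gd
# (standard reference values), expressed relative to the 155Gd reference channel.
ABUNDANCE = [14.80, 20.47, 15.65, 24.84, 21.86]
R_GD_NATURAL = [a / ABUNDANCE[0] for a in ABUNDANCE]
print(r_est, R_GD_NATURAL)
```

Restricting the fit to the brightest cells is what makes the slope reflect contamination rather than antibody signal, since the contamination is the only component shared across all gadolinium channels.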
3. The method for removing gadolinium isotope channel contamination in mass spectrometry flow data as claimed in claim 2, wherein step 2) estimates $R_{Gd}$ using the collinearity of the contamination signals in the gadolinium isotope channels:
taking the cells in the top 5-10% of $\mathrm{Data}_{observed}$ by average signal intensity, and calculating the intensity ratios of the contamination signals of the different gadolinium isotope channels with a linear regression model; specifically, taking one gadolinium isotope channel as the gadolinium isotope reference channel, and calculating the intensity ratio of the contamination signal of each of the 155Gd to 158Gd and 160Gd channels to the contamination signal of the gadolinium isotope reference channel by the following formulas:
Figure FDA0003201292640000021
Figure FDA0003201292640000022
wherein each value of i denotes a gadolinium isotope channel, i = 1, 2, ..., 5 corresponding respectively to the gadolinium isotope channels 155Gd to 158Gd and 160Gd;
Figure FDA0003201292640000023
are the intensity ratios of the contamination signals of the 155Gd to 158Gd and 160Gd channels, respectively, to the contamination signal of the gadolinium isotope reference channel;
Figure FDA0003201292640000024
is the detection signal of the gadolinium isotope reference channel in the preprocessed CyTOF data; b is the constant term of the unary linear regression; ε is the error of the data from linearity;
Figure FDA0003201292640000025
are the detection signals of the 155Gd to 158Gd and 160Gd channels, respectively, in the preprocessed CyTOF data.
4. The method for removing gadolinium isotope channel contamination in mass spectrometry flow data as claimed in claim 2 or 3, wherein the cells in the top 5% of $\mathrm{Data}_{observed}$ by average signal intensity are taken, and the intensity ratios of the contamination signals of the different gadolinium isotope channels are calculated with a linear regression model.
5. The method for removing gadolinium isotope channel contamination in mass spectrometry flow data as claimed in claim 1, wherein step 3) estimates k by an L1-norm optimization method:
Figure FDA0003201292640000026
or step 3) estimates k by an L2-norm optimization method:
Figure FDA0003201292640000027
or step 3) calculates the estimated contamination degree coefficient of each single gadolinium isotope channel
Figure FDA0003201292640000028
and estimates k by its minimum:
Figure FDA0003201292640000029
or step 3) calculates the estimated contamination degree coefficient of each single gadolinium isotope channel
Figure FDA00032012926400000210
and estimates k by its mean:
Figure FDA0003201292640000031
or step 3) calculates the estimated contamination degree coefficient of each single gadolinium isotope channel
Figure FDA0003201292640000032
and estimates k by its median:
Figure FDA0003201292640000033
wherein $\mathrm{Data}_{Ab}$ denotes the detection signal originating from the biomarkers specifically labeled by the conjugated antibodies in the preprocessed CyTOF data of the gadolinium isotope channels; each value of i denotes a gadolinium isotope channel, i = 1, 2, ..., 5 corresponding respectively to the gadolinium isotope channels 155Gd to 158Gd and 160Gd;
Figure FDA0003201292640000034
are the detection signals of the 155Gd to 158Gd and 160Gd channels, respectively, in the preprocessed CyTOF data;
Figure FDA0003201292640000035
are the intensity ratios of the contamination signals of the 155Gd to 158Gd and 160Gd channels, respectively, to the contamination signal of the gadolinium isotope reference channel;
Figure FDA0003201292640000036
are the estimated contamination degree coefficients of a single cell in the individual gadolinium isotope channels 155Gd to 158Gd and 160Gd, respectively.
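The five ways of estimating k listed in this claim can be sketched for a single cell as below. Because the claim's formulas survive only as image references, every functional form here is an assumption: the per-channel coefficient is taken as Data_observed / R_Gd for each channel, the L2-norm estimate as the least-squares minimizer of the sum of (Data_observed - k * R_Gd)^2, and the L1-norm estimate as the minimizer of the sum of |Data_observed - k * R_Gd|, which is a weighted median of the per-channel ratios. All function names and the toy values are hypothetical.

```python
from statistics import mean, median


def per_channel_k(observed, r_gd):
    """Assumed per-channel estimate k_i = observed_i / R_i."""
    return [d / r for d, r in zip(observed, r_gd)]


def k_min(observed, r_gd):
    return min(per_channel_k(observed, r_gd))


def k_mean(observed, r_gd):
    return mean(per_channel_k(observed, r_gd))


def k_median(observed, r_gd):
    return median(per_channel_k(observed, r_gd))


def k_l2(observed, r_gd):
    """Least-squares k minimizing sum_i (observed_i - k * R_i)^2."""
    return sum(r * d for d, r in zip(observed, r_gd)) / sum(r * r for r in r_gd)


def k_l1(observed, r_gd):
    """k minimizing sum_i |observed_i - k * R_i|: weighted median of
    observed_i / R_i with weights R_i (requires R_i > 0)."""
    pairs = sorted(zip(per_channel_k(observed, r_gd), r_gd))
    half = sum(r_gd) / 2
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return ratio


# Toy cell (hypothetical): contamination with k = 10 in every channel, plus
# 50 units of genuine antibody signal in the third channel only.
r = [1.0, 2.0, 1.0, 2.0, 2.0]
observed = [10.0, 20.0, 60.0, 20.0, 20.0]
estimates = {f.__name__: f(observed, r) for f in (k_l1, k_l2, k_min, k_mean, k_median)}
print(estimates)
```

In this toy cell the L1-norm, minimum and median estimates recover k = 10, while the mean and L2-norm estimates are pulled upward by the channel that carries real antibody signal, which is the kind of robustness difference the comparison of methods in the description is concerned with.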
6. The method for removing gadolinium isotope channel contamination in mass spectrometry flow data as claimed in claim 5, wherein step 3) estimates k by the L1-norm optimization method:
Figure FDA0003201292640000037
CN202110904965.3A 2021-04-29 2021-08-07 Method for removing gadolinium isotope channel pollution in mass spectrum flow data Active CN113588524B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110476288X 2021-04-29
CN202110476288 2021-04-29

Publications (2)

Publication Number Publication Date
CN113588524A (en) 2021-11-02
CN113588524B (en) 2022-06-07

Family

ID=78256120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904965.3A Active CN113588524B (en) 2021-04-29 2021-08-07 Method for removing gadolinium isotope channel pollution in mass spectrum flow data

Country Status (1)

Country Link
CN (1) CN113588524B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016024020A1 (en) * 2014-08-14 2016-02-18 Universität Zürich Highly multiplexed absolute quantification of molecules on the single cell level
CN107255722A (en) * 2017-04-26 2017-10-17 马鞍山易廷生物科技有限公司 Streaming combination ICP MS single cell protein detection method is marked based on metal isotope
US20190331689A1 (en) * 2018-04-27 2019-10-31 Deutsches Rheuma-Forschungszentrum Berlin Functionalized metal-labeled beads for mass cytometry
CN110412287A (en) * 2019-07-11 2019-11-05 上海宸安生物科技有限公司 One kind being based on single celled immunocyte parting quantitative analysis method
CN110412286A (en) * 2019-07-11 2019-11-05 上海宸安生物科技有限公司 A method of Single cell analysis being carried out to tumor sample using mass spectrum streaming systems
CN111982789A (en) * 2020-08-21 2020-11-24 中国科学院生态环境研究中心 High-throughput detection method of metal ions and metal nanoparticles based on single-cell enrichment and single-cell mass spectrometry

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
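Zeng Xun et al.: "Study on regional immune characteristics of early-stage liver cancer based on single-cell mass cytometry", 13th National Congress of Immunology *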

Also Published As

Publication number Publication date
CN113588524B (en) 2022-06-07

Similar Documents

Publication Publication Date Title
Estabrook et al. Studies on the content and organization of the respiratory enzymes of mitochondria
Haberkorn et al. Glucose uptake, perfusion, and cell proliferation in head and neck tumors: relation of positron emission tomography to flow cytometry
Kim et al. Metabolomic screening and star pattern recognition by urinary amino acid profile analysis from bladder cancer patients
O'Neill et al. Thymidine kinase 1–a prognostic and diagnostic indicator in ALL and AML patients
EP3321676B1 (en) Blood test kit and blood analysis method
Amano et al. Stable xenon CT cerebral blood flow measurements computed by a single compartment-double integration model in normal aging and dementia
Wieland et al. Neuromelanin-sensitive magnetic resonance imaging in schizophrenia: a meta-analysis of case-control studies
Links et al. Biomarkers and mechanistic approaches in environmental epidemiology
US4835097A (en) Method for ascertaining the history of a condition of the body from a single blood sample
CN113588524B (en) Method for removing gadolinium isotope channel pollution in mass spectrum flow data
Quigg et al. Dynamic FDG‐PET demonstration of functional brain abnormalities
Ulvik Hereditary haemochromatosis through 150 years
CN103760159B (en) A kind of method and system of Bacteria Identification and Analysis of Drug Susceptibility
CN113155983B (en) Combined marker and application and detection kit thereof
Borras et al. Exhaled breath condensate methods adapted from human studies using longitudinal metabolomics for predicting early health alterations in dolphins
US9189595B2 (en) Apparatus and associated method for analyzing small molecule components in a complex mixture
Zhu et al. Urine based near-infrared spectroscopy analysis reveals a noninvasive and convenient diagnosis method for cancers: a pilot study
Van De Wiele et al. Absolute 24 h quantification of 99Tcm-DMSA uptake in patients with severely reduced kidney function: A comparison with: 51: Cr-EDTA clearance
Hare et al. Rapid estimation of DOPA in physiological fluids using the amino acid analyzer
Boisson et al. French experience of quality assessment of quantitative urinary analysis
CN117630219B (en) Method for detecting pyrrole alkaloid protein adduct and kit
Hare et al. Imaging Metals in the Brain by Laser Ablation–Inductively Coupled Plasma-Mass Spectrometry
CN110993021B (en) Method for measuring and calculating biological age of human body
CN109979598B (en) By human body18F-FDG PET data analysis tissue DNA hydroxymethyl background and application
CN109932511B (en) Urine exosome phospholipid marker for liver cancer screening and kit thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant