CN109859188B - Fluorescence crosstalk correction method based on mean shift algorithm and application thereof - Google Patents

Fluorescence crosstalk correction method based on mean shift algorithm and application thereof Download PDF

Info

Publication number
CN109859188B
CN109859188B CN201910099554.4A CN201910099554A CN109859188B CN 109859188 B CN109859188 B CN 109859188B CN 201910099554 A CN201910099554 A CN 201910099554A CN 109859188 B CN109859188 B CN 109859188B
Authority
CN
China
Prior art keywords
fluorescence
data
value
calculating
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910099554.4A
Other languages
Chinese (zh)
Other versions
CN109859188A (en
Inventor
高鹏
夏江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pilot medical technology (Shenzhen) Co.,Ltd.
Original Assignee
Pilot Gene Technologies Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pilot Gene Technologies Hangzhou Co ltd filed Critical Pilot Gene Technologies Hangzhou Co ltd
Priority to CN201910099554.4A priority Critical patent/CN109859188B/en
Publication of CN109859188A publication Critical patent/CN109859188A/en
Application granted granted Critical
Publication of CN109859188B publication Critical patent/CN109859188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention relates to the technical field of machine learning, in particular to a fluorescence crosstalk correction method based on a mean shift algorithm and application thereof. The fluorescence crosstalk correction method based on the mean shift algorithm comprises the following steps: s1: preprocessing multi-channel fluorescence intensity data; s2: calculating different channel clustering centers based on mean shift clustering; s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data. The correction method disclosed by the invention can avoid artificial errors introduced when the fluorescence intensity value is sampled, reduce the influence of noise on the fluorescence intensity, and accurately and quickly calculate the element value of the correction matrix, so that corrected fluorescence data is obtained, and the obtained fluorescence data is more real and accurate.

Description

Fluorescence crosstalk correction method based on mean shift algorithm and application thereof
Technical Field
The invention relates to the technical field of machine learning, in particular to a fluorescence crosstalk correction method based on a mean shift algorithm and application thereof.
Background
The digital PCR technology is a technology that DNA or RNA samples are diluted and then are evenly distributed to independent micro-reaction units, a fluorescent probe marked by a specific gene fragment is used for marking, molecular template PCR amplification is carried out, and after the amplification is finished, the positivity or the negativity of the micro-reaction units is judged according to the strength of a fluorescent signal.
The fluorescent probe labeled by the specific gene fragment generates a corresponding fluorescent signal after being combined with the molecular template strand. Different fluorescence light corresponds to a specific wave band, and the sensor detecting the wave band can display the maximum brightness value. The fluorescence spectra are continuous, and meanwhile, partial intersection exists among different fluorescence, and the fluorescence intensity detected on the same wave band can be superposed by several kinds of fluorescence, which is a fluorescence crosstalk phenomenon. As shown in fig. 1.
To achieve spectral crosstalk correction, the fluorescence intensity space is converted to the dye concentration space as follows: assumed to be in the four-color fluorescence signal, I1(t),I2(t),I3(t) and I4(t) 4 fluorescence intensity values detected at time t, ROX (t), FAM (t), VIC (t), CY5(t) represent 4 dye concentration values. The conversion equation from dye concentration space to fluorescence intensity space is as follows:
Figure BDA0001965333830000011
in the formula, WijRepresenting the ratio of the fluorescence intensity of the ith dye in the jth channel, b1(t),b2(t),B3(t),b4(t) represents noise signals of 4 channels. In the above mathematical model, S represents [ I ]1(t),I2(t),I3(t),I4(t)]TI.e. the fluorescence signal; f represents [ ROX (t), FAM (t), VIC (t), CY5(t)]TI.e. the dye signal; w is a spectral crosstalk matrix; the formula is expressed as S ═ WF. According to the fluorescence intensity signal S, the dye concentration signal is F ═ W-1And S, realizing fluorescence crosstalk correction.
The existing method for calculating the fluorescence crosstalk correction matrix W comprises a peak value method, a slope iteration method, a K-mean clustering method and the like.
The peak value method is to manually select a time point, fill the light intensity of n wave bands of the time point into the corresponding column of the correction matrix, and the selected wave peaks have good signal-to-noise ratio and have small overlap. The peak method however has the following drawbacks: firstly, manual selection of peaks produces human error, and secondly, the sample is limited and cannot cover all data sets.
K-mean clustering requires that fluorescence intensity values form n clusters in an n-dimensional space, but if the fluorescence intensity data is disturbed to a high degree or noise influences form clusters larger than n, the K-mean clustering may calculate erroneous clustering centers, thereby influencing the correction matrix data.
The slope iteration method is to select two channels as X axis and Y axis and take the fluorescence intensity value as coordinate value. The data points are generally distributed around the X-axis and Y-axis in the first quadrant, and then sampled at equal intervals, the slope of the fitted line is calculated, and the corresponding W matrix is filled. However, if the distribution of data is concentrated or influenced by noise points, after sampling at equal intervals, the fitted straight line generates a slope greater than 1, and the authenticity of the W data is greatly influenced.
Disclosure of Invention
The invention discloses a fluorescence crosstalk correction method based on a mean shift algorithm, which can avoid human errors introduced when a fluorescence intensity value is sampled, reduce the influence of noise on the fluorescence intensity, and accurately and quickly calculate the element value of a correction matrix so as to obtain corrected fluorescence data.
The specific scheme of the invention is as follows:
the invention discloses a fluorescence crosstalk correction method based on a mean shift algorithm, which comprises the following steps:
s1: preprocessing multi-channel fluorescence intensity data;
s2: calculating different channel clustering centers based on mean shift clustering;
s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data.
It should be understood that the present invention is not limited to the above steps, and may also include other additional steps, for example, before step S1, between steps S1 and S2, between steps S2 and S3, between steps S3 and S4, between steps S4 and S5, and after step S5, without departing from the scope of the present invention.
Preferably, the step S1 includes:
s11: shooting chip pictures of different channels under a stable imaging system;
s12: acquiring a fluorescence value of a micro-reaction unit in a picture;
s13: selecting a channel to be corrected and a chip type of a picture;
s14: and (4) preprocessing fluorescence data.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
More preferably, the S12 includes:
s121: positioning the central position of the micro-reaction unit;
s122: the mean of the fluorescence values of the microreaction units was calculated.
Further, the formula for calculating the mean value of the fluorescence values of the micro reaction cells in S122 is:
Figure BDA0001965333830000031
where f (x, y) represents a pixel value at the center position (x, y), Vn(x, y) represents the fluorescence value of the nth microreaction unit, N ∈ [1, N ]]And N represents the number of reaction units.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
Preferably, the S14 includes:
s141: optionally selecting a channel fluorescence data value, calculating
Figure BDA0001965333830000032
Wherein, Bn(x, y) represents a background pixel value of the nth micro-reaction unit,
Figure BDA0001965333830000033
new fluorescence data representing the nth micro-reaction unit;
s142: selecting the maximum fluorescence value in the channel
Figure BDA0001965333830000034
And minimum fluorescence value
Figure BDA0001965333830000035
Let the initial threshold be
Figure BDA0001965333830000036
Figure BDA0001965333830000037
S143: data sets A and B are set, where A represents a pixel value equal to or greater than T0The number of data is M1(ii) a B represents that the number of data is less than T0The number of data is M2
S144: calculating the gray level mean value G of the set A and the set BAAnd GB
Figure BDA0001965333830000038
S145: calculating a new threshold value
Figure BDA0001965333830000039
S146: comparing the iteration T of the previous iteration and the iteration T of the next iteration, if the iteration T of the previous iteration and the iteration T of the next iteration are different, repeating the steps S143, S144 and S145, otherwise, carrying out the next step;
s147: taking T as a reference point of channel data; summing all data points greater than T to obtain a mean value
Figure BDA00019653338300000310
Figure BDA00019653338300000311
Wherein M isCThe number of data with the pixel value greater than or equal to T;
s148: all data under the normalized channel:
Figure BDA00019653338300000312
wherein
Figure BDA00019653338300000313
Representing the normalized fluorescence intensity value;
s149: and repeating S141-S148, and performing normalization processing on other selected channel data.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
Preferably, the step S2 includes:
s21: two channels, CH1 and CH2, optionally normalized, to construct a two-dimensional spatial distribution P (V) of fluorescence intensity datach1,Vch2) In which V isch1And Vch2Fluorescence intensity values for channel CH1 and channel CH2, respectively;
s22: randomly selecting an unlabeled data point
Figure BDA0001965333830000041
As the center point, where c ∈ [1, N ]];
S23: calculating distance points
Figure BDA0001965333830000042
Point less than BW
Figure BDA0001965333830000043
Point recording
Figure BDA0001965333830000044
Are integrated into U, wherein
Figure BDA0001965333830000045
Wherein BW is 0.5, j represents different points in the set U; setting all points in the set U as class C, and adding 1 to the statistical times of the points respectively;
s24: by point
Figure BDA0001965333830000046
As a center, calculate
Figure BDA0001965333830000047
To the vector of each element in the set U, these vectors are added to obtain the Shift vector Shift (V)ch1,Vch2) Update the center position to
Figure BDA0001965333830000048
Wherein
Figure BDA0001965333830000049
Figure BDA00019653338300000410
Where j represents a different point in the set U;
s25: repeating S23-S24 until point Pnew(Vch1,Vch2) And point
Figure BDA00019653338300000411
Is less than 0.05, when the recording point P is formednew(Vch1,Vch2) All coordinate points encountered in the iteration process belong to the class C;
s26: p of current class C if convergednew(Vch1,Vch2) If the distance between the center of the cluster and the center of other existing clusters is less than the threshold value of 0.1, merging the two clusters, and if not, taking c as a new cluster;
s27: repeating S22-S26 until all points are marked for access;
s28: counting the access frequency of each point corresponding class, taking the class with the maximum access frequency as the class to which the current point set belongs,
s29: repeating S21-S28 finds all cluster centers for any two channels.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
Preferably, the step S3 includes:
s31: calculating a slope;
s32: all slope values take negative numbers and are filled into corresponding crosstalk inverse matrixes, and elements on main diagonals of the crosstalk inverse matrixes are assigned to be 1;
s33: substituting the generated inverse matrix into the formula F ═ W-1And S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
More preferably, in S31, a cluster center is selected based on the fluorescence intensity value, the number of clusters, and the cluster position, and then the slope is calculated.
Specifically, in an embodiment of the present invention, the step S31 includes one of the steps S311 or S312:
s311: if the number of channels is less than 3, assigning all main diagonal angles of the correction matrix to be 1, and assigning other positions to be 0;
s312: if the number of channels is greater than or equal to 3, the slope is calculated as follows:
s3121: optionally marking two channels CH1 and CH2 as an X channel and a Y channel, and copying the cluster number and the central position obtained in the step S2;
s3122: if the number of clusters is less than 3, the corresponding correction matrix is assigned to be 0, otherwise, the following steps are carried out;
s3123: if the number of clusters is equal to 3, each cluster center CiAddition of the horizontal and vertical coordinate values
Figure BDA0001965333830000051
Figure BDA0001965333830000052
And
Figure BDA0001965333830000053
represents the ith class CiCorresponding abscissa and ordinate values, CH1 and CH2 represent corresponding channels; cluster C with the smallest summinAnd othersTwo cluster centers are marked as CAAnd CBThen slope of
Figure BDA0001965333830000054
If L isI<LIIThen, then
Figure BDA0001965333830000055
Otherwise
Figure BDA0001965333830000056
LxAnd LyRepresents the corresponding slopes of the channels CH1 and CH 2;
s3124: if the number of clusters is more than 3, firstly calculating the mean value of all cluster centers:
Figure BDA0001965333830000057
Figure BDA0001965333830000058
where n represents the total number of cluster centers.
S3125: calculating a slope, comprising the steps of:
step A1, screening
Figure BDA0001965333830000059
Cluster center of (2):
(1) if the filtered cluster center does not exist, L x0 and Ly=0;
(2) If the number of the screened cluster centers is 1, the data of the cluster centers corresponding to the CH1 channel and the CH2 channel are represented as
Figure BDA00019653338300000510
And
Figure BDA00019653338300000511
(3) if the number of the screened cluster centers is more than 1, the coordinate value corresponding to the class with the minimum sum of the horizontal coordinate value and the vertical coordinate value is given to
Figure BDA00019653338300000512
And
Figure BDA00019653338300000513
step A2, screening
Figure BDA00019653338300000514
Cluster center of (2):
(1) if the number of the screened cluster centers is less than 2, L x0 and Ly=0;
(2) If the number of the screened cluster centers is equal to 2, two cluster centers are marked as CAAnd CBThen slope of
Figure BDA0001965333830000061
Figure BDA0001965333830000062
If L isI<LIIThen Lx=LI
Figure BDA0001965333830000063
Otherwise Lx=LII
Figure BDA0001965333830000064
LxAnd LyRepresents the corresponding slopes of the channels CH1 and CH 2;
(3) if the number of the screened cluster centers is more than 2, marking the coordinate corresponding to the class with the maximum abscissa value as CAThe coordinate corresponding to the class with the maximum ordinate value is marked as CBThen slope of
Figure BDA0001965333830000065
If L isI<LIIThen Lx=LI
Figure BDA0001965333830000066
Otherwise Lx=LII
Figure BDA0001965333830000067
LxAnd LyRepresents the corresponding slopes of the channels CH1 and CH 2;
s3126: and repeating the steps S3121-S3125, and calculating slope values of other two channels.
It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.
On the basis of the common general knowledge in the field, the above-mentioned preferred conditions can be combined arbitrarily without departing from the concept and the protection scope of the invention.
The invention discloses a digital PCR system in a second aspect, and the correction of the fluorescence crosstalk signal is realized by adopting the method.
The third aspect of the invention discloses the application of the method in the technical field of machine learning.
Compared with the prior art, the invention has the following remarkable advantages and effects:
the uniform drift clustering correction method is completed based on the condition that all data sets participate in calculation, the distribution and the shape of clustering do not influence the calculation of the slope, and the condition that the selection of single data and the limitation of samples occur by means of a peak value method and the like is greatly avoided.
In the data preprocessing stage, the threshold value T is determined by adopting an iterative loop method, all data with fluorescence values larger than T are calculated, the mean value of the data is calculated, and the data is normalized finally, so that the threshold value T is determined in a way that the data are comprehensively considered to be composed of negative and positive, and the method is more accurate.
When mean shift updates the cluster center, a formula is used
Figure BDA0001965333830000068
The clustering process is prevented from falling into local optimal or dead loops.
When the slope is calculated, the method disclosed by the invention screens real clustering centers according to the conditions of fluorescence intensity values, clustering numbers, clustering positions and the like, eliminates the false clustering centers from participating in calculation, and ensures that the data is more authentic.
Drawings
FIG. 1 is a schematic diagram of four-color fluorescence crosstalk;
FIG. 2 is a flow chart of a technical solution in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a mean shift according to an embodiment of the present invention;
FIG. 4 is a graph of fluorescence distribution before calibration in an embodiment of the present invention;
FIG. 5 is a corrected fluorescence distribution diagram according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
The method disclosed by the invention mainly comprises the following steps:
s1: preprocessing multi-channel fluorescence intensity data;
s2: calculating different channel clustering centers based on mean shift clustering;
s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data.
The terms are explained in the present invention:
and (3) mean shift algorithm: mean-shift clustering is a sliding-window based algorithm that attempts to find dense regions of data points. This is a centroid based algorithm, which means that its goal is to locate the center point of each group/class by updating the candidate points for the center point to the mean of the points within the sliding window. These candidate windows are then filtered in a post-processing stage to eliminate approximate duplicates, forming a final set of center points and their corresponding groups.
Referring to fig. 2, a basic flow chart of the technical scheme of the invention is shown, which comprises the following steps:
firstly, reading in data, then judging the number of channels, if the number of channels is less than three channels, calculating the slope, and then calculating the correction matrix to obtain corrected fluorescence data. If the number of the channels is larger than or equal to three channels, selecting a solid or liquid chip picture, then carrying out data preprocessing, then calculating a positioning/non-positioning channel clustering center, screening the clustering center, then calculating a slope, and then calculating a correction matrix to obtain corrected fluorescence data.
Specifically, in an embodiment of the present invention, a fluorescence crosstalk correction method based on a mean shift algorithm is disclosed, which includes the following steps:
s1: preprocessing multi-channel fluorescence intensity data;
s2: calculating different channel clustering centers based on mean shift clustering;
s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data.
In an embodiment of the present invention, the step S1 includes:
s11: shooting chip pictures of different channels under a stable imaging system;
s12: acquiring a fluorescence value of a micro-reaction unit in a picture;
s13: selecting a channel to be corrected and a chip type of a picture;
s14: and (4) preprocessing fluorescence data.
More preferably, the S12 includes:
s121: positioning the central position of the micro-reaction unit;
s122: the mean of the fluorescence values of the microreaction units was calculated.
In one embodiment of the present invention, in S122, the formula for calculating the mean value of the fluorescence values of the micro reaction cells is:
Figure BDA0001965333830000081
where f (x, y) represents a pixel value at the center position (x, y), Vn(x, y) represents the fluorescence value of the nth microreaction unit, N ∈ [1, N ]]And N represents the number of reaction units.
In an embodiment of the present invention, the S14 includes:
s141: optionally selecting a channel fluorescence data value, calculating
Figure BDA0001965333830000082
Wherein, Bn(x, y) represents a background pixel value of the nth micro-reaction unit,
Figure BDA0001965333830000083
new fluorescence data representing the nth micro-reaction unit;
s142: selecting the maximum fluorescence value in the channel
Figure BDA0001965333830000084
And minimum fluorescence value
Figure BDA0001965333830000085
Let the initial threshold be
Figure BDA0001965333830000086
Figure BDA0001965333830000087
S143: data sets A and B are set, where A represents a pixel value equal to or greater than T0The number of data is M1(ii) a B represents that the number of data is less than T0The number of data is M2
S144: calculating the gray level mean value G of the set A and the set BAAnd GB
Figure BDA0001965333830000088
S145: calculating a new threshold value
Figure BDA0001965333830000089
S146: comparing the iteration T of the previous iteration and the iteration T of the next iteration, if the iteration T of the previous iteration and the iteration T of the next iteration are different, repeating the steps S143, S144 and S145, otherwise, carrying out the next step;
s147: taking T as a reference point of channel data; summing all data points greater than T to obtain a mean value
Figure BDA00019653338300000810
Figure BDA00019653338300000811
Wherein M isCThe number of data with the pixel value greater than or equal to T;
s148: all data under the normalized channel:
Figure BDA0001965333830000091
wherein
Figure BDA0001965333830000092
Representing the normalized fluorescence intensity value;
s149: and repeating S141-S148, and performing normalization processing on other selected channel data.
In an embodiment of the present invention, the step S2 includes:
s21: two channels, CH1 and CH2, optionally normalized, to construct a two-dimensional spatial distribution P (V) of fluorescence intensity datach1,Vch2) In which V isch1And Vch2Fluorescence intensity values for channel CH1 and channel CH2, respectively;
s22: randomly selecting an unlabeled data point
Figure BDA0001965333830000093
As the center point, where c ∈ [1, N ]];
S23: calculating distance points
Figure BDA0001965333830000094
Point less than BW
Figure BDA0001965333830000095
Point recording
Figure BDA0001965333830000096
Are integrated into U, wherein
Figure BDA0001965333830000097
Wherein BW is 0.5, j represents different points in the set U; setting all points in the set U as class C, and adding 1 to the statistical times of the points respectively;
s24: by point
Figure BDA0001965333830000098
As a center, calculate
Figure BDA0001965333830000099
To the vector of each element in the set U, these vectors are added to obtain the Shift vector Shift (V)ch1,Vch2) Update the center position to
Figure BDA00019653338300000910
Wherein
Figure BDA00019653338300000911
Figure BDA00019653338300000912
Where j represents a different point in the set U;
s25: repeating S23-S24 until point Pnew(Vch1,Vch2) And point
Figure BDA00019653338300000913
Is less than 0.05, when the recording point P is formednew(Vch1,Vch2) All coordinate points encountered in the iteration process belong to the class C;
s26: p of current class C if convergednew(Vch1,Vch2) If the distance between the center of the cluster and the center of other existing clusters is less than the threshold value of 0.1, merging the two clusters, and if not, taking c as a new cluster;
s27: repeating S22-S26 until all points are marked for access;
s28: counting the access frequency of each point corresponding class, taking the class with the maximum access frequency as the class to which the current point set belongs,
s29: repeating S21-S28 finds all cluster centers for any two channels.
The specific process is shown in FIG. 3, the center position is a randomly selected data point
Figure BDA00019653338300000914
The effective area refers to all points in the set U, each point to
Figure BDA00019653338300000915
Is less than BW, and the black arrow represented by the solid line represents the movement vector Shift (V)ch1,Vch2)。
In an embodiment of the present invention, the step S3 includes:
s31: calculating a slope;
s32: and taking negative values of all slope values, filling the negative values into the corresponding crosstalk inverse matrix, and assigning 1 to the elements on the main diagonal of the crosstalk inverse matrix. Table 1 is the inverse matrix calculated from the fluorescence data for a selected set of 3 channels: the main diagonal element is 1, and the non-main diagonal element is obtained by taking the negative of the element at the corresponding position of the crosstalk matrix.
TABLE 1 inverse matrix for crosstalk correction
Channel ROX VIC FAM
ROX 1 -0.398 -0.149
VIC 0 1 -0.337
FAM 0 -0.0522 1
S33: substituting the generated inverse matrix into the formula F ═ W-1And S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal. Fig. 4 and 5 show the fluorescence value distributions of the VIC channel and the FAM channel before and after correction, and the fluorescence value distributions after correction are more gathered, and the positive and negative points of the corresponding channels are easier to segment.
In one embodiment of the present invention, in S31, the cluster center is selected based on the fluorescence intensity value, the number of clusters, and the cluster position, and then the slope is calculated.
Specifically, in an embodiment of the present invention, the step S31 includes one of the steps S311 or S312:
s311: if the number of channels is less than 3, assigning all main diagonal angles of the correction matrix to be 1, and assigning other positions to be 0;
s312: if the number of channels is greater than or equal to 3, the slope is calculated as follows:
s3121: optionally marking two channels CH1 and CH2 as an X channel and a Y channel, and copying the cluster number and the central position obtained in the step S2;
s3122: if the number of clusters is less than 3, the corresponding correction matrix is assigned to be 0, otherwise, the following steps are carried out;
s3123: if the number of clusters is equal to 3, each cluster center CiTransverse and longitudinal directionsAddition of coordinate values
Figure BDA0001965333830000101
Figure BDA0001965333830000102
And
Figure BDA0001965333830000103
represents the ith class CiCorresponding abscissa and ordinate values, CH1 and CH2 represent corresponding channels; cluster C with the smallest summinThe other two cluster centers are denoted CAAnd CBThen slope of
Figure BDA0001965333830000104
If L isI<LIIThen Lx=LI
Figure BDA0001965333830000105
Otherwise Lx=LII
Figure BDA0001965333830000106
LxAnd LyRepresents the corresponding slopes of the channels CH1 and CH 2;
s3124: if the number of clusters is more than 3, firstly calculating the mean value of all cluster centers:
Figure BDA0001965333830000111
Figure BDA0001965333830000112
where n represents the total number of cluster centers.
S3125: calculating a slope, comprising the steps of:
step A1, screening
Figure BDA0001965333830000113
Cluster center of (2):
(1) if the filtered cluster center does not exist, L x0 and Ly=0;
(2) If the number of the screened cluster centers is 1, the data of the cluster centers corresponding to the CH1 channel and the CH2 channel are represented as
Figure BDA0001965333830000114
And
Figure BDA0001965333830000115
(3) if the number of the screened cluster centers is more than 1, the coordinate value corresponding to the class with the minimum sum of the horizontal coordinate value and the vertical coordinate value is given to
Figure BDA0001965333830000116
And
Figure BDA0001965333830000117
step A2, screening
Figure BDA0001965333830000118
Cluster center of (2):
(1) if the number of the screened cluster centers is less than 2, L x0 and Ly=0;
(2) If the number of the screened cluster centers is equal to 2, two cluster centers are marked as CAAnd CBThen slope of
Figure BDA0001965333830000119
Figure BDA00019653338300001110
If L isI<LIIThen Lx=LI
Figure BDA00019653338300001111
Otherwise Lx=LII
Figure BDA00019653338300001112
LxAnd LyRepresenting the pairs of channels CH1 and CH2The corresponding slope;
(3) if the number of the screened cluster centers is more than 2, marking the coordinate corresponding to the class with the maximum abscissa value as CAThe coordinate corresponding to the class with the maximum ordinate value is marked as CBThen slope of
Figure BDA00019653338300001113
If L isI<LIIThen Lx=LI
Figure BDA00019653338300001114
Otherwise Lx=LII
Figure BDA00019653338300001115
LxAnd LyRepresents the corresponding slopes of the channels CH1 and CH 2;
s3126: and repeating the steps S3121-S3125, and calculating slope values of other two channels.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (7)

1. A fluorescence crosstalk correction method based on a mean shift algorithm is characterized by comprising the following steps:
s1: preprocessing multi-channel fluorescence intensity data;
s2: calculating different channel clustering centers based on mean shift clustering;
s3: calculating the slope, and filling a correction matrix to obtain corrected fluorescence data;
the step S3 includes:
s31: calculating a slope;
s32: all slope values take negative numbers and are filled into corresponding crosstalk inverse matrixes, and elements on main diagonals of the crosstalk inverse matrixes are assigned to be 1;
s33: substituting the generated inverse matrix into the formula F ═ W-1S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal;
in S31, screening out a clustering center according to the fluorescence intensity value, the clustering number and the clustering position, and then calculating a slope;
s311: if the number of channels is less than 3, assigning all main diagonal angles of the correction matrix to be 1, and assigning other positions to be 0;
s312: if the number of channels is greater than or equal to 3, the slope is calculated as follows:
s3121: optionally marking two channels ch1 and ch2 as an X channel and a Y channel, and copying the cluster number and the central position obtained in the step S2;
s3122: if the number of clusters is less than 3, the corresponding correction matrix is assigned to be 0, otherwise, the following steps are carried out;
s3123: if the number of clusters is equal to 3, each cluster center CiAddition of the horizontal and vertical coordinate values
Figure FDA0002838465060000011
And
Figure FDA0002838465060000012
represents the ith class CiCorresponding horizontal and vertical coordinate values, ch1 and ch2 represent corresponding channels; selecting the smallest cluster center CminThe other two cluster centers are denoted CAAnd CBThen slope of
Figure FDA0002838465060000013
If L isI<LIIThen Lx=LI
Figure FDA0002838465060000014
Otherwise Lx=LII
Figure FDA0002838465060000015
LxAnd LyIndicates the slopes of the channels ch1 and ch 2;
s3124: if the number of clusters is more than 3, firstly calculating the mean value of all cluster centers:
Figure FDA0002838465060000016
Figure FDA0002838465060000017
wherein n represents the total number of cluster centers;
s3125: calculating a slope, comprising the steps of:
step A1, screening
Figure FDA0002838465060000021
Cluster center of (2):
(1) if the filtered cluster center does not exist, Lx0 and Ly=0;
(2) If the number of the screened cluster centers is 1, the data of the corresponding channel ch1 and channel ch2 of the cluster center is represented as
Figure FDA0002838465060000022
And
Figure FDA0002838465060000023
(3) if the number of the screened cluster centers is more than 1, the coordinate value corresponding to the class with the minimum sum of the horizontal coordinate value and the vertical coordinate value is given to
Figure FDA0002838465060000024
And
Figure FDA0002838465060000025
step A2, screening
Figure FDA0002838465060000026
Is polymerized byClass center:
(1) if the number of the screened cluster centers is less than 2, Lx0 and Ly=0;
(2) If the number of the screened cluster centers is equal to 2, two cluster centers are marked as CAAnd CBThen slope of
Figure FDA0002838465060000027
If L isI<LIIThen Lx=LI
Figure FDA0002838465060000028
Otherwise Lx=LII
Figure FDA0002838465060000029
LxAnd LyIndicate the slopes corresponding to the channels ch1 and ch 2;
(3) if the number of the screened cluster centers is more than 2, marking the coordinate corresponding to the class with the maximum abscissa value as CAThe coordinate corresponding to the class with the maximum ordinate value is marked as CBThen slope of
Figure FDA00028384650600000210
If L isI<LIIThen Lx=LI
Figure FDA00028384650600000211
Otherwise Lx=LII
Figure FDA00028384650600000212
LxAnd LyIndicate the slopes corresponding to the channels ch1 and ch 2;
s3126: and repeating the steps S3121-S3125, and calculating slope values of other two channels.
2. The method according to claim 1, wherein the step S1 includes:
s11: shooting chip pictures of different channels under a stable imaging system;
s12: acquiring a fluorescence value of a micro-reaction unit in a picture;
s13: selecting a channel to be corrected and a chip type of a picture;
s14: and (4) preprocessing fluorescence data.
3. The method according to claim 2, wherein the S12 includes:
s121: positioning the central position of the micro-reaction unit;
s122: the mean of the fluorescence values of the microreaction units was calculated.
4. The method of claim 3, wherein the formula for calculating the mean value of the fluorescence values of the micro reaction cells in S122 is:
Figure FDA00028384650600000213
where f (x, y) represents a pixel value at the center position (x, y), Vn(x, y) represents the fluorescence value of the nth microreaction unit, N ∈ [1, N ]]And N represents the number of microreaction units.
5. The method according to claim 2, wherein the S14 includes:
s141: optionally selecting a channel fluorescence data value, calculating
Figure FDA0002838465060000031
Wherein, Bn(x, y) represents a background pixel value of the nth micro-reaction unit,
Figure FDA0002838465060000032
new fluorescence data representing the nth micro-reaction unit;
s142: selecting the maximum fluorescence value in the channel
Figure FDA0002838465060000033
And minimum fluorescence value
Figure FDA0002838465060000034
Let the initial threshold be
Figure FDA0002838465060000035
S143: data sets A and B are set, where A represents a pixel value equal to or greater than T0The number of data is M1(ii) a B represents a pixel value less than T0The number of data is M2
S144: calculating the gray level mean value G of the set A and the set BAAnd GB
Figure FDA0002838465060000036
S145: calculating a new threshold value
Figure FDA0002838465060000037
S146: comparing the iteration T of the previous iteration and the iteration T of the next iteration, if the iteration T of the previous iteration and the iteration T of the next iteration are different, repeating the steps S143, S144 and S145, otherwise, carrying out the next step;
s147: taking T as a reference point of channel data; summing all data points greater than T to obtain a mean value
Figure FDA0002838465060000038
Wherein M isCThe number of data with pixel values larger than T;
s148: all data under the normalized channel:
Figure FDA0002838465060000039
wherein
Figure FDA00028384650600000310
Representing the normalized fluorescence intensity value;
s149: and repeating S141-S148, and performing normalization processing on other selected channel data.
6. The method according to claim 1, wherein the step S2 includes:
s21: two channels ch1 and ch2, optionally normalized, construct the two-dimensional spatial distribution P (V) of fluorescence intensity datach1,Vch2) In which V isch1And Vch2Fluorescence intensity values for channels ch1 and ch2, respectively;
s22: randomly selecting an unlabeled data point
Figure FDA00028384650600000311
As the center point, where c ∈ [1, N ]];
S23: calculating distance points
Figure FDA0002838465060000041
Point less than BW
Figure FDA0002838465060000042
Point recording
Figure FDA0002838465060000043
Are integrated into U, wherein
Figure FDA0002838465060000044
Wherein BW is 0.5, j represents different points in the set U; setting all points in the set U as class C, and adding 1 to the statistical times of the points respectively;
s24: by point
Figure FDA0002838465060000045
As a center, calculate
Figure FDA0002838465060000046
To the vector of each element in the set U, these vectors are added to obtain the Shift vector Shift (V)ch1,Vch2) Update the center position to
Figure FDA0002838465060000047
Wherein
Figure FDA0002838465060000048
Figure FDA0002838465060000049
Figure FDA00028384650600000410
Where j represents a different point in the set U;
s25: repeating S23-S24 until point Pnew(Vch1,Vch2) And point
Figure FDA00028384650600000411
Is less than 0.05, when the recording point P is formednew(Vch1,Vch2) During the process of S23-S24, all the points in the set U traversed by the algorithm belong to the class C;
s26: p of current class C if convergednew(Vch1,Vch2) If the distance between the center of the cluster and the center of other existing clusters is less than the threshold value of 0.1, merging the two clusters, and if not, taking C as a new cluster;
s27: repeating S22-S26 until all points are marked for access;
s28: counting the access frequency of each point corresponding class, taking the class with the maximum access frequency as the class to which the current point set belongs,
s29: repeating S21-S28 finds all cluster centers for any two channels.
7. A digital PCR system, characterized in that the correction of the fluorescence cross talk signal is achieved by the method according to any of claims 1-6.
CN201910099554.4A 2019-01-31 2019-01-31 Fluorescence crosstalk correction method based on mean shift algorithm and application thereof Active CN109859188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910099554.4A CN109859188B (en) 2019-01-31 2019-01-31 Fluorescence crosstalk correction method based on mean shift algorithm and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910099554.4A CN109859188B (en) 2019-01-31 2019-01-31 Fluorescence crosstalk correction method based on mean shift algorithm and application thereof

Publications (2)

Publication Number Publication Date
CN109859188A CN109859188A (en) 2019-06-07
CN109859188B true CN109859188B (en) 2021-04-06

Family

ID=66897236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910099554.4A Active CN109859188B (en) 2019-01-31 2019-01-31 Fluorescence crosstalk correction method based on mean shift algorithm and application thereof

Country Status (1)

Country Link
CN (1) CN109859188B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112345759B (en) * 2020-11-16 2022-07-12 三诺生物传感股份有限公司 Method for detecting fluorescence intensity peak
CN114444927A (en) * 2021-02-02 2022-05-06 方舟生物安全科技(广州)有限公司 Pathogenic microorganism Internet of things real-time monitoring system based on high risk area division
CN116503471B (en) * 2023-06-28 2023-08-29 南开大学 Single-molecule positioning imaging drift correction method and system for K-time neighbor position cloud picture
CN117054338A (en) * 2023-08-16 2023-11-14 四川杰莱美科技有限公司 Parallel light path system based on PCR appearance

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7402817B2 (en) * 2004-07-29 2008-07-22 The Research Foundation Of State University Of New York System and method for cross-talk cancellation in a multilane fluorescence detector
CN101688841A (en) * 2007-06-29 2010-03-31 霍夫曼-拉罗奇有限公司 Systems and methods for determining cross-talk coefficients in PCR and other data sets
CN101688838A (en) * 2008-06-11 2010-03-31 博奥生物有限公司 A reliable fluorescence correction method for two-color microarray fluorescence system
WO2016057923A1 (en) * 2014-10-09 2016-04-14 Kinetic River Corp. Particle analysis and sorting apparatus and methods
CN106596489A (en) * 2016-12-19 2017-04-26 中国科学院苏州生物医学工程技术研究所 Processing method of fluorescence intensity data in fluorescence droplet detection
CN108139325A (en) * 2015-09-22 2018-06-08 迈卡提斯公众有限公司 Crosstalk correction in the multiplexing analysis of biological sample

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7402817B2 (en) * 2004-07-29 2008-07-22 The Research Foundation Of State University Of New York System and method for cross-talk cancellation in a multilane fluorescence detector
CN101688841A (en) * 2007-06-29 2010-03-31 霍夫曼-拉罗奇有限公司 Systems and methods for determining cross-talk coefficients in PCR and other data sets
CN101688838A (en) * 2008-06-11 2010-03-31 博奥生物有限公司 A reliable fluorescence correction method for two-color microarray fluorescence system
WO2016057923A1 (en) * 2014-10-09 2016-04-14 Kinetic River Corp. Particle analysis and sorting apparatus and methods
CN108139325A (en) * 2015-09-22 2018-06-08 迈卡提斯公众有限公司 Crosstalk correction in the multiplexing analysis of biological sample
CN106596489A (en) * 2016-12-19 2017-04-26 中国科学院苏州生物医学工程技术研究所 Processing method of fluorescence intensity data in fluorescence droplet detection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
An estimate of the cross-talk matrix in fourdye fluorescence-based DNA sequencing;Lei Li等;《Electrophoresis 1999》;19991231(第20期);第1433-1442页 *
Cross-talk filtering in four dye fluorescence-based DNA sequencing;Cristian Domnisoru等;《Electrophoresis 2000》;20001231(第21期);第2983-2989页 *
DNA分析仪荧光信号采集与处理系统的研究;郑华;《中国博士学位论文全文数据库 信息科技辑(月刊)》;20080915(第09期);第I140-24页 *
四色荧光DNA测序中的转换矩阵分析;林云跃 等;《光学仪器》;20070430;第29卷(第2期);第41-45页 *
多重定量PCR系统中多色荧光检测和光谱串扰校正方法;臧留琴 等;《光学学报》;20140131;第34卷(第1期);第0117002-1至0117002-7页 *

Also Published As

Publication number Publication date
CN109859188A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
CN109859188B (en) Fluorescence crosstalk correction method based on mean shift algorithm and application thereof
Norton et al. Detecting hierarchical genome folding with network modularity
Patruno et al. A review of computational strategies for denoising and imputation of single-cell transcriptomic data
CN109284786B (en) SAR image terrain classification method for generating countermeasure network based on distribution and structure matching
US20040033485A1 (en) Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data
CN115797352B (en) Tongue picture image processing system for traditional Chinese medicine health-care physique detection
CN108898152A (en) A kind of cystic Tumor of Pancreas CT image classification method based on multichannel multi-categorizer
CN116994246B (en) Base recognition method and device based on multitasking combination, gene sequencer and medium
CN110490836A (en) DPCR microarray images information processing method
CN113012757B (en) Method and system for identifying bases in nucleic acids
CN117252786A (en) Gene detection data enhancement processing method
CN115803817A (en) Systems and methods for per-cluster intensity correction and base detection
Duan et al. Common copy number variation detection from multiple sequenced samples
CN109063537B (en) Hyperspectral image preprocessing method for unmixing of abnormal small target
EP2728502A1 (en) Method and computer program product for genotype classification
CN115359845A (en) Spatial transcriptome biological tissue substructure analysis method fusing unicellular transcriptome
Littman et al. JSTA: joint cell segmentation and cell type annotation for spatial transcriptomics
CN114038506A (en) Micro-drop type digital PCR high-concentration detection method
CN110751660B (en) Color image segmentation method
Chakraborty Use of partial least squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data
Bergemann et al. A statistically driven approach for image segmentation and signal extraction in cDNA microarrays
Lukac et al. cDNA microarray image segmentation using root signals
Deepa et al. Automatic gridding of DNA microarray images using optimum subimage
CN117523559B (en) Base recognition method and device, gene sequencer and storage medium
Parthasarathy et al. An adaptive segmentation method based on Gaussian Mixture Model (GMM) clustering for DNA microarray

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211130

Address after: 518101 g1316, Lianxing building, building B, Yihua new village, district 46, Haifu community, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee after: Pilot medical technology (Shenzhen) Co.,Ltd.

Address before: 310000 room 1114, Jin Jun Road, 341 Shui Xiang Road, Jianggan District, Hangzhou, Zhejiang.

Patentee before: PILOT GENE TECHNOLOGIES (HANGZHOU) Co.,Ltd.

TR01 Transfer of patent right