CN109859188B

CN109859188B - Fluorescence crosstalk correction method based on mean shift algorithm and application thereof

Info

Publication number: CN109859188B
Application number: CN201910099554.4A
Authority: CN
Inventors: 高鹏; 夏江
Original assignee: Pilot Gene Technologies Hangzhou Co ltd
Current assignee: Pilot medical technology (Shenzhen) Co.,Ltd.
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2021-04-06
Anticipated expiration: 2039-01-31
Also published as: CN109859188A

Abstract

The invention relates to the technical field of machine learning, in particular to a fluorescence crosstalk correction method based on a mean shift algorithm and application thereof. The fluorescence crosstalk correction method based on the mean shift algorithm comprises the following steps: s1: preprocessing multi-channel fluorescence intensity data; s2: calculating different channel clustering centers based on mean shift clustering; s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data. The correction method disclosed by the invention can avoid artificial errors introduced when the fluorescence intensity value is sampled, reduce the influence of noise on the fluorescence intensity, and accurately and quickly calculate the element value of the correction matrix, so that corrected fluorescence data is obtained, and the obtained fluorescence data is more real and accurate.

Description

Fluorescence crosstalk correction method based on mean shift algorithm and application thereof

Technical Field

The invention relates to the technical field of machine learning, in particular to a fluorescence crosstalk correction method based on a mean shift algorithm and application thereof.

Background

The digital PCR technology is a technology that DNA or RNA samples are diluted and then are evenly distributed to independent micro-reaction units, a fluorescent probe marked by a specific gene fragment is used for marking, molecular template PCR amplification is carried out, and after the amplification is finished, the positivity or the negativity of the micro-reaction units is judged according to the strength of a fluorescent signal.

The fluorescent probe labeled by the specific gene fragment generates a corresponding fluorescent signal after being combined with the molecular template strand. Different fluorescence light corresponds to a specific wave band, and the sensor detecting the wave band can display the maximum brightness value. The fluorescence spectra are continuous, and meanwhile, partial intersection exists among different fluorescence, and the fluorescence intensity detected on the same wave band can be superposed by several kinds of fluorescence, which is a fluorescence crosstalk phenomenon. As shown in fig. 1.

To achieve spectral crosstalk correction, the fluorescence intensity space is converted to the dye concentration space as follows: assumed to be in the four-color fluorescence signal, I₁(t)，I₂(t)，I₃(t) and I₄(t) 4 fluorescence intensity values detected at time t, ROX (t), FAM (t), VIC (t), CY5(t) represent 4 dye concentration values. The conversion equation from dye concentration space to fluorescence intensity space is as follows:

in the formula, W_ijRepresenting the ratio of the fluorescence intensity of the ith dye in the jth channel, b₁(t)，b₂(t)，B₃(t)，b₄(t) represents noise signals of 4 channels. In the above mathematical model, S represents [ I ]₁(t)，I₂(t)，I₃(t)，I₄(t)]^TI.e. the fluorescence signal; f represents [ ROX (t), FAM (t), VIC (t), CY5(t)]^TI.e. the dye signal; w is a spectral crosstalk matrix; the formula is expressed as S ═ WF. According to the fluorescence intensity signal S, the dye concentration signal is F ═ W^-1And S, realizing fluorescence crosstalk correction.

The existing method for calculating the fluorescence crosstalk correction matrix W comprises a peak value method, a slope iteration method, a K-mean clustering method and the like.

The peak value method is to manually select a time point, fill the light intensity of n wave bands of the time point into the corresponding column of the correction matrix, and the selected wave peaks have good signal-to-noise ratio and have small overlap. The peak method however has the following drawbacks: firstly, manual selection of peaks produces human error, and secondly, the sample is limited and cannot cover all data sets.

K-mean clustering requires that fluorescence intensity values form n clusters in an n-dimensional space, but if the fluorescence intensity data is disturbed to a high degree or noise influences form clusters larger than n, the K-mean clustering may calculate erroneous clustering centers, thereby influencing the correction matrix data.

The slope iteration method is to select two channels as X axis and Y axis and take the fluorescence intensity value as coordinate value. The data points are generally distributed around the X-axis and Y-axis in the first quadrant, and then sampled at equal intervals, the slope of the fitted line is calculated, and the corresponding W matrix is filled. However, if the distribution of data is concentrated or influenced by noise points, after sampling at equal intervals, the fitted straight line generates a slope greater than 1, and the authenticity of the W data is greatly influenced.

Disclosure of Invention

The invention discloses a fluorescence crosstalk correction method based on a mean shift algorithm, which can avoid human errors introduced when a fluorescence intensity value is sampled, reduce the influence of noise on the fluorescence intensity, and accurately and quickly calculate the element value of a correction matrix so as to obtain corrected fluorescence data.

The specific scheme of the invention is as follows:

the invention discloses a fluorescence crosstalk correction method based on a mean shift algorithm, which comprises the following steps:

s1: preprocessing multi-channel fluorescence intensity data;

s2: calculating different channel clustering centers based on mean shift clustering;

s3: and calculating the slope, and filling the correction matrix to obtain the corrected fluorescence data.

It should be understood that the present invention is not limited to the above steps, and may also include other additional steps, for example, before step S1, between steps S1 and S2, between steps S2 and S3, between steps S3 and S4, between steps S4 and S5, and after step S5, without departing from the scope of the present invention.

Preferably, the step S1 includes:

s11: shooting chip pictures of different channels under a stable imaging system;

s12: acquiring a fluorescence value of a micro-reaction unit in a picture;

s13: selecting a channel to be corrected and a chip type of a picture;

s14: and (4) preprocessing fluorescence data.

It is understood that additional steps may be included before, during, or after the above-described steps without departing from the scope of the present invention.

More preferably, the S12 includes:

s121: positioning the central position of the micro-reaction unit;

s122: the mean of the fluorescence values of the microreaction units was calculated.

Further, the formula for calculating the mean value of the fluorescence values of the micro reaction cells in S122 is:

where f (x, y) represents a pixel value at the center position (x, y), V_n(x, y) represents the fluorescence value of the nth microreaction unit, N ∈ [1, N ]]And N represents the number of reaction units.

Preferably, the S14 includes:

s141: optionally selecting a channel fluorescence data value, calculating

Wherein, B_n(x, y) represents a background pixel value of the nth micro-reaction unit,

new fluorescence data representing the nth micro-reaction unit;

s142: selecting the maximum fluorescence value in the channel

And minimum fluorescence value

Let the initial threshold be

S143: data sets A and B are set, where A represents a pixel value equal to or greater than T₀The number of data is M₁(ii) a B represents that the number of data is less than T₀The number of data is M₂；

S144: calculating the gray level mean value G of the set A and the set B_AAnd G_B：

S145: calculating a new threshold value

S146: comparing the iteration T of the previous iteration and the iteration T of the next iteration, if the iteration T of the previous iteration and the iteration T of the next iteration are different, repeating the steps S143, S144 and S145, otherwise, carrying out the next step;

s147: taking T as a reference point of channel data; summing all data points greater than T to obtain a mean value

Wherein M is_CThe number of data with the pixel value greater than or equal to T;

s148: all data under the normalized channel:

wherein

Representing the normalized fluorescence intensity value;

s149: and repeating S141-S148, and performing normalization processing on other selected channel data.

Preferably, the step S2 includes:

s21: two channels, CH1 and CH2, optionally normalized, to construct a two-dimensional spatial distribution P (V) of fluorescence intensity data^ch1，V^ch2) In which V is^ch1And V^ch2Fluorescence intensity values for channel CH1 and channel CH2, respectively;

s22: randomly selecting an unlabeled data point

As the center point, where c ∈ [1, N ]]；

S23: calculating distance points

Point less than BW

Point recording

Are integrated into U, wherein

Wherein BW is 0.5, j represents different points in the set U; setting all points in the set U as class C, and adding 1 to the statistical times of the points respectively;

s24: by point

As a center, calculate

To the vector of each element in the set U, these vectors are added to obtain the Shift vector Shift (V)^ch1，V^ch2) Update the center position to

Wherein

Where j represents a different point in the set U;

s25: repeating S23-S24 until point P_new(V^ch1，V^ch2) And point

Is less than 0.05, when the recording point P is formed_new(V^ch1，V^ch2) All coordinate points encountered in the iteration process belong to the class C;

s26: p of current class C if converged_new(V^ch1，V^ch2) If the distance between the center of the cluster and the center of other existing clusters is less than the threshold value of 0.1, merging the two clusters, and if not, taking c as a new cluster;

s27: repeating S22-S26 until all points are marked for access;

s28: counting the access frequency of each point corresponding class, taking the class with the maximum access frequency as the class to which the current point set belongs,

s29: repeating S21-S28 finds all cluster centers for any two channels.

Preferably, the step S3 includes:

s31: calculating a slope;

s32: all slope values take negative numbers and are filled into corresponding crosstalk inverse matrixes, and elements on main diagonals of the crosstalk inverse matrixes are assigned to be 1;

s33: substituting the generated inverse matrix into the formula F ═ W^-1And S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal.

More preferably, in S31, a cluster center is selected based on the fluorescence intensity value, the number of clusters, and the cluster position, and then the slope is calculated.

Specifically, in an embodiment of the present invention, the step S31 includes one of the steps S311 or S312:

s311: if the number of channels is less than 3, assigning all main diagonal angles of the correction matrix to be 1, and assigning other positions to be 0;

s312: if the number of channels is greater than or equal to 3, the slope is calculated as follows:

s3121: optionally marking two channels CH1 and CH2 as an X channel and a Y channel, and copying the cluster number and the central position obtained in the step S2;

s3122: if the number of clusters is less than 3, the corresponding correction matrix is assigned to be 0, otherwise, the following steps are carried out;

s3123: if the number of clusters is equal to 3, each cluster center C_iAddition of the horizontal and vertical coordinate values

And

represents the ith class C_iCorresponding abscissa and ordinate values, CH1 and CH2 represent corresponding channels; cluster C with the smallest sum_minAnd othersTwo cluster centers are marked as C_AAnd C_BThen slope of

If L is_I＜L_IIThen, then

Otherwise

L_xAnd L_yRepresents the corresponding slopes of the channels CH1 and CH 2;

s3124: if the number of clusters is more than 3, firstly calculating the mean value of all cluster centers:

where n represents the total number of cluster centers.

S3125: calculating a slope, comprising the steps of:

step A1, screening

Cluster center of (2):

(1) if the filtered cluster center does not exist, L _x0 and L_y＝0；

(2) If the number of the screened cluster centers is 1, the data of the cluster centers corresponding to the CH1 channel and the CH2 channel are represented as

And

(3) if the number of the screened cluster centers is more than 1, the coordinate value corresponding to the class with the minimum sum of the horizontal coordinate value and the vertical coordinate value is given to

And

step A2, screening

Cluster center of (2):

(1) if the number of the screened cluster centers is less than 2, L _x0 and L_y＝0；

(2) If the number of the screened cluster centers is equal to 2, two cluster centers are marked as C_AAnd C_BThen slope of

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yRepresents the corresponding slopes of the channels CH1 and CH 2;

(3) if the number of the screened cluster centers is more than 2, marking the coordinate corresponding to the class with the maximum abscissa value as C_AThe coordinate corresponding to the class with the maximum ordinate value is marked as C_BThen slope of

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yRepresents the corresponding slopes of the channels CH1 and CH 2;

s3126: and repeating the steps S3121-S3125, and calculating slope values of other two channels.

On the basis of the common general knowledge in the field, the above-mentioned preferred conditions can be combined arbitrarily without departing from the concept and the protection scope of the invention.

The invention discloses a digital PCR system in a second aspect, and the correction of the fluorescence crosstalk signal is realized by adopting the method.

The third aspect of the invention discloses the application of the method in the technical field of machine learning.

Compared with the prior art, the invention has the following remarkable advantages and effects:

the uniform drift clustering correction method is completed based on the condition that all data sets participate in calculation, the distribution and the shape of clustering do not influence the calculation of the slope, and the condition that the selection of single data and the limitation of samples occur by means of a peak value method and the like is greatly avoided.

In the data preprocessing stage, the threshold value T is determined by adopting an iterative loop method, all data with fluorescence values larger than T are calculated, the mean value of the data is calculated, and the data is normalized finally, so that the threshold value T is determined in a way that the data are comprehensively considered to be composed of negative and positive, and the method is more accurate.

When mean shift updates the cluster center, a formula is used

The clustering process is prevented from falling into local optimal or dead loops.

When the slope is calculated, the method disclosed by the invention screens real clustering centers according to the conditions of fluorescence intensity values, clustering numbers, clustering positions and the like, eliminates the false clustering centers from participating in calculation, and ensures that the data is more authentic.

Drawings

FIG. 1 is a schematic diagram of four-color fluorescence crosstalk;

FIG. 2 is a flow chart of a technical solution in an embodiment of the present invention;

FIG. 3 is a diagram illustrating a mean shift according to an embodiment of the present invention;

FIG. 4 is a graph of fluorescence distribution before calibration in an embodiment of the present invention;

FIG. 5 is a corrected fluorescence distribution diagram according to an embodiment of the present invention.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.

The method disclosed by the invention mainly comprises the following steps:

s1: preprocessing multi-channel fluorescence intensity data;

The terms are explained in the present invention:

and (3) mean shift algorithm: mean-shift clustering is a sliding-window based algorithm that attempts to find dense regions of data points. This is a centroid based algorithm, which means that its goal is to locate the center point of each group/class by updating the candidate points for the center point to the mean of the points within the sliding window. These candidate windows are then filtered in a post-processing stage to eliminate approximate duplicates, forming a final set of center points and their corresponding groups.

Referring to fig. 2, a basic flow chart of the technical scheme of the invention is shown, which comprises the following steps:

firstly, reading in data, then judging the number of channels, if the number of channels is less than three channels, calculating the slope, and then calculating the correction matrix to obtain corrected fluorescence data. If the number of the channels is larger than or equal to three channels, selecting a solid or liquid chip picture, then carrying out data preprocessing, then calculating a positioning/non-positioning channel clustering center, screening the clustering center, then calculating a slope, and then calculating a correction matrix to obtain corrected fluorescence data.

Specifically, in an embodiment of the present invention, a fluorescence crosstalk correction method based on a mean shift algorithm is disclosed, which includes the following steps:

s1: preprocessing multi-channel fluorescence intensity data;

In an embodiment of the present invention, the step S1 includes:

s12: acquiring a fluorescence value of a micro-reaction unit in a picture;

s13: selecting a channel to be corrected and a chip type of a picture;

s14: and (4) preprocessing fluorescence data.

More preferably, the S12 includes:

s121: positioning the central position of the micro-reaction unit;

In one embodiment of the present invention, in S122, the formula for calculating the mean value of the fluorescence values of the micro reaction cells is:

In an embodiment of the present invention, the S14 includes:

s141: optionally selecting a channel fluorescence data value, calculating

new fluorescence data representing the nth micro-reaction unit;

s142: selecting the maximum fluorescence value in the channel

And minimum fluorescence value

Let the initial threshold be

S145: calculating a new threshold value

s148: all data under the normalized channel:

wherein

Representing the normalized fluorescence intensity value;

In an embodiment of the present invention, the step S2 includes:

s22: randomly selecting an unlabeled data point

As the center point, where c ∈ [1, N ]]；

S23: calculating distance points

Point less than BW

Point recording

Are integrated into U, wherein

s24: by point

As a center, calculate

Wherein

Where j represents a different point in the set U;

s25: repeating S23-S24 until point P_new(V^ch1，V^ch2) And point

s27: repeating S22-S26 until all points are marked for access;

s29: repeating S21-S28 finds all cluster centers for any two channels.

The specific process is shown in FIG. 3, the center position is a randomly selected data point

The effective area refers to all points in the set U, each point to

Is less than BW, and the black arrow represented by the solid line represents the movement vector Shift (V)^ch1，V^ch2)。

In an embodiment of the present invention, the step S3 includes:

s31: calculating a slope;

s32: and taking negative values of all slope values, filling the negative values into the corresponding crosstalk inverse matrix, and assigning 1 to the elements on the main diagonal of the crosstalk inverse matrix. Table 1 is the inverse matrix calculated from the fluorescence data for a selected set of 3 channels: the main diagonal element is 1, and the non-main diagonal element is obtained by taking the negative of the element at the corresponding position of the crosstalk matrix.

TABLE 1 inverse matrix for crosstalk correction

Channel	ROX	VIC	FAM
				ROX	1	-0.398	-0.149
VIC	0	1	-0.337
				FAM	0	-0.0522	1

S33: substituting the generated inverse matrix into the formula F ═ W^-1And S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal. Fig. 4 and 5 show the fluorescence value distributions of the VIC channel and the FAM channel before and after correction, and the fluorescence value distributions after correction are more gathered, and the positive and negative points of the corresponding channels are easier to segment.

In one embodiment of the present invention, in S31, the cluster center is selected based on the fluorescence intensity value, the number of clusters, and the cluster position, and then the slope is calculated.

s3123: if the number of clusters is equal to 3, each cluster center C_iTransverse and longitudinal directionsAddition of coordinate values

And

represents the ith class C_iCorresponding abscissa and ordinate values, CH1 and CH2 represent corresponding channels; cluster C with the smallest sum_minThe other two cluster centers are denoted C_AAnd C_BThen slope of

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yRepresents the corresponding slopes of the channels CH1 and CH 2;

where n represents the total number of cluster centers.

S3125: calculating a slope, comprising the steps of:

step A1, screening

Cluster center of (2):

(1) if the filtered cluster center does not exist, L _x0 and L_y＝0；

And

And

step A2, screening

Cluster center of (2):

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yRepresenting the pairs of channels CH1 and CH2The corresponding slope;

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yRepresents the corresponding slopes of the channels CH1 and CH 2;

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A fluorescence crosstalk correction method based on a mean shift algorithm is characterized by comprising the following steps:

s1: preprocessing multi-channel fluorescence intensity data;

s3: calculating the slope, and filling a correction matrix to obtain corrected fluorescence data;

the step S3 includes:

s31: calculating a slope;

s33: substituting the generated inverse matrix into the formula F ═ W^-1S, obtaining corrected fluorescence data, wherein W represents a spectrum crosstalk signal, S represents a fluorescence intensity signal, and F represents a dye concentration signal;

in S31, screening out a clustering center according to the fluorescence intensity value, the clustering number and the clustering position, and then calculating a slope;

And

represents the ith class C_iCorresponding horizontal and vertical coordinate values, ch1 and ch2 represent corresponding channels; selecting the smallest cluster center C_minThe other two cluster centers are denoted C_AAnd C_BThen slope of

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yIndicates the slopes of the channels ch1 and ch 2;

wherein n represents the total number of cluster centers;

s3125: calculating a slope, comprising the steps of:

step A1, screening

Cluster center of (2):

(1) if the filtered cluster center does not exist, L_x0 and L_y＝0；

(2) If the number of the screened cluster centers is 1, the data of the corresponding channel ch1 and channel ch2 of the cluster center is represented as

And

And

step A2, screening

Is polymerized byClass center:

(1) if the number of the screened cluster centers is less than 2, L_x0 and L_y＝0；

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yIndicate the slopes corresponding to the channels ch1 and ch 2;

If L is_I＜L_IIThen L_x＝L_I，

Otherwise L_x＝L_II，

L_xAnd L_yIndicate the slopes corresponding to the channels ch1 and ch 2;

2. The method according to claim 1, wherein the step S1 includes:

s12: acquiring a fluorescence value of a micro-reaction unit in a picture;

s13: selecting a channel to be corrected and a chip type of a picture;

s14: and (4) preprocessing fluorescence data.

3. The method according to claim 2, wherein the S12 includes:

s121: positioning the central position of the micro-reaction unit;

4. The method of claim 3, wherein the formula for calculating the mean value of the fluorescence values of the micro reaction cells in S122 is:

where f (x, y) represents a pixel value at the center position (x, y), V_n(x, y) represents the fluorescence value of the nth microreaction unit, N ∈ [1, N ]]And N represents the number of microreaction units.

5. The method according to claim 2, wherein the S14 includes:

s141: optionally selecting a channel fluorescence data value, calculating

new fluorescence data representing the nth micro-reaction unit;

s142: selecting the maximum fluorescence value in the channel

And minimum fluorescence value

Let the initial threshold be

S143: data sets A and B are set, where A represents a pixel value equal to or greater than T₀The number of data is M₁(ii) a B represents a pixel value less than T₀The number of data is M₂；

S145: calculating a new threshold value

Wherein M is_CThe number of data with pixel values larger than T;

s148: all data under the normalized channel:

wherein

Representing the normalized fluorescence intensity value;

6. The method according to claim 1, wherein the step S2 includes:

s21: two channels ch1 and ch2, optionally normalized, construct the two-dimensional spatial distribution P (V) of fluorescence intensity data^ch1，V^ch2) In which V is^ch1And V^ch2Fluorescence intensity values for channels ch1 and ch2, respectively;

s22: randomly selecting an unlabeled data point

As the center point, where c ∈ [1, N ]]；

S23: calculating distance points

Point less than BW

Point recording

Are integrated into U, wherein

s24: by point

As a center, calculate

Wherein

Where j represents a different point in the set U;

s25: repeating S23-S24 until point P_new(V^ch1，V^ch2) And point

Is less than 0.05, when the recording point P is formed_new(V^ch1，V^ch2) During the process of S23-S24, all the points in the set U traversed by the algorithm belong to the class C;

s27: repeating S22-S26 until all points are marked for access;

s29: repeating S21-S28 finds all cluster centers for any two channels.

7. A digital PCR system, characterized in that the correction of the fluorescence cross talk signal is achieved by the method according to any of claims 1-6.