CN112085067B

CN112085067B - Method for high-throughput screening of DNA damage response inhibitor

Info

Publication number: CN112085067B
Application number: CN202010829597.6A
Authority: CN
Inventors: 王毅; 王锐; 荀德金; 陈雪纯
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2022-07-12
Anticipated expiration: 2040-08-17
Also published as: CN112085067A

Abstract

The invention discloses a method for high-throughput screening of DNA damage response inhibitors, comprising the following steps: S1, training a cell nucleus segmentation network model based on the U-Net network; S2, constructing and training a cell nucleus type judgment network model; S3, imaging cells treated with a DNA damage response inhibitor using high content imaging equipment to obtain an image to be analyzed, inputting the image to be analyzed into the cell nucleus segmentation network model and then into the cell nucleus type judgment network model, and counting the proportion of damaged cell nuclei corresponding to each DNA damage response inhibitor, wherein the smaller the proportion of damaged cell nuclei, the better the effect of the DNA damage response inhibitor. The method can automatically segment and classify, in batches, the images acquired by the high content imaging equipment, and can preliminarily screen out compounds worth further research through statistical analysis.

Description

Method for high-throughput screening of DNA damage response inhibitor

Technical Field

The invention relates to the technical field of DNA damage, drug screening and deep learning, in particular to a method for screening a DNA damage response inhibitor in a high-throughput manner.

Background

DNA damage is caused when organisms are subjected to various endogenous and exogenous factors (e.g., reactive oxygen species, DNA replication errors, ultraviolet radiation, ionizing radiation and genotoxic agents). The accumulation of DNA damage has been shown to be closely related to organ aging and cancer progression.

Although it remains an open question whether inhibiting DNA damage or optimizing the DNA repair process slows aging in humans, evidence suggests that preventing DNA damage and promoting DNA repair are key therapeutic targets for age-related diseases, including vascular, metabolic and neurodegenerative diseases.

In addition, DNA damage response (DDR) inhibitors are also useful in the treatment of cancer, because tumor tissue is highly likely to accumulate DNA damage. Therefore, developing a rapid and accurate method for high-throughput screening of DDR inhibitors has important academic value.

The occurrence of nuclear foci is a common indicator of DNA damage and has wide applications in biodosimetry, individual radiosensitivity assessment, and toxicity assessment. The formation of nuclear foci is caused by the accumulation or modification of certain DDR proteins at double strand breaks.

DDR proteins include γH2AX, 53BP1, RAD51, the MRE11/RAD50/NBS1 complex, and the like. Foci can be visualized under a fluorescence microscope by immunofluorescence, immunohistochemical analysis, or labeling with fluorescent proteins. In general, the number of foci is closely related to the radiation dose, and researchers can quantify DNA damage by counting the foci per nucleus or per DNA region.

Several automated methods that support batch processing exist, but each has shortcomings.

Among current open source software, FoCo has a friendly graphical user interface, but because brightness varies between individual cells and between acquisition batches, its intensity parameters must be adjusted manually, which often introduces large errors.

Focinator is an ImageJ-based macro that detects foci using only the maximum criterion, and it shares the limitations of FoCo.

FindFoci allows parameters to be trained manually, but marking foci by hand is laborious and error prone, especially when background interference is strong. In addition, when cell density is high, some cell nuclei adhere to each other, and such nuclei cannot be separated well by threshold segmentation.

Therefore, a method capable of processing a large amount of image data acquired by a high-content imaging platform in batch, performing image segmentation rapidly and accurately, determining whether cell nuclei are damaged, and finally performing drug screening by using statistical analysis is urgently needed.

Disclosure of Invention

The invention provides a method for high-throughput screening of DNA damage response inhibitors, which can automatically perform single cell nucleus segmentation and classification, in batches, on images acquired by high content imaging equipment, and can preliminarily screen out compounds worth further research through statistical analysis.

A method for high-throughput screening of DNA damage response inhibitors comprises the following steps:

S1, training a cell nucleus segmentation network model based on the U-Net network;

S2, constructing and training a cell nucleus type judgment network model;

S3, imaging the cells treated with the DNA damage response inhibitor using high content imaging equipment to obtain an image to be analyzed; inputting the image to be analyzed into the cell nucleus segmentation network model and then into the cell nucleus type judgment network model, and counting the proportion of damaged cell nuclei corresponding to each DNA damage response inhibitor, wherein the smaller the proportion of damaged cell nuclei, the better the effect of the DNA damage response inhibitor.

The cell nucleus segmentation model can automatically segment images taken by different types of high content imaging platforms, adapts automatically to differences between batches, reduces the failure to separate adhered cell nuclei, and improves the robustness of segmentation when background interference is strong.

The cell nucleus segmentation model provided by the invention uses a deep learning method and adopts a U-Net network architecture comprising an encoder and a decoder. The encoder automatically extracts features; as the number of layers increases, the extracted features become more abstract and reflect higher-dimensional information. The input of the encoder is an image acquired by a high content imaging platform, and its output is the extracted feature map.

The decoder gradually restores the details and spatial dimensions of the object, and shortcut connections between the encoder and the decoder help the decoder restore target details better. The input of the decoder is the feature map extracted by the encoder, and its output is a mask image of the same size as the input image. The mask image can be used to segment individual nuclei in an image.

The cell nucleus type judging model can judge whether the input single cell nucleus image is damaged or not, has high accuracy, reduces the time consumed by manual counting, and can output the probability of each cell nucleus corresponding to each type so as to facilitate subsequent statistical analysis.

The cell nucleus type judgment model uses a deep learning method with a VGG-19 network architecture. Convolution layers extract features and pooling layers downscale the image; after several groups of convolution and pooling, higher-dimensional feature information is obtained and used to classify the image. The input is an image of a single cell nucleus, and the output is the type judgment result for that nucleus.

The method performs statistical analysis on the results of the cell nucleus type judgment: the proportion of damaged nuclei in the images acquired for each DNA damage response inhibitor is calculated, compared with the results of the control group and the positive drug, and finally the compounds are ranked. The top-ranked compounds are selected for subsequent efficacy verification experiments, such as dose-effect curve experiments and comet assays.

Compared with the prior art, the invention has the following effects:

The method for high-throughput screening of DNA damage response inhibitors can automatically process images from different drug sources with high accuracy, processes in batches the large amount of image data acquired by a high content imaging platform, and provides a foundation for drug efficacy experiments.

Drawings

FIG. 1 is a general flowchart of the method for high-throughput screening of DNA damage response inhibitors according to the present invention, in which FociNet refers to the cell nucleus segmentation model and the cell nucleus type judgment model.

FIG. 2 is a schematic diagram of the network architecture of the cell nucleus segmentation model according to the present invention.

FIG. 3 is a schematic diagram of the network architecture of the cell nucleus type judgment model according to the present invention.

FIG. 4 is a flowchart of DDR drug screening using the established cell nucleus segmentation model and cell nucleus type judgment model of the present invention, in which FociNet refers to the cell nucleus segmentation model and the cell nucleus type judgment model.

Detailed Description

The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and examples. The following examples are carried out on the premise of the technical solution of the invention, and detailed embodiments and processes are given, but the scope of the invention is not limited to the following examples.

As shown in FIG. 1, this example provides a method for high throughput screening of DNA damage response inhibitors.

S1, training the cell nucleus segmentation network model based on the U-Net network.

The cell nucleus segmentation network model performs image segmentation on the image data taken by the high content imaging equipment to obtain the corresponding mask image; the mask image is then used to crop the original image into single cell nucleus images.

S11, constructing a first U-Net network and a second U-Net network.

The U-Net network comprises an encoder and a decoder, with shortcut connections between them. The encoder comprises 4 to 5 sub-blocks. Every sub-block except the last comprises two convolution layers and a pooling layer, uses elu as the activation function, and has a Dropout layer between the two convolution layers; the last sub-block comprises only two convolution layers, with a Dropout layer between them and elu as the activation function.

The encoder automatically extracts features; as the number of layers increases, the extracted features become more abstract and reflect higher-dimensional information. The input of the encoder is an image acquired by a high content imaging platform, and its output is the extracted feature map. The decoder gradually restores the details and spatial dimensions of the object, and the shortcut connections between the encoder and the decoder help the decoder restore target details better. The input of the decoder is the feature map extracted by the encoder, and its output is a mask image of the same size as the input image.

The number of sub-blocks in the encoder has a certain influence on the segmentation performance of the model. With too few sub-blocks, the model is insufficiently trained and cannot extract higher-dimensional features; with too many, training is slow and the model accumulates redundant parameters. The number of sub-blocks is typically 4 to 5.

As shown in fig. 2, the structure of the encoder according to this embodiment includes 5 sub-blocks, each of which includes a certain number of convolutional layers and pooling layers.

The first sub-block contains two convolutional layers and a pooling layer, with elu being used as the activation function, with a Dropout layer added between the two convolutional layers to randomly drop some features during the training process to prevent over-fitting and increase the robustness of the model. The convolutional layer can be used for extracting features, and the pooling layer is used for scaling the image to extract higher-dimensional features;

similarly, the second sub-block, the third sub-block, and the fourth sub-block also include two convolutional layers and a pooling layer, with elu being used as the activation function, and a Dropout layer being added between the two convolutional layers;

the fifth subblock contains only two convolutional layers, again with a Dropout layer added between them, using elu as the activation function.

The decoder in this embodiment comprises 4 sub-blocks, each containing transposed convolution layers and a shortcut connection layer. The transposed convolution layer scales the feature map back up to the previous size, and the shortcut connection layer joins the encoder feature map of the corresponding size with the upscaled output of the transposed convolution layer; this sharing of information helps the decoder restore target details better. The decoder follows the encoder. In each sub-block, a transposed convolution layer first rescales the features, the result is joined with the encoder feature map of the corresponding size, and two convolution layers follow, with a Dropout layer between them and elu as their activation function. Finally, a convolution layer after the fourth sub-block outputs the final mask image.

After the encoder and decoder structures are built, they learn from the input samples, that is, their parameters are optimized, which yields an encoder and decoder capable of cell nucleus segmentation. Because the images acquired by high content imaging have no corresponding mask images, and manual labeling would take a great deal of time, existing public data sets were searched for online, and two training sets were found.
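The encoder and decoder layout described above can be traced with a short, framework-free sketch. It only follows the side length of the feature map through five encoder sub-blocks and four decoder sub-blocks. The 512-pixel input comes from the resize step of this embodiment; the assumptions that convolutions preserve size, that each pooling layer halves it, and that each transposed convolution doubles it are not stated in the text.

```python
def trace_unet_sizes(input_size=512, encoder_blocks=5):
    """Return the feature-map side length seen by each encoder and decoder sub-block.

    Assumptions (not stated in the source): convolutions preserve size,
    each pooling layer halves it, each transposed convolution doubles it.
    """
    encoder_sizes = []
    size = input_size
    for block in range(encoder_blocks):
        encoder_sizes.append(size)          # size seen by the two convolution layers
        if block < encoder_blocks - 1:      # the last sub-block has no pooling layer
            size //= 2

    decoder_sizes = []
    # One decoder sub-block per pooling layer; each is joined with the encoder
    # feature map of matching size through the shortcut connection.
    for skip_size in reversed(encoder_sizes[:-1]):
        size *= 2
        assert size == skip_size, "shortcut connection needs matching sizes"
        decoder_sizes.append(size)
    return encoder_sizes, decoder_sizes


enc, dec = trace_unet_sizes()
print(enc)  # [512, 256, 128, 64, 32]
print(dec)  # [64, 128, 256, 512]
```

Under these assumptions the fourth decoder sub-block returns to the input size, which is consistent with the mask image having the same size as the input image.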

S12, the DATA-SCIENCE-BOWL-2018 dataset is first passed through a resize function and then used to train the first U-Net network.

The DATA-SCIENCE-BOWL-2018 dataset is derived from https: com/kamalkraj/DATA-SCIENCE-BOWL-2018/tree/master/dada. It is characterized by large differences in background and complex image sources, so it can be used to train a coarse network that suppresses strong background interference. The loss function is a cross entropy loss function.

Images come from high content imaging platforms of different sources, and their sizes differ with the platform and with the settings chosen by each experimenter, so a resize function is added in front of the U-Net network model. When length and width are unequal, the resize function first crops the original image to equal length and width; the cropped image is then scaled so that all images are unified to a size of 512 before being input into the U-Net network model.
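A minimal sketch of such a resize function is given below for illustration. The text does not say where the crop is taken from or which interpolation is used, so the center crop and nearest-neighbor scaling here are assumptions, and `resize_to_square` is a hypothetical name.

```python
def resize_to_square(image, target=512):
    """Crop a 2D image (list of rows) to a square, then scale it to target x target.

    Assumptions (not in the source): the crop is centered and the scaling
    uses nearest-neighbor sampling.
    """
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    cropped = [row[left:left + side] for row in image[top:top + side]]
    # Map every output pixel back to its nearest source pixel.
    return [
        [cropped[r * side // target][c * side // target] for c in range(target)]
        for r in range(target)
    ]


# A 4 x 6 toy image scaled to 8 x 8 instead of 512 x 512 to keep the demo small.
toy = [[r * 10 + c for c in range(6)] for r in range(4)]
resized = resize_to_square(toy, target=8)
print(len(resized), len(resized[0]))  # 8 8
```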

S13, the BBBC039 image data set is passed through a resize function and then used to train the second U-Net network.

In the BBBC039 data set, the contrast between the background and the region of interest is very pronounced. A network trained on this data set alone applies poorly when background interference is strong, but this weakness is avoided because the images are first preprocessed by the network trained on the former data set. Meanwhile, the BBBC039 images contain many adhered cell nuclei, which makes the data set well suited to the segmentation scenario. The network trained on it can therefore perform a finer segmentation of the image preprocessed by the first network, finally yielding the mask image of the original input image.

An image input to the cell nucleus segmentation network model first passes through the first U-Net network to obtain a first mask image. The image to be detected is multiplied by the first mask image and input to the second U-Net network to obtain a second mask image. After the image to be detected is multiplied by the second mask image, the positions of all pixel points of each connected region are obtained through a connected domain algorithm, and each connected region is then cut out separately.
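The serial use of the two networks can be sketched as follows. `coarse_net` and `fine_net` are hypothetical callables standing in for the trained first and second U-Net networks; in the demo they are replaced by plain intensity thresholds, which is purely illustrative and not how the trained networks behave.

```python
def apply_mask(image, mask):
    """Multiply an image by a binary mask, pixel by pixel."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def cascade_segment(image, coarse_net, fine_net):
    """Run two serially connected networks and return the masked image and final mask.

    coarse_net and fine_net are hypothetical callables that map an image to
    a binary mask of the same size.
    """
    first_mask = coarse_net(image)
    second_mask = fine_net(apply_mask(image, first_mask))
    return apply_mask(image, second_mask), second_mask


def threshold(level):
    """Stand-in 'network' for the demo: a simple intensity threshold."""
    return lambda img: [[1 if p > level else 0 for p in row] for row in img]


image = [[5, 40, 90],
         [3, 60, 10],
         [0, 80, 70]]
masked, mask = cascade_segment(image, threshold(20), threshold(50))
print(masked)  # [[0, 0, 90], [0, 60, 0], [0, 80, 70]]
```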

Using the mask image, a connected domain algorithm gives the positions of all pixel points contained in each connected region of the image, so that each connected region can be cut out separately. Because cell nuclei differ in size, cropping each nucleus to its own length and width would complicate the subsequent cell nucleus type judgment model. Based on prior knowledge of cell biology, each connected region (usually a single cell nucleus, occasionally a few nuclei that were not separated) is instead placed in the middle of a 256 × 256 container, and all pixels of the container outside the extracted nucleus region are set to 0. In this way each individual cell nucleus region is cut out of the original image.
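The cropping step can be illustrated with a small pure-Python sketch. The text does not specify the connectivity rule, so 4-connectivity is an assumption here, as are the function names; the sketch also assumes every region fits inside the container.

```python
from collections import deque


def connected_regions(mask):
    """Return one list of (row, col) pixels per 4-connected region of a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions


def crop_to_canvas(image, pixels, canvas_size=256):
    """Copy one region into the middle of a zero-filled square canvas.

    Assumes the region's bounding box is no larger than the canvas.
    """
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    off_y = (canvas_size - (max(ys) - min(ys) + 1)) // 2 - min(ys)
    off_x = (canvas_size - (max(xs) - min(xs) + 1)) // 2 - min(xs)
    canvas = [[0] * canvas_size for _ in range(canvas_size)]
    for y, x in pixels:
        canvas[y + off_y][x + off_x] = image[y][x]
    return canvas


image = [[9, 9, 0, 0],
         [0, 0, 0, 7],
         [0, 0, 0, 7]]
mask = [[1 if p else 0 for p in row] for row in image]
# A 6 x 6 canvas is used instead of 256 x 256 to keep the demo readable.
nuclei = [crop_to_canvas(image, px, canvas_size=6) for px in connected_regions(mask)]
print(len(nuclei))  # 2
```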

S2, constructing and training a cell nucleus type judgment network model.

S21, constructing the cell nucleus type judgment network model based on the VGG-19 network.

As shown in FIG. 3, the network architecture of VGG-19 is employed.

The number of sub-blocks has a certain influence on the classification performance of the model. With too few sub-blocks, the model is insufficiently trained and cannot extract high-dimensional features, which makes it unsuitable for the subsequent classification decision; with too many, training is slow and many redundant parameters are generated. The number of sub-blocks is usually 4 to 5.

The cell nucleus type judgment network model is divided into 5 sub-blocks and 2 full-connection layers which are connected in sequence, wherein the first two sub-blocks respectively comprise two convolution layers and a pooling layer, the activation function of the convolution layers uses relu, the last 3 sub-blocks respectively comprise 4 convolution layers and a pooling layer, and the activation function of the convolution layers also uses relu; the activation function of the first fully-connected layer is relu, and the activation function of the latter fully-connected layer is softmax.
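The layer layout just described can be written down as a small specification and checked by counting. The sketch below is framework-free; the 256-pixel input follows the container size used for single cell nucleus images, while the assumption that convolutions keep the spatial size and each pooling layer halves it is not stated in the text.

```python
# Convolution layers per sub-block; every sub-block ends with one pooling layer.
CONV_LAYERS_PER_SUB_BLOCK = [2, 2, 4, 4, 4]
# Activation functions of the two fully connected layers, in order.
FULLY_CONNECTED_ACTIVATIONS = ["relu", "softmax"]


def summarize_classifier(input_size=256):
    """Count layers and trace the feature-map side length through the sub-blocks.

    Assumption (not in the source): convolutions keep the spatial size and
    each pooling layer halves it.
    """
    size = input_size
    for _ in CONV_LAYERS_PER_SUB_BLOCK:
        size //= 2
    return {
        "conv_layers": sum(CONV_LAYERS_PER_SUB_BLOCK),
        "pooling_layers": len(CONV_LAYERS_PER_SUB_BLOCK),
        "fully_connected_layers": len(FULLY_CONNECTED_ACTIVATIONS),
        "final_feature_map_side": size,
    }


print(summarize_classifier())
# {'conv_layers': 16, 'pooling_layers': 5, 'fully_connected_layers': 2, 'final_feature_map_side': 8}
```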

The input of the model is an image of an individual cell nucleus cropped out using the cell nucleus segmentation model. After the 5 sub-blocks, the resulting feature map is flattened into a one-dimensional vector.

The number of fully connected layers also affects the classification performance of the model. In general, with too few layers the model is insufficiently trained, tends to underfit and cannot classify well; with too many layers it tends to overfit and cannot be applied to actual classification scenarios. During optimization of the network structure, the features extracted by the first 5 sub-blocks were found to be well suited to the classification scenario, so two fully connected layers already give a good classification result and no further fully connected layers are added. After the architecture is built, the network learns from the input samples, that is, its parameters are optimized, and training finally yields a model capable of classifying cell nuclei.

S22, obtaining a single cell nucleus image data set and training the cell nucleus type judgment network model.

The loss function is a cross entropy loss function.
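For illustration, the softmax output of the last fully connected layer and the cross entropy loss for a single nucleus can be computed as below. The class names and the example scores are hypothetical; only the three categories and the use of softmax and cross entropy come from the text.

```python
import math

# Hypothetical identifiers for the three categories used in this embodiment.
CLASSES = ("damaged", "undamaged", "no_signal")


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross entropy loss for one sample with a one-hot label."""
    return -math.log(probs[true_index])


# Made-up scores for one nucleus whose true label is "damaged".
probs = softmax([2.0, 0.5, -1.0])
loss = cross_entropy(probs, CLASSES.index("damaged"))
print(dict(zip(CLASSES, probs)), loss)
```

The loss approaches 0 as the probability assigned to the true class approaches 1, which is what training drives the parameters toward.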

Images of the control group and the positive drug group taken by the high content imaging equipment are segmented with the cell nucleus segmentation network to obtain the corresponding single cell nucleus images. From these, 2000 images are manually selected for each of three categories (damaged, undamaged and no signal), and each image is strictly screened and reviewed. Nuclei that show a diffuse EGFP signal with no aggregated fluorescent spots, or that have 1 to 4 EGFP foci, are labeled as the undamaged type, and nuclei with more than 4 EGFP foci are labeled as the damaged type. Nuclei without an EGFP signal or showing pan-nuclear noise correspond to cells that do not express EGFP or that are poorly illuminated, and are therefore labeled as the no-signal type.

For the labeled data set, data augmentation is applied: each original image is rotated by 90, 180 and 270 degrees, which expands the final data set to (2000 × 3) × (1 + 3) = 24000 images. The data set is then randomly divided into a training set and a verification set at a ratio of 4:1.
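The labeling criteria can be summarized as a small rule. In the embodiment the labels are assigned manually, so the function below is only a restatement of those criteria; its name, arguments and class identifiers are hypothetical.

```python
def label_nucleus(has_egfp_signal, foci_count):
    """Restate the manual labeling criteria for one nucleus.

    has_egfp_signal is False for nuclei with no EGFP signal or with
    pan-nuclear noise; foci_count is the number of aggregated EGFP spots.
    """
    if not has_egfp_signal:
        return "no_signal"
    # Diffuse signal with 0 foci, or 1 to 4 foci, counts as undamaged.
    return "damaged" if foci_count > 4 else "undamaged"


print(label_nucleus(True, 0))   # undamaged
print(label_nucleus(True, 4))   # undamaged
print(label_nucleus(True, 5))   # damaged
print(label_nucleus(False, 0))  # no_signal
```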
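The augmentation and split can be sketched in a few lines. The rotation direction and the random seed are assumptions, the function names are hypothetical, and integers stand in for images in the split demo.

```python
import random


def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus its 90, 180 and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


def split_4_to_1(items, seed=0):
    """Shuffle and split items into training and verification sets at 4:1."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


tile = [[1, 2],
        [3, 4]]
print(augment(tile)[1])  # [[3, 1], [4, 2]]

total = (2000 * 3) * (1 + 3)
train, val = split_4_to_1(list(range(total)))
print(total, len(train), len(val))  # 24000 19200 4800
```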

The training set directly participates in model training and is used for adjusting parameters of the model, the verification set indirectly participates in the training of the model, after each batch of training is completed, verification can be performed on the verification set and is used for adjusting hyper-parameters of the model and performing primary evaluation on the capability of the model. In addition, 300 images of single cell nuclei are additionally marked as a test set, and the test set is not involved in training and is directly used for evaluating the final model. The accuracy of the final model on the training set reaches 99.03%, the accuracy on the verification set reaches 99.15%, and the accuracy on the test set reaches 99.02%. By using the trained model, the image of the single cell nucleus input later can be predicted, and the corresponding category of each cell nucleus is output.

S31, imaging the cells treated with the DNA damage response inhibitor using high content imaging equipment to obtain an image to be analyzed;

S32, inputting the image to be analyzed into the trained cell nucleus segmentation network model to obtain single cell nucleus images;

S33, inputting the single cell nucleus images into the trained cell nucleus type judgment network model for classification;

S34, counting the proportion of damaged cell nuclei corresponding to each DNA damage response inhibitor, wherein the smaller the proportion of damaged cell nuclei, the better the effect of the DNA damage response inhibitor.
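The counting and ranking in the last step can be sketched as follows. The compound names and predicted labels are made-up toy data, and the sketch assumes that the denominator includes every classified nucleus, including those labeled as no signal, which the text does not specify.

```python
from collections import Counter


def damaged_ratio(labels):
    """Fraction of nuclei classified as damaged.

    Assumption: the denominator counts all classified nuclei, including
    the no-signal class.
    """
    return Counter(labels)["damaged"] / len(labels)


def rank_compounds(predictions):
    """Sort compounds by damaged-nucleus ratio, lowest (best) first."""
    ratios = {name: damaged_ratio(labels) for name, labels in predictions.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


# Toy predictions for two hypothetical compounds.
predictions = {
    "compound_B": ["damaged", "damaged", "undamaged", "no_signal"],
    "compound_A": ["damaged", "undamaged", "undamaged", "undamaged"],
}
print(rank_compounds(predictions))
# [('compound_A', 0.25), ('compound_B', 0.5)]
```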

As shown in FIG. 4, a control group, a radiation damage group and a positive drug group are selected and imaged with high content imaging equipment. The images are segmented with the trained cell nucleus segmentation model to obtain a series of single cell nucleus images, which are then input into the cell nucleus type judgment model for classification. The proportion of damaged cell nuclei over all images of each group is then counted.

There was a significant difference between the control group and the radiation damage group, with the control group having a lower proportion of damaged nuclei and the radiation damage group a higher proportion. After intervention with the positive drug WR-1065, the proportion of damaged cell nuclei was comparable to that of the control group and significantly different from that of the radiation damage group, indicating that the DNA damage response inhibitor can inhibit the DNA damage response process to a certain extent; this can then be further verified by dose-effect curves, comet assays and the like.

Claims

1. A method for screening a DNA damage response inhibitor in high throughput, which is characterized by comprising the following steps:

s1, training a cell nucleus segmentation network model based on the U-Net network by using the picture set; the structure of the cell nucleus segmentation network model comprises two serially connected U-Net networks, and the training process comprises the following steps:

s11, constructing a first U-Net network and a second U-Net network;

s12, the first data set passes through a resize function, and then is used for training a first U-Net network; the first data set is characterized by large background difference and complex image source;

s13, the second data set passes through a resize function and then is used for training a second U-Net network; wherein the second data set is characterized in that the background and the interested area are obviously contrasted, and multiple cell nucleuses are adhered;

s14, connecting the trained first U-Net network and the trained second U-Net network in series, followed by a connected domain algorithm, to obtain the cell nucleus segmentation network model; an image input to the cell nucleus segmentation network model passes through the first U-Net network to obtain a first mask image; the image to be detected is multiplied by the first mask image and input to the second U-Net network to obtain a second mask image; the image to be detected is multiplied by the second mask image, the positions of all pixel points of each connected region are obtained through the connected domain algorithm, and each connected region is then cut out separately;

s2, constructing a cell nucleus type judgment network model and training;

s3, shooting the cells after the drug action by using high content imaging equipment to obtain an image to be analyzed; inputting the image to be analyzed into a cell nucleus segmentation network model and then inputting into a cell nucleus type judgment network model, counting damaged cell nucleus ratios corresponding to each medicine, wherein the smaller the damaged cell nucleus ratio is, the better the effect of the DNA damage reaction inhibitor is, the method comprises the following steps:

s31, shooting the cells acted by the medicine by using high content imaging equipment to obtain image data;

s32, inputting the image data into the trained cell nucleus segmentation network model to obtain a single cell nucleus image;

s34, counting the damaged cell nucleus ratio corresponding to each drug, wherein the smaller the damaged cell nucleus ratio is, the better the effect of the DNA damage reaction inhibitor is.

2. The method for high throughput screening of DNA damage response inhibitors according to claim 1, wherein the first U-Net network and the second U-Net network are structured to include an encoder and a decoder, and a shortcut connection exists between the encoder and the decoder;

the structure of the encoder comprises 4-5 subblocks, except for the last subblock, each subblock comprises two convolution layers and a pooling layer which are sequentially connected, elu is used as an activation function, and a Dropout layer is added between the two convolution layers; the last subblock only comprises two convolutional layers, elu is used as an activation function, and a Dropout layer is added between the two convolutional layers;

the decoder structure comprises 4-5 subblocks, each subblock uses a transposition convolution layer firstly, then two convolution layers are connected, a Dropout layer is added between the two convolution layers, elu is used for the activation function of the two convolution layers, a convolution layer is connected behind the last subblock, and a final mask image is output.

3. The method for high throughput screening of DNA damage response inhibitors of claim 1, wherein said cell nucleus type decision network model is based on VGG-19 network, ResNet network or DenseNet network.

4. The method for high-throughput screening of DNA damage response inhibitors according to claim 1 or 3, wherein S2 is performed by constructing and training a nuclear class determination network model, specifically as follows:

s21, constructing a nucleus type judgment network model based on VGG-19;

5. The method for high-throughput screening of the DNA damage response inhibitor according to claim 1, wherein S22, the obtaining of the single cell image dataset trains a cell nucleus classification determination network model, specifically as follows:

s221, acquiring an image shot by high content equipment, and segmenting by using a trained cell nucleus segmentation network model to obtain a corresponding single cell nucleus image;

s222, manually selecting 1800-2200 damaged cell nuclei, undamaged cell nuclei and no-signal images from the single cell nucleus image and marking;

s223, amplifying the original three types of images by using a data amplification method, and then dividing the images into a training set and a verification set according to the proportion;

s224, the training set judges the network model training for the cell nucleus type, adjusts the model parameters, the verification set indirectly participates in the model training, after each batch of training is finished, the verification set is used for verification, and the hyper-parameters of the model are adjusted.