CN116705193A - Screening method of repositioning candidate medicine and application thereof - Google Patents


Info

Publication number
CN116705193A
Authority
CN
China
Prior art keywords
gene
differential expression
genes
tissue
screening
Prior art date
Legal status
Pending
Application number
CN202310617601.6A
Other languages
Chinese (zh)
Inventor
余艳
孙蔓蔓
苏洁琼
唐春燕
李莉莉
袁悉奥
Current Assignee
Changsha Jinyu Medical Laboratory Co ltd
Original Assignee
Changsha Jinyu Medical Laboratory Co ltd
Priority date
Filing date
Publication date
Application filed by Changsha Jinyu Medical Laboratory Co ltd
Priority to CN202310617601.6A
Publication of CN116705193A
Legal status: Pending (current)


Classifications

    • G16C20/50 Molecular design, e.g. of drugs
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/23 Clustering techniques
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16C20/70 Machine learning, data mining or chemometrics
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention relates to a screening method for repositioning candidate drugs and its application, in the technical field of bioinformatics. The method integrates single-cell transcriptome sequencing and transcriptome sequencing: genes up-regulated and down-regulated in disease samples relative to normal samples are selected from the disease's single-cell transcriptome sequencing data and transcriptome sequencing data, submitted to a transcriptome signature database, and used to screen out drugs that produce "negatively" correlated transcriptome signatures. Drugs screened by this method can, in theory, reverse the transcriptome changes caused by the disease and restore the transcriptome toward normal, and can therefore be listed as repositioning candidate drugs for treating the disease.

Description

Screening method of repositioning candidate medicine and application thereof
Technical Field
The invention relates to the technical field of bioinformatics, in particular to a screening method of repositioning candidate drugs and application thereof.
Background
Traditional new drug research and development mainly comprises preclinical research, preclinical experiments and clinical trials. The process consumes a great deal of time and money, with high development costs and a high failure rate. Drug repositioning is a development strategy that investigates drugs already on the market or under development for new therapeutic uses. Because repositioning starts from approved drugs or compounds under development that have already been tested in humans, information about their pharmacology, dosage, possible toxicity and formulation is already available. Compared with traditional drug development, drug repositioning can significantly reduce development costs and shorten development time. There are many successful examples: minoxidil, originally used to treat hypertension, was approved by the U.S. FDA in 1988 for the treatment of hair loss; and in 2015 the U.S. Preventive Services Task Force issued a draft recommendation stating that aspirin, a pain reliever, could help prevent cardiovascular disease and colorectal cancer.
Drug repositioning relies on both experimental methods and computational prediction methods. Among the experimental methods, the most typical is phenotypic screening. With the development of sequencing technology, computational prediction based on transcriptome signature matching has gradually matured; the conventional approach screens repositioning candidate drugs for a disease from its conventional transcriptome data. However, screening based on conventional transcriptome sequencing data alone is prone to false positives.
Disclosure of Invention
To address these problems, the invention provides a method for screening repositioning candidate drugs that integrates single-cell transcriptome sequencing and conventional transcriptome sequencing and intersects the differentially expressed genes obtained from the two sets of sequencing data, thereby obtaining a more reliable set of differentially expressed genes and avoiding false positives.
In order to achieve the above object, the present invention provides a method for screening a drug candidate for repositioning, comprising the steps of:
single-cell differentially expressed gene screening: obtaining single-cell transcriptome sequencing data of lesion tissue from a positive sample group of a disease to be screened and single-cell transcriptome sequencing data of the corresponding tissue from a control sample group; analyzing and screening the genes differentially expressed at the single-cell level between the positive sample group and the control sample group, to obtain single-cell up-regulated differentially expressed genes and/or single-cell down-regulated differentially expressed genes;
tissue differentially expressed gene screening: obtaining tissue transcriptome sequencing data of lesion tissue from the positive sample group of the disease to be screened and tissue transcriptome sequencing data of the corresponding tissue from the control sample group; analyzing the gene counts matrix of the tissue transcriptome sequencing data by principal component analysis (PCA) and hierarchical clustering analysis; selecting the sample data that are correctly grouped in both the PCA and the hierarchical clustering analysis; and analyzing and screening the genes differentially expressed at the tissue level between the positive sample group and the control sample group, to obtain tissue up-regulated differentially expressed genes and/or tissue down-regulated differentially expressed genes;
candidate drug screening: intersecting the single-cell up-regulated differentially expressed genes with the tissue up-regulated differentially expressed genes to obtain up-regulated intersection genes; intersecting the single-cell down-regulated differentially expressed genes with the tissue down-regulated differentially expressed genes to obtain down-regulated intersection genes; screening for genes associated with the up-regulated intersection genes as up-regulated candidate genes and/or for genes associated with the down-regulated intersection genes as down-regulated candidate genes; and taking inhibitors or antagonists of the up-regulated candidate genes and/or activators or promoters of the down-regulated candidate genes as the repositioning candidate drugs.
The screening method integrates single-cell transcriptome sequencing and conventional transcriptome sequencing: genes up-regulated and down-regulated in disease samples relative to normal samples are selected from the disease's single-cell and conventional transcriptome sequencing data, submitted to a transcriptome signature database, and used to screen out drugs that produce "negatively" correlated transcriptome signatures. Drugs screened in this way can, in theory, reverse the transcriptome changes caused by the disease and restore the transcriptome toward normal, so they can be listed as repositioning candidate drugs for treating the disease. Compared with using tissue transcriptome sequencing data alone, intersecting the differentially expressed genes obtained from the two sets of sequencing data yields a more reliable set of differentially expressed genes and avoids false positives. Moreover, single-cell transcriptome sequencing has low sequencing depth and high sensitivity but relatively poor accuracy, whereas conventional transcriptome sequencing has higher sequencing depth and is a more mature technology. Integrating the two therefore lets each complement the other, making the final result more reliable.
In the tissue differentially expressed gene screening step, transcriptome sequencing of a tumor or normal tissue sample yields the averaged transcriptome of a population of cells that includes both normal cells and cancer cells. Verifying the sample grouping by PCA and hierarchical clustering ensures that the sequenced tumor tissue consists mainly of tumor cells and the normal tissue mainly of normal cells, i.e., that the sample grouping contains no errors; this is what "correctly grouped" means.
In one embodiment, the single-cell differentially expressed gene screening step further includes filtering and normalizing the gene counts matrix of the single-cell transcriptome sequencing data, after the single-cell transcriptome sequencing data of the positive sample group and the control sample group of the disease to be screened are obtained and before the genes differentially expressed at the single-cell level between the two groups are analyzed and screened;
in the tissue differentially expressed gene screening step, after the tissue transcriptome sequencing data of the positive sample group and the control sample group of the disease to be screened are obtained, the gene counts matrix of the tissue transcriptome sequencing data is filtered and normalized before the normalized matrix is analyzed by PCA and hierarchical clustering analysis.
In one embodiment, the gene counts matrix of the single-cell transcriptome sequencing data is filtered under the following conditions: a cell is retained if at least 200 genes are expressed in it; a gene is retained if it is expressed in at least 3 cells.
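The two filtering thresholds can be illustrated with a short Python sketch. It is not Seurat's code: the list-of-lists matrix layout (genes as rows, cells as columns) and the function name `filter_counts` are assumptions for illustration, and screening cells first and then counting genes over the retained cells is a choice made here rather than something the text specifies.

```python
from typing import List, Tuple


def filter_counts(
    genes: List[str],
    cells: List[str],
    counts: List[List[int]],
    min_features: int = 200,
    min_cells: int = 3,
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Filter a gene x cell count matrix; counts[i][j] is the count of gene i in cell j."""
    # Keep a cell if at least `min_features` genes have a non-zero count in it.
    keep_cells = [
        j for j in range(len(cells))
        if sum(1 for row in counts if row[j] > 0) >= min_features
    ]
    # Keep a gene if it is detected (non-zero) in at least `min_cells` retained cells.
    keep_genes = [
        i for i, row in enumerate(counts)
        if sum(1 for j in keep_cells if row[j] > 0) >= min_cells
    ]
    filtered = [[counts[i][j] for j in keep_cells] for i in keep_genes]
    return [genes[i] for i in keep_genes], [cells[j] for j in keep_cells], filtered


# Toy example with lowered thresholds (2 genes per cell, 2 cells per gene).
kept_genes, kept_cells, matrix = filter_counts(
    ["g1", "g2", "g3"], ["c1", "c2", "c3", "c4"],
    [[1, 0, 2, 3], [0, 0, 1, 0], [5, 0, 1, 1]],
    min_features=2, min_cells=2,
)
print(kept_genes, kept_cells)  # ['g1', 'g3'] ['c1', 'c3', 'c4']
```

The defaults (200 and 3) correspond to the thresholds stated above; the toy call lowers them only so that a 3 x 4 matrix is meaningful.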
In one embodiment, the gene counts matrix of the tissue transcriptome sequencing data is filtered under the following condition: a gene is retained if it is a protein-coding gene.
In one embodiment, in the single-cell differentially expressed gene screening step, the gene counts matrix of the single-cell transcriptome sequencing data is normalized with the NormalizeData function of the R package Seurat, and the genes differentially expressed at the single-cell level between the positive sample group and the control sample group are screened with the FindMarkers function of the R package Seurat;
in the tissue differentially expressed gene screening step, the gene counts matrix of the tissue transcriptome sequencing data is normalized with the size-factor method of the R package DESeq2, and the genes differentially expressed at the tissue level between the positive sample group and the control sample group are screened by DESeq2 analysis;
in the candidate drug screening step, the single-cell up-regulated differentially expressed genes are intersected with the tissue up-regulated differentially expressed genes, and the single-cell down-regulated differentially expressed genes with the tissue down-regulated differentially expressed genes, using the venn.diagram function of the R package VennDiagram.
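Since `venn.diagram` is mainly a plotting function, the operation underneath is a plain set intersection. A minimal Python sketch, assuming each input is a list of gene symbols; the function name `intersect_degs` and the toy gene lists are hypothetical.

```python
from typing import Iterable, List, Tuple


def intersect_degs(
    sc_up: Iterable[str], sc_down: Iterable[str],
    bulk_up: Iterable[str], bulk_down: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (up-regulated intersection genes, down-regulated intersection genes)."""
    up = sorted(set(sc_up) & set(bulk_up))
    down = sorted(set(sc_down) & set(bulk_down))
    return up, down


# Toy lists: GENE_X is up-regulated in one data set but down-regulated in the other.
up, down = intersect_degs(["ESR1", "GATA3", "GENE_X"], ["VIM", "CAV1"],
                          ["GATA3", "ESR1"], ["CAV1", "GENE_X"])
print(up, down)  # ['ESR1', 'GATA3'] ['CAV1']
```

A gene whose direction of change disagrees between the two data sets lands in neither intersection, which is how the intersection step discards inconsistent calls.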
The size-factor method (median of ratios) is the method by which the R package DESeq2 normalizes the expression matrix. Its steps are: 1. log-transform the count matrix; 2. compute the mean of the log-transformed counts of each gene; 3. filter out genes whose mean is -Inf (i.e., genes with a zero count in any sample); these genes do not take part in calculating the normalization factors; 4. subtract each retained gene's log mean obtained in step 2 from the log matrix obtained in step 1, giving a log-ratio matrix; 5. compute the median of the log ratios of each sample; 6. convert each log median back to its antilog, giving the normalization (size) factor of each sample; 7. divide the original expression matrix by the normalization factors obtained in step 6.
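The seven steps above can be followed in a short Python sketch, assuming a genes x samples list-of-lists matrix of raw counts. It mirrors the description rather than DESeq2's `estimateSizeFactors` source, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[float]]) -> List[float]:
    """counts[i][j] = raw count of gene i in sample j; returns one factor per sample."""
    n_samples = len(counts[0])
    log_means = []
    for row in counts:
        # Steps 1-3: a zero count has log = -Inf, which makes the gene's mean -Inf,
        # so such genes are marked (None) and left out of the factor calculation.
        if any(c <= 0 for c in row):
            log_means.append(None)
        else:
            # Step 2: mean of the log counts (the log of the geometric mean).
            log_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # Step 4: log ratio of sample j's count to each retained gene's geometric mean.
        log_ratios = [math.log(row[j]) - m
                      for row, m in zip(counts, log_means) if m is not None]
        # Steps 5-6: median log ratio, converted back with exp().
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: List[List[float]], factors: List[float]) -> List[List[float]]:
    # Step 7: divide each sample's counts by that sample's size factor.
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Sample 2 is sequenced twice as deeply; the gene with a zero count is ignored.
toy = [[10, 20], [5, 10], [100, 200], [0, 7]]
print(size_factors(toy))  # roughly [0.707, 1.414]
```

If every gene contains a zero in some sample, no gene survives step 3 and `median` raises `StatisticsError`; DESeq2 offers alternative estimators for that situation, which this sketch does not cover.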
In one embodiment, the conditions for screening genes differentially expressed at the single-cell level between the positive sample group and the control sample group are: a between-group log fold change (logFC) of gene expression greater than or equal to 1, a P value less than or equal to 0.05, and a fraction of cells expressing the gene greater than or equal to 0.5.
In one embodiment, the conditions for screening genes differentially expressed at the tissue level between the positive sample group and the control sample group are: a between-group logFC of gene expression greater than or equal to 1 and a P value less than or equal to 0.05.
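The screening conditions can be expressed as a small filter. This Python sketch assumes a hypothetical `DEResult` record per gene; because the text states only the magnitude threshold, it also assumes that down-regulated genes are those with logFC less than or equal to -1 and that a positive logFC means higher expression in the positive (disease) group.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DEResult:
    gene: str
    log_fc: float                # between-group log fold change
    p_value: float
    pct: Optional[float] = None  # fraction of cells expressing the gene (single-cell only)


def split_up_down(
    results: List[DEResult],
    min_abs_log_fc: float = 1.0,
    max_p: float = 0.05,
    min_pct: Optional[float] = None,
) -> Tuple[List[str], List[str]]:
    """Apply |logFC| >= 1 and P <= 0.05, plus pct >= min_pct when min_pct is given."""
    up, down = [], []
    for r in results:
        if r.p_value > max_p or abs(r.log_fc) < min_abs_log_fc:
            continue
        if min_pct is not None and (r.pct is None or r.pct < min_pct):
            continue
        # Assumed sign convention: positive logFC = higher in the disease group.
        (up if r.log_fc > 0 else down).append(r.gene)
    return up, down


toy = [DEResult("A", 1.5, 0.01, 0.6), DEResult("B", -2.0, 0.001, 0.7),
       DEResult("E", 1.1, 0.01, 0.3)]
print(split_up_down(toy, min_pct=0.5))  # (['A'], ['B'])  single-cell conditions
print(split_up_down(toy))               # (['A', 'E'], ['B'])  tissue conditions
```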
In one embodiment, in the candidate drug screening step, the CMap database is searched for genes associated with the up-regulated intersection genes as up-regulated candidate genes and/or for genes associated with the down-regulated intersection genes as down-regulated candidate genes, and for inhibitors or antagonists of the up-regulated candidate genes and/or activators or promoters of the down-regulated candidate genes with a score less than or equal to -90.
The Connectivity Map (CMap) database is the most authoritative transcriptome signature database and was developed by researchers at the Broad Institute of MIT and Harvard. Its developers first treated different cell lines with thousands of drugs, then measured and recorded the gene expression profiles of the treated cell lines. A user submits lists of the genes up-regulated and down-regulated in a disease, which are compared with the reference data sets stored in CMap to obtain a connectivity score ranging from -100 to 100. A positive score indicates that the user's up- and down-regulated gene lists are similar to the reference gene expression profile; a negative score indicates that they are inversely correlated with it. In the present invention, drugs that produce negatively correlated transcriptome signatures are the target candidate drugs.
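Selecting negatively correlated compounds amounts to filtering the returned scores. The Python sketch below assumes the query results have already been exported as hypothetical `(name, perturbagen type, score)` tuples; this is not CMap's export format, and the compound names and scores are placeholders, not results.

```python
from typing import Iterable, List, Tuple


def negative_connectors(
    rows: Iterable[Tuple[str, str, float]], threshold: float = -90.0
) -> List[Tuple[str, float]]:
    """Keep compounds whose connectivity score is <= threshold."""
    hits = []
    for name, ptype, score in rows:
        if not -100.0 <= score <= 100.0:
            raise ValueError(f"score out of range for {name}: {score}")
        if ptype == "compound" and score <= threshold:
            hits.append((name, score))
    # Most negative (strongest reversal of the disease signature) first.
    return sorted(hits, key=lambda hit: hit[1])


rows = [("cmpd_a", "compound", -95.5), ("cmpd_b", "compound", -90.0),
        ("cmpd_c", "compound", -80.0), ("kd_x", "knockdown", -99.0)]
print(negative_connectors(rows))  # [('cmpd_a', -95.5), ('cmpd_b', -90.0)]
```

Restricting to the compound type matches the "select Compound" choice described in the example; the -90 cut-off is the one used there.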
The invention also provides a medicine for treating breast cancer, whose active ingredients are obtained by the above screening method and comprise:
at least one agent that down-regulates the expression of GPRC5A, PRSS8, MRPS34, CA12, TPD52, PAFAH1B3, FXYD3, ESR1, EZR, MAGED2, PYCARD, TRPS1, HSPB1, TSPAN13, AGR2, GATA3, CCND1, SCGB2A2, KRT18, MLPH, SPDEF, SCGB1D2, FOXA1, SH3BGRL, RAB25, PKIB, SLC39A6, CRABP2, SCCPDH, SRP9, MAL2, ANKRD30A, ALDOA, RAB11FIP1, TFF3, ZG16B, COX6C, TMED3, SPINT2, DEGS2, KRT8, KRT19, AGR3, CLDN7, LRRC26, MUC1, S100A14, CRIP1, STARD10, TSTD1, MIF, SMIM22 or CD24;
and/or at least one agent that up-regulates the expression of PDK4, VIM, YBX3, PLPP1, MGLL, ITM2A, ITGA6, PALMD, EIF3L, CYB5R3, MYL9, CAV2, CAV1, VWF, LDHB, NEDD9, SPTBN1, EPAS1, EGR1, ARL4A, ID1, GNG11, RNASE1, EMP1, ANXA1, TINAGL1, PNRC1, TACC1, NFIB, GSN, SERPING1, ARID5B, ZFP36L2, SPARCL1, ETS2, CSRP1, RCAN1, RBP7, ADGRL4, TGFBR2, EMCN, SPRY1, FABP5, CYYR1, FABP4, SNCG, CD34, A2M, CLEC14A, GIMAP7, CLDN5, SOCS3, ACKR1, AQP1, PECAM1 or TXNIP.
In one embodiment, the active ingredients of the medicine include: parbendazole, QL-XII-47, XMD-1150, emetine, verrucarin A, narciclasine, anisomycin, cycloheximide, digitoxigenin, vorinostat, NCH-51, hydralazine, apigenin, dorsomorphin, digoxin, ISOX, proscillaridin and cephaeline.
Compared with the prior art, the invention has the following beneficial effects:
in the disclosed screening method for repositioning candidate drugs and its application, single-cell transcriptome sequencing and conventional transcriptome sequencing are integrated, and the differentially expressed genes obtained from the two sets of sequencing data are intersected to obtain a more reliable set of differentially expressed genes, thereby avoiding false positives.
Drawings
FIG. 1 is a flowchart of a screening method in example 1;
FIG. 2 is a graph showing the results of PCA principal component analysis in example 1;
FIG. 3 is a graph showing the results of differentially expressed genes in example 1.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Sources:
unless otherwise specified, the reagents, materials and equipment used in the examples are commercially available, and the test methods are conventional in the art.
Example 1
A screening method for repositioning candidate drugs for ERα-positive breast cancer.
The flow of the screening method is shown in FIG. 1; the specific steps are as follows.
1. Filtering and normalizing the gene counts matrix of the single-cell transcriptome sequencing data.
(1) The sequencing data (GSE176078) of the article "A single-cell and spatially resolved atlas of human breast cancers" (PMID: 34493872), published in Nature Genetics, were first downloaded from the NCBI GEO database.
The downloaded data were then imported with the Read10X function of the R package Seurat.
Finally, the imported data (i.e., the cell-gene expression matrix) were filtered with the CreateSeuratObject function of the R package Seurat, with min.features=200 and min.cells=3; that is, cells expressing at least 200 genes and genes expressed in at least 3 cells were retained;
(2) The filtered data were normalized with the NormalizeData function of the R package Seurat.
This uses the "LogNormalize" global-scaling method: the expression value of each gene in a cell is divided by the total expression of that cell, multiplied by a scale factor (set to 10,000), and the result is natural-log transformed (log1p).
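For a single cell, the LogNormalize calculation can be written out as follows. This is a minimal Python sketch assuming one cell's counts are held in a dict keyed by gene symbol; the function name `log_normalize` is hypothetical and the gene counts are toy values.

```python
import math
from typing import Dict


def log_normalize(cell_counts: Dict[str, float],
                  scale_factor: float = 10_000) -> Dict[str, float]:
    """Normalize one cell: log1p(count / total_count_of_cell * scale_factor)."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}


normalized = log_normalize({"GATA3": 1, "ESR1": 3, "VIM": 0})
print(normalized["VIM"])  # 0.0  (a zero count stays zero after log1p)
```

Scaling every cell to the same total before the log removes differences in sequencing depth between cells, and log1p keeps zero counts at zero.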
2. Calculating differentially expressed genes from the normalized single-cell sequencing data.
First, the gene expression matrices of ERα+ subtype cells and normal cells were extracted with the subset function from the cell-gene expression matrix normalized in step 1; differentially expressed genes were then calculated with the FindMarkers function of the R package Seurat, with screening conditions logFC >= 1, p.value <= 0.05 and min.pct >= 0.5.
Here, logFC is the log fold change in gene expression between groups; p.value is calculated with the Wilcoxon rank-sum test (WilcoxDETest); and min.pct is the fraction of cells in which the gene is expressed.
This analysis yielded 96 up-regulated and 144 down-regulated differentially expressed genes.
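Seurat's default FindMarkers test is the Wilcoxon rank-sum test. The sketch below is a plain-Python version for one gene's values in two cell groups, using the normal approximation with tie and continuity corrections. It is not Seurat's implementation; for small samples without ties, R's `wilcox.test` uses an exact test by default, so its p values can differ. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def wilcoxon_rank_sum(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided rank-sum test of x vs y; returns (U statistic of x, p value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j + 2) / 2      # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    d = u - n1 * n2 / 2                    # deviation from the null mean of U
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:                         # all values tied: no evidence of a shift
        return u, 1.0
    correction = 0.5 if d != 0 else 0.0    # continuity correction
    z = abs(abs(d) - correction) / sigma
    return u, math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


u, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
print(u, round(p, 4))  # 0.0 0.0809
```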
3. Filtering and normalizing the gene counts matrix of the transcriptome sequencing data.
(1) First, the gene expression matrix of breast cancer samples was downloaded from the TCGA database, and the gene expression matrices of cancer tissue and normal tissue from ERα+ subtype patients were extracted according to the sample information. Finally, these matrices were filtered to retain the genes annotated as protein-coding in the HGNC database;
(2) The filtered gene expression matrix was then normalized with DESeq2, mainly by normalizing the counts with the size-factor method.
4. Verifying the sample grouping of the normalized transcriptome sequencing data by PCA and hierarchical clustering analysis.
(1) For PCA, a principal component plot of the samples was drawn with the plotPCA function. The operator judges whether the sample grouping is reasonable by checking whether samples of the same group cluster together on the PCA plot and whether different groups are separated.
The PCA plot shows that samples of the same group cluster together and different groups are separated; the results are shown in FIG. 2;
(2) For hierarchical clustering, the samples were clustered with the hclust function and the clustering dendrogram was plotted. The operator judges whether the sample grouping is reasonable by checking whether samples of the same group fall into one cluster and whether different groups fall into different clusters.
The dendrogram shows that samples of the same group fall into one cluster and different groups fall into different clusters;
(3) Both the PCA results and the hierarchical clustering results show that the sample grouping is correct. In general, if samples of the same group cluster together and samples of different groups are separated, the grouping is reasonable and the analysis can continue. If samples of different groups cluster together, the misgrouped samples must be removed: for example, if sample a1 of group A clusters with group B, sample a1 is removed.
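The text relies on visual inspection of the plots; the hierarchical-clustering part of the check could be automated roughly as follows. This Python sketch assumes each sample is a numeric expression vector, uses complete linkage on Euclidean distances (the default `method` of R's `hclust`), cuts the tree at two clusters, and flags samples whose label differs from their cluster's majority label. The majority-label rule and all function names are assumptions for illustration, and the PCA check is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def complete_linkage(samples: Dict[str, Sequence[float]], k: int = 2) -> List[List[str]]:
    """Agglomerative clustering with complete linkage, stopped when k clusters remain."""
    clusters = [[name] for name in samples]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance = largest pairwise sample distance.
                d = max(_euclidean(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def misgrouped(samples: Dict[str, Sequence[float]],
               labels: Dict[str, str], k: int = 2) -> List[str]:
    """Samples whose group label differs from the majority label of their cluster."""
    flagged = []
    for cluster in complete_linkage(samples, k):
        majority, _ = Counter(labels[s] for s in cluster).most_common(1)[0]
        flagged.extend(s for s in cluster if labels[s] != majority)
    return sorted(flagged)


# Toy 2-D vectors standing in for normalized expression profiles.
samples = {"a1": [5.0, 5.0], "a2": [0.0, 0.0], "a3": [0.1, 0.0],
           "b1": [5.0, 5.1], "b2": [5.1, 5.0]}
labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
print(misgrouped(samples, labels))  # ['a1']
```

In the toy data, a1 carries label A but clusters with group B, mirroring the removal example in step (3).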
5. Calculating differentially expressed genes from the transcriptome sequencing data that showed no grouping errors in the PCA verification.
In this step, differential expression analysis was performed with DESeq2 on the normalized and verified gene counts matrix, with screening conditions logFC >= 1 and p.val <= 0.05; the results are shown in FIG. 3.
Here, logFC is the log fold change in gene expression between groups, and p.val is calculated with the Wald test.
This analysis yielded 2256 up-regulated and 2506 down-regulated differentially expressed genes.
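In DESeq2's Wald test, the statistic is the estimated log2 fold change divided by its standard error, converted into a two-sided normal p value. The Python sketch below shows only that last step and assumes the fold change and standard error are already known; in DESeq2 they come from a negative binomial GLM fit, which is not reproduced here. The function name is hypothetical.

```python
import math
from typing import Tuple


def wald_test(log2_fc: float, se: float) -> Tuple[float, float]:
    """Two-sided Wald test of H0: log2 fold change = 0; returns (z statistic, p value)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = log2_fc / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(wald_test(2.0, 0.5))  # z = 4.0, p is about 6.3e-05
```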
6. Intersecting the up- and down-regulated differentially expressed genes from the single-cell transcriptome data with the corresponding genes from the transcriptome sequencing data.
In this step, the up- and down-regulated differentially expressed genes obtained from the single-cell transcriptome sequencing data were intersected with the corresponding up- and down-regulated genes obtained from the transcriptome sequencing data using the venn.diagram function of the R package VennDiagram, yielding 53 up-regulated intersection genes and 56 down-regulated intersection genes.
The above 53 differentially up-regulated intersection genes were: GPRC5A, PRSS8, MRPS34, CA12, TPD52, PAFAH1B3, FXYD3, ESR1, EZR, MAGED2, PYCARD, TRPS1, HSPB1, TSPAN13, AGR2, GATA3, CCND1, SCGB2A2, KRT18, MLPH, SPDEF, SCGB1D2, FOXA1, SH3BGRL, RAB25, PKIB, SLC39A6, CRABP2, SCCPDH, SRP9, MAL2, ANKRD30A, ALDOA, RAB11FIP1, TFF3, ZG16B, COX6C, TMED3, SPINT2, DEGS2, KRT8, KRT19, AGR3, CLDN7, LRRC26, MUC1, S100A14, CRIP1, STARD10, TSTD1, MIF, SMIM22, CD24.
The above 56 differentially down-regulated intersection genes were: PDK4, VIM, YBX3, PLPP1, MGLL, ITM2A, ITGA6, PALMD, EIF3L, CYB5R3, MYL9, CAV2, CAV1, VWF, LDHB, NEDD9, SPTBN1, EPAS1, EGR1, ARL4A, ID1, GNG11, RNASE1, EMP1, ANXA1, TINAGL1, PNRC1, TACC1, NFIB, GSN, SERPING1, ARID5B, ZFP36L2, SPARCL1, ETS2, CSRP1, RCAN1, RBP7, ADGRL4, TGFBR2, EMCN, SPRY1, FABP5, CYYR1, FABP4, SNCG, CD34, A2M, CLEC14A, GIMAP7, CLDN5, SOCS3, ACKR1, AQP1, PECAM1, TXNIP.
7. Retrieving compounds related to the up- and down-regulated differentially expressed genes from the CMap database and selecting negatively correlated compounds.
In this step, the up- and down-regulated gene lists obtained in step 6 were submitted to the CMap database; on the returned results page, Compound was selected and compounds with score <= -90 were retained, i.e., the negatively correlated compounds.
This step yielded 18 negatively correlated compounds: parbendazole, QL-XII-47, XMD-1150, emetine, verrucarin A, narciclasine, anisomycin, cycloheximide, digitoxigenin, vorinostat, NCH-51, hydralazine, apigenin, dorsomorphin, digoxin, ISOX, proscillaridin and cephaeline.
The inventors annotated these 18 negatively correlated compounds. The annotation showed that 9 of them have been approved for the treatment of cancer; among these, vorinostat has been identified as relevant to breast cancer treatment and has entered clinical trials for breast cancer, which supports the reliability of the predictions of the method.
The other 9 compounds have not been approved for cancer treatment, and no studies or reports have linked them to breast cancer. The inventors regard these 9 compounds as potential repositioning candidate drugs for treating ERα+ breast cancer, providing directions for subsequent clinical trials, as shown in Table 1.
Table 1. Summary of the 18 negatively correlated compounds
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above examples illustrate only several embodiments of the invention. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. Those skilled in the art can make several variations and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the invention. Accordingly, the scope of protection of the invention is defined by the appended claims.

Claims (10)

1. A method for screening repositioning candidate drugs, comprising the following steps:
single-cell differentially expressed gene screening: obtaining single-cell transcriptome sequencing data from lesions of a positive sample group to be screened and single-cell transcriptome sequencing data from the corresponding tissue of a control sample group, and analyzing and screening genes that are differentially expressed at the single-cell transcriptome level between the positive sample group and the control sample group to obtain single-cell differentially expressed up-regulated genes and/or single-cell differentially expressed down-regulated genes;
tissue differentially expressed gene screening: obtaining tissue transcriptome sequencing data from lesions of a positive sample group of the disease to be screened and tissue transcriptome sequencing data from the corresponding tissue of a control sample group; analyzing the gene counts matrix of the tissue transcriptome sequencing data by principal component analysis (PCA) and hierarchical clustering analysis; selecting the samples that are correctly grouped in both the PCA and the hierarchical clustering analysis; and analyzing and screening genes that are differentially expressed at the tissue transcriptome level between the positive sample group and the control sample group to obtain tissue differentially expressed up-regulated genes and/or tissue differentially expressed down-regulated genes;
candidate drug screening: intersecting the single-cell differentially expressed up-regulated genes with the tissue differentially expressed up-regulated genes to obtain differentially expressed up-regulated intersection genes; intersecting the single-cell differentially expressed down-regulated genes with the tissue differentially expressed down-regulated genes to obtain differentially expressed down-regulated intersection genes; screening for genes associated with the up-regulated intersection genes as up-regulated candidate genes, and/or for genes associated with the down-regulated intersection genes as down-regulated candidate genes; and taking inhibitors or antagonists of the up-regulated candidate genes, and/or activators or promoters of the down-regulated candidate genes, as repositioning candidate drugs.
2. The screening method according to claim 1, wherein in the single-cell differentially expressed gene screening step, after the single-cell transcriptome sequencing data from lesions of the positive sample group and the single-cell transcriptome sequencing data from the corresponding tissue of the control sample group are obtained, the gene counts matrix of the single-cell transcriptome sequencing data is filtered and normalized before the genes differentially expressed at the single-cell level between the positive sample group and the control sample group are analyzed and screened;
in the tissue differentially expressed gene screening step, after the tissue transcriptome sequencing data from lesions of the positive sample group of the disease to be screened and the tissue transcriptome sequencing data from the corresponding tissue of the control sample group are obtained, the gene counts matrix of the tissue transcriptome sequencing data is filtered and normalized before it is analyzed by PCA and hierarchical clustering analysis.
3. The screening method according to claim 2, wherein the gene counts matrix of the single-cell transcriptome sequencing data is filtered under the following conditions: a cell is retained when it expresses at least 200 genes, and a gene is retained when it is expressed in at least 3 cells.
4. The screening method according to claim 2, wherein the gene counts matrix of the tissue transcriptome sequencing data is filtered under the following condition: a gene is retained when it is a protein-coding gene.
5. The screening method according to claim 2, wherein in the single-cell differentially expressed gene screening step, the gene counts matrix of the single-cell transcriptome sequencing data is normalized using the NormalizeData function of the R package Seurat, and the genes differentially expressed at the single-cell level between the positive sample group and the control sample group are screened using the FindMarkers function of the R package Seurat;
in the tissue differentially expressed gene screening step, the gene counts matrix of the tissue transcriptome sequencing data is normalized using the size factor method of the R package DESeq2, and the genes differentially expressed at the tissue level between the positive sample group and the control sample group are screened by analysis with the R package DESeq2;
in the candidate drug screening step, the venn.diagram function of the R package VennDiagram is used to intersect the single-cell differentially expressed up-regulated genes with the tissue differentially expressed up-regulated genes, and to intersect the single-cell differentially expressed down-regulated genes with the tissue differentially expressed down-regulated genes.
6. The screening method according to claim 5, wherein the analysis conditions for screening the genes differentially expressed at the single-cell level between the positive sample group and the control sample group comprise: a fold change in gene expression between groups of at least 1, a P value of at most 0.05, and a proportion of cells expressing the gene, relative to the total number of cells, of at least 0.5.
7. The screening method according to claim 5, wherein the analysis conditions for screening the genes differentially expressed at the tissue level between the positive sample group and the control sample group comprise: a fold change in gene expression between groups of at least 1, and a P value of at most 0.05.
8. The screening method according to claim 5, wherein in the candidate drug screening step, genes associated with the differentially expressed up-regulated intersection genes are screened in the CMap database as up-regulated candidate genes, and/or genes associated with the differentially expressed down-regulated intersection genes are screened as down-regulated candidate genes. Inhibitors or antagonists of the up-regulated candidate genes with a score of at most -90, and/or activators or promoters of the down-regulated candidate genes with a score of at most -90, are taken as candidate drugs.
9. A medicament for treating breast cancer, wherein the active ingredients of the medicament are obtained by the screening method according to any one of claims 1 to 8 and comprise:
at least one agent that down-regulates the expression of GPRC5A, PRSS8, MRPS34, CA12, TPD52, PAFAH1B3, FXYD3, ESR1, EZR, MAGED2, PYCARD, TRPS1, HSPB1, TSPAN13, AGR2, GATA3, CCND1, SCGB2A2, KRT18, MLPH, SPDEF, SCGB1D2, FOXA1, SH3BGRL, RAB25, PKIB, SLC39A6, CRABP2, SCCPDH, SRP9, MAL2, ANKRD30A, aloa, RAB11FIP1, TFF3, ZG16B, COX6C, TMED3, spit 2, DEGS2, KRT8, KRT19, AGR3, CLDN7, LRRC26, MUC1, S100A14, CRIP1, STARD10, td1, MIF, SMIM22, and CD24;
and/or at least one agent that up-regulates the expression of PDK4, VIM, YBX3, PLPP1, MGLL, ITM2A, ITGA6, PALMD, EIF3L, CYB5R3, MYL9, CAV2, CAV1, VWF, LDHB, NEDD9, SPTBN1, EPAS1, EGR1, ARL4A, ID1, GNG11, RNASE1, EMP1, ANXA1, TINAGL1, PNRC1, TACC1, NFIB, GSN, SERPING1, ARID5B, ZFP36L2, SPARCL1, ETS2, CSRP1, RCAN1, RBP7, ADGRL4, TGFBR2, EMCN, RY1, FABP5, CYYR1, FABP4, SNCG, CD34, A2M, CLEC14A, GIMAP7, CLDN5, SOCS3, KR1, AQP1, PEP 1, PECAM, and NIP.
10. The medicament according to claim 9, wherein the active ingredients of the medicament comprise: parbendazole, QL-XII-47, XMD-1150, emetine, verrucarin A, narciclasine, anisomycin, cycloheximide, digitoxigenin, vorinostat, NCH-51, hydralazine, apigenin, dorsomorphin, digoxin, ISOX, proscillaridin, and cephaeline.
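The gene-level filtering and selection logic in claims 1, 3, 6 and 7 can be illustrated with a minimal, standard-library-only Python sketch. It does not reproduce Seurat or DESeq2. Normalization and the statistical tests are assumed to have been run already, and their results are represented as plain dictionaries with hypothetical keys (`log2fc`, `pvalue`, `pct`). All function names are illustrative. The cell and gene cut-offs of claim 3 match the `min.features = 200` and `min.cells = 3` values used with Seurat's `CreateSeuratObject` in its introductory tutorial. The claims do not state which scale the "fold change of at least 1" threshold uses, so reading it as an absolute log2 fold change of at least 1 is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory counts matrix: {gene: [count in cell 0, cell 1, ...]}.
CountsMatrix = Dict[str, List[int]]


def filter_counts(counts: CountsMatrix, min_genes_per_cell: int = 200,
                  min_cells_per_gene: int = 3) -> CountsMatrix:
    """Claim 3 filter: keep cells that express at least min_genes_per_cell genes,
    then keep genes detected (count > 0) in at least min_cells_per_gene kept cells."""
    n_cells = len(next(iter(counts.values()), []))
    # Number of genes with a non-zero count in each cell.
    genes_per_cell = [sum(1 for row in counts.values() if row[c] > 0)
                      for c in range(n_cells)]
    kept_cells = [c for c in range(n_cells)
                  if genes_per_cell[c] >= min_genes_per_cell]
    filtered: CountsMatrix = {}
    for gene, row in counts.items():
        kept_row = [row[c] for c in kept_cells]
        if sum(1 for x in kept_row if x > 0) >= min_cells_per_gene:
            filtered[gene] = kept_row
    return filtered


def split_degs(results: List[dict], min_abs_log2fc: float = 1.0,
               max_p: float = 0.05,
               min_pct: Optional[float] = None) -> Tuple[Set[str], Set[str]]:
    """Claims 6 and 7 thresholds applied to precomputed DE results.
    ASSUMPTION: 'fold change >= 1' is read as |log2 fold change| >= 1.
    min_pct (single-cell only, claim 6) is the fraction of cells expressing the gene."""
    up: Set[str] = set()
    down: Set[str] = set()
    for r in results:
        if r["pvalue"] > max_p or abs(r["log2fc"]) < min_abs_log2fc:
            continue
        if min_pct is not None and r["pct"] < min_pct:
            continue
        (up if r["log2fc"] > 0 else down).add(r["gene"])
    return up, down


def intersect_degs(single_cell: Tuple[Set[str], Set[str]],
                   tissue: Tuple[Set[str], Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Claim 1: intersect up-regulated with up-regulated, down with down."""
    (sc_up, sc_down), (t_up, t_down) = single_cell, tissue
    return sc_up & t_up, sc_down & t_down


if __name__ == "__main__":
    # Toy inputs for illustration only (thresholds lowered to fit the toy matrix).
    toy = {"G1": [5, 0, 3, 1], "G2": [0, 0, 2, 4], "G3": [1, 0, 0, 0]}
    print(filter_counts(toy, min_genes_per_cell=2, min_cells_per_gene=2))

    sc = split_degs([
        {"gene": "A", "log2fc": 1.5, "pvalue": 0.01, "pct": 0.6},
        {"gene": "B", "log2fc": -2.0, "pvalue": 0.001, "pct": 0.7},
        {"gene": "C", "log2fc": 1.2, "pvalue": 0.20, "pct": 0.9},
        {"gene": "D", "log2fc": 1.1, "pvalue": 0.01, "pct": 0.3},
    ], min_pct=0.5)
    tissue = split_degs([
        {"gene": "A", "log2fc": 2.0, "pvalue": 0.001},
        {"gene": "B", "log2fc": -1.0, "pvalue": 0.04},
        {"gene": "D", "log2fc": 3.0, "pvalue": 0.001},
    ])
    print(intersect_degs(sc, tissue))
```

With the toy matrix, the second cell and gene G3 are dropped. In the single-cell table, gene C fails the P-value cut-off and gene D fails the 0.5 expressed-cell proportion. As a result, only A (up-regulated) and B (down-regulated) remain in the intersections, even though D passes the tissue-level thresholds.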

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310617601.6A CN116705193A (en) 2023-05-29 2023-05-29 Screening method of repositioning candidate medicine and application thereof

Publications (1)

Publication Number Publication Date
CN116705193A true CN116705193A (en) 2023-09-05

Family

ID=87824967

Country Status (1)

Country Link
CN (1) CN116705193A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination