CN117095744A - Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data - Google Patents


Info

Publication number
CN117095744A
Authority
CN
China
Prior art keywords
copy number
number variation
data
gene
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311056237.7A
Other languages
Chinese (zh)
Inventor
钟建伟
柳佳琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinnuo Baishi Medical Laboratory Co ltd
Original Assignee
Shanghai Xinnuo Baishi Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinnuo Baishi Medical Laboratory Co ltd
Priority to CN202311056237.7A
Publication of CN117095744A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Bioethics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a copy number variation detection method based on single-sample high-throughput transcriptome sequencing data. The method first aligns the genomic sequencing data to a human reference genome and counts the sequencing reads mapped to each gene to obtain an expression matrix, then inputs the expression matrix into a detection model to obtain a copy number variation detection result. The detection model is based on a deep neural network model and is trained on a data set of preprocessed samples from known databases. Because it works on a single sample, the detection method gives a better view of the genome structure and variation of an individual. At lower cost and in less time, it provides gene copy number information in addition to gene expression levels, which supports a better understanding of the functions and regulatory mechanisms of the genome and of the influence of copy number variation on gene expression, and is significant for research on hereditary, rare and complex diseases.

Description

Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data
Technical Field
The invention relates to the technical field of bioinformatics analysis, in particular to a copy number variation detection method based on single-sample high-throughput transcriptome sequencing data.
Background
Copy number variation (CNV) refers to a change in the copy number of a region on a chromosome, i.e., an increase or decrease in the number of repeats of the DNA sequence in that region. CNVs are common in the human genome and have been associated with the occurrence and progression of some human diseases.
CNVs can involve large genomic fragments, even whole genes or multiple genes, affecting gene expression and function. They may have a significant impact on susceptibility to human disease, disease risk and clinical manifestations. Some CNVs are closely related to the occurrence of genetic diseases, such as certain hereditary cancers, neurological disorders (e.g., autism and intellectual disability), and certain congenital heart diseases.
In addition, CNVs can affect an individual's response to drug treatment. Some CNVs change the number or function of drug-metabolizing enzymes or drug targets, thereby affecting the metabolism and efficacy of the drug in vivo.
It is therefore important to detect disease-associated CNVs and to understand how these variations contribute to disease and by what mechanisms. Such studies help to deepen the understanding of disease, develop personalized medicine, and improve methods of disease prevention and treatment.
There are three main sequencing strategies for CNV analysis, namely whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted sequencing, and detection involves a variety of algorithms and methods. The following are some common algorithms:
1. Read-depth analysis (read depth): this method infers copy number variation from the distribution density of sequencing reads across the genome. By comparing the read depth of the sample with that of the reference, regions of copy number gain or loss can be identified. However, read-depth analysis has limited ability to detect smaller CNVs and is susceptible to factors such as sequencing depth and regional GC content.
2. Breakpoint analysis (split reads): this method detects CNVs by analyzing the breakpoint positions of copy number variations. It can use paired-end or long-read sequencing data to find breakpoint regions and infer the location and size of copy number variations. However, breakpoint analysis requires high-quality sequencing data and accurate breakpoint positioning, and is challenging for complex structural variations.
3. Segment comparison (segment-based methods): these methods divide the genome into consecutive segments and compare the read depth or other characteristics of each segment. By detecting copy number differences between segments, the presence of CNVs can be determined. However, segment comparison may produce false positives or false negatives when identifying small CNVs and complex structural variations.
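To make the read-depth idea in item 1 concrete, the following is a minimal sketch, not a method taken from this patent. It scales sample and reference bin depths to their totals, takes a per-bin log2 ratio, and labels each bin. The function names, the pseudocount and the gain/loss thresholds are illustrative assumptions.

```python
import math

def log2_depth_ratios(sample_depth, reference_depth, pseudocount=1.0):
    """Per-bin log2 ratio of sample to reference depth.

    Each depth is divided by its library total so that overall
    sequencing depth cancels out. The pseudocount (an assumption)
    avoids log(0) for empty bins.
    """
    s_total = sum(sample_depth)
    r_total = sum(reference_depth)
    ratios = []
    for s, r in zip(sample_depth, reference_depth):
        s_norm = (s + pseudocount) / s_total
        r_norm = (r + pseudocount) / r_total
        ratios.append(math.log2(s_norm / r_norm))
    return ratios

def call_bins(ratios, gain_threshold=0.4, loss_threshold=-0.6):
    """Label each bin; both thresholds are hypothetical values."""
    calls = []
    for ratio in ratios:
        if ratio >= gain_threshold:
            calls.append("gain")
        elif ratio <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

# Hypothetical depths for five genomic bins.
sample = [100, 100, 200, 100, 50]
reference = [100, 100, 100, 100, 100]
calls = call_bins(log2_depth_ratios(sample, reference))
```

The sketch also shows why the approach is sensitive to sequencing depth: every ratio depends on the library totals, so a large gain in one bin slightly lowers the ratios of all the others.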
In contrast, few detection schemes address CNV in transcriptome data, probably for the following reasons:
1) Transcriptome sequencing data contain noise and technical errors, such as sequencing errors, alignment errors and expression estimation errors. These errors can affect CNV detection results and require appropriate data correction, so conventional CNV analysis methods may not be applicable;
2) Gene expression signals and CNV signals in the data interfere with each other, so the detection of copy number variation is confounded by gene expression;
3) RNA-seq coverage of the genome is relatively sparse and uneven: some regions may have high coverage while others are under-covered. Such uneven coverage reduces the accuracy and sensitivity of CNV detection, making it highly challenging, if not impossible, to detect accurate CNV breakpoints and small CNV fragments with an approach based on alignment depth.
4) Current sequencing-based CNV analysis generally relies on the construction of a reference baseline. A reference baseline is built by sequencing and analyzing a large number of individuals to characterize genomic variation in the normal population. By comparison with the reference baseline, variations in an individual genome can be identified and the presence and copy number of CNVs inferred. However, the construction of a reference baseline also has limitations, such as:
i) Sample number and diversity limitations: the quality and representativeness of the reference baseline depend on the number and type of samples it contains. If the baseline samples are few or not sufficiently diverse, some population-specific CNVs may not be accurately captured.
ii) Detection of rare and individual-specific CNVs: reference baselines mainly capture common CNVs. Because samples and data sets differ, a baseline may not provide accurate information for rare or individual-specific CNVs, even though these variations may have a significant impact on the disease susceptibility and phenotypic characteristics of an individual.
iii) Changes in the experimental environment: these may include changes in temperature, humidity or illumination, or changes in experimental equipment or reagent lots. Such changes lead to inconsistent experimental conditions, so that the samples used to construct the baseline differ substantially from the samples actually analyzed, which can strongly affect the analysis results.
Therefore, transcriptome sequencing data are currently used mostly to measure the expression levels of genes and transcripts in order to estimate gene activity, or to identify single nucleotide polymorphisms (SNPs) and short indels. The data also contain a large amount of information about genomic variation in the sample that is not fully utilized. Among these variations, copy number variations (CNVs) are very important for cancer research, as they are a primary genetic driver of cancer. However, identifying CNVs from RNA-seq data is very challenging, because the dynamic and highly heterogeneous coverage of the genome by the RNA-seq signal makes it difficult to distinguish deletion and amplification events from dynamic changes in gene expression levels.
Thus, conventional CNV analysis methods that rely on reference baselines and read depth have clear limitations on transcriptome data, and there is a strong need for a flexible and accurate method of detecting CNVs.
Disclosure of Invention
The invention aims to provide a copy number variation detection method based on single-sample high-throughput transcriptome sequencing data, which overcomes the limitations of the traditional CNV analysis method and realizes flexible and accurate detection of CNV.
In view of this, the scheme of the invention is as follows:
A copy number variation detection method based on single-sample high-throughput transcriptome sequencing data comprises the following steps:
aligning the genomic sequencing data to a human reference genome, and counting the sequencing reads mapped to each gene to obtain an expression matrix;
inputting the expression matrix into a detection model to obtain a copy number variation detection result;
the detection model is based on a deep neural network model and is obtained by training on a data set of preprocessed samples from a known database; the preprocessing comprises standardizing the gene expression levels of the database samples and classifying the copy number variation types.
Further, before alignment, the genomic sequencing data are preprocessed to remove low-quality sequences and trim consecutive low-quality bases.
Further, before training, the copy number types of the database samples are converted to numerical values that can be used by the deep learning algorithm.
Further, the deep neural network model initializes the weights and biases of the neural network by random initialization.
Further, during training of the detection model, the hidden layers are activated so that the sample gene expression level x is mapped to max(0, x), i.e., x is output when x is greater than 0 and 0 is output otherwise.
Further, during training of the detection model, the output layer is activated by normalizing the input vector so that each element becomes a probability value.
Further, a cross-entropy loss function is used during training of the detection model and is minimized.
The invention also provides a copy number variation detection system based on single-sample high-throughput transcriptome sequencing data, which comprises:
an alignment and calculation module: aligning the genomic sequencing data to a human reference genome, and counting the sequencing reads mapped to each gene to obtain an expression matrix;
a detection module: inputting the expression matrix into a detection model to obtain a copy number variation detection result;
the detection model is based on a deep neural network model and is obtained by training on a data set of preprocessed samples from a known database; the preprocessing comprises standardizing the gene expression levels of the database samples and classifying the copy number variation types.
The invention also provides a computer device comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the above-described detection method when executing the computer program.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the detection method described above.
Compared with the prior art, the beneficial effects of the invention include, but are not limited to:
The detection method provided by the invention performs CNV analysis on a single sample, so that the genome structure and variation of a single individual can be better understood. At lower cost and in less time, it provides gene copy number information in addition to gene expression levels, so that the functions and regulatory mechanisms of the genome and the influence of copy number variation on gene expression can be better understood. The detection method can identify potentially disease-related CNVs, thereby aiding the diagnosis and prognosis of disease. This is of great importance for the study of genetic, rare and complex diseases.
Drawings
FIG. 1 is a flow chart of a method for detecting copy number variation of single sample high throughput transcriptome sequencing data according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantageous technical effects of the present invention more apparent, the present invention will be described in further detail with reference to the following detailed description. It should be understood that the detailed description is intended to illustrate the invention, and not to limit the invention.
CNV analysis using a single sample addresses several pain points and limitations, including:
Sample collection is difficult: conventional CNV analysis typically requires a large number of samples to produce meaningful results. However, in some cases it may be difficult or expensive to obtain enough samples. CNV analysis using a single sample avoids this problem, making the analysis more convenient and feasible.
Constructing a reference baseline is costly and time-consuming: conventional CNV assays typically require long sample collection times and high assay costs. Analyzing a single sample saves the time and cost of sample collection and yields results faster.
To address the cost of constructing a baseline for detecting copy number variation in transcriptome data, and the fact that traditional DNA-based CNV detection methods cannot be applied to RNA-seq data, one embodiment provides a method for detecting copy number variation in single-sample high-throughput genome capture sequencing data. The flowchart is shown in FIG. 1, and the method comprises the following steps:
1. The patient's genomic sequencing data are quality-processed with the fastp software: low-quality sequences are removed and consecutive low-quality bases are trimmed. The high-quality sequences are then aligned to the human reference genome hg19.
2. A gene annotation file (e.g., a GTF or GFF file), which provides the positions of the transcripts and exons of each gene, is then used to determine the location of each gene. Reads are assigned to genes according to the alignment results, and the number of reads aligned to each gene is counted to obtain an expression matrix.
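The read-to-gene assignment in step 2 can be illustrated with a minimal sketch. It assumes alignments and gene annotations have already been parsed into simple coordinate tuples; the gene names, coordinates and function name are hypothetical, and a read that overlaps two genes is counted for both, which dedicated counting tools may handle differently.

```python
def count_reads_per_gene(alignments, genes):
    """Count aligned reads that overlap each annotated gene.

    alignments: iterable of (chrom, start, end), 1-based inclusive.
    genes: dict mapping gene ID -> (chrom, start, end), for example
           parsed beforehand from the gene records of a GTF/GFF file.
    """
    counts = {gene_id: 0 for gene_id in genes}
    for chrom, start, end in alignments:
        for gene_id, (g_chrom, g_start, g_end) in genes.items():
            # Two closed intervals overlap when each one starts
            # before the other one ends.
            if chrom == g_chrom and start <= g_end and end >= g_start:
                counts[gene_id] += 1
    return counts

# Hypothetical annotation and alignments.
genes = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 300, 400),
}
alignments = [
    ("chr1", 90, 110),   # overlaps geneA
    ("chr1", 150, 180),  # inside geneA
    ("chr1", 350, 360),  # inside geneB
    ("chr2", 150, 180),  # different chromosome, not counted
]
expression_row = count_reads_per_gene(alignments, genes)
```

The nested loop is quadratic and only suitable for illustration; with real data an interval index or a sorted sweep would be the usual choice.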
3. Sample data are then downloaded from a public database for model construction; each sample has gene expression levels and the corresponding copy number variation types. The specific method is as follows:
1) Data preprocessing: the mean (μ) and standard deviation (σ) of the expression level (x) of each gene are calculated, and each gene in each sample is standardized using the following formula:
z = (x - μ)/σ;
2) Copy number types are classified as normal (copy number equal to 2), deletion (copy number less than 2) and gain (copy number greater than 2), and one-hot encoding is used to convert the copy number types to a numerical representation that can be used by the deep learning algorithm.
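The two preprocessing operations in step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not say whether σ is the population or sample standard deviation (the population form is used here), the handling of zero-variance genes is an added safeguard, and the class order in `CLASSES` is arbitrary.

```python
import statistics

# Class order is an assumption; any fixed order works if used consistently.
CLASSES = ("deletion", "normal", "gain")

def zscore_by_gene(expr):
    """Standardize each gene (column) as z = (x - mu) / sigma.

    expr: list of samples, each a list of per-gene expression values.
    Genes with zero variance are mapped to 0.0 to avoid division by zero.
    """
    n_genes = len(expr[0])
    columns = [[row[j] for row in expr] for j in range(n_genes)]
    mus = [statistics.fmean(col) for col in columns]
    sigmas = [statistics.pstdev(col) for col in columns]
    return [
        [(row[j] - mus[j]) / sigmas[j] if sigmas[j] > 0 else 0.0
         for j in range(n_genes)]
        for row in expr
    ]

def copy_number_class(copy_number):
    """Map a copy number to the three classes described in the text."""
    if copy_number < 2:
        return "deletion"
    if copy_number > 2:
        return "gain"
    return "normal"

def one_hot(label):
    """One-hot encode a class label in the order given by CLASSES."""
    return [1 if label == cls else 0 for cls in CLASSES]

# Hypothetical expression values: three samples, two genes.
expr = [[10.0, 5.0], [20.0, 5.0], [30.0, 5.0]]
z = zscore_by_gene(expr)
encoded = [one_hot(copy_number_class(cn)) for cn in (1, 2, 3)]
```

After standardization each gene has mean 0 and unit variance across the samples, so genes with very different raw read counts contribute on a comparable scale.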
4. The processed gene expression and copy number data are then divided into a training set and a test set, and the model is built and trained as follows:
1) Initializing network parameters: the weights and biases of the neural network are initialized by random initialization.
2) Network structure: the network has an input layer (with 3 features), two hidden layers (each with 16 neurons) and an output layer (with 3 neurons).
3) Activating the hidden layers: the hidden layers are activated using the ReLU activation function, which maps the input x to max(0, x), i.e., outputs x when x is greater than 0 and outputs 0 otherwise.
4) Activating the output layer: the output layer is activated using the Softmax activation function. The input vector is normalized so that each element is converted to a probability value between 0 and 1 and all elements sum to 1.
5) Parameter optimization: the cross-entropy loss function used in training is minimized with the Adam optimizer to improve the accuracy of the model.
6) Model validation: the test set is used as input to the model to verify its accuracy, as shown in Table 1.
Table 1:
Gene                  Prediction accuracy
ENSG00000000457.14    0.9122
ENSG00000000460.17    0.9298
ENSG00000000938.13    0.9298
ENSG00000000971.16    0.9122
ENSG00000001460.18    0.9298
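The network described in step 4 (3 input features, two hidden layers of 16 neurons with ReLU, a 3-neuron Softmax output, cross-entropy loss) can be sketched in plain Python as below. This is an illustration only, not the implementation used to obtain Table 1. The uniform initialization range, the seed and the example input are assumptions, and the Adam update is omitted to keep the sketch short, so it shows the forward pass and the loss rather than training.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Randomly initialize weights and biases (range is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """ReLU: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Softmax with max-subtraction for numerical stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_target):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(math.log(p) for p, t in zip(probs, one_hot_target) if t)

def forward(x, layers):
    """ReLU on every hidden layer, Softmax on the output layer."""
    hidden = x
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    return softmax(dense(hidden, layers[-1]))

rng = random.Random(0)
sizes = [3, 16, 16, 3]  # input, two hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [0.5, -1.2, 0.3]  # hypothetical standardized input features
probs = forward(x, layers)
loss = cross_entropy(probs, [0, 1, 0])  # hypothetical true class
```

During training, the gradient of this loss with respect to every weight and bias would be passed to an optimizer such as Adam, which is the quantity the text describes as being minimized.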
5. Finally, the expression matrix of the sample is input into the model to obtain the copy number prediction result for each gene.
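Step 5 amounts to turning each gene's output probabilities into a class label. A minimal sketch, assuming the model returns one probability vector per gene and that the class order is deletion, normal, gain (the order, the gene identifiers and the probabilities below are all hypothetical):

```python
# Assumed class order of the three output neurons.
CLASSES = ("Deletion", "Normal", "Gain")

def decode_predictions(gene_ids, prob_rows):
    """Pick the most probable class for each gene."""
    calls = {}
    for gene_id, probs in zip(gene_ids, prob_rows):
        best = max(range(len(probs)), key=probs.__getitem__)
        calls[gene_id] = CLASSES[best]
    return calls

# Hypothetical model outputs for three placeholder genes.
gene_ids = ["gene_1", "gene_2", "gene_3"]
prob_rows = [
    [0.05, 0.15, 0.80],
    [0.10, 0.85, 0.05],
    [0.70, 0.20, 0.10],
]
calls = decode_predictions(gene_ids, prob_rows)
```

Keeping the full probability vector alongside the label would also allow low-confidence calls to be flagged, although the text does not describe such a step.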
In the above embodiment, the detection method can be used to detect the presence or absence of copy number variation in a chromosomal region in a single sample; it cannot directly diagnose a disease. The single-sample CNV results serve only to better understand the functions and regulatory mechanisms of the genome and the influence of copy number variation on gene expression, which is significant for research on genetic, rare and complex diseases.
The following is a CNV detection example using a set of genomic sequencing data:
(1) Sequencing data preprocessing
Fastq data were obtained, and the statistics are shown in Table 2.
Table 2:
Samples    Total reads    Total bases (bp)    Q20 (%)    Q30 (%)
read1      54593963       8189094450          97.52      93.27
read2      54593963       8189094450          97.52      93.27
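The columns of Table 2 can be derived from the quality strings of a FASTQ file. The sketch below shows how total bases, Q20 and Q30 percentages are typically computed, assuming Phred+33 quality encoding; the function name and the two example quality strings are hypothetical and are not taken from the sample above.

```python
def quality_summary(quality_strings, offset=33):
    """Summarize reads the way Table 2 does.

    quality_strings: one FASTQ quality line per read.
    offset: ASCII offset of the Phred encoding (33 is assumed here).
    """
    total_bases = q20 = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset
            total_bases += 1
            q20 += score >= 20  # base call error rate <= 1%
            q30 += score >= 30  # base call error rate <= 0.1%
    return {
        "total_reads": len(quality_strings),
        "total_bases": total_bases,
        "q20_pct": 100.0 * q20 / total_bases,
        "q30_pct": 100.0 * q30 / total_bases,
    }

# Hypothetical reads: 'I' = Q40, '?' = Q30, '5' = Q20, '+' = Q10.
summary = quality_summary(["II5+", "??II"])
```

As a consistency check on Table 2 itself, 8189094450 bases divided by 54593963 reads is exactly 150, which matches a read length of 150 bp.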
(2) Fastq data processing
After quality control, high quality sequences were obtained and the data statistics are shown in Table 3.
Table 3:
(3) Sequence to reference genome alignment
The statistics for the alignment of the sequence data to the human reference genome hg19 are shown in Table 4.
Table 4:
(4) The number of reads per gene was calculated as shown in Table 5.
Table 5:
(5) The read counts were input into the detection model to obtain the copy number variation detection results, as shown in Table 6.
Table 6:
Gene (Ensembl ID)     Variation type
ENSG00000007908.16    Gain
ENSG00000007923.16    Gain
ENSG00000007933.13    Gain
ENSG00000007968.7     Normal
ENSG00000008118.10    Normal
ENSG00000008128.23    Normal
ENSG00000008130.15    Normal
ENSG00000009307.16    Normal
The present invention is not limited to the specific details, representative solutions and examples described herein; those skilled in the art can readily achieve further advantages and modifications without departing from the spirit and scope of the general concepts defined by the claims and their equivalents.

Claims (10)

1. A copy number variation detection method based on single-sample high-throughput transcriptome sequencing data, comprising the following steps:
aligning the genomic sequencing data to a human reference genome, and counting the sequencing reads mapped to each gene to obtain an expression matrix;
inputting the expression matrix into a detection model to obtain a copy number variation detection result;
the detection model is based on a deep neural network model and is obtained by training on a data set of preprocessed samples from a known database; the preprocessing comprises standardizing the gene expression levels of the database samples and classifying the copy number variation types.
2. The method of claim 1, wherein, before alignment, the genomic sequencing data are preprocessed to remove low-quality sequences and trim consecutive low-quality bases.
3. The method of claim 1, wherein, before training, the copy number types of the database samples are converted to numerical values that can be used by a deep learning algorithm.
4. The method of claim 1, wherein the deep neural network model initializes the weights and biases of the neural network by random initialization.
5. The method according to claim 1, wherein, during training of the detection model, the hidden layers are activated so that the sample gene expression level x is mapped to max(0, x), i.e., x is output when x is greater than 0 and 0 is output otherwise.
6. The method according to claim 1, wherein, during training of the detection model, the output layer is activated by normalizing the input vector so that each element becomes a probability value.
7. The method according to claim 1, wherein a cross-entropy loss function is used during training of the detection model and is minimized.
8. A copy number variation detection system based on single-sample high-throughput transcriptome sequencing data, comprising:
an alignment and calculation module: aligning the genomic sequencing data to a human reference genome, and counting the sequencing reads mapped to each gene to obtain an expression matrix;
a detection module: inputting the expression matrix into a detection model to obtain a copy number variation detection result;
the detection model is based on a deep neural network model and is obtained by training on a data set of preprocessed samples from a known database; the preprocessing comprises standardizing the gene expression levels of the database samples and classifying the copy number variation types.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
CN202311056237.7A 2023-08-21 2023-08-21 Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data Pending CN117095744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311056237.7A CN117095744A (en) 2023-08-21 2023-08-21 Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311056237.7A CN117095744A (en) 2023-08-21 2023-08-21 Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data

Publications (1)

Publication Number Publication Date
CN117095744A (en) 2023-11-21

Family

ID=88771058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311056237.7A Pending CN117095744A (en) 2023-08-21 2023-08-21 Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data

Country Status (1)

Country Link
CN (1) CN117095744A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110648721A (en) * 2019-09-19 2020-01-03 北京市儿科研究所 Method and device for detecting copy number variation by aiming at exon capture technology
CN111210873A (en) * 2020-01-14 2020-05-29 西安交通大学 Exon sequencing data-based copy number variation detection method and system, terminal and storage medium
CN111276187A (en) * 2020-01-12 2020-06-12 湖南大学 Gene expression profile feature learning method based on self-encoder
CN111599407A (en) * 2020-05-13 2020-08-28 北京橡鑫生物科技有限公司 Method and device for detecting copy number variation
CN112634987A (en) * 2020-12-25 2021-04-09 北京吉因加医学检验实验室有限公司 Method and device for detecting copy number variation of single-sample tumor DNA
CN113903395A (en) * 2021-10-28 2022-01-07 聊城大学 BP neural network copy number variation detection method and system for improving particle swarm optimization
CN114566209A (en) * 2022-03-03 2022-05-31 四川大学 Training method and application of mycobacterium tuberculosis drug resistance prediction model based on hierarchical attention neural network
CN115171779A (en) * 2022-07-13 2022-10-11 浙江大学 Cancer driver gene prediction device based on graph attention network and multigroup chemical fusion
CN115249513A (en) * 2021-12-14 2022-10-28 聊城大学 Neural network copy number variation detection method and system based on Adaboost integration idea

Similar Documents

Publication Publication Date Title
Barfield et al. Transcriptome‐wide association studies accounting for colocalization using Egger regression
Hsu et al. Denoising array-based comparative genomic hybridization data using wavelets
US7454293B2 (en) Methods for enhanced detection and analysis of differentially expressed genes using gene chip microarrays
Taylor Implementation and accuracy of genomic selection
CN103201744B (en) For estimating the method that full-length genome copies number variation
US7937225B2 (en) Systems, methods and software arrangements for detection of genome copy number variation
CN109887546B (en) Single-gene or multi-gene copy number detection system and method based on next-generation sequencing
CN107408163B (en) Method and apparatus for analyzing gene
JP2018522531A5 (en)
Nevado et al. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics
CN110648721B (en) Method and device for detecting copy number variation by aiming at exon capture technology
CN114049914B (en) Method and device for integrally detecting CNV, uniparental disomy, triploid and ROH
CN112634987A (en) Method and device for detecting copy number variation of single-sample tumor DNA
CN111210873B (en) Exon sequencing data-based copy number variation detection method and system, terminal and storage medium
Eichner et al. Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays
CN109461473B (en) Method and device for acquiring concentration of free DNA of fetus
Gong et al. MethCP: differentially methylated region detection with change point models
Mezey et al. Coordinated evolution of co-expressed gene clusters in the Drosophila transcriptome
CN117095744A (en) Copy number variation detection method based on single-sample high-throughput transcriptome sequencing data
WO2023196928A2 (en) True variant identification via multianalyte and multisample correlation
CN113284558B (en) Method for distinguishing gene expression difference and long copy number variation in RNA sequencing data
Schrider et al. Detecting highly differentiated copy-number variants from pooled population sequencing
CN114694752B (en) Method, computing device and medium for predicting homologous recombination repair defects
US20150094223A1 (en) Methods and apparatuses for diagnosing cancer by using genetic information
CN116508105A (en) Genomic marker interpolation based on haplotype blocks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination