WO2024056722A1 - Determining the health status with cell-free DNA using cis-regulatory elements and interaction networks

Info

Publication number
WO2024056722A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
network
determining
computer
implemented method
Prior art date
Application number
PCT/EP2023/075125
Other languages
French (fr)
Inventor
Michael Speicher
Samantha HASENLEITHNER
Isaac LAZZERI
Original Assignee
Medizinische Universität Graz
Priority date
Filing date
Publication date
Application filed by Medizinische Universität Graz

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Definitions

  • the present invention relates to methods and means for analyzing cell-free DNA for determining the tissue contribution, the health status of a subject, and monitoring the treatment of a patient.
  • the present invention specifically relates to the determination of regulation networks between transcription factors, genes, DNase hypersensitivity sites, and any combination thereof.
  • BACKGROUND OF THE INVENTION
  • Globally, there has been a significant increase in the number of people who have cancer. The rising prevalence of cancer can be attributed to several factors, such as environmental factors, tobacco consumption, infectious agents such as Hepatitis B and C, and lifestyle changes.
  • Liquid biopsy holds several benefits over traditional cancer diagnostic techniques: reduced cost, early prognosis, therapy monitoring, detection of tumor heterogeneity and acquired drug resistance, and patient comfort.
  • a limitation of liquid biopsy is its lower sensitivity.
  • Detecting ctDNA in liquid biopsies is technically challenging because the level of ctDNA carrying any given cancer mutation may be very low in the plasma of a cancer patient, especially after treatment or surgery.
  • the sampling statistics mean that in any individual plasma sample from a patient, there may be less than one detectable copy of the ctDNA with the cancer mutation. This may result in the ctDNA not being detected in the patient sample, even though it is present in the plasma, but at a low level.
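The sampling argument above can be illustrated with a short calculation. This is a minimal sketch that assumes the number of mutant fragments in a sample follows a Poisson distribution; the function name and the example numbers (genome equivalents, variant fraction) are hypothetical and are not taken from the application.

```python
import math


def prob_no_mutant_copy(genome_equivalents: float, variant_fraction: float) -> float:
    """Poisson probability that a plasma sample contains zero mutant fragments at a locus.

    genome_equivalents: number of haploid genome copies in the sample (assumed input).
    variant_fraction:   fraction of those copies carrying the mutation (assumed input).
    """
    expected_copies = genome_equivalents * variant_fraction
    return math.exp(-expected_copies)


# Hypothetical example: 2,000 genome equivalents with a 0.01 % mutant fraction
# gives 0.2 expected mutant copies, i.e. fewer than one copy per sample on average.
p_miss = prob_no_mutant_copy(2000, 0.0001)
print(f"P(no mutant copy in this sample) = {p_miss:.3f}")
```

Under these assumed numbers, most samples would contain no mutant fragment at all, which is why a negative result does not prove absence of ctDNA in the plasma.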
  • Companion diagnostics include tests or assays intended to assist healthcare providers in making treatment decisions for patients based on the best response to therapy.
  • the co-development of companion diagnostics with therapeutic products can significantly alter the drug development process and speed the commercialization of drug candidates by yielding safer drugs with enhanced therapeutic efficacy quickly and cost-effectively.
  • MUG003P -2- Diverse digital solutions are increasingly adopted in healthcare. Artificial intelligence is used more and more to develop products for the healthcare market.
  • ENCODE Encyclopedia of DNA Elements
  • Roadmap Epigenomics http://www.roadmapepigenomics.org
  • Blueprint Epigenome http://www.blueprint-epigenome.eu
  • TFs transcription factors
  • miRNAs microRNAs
  • enhancers establish the transcription level and when and where a gene is expressed, thus determining cell identity.
  • stage-specific regulatory DNA is temporally activated to instruct lineage-specific gene expression programs that underpin cellular fate and potential.
  • the in-depth analysis of cCREs and their interactions with each other within regulatory networks is pivotal in understanding how gene expression is regulated and altered in pathological conditions.
  • Instrumental to this process is that regulatory DNA is established and maintained by the combinatorial engagement of TFs that bind in the place of a canonical nucleosome.
  • cis-regulatory elements present characteristic epigenetic features, typically devoid of nucleosomes to allow the binding of TFs and protein complexes to their DNA motifs. Therefore, regulatory sequences are more accessible for enzymatic digestion as they lack protection provided by DNA bound to a nucleosome.
  • DHSs DNase hypersensitive sites
  • a cardinal property of regulatory DNA is that its accessibility is cell type- and state-selective, with only a tiny fraction of all genome-encoded elements becoming actuated in a given cellular context (Meuleman et al., 2020; Vierstra et al., 2020).
  • cCREs are characterized by a lack of protection from enzymatic digestion due to the lack of a nucleosome and provide a wealth of information about the disease and physiologic conditions.
  • this largely depends on interrogating the cCREs involved in tissue-specific gene regulation, i.e., within the context of the gene regulatory networks (GRNs) that alter which genes are expressed and control the extent of that expression in tissue-specific processes.
  • GRNs gene regulatory networks
  • Ulz et al. (2016) describe the identification of two discrete regions at transcription start sites (TSSs) where nucleosome occupancy resulted in different read depth coverage patterns for expressed and silent genes. Thereby, machine learning was employed for gene classification and it was demonstrated that gene expression profiles of cells releasing DNA into the circulation could be directly inferred from nucleosome positioning. Ulz et al. (2019) describe the development of a method to investigate the accessibility of transcription factor binding sites, which revealed insights into biological processes from the cells releasing DNA into the circulation and furthermore enabled a subclassification of prostate cancer entities. Ulz P. et al.
  • US 2016/0004814 A1 does not disclose how to use native, non-processed sources of input material such as cfDNA for determining regulatory networks. Therefore, the method disclosed in US 2016/0004814 A1 depends on identification of DHSs and boundaries of DNase I accessibility.
  • inferring functional, biological information from cfDNA is not possible and known cfDNA assays are only designed to identify disease or to analyze specific cCREs. For example, the mere presence of DNA released from specific organs is not informative about the biological relevance, e.g., whether it indicates a disease and, if so, which pathways are altered.
  • cfDNA in a regulatory context is in detecting genetic alterations associated with diseases, particularly cancer. For example, identifying somatic mutations, copy number variations, and methylation changes in cfDNA can provide information about the genomic landscape of tumors. Also, so far, known methods are restricted to concise, genomic locations associated with active expression. Thus, there is no comprehensive method known which allows analyzing specific features throughout the expressed and unexpressed regions of the human genome, which would greatly increase the resolution for inferring regulatory networks. Using the methods known so far, it is not possible to identify, e.g., silenced genes.
  • genomic sequence i.e. nucleosome positions, fragmentation profiles, depth of coverage, coverage patterns, nucleosome maps, open chromatin regions, and expression data based on TSS patterns
  • genomic locations associated with active expression. The comprehensive analysis of these features throughout the expressed and unexpressed regions of the human genome dramatically increases resolution for inferring regulatory networks, such that new applications alongside determination of disease/disorder states are now possible, e.g., determining physiological conditions such as aging.
  • the present invention does not depend on DHSs and boundaries of DNase I accessibility as previously disclosed methods but instead, the present invention includes determining nucleosome positions and optionally also open chromatin regions which are flanked by nucleosomes. Determining nucleosome positions allows definition of open chromatin regions within the context of nucleosome positions, i.e., the space between nucleosomes. Thus, definition of open chromatin regions does not depend on determination of regions by a lab-produced signal as in the previously disclosed methods such as in US 2016/0004814 A1. In general, between nucleosomes may be just linker DNA, or at specific sites, a nucleosome-depleted region (NDR) or a nucleosome-free region (NFR).
  • NDR nucleosome-depleted region
  • NFR nucleosome-free region
  • NDR and NFR are used interchangeably.
  • the space between nucleosomes is often broader, so proteins may bind to the DNA to fulfill some regulatory functions.
  • the sequencing coverages in the spaces between nucleosomes are usually lower than in nucleosome-occupied regions.
  • the cfDNA fragmentation patterns between nucleosomes i.e., in open chromatin regions, have different length patterns and increased length variability compared to nucleosome-occupied regions.
  • determination of open chromatin regions is based on factors such as nucleosome positions, coverage, or fragmentation patterns, i.e., on different characteristics than those in previously disclosed methods such as in US 2016/0004814 A1.
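One way to picture open chromatin regions defined "within the context of nucleosome positions" is as unusually wide gaps between adjacent nucleosomes. The sketch below is illustrative only: the 147 bp footprint comes from the text, while the function name and the `min_gap_bp` threshold separating ordinary linker DNA from an NDR/NFR are assumptions.

```python
NUCLEOSOME_BP = 147  # DNA wrapped around one nucleosome core particle


def open_regions(dyads, min_gap_bp=80):
    """Return (start, end) gaps between adjacent nucleosomes of at least min_gap_bp.

    dyads:      sorted dyad (nucleosome centre) coordinates on one chromosome.
    min_gap_bp: hypothetical threshold; shorter gaps are treated as plain linker DNA.
    """
    half = NUCLEOSOME_BP // 2  # 73 bp on each side of the dyad
    regions = []
    for left, right in zip(dyads, dyads[1:]):
        gap_start = left + half + 1   # first base after the left nucleosome
        gap_end = right - half - 1    # last base before the right nucleosome
        if gap_end - gap_start + 1 >= min_gap_bp:
            regions.append((gap_start, gap_end))
    return regions


# Two regularly spaced pairs (167 bp dyad-to-dyad) separated by one wide gap.
print(open_regions([1000, 1167, 1500, 1667]))
```

In practice the coverage and fragment-length patterns mentioned above would be additional evidence for each candidate region; this sketch uses nucleosome spacing alone.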
  • the inventors of the present invention surprisingly found comprehensive approaches to deduce regulation networks from cfDNA.
  • information is inferred by integrating nucleosome positions and/or many open chromatin features, particularly at the network level.
  • the herein described invention enables the investigation of the dynamics of regulatory and functional events during various stages of diseases or physiological conditions, such as aging, by analyzing cfDNA.
  • the methods described herein comprise the integration of different network topologies to increase the resolution of cfDNA analyses to enable distinguishing between different diseases, e.g., presence of a certain disease and exclusion of another disease, and determining the health state of an individual.
  • tissue-specific cCRE markers in cfDNA are analyzed and specific interaction networks are built. These interaction networks are built between transcription factors, DNase hypersensitivity sites, and genes. As many tissues and organs contribute only minute amounts to the cfDNA pool, highly sensitive approaches are needed for their identification. The accurate identification of rare cell populations in cfDNA cannot be achieved by analyzing only a single or few regions but depends on interrogating multiple, sometimes even thousands of loci. Thereby, as described herein, thousands of cCREs are leveraged within their biological context to significantly enhance cfDNA analyses to identify diseases, such as cancer and characterize physiological states, e.g., aging.
  • tissue-specific candidate cis-regulatory elements cCREs
  • prior knowledge about cCREs is used to construct sets of thousands of cCREs informative about diseases and certain physiologic conditions.
  • the herein described solution offers the option to explore the aging process, e.g., whether somebody ages well (“healthy aging”) or whether somebody has an increased risk for developing diseases in specific organs.
  • the herein described approach paves the way for analyzing open chromatin regions, i.e., cCREs and deduced interaction networks, to address these questions.
  • the problem of the invention is solved by leveraging the interplay between chromatin accessibility and gene expression dynamics.
  • the present invention provides a computer-implemented method for determining a regulation network from cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from the sample; ii. determining nucleosome positions; and iii. determining at least one of the regulation networks selected from the group consisting of: a.
  • TF-gene network a transcription factor (TF)-gene network
  • b a TF-TF network
  • c a gene-gene network
  • d a TF-DNase hypersensitive site (DHS) network
  • e a DHS-gene network.
  • determining open chromatin regions in ii.. further comprising determining coverage patterns at nucleosome positions and/or open chromatin regions in ii..
  • fragmentation patterns at nucleosome positions and/or at open chromatin regions in ii..
  • TSS transcription start site
  • determining the TF-gene network comprises the steps of: a. determining actively transcribed TFs; b. determining tissue-specificity of the actively transcribed TFs from a.; c. determining gene sets which the TFs from a. activate in each tissue determined in b.; d. evaluating if the gene sets are transcribed; e. determining the intersect for identical and different genes from the gene sets; and f. determining the TF-gene network from the data obtained from e..
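Steps a. to f. of the TF-gene network can be sketched with plain set operations. This is a hedged illustration, not the applicant's implementation: all four inputs are hypothetical pre-computed tables, and step e. is interpreted here as finding the genes shared by all active TFs.

```python
def build_tf_gene_network(active_tfs, tf_tissues, tf_targets, transcribed_genes):
    """Sketch of steps a.-f.; every input is an assumed, pre-computed table.

    active_tfs:        set of TFs called actively transcribed (step a.)
    tf_tissues:        dict TF -> set of tissues the TF is specific for (step b.)
    tf_targets:        dict (TF, tissue) -> set of genes the TF activates there (step c.)
    transcribed_genes: set of genes called transcribed from cfDNA (step d.)
    """
    edges = set()
    genes_per_tf = {}
    for tf in sorted(active_tfs):
        for tissue in sorted(tf_tissues.get(tf, ())):
            targets = tf_targets.get((tf, tissue), set())
            expressed = targets & transcribed_genes  # step d.: keep transcribed targets
            genes_per_tf.setdefault(tf, set()).update(expressed)
            edges.update((tf, gene, tissue) for gene in expressed)
    # step e.: genes shared by all active TFs (the rest are TF-specific)
    gene_sets = list(genes_per_tf.values())
    shared = set.intersection(*gene_sets) if gene_sets else set()
    return edges, shared  # step f.: the edge set is the TF-gene network


edges, shared = build_tf_gene_network(
    {"TF1", "TF2"},
    {"TF1": {"tissueX"}, "TF2": {"tissueX"}},
    {("TF1", "tissueX"): {"geneA", "geneB"}, ("TF2", "tissueX"): {"geneA", "geneC"}},
    {"geneA", "geneB"},
)
print(sorted(edges), shared)
```

Keeping the tissue in each edge makes it possible to split the network by tissue afterwards.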
  • determining the TF-TF network comprises the steps of: a. assessing accessibility of the respective TFBS of each TF; b.
  • determining the gene-gene network comprises the steps of: a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by i. determining the coverage pattern at the nucleosome depleted region (NDR) and/or at the region of 2 kilobases upstream and downstream of the TSS (2K) region, or ii. determining if a nucleosome is present at the NDR; b.
  • determining the TF-DHS network comprises the steps of: a. determining actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the TF-DHS network from the data obtained from c..
  • determining the DHS-gene network from DNA sequences of cfDNA fragments comprises the steps of: a. determining gene expression status by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii.
  • the sample is a biological sample from a subject or from a cohort of subjects. Specifically, the method further comprises comparing the at least one of the regulation networks with one or more standard regulation networks or regulation models selected from TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network.
  • the one or more standard regulation networks or regulation models are determined for one or more cohorts of subjects having a specific classification.
  • the specific classification is associated with a condition.
  • the condition is selected from the group consisting of health status, aging status, cell type, tissue type, and specific disease status.
  • markers for specific conditions are defined. Specifically, the most differently active TFs, genes, or DHSs are determined.
  • the cell and/or tissue origin of cfDNA fragments is determined.
  • the one or more standard regulation networks or regulation models are derived from healthy subjects and/or unhealthy subjects.
  • a. congruence with the standard regulation network, or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network, or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
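The comparison against healthy and unhealthy standard networks can be sketched as follows. The text does not specify how congruence is measured; this example assumes, purely for illustration, that networks are edge sets and that congruence is their Jaccard similarity. Function names and the tie-handling rule are hypothetical.

```python
def jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets (assumed congruence measure)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_status(sample_edges, healthy_edges, unhealthy_edges):
    """Label a sample by the standard network it is more congruent with."""
    to_healthy = jaccard(sample_edges, healthy_edges)
    to_unhealthy = jaccard(sample_edges, unhealthy_edges)
    if to_healthy > to_unhealthy:
        return "healthy"
    if to_unhealthy > to_healthy:
        return "unhealthy"
    return "undetermined"  # equal congruence: no call is made


healthy = {("TF1", "TF2"), ("TF2", "TF3")}
unhealthy = {("TF1", "TF4"), ("TF4", "TF5")}
sample = {("TF1", "TF2"), ("TF2", "TF3"), ("TF3", "TF6")}
print(classify_status(sample, healthy, unhealthy))
```

A set difference such as `sample - healthy` would additionally list the edges that distinguish the sample from the healthy standard.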
  • the health status of a subject is determined.
  • the subject is a patient undergoing treatment of a health condition.
  • the one or more standard regulation networks or regulation models are derived from a previous result from said patient and/or a standard regulation network characteristic for treatment success.
  • differences and/or congruences provide information on the treatment success of the patient.
  • the treatment success of a patient is monitored.
  • congruence with the standard regulation network derived from a specific cohort of subjects having a specific aging status is characteristic for a specific aging status.
  • the aging status of a subject is determined.
  • the cohort of subjects having a specific aging status is selected from healthy subjects older than 55 years, healthy subjects between 20 and 30 years, pregnant females, and subjects having a disease.
  • the disease is cancer, specifically selected from colorectal cancer and prostate cancer.
  • the present invention further provides a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the computer-implemented method described herein.
  • the present invention further provides a data processing apparatus comprising means for carrying out the computer-implemented method described herein.
  • the present invention further provides a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein.
  • the present invention further provides a computer-readable medium having stored thereon the computer program described herein.
  • the present invention further provides an in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and iv. comparing the at least one network of iii.
  • TF transcription factor
  • DHS DNase hypersensitive site
  • a TF-gene network a TF-TF network
  • a TF-DHS network a DHS-gene network
  • a DHS-gene network a computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b.
  • a TF-TF network c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and iii. comparing the at least one network of ii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof.
  • an in vitro method for determining the health status of a subject comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii.
  • a. a TF-gene network b. a TF-TF network
  • c. a gene-gene network d. a TF-DNase hypersensitive site (DHS) network
  • DHS-gene network iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a.
  • congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
  • a computer-implemented method for determining the health status of a subject comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b.
  • a TF-TF network c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii. comparing the at least one regulation network of ii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
  • an in vitro method for monitoring the treatment success of a patient comprising the steps of: i. extracting cfDNA fragments from a sample of said patient; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one network of iii.
  • a computer-implemented method for monitoring the treatment success of a patient comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d.
  • determining the TF-gene network comprises the steps of: a. determining the actively transcribed TFs; b. determining the tissue-specificity of the actively transcribed TFs from a.; c. determining the gene sets which the TFs from a. activate in each tissue determined in b.; d.
  • determining the TF-TF network comprises the steps of: a. assessing the accessibility of the respective TFBS of each TF; b. optionally determining overlapping binding sites in TFs; c. determining the TF-TF interaction; d. correlating the accessibility obtained from a. with the interaction obtained from c.; and e. determining the network from the data obtained from d..
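The TF-TF steps can be sketched as filtering known TF-TF interactions by binding-site accessibility. This is an assumption-laden illustration: the accessibility scores, the interaction list, the `min_score` cut-off, and the choice of the weaker score as edge weight are all hypothetical.

```python
def build_tf_tf_network(accessibility, known_interactions, min_score=0.5):
    """Keep interactions whose two TFs both have accessible binding sites.

    accessibility:      dict TF -> TFBS accessibility score from cfDNA (step a., assumed scale 0-1)
    known_interactions: iterable of (TF, TF) pairs from prior knowledge (step c.)
    min_score:          hypothetical cut-off for calling a TF's binding sites accessible
    """
    edges = {}
    for tf_a, tf_b in known_interactions:
        score_a = accessibility.get(tf_a, 0.0)
        score_b = accessibility.get(tf_b, 0.0)
        if score_a >= min_score and score_b >= min_score:  # step d.
            # The weaker of the two scores is used as the edge weight (design assumption).
            edges[frozenset((tf_a, tf_b))] = min(score_a, score_b)
    return edges  # step e.: undirected, weighted TF-TF network


network = build_tf_tf_network(
    {"TF1": 0.9, "TF2": 0.7, "TF3": 0.1},
    [("TF1", "TF2"), ("TF2", "TF3")],
)
print(network)
```

Using `frozenset` keys makes the edges undirected, so ("TF1", "TF2") and ("TF2", "TF1") refer to the same edge.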
  • determining the gene-gene network comprises the steps of: a.
  • determining the expression status of pre-selected genes or gene-sets wherein the expression status is determined by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. correlating the genes according to their expression status; and c. determining the network from the data obtained from b..
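A minimal sketch of the gene-gene steps follows. It assumes that an expressed gene shows depleted coverage at its NDR relative to the surrounding 2K region, and it links genes that share the "expressed" status; the `max_ratio` threshold and both function names are hypothetical.

```python
from itertools import combinations


def expression_status(ndr_coverage, flank_coverage, max_ratio=0.8):
    """Call a gene expressed when NDR coverage is depleted relative to the 2K region.

    max_ratio is a hypothetical threshold, not a value from the application.
    """
    return flank_coverage > 0 and ndr_coverage / flank_coverage <= max_ratio


def build_gene_gene_network(coverage_by_gene):
    """coverage_by_gene: dict gene -> (mean NDR coverage, mean 2K-region coverage)."""
    status = {
        gene: expression_status(ndr, flank)
        for gene, (ndr, flank) in coverage_by_gene.items()
    }  # step a.
    expressed = sorted(gene for gene, is_on in status.items() if is_on)
    edges = set(combinations(expressed, 2))  # step b.: link genes sharing the expressed status
    return status, edges  # step c.


status, edges = build_gene_gene_network(
    {"geneA": (10, 30), "geneB": (12, 30), "geneC": (29, 30)}
)
print(status, edges)
```

With several samples per cohort, the binary calls could instead be correlated across samples; the single-sample co-expression rule here is only the simplest possible reading.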
  • determining the TF-DHS network comprises the steps of: a. determining the actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the network from the data obtained from c..
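The TF-DHS correlation step can be sketched as a join between active TFs and accessible distal DHSs. The text does not state how the correlation is made; this sketch assumes a motif annotation (`dhs_motifs`, hypothetical) that lists which TFs have a binding motif inside each DHS.

```python
def build_tf_dhs_network(active_tfs, accessible_dhs, dhs_motifs):
    """Link each active TF to the accessible distal DHSs that carry its motif.

    active_tfs:     set of actively transcribed TFs (step a.)
    accessible_dhs: set of distal DHS identifiers called accessible (step b.)
    dhs_motifs:     dict DHS -> set of TFs with a motif in that DHS (assumed annotation)
    """
    return {
        (tf, dhs)
        for dhs in accessible_dhs
        for tf in dhs_motifs.get(dhs, ())
        if tf in active_tfs  # step c.: keep only TFs that are active in this sample
    }


tf_dhs = build_tf_dhs_network(
    {"TF1", "TF2"},
    {"dhs_1", "dhs_2"},
    {"dhs_1": {"TF1", "TF3"}, "dhs_2": {"TF2"}, "dhs_3": {"TF1"}},
)
print(sorted(tf_dhs))
```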
  • determining the DHS-gene interaction network from DNA sequences of cfDNA fragments comprises the steps of: a. determining the gene expression status by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. determining maps of accessible distal DHSs; c. correlating the gene expression status of a. with the maps of accessible distal DHSs of b.; and d. determining the network from the data obtained from c..
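For the DHS-gene network, one simple way to correlate expressed genes with accessible distal DHSs is by genomic proximity. This is an assumed rule for illustration only: the 500 kb window, the coordinate tables, and the function name are hypothetical.

```python
def build_dhs_gene_network(expressed_tss, accessible_dhs, max_distance=500_000):
    """Link accessible distal DHSs to expressed genes on the same chromosome.

    expressed_tss:  dict gene -> (chromosome, TSS position) for genes called expressed (step a.)
    accessible_dhs: dict DHS id -> (chromosome, midpoint) for accessible distal DHSs (step b.)
    max_distance:   hypothetical window in bp within which a DHS is linked to a gene
    """
    edges = set()
    for gene, (gene_chrom, tss) in expressed_tss.items():
        for dhs, (dhs_chrom, midpoint) in accessible_dhs.items():
            if gene_chrom == dhs_chrom and abs(midpoint - tss) <= max_distance:  # step c.
                edges.add((dhs, gene))
    return edges  # step d.


dhs_gene = build_dhs_gene_network(
    {"geneA": ("chr1", 1_000_000), "geneB": ("chr2", 5_000_000)},
    {"dhs_1": ("chr1", 1_200_000), "dhs_2": ("chr1", 9_000_000), "dhs_3": ("chr2", 4_900_000)},
)
print(sorted(dhs_gene))
```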
  • a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method described herein.
  • a data processing apparatus comprising means for carrying out the computer-implemented method described herein.
  • a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein.
  • Further provided herein is also a computer-readable medium having stored thereon the computer program described herein.
  • FIGURES
  • Figure 1 Effect of transcribed TFs on open chromatin regions and nucleosome positions
  • Figure 2 TFs activate a gene regulatory network (GRN) in a tissue-dependent manner
  • Figure 3 Construction of TF-gene interaction networks
  • Figure 4 Construction of TF-TF interaction networks
  • Figure 5 Construction of gene-gene interaction networks
  • Figure 6 TF-DHS and DHS-gene interaction network to include all (distant) relevant regulatory regions
  • Figure 7 Building a comprehensive model from cfDNA
  • Figure 8 Computation of nucleosome dyad prior distribution
  • Figure 9 Transformation of empiric count distributions to nucleosome prior distributions
  • Figure 10 Overview heatmap of dyad count distributions with different distribution truncation strategies
  • Figure 11 Nucleosome occupancy pattern from nucleosome priors
  • Figure 12 TF-TF network generated from a plasma sample from a patient with prostate cancer.
  • Figure 13 The TF-TF subnetwork or community showing with which other TFs AR interacts. The table represents the top results of enrichment analysis performed on this subnetwork.
  • Figure 14 Analysis of the same TFs as in Figure 13, but with cfDNA from a healthy individual.
  • Figure 15 The TF-TF subnetwork or community showing the TFs with which STAT1 interacts.
  • Figure 16 The HDAC1 TF-TF subnetwork.
  • Figure 17 The TCF7L1, MYC and ASH2L TF-TF subnetwork. The table represents the top results of the enrichment analysis performed on this community.
  • Figure 18 Comparison of the AR subnetwork in longitudinal samples from a patient with prostate cancer whose tumor transdifferentiated from an adenocarcinoma to a treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC).
  • Figure 19 Comparison of AR and FOXA1 signals in prostate cancer (PC) and castration-resistant prostate cancer (CRPC) patients. The change in AR signals in CRPC demonstrates the loss of the edges connecting AR and FOXA1 in CRPC samples.
  • PC prostate cancer
  • CRPC castration-resistant prostate cancer
  • Figure 20 Comparison of a PC subnetwork and the same subnetwork in healthy individuals
  • Figure 21 Result of a subtraction operation between the PC specific subnetwork and the equivalent subnetwork in healthy individuals.
  • Figure 22 Example of a prostate cancer-specific TF-TF-genes subnetwork. A TF-TF subnetwork was expanded to include genes regulated by TFs. Genes are represented as triangles.
  • Figure 23 Another example of prostate cancer-specific TF-TF-genes subnetwork.
  • Figure 24 Ratio values between short (<250 bp) and long (>250 bp) cfDNA fragments for various gene groups (Y-axis) and cohorts calculated for the +1 nucleosome (top panel) and the +2 nucleosome (bottom panel).
  • Figure 25 Ratio values between short (<250 bp) and long (>250 bp) cfDNA fragments for various gene groups (Y-axis) and cohorts calculated for gene bodies.
  • Figure 26 Outline of the algorithm to find the best model to classify patients based on top differentially active transcription factors per cohort. Each dataset Di is split into test and train sets. The training set is used to select the best model and hyperparameters. This is achieved through cross-validation on each training set i. Best models are refit on the full training set i, and their final performances are evaluated on an independent test set i.
  • Figure 28 Confusion matrix demonstrating that most cfDNA samples can be classified correctly by TF-TF network analyses after selection of the most different TFs.
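The short-to-long fragment ratio plotted in Figures 24 and 25 can be sketched as a simple count ratio. The 250 bp threshold comes from the figure captions; whether fragments of exactly 250 bp count as short or long is not recoverable from the text, so the boundary handling below is an assumption, as is the function name.

```python
def short_long_ratio(fragment_lengths, threshold_bp=250):
    """Ratio of short to long cfDNA fragments overlapping a region of interest.

    Fragments shorter than threshold_bp count as short, all others as long
    (boundary handling is an assumption). Returns None when there are no long fragments.
    """
    short = sum(1 for length in fragment_lengths if length < threshold_bp)
    long_ = sum(1 for length in fragment_lengths if length >= threshold_bp)
    return short / long_ if long_ else None


# Hypothetical fragment lengths overlapping one region, e.g. a +1 nucleosome.
print(short_long_ratio([150, 167, 180, 320]))
```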
  • DETAILED DESCRIPTION
  • Unless indicated or defined otherwise, all terms used herein have their usual meaning in the art, which will be clear to the skilled person. Reference is for example made to the standard handbooks, such as “Molecular Biology of the Cell” (Alberts et al., 2022), “Vogel and Motulsky's Human Genetics: Problems and Approaches” (Speicher et al., 2010), “Human Molecular Genetics” (Strachan and Read, 2018), and “The Biology of Cancer” (Weinberg et al., 2013).
  • a method for analyzing cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii.
  • DNA refers to deoxyribonucleic acid.
  • DNA is a type of nucleic acid.
  • nucleic acid generally refers to a polynucleotide comprising two or more nucleotides.
  • a nucleotide is a monomer composed of three components: a 5-carbon sugar, a phosphate group, and a nitrogenous base.
  • the four naturally occurring types of DNA nucleotides are: adenine (A), thymine (T), guanine (G), and cytosine (C).
  • A adenine
  • T thymine
  • G guanine
  • C cytosine
  • cfDNA refers to “cell free DNA”, “cell-free DNA”, “circulating free DNA”, or “circulating-free DNA”.
  • cfDNA consists of highly degraded DNA fragments, which are detectable in the peripheral blood of every human. In healthy individuals, the vast majority of cfDNA is derived from the hematopoietic system.
  • cfDNA can also provide information about physiological processes such as aging.
  • cfDNA may comprise a footprint representative of its underlying chromatin organization, which may capture one or more of: expression-governing nucleosomal occupancy, RNA Polymerase II pausing, cell death-specific DNase hypersensitivity, and chromatin condensation during cell death.
  • Such a footprint may carry a signature of cell debris clearance and trafficking, e.g., DNA fragmentation may be carried out by caspase-activated DNase (CAD) in cells dying by apoptosis, but also by lysosomal DNase II after the dying cells are phagocytosed, resulting in different cleavage patterns.
  • cfDNA represents an essential component of “liquid biopsies”, which refers to the analyses of non-solid biological sources (e.g., blood, urine, CSF, ascites) to obtain information similar to tissue biopsies. Analyses of cfDNA are of extraordinary relevance, particularly in oncology, since in patients with cancer, cfDNA contains circulating tumor DNA (ctDNA) shed from tumor cells into the circulation.
  • Mechanisms for DNA release into the bloodstream include apoptosis, necrosis, and active secretion; specifically, cfDNA is released by apoptosis.
  • DNA is wrapped around histones to form nucleosomes, which are the basic structure of DNA packing.
  • typical cfDNA fragment lengths follow a distribution with a mode of 167 bp. This length corresponds approximately to the size of DNA wrapped around a nucleosome (~147 bp) plus a linker fragment (~20 bp).
  • This particular cfDNA size pattern corresponds to fragmentation patterns after enzymatic processing in apoptotic cells.
  • the cfDNA fragmentation patterns reflect the association of cfDNA with nucleosome core particles and linker histones, which determines where nuclease cleavage may occur.
  • DNA is frequently cleaved between nucleosomes and only rarely within nucleosomes. The latter circumstance is also called “cleaving resistance” and is associated with the cfDNA fragments described herein.
  • the architecture of individual nucleosomes determines access to nucleosomal DNA.
  • the individual nucleosome core particle contains 147 bp of DNA wrapped in ~1.7 left-handed superhelical turns around a central octamer composed of two copies of each of the four core histones H2A, H2B, H3, and H4.
  • nucleosome core particle architecture is pseudo-2-fold symmetric, with the DNA position at the symmetry axis.
  • the symmetry axis i.e., the dyad, is designated as location 0.
  • sample generally refers to a biological sample obtained from or derived from a subject.
  • Biological samples may be cell-free biological samples or substantially cell-free biological samples, or may be processed or fractionated to produce cell-free biological samples.
  • cell-free biological samples may include cell-free ribonucleic acid (cfRNA), cell-free deoxyribonucleic acid (cfDNA), cell-free protein and/or cell-free polypeptides.
  • a biological sample may be tissue (e.g., tissue obtained by biopsy), blood (e.g., whole blood), plasma, serum, sweat, urine, saliva, or a derivative thereof.
  • Cell-free biological samples may be obtained or derived from subjects using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube (e.g., Streck), or a cell-free DNA collection tube (e.g., Streck).
  • a biological sample may be a blood sample or a derivative thereof (e.g., blood collected by a collection tube or blood drops), a tumor sample, a tissue sample, a urine sample, or a cell (e.g., tissue) sample.
  • the biological sample used in the method of the invention is a biofluid sample.
  • Non-limiting examples of useful biofluid samples include, e.g., a blood sample, a serum sample, a plasma sample, a cerebrospinal fluid (CSF) sample, a lymph sample, an endometrial fluid sample, a urine sample, a saliva sample, a tear fluid sample, a synovial fluid sample, an amniotic fluid sample, and a sputum sample.
  • the biofluid sample is selected from a blood sample, a urine sample, a cerebrospinal sample, or an amniotic fluid sample.
  • cfDNA can, e.g., be obtained by a standard blood draw, i.e., a minimally invasive approach.
  • extract in the context of extracting cfDNA fragments refers to the isolation of the cfDNA or cfDNA fragments from the sample. Isolation, extraction, and/or purification of cfDNA or cfDNA fragments may be performed through collection of bodily fluids using a variety of techniques. In some cases, collection may comprise aspiration of a bodily fluid from a patient using a syringe. In other cases, collection may comprise pipetting or direct collection of fluid into a collecting vessel.
  • cfDNA or cfDNA fragments may be isolated and extracted using a variety of techniques known in the art.
  • cfDNA may be isolated, extracted, and prepared using commercially available kits such as the Qiagen QIAamp® Circulating Nucleic Acid Kit protocol.
  • Alternatively, the Qiagen Qubit™ dsDNA HS Assay kit protocol, the Agilent™ DNA 1000 kit, or the TruSeq™ Sequencing Library Preparation Low-Throughput (LT) protocol may be used.
  • a cell-free fraction of a biological sample may be used as a sample in the methods described herein.
  • the term “cell-free fraction” of a biological sample, as used herein, generally refers to a fraction of the biological sample that is substantially free of cells.
  • the term “substantially free of cells” generally refers to a preparation from the biological sample comprising fewer than about 20,000 cells per mL, fewer than about 2,000 cells per mL, fewer than about 200 cells per mL, or fewer than about 20 cells per mL.
  • Genomic DNA may not be excluded from the acellular sample and typically comprises from about 50% to about 90% of the nucleic acids that are present in the sample.
  • the term “liquid biopsy” refers to a broad category of sampling and minimally invasive testing performed on a biofluid (e.g., blood, blood plasma, or blood serum) to look for fragments of, e.g., tumor-derived cfDNA that are in the blood.
  • the methods described herein may comprise a step of amplifying a nucleic acid.
  • amplifying and amplification generally refer to increasing the size or quantity of a nucleic acid molecule.
  • the nucleic acid molecule may be single-stranded or double-stranded. Amplification may include generating one or more copies or “amplified product” of the nucleic acid molecule.
  • Amplification may be performed, for example, by extension (e.g., primer extension) or ligation.
  • Amplification may include performing a primer extension reaction to generate a strand complementary to a single-stranded nucleic acid molecule, and in some cases generate one or more copies of the strand and/or the single-stranded nucleic acid molecule.
  • the term “DNA amplification” generally refers to generating one or more copies of a DNA molecule or “amplified DNA product.”
  • a method comprises performing DNA sequencing, e.g., whole genome sequencing, Sanger sequencing, targeted next-generation sequencing (NGS), or whole-genome NGS.
  • whole genome sequencing is performed on extracted cfDNA fragments for obtaining the DNA sequence of the cfDNA fragment.
  • the result of this sequencing of the cfDNA fragment is also referred to herein as the “sequenced cfDNA fragment” or the “read”.
  • sequenced refers to a sequence read from a portion of a nucleic acid sample, i.e., the result of the sequencing experiment.
  • a read represents a short sequence of contiguous base pairs in the sample.
  • the read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion.
  • sequences or a read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample.
  • sequenced fragment or fragment sequence refers to the combined sequence and length information of a DNA fragment which is gained, for example, from a pair of sequencing reads which were created by sequencing both ends of that DNA fragment, a process which is known as “paired-end read sequencing”, and subsequently aligning the obtained sequences to a reference genome.
  • the length information is obtained from start and end coordinates of the paired sequence alignments.
  • This information can also be extracted from a single sequencing read of a DNA fragment which was created by exhaustive sequencing of a DNA fragment until an adjacent sequencing adapter is read during the sequencing process.
  • This type of sequencing process is known as “single-end read sequencing”.
  • the adapter sequence is removed computationally from the read sequence afterwards.
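  • As an illustration only, deriving the fragment length from the start and end coordinates of a pair of aligned reads can be sketched in Python as follows. The `Alignment` record, the function name `fragment_span`, and the 0-based half-open coordinate convention are assumptions made for this sketch and are not prescribed by the methods described herein.

```python
from typing import NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open coordinates)."""
    chrom: str
    start: int  # leftmost aligned reference position
    end: int    # one past the rightmost aligned reference position


def fragment_span(mate1: Alignment, mate2: Alignment) -> Tuple[str, int, int, int]:
    """Return (chrom, start, end, length) of the fragment covered by two mates."""
    if mate1.chrom != mate2.chrom:
        raise ValueError("mates align to different reference sequences")
    # The fragment spans from the leftmost start to the rightmost end of the pair.
    start = min(mate1.start, mate2.start)
    end = max(mate1.end, mate2.end)
    return mate1.chrom, start, end, end - start


# Two 151 bp mates sequenced from both ends of one fragment (illustrative values).
frag = fragment_span(Alignment("chr1", 1000, 1151), Alignment("chr1", 1016, 1167))
print(frag)
```

  • In this illustrative example the two mates overlap, and the resulting fragment length is the distance between the outermost aligned coordinates.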
  • the DNA sequences of the cfDNA fragments have different lengths. The length may vary from tens to hundreds of base pairs.
  • the sequence reads are about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 175 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp.
  • the sequence reads are 151 bp for each end of a DNA fragment that is sequenced in paired-end read sequencing mode.
  • paired-end reads are 50 bp, 75 bp, 100 bp, 101 bp, 150 bp, 151 bp, or 175 bp long.
  • the sequences obtained from sequencing of the cfDNA fragments extracted from the sample may be aligned with a reference sequence.
  • alignment refers to the process of comparing a DNA sequence with a reference sequence.
  • aligning means comparing a read or sequence obtained by sequencing to a reference sequence and thereby determining whether the reference sequence contains the read sequence, the location where the read sequence is aligned in the reference sequence, and/or how the read sequence aligns with the reference sequence. If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e., whether the read is present or absent in the reference sequence).
  • the term “aligned sequence pattern” generally refers to a spatial pattern of sequence reads after alignment to a reference genome.
  • reference sequence or a “reference genome sequence” is a sequence of a biological molecule, which is frequently a nucleic acid such as a chromosome or genome. Typically, DNA sequences of multiple cfDNA fragments are members of a given reference sequence. In various embodiments, the reference sequence is significantly larger than the sequenced portions or reads that are aligned to it. In one example, the reference sequence is the sequence of a full length genome of a subject, specifically it is a full length human genome. Such sequences may be referred to as reference genome sequences. Such sequences may also be referred to as chromosome reference sequences.
  • reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions, e.g., strands of any species.
  • the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
  • a DNA sequence of a cfDNA fragment is aligned with a reference genome sequence in order to determine the cfDNA fragmentation profile.
  • the methods described herein may further comprise analyzing the depth of coverage.
  • depth of coverage refers to the number of fragment sequences that align with a particular site of the reference genome.
  • coverage describes whether or not any fragment sequence aligns with a particular site or region of a reference genome. In another embodiment, the term is also used to describe the average x-fold target coverage across an entire reference genome.
  • coverage pattern generally refers to a spatial arrangement of fragment sequences after alignment of read sequences to a reference genome. The coverage pattern identifies the extent and depth of coverage of next-generation sequencing methods.
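  • As a minimal sketch, the depth of coverage at each site of a region can be computed from aligned fragment coordinates as shown below. The function name `depth_of_coverage`, the representation of fragments as `(start, end)` tuples, and the 0-based half-open coordinates are assumptions made for illustration.

```python
def depth_of_coverage(fragments, region_start, region_end):
    """Per-base depth over [region_start, region_end) from (start, end) fragments."""
    size = region_end - region_start
    diff = [0] * (size + 1)  # difference array: +1 where a fragment starts, -1 where it ends
    for start, end in fragments:
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo < hi:  # fragment overlaps the region
            diff[lo] += 1
            diff[hi] -= 1
    depth, running = [], 0
    for delta in diff[:size]:
        running += delta
        depth.append(running)
    return depth


fragments = [(0, 5), (3, 8), (4, 6)]            # illustrative fragment coordinates
depth = depth_of_coverage(fragments, 0, 10)
mean_depth = sum(depth) / len(depth)             # average x-fold coverage of the region
covered = sum(1 for d in depth if d > 0)         # number of sites with any aligned fragment
print(depth, mean_depth, covered)
```

  • The same per-base list yields both senses of coverage used herein: whether a site is covered at all, and the average x-fold coverage of a region.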
  • fragmentation profile refers to evaluation of fragmentation patterns of cfDNA across the genome.
  • Such an evaluation can include cfDNA fragment lengths, positions of aligned fragments relative to the reference genome sequence, relative to a specific point on the reference genome, or alignment positions of multiple fragments relative to each other, the ratio between cfDNA fragments with different lengths (e.g., ratio between all cfDNA below a certain length (e.g., 150 bp) vs. all fragments above this length), or whether the nucleosome patterns computed from the cfDNA fragments correspond to nucleosome patterns of a particular cell type, such as white blood cells.
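  • For illustration, the ratio between fragments below and above a length threshold may be sketched as follows. The function name `short_to_long_ratio` is hypothetical, and the handling of fragments exactly at the threshold (excluded from both groups here) is an assumption, since it is not specified above.

```python
def short_to_long_ratio(fragment_lengths, threshold=150):
    """Ratio of fragments shorter than `threshold` to fragments longer than it."""
    short = sum(1 for n in fragment_lengths if n < threshold)
    long_ = sum(1 for n in fragment_lengths if n > threshold)
    if long_ == 0:
        raise ValueError("no fragments above the threshold")
    return short / long_


lengths = [120, 145, 167, 167, 180, 330]  # illustrative fragment lengths in bp
ratio = short_to_long_ratio(lengths)
print(ratio)
```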
  • the fragmentation profile of cfDNA fragments is used to generate a nucleosome map that identifies the position of nucleosomes in the sample.
  • the nucleosome map displays positions of nucleosome peaks, indicating open and closed chromatin regions in the subject’s genome.
  • Open chromatin regions indicate regions of the genome that do not contain nucleosomes. These open regions are able to be bound by various protein factors and regulatory elements and transcribed.
  • Closed chromatin regions are regions of the genome that surround nucleosomes and are inaccessible to protein factors, regulatory elements, and other molecules. These closed chromatin regions are not able to be transcribed.
  • the term “network” as used herein refers to a collection of connected objects. Objects are referred to as nodes or vertices. The connections between the nodes are referred to as edges.
  • the objects are the points connected in a network and the lines between the points are the edges.
  • graph may be used for the term “network”.
  • regulation network refers to the regulatory network of gene regulation and comprises the correlation, interaction, cooperation, co-expression, co-regulation and/or co-occurrence of different factors involved in gene regulation.
  • interaction network can be used interchangeably herein for the term “regulation network”.
  • at least one of the following regulation networks are determined in the methods described herein: TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network.
  • genes and TFs are the nodes.
  • the TF to gene connections are also referred to as edges.
  • a TF-gene network is reconstructed from cfDNA data. Thereby, the regulatory connections from TFs to their target genes within the GRN are reconstructed.
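  • As a hedged illustration of the node and edge terminology, a TF-gene network can be held in a small directed-graph structure such as the one below. The class name `Network`, its methods, and the placeholder node names (`TF1`, `geneA`, etc.) are hypothetical and serve only to show the data structure.

```python
from collections import defaultdict


class Network:
    """Minimal directed network: nodes plus edges from regulators to targets."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(set)  # source node -> set of target nodes

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges[source].add(target)

    def targets(self, source):
        """Genes regulated by a given TF."""
        return sorted(self.edges.get(source, ()))

    def regulators(self, target):
        """TFs with an edge pointing to a given gene."""
        return sorted(s for s, ts in self.edges.items() if target in ts)


net = Network()
for tf, gene in [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneB")]:
    net.add_edge(tf, gene)

print(net.targets("TF1"), net.regulators("geneB"))
```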
  • transcription factor abbreviated herein as “TF”, generally refers to a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence.
  • Transcription factors are proteins that bind to DNA-regulatory sequences (e.g., enhancers and silencers), usually localized in the 5′-upstream region of target genes, to modulate the rate of gene transcription. This may result in increased or decreased gene transcription, protein synthesis, and subsequent altered cellular function (for example, cells changing in response to a normal or pathological environment, e.g., during atrophy, hypertrophy, hyperplasia, metaplasia, or dysplasia).
  • TFs bind to specific DNA motifs to activate or repress gene expression.
  • TFs do not activate a single gene but a GRN in a tissue-dependent manner.
  • Each TF has a canonical regulatory profile, and its target genes have distinct co-expression patterns.
  • the TF to target gene connections are also referred to as network edges, whereas genes are network nodes and TFs are regulating nodes.
  • edges are often uniquely called as specific in only one tissue.
  • tissue-specific genes often have a high multiplicity, meaning that they are identified as specific in more than one tissue.
  • TFs primarily participate in tissue-specific regulatory processes via alterations in their targeting patterns. TFs regulate tissue-specific biological processes by subtle differences in the regulatory connections between genes and TFs. Hence, even edges may have a higher multiplicity indicative of shared regulatory processes between tissues.
  • RNA can be directly functional or be an intermediate template for a protein that performs a function.
  • data on TFs and regulated genes can be retrieved from various sources, e.g., from the PANDA (Passing Attributes between Networks for Data Assimilation) (Guebila et al., 2022b) or the GRAND websites (https://grand.networkmedicine.org) (Guebila et al., 2022a).
  • the transcribed status is reflected in the typical transcription start site (TSS) pattern, i.e., in the nucleosome depleted region (NDR) and 2K region, in the distances of upstream and downstream nucleosomes (Fourier transformation (FFT), short-time Fourier transformation (STFT)), and in the accessibility of the corresponding transcription factor binding sites (TFBSs).
  • building the TF-gene interaction network in cfDNA may consist of the steps given in Figure 3a.
  • transcription start site refers to the location where the first DNA nucleotide is transcribed into RNA.
  • TSS do not harbor many nucleosomes.
  • Chromatin remodeling complexes maintain the nucleosome depleted region (NDR) by sliding nucleosomes away in order to ensure accessibility of transcription factors and polymerase.
  • NDR refers to the nucleosome depleted region at the TSS, specifically to the region between TSS-150bp and TSS+50bp.
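  • As a sketch under stated assumptions, the NDR interval (TSS-150 bp to TSS+50 bp) and the mean coverage within it can be computed as follows. The function names, the strand handling, and the 0-based half-open coordinate convention are assumptions for illustration; the coverage values are synthetic.

```python
def ndr_interval(tss, strand="+", upstream=150, downstream=50):
    """Half-open interval covering TSS-150 bp to TSS+50 bp in transcript orientation."""
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        # Mirror the window around the TSS for genes on the reverse strand.
        return tss - downstream + 1, tss + upstream + 1
    raise ValueError("strand must be '+' or '-'")


def mean_coverage(depth, start, end):
    """Mean of a per-base depth list over [start, end)."""
    if start < 0 or end > len(depth) or start >= end:
        raise ValueError("window outside the depth list")
    window = depth[start:end]
    return sum(window) / len(window)


# Synthetic per-base depth with reduced coverage in the NDR of a TSS at position 500.
depth = [10] * 350 + [4] * 200 + [10] * 450
start, end = ndr_interval(500)
ndr_mean = mean_coverage(depth, start, end)
print((start, end), ndr_mean)
```

  • A low mean coverage in this window relative to the surrounding region is the kind of signal the TSS coverage pattern refers to.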
  • 2K refers to the region of 2 kilobases around the TSS, i.e.
  • the term “Fourier transformation” or “FFT” informs about the frequencies, here the distances between nucleosomes close to a TSS.
  • the “short-time Fourier transformation” or “STFT” informs about the location of the frequencies, i.e., the position when a signal changes.
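  • To illustrate how a Fourier transformation informs about nucleosome distances, the sketch below estimates the dominant period of a coverage signal. It uses a naive discrete Fourier transform from the Python standard library rather than an FFT implementation, the function name `dominant_period` is hypothetical, and the input is a synthetic cosine with an assumed 190 bp period, not real cfDNA data.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-zero frequency of a real signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (zero-frequency) component
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic coverage oscillating with an assumed nucleosome spacing of 190 bp.
coverage = [math.cos(2 * math.pi * t / 190) for t in range(760)]
period = dominant_period(coverage)
print(period)
```

  • A short-time variant would apply the same computation to successive windows along the signal in order to locate where the dominant period changes.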
  • transcription factor binding site refers to the DNA region to which a TF binds.
  • transcription factor binding sites are identified from the Gene Transcription Regulation Database (GTRD: a database on gene transcription regulation-2019 update. I. S. Yevshin, R. N. Sharipov. S. K. Kolmykov, Y. V. Kondrakhin, F. A. Kolpakov. Nucleic Acids Res. 2019 Jan.
  • the identified TFBS are informative for machine learning models and classifier generation.
  • the associated pathways and classes of transcription factors are similarly useful and informative for machine learning models and classifier generation.
  • transcription factor binding profile generally refers to a multi-factor information profile for a given transcription factor that includes both tissue contributions and biological processes.
  • the TFBP also includes an “accessibility score” and a z-score statistic to objectively compare across different plasma samples for significant changes in TFBS accessibility.
  • the profile may allow identification of lineage-specific TFs suitable for both tissue-of-origin and tumor-of-origin identification.
  • the term “accessibility score” generally refers to a measure for the accessibility of each transcription factor binding site. Since transcription factor binding may open or “prime” its target enhancers without necessarily activating them per se, the rank values are termed “accessibility score.”
  • the accessibility score may be used to objectively compare the accessibility of TFBSs in serial analyses from the same person or among different individuals. This score provides a robust assessment of TFBS accessibility with particular utility to use cfDNA in clinical diagnostics, cancer detection, treatment monitoring, and other applications described herein. According to one embodiment, a list of actively transcribed TFs is generated from cfDNA data.
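  • As a minimal sketch of how a z-score can compare a TFBS accessibility score across samples, the example below standardizes one value against a set of reference values. The function name `z_score`, the use of the sample standard deviation, and the numeric values are assumptions for illustration; the exact statistic used for the TFBP is not reproduced here.

```python
from statistics import mean, stdev


def z_score(value, reference_values):
    """Standardize `value` against the mean and sample standard deviation of references."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("reference values have zero variance")
    return (value - mu) / sigma


reference = [40, 50, 60]   # illustrative accessibility scores from reference samples
z = z_score(80, reference)  # score of the same TF in a test sample
print(z)
```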
  • the activity, i.e., the transcription, of TFs is deduced from the coverage patterns at the TSS, i.e., the NDR and 2K regions, combined with the position of upstream and downstream nucleosomes to establish the transcription status of the respective TFs precisely.
  • the accessibility of the respective TFBSs should be increased, which represents another parameter for TF status assessment included in the model.
  • the activity of a TF is assessed by a combination of several factors: TSSs i.e., NDR/2K, nucleosome positions, and TFBS accessibility. Further parameters, such as the (relative) entropy at TSS and TFBSs, may be included in the evaluation.
  • the activity (transcription) of TFs may be deduced from the coverage pattern at the TSS, i.e., the NDR and 2K regions as described by Ulz et al., 2016.
  • the nucleosome positions may be derived from cfDNA data by determining the coverage pattern and/or by the nucleosome priors approach described herein.
  • the term “nucleosome position” as used herein refers to the position of a nucleosome on cfDNA. In other words, this refers also to the presence of a nucleosome at a base position of the cfDNA fragment.
  • the methods described herein comprise determining the nucleosome position.
  • determining the nucleosome position may be performed by determining the probability of the presence of a nucleosome or a nucleosomal dyad for a base position of the cfDNA fragments. Specifically, this probability may be determined by determining the dyad count distribution for specific fragment lengths, performing a fragment length-based truncation, determining probability density functions, and removing the non-informative portion.
  • This probability is also termed “nucleosome dyad prior distribution”, “nucleosome prior distribution”, or “nucleosome prior” herein.
  • the methods described herein comprise determining fragmentation profiles. According to a specific embodiment, these fragmentation profiles are determined for some or for all sites of interest. According to an even more specific embodiment, cfDNA fragment length and the variability of the fragments may be included in the methods described herein. According to a specific embodiment, the variability of fragment lengths is reflected in the term “entropy” or “entropy of TSS and TFBS” as described elsewhere herein. According to one embodiment, nucleosome positions and open chromatin sites are mapped.
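  • As an illustration of how the variability of fragment lengths could be summarized as an entropy, the sketch below computes the Shannon entropy of a fragment length distribution. Using Shannon entropy in bits is an assumption made for this example; the entropy definition applied in the methods is described elsewhere herein, and the function name is hypothetical.

```python
import math
from collections import Counter


def fragment_length_entropy(lengths):
    """Shannon entropy (in bits) of the empirical fragment length distribution."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform_site = fragment_length_entropy([167, 167, 167, 167])   # no length variability
variable_site = fragment_length_entropy([150, 160, 167, 180])  # four equally frequent lengths
print(uniform_site, variable_site)
```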
  • the co-occurrence of nucleosome positions and open chromatin sites is determined.
  • the resulting atlas of open chromatin sites may consist of co-regulated open chromatin regions which correlate to a high extent, of open chromatin regions which are related only to a subset of other regions, and/or of an open chromatin region which may not be associated with any other region.
  • maps of open chromatin regions may represent a mixed collection of regulatory regions such as TSSs, TFBSs, or any other regulatory regions, such as enhancers.
  • open chromatin regions are assigned to their function based on publicly available databases, literature, or other publicly available resources.
  • cfDNA multi-dimensional data is integrated and analyzed to identify potential regulatory interactions between transcription factors, target genes, and other regulatory elements.
  • the involved regulatory networks are determined. According to a specific embodiment, this involves determining that whenever a subset of open chromatin regions is accessible, it follows, based on known regulatory interactions, that another subset of open chromatin regions must also be accessible in order to characterize conditions such as a health state or a defined disease state.
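  • The dependency between subsets of open chromatin regions can be sketched as a simple rule check, shown below. The function name `violated_rules`, the rule representation as `(premise, consequence)` pairs, and the region identifiers are hypothetical and only illustrate the logic.

```python
def violated_rules(accessible, rules):
    """Return rules whose premise regions are all accessible but whose consequence is not."""
    accessible = set(accessible)
    violations = []
    for premise, consequence in rules:
        if set(premise) <= accessible:           # every premise region is accessible
            missing = set(consequence) - accessible
            if missing:                          # an expected region is not accessible
                violations.append((tuple(sorted(premise)), tuple(sorted(missing))))
    return violations


rules = [
    ({"region1", "region2"}, {"region3"}),
    ({"region4"}, {"region5"}),
]
accessible = {"region1", "region2", "region4", "region5"}
violations = violated_rules(accessible, rules)
print(violations)
```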
  • regulatory networks reflect the complex molecular mechanisms that govern gene expression and cellular processes.
  • incorporation of regulatory networks in a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method described herein provides an increased informative value of regulatory networks.
  • the increase of accessibility of the respective TFBSs for each TF is described by Ulz et al., 2019.
  • the tissue specificity of actively transcribed TFs may be determined from resources such as Lambert et al. (2016), the TF-Marker database described in Xu et al. (2022), or other sources.
  • the gene sets that are activated by the TFs in the different tissues are analyzed.
  • This analysis comprises the analysis of the TSS, e.g., the NDR/2K region, and the nucleosome position e.g., the distances between nucleosomes.
  • a separate evaluation is performed of the gene set activated by this TF in tissue A, then the gene set activated in tissue B, etc.
  • These gene sets can be retrieved from various resources (e.g., the GRAND database).
  • the analyzed gene sets are evaluated for the most substantial evidence of whether they are active, i.e., transcribed.
  • an example of evaluating the analyzed gene sets for the most substantial evidence of whether they are active, i.e., transcribed, is establishing a ranking order according to the evidence to which tissue they correspond.
  • The ranking of these gene sets enables estimation of the cell abundance in cfDNA.
  • cfDNA represents a mixture of DNA released from different tissues; the composition may change depending on physiologic or pathological conditions.
  • multiple ranking lists of gene networks i.e., one ranking list for each TF, are generated. Numerous data sets are obtained to reconstruct which tissue contributed what percentage to the cfDNA pool.
  • An alternative to such ranking lists could be other bioinformatics approaches such as neural networks or autoencoders.
  • these gene lists are compared and filtered for the genes common or different in the lists (Figure 3b).
  • tissue-specific TFs e.g., hematopoietic or GI-specific TFs, as exemplarily shown in Figure 3b
  • the gene networks regulated by tissue-specific TFs will have similarities and overlap. Specifically, establishing such similarities further increases the resolution and specifies the contribution of specific tissues to the cfDNA pool with improved precision.
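  • As a hedged sketch of comparing the per-TF gene lists for common and differing genes, set operations such as the following may be used. The function name `compare_gene_lists` and the placeholder TF and gene names are hypothetical.

```python
def compare_gene_lists(gene_lists):
    """Return (genes common to all lists, genes unique to each TF's list)."""
    if not gene_lists:
        raise ValueError("at least one gene list is required")
    sets = {tf: set(genes) for tf, genes in gene_lists.items()}
    common = set.intersection(*sets.values())
    unique = {}
    for tf, genes in sets.items():
        # Union of the genes found in every other TF's list.
        others = set().union(*(s for other, s in sets.items() if other != tf))
        unique[tf] = genes - others
    return common, unique


gene_lists = {
    "TF1": ["geneA", "geneB", "geneC"],
    "TF2": ["geneB", "geneC", "geneD"],
}
common, unique = compare_gene_lists(gene_lists)
print(sorted(common), {tf: sorted(g) for tf, g in unique.items()})
```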
  • the TFs are the nodes.
  • the TF to TF connections are also referred to as edges.
  • in the TF-TF network it is determined which TFs cooperate in a cfDNA sample.
  • a TF-TF interaction network is established as described herein. Thereby, cooperative interactions between TFs in a cfDNA sample are established.
  • the TF networks that cooperate in each cfDNA sample are deciphered.
  • TFs involved in developmental state specification such as HOXB1, OCT4, and SOX2 preferentially regulate other TF genes.
  • the regulatory process is multi-faceted.
  • the genomic locations where TFs may bind, i.e., the TFBSs may be computationally estimated using DNA recognition sequences, i.e., motifs. However, focusing only on predicting the genomic locations of TF binding does not help deduce GRN relationships.
  • TFs may work together by forming protein complexes. Consequently, a member of a TF complex may regulate a target gene even without a corresponding binding site in the regulatory region of that gene. From protein-protein interaction (PPI) data, it is known that TFs often form multi-protein complexes that carry out regulatory functions. Therefore, investigating only an initial set of motif locations does not include cases where TFs bind to the DNA without a corresponding recognition sequence (motif). Furthermore, not all TFBSs are functionally relevant or active. According to one embodiment, evidence for TF interactions can be established from cfDNA data (Figure 4). According to one embodiment, the accessibility of the respective TFBSs is assessed for each TF.
  • In Figure 4a, it is analyzed how TFs share TFBSs, e.g., due to sequence homology or other factors. As many TFs bind in large complexes, overlapping binding sites in TFs that regulate the same genes may be found, e.g., in the GTRD database. Specifically, the most recent GTRD version 21.12 (Kolmykov et al., 2021) (https://gtrd.biouml.org/) harbors information on 1,391 TFs.
  • the methods described herein include a constantly updated screening for TFBS overlaps based on the latest versions of the respective databases (Figure 4b, lower panel).
  • protein-protein interaction (PPI) data are employed to explore established TF cooperation in cfDNA.
  • PPIs can be obtained from public interaction databases such as PINA2, STRING, IntAct, and BioGRID or from a recent publication (Göös et al., 2022).
  • If TFs cooperate exclusively in the same tissue, their accessibility patterns should show a strong correlation. In these cases, it will be possible to deduce a relationship such as: if TFn has increased accessibility, so should TFm. As in the data above from Göös and colleagues: if HNF4A has increased accessibility, the accessibility of TYY1 should be likewise increased.
  • Figure 4d illustrates an example of close cooperation between TF1 and TF2, but not TF3, in the hematopoietic system. Examples for co-expressed, cell-type-specific TFs exist.
  • the cooperative accessibility of these TFs has a high power to evaluate the balance between hematopoietic derived DNA and epithelial DNA within a cfDNA sample with high precision.
  • the pattern of correlations in cfDNA changes according to the contribution of different tissues to the cfDNA. This is depicted in Figure 4f, where TF1 cooperates with TF2 in hematopoietic cells and with TF3 in the GI tract.
  • TF correlation matrices can elucidate which TFs cooperate and establish a TF-by-TF “cooperativity network”, particularly if DNA from various tissues contributes to the cfDNA pool.
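  • As an illustration, a TF-by-TF cooperativity network can be derived from pairwise correlations of accessibility values across samples, as sketched below. The function names, the correlation threshold of 0.8, and the accessibility values are assumptions chosen for the example and are not taken from the methods described herein.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def cooperativity_edges(accessibility, threshold=0.8):
    """Return (TF, TF, r) edges for TF pairs whose accessibility correlates >= threshold."""
    edges = []
    for tf_a, tf_b in combinations(sorted(accessibility), 2):
        r = pearson(accessibility[tf_a], accessibility[tf_b])
        if r >= threshold:
            edges.append((tf_a, tf_b, round(r, 3)))
    return edges


# Illustrative accessibility values of three TFs across four cfDNA samples.
accessibility = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "TF2": [2.0, 4.0, 6.0, 8.0],
    "TF3": [4.0, 3.0, 2.0, 1.0],
}
edges = cooperativity_edges(accessibility)
print(edges)
```

  • In this synthetic example only TF1 and TF2 are connected, since the accessibility of TF3 runs opposite to both.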
  • TF interactions can be set for various tissues, which release DNA into the circulation.
  • the genes are the nodes.
  • the gene to gene connections are also referred to as edges.
  • in the gene-gene network, it is estimated whether genes or pairs of genes are co-regulated in a cfDNA sample. An example is shown in Figure 5.
  • the gene-gene network i.e., the co-regulation of genes from cfDNA can be determined with two strategies: one strategy involves defining core gene sets and determining their combined expression pattern in cfDNA. Another strategy investigates single genes.
  • the co-regulation of genes from cfDNA may utilize the nucleosome priors strategy, where the presence or absence of a nucleosome at the TSS of a gene is used as a proxy for gene expression.
  • distinct core gene sets are defined in the core gene strategy. Thereby, the design of these gene sets depends on the question to be addressed. For example, core gene sets are defined corresponding to major cell types using extensive new maps of RNA transcripts in a broad range of primary cell types (Figure 5a).
  • In general, core transcriptional programs define the morphology and function common to a few major cellular types, which are at the root of the hierarchy of the many cell types that exist in the human body, i.e., epithelial, endothelial, mesenchymal, neural, and blood cells (Breschi et al., 2020). Genes whose expression is specific to these cell types were identified.
  • meaningful gene sets may be defined, e.g., subsets of PBMC specific genes according to their expression levels or organ-specific gene sets based on tissue-specificity, e.g., as indicated by the protein atlas.
  • the expression of each gene is estimated based on the presence or absence of a nucleosome at the NDR (Figure 5b).
  • the presence of a nucleosome at the NDR means that this gene cannot be expressed, as the nucleosome blocks the bulky transcription machinery from binding.
  • the absence of a nucleosome indicates that the gene may be expressed.
  • a gene may be in a poised state, meaning it is not expressed; however, the NDR is nucleosome-free to enable a rapid transcription initiation.
  • an approach to assess the NDR-nucleosome status is to use the nucleosome priors approach.
  • one network consists of the genes with a nucleosome-blocked NDR, and the other network of genes with nucleosome-free NDRs, as co-regulated genes should exhibit similar expression patterns.
  • when the NDR nucleosome status is established with the nucleosome priors strategy, not only the two states “NDR-blocked” vs. “NDR-free” are obtained but also intermediates, such as evidence for a blocked NDR in a certain percentage within the cfDNA. This information may be included in the construction of the networks described herein.
  • a regulation network can be computed with various approaches and similarity metrics, such as Pearson Correlation Coefficients (PCC), modified Tanimoto similarity (Tfunction), Euclidean, Squared Euclidean, Standardized Euclidean, City Block, Chebyshev, or Cosine.
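  • As an illustration of one of the listed metrics, the following sketch builds a simple network from Pearson correlation coefficients between per-gene profiles. The profile values, the gene names, and the edge threshold of 0.8 are assumptions made for the example; the description does not prescribe a threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, threshold=0.8):
    """Return {(node_a, node_b): r} for all pairs with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Illustrative profiles, e.g., accessibility scores across four samples.
profiles = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 1, 3, 2],
}
print(correlation_network(profiles))
```

  • Any of the other listed distances could be substituted for `pearson` without changing the structure of `correlation_network`.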
  • TFs and DHSs are the nodes.
  • the TF to DHS connections are also referred to as edges.
  • it is determined whether activated TFs and distal DHSs are co-occurring in the cfDNA sample.
  • genes and DHSs are the nodes.
  • the gene to DHS connections are also referred to as edges.
  • the analyses of TF networks include TF-DHS and DHS-gene assignments to characterize different tissue-specific signatures (Georgolopoulos et al., 2021) ( Figure 6).
  • regulatory elements can be located TF-agnostically by mapping DNase I hypersensitive sites (DHSs).
  • the term “DNase hypersensitive site” (DHS) as used herein refers to a region of open or accessible chromatin where DNA is not tightly wrapped within a nucleosome, leaving the sequence accessible to DNA-binding proteins.
  • DHSs are described, e.g., by Sheffield et al. (2013). DHSs mark all significant classes of cis-regulatory elements in their cognate cellular context. The systematic delineation of DHSs across human cell types and states has provided fundamental insights into many aspects of genome control (Vierstra et al., 2014). According to one embodiment, detailed mapping of DHSs provides detailed snapshots of regulatory element dynamics across the multidimensional landscape of cell types, environmental exposures, and developmental stages. In general, nucleosomes surrounding accessible promoters and TSS-distal DHSs are well-positioned. Nucleosomes surrounding DHSs are collectively well positioned, but well-positioned nucleosomes are associated mainly with regulatory elements in an actuated state.
  • nucleosome positioning is dependent mainly on the actuation of regulatory DNA (Stergachis et al., 2020).
  • Promoter elements are larger and more accessible elements; even though they represent the minority of elements, they dominate the top end of the quantitative accessibility landscape. Promoters also exhibit far less cell type selectivity.
  • TF binding in the proximal promoter region regulates gene expression by forming the preinitiation complex.
  • Distal regulatory elements influence the rate of gene transcription by acting as activators or repressors. According to one embodiment, distal regulatory factors are included in interaction network models for a comprehensive assessment of gene regulation.
  • distal elements, which constitute the vast majority, exhibit considerable lineage- and cell type-selectivity, typically with ‘on/off’ behavior, i.e., most elements are completely ‘off’ in most cell and tissue types.
  • because of the biological differences between proximal and distal DHSs, and because the promoters, i.e., the TSS regions, are already included in the TF-gene and gene-gene interaction networks, the TF-DHS and DHS-gene interaction networks will focus on distal DHSs. Specifically, they may overlap with some TFBSs; however, this strategy ensures that all regulatory regions, such as enhancers, will be included in our analyses.
  • ELMER Enhancer Linking by Methylation/Expression Relationships
  • DHSs can be obtained from, e.g., the ENCODE project.
  • the sites can then be merged (concatenated) so that each cluster is contained in a single region-specific (or universally accessible) dataset (Peneder et al., 2021).
  • Chromatin accessibility landscapes have also been mapped in solid tumors, including breast cancer, colon cancer, glioblastoma, gastric cancer, and lung cancer (Minnoye et al., 2021).
  • TF-DHS and DHS-gene networks are largely unexplored. Georgolopoulos and colleagues identified developmentally regulated DHSs and analyzed corresponding transcripts. Distal DHSs were linked to their target gene promoters and individual TFs to their target DHSs.
  • available public data can be used to test whether putative TF-DHS and DHS-gene interactions are present in an analyzed cfDNA sample, e.g., by aligning the TF and gene status with the accessibility of the respective DHSs ( Figure 6a).
  • the relationships between TFs and genes via distal DHSs are modeled by integrating available DNase-seq data and generating edges where chromatin structure indicates that TFs are likely to bind and regulate gene expression.
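  • The test for whether putative TF-DHS and DHS-gene interactions are present in a sample can be sketched as a simple filter: a putative link is kept only if both of its ends are active or accessible in the analyzed cfDNA sample. All identifiers below (`TF_X`, `DHS_1`, `GENE_A`, and so on) are hypothetical placeholders, and the putative links are assumed to come from public data; this is a sketch, not the method as implemented by the applicants.

```python
def supported_edges(putative_links, active_sources, active_targets):
    """Keep (source, target) links whose two ends are both active in the sample."""
    return sorted(
        (source, target)
        for source, target in putative_links
        if source in active_sources and target in active_targets
    )


def tf_dhs_gene_chains(tf_dhs_edges, dhs_gene_edges):
    """Join TF-DHS and DHS-gene edges that share a DHS into TF-DHS-gene chains."""
    genes_by_dhs = {}
    for dhs, gene in dhs_gene_edges:
        genes_by_dhs.setdefault(dhs, []).append(gene)
    return sorted(
        (tf, dhs, gene)
        for tf, dhs in tf_dhs_edges
        for gene in genes_by_dhs.get(dhs, [])
    )


# Putative links (assumed to be taken from public data) and sample-level calls.
putative_tf_dhs = [("TF_X", "DHS_1"), ("TF_X", "DHS_2"), ("TF_Y", "DHS_3")]
putative_dhs_gene = [("DHS_1", "GENE_A"), ("DHS_2", "GENE_B"), ("DHS_3", "GENE_C")]
active_tfs = {"TF_X"}
accessible_dhss = {"DHS_1", "DHS_3"}
expressed_genes = {"GENE_A", "GENE_C"}

tf_dhs = supported_edges(putative_tf_dhs, active_tfs, accessible_dhss)
dhs_gene = supported_edges(putative_dhs_gene, accessible_dhss, expressed_genes)
print(tf_dhs_gene_chains(tf_dhs, dhs_gene))
```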
  • As used herein, the term “cohort” or “cohort of subjects” shall refer to a group of subjects having a specific classification and may specifically refer to the samples received from said subjects.
  • the number of subjects of a cohort can vary, i.e. it may comprise 2, 3, 4, 5, 6, 7 or more subjects, however it also may be a larger group of subjects, like for example but not limited to 10, 50, 100 or more subjects. According to the embodiment of the invention the cohort may also comprise large cohorts of 500 or more subjects.
  • the cohort of subjects as described herein shall refer to a group of subjects being associated with or having a condition. These subjects of a cohort can thereby be assigned to a specific classification or status, e.g. displaying a certain condition, such as a clinical, physiologic, or pathologic condition, specifically, selected from but not limited to health status, aging status, cell type, tissue type, and specific disease status.
  • the cohort of subjects shall refer to a group of subjects being healthy, unhealthy, of a certain age, and/or having a specific disease. Markers for specific conditions may be but are not limited to, network patterns indicating a specific condition of a subject or a cohort of subjects.
  • cfDNA sample sets from well-defined cohorts can be employed and tested for recurrent patterns.
  • accessible TFBSs located outside of proximal promoters are mapped.
  • Regulatory regions, i.e., accessible TFBSs for each actuated gene, may be investigated at various distances upstream and downstream of the TSS (Figure 6b). Different distances can be studied for such potential regulatory regions, e.g., ±5-10 kb, ±20-25 kb, ±45-50 kb, ±70-75 kb, or ±95-100 kb. This enables the prediction of an epigenetically informed gene regulatory network.
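  • Assigning a candidate regulatory site to one of the distance ranges around the TSS listed above can be sketched as follows. The function name and the coordinates are hypothetical, and the sketch assumes a gene on the plus strand, so that larger coordinates are downstream; for a minus-strand gene the two labels would be swapped.

```python
# Distance ranges in kilobases, taken from the examples in the text.
WINDOWS_KB = [(5, 10), (20, 25), (45, 50), (70, 75), (95, 100)]


def window_for_site(tss, site, windows_kb=WINDOWS_KB):
    """Return (side, (lo_kb, hi_kb)) for a site, or None if outside all windows.

    Assumes a plus-strand gene: positions greater than the TSS are downstream.
    """
    offset = site - tss
    distance = abs(offset)
    for lo_kb, hi_kb in windows_kb:
        if lo_kb * 1000 <= distance <= hi_kb * 1000:
            side = "downstream" if offset > 0 else "upstream"
            return side, (lo_kb, hi_kb)
    return None


# Illustrative coordinates in base pairs.
print(window_for_site(1_000_000, 1_007_500))  # 7.5 kb downstream
print(window_for_site(1_000_000, 978_000))    # 22 kb upstream
print(window_for_site(1_000_000, 1_015_000))  # 15 kb: outside all windows
```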
  • distal regulatory DHSs upstream or downstream of TSS are investigated in the next step.
  • as these regulatory DHSs may comprise only a few DHSs rather than thousands, unique methods to establish their accessibility may be needed.
  • the distal DHSs of a core gene set may be combined to increase their number, or alternatively, a unique strategy, such as nucleosome priors, may be applied.
  • the four different interaction networks are combined ( Figure 7).
  • TF-gene, TF-TF, and gene-gene networks overlap with TF-DHS or DHS-gene networks. All TSSs of genes and all TFBSs are also DHS sites.
  • the TF-gene, TF-TF, and gene-gene networks may also include TF-DHS or DHS-gene networks.
  • DHSs may include some additional cCREs, such as enhancers.
  • TFs, genes, and/or DHSs are selected for determining a regulation network described herein. The selection of these TFs, genes, and/or DHSs may depend on the specific further purpose of the regulation network, e.g., for determining whether a given cfDNA sample is from a healthy individual or a person with cancer, for determining the specific cancer type. According to a specific embodiment, the most differentially active TFs, genes, and/or DHSs between different specific cohorts are selected.
  • an in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network.
  • a computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and comparing the at least one network with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof.
  • defined sets of cfDNA samples i.e., from well-defined cell and/or tissue origin are used herein for building standard regulation networks or regulation models characteristic for a specific tissue or cell.
  • the methods described herein comprise the determination of at least one, two, three, four, or all of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; e. a DHS-gene network, and any combination thereof.
  • an in vitro method for determining the health status of a subject comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; and iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
  • a computer-implemented method for determining the health status of a subject comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; and iii. comparing the at least one regulation network of ii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
  • the standard regulation network or regulation model derived from unhealthy subjects is derived from subjects suffering from a condition selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases, coronary disease, acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; and/or asthma, or thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; and diabetes; processes associated with aging.
  • a standard regulation network may also be derived from specific cell types or tissue types.
  • a congruence of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from healthy subjects is characteristic for a healthy status.
  • a difference of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for a healthy status.
  • a congruence of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for an unhealthy status.
  • a difference of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
  • the subject is considered healthy if the deviation of a regulation network between the regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for a healthy subject is less than the deviation of a regulation network between the regulation networks obtained from the sample and a standard regulation network or a regulation model characteristic for an unhealthy subject.
  • said deviation between a regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for a healthy subject is 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, or 5% of the deviation between a regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for an unhealthy subject.
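  • The deviation-based decision described above can be sketched as follows. The description does not define how the deviation between two networks is measured; the sketch assumes, purely for illustration, that a network is a mapping from edges to weights and that the deviation is the mean absolute weight difference over the union of edges. The edge names and weights are hypothetical.

```python
def deviation(net_a, net_b):
    """Mean absolute edge-weight difference over the union of edges.

    An edge missing from one network is treated as having weight 0.
    This particular measure is an assumption made for the example.
    """
    edges = set(net_a) | set(net_b)
    if not edges:
        return 0.0
    total = sum(abs(net_a.get(e, 0.0) - net_b.get(e, 0.0)) for e in edges)
    return total / len(edges)


def classify(sample, healthy_standard, unhealthy_standard):
    """Return (label, deviation_to_healthy, deviation_to_unhealthy)."""
    d_healthy = deviation(sample, healthy_standard)
    d_unhealthy = deviation(sample, unhealthy_standard)
    if d_healthy < d_unhealthy:
        label = "healthy"
    elif d_unhealthy < d_healthy:
        label = "unhealthy"
    else:
        label = "undetermined"
    return label, d_healthy, d_unhealthy


# Illustrative standard networks and sample network.
healthy_standard = {("TF_X", "GENE_A"): 1.0, ("TF_Y", "GENE_B"): 0.0}
unhealthy_standard = {("TF_X", "GENE_A"): 0.0, ("TF_Y", "GENE_B"): 1.0}
sample = {("TF_X", "GENE_A"): 0.75, ("TF_Y", "GENE_B"): 0.25}

label, d_h, d_u = classify(sample, healthy_standard, unhealthy_standard)
print(label, d_h, d_u, f"{100 * d_h / d_u:.0f}% of the unhealthy deviation")
```

  • In this toy example the deviation from the healthy standard is about a third of the deviation from the unhealthy standard, so the sample would be considered healthy under the rule stated above.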
  • a machine learning model for binary classification between healthy and unhealthy regarding a specific disease group or pregnancy can be trained on the set of standard regulation networks from samples of both groups to learn patterns of networks that signify an unhealthy sample. Multiple such models can be combined to achieve multi-class classification.
  • a machine learning model may be used to learn classification from multiple algorithms.
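  • How several binary models can be combined into a multi-class classifier can be sketched with a one-vs-rest scheme. The binary model used here is a deliberately simple nearest-centroid rule standing in for whatever learner is actually chosen; the class labels and the three-dimensional feature vectors (imagined as edge weights of a regulation network) are hypothetical toy data.

```python
def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BinaryCentroidModel:
    """Toy binary model: scores how much closer x is to the positive class."""

    def __init__(self, positives, negatives):
        self.positive_centroid = centroid(positives)
        self.negative_centroid = centroid(negatives)

    def score(self, x):
        # A score above 0 means x lies closer to the positive centroid.
        return (squared_distance(x, self.negative_centroid)
                - squared_distance(x, self.positive_centroid))


def train_one_vs_rest(samples_by_class):
    """Train one binary model per class against all remaining classes."""
    models = {}
    for label, positives in samples_by_class.items():
        negatives = [v for other, vectors in samples_by_class.items()
                     if other != label for v in vectors]
        models[label] = BinaryCentroidModel(positives, negatives)
    return models


def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda label: models[label].score(x))


# Toy training data: feature vectors are illustrative edge weights.
training = {
    "healthy": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "cancer": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "inflammation": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
models = train_one_vs_rest(training)
print(predict(models, [0.8, 0.2, 0.1]))
```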
  • the term “derived from” generally refers to an origin or source, and may include naturally occurring, recombinant, unpurified or purified molecules.
  • a nucleic acid derived from an original nucleic acid may comprise the original nucleic acid, in part or in whole, and may be a fragment or variant of the original nucleic acid.
  • a nucleic acid derived from a biological sample may be purified from that sample.
  • a health status may be diagnosed. Such a health status can be an unhealthy status. Thereby, a certain disease, health condition, or also a predisposition may be diagnosed.
  • the term “diagnosing” or “diagnosis” of a status or outcome generally refers to predicting or diagnosing the status or outcome, determining predisposition to a status or outcome, monitoring treatment of a subject (e.g., a patient), diagnosing a therapeutic response of a subject (e.g., a patient), and prognosis of status or outcome, progression, and response to particular treatment.
  • the standard regulation networks or a regulation model characteristic for a specific health status comprising a TF-gene, a TF-TF, a TF-DHS, a DHS-gene, a TF-DHS-gene network, gene-gene networks, and any combinations thereof, are derived from cfDNA data employing well-defined cohorts.
  • one cohort comprises healthy controls of both sexes and all age groups.
  • healthy subjects may be understood as subjects not having the symptoms that the subject to be tested is suffering from.
  • “Aging” is a combination of processes of deterioration that follow the period of development of an organism. Aging is generally characterized by a declining adaptability to stress, increased homeostatic imbalance, increase in senescent cells, and increased risk of disease. Because of this, death is the ultimate consequence of aging. Unhealthy aging may be induced by stress conditions including, but not limited to chemical, physical, and biological stresses. Unhealthy aging is also referred to as “inflammaging”.
  • accelerated aging can be induced by stresses caused by UV and IR irradiation, drugs and other chemicals, chemotherapy, intoxicants, such as but not limited to DNA intercalating and/or damaging agents, oxidative stressors etc; mitogenic stimuli, oncogenic stimuli, toxic compounds, hypoxia, oxidants, caloric restriction, exposure to environmental pollutants, for example, silica, exposure to an occupational pollutant, for example, dust, smoke, asbestos, or fumes.
  • the standard regulation network or regulation model characteristic for an unhealthy subject is derived from subjects suffering from a condition selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases, coronary disease, acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; and/or asthma, or thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; and diabetes; processes associated with aging.
  • cohorts of well-defined diseases or conditions e.g., chronic diseases, involving mainly specific organs may be generated.
  • individuals with colorectal cancer or chronic diseases affecting the GI tract e.g., Crohn’s disease, ulcerative colitis
  • a cohort of healthy controls may also comprise the data from samples of healthy individuals with common “co-morbidities”, such as hypertension, obesity, diabetes. Specifically, in this case, such co-morbidities do not lead to the result of an unhealthy status of a subject.
  • standard regulation networks or regulation models characteristic for a specific health status may be adapted to the specific application. For example, depending on the specific aim of the health status determination, data from specific physiological conditions may be incorporated into the healthy control or the unhealthy control.
  • described herein is also the establishment of standard regulation networks and regulation models.
  • these standard regulation networks and regulation models are established from samples of healthy and/or unhealthy subjects as described herein.
  • defined sets of cfDNA samples, i.e., from well-defined cohorts which do not represent only two states (disease X vs.
  • a specific disease may be diagnosed by the methods described herein.
  • the term “subject” generally refers to an individual, entity or a medium that has or is suspected of having testable or detectable genetic information or material.
  • a subject can be a person, individual, or patient.
  • the subject can be a vertebrate, such as, for example, a mammal.
  • Non-limiting examples of mammals include humans, simians, farm animals, sport animals, rodents, and pets.
  • the subject may be displaying a symptom(s) indicative of a health or physiological state or condition of the subject, such as a cancer or a stage of a cancer of the subject.
  • the subject can be asymptomatic with respect to such health or physiological state or condition.
  • an in vitro method for monitoring the treatment success of a patient comprising the steps of: i. extracting cfDNA fragments from a sample of said patient; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one network of iii.
  • a computer-implemented method for monitoring the treatment success of a patient comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii. comparing the at least one network of ii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iii. provide information on the treatment success of the patient.
  • the treatment success is determined for diseases selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases, coronary disease, acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; and/or asthma, or thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; and diabetes; processes associated with aging.
  • Non-limiting examples of the diagnosed, monitored, or treated diseases include neurodegenerative diseases, cancers, chemotherapy-related toxicities, irradiation induced toxicities, organ failures, organ injuries, organ infarcts, ischemia, acute vascular events, a stroke, graft-versus-host-disease (GVHD), graft rejections, sepsis, systemic inflammatory response syndrome (SIRS), cytokine releasing syndrome (CRS), multiple organ dysfunction syndrome (MODS), traumatic injuries, aging, diabetes, atherosclerosis, autoimmune disorders, eclampsia, preeclampsia, infertility, pregnancy-associated complications, coagulation disorders, asphyxia, drug intoxication, poisoning, and infections.
  • the disease is a cancer.
  • Numerous cancers may be detected, monitored, or treated using the methods described herein. Cancer cells, as most cells, can be characterized by a rate of turnover, in which old cells die and are replaced by newer cells. Generally dead cells, in contact with vasculature in a given patient, may release DNA or fragments of DNA into the bloodstream. This is also true of cancer cells during various stages of the disease. This phenomenon may be used to detect the presence or absence of cancers in individuals using the methods described herein. For example, blood from patients at risk for cancer is drawn, or urine is collected, and the sample is prepared as described herein to generate a population of cfDNA.
  • the methods of the disclosure are employed to detect cfDNA fragment patterns and features that may be unique to certain cancers present.
  • the method may detect the presence of cancerous cells in the body, despite the absence of symptoms or other hallmarks of disease.
  • the method may also help to detect different subtypes of cancer based on the features of the cfDNA fragments detected in the patient sample.
  • the types and number of cancers that are detected, monitored, or treated include, but are not limited to, blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogeneous tumors and the like.
  • the methods provided herein may be used to monitor already known cancers or other diseases in a particular patient. This allows a practitioner to adapt treatment options in accordance with the progress of the disease.
  • the methods described herein may track cfDNA or ctDNA in a particular patient over the course of the disease.
  • cancers can progress, i.e. become more aggressive and genetically unstable. In other examples, cancers remain benign, inactive, dormant or in remission.
  • the methods of this disclosure may be useful in determining disease progression, remission or recurrence and the appropriate adjustments in treatment that are required for the disease state. Further, the systems and methods described herein may be useful in determining the efficacy of a particular treatment option. Biological samples are collected longitudinally over time from a single patient and comparison of the cfDNA profiles in all of the different samples collected illustrates how the cancer or disease is progressing or diminishing.
  • the term “candidate cis-regulatory elements” (cCREs) refers to regions of non-coding DNA which regulate the transcription of neighboring genes, i.e., promoters, enhancers, silencers, and operators. In general, cCREs are devoid of nucleosomes to allow binding of transcription factors.
  • open chromatin refers to DNA regions which are often associated with regulatory factor binding and correspond to nucleosome-depleted regions (NDRs). Such open chromatin regions are associated with DNA regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions.
  • open chromatin regions, i.e., cCREs, are analyzed as described herein and networks are deduced therefrom. Such networks are regulation networks or may be interaction networks. According to one embodiment, in the methods described herein open chromatin features are used to infer information, particularly at the network level. According to one embodiment, combinations of multiple tissue-specific cCRE markers in cfDNA are analyzed and specific interaction networks are built. According to one embodiment, prior knowledge about cCREs is used to construct sets of thousands of cCREs informative about disease and certain physiological conditions. According to one embodiment, maps of candidate cis-regulatory elements (cCREs) are publicly available, e.g., through the ENCODE data portal or other publicly accessible databases.
  • cCREs can be selected from publicly available databases (Encode Project Consortium et al., 2020a; Meuleman et al., 2020; Vierstra et al., 2020; Zhang et al., 2021) or from selected publications, e.g., for tissue-specific DHSs (Zhang et al., 2021). Further features may be derived from the expanding knowledge of tissue-specific gene expression (Breschi et al., 2020; Uhlen et al., 2015; Yao et al., 2015) or databases which get constantly updated, such as the Human Protein Atlas (www.proteinatlas.org), Genotype-Tissue Expression (GTEx) project (gtexportal.org), or others.
  • the interplay between chromatin accessibility and gene expression dynamics is leveraged in the methods described herein.
  • the cfDNA analysis focuses on regulatory network connections, which are more tissue-specific than genes or transcription factors investigated on their own. Thereby, context-dependent, non-canonical regulatory pathways are captured which, in addition to specifying the tissue, are informative about altered pathways and therefore about diseases and physiological changes.
  • an integrative analysis of chromatin accessibility and gene expression is described herein which corresponds to an application of GRN-cCREs.
  • tissue-specific TFs can be derived from publicly available databases, such as the resources provided by Lambert et al. (2018) or the TF-Marker database (Xu et al., 2022).
  • thousands of links between individual regulatory elements and their target genes are considered and viewed within their biological context.
  • a transcribed, i.e., active, TF affects several proximal and distal open chromatin regions/cCREs (summarized in Figure 1):
  • the nucleosome depleted region (NDR) of the respective TF has a distinctive coverage pattern.
  • The distances between nucleosomes downstream of the TSS of this TF are different compared to silent genes, which can be assessed by fast Fourier transformation (FFT; informs about the frequencies) and short-time Fourier transformation (STFT; gives the location of the frequencies, i.e., the position where a signal changes).
  • TFBSs show increased accessibility (TFBSs are selected from the GTRD database).
  • interaction networks are built and trained.
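  • The Fourier-based assessment of nucleosome spacing mentioned in the list above can be illustrated with a small sketch. For self-containment it uses a plain discrete Fourier transform written out directly, which yields the same spectrum that an FFT computes faster. The coverage signal is synthetic, and the 190 bp repeat length, the baseline coverage, and the window size are illustrative assumptions, not values from this description.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-constant component.

    Uses a naive O(n^2) discrete Fourier transform; an FFT gives the same
    spectrum more efficiently for long signals.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]  # remove the constant offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        power = abs(coefficient) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("signal has no periodic component")
    return n / best_k


# Synthetic coverage downstream of a TSS: baseline 10 with a 190 bp oscillation.
coverage = [10 + 3 * math.cos(2 * math.pi * t / 190) for t in range(760)]
print(dominant_period(coverage))
```

  • A short-time variant would apply the same function to sliding sub-windows of the coverage to locate where the spacing changes.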
  • Figure 1 describes the effect of transcribed TFs on open chromatin regions and nucleosome positions. The effects of transcribed, i.e., active, TFs on proximal and distal open chromatin regions/cCREs and on nucleosome positions are shown.
  • the NDR is flanked by oscillating coverage patterns where the peaks indicate the position of the nucleosomes upstream and downstream of the NDR.
  • Associated TFBSs may show different accessibilities.
  • the high accessibility at the binding sites of the TF REST is visible as an oscillating pattern in healthy controls (gray).
  • in P148_1, the accessibility of the REST binding sites is comparable to healthy controls.
  • in P148_3, the accessibility is decreased, as indicated by an almost flat line.
  • Each TF influences the expression of several downstream genes. The downstream genes may differ significantly depending on the tissue or cell type. It was suggested that gene regulatory network connections are more tissue-specific than genes or TFs (Sonawane et al., 2017) (see Figure 2).
  • (Right upper panel): Schematic diagram of TF-DHS and DHS-gene assignments. Differentially expressed TFs influence DHS densities, and in particular distal DHSs are linked to their downstream target genes and affect their expression. Such networks are integral in generating specific signatures for cell lineages. According to one embodiment, in the methods described herein many open chromatin features are integrated to infer information at the network level. According to one embodiment, cCRE sets and combinations thereof are made according to the methods described herein. According to a specific embodiment, signatures are used to train neural network models to predict medical information.
  • building of four interaction networks is described herein, i.e., TF-gene, TF-TF, gene-gene, and TF-DHS-gene. Furthermore, the combination of these networks to a comprehensive model is described herein.
  • the method described herein includes the elucidation of TF-DHS-gene interaction networks within cfDNA samples by employing strategies to combine multiple parameters
  • the method described herein includes the use of edges (TF-gene, TF-TF, TF-DHS, DHS-gene) that are cell-type specific. Some of them are registered in the same network.
  • the methods described herein include that these components are an approximation and serve for cell type/disease identification. There are enough specific edges to provide statistical power, especially in low-coverage situations, because the signal is integrated over many sites.
  • the method described herein includes the generation of generalizable models capable of identifying different disease stages and physiological conditions.
  • the method described herein includes that, due to the multitude of data points, the herein described approach can handle smaller numbers of samples and samples sequenced with relatively low coverage.
  • the method described herein includes inferring functional and biological information from cfDNA.
  • the mere presence of DNA released from specific organs is not informative about the biological relevance, e.g., whether it indicates disease and, if so, which pathways are altered.
  • the herein described approach enables the inclusion of the underlying systems biology in cfDNA applications.
  • the methods described herein are minimally invasive or non-invasive.
  • the target groups for adopting the methods described herein are physicians, patients, and professionals in the life science sectors.
  • nucleosomes surrounding these highly accessible DNA sequences are characterized by specific features, such as histone modifications and different nucleosome distances to each other.
  • TFBS accessibility scores are used as input features in machine learning models to find correlations between sequence composition and subject (e.g., patient) groups. Examples of such patient groups include presence of diseases or conditions, stages, subtypes, responders vs. non-responders, and progressors vs. non-progressors.
  • feature matrices are generated to compare samples obtained from individuals with known conditions or characteristics. In some examples, samples are obtained from healthy individuals or individuals who do not have any of the known indications, and samples from patients known to have cancer.
  • the term “feature” refers to an individual measurable property or characteristic of a phenomenon being observed.
  • Features are usually numeric, but structural features such as strings and graphs may be used in syntactic pattern recognition.
  • the concept of “feature” is related to that of explanatory variable used in statistical techniques such as for example, but not limited to, linear regression.
  • the feature is a transcription factor binding profile.
  • the feature is an accessibility score calculated from a transcription factor binding profile.
  • the features are inputted into a feature matrix for machine learning analysis.
  • the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between healthy subjects and cancer patients, or between disease progressors and non-progressors.
  • the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between a plurality of disease subtypes, or a plurality of disease stages.
  • the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between disease treatment responders and non-responders.
  • the system identifies feature sets to accept as inputs to a machine learning model.
  • the system performs an assay on each molecule class and forms a feature vector from the measured values.
  • the system accepts as inputs the feature vector into the machine learning model and generates an output classification of whether the biological sample has a specified property.
  • the machine learning model generates a classifier capable of distinguishing between two or more groups or classes of individuals or features in a population of individuals or features of the population.
  • the classifier may be a binary classifier capable of distinguishing between two groups or classes of individuals or features in a population of individuals or features of the population.
  • the classifier may be a multi-class classifier capable of distinguishing between more than two groups or classes of individuals or features in a population of individuals or features of the population.
  • the classifier is a trained machine learning classifier.
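  • The step from TFBS accessibility scores to a trained binary classifier can be sketched as follows. This is a minimal illustration, assuming a logistic regression fitted by batch gradient descent (one possible model, not necessarily the one used in the methods described herein); the accessibility values, the labels, and the function names `train_logistic` and `predict` are hypothetical.

```python
import math


def train_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Hypothetical feature matrix: one row per sample, one column per TFBS
# accessibility score. Label 0 = healthy control, label 1 = cancer patient.
feature_matrix = [
    [0.90, 0.80], [0.85, 0.75], [0.95, 0.90],  # healthy: high accessibility
    [0.30, 0.40], [0.20, 0.35], [0.40, 0.30],  # cancer: reduced accessibility
]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(feature_matrix, labels)
new_sample_class = predict(weights, bias, [0.30, 0.35])
```

  • A multi-class classifier (e.g., for disease subtypes or stages) would need a different output layer, such as a softmax over classes, but the feature matrix has the same layout.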
  • the informative loci or features of biomarkers in a cancer tissue are assayed to form a profile.
  • receiver operating characteristic (ROC) curves may be generated for plotting the performance of a particular feature (e.g., any of the biomarkers described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., individuals responding and not responding to a therapeutic agent).
  • the feature data across the entire population e.g., the cases and controls
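  • How a ROC curve and its area under the curve (AUC) summarize the ability of a single feature to separate two populations can be sketched as follows. This is a minimal illustration with hypothetical scores and labels; the function names are assumptions, and the curve construction assumes distinct score values.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for decreasing thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, y in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical feature values for six individuals (1 = responder, 0 = non-responder).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = roc_auc(scores, labels)
```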
  • the specified property is selected from healthy vs. cancer, a disease subtype among a plurality of disease subtypes, a disease stage among a plurality of disease stages, progressor vs.
  • the probability of the presence of a nucleosome or a nucleosomal dyad for a base position of the cfDNA fragments is determined by determining the dyad count distribution for specific fragment lengths, performing a fragment length-based truncation, determining probability density functions, and removing the non-informative portion. This probability is also termed “nucleosome dyad prior distribution”, “nucleosome prior distribution”, or “nucleosome prior” herein.
  • the fragment length-specific prior probability P(H) gives the probability of a nucleosome, which is represented by its dyad in our model, being positioned relative to each base of the fragment.
  • the probability distribution of the dyad location across a fragment can be approximated by the associated cleaving resistance distribution.
  • the maximum or the most pronounced local maxima of this cleaving resistance distribution gives or give the expected location of the nucleosome dyad or the locations of multiple nucleosome dyads from multiple DNA-associated histone complexes (i.e.
  • creating prior knowledge can be further used for computing the positions of nucleosome dyads based on coverage maxima and cfDNA fragmentation by using Bayes' theorem.
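  • Bayes' theorem is shown in equation I: $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$ (I). In equation I, H is the hypothesis, and E is the evidence.
  • The probabilities are P(H) as the prior probability, P(E|H) as the likelihood, P(E) as the marginal likelihood, and P(H|E) as the posterior probability.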
  • the hypothesis is that the position of a nucleosome, represented by the position of its dyad, can be derived from the location of an observed cfDNA fragment, which originates from that very same nucleosome, by taking into account the length of the fragment and prior knowledge about the relationship between the dyad’s location and the fragment length.
  • the evidence E is the combined information about cfDNA fragments gained from read alignment against the reference sequence, e.g., a high-quality human reference genome, after sequencing.
  • the sequence alignment step produces the length and position information for each fragment.
  • the evidence E at a specific locus will also be called “observed fragmentation” or “fragmentation evidence”.
  • The likelihood P(E|H) is the probability of observing a cfDNA fragment locally under the hypothesis that nucleosomal DNA in immediate genomic vicinity was the origin of the fragment before degradation. The likelihood reduces to the observed local fragmentation after taking into account that observing unprotected fragments by chance is highly unlikely.
  • the denominator P(E) is either called marginal likelihood or model evidence.
  • In the factor P(E), other parameters like genomic locus or fragment length have been “integrated out” so that the probability does not depend on them anymore. If this marginal likelihood factor is omitted, the posterior probability is only proportional to the combination of observed fragmentation and prior knowledge (equation II): $P(H \mid E) \propto P(E \mid H)\,P(H)$ (II). According to this specific embodiment related to equation II, it is not possible to integrate over the result to compute an actual probability between 0 and 1.
  • The posterior probability P(H|E) is the probability of the hypothesis H being true after observing E. In other words, it is the average resistance to cleaving by DNases across all cfDNA pool tissue sources at a respective base of the genome, given the local fragmentation evidence.
  • Finding local maxima/calling peaks of this signal yields positions that show a relatively high probability of harboring a nucleosome dyad in at least one of the contributing tissues, since cleaving resistance maxima are considered to be conferred by nucleosomes, i.e., each maximum is the resulting average expected location of the nucleosomal dyad at that locus.
  • local peaks of the posterior probability in equation II refer to the base positions in the reference genome sequence where a nucleosomal dyad is most likely to be present as determined in the methods described herein.
  • the observed fragmentation refers to the cfDNA fragmentation profile obtained by aligning the DNA sequences of the cfDNA fragments with a reference genome sequence.
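  • The proportionality between the posterior, the observed fragmentation, and the prior, followed by peak calling, can be sketched as follows. This is a simplified illustration, assuming that both the fragmentation evidence and the prior are already available as per-base values over the same window, so that their combination reduces to an element-wise product; the numbers and function names are hypothetical.

```python
def unnormalized_posterior(fragmentation, prior):
    """Per-base product of fragmentation evidence and prior (posterior up to a constant)."""
    if len(fragmentation) != len(prior):
        raise ValueError("fragmentation and prior must cover the same bases")
    return [e * p for e, p in zip(fragmentation, prior)]


def call_peaks(signal):
    """Return indices of strict local maxima, i.e., candidate dyad positions."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
    ]


# Hypothetical 10-base window: observed fragmentation and prior per base.
fragmentation = [1, 2, 4, 2, 1, 1, 3, 5, 3, 1]
prior = [0.1, 0.2, 0.4, 0.2, 0.1, 0.1, 0.2, 0.4, 0.2, 0.1]

posterior = unnormalized_posterior(fragmentation, prior)
dyad_candidates = call_peaks(posterior)
```

  • Because the marginal likelihood is omitted, the values in `posterior` are only comparable relative to each other and are not probabilities between 0 and 1.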
  • classifiers and predictors may be used in the methods described herein. Models for classification of health status of a patient and prediction of certain health parameters (e.g. response to therapy, development of tumor resistance to treatment, recurrence free survival, time to recurrence, tumor metastasized or not, time to sepsis, etc.) are trained on features and feature sets using machine learning methods.
  • PCA principal component analysis
  • NMF non-negative matrix factorization
  • random forests or gradient boosting machines also to limit the number of allowed decisions
  • auto-encoders to reduce feature space to important hyperparameters (also de-noising) or similar methods and/or any combination of these.
  • tissue deconvolution is used or performed in a method described herein.
  • Tissue deconvolution refers to the inference of the cfDNA contribution of individual tissues and/or cell types to the cfDNA pool. It uses a reference catalog of tissue-/cell type-specific feature signatures. The catalog is created from existing sequencing data sets.
  • Signatures may consist of single features or combinations of features and sets of these as described above. Features and sets may be restricted to certain regions or sets of regions of the genome, especially in the case of chromatin-associated features.
  • NMF can also be used to compute a “best fit” linear combination of signatures from a sequencing dataset. Signatures may not scale linearly with the abundance of their corresponding cfDNA-releasing cells. Therefore, methods other than NMF might be used to achieve a more accurate deconvolution. Tissue deconvolution yields an estimate of the contribution of each tissue/cell type described by the reference catalog as a value between 0 and 1.
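  • The idea of a “best fit” non-negative linear combination of reference signatures can be sketched as follows. This is a minimal illustration that swaps in non-negative least squares solved by projected gradient descent instead of a full NMF, and it assumes that signatures mix linearly; the catalog values, tissue names, and the function name `deconvolve` are hypothetical.

```python
def deconvolve(signatures, observed, steps=2000, lr=0.1):
    """Estimate non-negative tissue fractions that best reproduce the observed features.

    signatures: one feature vector per tissue (the reference catalog).
    observed:   feature vector measured in the cfDNA sample.
    """
    k = len(signatures)
    weights = [1.0 / k] * k
    for _ in range(steps):
        mixture = [
            sum(w * sig[j] for w, sig in zip(weights, signatures))
            for j in range(len(observed))
        ]
        residual = [m - o for m, o in zip(mixture, observed)]
        grads = [2.0 * sum(r * s for r, s in zip(residual, sig)) for sig in signatures]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grads)]
    total = sum(weights)
    # Rescale so that the fractions lie between 0 and 1 and sum to 1.
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical catalog with three tissues and three features per signature.
catalog = {
    "blood": [1.0, 0.0, 0.5],
    "colon": [0.0, 1.0, 0.5],
    "liver": [0.5, 0.5, 0.0],
}
# Hypothetical sample composed of 70% blood and 30% colon signal.
observed = [0.7, 0.3, 0.5]

fractions = dict(zip(catalog, deconvolve(list(catalog.values()), observed)))
```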
  • a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein.
  • a computer-readable medium is provided having stored thereon the computer program described herein for performing the computer-implemented method.
  • the computer-implemented method described herein comprises the step of receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample of a subject.
  • said data may be generated using a sequencing-device connected to the computer or apparatus used for performing the computer-implemented method described.
  • a data processing apparatus comprising means for carrying out the computer-implemented methods described herein is provided by the invention.
  • said data processing apparatus may be connected to an apparatus or device capable of sequencing cfDNA fragments.
  • said data processing apparatus may be connected to an apparatus or device capable of extracting cfDNA from a sample.
  • said data processing apparatus is further connected to an apparatus or device capable of sequencing cfDNA fragments.
  • a computer program is provided comprising instructions which, when the program is executed by a computer, cause the computer to carry out a computer-implemented method described herein.
  • said computer program may be combined with a computer program comprising instructions to cause the device capable of extracting cfDNA from a sample to execute its function of extracting cfDNA from a sample.
  • said computer program may be further combined with a computer program comprising instructions to cause the device capable of sequencing cfDNA fragments to execute its function of sequencing cfDNA fragments.
  • an apparatus for performing a method described herein.
  • Such apparatus may be characterized by the following features: (a) a sequencer configured to (i) receive DNA extracted from a sample of the bodily fluid comprising DNA, and (ii) sequence the extracted DNA under conditions that produce DNA fragment sequences; and (b) a computational apparatus configured to (e.g., programmed to) instruct one or more processors to perform various operations such as those described with two or more of the method operations described herein.
  • the computational apparatus is configured to perform one or more of the steps of the computer-implemented method described herein.
  • the apparatus also includes a tool for extracting DNA from the sample under suitable conditions.
  • the apparatus includes a module configured to extract cfDNA obtained from plasma for sequencing in the sequencer.
  • the apparatus includes a database of reference genome sequences and/or standard regulation networks and regulation models.
  • the computational apparatus may be further configured to instruct the one or more processors to map the cfDNA fragments obtained from the blood of the individual to the database of reference genome sequences.
  • the computational apparatus may be configured to instruct the one or more processors to determine at least one regulation network obtained from the analysis of cfDNA in a sample as described herein.
  • the computational apparatus may be configured to compare the at least one regulation network obtained from the analysis of cfDNA with one or more standard regulation networks or regulation models as described herein.
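  • One simple way to compare a regulation network obtained from cfDNA with standard regulation networks is to treat each network as a set of edges and to measure their overlap. The following is a minimal sketch, assuming a Jaccard index as the similarity measure (an illustrative choice, not a measure stated in this document); the TF and gene names are placeholders.

```python
def jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets: shared edges divided by all edges."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_reference(sample_edges, references):
    """Return the name of the reference network most similar to the sample network."""
    return max(references, key=lambda name: jaccard(sample_edges, references[name]))


# Placeholder edges of the form (regulator, target).
sample_network = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
standard_networks = {
    "healthy": {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G4")},
    "disease": {("TF1", "G5"), ("TF2", "G3")},
}

similarities = {
    name: jaccard(sample_network, edges) for name, edges in standard_networks.items()
}
best_match = closest_reference(sample_network, standard_networks)
```

  • Weighted edges, as used for example for edge weights in gene regulatory network databases, would call for a weighted similarity measure instead.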
  • the computational apparatus may perform all steps of the method described herein that can be performed by such an apparatus. Analysis of the sequencing data and the results derived therefrom are typically performed using computer hardware operating according to defined algorithms and programs. Therefore, certain embodiments employ processes involving data stored in or transferred through one or more computer systems or other processing systems. Embodiments of the invention also relate to an apparatus for performing these operations.
  • This apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer.
  • a group of processors performs some or all of the recited analytical operations collaboratively (e.g., via a network or cloud computing) and/or in parallel.
  • a processor or group of processors for performing the methods described herein may be of various types including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and other devices such as gate array ASICs, digital signal processors, and/or general purpose microprocessors.
  • certain embodiments relate to tangible and/or non-transitory computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations.
  • Examples of computer-readable media include, but are not limited to, semiconductor memory devices, magnetic media such as disk drives, magnetic tape, optical media such as CDs, magneto-optical media, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • the computer readable media may be directly controlled by an end user or the media may be indirectly controlled by the end user. Examples of directly controlled media include the media located at a user facility and/or media that are not shared with other entities.
  • Examples of indirectly controlled media include media that are indirectly accessible to the user via an external network and/or via a service providing shared resources such as the “cloud.”
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • a computer-implemented method is described herein, wherein said computer-implemented method is used in a method described herein, specifically in an in vitro method described herein.
  • a computer-implemented method is used for the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; and ii.
  • the computer-implemented method described herein comprises the step of receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample.
  • This step is to be understood as being equal to the steps of extracting cfDNA fragments from the sample, and performing whole genome sequencing on the extracted cfDNA fragments, as described in the in vitro methods described herein.
  • the following items are described herein: 1.
  • An in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii.
  • a. a transcription factor (TF)-gene network
  • b. a TF-TF network
  • c. a gene-gene network
  • d. a TF-DNase hypersensitive site (DHS) network
  • a computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c.
  • a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and iii. comparing the at least one network of ii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof.
  • An in vitro method for determining the health status of a subject comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii.
  • determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects, and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b.
  • a computer-implemented method for determining the health status of a subject comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii.
  • An in vitro method for monitoring the treatment success of a patient comprising the steps of: i. extracting cfDNA fragments from a sample of said patient; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one network of iii.
  • a computer-implemented method for monitoring the treatment success of a patient comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e.
  • a DHS-gene network; iii. comparing the at least one network of ii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iii. provide information on the treatment success of the patient.
  • diseases selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases; coronary disease; acute tissue damage
  • chronic disease specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease
  • asthma or thyroiditis
  • a. determining the actively transcribed TFs; b. determining the tissue-specificity of the actively transcribed TFs from a.; c. determining the gene sets which the TFs from a. activate in each tissue determined in b.; d. evaluating if the gene sets are transcribed; e. determining the intersect for identical and different genes from the gene sets; and f. determining the network from the data obtained from e.. 10.
  • the in vitro method or the computer-implemented method of any one of the preceding items, wherein the actively transcribed TFs are deduced from the coverage pattern at the transcription start site (TSS), preferably said coverage pattern comprises the nucleosome depleted region (NDR) and the 2K regions; the position of upstream and downstream nucleosome position patterns; the transcription factor binding sites (TFBS) accessibility; and optionally further from the relative entropy at TSS and TFBS.
  • determining the TF-TF network comprises the steps of: a. assessing the accessibility of the respective TFBS of each TF; b.
  • determining the gene-gene network comprises the steps of: a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. correlating the genes according to their expression status; and c.
  • the in vitro method or the computer-implemented method of any one of the preceding items, wherein determining the TF-DHS network comprises the steps of: a. determining the actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the network from the data obtained from c.. 14.
  • determining the DHS-gene interaction network from DNA sequences of cfDNA fragments comprises the steps of: a. determining the gene expression status by i.
  • a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method of any one of the preceding items.
  • a data processing apparatus comprising means for carrying out the computer-implemented method of any one of the preceding items.
  • a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method of any one of the preceding items.
  • a computer-readable medium having stored thereon the computer program of item 18.
  • These tubes contain a proprietary blend of reagents that both prevents blood coagulation and stabilizes white blood cells. This is critical, as white blood cells can release their DNA into the circulation, which masks the organ- or tumor-specific signal that is to be detected and profiled downstream.
  • standard tubes such as EDTA tubes can be used; in such cases, processing of the blood has to be started within a defined time range.
  • the tubes are centrifuged in two steps at 1,900 x g for 10 minutes each. DNA is isolated from 2-6 mL of plasma using a cfDNA-specific extraction kit. Double-stranded DNA (dsDNA) originating from plasma is then quantified.
  • GE diploid genome equivalents
  • 10 ng of DNA are used as input and PCR is performed to selectively amplify library fragments containing adapters for subsequent sequencing. Libraries are then quantified and sequenced in paired-end mode (150 bp x 2 or 100 bp x 2) at high coverage ( ⁇ 30x). For humans, 30x coverage can be achieved with 600 million reads of 150 bp (or 300M paired-end reads).
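  • The relation between read count, read length, and coverage stated above can be checked with a short calculation. This is a minimal sketch, assuming a human genome size of about 3.1 billion bp (an assumed round figure, not a value given in this document) and ignoring unmapped or duplicate reads; the function names are illustrative.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate genome size


def mean_coverage(n_reads, read_length_bp, genome_size_bp=HUMAN_GENOME_BP):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp=HUMAN_GENOME_BP):
    """Number of single reads required to reach the target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# 600 million reads of 150 bp, i.e., 300 million read pairs in 2 x 150 bp mode.
coverage_150bp = mean_coverage(600_000_000, 150)
reads_for_30x = reads_needed(30, 150)
```

  • Under these assumptions, 600 million reads of 150 bp give roughly 29x, which is consistent with the approximate 30x figure.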
  • Example 2: Computation of nucleosome dyad prior distribution. The basic principle is illustrated in Figure 8. (Left side: assumption 1) The lower coverage plot illustrates three regions where different numbers of cfDNA fragments map. To the left is a locus with high coverage of sequencing reads, indicating high resistance to enzymatic digestion. The region on the right side has fewer sequencing reads, suggesting moderate resistance to cleaving.
  • nucleosomes offer protection from enzymatic digestion during apoptosis and as the nucleosomal dyad is the region where nucleosomal DNA is most tightly bound, it is possible to translate the sequencing depths into nucleosome position maps where the position of maximum coverage overlaps with the nucleosome dyad (dashed light grey line in the upper coverage plot, the position of the inferred nucleosome dyad axis). Hence, nucleosomal dyad positions are inferred from sequencing read depth analysis for each cfDNA fragment in this step.
  • the individual cfDNA fragment nucleosomal dyad information is then used to infer within each cfDNA fragment the relative position of the dyad (block arrows; left panel “Nucleosomal Fragments”).
  • the nucleosomal dyad may map in the center of a cfDNA fragment, or may be somewhere off the center, or may not be determinable at all.
  • the next step involves fragment length-specific dyad statistics. The inferred nucleosome dyad positions are recorded for all fragments that map to the same locus and have the same length (center panel, black triangles).
  • nucleosome prior distribution (yi).
  • the initial dyad count distribution for a specific fragment length is first truncated according to a certain strategy (step 2; the strategy shown is fragment length-based truncation), then normalized to an area under the curve of one to resemble a probability density function (step 3) and finally, the non-informative constant portion of counts is removed by adjusting the zero level (step 4).
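  • The truncation, normalization, and zero-level adjustment steps for one fragment length can be sketched as follows. This is a minimal illustration, assuming a symmetric truncation window around the fragment center and taking the minimum of the normalized curve as the non-informative constant portion; both choices, the counts, and the function name `dyad_prior` are assumptions for illustration.

```python
def dyad_prior(counts, keep_halfwidth):
    """Turn dyad counts for one fragment length into a prior distribution.

    counts:         dyad counts per relative position, centered on the fragment midpoint.
    keep_halfwidth: number of positions kept on each side of the center.
    """
    center = len(counts) // 2
    # Step 2: truncation to a window around the fragment center.
    truncated = counts[center - keep_halfwidth : center + keep_halfwidth + 1]
    total = sum(truncated)
    if total == 0:
        raise ValueError("no dyad counts inside the truncation window")
    # Step 3: normalization to an area under the curve of one.
    density = [c / total for c in truncated]
    # Step 4: removal of the constant portion by shifting the zero level.
    floor = min(density)
    return [d - floor for d in density]


# Hypothetical dyad counts at nine relative positions of one fragment length.
counts = [2, 2, 3, 6, 10, 6, 3, 2, 2]
prior = dyad_prior(counts, keep_halfwidth=3)
```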
  • Inferred dyads were counted beyond fragment ends, which are indicated by the medium gray dashed lines.
  • the distributions shown in the figure are used for computing prior distributions of nucleosome dyads.
  • Medium gray areas indicate low counts.
  • the transition from medium gray to darker gray and further to lighter grays up to white as in the center of the figure, indicates increasing counts.
  • the darker spots to the far left and right of the center mark increase in counts that can be attributed to neighboring nucleosomes of the observed fragment.
  • the degree of how precisely neighboring nucleosomes are positioned relative to the nearest one can be derived from the spread of the spot for that fragment length.
  • the uniform truncation strategy would use an identical distance from the center of each fragment to the truncated bases at both sides (end-to-end distance here: 170bp; white dotted lines).
  • An unmarked version of the count heatmap is shown in the small panel on the top right.
  • Nucleosome occupancy pattern from nucleosome priors: The principle is shown in Figure 11, which illustrates how the average dyad signal ( ⁇ ), i.e., the nucleosome posterior signal (bottom panel), is computed from the previously computed fragment-specific prior distributions (yi; center panel).
  • DNA fragments are replaced by their characteristic nucleosome prior distributions, and the per-base average across these distributions yields the nucleosome posterior signal.
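  • Replacing each fragment by its length-specific prior and averaging per base can be sketched as follows. This is a minimal illustration, assuming that the average is taken over all fragments in the window (so that bases not covered by a fragment receive no contribution from it); the fragment coordinates, the prior values, and the function name `posterior_signal` are hypothetical.

```python
def posterior_signal(region_length, fragments, priors):
    """Average the length-specific priors of all fragments at every base of a region.

    fragments: list of (start, length) tuples in region coordinates.
    priors:    dict mapping fragment length to a list of prior values of that length.
    """
    sums = [0.0] * region_length
    for start, length in fragments:
        for offset, value in enumerate(priors[length]):
            pos = start + offset
            if 0 <= pos < region_length:
                sums[pos] += value
    n = len(fragments)
    return [s / n for s in sums] if n else sums


# Hypothetical prior for 5 bp fragments (real cfDNA fragments are far longer).
priors = {5: [0.0, 0.25, 0.5, 0.25, 0.0]}
fragments = [(0, 5), (1, 5), (5, 5)]

signal = posterior_signal(10, fragments, priors)
```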
  • Example 3: TF-gene interaction network. Elucidation of TF to gene regulatory relationships from cfDNA data is described. The aim is to reconstruct the TF-gene interaction network from cfDNA data, i.e., the regulatory connections from TFs to their target genes within the GRN.
  • Data on TFs and regulated genes can be retrieved from various sources, e.g., from the PANDA (Passing Attributes between Networks for Data Assimilation) (Guebila et al., 2022b) or the GRAND websites (https://grand.networkmedicine.org) (Guebila et al., 2022a).
  • Figure 2 illustrates downstream-regulated genes for the TFs HNF4G and HNF4A in whole blood and colon adenocarcinoma. In the respective tissues, different genes are affected by these two TFs.
  • Figure 2 shows the following: To explore the GRN of TFs in different tissues, HNF4G (a) and HNF4A (b) were exemplarily analyzed in whole blood (left panel) and colon adenocarcinoma (right panel) employing the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). In both tissues, these two TFs regulate the expression of different gene sets.
  • Figure 2a panel HNF4G, blood shows the following: The GRN of the TF HNF4G in whole blood was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a).
  • the graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight.
  • Table 1 displays the top twenty-one genes associated with HNF4G in blood: Table 1 Figure 2a (panel HNF4G, colon adenocarcinoma (TCGA)) shows the following: The GRN of the TF HNF4G in colon adenocarcinoma (TCGA) was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight.
  • Table 2 displays the top twenty-one genes associated with HNF4G in colon adenocarcinoma (TCGA).
  • Table 2 Figure 2b panel HNF4A, blood shows the following: The GRN of the TF HNF4A in whole blood was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight.
  • the following table 3 displays the top twenty-one genes associated with HNF4A in blood.
  • Table 3 Figure 2b panel HNF4A, colon adenocarcinoma (TCGA) shows the following:
  • the GRN of the TF HNF4A in colon adenocarcinoma (TCGA) was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a).
  • the graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight.
  • the following table 4 displays the top twenty-one genes associated with HNF4A in colon adenocarcinoma (TCGA).
  • the transcribed status is reflected in the typical TSS pattern, i.e., in the NDR and 2K region (Ulz et al., 2016) and the distances of upstream and downstream nucleosomes (FFT, STFT) and the accessibility of the corresponding TFBSs.
  • the transcribed status of the targeted genes can again be deduced from the typical TSS pattern.
  • Building the TF-gene interaction network in cfDNA consists of several steps (Figure 3a). First, a list of actively transcribed TFs is generated from cfDNA data.
  • the activity (transcription) of TFs is deduced from the coverage pattern at the TSS, i.e., the NDR and 2K regions.
  • the positions of upstream and downstream nucleosomes are included to establish the transcription status of the respective TFs.
  • the accessibility of the respective TFBSs should be increased (Ulz et al., 2019), which represents another parameter for TF status assessment included in the model.
  • tissue-specificity can be taken from resources such as (Lambert et al., 2018), the TF-Marker database (Xu et al., 2022), or other sources.
  • the gene sets that the TFs activate in different tissues are analyzed (TSS (NDR/2K) and nucleosome positions (distances between nucleosomes)).
  • these gene sets can be retrieved from various resources (e.g., the GRAND database).
  • the gene sets analyzed in the third step are evaluated for the most substantial evidence of whether they are active, i.e., transcribed. For example, one option is to establish a ranking order according to the evidence to which tissue they correspond.
  • cfDNA represents a mixture of DNA released from different tissues, and the composition may change depending on physiologic or pathological conditions.
  • multiple ranking lists of gene networks i.e., one ranking list for each TF are generated. Numerous data sets are obtained to reconstruct which tissue contributed what percentage to the cfDNA pool.
  • these gene lists are compared and filtered for the genes common or different in the lists ( Figure 3b).
  • the purpose is that the gene networks regulated by tissue- specific TFs (e.g., hematopoietic or GI-specific TFs, as exemplarily shown in Figure 3b) will have similarities and overlap (Sonawane et al., 2017).
  • Figure 3 Construction of TF-gene interaction networks describes the following: (a) The individual steps in building the TF-gene interaction network in cfDNA are indicated in the rectangles. First, a list of actively transcribed TFs is generated (for details, see text). Second, from the actively transcribed TFs, the TFs with reliable information on tissue-specificity are selected. Third, for each transcribed TFs, the specific gene sets, which the TF in different tissues activates, are analyzed. Each gene set is intersected with a BED file containing the regulatory regions of the genes.
  • the TSS (NDR/2K) and nucleosome positions (distances between nucleosomes) are evaluated.
  • the tissue-specific gene sets are considered for the most substantial evidence of whether they are active, i.e., transcribed.
  • One option for further processing is to establish a ranking order according to the evidence to which tissue they correspond.
  • Other options may include employing neural networks or autoencoders. As these analyses will be repeated for each TF, several such ranking lists will be generated, providing an accurate pattern of tissue contributions.
  • the gene sets are intersected for similarities and differences. Illustrated are examples of TFs with high specificity in hematopoiesis and the GI tract.
  • TF1 has a high specificity for hematopoietic cells. It regulates gene set 2 in “tissue B”, which may, in this example, be neutrophils. Furthermore, it regulates gene set 13 in “tissue M”, e.g., lymphocytes. TF1 may also be involved in controlling gene sets in other organs, e.g., gene set 15 in “tissue O”, which may, in this example, be the kidney. If the kidney does not contribute DNA to the cfDNA pool (as expected in healthy controls), there should not be the gene set 15 specific signature detectable in the respective DNA sample.
  • TF-TF interaction network Establishment of cooperative interactions between TFs in each cfDNA sample, i.e., deciphering of the TF networks, which cooperate in each cfDNA sample is described.
  • TF-TF networks the regulatory process is multi-faceted.
  • the genomic locations where TFs may bind, i.e., the TFBSs, are usually computationally estimated using DNA recognition sequences, i.e., motifs.
  • TFs may work together by forming protein complexes. Consequently, a member of a TF complex may regulate a target gene even without a corresponding binding site in the regulatory region of that gene.
  • PPI protein-protein interaction
  • TFs often form multi-protein complexes that carry out regulatory functions. Therefore, investigating only an initial set of motif locations does not include cases where TFs bind to the DNA without a corresponding recognition sequence (motif).
  • not all TFBSs are functionally relevant or active. In the following, it is outlined how evidence for TF interactions can be established from cfDNA data (Figure 4).
  • the procedure includes a constantly updated screening for TFBS overlaps based on the latest versions of the respective databases ( Figure 4b, lower panel).
  • PPI data are employed to explore established TF cooperation in cfDNA.
  • PPIs can be obtained from public interaction databases such as PINA2, STRING, IntAct, and BioGRID or from publications e.g. from Göös et al. (2022). It is suggested that TFs prefer to form more transient or proximal interactions than stable protein complexes. Furthermore, marked differences in the number of detected PPIs between different TF families are observed. There are two possible scenarios based on the fact that TFs, which cooperate within the same tissue, should show a high concordance of their accessibility in cfDNA: 1.
  • If TFs cooperate exclusively in the same tissue, their accessibility patterns should show a strong correlation. In these cases, it will be possible to deduce relationships such as: if TFn has increased accessibility, so should TFm. For example, if HNF4A has increased accessibility, the accessibility of TYY1 should likewise be increased.
  • Figure 4d illustrates an example of close cooperation between TF1 and TF2, but not TF3, in the hematopoietic system. Examples for co-expressed, cell-type-specific TFs exist. For example, most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells (Figure 4e).
  • these transcriptionally defined major cell types correspond broadly, but not precisely, to the basic histological types in which tissues are usually classified (Breschi et al., 2020).
  • the cooperative accessibility of these TFs has a high power to evaluate the balance between hematopoietic-derived DNA and epithelial DNA within a cfDNA sample with high precision.
  • the pattern of correlations in cfDNA changes according to the contribution of different tissues to the cfDNA. This is depicted in Figure 4f, where TF1 cooperates with TF2 in hematopoietic cells and with TF3 in the GI tract.
  • In cfDNA samples from individuals without diseases of the GI tract, i.e., without GI-derived DNA in the circulation, the TF1-TF2 pattern will be concordant because of the low contribution of GI-derived DNA to the cfDNA pool.
  • As the contribution of DNA from the GI tract increases in the circulation, the correlation between the accessibility patterns of TF1 and TF3 increases, whereas the correlation between TF1 and TF2 decreases.
  • TF correlation matrices can elucidate which TFs cooperate and establish a TF-by-TF “cooperativity network”, particularly if DNA from various tissues contributes to the cfDNA pool.
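A TF-by-TF correlation matrix of this kind can be sketched as follows. The input layout (one accessibility score per TF and cfDNA sample) and the function names are assumptions for illustration; Pearson correlation is used here, although other measures may be equally applicable.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation is undefined, report 0
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """profiles: TF name -> accessibility score per cfDNA sample (hypothetical layout)."""
    names = sorted(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}
```

In a cohort where GI-derived DNA varies between samples, a high TF1-TF3 entry together with a low TF1-TF2 entry would correspond to the shift described above.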
  • TF interactions can be set for various tissues, which release DNA into the circulation.
  • Figure 4 Construction of TF-TF interaction networks describes the following: (a) TFBSs may show different accessibilities, ranging from high to medium or low or not accessible. (b) The present procedure includes regular screening for potential TFBS overlaps based on the latest versions of the respective databases. (c) The TF HNF4A is depicted as an example of establishing TF cooperations by incorporation of PPI data.
  • the scheme is based on data from (Göös et al., 2022) and reveals the TFs NFIB, NFIA, ELF2, TYY1, CREB1, and P53 as cooperation partners of HNF4A.
  • PPI data suggest strong cooperation between TF1 and TF2 but not with TF3 in hematopoietic lineages.
  • TF1 and TF2 show concordant TFBSs accessibilities in most cfDNA samples, whereas the accessibility of TF3 is independent of the TF1 and TF2 patterns.
  • Left panel Correspondence between transcriptionally derived major cell types and classical histological types.
  • For example, centroids for core gene sets that correspond to the major cell types epithelial, endothelial, mesenchymal, neural, and blood cells are shown.
  • centroids for housekeeping (HK) genes, which are usually highly expressed, and for unexpressed genes according to the protein atlas (PAU, protein atlas unexpressed genes) are used as references to estimate the expression status of gene sets.
  • the centroid for a blood core gene set should be close to the HK gene set because the vast majority of cfDNA is derived from the hematopoietic system.
  • As epithelial cells do not contribute significant quantities of DNA to the cfDNA pool, the epithelial centroid should be in the vicinity of the PAU centroid.
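The comparison of a core gene set against the HK and PAU reference centroids can be sketched as follows. The feature vectors (for instance, per-gene coverage features at the TSS) and the use of Euclidean distance are assumptions for this example; the feature space is not specified here.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of per-gene feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closer_reference(
    gene_set: Sequence[Sequence[float]],
    hk_genes: Sequence[Sequence[float]],
    pau_genes: Sequence[Sequence[float]],
) -> str:
    """Return 'HK' if the gene-set centroid lies nearer the housekeeping
    centroid (suggesting expression), otherwise 'PAU'."""
    c = centroid(gene_set)
    d_hk = math.dist(c, centroid(hk_genes))
    d_pau = math.dist(c, centroid(pau_genes))
    return "HK" if d_hk <= d_pau else "PAU"
```

In a sample from a healthy individual, a blood core gene set would be expected to return "HK" and an epithelial core gene set "PAU".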
  • the NDR is nucleosome-free to enable a rapid transcription initiation.
  • One option to build a gene-gene interaction network is to generate co-regulatory gene networks based on the NDR nucleosome pattern.
  • One network consists of genes with nucleosome-blocked NDR, whereas the other consists of genes with nucleosome-free NDR.
  • Co-regulation of genes from cfDNA can be determined with two strategies: one strategy involves defining core gene sets and determining their combined expression pattern in cfDNA. Another strategy investigates single genes. This may utilize the prior nucleosome strategy where the presence or absence of a nucleosome at its TSS is used as a proxy for gene expression.
  • Core gene set strategy Distinct core gene sets are defined; the design of these gene sets depends on the question to be addressed. For example, core gene sets are defined corresponding to major cell types using extensive new maps of RNA transcripts in a broad range of primary cell types (Figure 5a). Core transcriptional programs define the morphology and function common to a few major cellular types, which are at the root of the hierarchy of the many cell types that exist in the human body, i.e., epithelial, endothelial, mesenchymal, neural, and blood cells (Breschi et al., 2020). Genes whose expression is specific to these cell types are identified.
  • nucleosome at the NDR means that this gene cannot be expressed, as the nucleosome blocks the bulky transcription machinery from binding. In contrast, the absence of a nucleosome indicates that the gene may be expressed. In some cases, a gene may be in a poised state, meaning it is not expressed; however, the NDR is nucleosome-free to enable a rapid transcription initiation.
  • One approach to assess the NDR-nucleosome status is to use the nucleosome priors approach. In any case, it is possible to generate co-regulatory gene networks based on the NDR nucleosome pattern.
  • One network consists of the genes with a nucleosome-blocked NDR, and the other network of genes with nucleosome-free NDRs, as co-regulated genes should exhibit similar expression patterns.
  • the NDR nucleosome status was established with the nucleosome priors strategy. In that case, not only the two states “NDR-blocked” vs. “NDR-free” are obtained but also intermediates, such as evidence for a blocked NDR in a certain percentage within the cfDNA. This information is included in the construction of our networks.
  • the final regulatory network can be computed with various approaches and similarity metrics, such as Pearson Correlation Coefficients (PCC), modified Tanimoto similarity (Tfunction), Euclidean, Squared Euclidean, Standardized Euclidean, City Block, Chebyshev, or Cosine.
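Several of the listed metrics can be written in a few lines. The sketch below assumes each gene is described by a vector of NDR occupancy estimates (for example, the fraction of blocked NDRs per sample); the vector layout and the `METRICS` lookup are illustrative, and the modified Tanimoto similarity is omitted because its exact definition is not given here.

```python
import math
from typing import Callable, Dict, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Hypothetical lookup so that a network builder can switch metrics by name.
METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "euclidean": euclidean,
    "city_block": city_block,
    "chebyshev": chebyshev,
    "cosine": cosine_similarity,
}
```

Note that the first three are distances (small means similar), whereas cosine is a similarity (large means similar), so thresholds for drawing edges have to be chosen accordingly.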
  • Example 6 TF-DHS and DHS-gene interaction network The aim is to capture all relevant regulatory regions.
  • the analyses of TF networks include TF-DHS and DHS-gene assignments to characterize different tissue-specific signatures (Georgolopoulos et al., 2021) ( Figure 6).
  • TF-DHS and DHS-gene interaction network to include all (distant) relevant regulatory regions
  • TF-DHS and DHS-gene interaction network describes the following: (a) In each cfDNA sample, we establish which TFs and genes are actuated. In parallel, maps of distal DHSs are generated. These TF, gene, and DHS signatures can be aligned with publicly available data to establish the interaction. In addition, recurrent TF-DHS-gene patterns can be selected and verified from cfDNA samples using well- defined cohorts. (b) Distal DHSs can be investigated at various distances to the NDR. This example displays DHSs 20-25 kb downstream of the TSS, with increased accessibility to a TF. These distal DHSs modulate the expression status of the gene.
  • DHSs DNase I hypersensitive sites
  • Detailed mapping of DHSs provides snapshots of regulatory element dynamics across the multidimensional landscape of cell types, environmental exposures, and developmental stages.
  • Nucleosomes surrounding accessible promoters and TSS-distal DHSs are generally well-positioned.
  • Nucleosomes surrounding DHSs are collectively well positioned, but well-positioned nucleosomes are associated mainly with regulatory elements in an actuated state. Hence, nucleosome positioning is dependent mainly on the actuation of regulatory DNA (Stergachis et al., 2020).
  • Promoter elements are larger and more accessible elements; even though they represent the minority of elements, they dominate the top end of the quantitative accessibility landscape. Promoters also exhibit far less cell type selectivity.
  • TF binding in the proximal promoter region regulates gene expression by forming the preinitiation complex.
  • Distal regulatory elements influence the rate of gene transcription by acting as activators or repressors. Therefore, inclusion of these distal regulatory factors in interaction network models provides a comprehensive assessment of gene regulation. Furthermore, distal elements, which constitute the vast majority, exhibit considerable lineage- and cell type-selectivity, typically with ‘on/off’ behavior, i.e., most elements are completely ‘off’ in most cell and tissue types.
  • TF-DHS and DHS-gene interaction networks will focus on distal DHSs sites. They may overlap with some TFBSs; however, this strategy ensures that all regulatory regions, such as enhancers, will be included in our analyses.
  • enhancer regions it is not well-established which genes are targeted by these distal elements through mechanisms such as DNA looping.
  • An approach named ELMER (Enhancer Linking by Methylation/Expression Relationships) used DNA methylation to identify enhancers and correlated enhancer states with the expression of nearby genes to identify transcriptional targets.
  • ELMER represents a statistical framework for identifying cancer-specific enhancers and paired gene promoters.
  • DHSs e.g., from the ENCODE project.
  • IQR interquartile range
  • the Regulatory Elements Database http://dnase.genome.duke.edu/celltype.php
  • the sites can then be merged (concatenated) to contain each cluster into a single region-specific (or universally accessible) dataset (Peneder et al., 2021).
  • Chromatin accessibility landscapes have also been mapped in solid tumors, including breast cancer, colon cancer, glioblastoma, gastric cancer, and lung cancer (Minnoye et al., 2021).
  • TF-DHS and DHS-gene interaction networks are largely unexplored.
  • TF motif locations, together with the gene expression status deduced from coverage pattern and nucleosome positioning, are overlapped with epigenetic data (open chromatin locations, here distal DHSs). Then the appropriate regions are connected with edges to construct a TF-DHS-gene regulatory network.
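The overlap-and-connect step can be sketched as follows. The tuple layouts, the function names, and in particular the rule that links a DHS to every active gene whose TSS lies within `max_distance` of the DHS midpoint are assumptions for illustration; how distal elements are assigned to target genes is not fixed by this description.

```python
from typing import Dict, List, Tuple

Motif = Tuple[str, str, int, int]  # (tf, chrom, start, end), half-open
Dhs = Tuple[str, str, int, int]    # (dhs_id, chrom, start, end), half-open


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start < b_end and b_start < a_end


def build_tf_dhs_gene_edges(
    motifs: List[Motif],
    dhss: List[Dhs],
    active_tss: Dict[str, Tuple[str, int]],
    max_distance: int = 25_000,
) -> List[Tuple[str, str]]:
    """Return TF->DHS and DHS->gene edges for accessible distal DHSs."""
    edges: List[Tuple[str, str]] = []
    for dhs_id, chrom, d_start, d_end in dhss:
        # TF->DHS: a motif location falls inside the open chromatin region.
        for tf, m_chrom, m_start, m_end in motifs:
            if m_chrom == chrom and overlaps(m_start, m_end, d_start, d_end):
                edges.append((tf, dhs_id))
        # DHS->gene: assumed distance rule around the TSS of active genes.
        midpoint = (d_start + d_end) // 2
        for gene, (g_chrom, tss) in active_tss.items():
            if g_chrom == chrom and abs(midpoint - tss) <= max_distance:
                edges.append((dhs_id, gene))
    return edges
```

The default of 25 kb merely mirrors the 20-25 kb example distance mentioned for Figure 6b and would need to be tuned or replaced by a data-driven assignment.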
  • cfDNA sample sets from well-defined cohorts can be employed and tested for recurrent patterns. For example, accessible TFBSs located outside of proximal promoters are mapped. Regulatory regions, i.e., accessible TFBSs for each actuated gene, are investigated at various distances upstream and downstream of the TSS ( Figure 6b).
  • distal regulatory DHSs upstream or downstream of TSS are investigated in the next step.
  • As these regulatory DHSs do not comprise thousands of DHSs but only a few, unique methods to establish their accessibility may be needed.
  • the distal DHSs of a core gene set can be combined to increase their number, or alternatively, a unique strategy, such as nucleosome priors, can be applied.
  • Example 7 Building a comprehensive model The four different interaction networks are combined (Figure 7).
  • Figure 7 (Building a comprehensive model from cfDNA) describes the following: Well-defined cohorts will be scrutinized by employing the four interaction networks. By combining these four networks, their different network topologies are established. A variety of machine learning / artificial intelligence approaches, such as neural networks or autoencoders, may be employed. Defined sets of cfDNA samples are employed, i.e., samples from well-defined cohorts that represent not only two states (disease X vs. healthy) but cohorts of individuals with clinically annotated diseases (disease A, B, C,...) or physiologic conditions (e.g., age, obesity, and so on).
  • Example 8 Learning data (to identify patterns and regularities): The cooperating TF-gene, TF-TF, TF-DHS, gene-gene, and DHS-gene interaction networks, and the combined TF-DHS-gene network, are generated from cfDNA data employing well-defined cohorts.
  • One cohort comprises healthy controls of both sexes and all age groups.
  • Healthy is a relative term, and therefore, samples are also collected from “healthy individuals” with common “co-morbidities”, such as hypertension, obesity, diabetes, and so on. Furthermore, cohorts of well-defined diseases or conditions (e.g., chronic diseases) involving mainly specific organs are needed. For example, individuals with colorectal cancer or chronic diseases affecting the GI tract (e.g., Crohn’s disease, ulcerative colitis) are suited to evaluate GI-specific interaction networks in cfDNA.
  • Example 9 Generating or determining TF-TF and TF-gene networks by developing a similarity matrix from all TF profiles Establishing various networks from cfDNA is described in the present invention.
  • TF-TF and TF-gene networks: In the following example, details of determining or generating TF-TF and TF-gene networks are described and results are provided.
  • interacting TFs should have highly similar cfDNA patterns.
  • a similarity matrix was generated using correlation coefficients or Gaussian kernel transformed distance metrics.
  • TFs with a similar/identical accessibility pattern belong to the same TF-TF network.
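A Gaussian-kernel similarity matrix and the resulting TF-TF edges can be sketched as follows. The profile layout, the bandwidth `sigma`, and the edge `threshold` are assumptions chosen for illustration; the values actually used are not stated here.

```python
import math
from typing import Dict, List, Set, Tuple


def gaussian_similarity_matrix(
    profiles: Dict[str, List[float]], sigma: float = 1.0
) -> Dict[str, Dict[str, float]]:
    """Transform Euclidean distances between TF accessibility profiles
    into similarities in (0, 1] with a Gaussian kernel."""
    names = sorted(profiles)
    sim: Dict[str, Dict[str, float]] = {}
    for a in names:
        sim[a] = {}
        for b in names:
            d = math.dist(profiles[a], profiles[b])
            sim[a][b] = math.exp(-(d ** 2) / (2 * sigma ** 2))
    return sim


def network_edges(sim: Dict[str, Dict[str, float]], threshold: float) -> Set[Tuple[str, str]]:
    """Keep TF pairs whose similarity reaches the threshold (each pair once)."""
    return {(a, b) for a in sim for b in sim[a] if a < b and sim[a][b] >= threshold}
```

Identical profiles receive a similarity of 1, and the similarity decays towards 0 as profiles diverge, so a single threshold separates TFs that belong to the same network from unrelated ones.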
  • TF-TF network may be correlated with protein-protein interaction networks, which can be retrieved from publicly available resources, such as the STRING database (https://string-db.org/).
  • the cfDNA-derived TF-TF network can be aligned to the PPI network by using an AND operation to select only the edges that are present in both networks.
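The AND operation amounts to an intersection of undirected edge sets, as in this minimal sketch. The edge-list representation and the function names are assumptions; PPI edges would in practice be parsed from a database export.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def normalize(edges: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Treat edges as undirected and drop self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def and_networks(
    cfdna_edges: Iterable[Tuple[str, str]],
    ppi_edges: Iterable[Tuple[str, str]],
) -> Set[FrozenSet[str]]:
    """Keep only edges present in both the cfDNA-derived and the PPI network."""
    return normalize(cfdna_edges) & normalize(ppi_edges)
```

Using `frozenset` makes ("AR", "FOXA1") and ("FOXA1", "AR") the same edge, which avoids missing matches that differ only in node order.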
  • the algorithm can be applied to identify communities in the network/graph that are densely connected. We used a Louvain community detection algorithm in our examples, but other strategies may be applicable.
  • enrichment analyses can be applied to investigate the function, process, pathway, and so on for each community.
  • prototype disease networks can be generated, identifying modules commonly present in a specific disease. This is achieved by merging sample-specific networks (like co-activation networks generated with algorithm 1) representing each edge between two nodes in the final network as the number of sample-specific networks having that edge divided by the total number of sample-specific networks used (mean of edges between two nodes).
  • Disease-specific networks become extremely useful for classification because a distance to these prototype networks can be generated and assigned to a sample-specific network to a disease represented as a prototype network.
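The merging of sample-specific networks into a prototype, and the assignment of a new sample to the nearest prototype, can be sketched as follows. Edge weights follow the description above (fraction of sample networks containing the edge); the L1 distance used for classification is an assumption, since the distance measure is not specified.

```python
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def prototype_network(sample_networks: List[Set[Edge]]) -> Dict[Edge, float]:
    """Edge weight = number of sample networks with the edge / number of networks."""
    n = len(sample_networks)
    counts: Dict[Edge, int] = {}
    for edges in sample_networks:
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: count / n for edge, count in counts.items()}


def distance_to_prototype(sample: Set[Edge], prototype: Dict[Edge, float]) -> float:
    """Assumed L1 distance between a binary sample network and a prototype."""
    all_edges = set(prototype) | sample
    return sum(
        abs((1.0 if edge in sample else 0.0) - prototype.get(edge, 0.0))
        for edge in all_edges
    )


def classify(sample: Set[Edge], prototypes: Dict[str, Dict[Edge, float]]) -> str:
    """Assign the sample to the condition with the closest prototype network."""
    return min(prototypes, key=lambda name: distance_to_prototype(sample, prototypes[name]))
```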
  • condition A (Ga) may represent a disease state, e.g., prostate cancer (but not limited to cancer), and condition B (Gb) a healthy state.
  • Figure 12 shows the complete TF-TF network, derived from cfDNAs from a patient with prostate cancer and a healthy individual.
  • FIG. 13 displays the TF-TF interaction network for the community with AR from the example in Figure 12.
  • For FIG. 13, reference is made to the data in the following Table 5: Table 5
  • Figure 14 illustrates that a completely different result is obtained if a cfDNA sample from a healthy individual is analyzed.
  • AR is usually not active.
  • AR should not have increased accessibility. Therefore, there is no connection to AR, since the correlations with the other transcription factor signals are low in this case, which confirms the absence of an AR-related TF-TF network.
  • the TF-TF network displayed in Figure 13 is indeed disease-specific.
  • Figure 15 displays another TF-TF network established from the cfDNA of a prostate cancer patient with the TF STAT1 in the center.
  • Figure 16 illustrates a TF-TF network with HDAC1 as the main TF. This network affects the cell cycle and transcriptional regulation in cancer. Another example is displayed in Figure 17, where TFs TCF7L1, MYC, and ASH2L are central. This network affects transcriptional regulation in cancer, several pathways, and WNT signaling. It can be used for subclassifications of prostate cancer types, e.g., at the castration resistant stage.
  • a cfDNA-based approach should have the potential to monitor the course of a disease.
  • Community 1 displayed in Figure 18 refers to the following Table 6: Table 6
  • The tumor was initially an adenocarcinoma (P148_1), the most common prostate cancer subtype (Figure 18B).
  • Prostate adenocarcinomas are AR-dependent, and the cfDNA-based TF-TF network analysis reveals an AR network with established TF partners of AR, such as FOXA1, GRHL2, NKX3-1, or GATA3.
  • In the time interval between collection of these samples, i.e., P148_1 and P148_3, the prostate adenocarcinoma transdifferentiated to a treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC).
  • Prostate adenocarcinomas and neuroendocrine prostate cancers have fundamental differences in their tumor genomes and biology, as reflected by the network analyses ( Figure 18).
  • the t-SCNC is no longer an androgen-dependent stage of prostate cancer, and the AR network was switched off in sample P148_3 ( Figure 18A).
  • the transdifferentiation from an adenocarcinoma to a neuroendocrine tumor means a change in the cell-type identity, which becomes apparent in our network analyses as vanishing edges between TFs such as HOXB13, NKX3-1, and GRHL2 ( Figure 18A).
  • FIG. 19 illustrates an example that cfDNA-based TF-TF analyses reveal fundamental insights into tumor biology.
  • the left plot shows the signal for the TF AR, and the right plot shows the signal for the TF FOXA1, i.e., two TFs with vital roles in prostate cancer tumorigenesis.
  • the dark grey line displays an early disease stage at which the tumor depends on AR and where the AR binding sites usually have high accessibility.
  • the light grey line reflects an advanced stage called castration-resistant prostate cancer (CRPC).
  • the status of the AR has usually changed, and the tumor no longer responds to hormone or ADT therapy.
  • Figure 20 is a further example, similar to Figures 13 and 14, illustrating that in the present invention completely different results are obtained depending on whether the TF subnetwork analyses are conducted with a cfDNA sample from a patient with prostate cancer (Figure 20, upper panel) or with a cfDNA sample from a healthy individual (Figure 20, lower panel).
  • Figure 21 shows the result of a subtraction operation between the PC specific subnetwork and the equivalent subnetwork in healthy individuals.
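A subtraction between two weighted subnetworks can be sketched as follows. The dictionary representation (edge to weight) and the function name are assumptions; edges whose weights are identical in both conditions are dropped, so the result retains only condition-specific differences.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def subtract_networks(disease: Dict[Edge, float], healthy: Dict[Edge, float]) -> Dict[Edge, float]:
    """Edge-wise difference (disease minus healthy); missing edges count as 0."""
    difference: Dict[Edge, float] = {}
    for edge in set(disease) | set(healthy):
        delta = disease.get(edge, 0.0) - healthy.get(edge, 0.0)
        if delta != 0.0:
            difference[edge] = delta
    return difference
```

Positive values mark edges that are stronger in the disease subnetwork, and negative values mark edges that are stronger in the healthy subnetwork.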
  • each TF controls several genes, which can also be investigated from cfDNA, e.g., by the coverage patterns of the respective TSSs of the genes.
  • Figures 22 and 23 show applications of the described approach, where it was established which genes are co-regulated by specific TFs in cfDNA from patients with prostate cancer.
  • Example 10 Determining or generating gene-gene and TF-gene networks by the inclusion of fragmentation patterns at the +1 and +2 nucleosomes, within gene bodies, and other regulatory regions Nucleosomes and open chromatin regions may determine the fragmentation patterns at TSSs and gene bodies and are related to transcriptional gene activity.
  • Signals may be derived from cfDNA analyses, i.e., determining nucleosome positions, open chromatin regions, coverage patterns at nucleosome locations, and open chromatin regions, fragmentation patterns (e.g., length of cfDNA fragments) at nucleosome locations and open chromatin regions, transcriptional activities of genes based on TSS patterns.
  • Promoters of transcriptionally active genes in eukaryotic cells are characterized by a nucleosome-depleted region (NDR) with two flanking nucleosomes commonly known as the –1 (the last upstream) and the +1 (the first downstream) nucleosomes.
  • the +1 nucleosome is well positioned downstream of the transcription start site (TSS) and is commonly known as a barrier of transcription.
  • the +1 nucleosome displays the tightest positioning (or phasing) of all the nucleosomes found in and around genes.
  • the +1 nucleosome often contains histone variants (H2A.Z and H3.3) and histone tail modifications (methylation and acetylation).
  • the +2 nucleosome is located immediately downstream of the +1 nucleosome. It shares some properties with the +1 nucleosome but contains less H2A.Z and displays less methylation, acetylation, and phasing.
  • the +3 nucleosome and the more downstream nucleosomes have fewer properties than the previous upstream nucleosome.
  • Figure 24 illustrates calculations at the +1 nucleosome (top panel) and the +2 nucleosome (bottom panel). Each dot represents a ratio value between short and long cfDNA fragments.
  • short refers to cfDNA fragments shorter than 250bp
  • long refers to cfDNA fragments longer than 250bp.
  • other differentiators between short and long e.g., 150bp, 200bp, or other values, should also be applicable.
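The short-to-long ratio at a nucleosome position can be sketched as follows. The default cutoff of 250 bp follows the text; assigning fragments of exactly the cutoff length to the long group, and selecting fragments by their midpoint, are assumptions made for this example.

```python
from typing import Iterable, List, Optional, Tuple


def short_long_ratio(lengths: Iterable[int], cutoff: int = 250) -> Optional[float]:
    """Number of short fragments divided by number of long fragments.

    Returns None when no long fragment is present (ratio undefined).
    """
    short = 0
    long_ = 0
    for length in lengths:
        if length < cutoff:
            short += 1
        else:  # assumption: fragments of exactly `cutoff` bp count as long
            long_ += 1
    return short / long_ if long_ else None


def ratio_in_window(
    fragments: List[Tuple[int, int]],
    window_start: int,
    window_end: int,
    cutoff: int = 250,
) -> Optional[float]:
    """Ratio for fragments whose midpoint lies in [window_start, window_end),
    e.g. a window around the +1 or +2 nucleosome (hypothetical selection rule)."""
    lengths = [
        end - start
        for start, end in fragments
        if window_start <= (start + end) // 2 < window_end
    ]
    return short_long_ratio(lengths, cutoff)
```

Passing `cutoff=150` or `cutoff=200` gives the alternative differentiators mentioned above without changing the rest of the computation.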
  • the Y-axis indicates different groups of genes.
  • PBMCs peripheral blood mononuclear cells
  • PBMC_0.05 TPM (Transcripts Per Kilobase Million) below 0.05
  • PBMC_0.05-0.1 TPM 0.05-0.1
  • PBMC_0.1_0.3 TPM 0.1-0.3
  • PBMC_0.3_0.5 TPM 0.3-0.5
  • PBMC_0.5_1 TPM 0.5-1
  • PBMC_1_2 TPM 1-2
  • PBMC_2_4 TPM 2-4
  • PBMC_4_10 TPM 4-10
  • PBMC_10_50 TPM 10-50
  • PBMC_50 TPM >50
  • the ratios differ between the +1 nucleosome and the +2 nucleosome, i.e., the ratio values for the +1 nucleosome are usually higher than those at the +2 nucleosome.
  • the ratio values are different for the various cohorts. For example, healthy persons older than 55 years of age have lower ratio values than younger healthy individuals (20-30 years of age), which is caused by an increase of longer cfDNA fragments with age. Similarly, the ratios are different within gene bodies.
  • Figure 25 displays a similar plot as Figure 24; however, the ratio calculations were not done for the +1 and +2 nucleosomes but for the entire gene body. Similar analyses were performed for transcriptionally inactive regions. For example, the human genome contains several regions with exceptionally firmly positioned nucleosome arrays.
  • a gene-gene network can now be established by scrutinizing a cfDNA sample for all genes with the same ratio values.
  • a gene-gene network of genes with low ratio values would represent a network of highly actively transcribed genes.
  • a gene-gene network with high ratio values would represent those genes with low or absent transcriptional activity.
  • any intermediate ratio values could be used to establish gene-gene networks with various transcriptional activities.
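Grouping genes by their ratio values into gene-gene networks can be sketched as follows. The bin boundaries and the rule that all genes within a bin are connected to each other are assumptions for illustration; "the same ratio values" is interpreted here as falling into the same interval.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def bin_genes_by_ratio(ratios: Dict[str, float], bin_edges: Sequence[float]) -> Dict[int, List[str]]:
    """Bin i holds genes with bin_edges[i] <= ratio < bin_edges[i + 1]."""
    groups: Dict[int, List[str]] = {}
    for gene, ratio in ratios.items():
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= ratio < bin_edges[i + 1]:
                groups.setdefault(i, []).append(gene)
                break
    return groups


def gene_gene_edges(groups: Dict[int, List[str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Connect every pair of genes that share a ratio bin."""
    return {i: list(combinations(sorted(genes), 2)) for i, genes in groups.items()}
```

Following the text, the lowest bin would then correspond to a network of highly transcribed genes and the highest bin to genes with low or absent transcriptional activity.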
  • These gene- related networks can then be aligned with other networks, e.g., a TF-related network described above, to build a TF-gene network.
  • these examples illustrate how these networks can be used to estimate the (biological) age of the cfDNA donor.
  • the ratio values differ between young and older persons at the +1 nucleosome, +2 nucleosome, and gene body.
  • the age can also be estimated by establishing the gene-gene networks and including ratio values.
  • Example 11 Selection of TFs
  • a question may be how to select the most appropriate TFs, genes, or DHSs. What is “most appropriate” depends on the question to be addressed. For example, one application is determining whether a given cfDNA sample is from a healthy individual or a person with cancer. If the cfDNA is derived from a cancer patient, it would be desirable to determine the cancer type through the network analysis.
  • a procedure is needed to identify the most differentially active transcription factors between several tumor entities and healthy cohorts.
  • the most suitable TFs were selected to distinguish between cfDNA samples from patients with breast (BC), prostate (PC), colon cancer (CRC), and from healthy individuals.
  • confounding factors were reduced by diluting all cfDNA samples to the same coverage, e.g., 20x, and to the same tumor fraction, e.g., 0.2.
  • the respective TFBS coverages were calculated and summarized as the signal's amplitude.
  • Each dataset Di is split into training and test sets.
  • The training set is used to select the best model and hyperparameters. This is achieved through cross-validation on each training set i.
  • The best models are refit on the full training set i, and their final performance is evaluated on an independent test set i.
  • Through this process, multiple models and parameters are evaluated (Figure 27), and the best model with the best parameter set is selected and re-evaluated on an independent test set.
  • This model performs well, with high F1 scores, as illustrated in the confusion matrix (Figure 28). Most errors occurred in differentiating the PC cohort from the BC cohort, which was expected because both tumor entities are hormone-dependent, so their biology partly overlaps.
  • Figure 28 shows the confusion matrix demonstrating that most cfDNA samples can be classified correctly by TF-TF network analyses after selection of the most differentially active TFs.
  • GRAND a database of gene regulatory network models across human conditions.
  • gpuZoo Cost-effective estimation of gene regulatory networks using the Graphics Processing Unit.
  • GTRD an integrated view of transcription regulation.
  • TF-Marker a comprehensive manually curated database for transcription factors and related markers in specific cell and tissue types in human.
  • GTRD a database of transcription factor binding sites identified by ChIP-seq experiments. Nucleic Acids Res 45, D61-D67. Zhang, K., et al. (2021). A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985-6001 e5919.
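The model-selection procedure of Example 11 (split into training and test sets, cross-validate on the training set, refit, evaluate on the independent test set) can be illustrated with a minimal sketch. The source does not name the classifier or the hyperparameters, so the k-nearest-neighbour model, the candidate values of k, the fold count, and the toy two-dimensional "TFBS amplitude" vectors below are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter
from math import dist

def knn_predict(train, x, k):
    """Majority label among the k training samples closest to x (assumed model)."""
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test samples whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def cross_val_score(train, k, n_folds=3):
    """Mean validation accuracy over n_folds interleaved folds of the training set."""
    scores = []
    for f in range(n_folds):
        val = train[f::n_folds]
        fit = [s for i, s in enumerate(train) if i % n_folds != f]
        scores.append(accuracy(fit, val, k))
    return sum(scores) / n_folds

def select_and_evaluate(train, test, candidates=(1, 3, 5)):
    """Pick k by cross-validation, then score it once on the held-out test set."""
    best_k = max(candidates, key=lambda k: cross_val_score(train, k))
    # "Refitting" a k-NN model simply means using the full training set.
    return best_k, accuracy(train, test, best_k)

def toy_samples(n):
    """Hypothetical amplitude vectors for two well-separated cohorts."""
    data = []
    for i in range(n):
        data.append(((0.1 * i, 0.2), "healthy"))
        data.append(((5.0 + 0.1 * i, 4.0), "CRC"))
    return data

samples = toy_samples(10)                 # 20 labelled samples
train, test = samples[:16], samples[16:]  # independent test set of 4 samples
best_k, test_acc = select_and_evaluate(train, test)
print(best_k, test_acc)
```

On real data the split would normally be randomized and stratified by cohort, and the test set would be touched only once, after model selection is finished.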

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to methods and means for analyzing cell-free DNA for determining regulation networks, screening for correlations, the tissue contribution, the health status of a subject, and monitoring the treatment of a patient. The present invention specifically relates to the determination of regulation networks between transcription factors, genes, DNase hypersensitivity sites, and any combination thereof.

Description

MUG003P -1-

DETERMINING THE HEALTH STATUS WITH CELL-FREE DNA USING CIS-REGULATORY ELEMENTS AND INTERACTION NETWORKS

FIELD OF THE INVENTION

The present invention relates to methods and means for analyzing cell-free DNA for determining the tissue contribution, the health status of a subject, and monitoring the treatment of a patient. The present invention specifically relates to the determination of regulation networks between transcription factors, genes, DNase hypersensitivity sites, and any combination thereof.

BACKGROUND OF THE INVENTION

Globally, there has been a significant increase in the number of people who have cancer. The rising prevalence of cancer can be attributed to several factors, such as environmental factors, tobacco consumption, infectious agents such as Hepatitis B and C, and lifestyle changes. According to the WHO, cancer is the second-leading cause of death globally. Liquid biopsy holds several benefits over traditional cancer diagnostic techniques: reduced cost, early prognosis, therapy monitoring, detection of tumor heterogeneity and acquired drug resistance, and patient comfort. A limitation of liquid biopsy is its lower sensitivity. Detecting ctDNA in liquid biopsies is technically challenging because the level of ctDNA carrying any given cancer mutation may be very low in the plasma of a cancer patient, especially after treatment or surgery. Because of sampling statistics, any individual plasma sample from a patient may contain less than one detectable copy of the ctDNA with the cancer mutation. This leads to false-negative results, in which ctDNA is not detected even though it is present in the plasma at a low level, and impacts the informative value of liquid biopsy tests for cancer. In some cases, false negatives may mean that the recurrence of a tumor is not detected at an early stage.
There is also a growing significance of companion diagnostics in the healthcare sector. Companion diagnostics include tests or assays intended to assist healthcare providers in making treatment decisions for patients based on the best response to therapy. The co-development of companion diagnostics with therapeutic products can significantly alter the drug development process and the commercialization of drug candidates by yielding safer drugs with enhanced therapeutic efficacy quickly and cost-effectively. Diverse digital solutions are increasingly adopted in healthcare. Artificial intelligence is increasingly used to develop products for the healthcare market. Evidence-based, software-driven therapeutic interventions are also being developed in the industry to prevent, manage, or treat medical conditions. Furthermore, demographic and technological change, globalization, the increasing burden on social security systems, and increasingly demanding consumers require continuous adjustments. A general problem is that highly complex data need to be translated into helpful information for experts and laypersons in order to enable diagnostics, monitoring, and prediction of therapeutic efficiency. In particular, diagnosing diseases at the earliest stage, or diagnosing relapse as early as possible, is a challenge in the field. The aging process remains poorly understood; however, there is accumulating evidence that aging results in a plethora of changes at the cellular and molecular levels. Among the accumulating changes during aging, somatic mutations and epigenetic markers such as methylation changes have been intensively studied (Vijg, 2014; Vijg and Campisi, 2008). Human aging-related genes have been identified, including examples in the GenAge database (Zhang et al., 2016). The human genome contains multiple segments that encode functional elements.
Functional elements are defined as discrete, linearly ordered sequence features that specify molecular products (e.g., protein-coding genes or non-coding RNAs) or biochemical activities with mechanistic roles in gene or genome regulation (e.g., transcriptional promoters or enhancers). One of the most extensive international efforts to annotate these functional elements within the human genome is the ENCODE (Encyclopedia of DNA Elements) project, which applied a plethora of state-of-the-art assays to generate high-throughput functional genomic data for virtually all organs and tissues. In addition to ENCODE, other efforts, such as Roadmap Epigenomics (http://www.roadmapepigenomics.org) and Blueprint Epigenome (http://www.blueprint-epigenome.eu), collected epigenetic data, in particular of several human hematopoietic cell types. Furthermore, existing catalogs of regulatory sequences in the human genome are currently being complemented to improve cell-type resolution, e.g., by establishing single-cell atlases of chromatin accessibility in the human genome (Lai et al., 2018; Zhang et al., 2021). Consequently, large-scale maps of candidate cis-regulatory elements (cCREs) are available for most tissues and organs with unprecedented resolution. These data are publicly available, e.g., through the ENCODE data portal (https://www.encodeproject.org) or other publicly accessible databases. However, it is poorly understood how cCREs interact with each other within regulatory networks. Complex networks of interacting elements control regulatory processes. These networks of interacting factors within cells control gene expression and thus define cellular, tissue, and organismal phenotypes. Furthermore, these networks are instrumental in responding to internal and external perturbations. Among the most important regulators in gene regulation are transcription factors (TFs) and microRNAs (miRNAs).
Furthermore, enhancers establish the transcription level and when and where a gene is expressed, thus determining cell identity. Hence, lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Stage-specific regulatory DNA is temporally activated to instruct lineage-specific gene expression programs that underpin cellular fate and potential. Therefore, the in-depth analysis of cCREs and their interaction with each other within regulatory networks is pivotal in understanding how gene expression is regulated and altered in pathological conditions. Instrumental to this process is that regulatory DNA is established and maintained by the combinatorial engagement of TFs that bind in the place of a canonical nucleosome. Hence, cis-regulatory elements present characteristic epigenetic features and are typically devoid of nucleosomes, which allows the binding of TFs and protein complexes to their DNA motifs. Therefore, regulatory sequences are more accessible for enzymatic digestion, as they lack the protection that nucleosome binding provides. Regulatory DNA is dynamically activated and silenced during cell state transitions to establish lineage-restricted gene expression programs and functional landscapes (Dixon et al., 2015; Ho and Crabtree, 2010; Stergachis et al., 2013). DNase hypersensitive sites (DHSs) are generic markers of regulatory DNA, and DHSs mark all major classes of cis-regulatory elements in their cognate cellular context. The systematic delineation of DHSs across human cell types and states has provided fundamental insights into many aspects of genome control.
A universal feature of active cis-regulatory elements (promoters, enhancers, silencers, chromatin insulators or enhancer blockers, and locus control regions) is the focal alteration in chromatin structure triggered by the binding of proteins, which supplants a canonical nucleosome and renders the underlying DNA accessible to nucleases and other protein factors. Furthermore, a cardinal property of regulatory DNA is that its accessibility is cell type- and state-selective, with only a tiny fraction of all genome-encoded elements becoming actuated in a given cellular context (Meuleman et al., 2020; Vierstra et al., 2020). In summary, cCREs are characterized by a lack of protection from enzymatic digestion due to the lack of a nucleosome and provide a wealth of information about disease and physiologic conditions. However, this largely depends on interrogating the cCREs involved in tissue-specific gene regulation, i.e., within the context of the gene regulatory networks (GRNs) that alter which genes are expressed and control the extent of that expression in tissue-specific processes. Leveraging the transcriptomic data produced by the Genotype-Tissue Expression (GTEx) consortium, it has been shown that network edges (i.e., TF to target gene connections) have higher tissue specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes) (Sonawane et al., 2017). Recently, several strategies to improve the sensitivity of cfDNA analyses have been published.
These approaches include targeted deep sequencing, molecular barcoding approaches, the inclusion of matched cfDNA and white blood cell sequencing to improve somatic variant interpretation, the combined assessment of circulating proteins and mutations in cfDNA, analyses of methylation sites, the investigation of plasma DNA fragmentation patterns, and nucleosome positioning mapping (reviewed by Heitzer et al. (2019) Nat Rev Genet 20:71-88 and Heitzer et al. (2020) Trends Mol Med 26:519-528). Ulz et al. (2016) Nat Genet 48:1273-1278 and Ulz et al. (2019) Nat Commun 10:4666 describe the potential of nucleosome positioning mapping after whole-genome sequencing of plasma DNA. Ulz et al. (2016) describe the identification of two discrete regions at transcription start sites (TSSs) where nucleosome occupancy resulted in different read depth coverage patterns for expressed and silent genes. Machine learning was employed for gene classification, and it was demonstrated that gene expression profiles of cells releasing DNA into the circulation could be directly inferred from nucleosome positioning. Ulz et al. (2019) describe the development of a method to investigate the accessibility of transcription factor binding sites, which revealed insights into biological processes from the cells releasing DNA into the circulation and furthermore enabled a subclassification of prostate cancer entities. Ulz et al. (2019) analyzed cfDNA samples from cancer patients and from healthy controls by whole-genome sequencing and bioinformatic methods to infer the accessibility of transcription factor binding sites. This enabled tumor subtype prediction and early detection of certain cancers. Markus et al. (2022) Sci Rep 12:1928 described the integration of cfDNA fragment sizes, genomic position of fragment end points with respect to nucleosome center, and fragment end motifs to enhance ctDNA detection.
The integrated analysis of these three features resulted in a higher enrichment of ctDNA when compared to using fragment size alone. Loyfer et al. (2022) describe the development of a human methylome atlas based on deep whole-genome bisulfite sequencing. Bochkis et al. (2014) compared genome-wide nucleosome occupancy in livers from young (3 months) and old (21 months) mice and found evidence for aging-associated changes. In previous work, specific elements such as transcription start sites (TSSs), transcription factor binding sites (TFBSs), or selected DHSs have been studied (Peneder et al., 2021; Snyder et al., 2016; Ulz et al., 2019; Ulz et al., 2016; Zhu et al., 2021). Alternatively, current tissue deconvolution and aging analyses are based on methylation markers. US 2016/0004814 A1 discloses methods for analyzing regulatory regions within polynucleotides, in particular within genomic DNA. However, the direct inference of regulatory networks from cfDNA using specific genomic features has not been enabled so far. US 2016/0004814 A1 does not disclose how to use native, non-processed sources of input material such as cfDNA for determining regulatory networks. Therefore, the method disclosed in US 2016/0004814 A1 depends on identification of DHSs and boundaries of DNase I accessibility. However, at present, inferring functional, biological information from cfDNA is not possible, and known cfDNA assays are only designed to identify disease or to analyze specific cCREs. For example, the mere presence of DNA released from specific organs is not informative about the biological relevance, e.g., whether it indicates a disease and, if so, which pathways are altered. So far, the primary use of cfDNA in a regulatory context is in detecting genetic alterations associated with diseases, particularly cancer.
For example, identifying somatic mutations, copy number variations, and methylation changes in cfDNA can provide information about the genomic landscape of tumors. Also, so far, known methods are restricted to concise genomic locations associated with active expression. Thus, no comprehensive method is known that allows analyzing specific features throughout the expressed and unexpressed regions of the human genome, which would greatly increase the resolution for inferring regulatory networks. Using the methods known so far, it is not possible to identify, e.g., silenced genes. The mechanistic aspect and the underlying systems biology are underutilized factors in all current cfDNA applications since, at present, no technologies exist to investigate interaction networks, which are the basis for elucidating the biological/medical significance of DNA from specific organs in the circulation. cCREs are open chromatin regions with regulatory functions, which contain a wealth of information. To date, methods are lacking to leverage the content of these sites in multimodal analysis, i.e., at the network level. Thus, there is an unmet need in the field for methods for analyzing cfDNA and determining regulation networks from cfDNA. Furthermore, there is an unmet need in the field for determining physiological conditions such as, e.g., healthy/unhealthy aging, for determining the health status, and also for determining diseases from liquid biopsies.

SUMMARY OF THE INVENTION

It is the object of the present invention to provide methods and means for analyzing cfDNA fragments for determining the health status of a subject and for monitoring the treatment success with an increased resolution. The problem is solved by the present subject-matter. The inventors of the present invention surprisingly found that native, non-processed cfDNA, or data representing DNA sequences of cfDNA fragments, can be used as input material for determining regulatory networks.
Thereby, this input material does not require physical manipulation in the laboratory by means of a cleavage agent such as DNase. The inventors found that the features that can be extracted from cfDNA are manifold and diverse in nature, go beyond genomic sequence (i.e., nucleosome positions, fragmentation profiles, depth of coverage, coverage patterns, nucleosome maps, open chromatin regions, and expression data based on TSS patterns), and cover the entire human genome, whereas previous methods are restricted to concise genomic locations associated with active expression. The comprehensive analysis of these features throughout the expressed and unexpressed regions of the human genome dramatically increases resolution for inferring regulatory networks, such that new applications alongside determination of disease/disorder states are now possible, e.g., determining physiological conditions such as aging. The present invention does not depend on DHSs and boundaries of DNase I accessibility, as previously disclosed methods do; instead, the present invention includes determining nucleosome positions and optionally also open chromatin regions which are flanked by nucleosomes. Determining nucleosome positions allows definition of open chromatin regions within the context of nucleosome positions, i.e., the space between nucleosomes. Thus, definition of open chromatin regions does not depend on determination of regions by a lab-produced signal as in the previously disclosed methods such as in US 2016/0004814 A1. In general, the space between nucleosomes may contain just linker DNA or, at specific sites, a nucleosome-depleted region (NDR) or a nucleosome-free region (NFR). Herein, the terms NDR and NFR are used interchangeably. At NDRs or NFRs, the space between nucleosomes is often broader, so proteins may bind to the DNA to fulfill some regulatory functions. The sequencing coverages in the spaces between nucleosomes are usually lower than in nucleosome-occupied regions.
Furthermore, the cfDNA fragmentation patterns between nucleosomes, i.e., in open chromatin regions, have different length patterns and increased length variability compared to nucleosome-occupied regions. Hence, determination of open chromatin regions is based on factors such as nucleosome positions, coverage, or fragmentation patterns, i.e., on different characteristics than those in previously disclosed methods such as in US 2016/0004814 A1. The inventors of the present invention surprisingly found comprehensive approaches to deduce regulation networks from cfDNA. Thereby, information is inferred by integrating nucleosome positions and/or many open chromatin features, particularly at the network level. Thus, the herein described invention enables the investigation of the dynamics of regulatory and functional events during various stages of diseases or physiological conditions, such as aging, by analyzing cfDNA. Thereby, the methods described herein comprise the integration of different network topologies to increase the resolution of cfDNA analyses to enable distinguishing between different diseases, e.g., presence of a certain disease and exclusion of another disease, and determining the health state of an individual. Thereby, various combinations of multiple tissue-specific cCRE markers in cfDNA are analyzed and specific interaction networks are built. These interaction networks are built between transcription factors, DNase hypersensitivity sites, and genes. As many tissues and organs contribute only minute amounts to the cfDNA pool, highly sensitive approaches are needed for their identification. The accurate identification of rare cell populations in cfDNA cannot be achieved by analyzing only a single or few regions but depends on interrogating multiple, sometimes even thousands of loci.
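The idea that open chromatin regions are defined as the space between positioned nucleosomes can be illustrated with a small sketch. It assumes that nucleosome centers have already been determined (for example from coverage maxima), uses the commonly cited footprint of about 147 bp per nucleosome, and applies a hypothetical minimum gap width of 100 bp to separate candidate NDRs from ordinary linker DNA. The function name, the threshold, and the example coordinates are illustrative assumptions, not values taken from the source.

```python
NUCLEOSOME_BP = 147  # approximate length of DNA wrapped around one nucleosome

def open_chromatin_candidates(centers, min_gap_bp=100):
    """Return (start, end) gaps between adjacent nucleosome footprints.

    centers    -- genomic coordinates of nucleosome centers (assumed input)
    min_gap_bp -- hypothetical width above which a gap is treated as a
                  candidate NDR rather than plain linker DNA
    """
    half = NUCLEOSOME_BP // 2
    ordered = sorted(centers)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        start = left + half + 1   # first base after the left footprint
        end = right - half - 1    # last base before the right footprint
        if end - start + 1 >= min_gap_bp:
            regions.append((start, end))
    return regions

# Toy example: regularly spaced nucleosomes with one wider gap.
centers = [1000, 1190, 1380, 1800, 1990]
regions = open_chromatin_candidates(centers)
print(regions)  # [(1454, 1726)]
```

In practice, coverage and fragment-length variability within each gap could serve as additional evidence, in line with the factors named in the text.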
Thereby, as described herein, thousands of cCREs are leveraged within their biological context to significantly enhance cfDNA analyses to identify diseases, such as cancer, and to characterize physiological states, e.g., aging. Building regulation networks from tissue-specific candidate cis-regulatory elements (cCREs) is enabled by the methods described herein, and prior knowledge about cCREs is used to construct sets of thousands of cCREs informative about diseases and certain physiologic conditions. The herein described solution offers the option to explore the aging process, e.g., whether somebody ages well (“healthy aging”) or whether somebody has an increased risk for developing diseases in specific organs. The herein described approach paves the way for analyzing open chromatin regions, i.e., cCREs and deduced interaction networks, to address these questions. Thereby, the problem of the invention is solved by leveraging the interplay between chromatin accessibility and gene expression dynamics. To this end, cfDNA analysis is focused on regulatory network connections, which are tissue-specific, i.e., context-dependent. Non-canonical regulatory pathways are explored, which, in addition to being tissue-specifying, are informative about altered pathways and therefore about diseases and physiological changes. The present invention provides a computer-implemented method for determining a regulation network from cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from the sample; ii. determining nucleosome positions; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network.
Specifically, the method further comprises determining open chromatin regions in step ii. Specifically, the method further comprises determining coverage patterns at nucleosome positions and/or open chromatin regions in step ii. Specifically, the method further comprises determining fragmentation patterns at nucleosome positions and/or at open chromatin regions in step ii. Specifically, the method further comprises determining the length of cfDNA fragments. Specifically, the method further comprises determining transcription start site (TSS) patterns in step ii. Specifically, the method further comprises determining transcriptional activities of genes based on TSS patterns. Specifically, determining the TF-gene network comprises the steps of: a. determining actively transcribed TFs; b. determining tissue-specificity of the actively transcribed TFs from a.; c. determining gene sets which the TFs from a. activate in each tissue determined in b.; d. evaluating if the gene sets are transcribed; e. determining the intersect for identical and different genes from the gene sets; and f. determining the TF-gene network from the data obtained from step e. Specifically, determining the TF-TF network comprises the steps of: a. assessing accessibility of the respective TFBS of each TF; b. optionally determining overlapping binding sites in TFs; c. determining TF-TF interaction; d. correlating the accessibility obtained from a. with the interaction obtained from c.; and e. determining the TF-TF network from the data obtained from step d. Specifically, determining the gene-gene network comprises the steps of: a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by i. determining the coverage pattern at the nucleosome depleted region (NDR) and/or at the region of 2 kilobases upstream and downstream of the TSS (2K region), or ii. determining if a nucleosome is present at the NDR; b. correlating the genes according to their expression status; and c.
determining the gene-gene network from the data obtained from step b. Specifically, determining the TF-DHS network comprises the steps of: a. determining actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the TF-DHS network from the data obtained from step c. Specifically, determining the DHS-gene network from DNA sequences of cfDNA fragments comprises the steps of: a. determining gene expression status by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. determining maps of accessible distal DHSs; c. correlating the gene expression status of a. with the maps of accessible distal DHSs of b.; and d. determining the DHS-gene network from the data obtained from step c. Specifically, at least two, three, four, or five of the regulation networks are determined. Specifically, the sample is a biological sample from a subject or from a cohort of subjects. Specifically, the method further comprises comparing the at least one of the regulation networks with one or more standard regulation networks or regulation models selected from TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network. Specifically, the method further comprises screening for a correlation of the at least one of the regulation networks with one or more standard regulation networks or regulation models selected from TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network. Specifically, the one or more standard regulation networks or regulation models are determined for one or more cohorts of subjects having a specific classification. Specifically, the specific classification is associated with a condition. Specifically, the condition is selected from the group consisting of health status, aging status, cell type, tissue type, and specific disease status.
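The gene-gene network steps (deriving an expression status per gene from coverage at the NDR, then linking genes according to that status) can be sketched as follows. The sketch assumes that lower coverage at the NDR relative to flanking coverage indicates an expressed gene, and it uses a hypothetical cutoff of 0.8; the gene names and coverage values are invented toy data, and linking genes with identical status is only one simple reading of "correlating the genes according to their expression status".

```python
from itertools import combinations

def expression_status(ndr_cov, flank_cov, threshold=0.8):
    """Classify a gene from relative NDR coverage (threshold is an assumption)."""
    return "active" if ndr_cov / flank_cov < threshold else "inactive"

def gene_gene_network(status_by_gene):
    """Link every pair of genes that share the same expression status."""
    edges = []
    for a, b in combinations(sorted(status_by_gene), 2):
        if status_by_gene[a] == status_by_gene[b]:
            edges.append((a, b, status_by_gene[a]))
    return edges

# Toy input: gene -> (mean coverage at NDR, mean coverage in flanking region)
coverage = {
    "GENE_A": (10, 20),
    "GENE_B": (9, 20),
    "GENE_C": (19, 20),
    "GENE_D": (21, 20),
}
status = {g: expression_status(ndr, flank) for g, (ndr, flank) in coverage.items()}
network = gene_gene_network(status)
print(network)
# [('GENE_A', 'GENE_B', 'active'), ('GENE_C', 'GENE_D', 'inactive')]
```

A finer-grained variant could bin a continuous measure instead of using two states, so that intermediate values also form their own networks.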
Specifically, markers for specific conditions are defined. Specifically, the most differentially active TFs, genes, or DHSs are determined. Specifically, the method further comprises determining whether a subject has a specific condition. Specifically, the cell and/or tissue origin of cfDNA fragments is determined. Specifically, the one or more standard regulation networks or regulation models are derived from healthy subjects and/or unhealthy subjects. Specifically, a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. Specifically, the health status of a subject is determined. Specifically, the subject is a patient undergoing treatment of a health condition. Specifically, the one or more standard regulation networks or regulation models are derived from a previous result from said patient and/or from a standard regulation network characteristic for treatment success. Specifically, differences and/or congruences provide information on the treatment success of the patient. Specifically, the treatment success of a patient is monitored. Specifically, congruence with the standard regulation network derived from a specific cohort of subjects having a specific aging status is characteristic for a specific aging status. Specifically, the aging status of a subject is determined. Specifically, the cohort of subjects having a specific aging status is selected from healthy subjects older than 55 years, healthy subjects between 20 and 30 years, pregnant females, and subjects having a disease.
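The comparison of a sample network with standard networks from healthy and unhealthy cohorts can be illustrated with a minimal sketch. The source does not specify how congruence is measured, so the Jaccard similarity of edge sets used here is an assumption, as are the function names and the toy TF-TF edges.

```python
def jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets (assumed congruence measure)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def health_status(sample, healthy_ref, unhealthy_ref):
    """Report which standard network the sample network is more congruent with."""
    h = jaccard(sample, healthy_ref)
    u = jaccard(sample, unhealthy_ref)
    if h > u:
        return "healthy", h, u
    if u > h:
        return "unhealthy", h, u
    return "undetermined", h, u

# Toy networks; each edge is a tuple of node names in a fixed order.
healthy_ref = {("TF1", "TF2"), ("TF2", "TF3"), ("TF3", "TF4")}
unhealthy_ref = {("TF1", "TF5"), ("TF2", "TF3"), ("TF5", "TF6")}
sample = {("TF1", "TF2"), ("TF2", "TF3"), ("TF4", "TF6")}

label, sim_healthy, sim_unhealthy = health_status(sample, healthy_ref, unhealthy_ref)
print(label, sim_healthy, sim_unhealthy)  # healthy 0.5 0.2
```

The same comparison could be run against a patient's earlier network to follow treatment, with a rising similarity to a treatment-success reference read as a favorable sign under these assumptions.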
Specifically, the disease is cancer, specifically selected from colorectal cancer and prostate cancer. The present invention further provides a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the computer-implemented method described herein. The present invention further provides a data processing apparatus comprising means for carrying out the computer-implemented method described herein. The present invention further provides a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein. The present invention further provides a computer-readable medium having stored thereon the computer program described herein. The present invention further provides an in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; and iv. comparing the at least one network of iii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof. Further provided herein is also a computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i.
receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; and iii. comparing the at least one network of ii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof. Further provided herein is also an in vitro method for determining the health status of a subject comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; and iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
Further provided herein is also a computer-implemented method for determining the health status of a subject comprising the steps of:
i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample;
ii. determining at least one of the regulation networks selected from the group consisting of:
   a. a transcription factor (TF)-gene network;
   b. a TF-TF network;
   c. a gene-gene network;
   d. a TF-DNase hypersensitive site (DHS) network; and
   e. a DHS-gene network;
iii. comparing the at least one regulation network of ii. with one or more standard regulation networks or regulation models derived from healthy subjects, and/or unhealthy subjects;
wherein
   a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or
   b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.

Further provided herein is also an in vitro method for monitoring the treatment success of a patient comprising the steps of:
i. extracting cfDNA fragments from a sample of said patient;
ii. performing whole genome sequencing on the extracted cfDNA fragments;
iii. determining at least one of the regulation networks selected from the group consisting of:
   a. a TF-gene network;
   b. a TF-TF network;
   c. a gene-gene network;
   d. a TF-DNase hypersensitive site (DHS) network; and
   e. a DHS-gene network;
iv. comparing the at least one network of iii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success,
wherein differences and/or congruences obtained in iv. provide information on the treatment success of the patient.

Further provided herein is also a computer-implemented method for monitoring the treatment success of a patient comprising the steps of:
i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample;
ii. determining at least one of the regulation networks selected from the group consisting of:
   a. a TF-gene network;
   b. a TF-TF network;
   c. a gene-gene network;
   d. a TF-DNase hypersensitive site (DHS) network; and
   e. a DHS-gene network;
iii. comparing the at least one network of ii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success,
wherein differences and/or congruences obtained in iii. provide information on the treatment success of the patient.

Specifically, determining the TF-gene network comprises the steps of:
a. determining the actively transcribed TFs;
b. determining the tissue-specificity of the actively transcribed TFs from a.;
c. determining the gene sets which the TFs from a. activate in each tissue determined in b.;
d. evaluating if the gene sets are transcribed;
e. determining the intersect for identical and different genes from the gene sets; and
f. determining the network from the data obtained from e.

Specifically, determining the TF-TF network comprises the steps of:
a. assessing the accessibility of the respective TFBS of each TF;
b. optionally determining overlapping binding sites in TFs;
c. determining the TF-TF interaction;
d. correlating the accessibility obtained from a. with the interaction obtained from c.; and
e. determining the network from the data obtained from d.

Specifically, determining the gene-gene network comprises the steps of:
a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by
   i. determining the coverage pattern at the NDR and/or at the 2K region, or
   ii. determining if a nucleosome is present at the NDR;
b. correlating the genes according to their expression status; and
c. determining the network from the data obtained from b.

Specifically, determining the TF-DHS network comprises the steps of:
a. determining the actively transcribed TFs;
b. determining maps of accessible distal DHSs;
c. correlating the actively transcribed TFs with the maps of distal DHSs; and
d. determining the network from the data obtained from c.

Specifically, determining the DHS-gene interaction network from DNA sequences of cfDNA fragments comprises the steps of:
a. determining the gene expression status by
   i. determining the coverage pattern at the NDR and/or at the 2K region, or
   ii. determining if a nucleosome is present at the NDR;
b. determining maps of accessible distal DHSs;
c. correlating the gene expression status of a. with the maps of accessible distal DHSs of b.; and
d. determining the network from the data obtained from c.

Further provided herein is also a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method described herein.

Further provided herein is also a data processing apparatus comprising means for carrying out the computer-implemented method described herein.

Further provided herein is also a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein.

Further provided herein is also a computer-readable medium having stored thereon the computer program described herein.
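By way of non-limiting illustration only, the gene-gene network steps set out above (expression status per gene, correlation, network) may be sketched as follows. The sketch rests on assumptions not stated in this document: the expression status is taken to be a numeric score per gene and per sample (for example a relative NDR coverage), genes are correlated across samples with the Pearson coefficient, and an edge is drawn when the absolute correlation reaches a hypothetical threshold `min_abs_r`. All gene names and values are made-up placeholders.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long score vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def gene_gene_network(
    scores: Dict[str, List[float]], min_abs_r: float = 0.8
) -> Set[Tuple[str, str]]:
    """Connect every gene pair whose scores are strongly (anti-)correlated."""
    edges: Set[Tuple[str, str]] = set()
    for g1, g2 in combinations(sorted(scores), 2):
        if abs(pearson(scores[g1], scores[g2])) >= min_abs_r:
            edges.add((g1, g2))
    return edges


# Hypothetical expression-status scores for three genes in four samples.
toy_scores = {
    "GENE_A": [0.2, 0.4, 0.6, 0.8],
    "GENE_B": [0.1, 0.3, 0.5, 0.7],
    "GENE_C": [0.9, 0.2, 0.8, 0.3],
}
network = gene_gene_network(toy_scores)
```

In this toy input only GENE_A and GENE_B vary together across samples, so they form the single edge of the resulting network.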
FIGURES
Figure 1: Effect of transcribed TFs on open chromatin regions and nucleosome positions
Figure 2: TFs activate a gene regulatory network (GRN) in a tissue-dependent manner
Figure 3: Construction of TF-gene interaction networks
Figure 4: Construction of TF-TF interaction networks
Figure 5: Construction of gene-gene interaction networks
Figure 6: TF-DHS and DHS-gene interaction network to include all (distant) relevant regulatory regions
Figure 7: Building a comprehensive model from cfDNA
Figure 8: Computation of nucleosome dyad prior distribution
Figure 9: Transformation of empiric count distributions to nucleosome prior distributions
Figure 10: Overview heatmap of dyad count distributions with different distribution truncation strategies
Figure 11: Nucleosome occupancy pattern from nucleosome priors
Figure 12: TF-TF network generated from a plasma sample from a patient with prostate cancer.
Figure 13: The TF-TF subnetwork or community showing with which other TFs AR interacts. The table represents the top results of enrichment analysis performed on this subnetwork.
Figure 14: Analysis of the same TFs as in Figure 13, but with cfDNA from a healthy individual.
Figure 15: The TF-TF subnetwork or community showing the TFs with which STAT1 interacts.
Figure 16: The HDAC1 TF-TF subnetwork.
Figure 17: The TCF7L1, MYC and ASH2L TF-TF subnetwork. The table represents the top results of the enrichment analysis performed on this community.
Figure 18: Comparison of the AR subnetwork in longitudinal samples from a patient with prostate cancer whose tumor transdifferentiated from an adenocarcinoma to a treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC). A) AR subnetwork of patient P148_3 after transdifferentiation to the neuroendocrine state. B) AR subnetwork in the same patient prior to transdifferentiation, i.e., in the prostate adenocarcinoma state.
Figure 19: Comparison of AR and FOXA1 signals in prostate cancer (PC) and castration resistant prostate cancer patients (CRPC). The change in AR signals in CRPC demonstrates the loss of the edges connecting AR and FOXA1 in CRPC samples.
Figure 20: Comparison of a PC subnetwork and the same subnetwork in healthy individuals
Figure 21: Result of a subtraction operation between the PC specific subnetwork and the equivalent subnetwork in healthy individuals.
Figure 22: Example of prostate cancer specific TF-TF-genes subnetwork. A TF-TF subnetwork was expanded to include genes regulated by TFs. Genes are represented as triangles.
Figure 23: Another example of prostate cancer-specific TF-TF-genes subnetwork.
Figure 24: Ratio values between short (<250 bp) and long (≥250 bp) cfDNA fragments for various gene groups (Y-axis) and cohorts calculated for the +1 nucleosome (top panel) and the +2 nucleosome (bottom panel).
Figure 25: Ratio values between short (<250 bp) and long (≥250 bp) cfDNA fragments for various gene groups (Y-axis) and cohorts calculated for gene bodies.
Figure 26: Outline of the algorithm to find the best model to classify patients based on top differentially active transcription factors per cohort. Each dataset Di is split into test and train sets. The training set is used to select the best model and hyperparameters. This is achieved through cross-validation on each training set i. Best models are refit on the full training set i, and their final performances are evaluated on an independent test set i.
Figure 27: Models (n=64) generated with the procedure outlined in Figure 26.
Figure 28: Confusion matrix demonstrating that most cfDNA samples can be classified correctly by TF-TF network analyses after selection of the most different TFs.

DETAILED DESCRIPTION
Unless indicated or defined otherwise, all terms used herein have their usual meaning in the art, which will be clear to the skilled person.
Reference is for example made to the standard handbooks, such as “Molecular Biology of the Cell” (Alberts et al., 2022), “Vogel and Motulsky's Human Genetics: Problems and Approaches” (Speicher et al., 2010), “Human Molecular Genetics” (Strachan and Read, 2018), and “The Biology of Cancer” (Weinberg et al., 2013).

The subject matter of the claims specifically refers to artificial products or methods employing or producing such artificial products, which may be variants of native (wild type) products. Though there can be a certain degree of sequence identity to the native structure, it is well understood that the materials, methods and uses of the invention, e.g., specifically referring to isolated nucleic acid sequences, amino acid sequences, fusion constructs, expression constructs, transformed host cells and modified proteins, are “man-made” or synthetic, and are therefore not considered as a result of “laws of nature”.

The terms “comprise”, “contain”, “have” and “include” as used herein can be used synonymously and shall be understood as an open definition, allowing further members or parts or elements. “Consisting” is considered a closed definition, without further elements beyond the feature so defined. Thus “comprising” is broader and contains the “consisting” definition.

The term “about” as used herein refers to the same value or a value differing by +/-5 % of the given value.

As used herein and in the claims, the singular form, for example “a”, “an” and “the”, includes the plural, unless the context clearly dictates otherwise.

According to one embodiment, a method for analyzing cell-free DNA (cfDNA) fragments from a sample is described herein, said method comprising the steps of:
i. extracting cfDNA fragments from the sample;
ii. performing whole genome sequencing on the extracted cfDNA fragments; and
iii. determining at least one of the regulation networks selected from the group consisting of:
   a. a transcription factor (TF)-gene network;
   b. a TF-TF network;
   c. a gene-gene network;
   d. a TF-DNase hypersensitive site (DHS) network; and
   e. a DHS-gene network.

As used herein, DNA refers to deoxyribonucleic acid. DNA is a type of nucleic acid.

As used herein, the term “nucleic acid” generally refers to a polynucleotide comprising two or more nucleotides. A nucleotide is a monomer composed of three components: a 5-carbon sugar, a phosphate group, and a nitrogenous base. The four naturally occurring DNA nucleotides carry the bases adenine (A), thymine (T), guanine (G), and cytosine (C).

As used herein, the term “cfDNA” refers to “cell free DNA”, “cell-free DNA”, “circulating free DNA”, or “circulating-free DNA”. cfDNA consists of highly degraded DNA fragments, which are detectable in the peripheral blood of every human. In healthy individuals, the vast majority of cfDNA is derived from the hematopoietic system. However, the preferential DNA contribution to the cfDNA pool may change under certain physiological or pathological conditions. Furthermore, cfDNA can also provide information about physiological processes such as aging.

cfDNA may comprise a footprint representative of its underlying chromatin organization, which may capture one or more of: expression-governing nucleosomal occupancy, RNA Polymerase II pausing, cell death-specific DNase hypersensitivity, and chromatin condensation during cell death. Such a footprint may carry a signature of cell debris clearance and trafficking, e.g., DNA fragmentation carried out by caspase-activated DNase (CAD) in cells dying by apoptosis, but fragmentation may also be carried out by lysosomal DNase II after the dying cells are phagocytosed, resulting in different cleavage patterns.

cfDNA represents an essential component of “liquid biopsies”, which refers to the analyses of non-solid biological sources (e.g., blood, urine, CSF, ascites) to obtain information similar to tissue biopsies.
Analyses of cfDNA are of extraordinary relevance, particularly in oncology, since in patients with cancer, cfDNA contains circulating tumor DNA (ctDNA) shed from tumor cells into the circulation. Mechanisms for DNA release into the bloodstream can be apoptosis, necrosis, and active secretion; specifically, cfDNA is released by apoptosis.

In eukaryotes, DNA is wrapped around histones to form nucleosomes, which are the basic structure of DNA packing. In general, the cfDNA fragment length distribution has its mode at about 167 bp. This length corresponds approximately to the size of DNA wrapped around a nucleosome (~147 bp) and a linker fragment (~20 bp). This particular cfDNA size pattern corresponds to fragmentation patterns after enzymatic processing in apoptotic cells. Specifically, the cfDNA fragmentation patterns reflect the association of cfDNA with nucleosome core particles and linker histones, which determines where nuclease cleavage may occur. Hence, DNA is frequently cleaved between nucleosomes and only rarely within nucleosomes. The latter circumstance is also called “cleaving resistance” and is associated with the cfDNA fragments described herein.

The architecture of individual nucleosomes determines access to nucleosomal DNA. The individual nucleosome core particle contains 147 bp of DNA wrapped in ~1.7 left-handed superhelical turns around a central octamer composed of two copies of each of the four core histones H2A, H2B, H3, and H4. These fundamental nucleosome units are connected with intervening linkers ranging from 20 to 100 bp (Michael and Thomä, 2021). Usually, the DNA is tightly wrapped around this histone octamer and sharply bent. This sharp bending occurs at every DNA helical repeat, i.e., every ~10 bp, where the major groove faces inwards towards the histone octamer, and ~5 bp away, in the opposite direction, where the major groove faces outward.
The nucleosome core particle architecture is pseudo-2-fold symmetric, with the DNA position at the symmetry axis. The symmetry axis, i.e., the dyad, is designated as location 0. The superhelix locations (SHLs) are labeled with ±1, ±2, and so on and denote where the minor grooves of the DNA double helix structure face away from the histone octamer (shown in Michael and Thomä, 2021).

As used herein, the term “sample” generally refers to a biological sample obtained from or derived from a subject. Biological samples may be cell-free biological samples or substantially cell-free biological samples, or may be processed or fractionated to produce cell-free biological samples. For example, cell-free biological samples may include cell-free ribonucleic acid (cfRNA), cell-free deoxyribonucleic acid (cfDNA), cell-free protein and/or cell-free polypeptides. A biological sample may be tissue (e.g., tissue obtained by biopsy), blood (e.g., whole blood), plasma, serum, sweat, urine, saliva, or a derivative thereof. Cell-free biological samples may be obtained or derived from subjects using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube (e.g., Streck), or a cell-free DNA collection tube (e.g., Streck). Cell-free biological samples may be derived from whole blood samples by fractionation. Biological samples or derivatives thereof may contain cells. For example, a biological sample may be a blood sample or a derivative thereof (e.g., blood collected by a collection tube or blood drops), a tumor sample, a tissue sample, a urine sample, or a cell (e.g., tissue) sample.

In some embodiments, the biological sample used in the method of the invention is a biofluid sample.
Non-limiting examples of useful biofluid samples include, e.g., a blood sample, a serum sample, a plasma sample, a cerebrospinal fluid (CSF) sample, a lymph sample, an endometrial fluid sample, a urine sample, a saliva sample, a tear fluid sample, a synovial fluid sample, an amniotic fluid sample, and a sputum sample. In preferred embodiments, the biofluid sample is selected from a blood sample, a urine sample, a cerebrospinal fluid sample, or an amniotic fluid sample.

cfDNA can, e.g., be obtained by a standard blood draw, i.e., a minimally invasive approach. As the blood vial after the blood draw contains both the cellular components of blood and the cell-free fraction, which is referred to as plasma, extraction steps such as centrifugation steps may be required to separate these components.

As used herein, the term “extract” in the context of extracting cfDNA fragments refers to the isolation of the cfDNA or cfDNA fragments from the sample. Isolation, extraction, and/or purification of cfDNA or cfDNA fragments may be performed through collection of bodily fluids using a variety of techniques. In some cases, collection may comprise aspiration of a bodily fluid from a patient using a syringe. In other cases, collection may comprise pipetting or direct collection of fluid into a collecting vessel. After collection of bodily fluid, cfDNA or cfDNA fragments may be isolated and extracted using a variety of techniques known in the art. In some cases, cfDNA may be isolated, extracted and prepared using commercially available kits such as the Qiagen QIAamp® Circulating Nucleic Acid Kit protocol. In other examples, Qubit™ dsDNA HS Assay kit protocol, Agilent™ DNA 1000 kit, or TruSeq™ Sequencing Library Preparation; and Low-Throughput (LT) protocol may be used. After isolation, in some cases, the cfDNA or cfDNA fragments are pre-mixed with one or more additional materials, such as one or more reagents (e.g., ligase, protease, polymerase) prior to sequencing.
According to another embodiment, a cell-free fraction of a biological sample may be used as a sample in the methods described herein. The term “cell-free fraction” of a biological sample, as used herein, generally refers to a fraction of the biological sample that is substantially free of cells. As used herein, the term “substantially free of cells” generally refers to a preparation from the biological sample comprising fewer than about 20,000 cells per mL, fewer than about 2,000 cells per mL, fewer than about 200 cells per mL, or fewer than about 20 cells per mL. Genomic DNA may not be excluded from the acellular sample and typically comprises from about 50% to about 90% of the nucleic acids that are present in the sample.

In the context of the present invention, the term “liquid biopsy” refers to a broad category of sampling and minimally invasive testing of a biofluid (e.g., blood, blood plasma or blood serum) to look for fragments of, e.g., tumor-derived cfDNA that are in the blood.

According to one embodiment, the methods described herein may comprise a step of amplifying a nucleic acid. The terms “amplifying” and “amplification” generally refer to increasing the size or quantity of a nucleic acid molecule. The nucleic acid molecule may be single-stranded or double-stranded. Amplification may include generating one or more copies or “amplified product” of the nucleic acid molecule. Amplification may be performed, for example, by extension (e.g., primer extension) or ligation. Amplification may include performing a primer extension reaction to generate a strand complementary to a single-stranded nucleic acid molecule, and in some cases generate one or more copies of the strand and/or the single-stranded nucleic acid molecule.
The term “DNA amplification” generally refers to generating one or more copies of a DNA molecule or “amplified DNA product.”

In some embodiments of the methods described herein, a method comprises performing DNA sequencing, e.g., whole genome sequencing, Sanger sequencing, targeted next-generation sequencing (NGS), or whole-genome NGS. In a specific embodiment, whole genome sequencing is performed on extracted cfDNA fragments for obtaining the DNA sequence of the cfDNA fragment. The result of this sequencing of the cfDNA fragment is also referred to herein as the “sequenced cfDNA fragment” or the “read”. Thereby the term “sequenced” or “read” refers to a sequence read from a portion of a nucleic acid sample, i.e., the result of the sequencing experiment. Typically, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to align the sequences with another sequence, to determine whether it matches a reference sequence, or if it meets other criteria. A sequence or a read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample.

The term “sequenced fragment” or “fragment sequence” as used herein refers to the combined sequence and length information of a DNA fragment which is gained, for example, from a pair of sequencing reads which were created by sequencing both ends of that DNA fragment, a process which is known as “paired-end read sequencing”, and subsequently aligning the obtained sequences to a reference genome. The length information is obtained from start and end coordinates of the paired sequence alignments.
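By way of non-limiting illustration only, deriving the fragment length from the start and end coordinates of a pair of alignments may be sketched as follows. The 0-based, half-open coordinate convention and the names `Alignment`, `Fragment` and `fragment_from_pair` are assumptions made for this sketch and are not taken from this document; the example coordinates are made up.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def fragment_from_pair(read1: Alignment, read2: Alignment) -> Fragment:
    """Span from the leftmost start to the rightmost end of the two mates."""
    if read1.chrom != read2.chrom:
        raise ValueError("mates align to different chromosomes")
    return Fragment(
        read1.chrom,
        min(read1.start, read2.start),
        max(read1.end, read2.end),
    )


# Two hypothetical 151 bp mates covering one fragment from both ends.
frag = fragment_from_pair(
    Alignment("chr1", 1000, 1151),
    Alignment("chr1", 1016, 1167),
)
```

For these made-up coordinates the reconstructed fragment spans positions 1000 to 1167, i.e., 167 bp.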
This information can also be extracted from a single sequencing read of a DNA fragment which was created by exhaustive sequencing of a DNA fragment until an adjacent sequencing adapter is read during the sequencing process. This type of sequencing process is known as “single-end read sequencing”. The adapter sequence is removed computationally from the read sequence afterwards.

According to one embodiment, in the methods described herein the DNA sequences of the cfDNA fragments have different lengths. The length may vary from tens to hundreds of base pairs. In some embodiments of the method described herein, the sequence reads are about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 175 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. In one embodiment, the sequence reads are 151 bp for each end of a DNA fragment that is sequenced in paired-end read sequencing mode. In other embodiments, paired-end reads are 50 bp, 75 bp, 100 bp, 101 bp, 150 bp, 151 bp, or 175 bp long.

According to one embodiment, the sequences obtained from sequencing of the cfDNA fragments extracted from the sample may be aligned with a reference sequence. The term “alignment” as used herein refers to the process of comparing a DNA sequence with a reference sequence. In other words, aligning means comparing a read or sequence obtained by sequencing to a reference sequence and thereby determining whether the reference sequence contains the read sequence, the location where the read sequence is aligned in the reference sequence, and/or how the read sequence aligns with the reference sequence.
If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e., whether the read is present or absent in the reference sequence).

As used herein, the term “aligned sequence pattern” generally refers to a spatial pattern of sequence reads after alignment to a reference genome.

A “reference sequence” or a “reference genome sequence” is a sequence of a biological molecule, which is frequently a nucleic acid such as a chromosome or genome. Typically, DNA sequences of multiple cfDNA fragments are members of a given reference sequence. In various embodiments, the reference sequence is significantly larger than the sequenced portions or reads that are aligned to it. In one example, the reference sequence is the sequence of a full length genome of a subject; specifically, it is a full length human genome. Such sequences may be referred to as reference genome sequences. Such sequences may also be referred to as chromosome reference sequences. Other examples of reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions, e.g., strands of any species. In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual. In certain embodiments, a DNA sequence of a cfDNA fragment is aligned with a reference genome sequence in order to determine the cfDNA fragmentation profile.

According to one embodiment, the methods described herein may further comprise analyzing the depth of coverage. The term “depth of coverage” as used herein refers to the number of fragment sequences that align with a particular site of the reference genome.
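By way of non-limiting illustration only, the depth of coverage as defined above may be computed per position as sketched below. The sketch assumes that fragments are given as 0-based, half-open `(start, end)` pairs on a single chromosome; the function name `depth_of_coverage` and the example fragments are hypothetical.

```python
from typing import Iterable, List, Tuple


def depth_of_coverage(
    fragments: Iterable[Tuple[int, int]], region_start: int, region_end: int
) -> List[int]:
    """Number of fragments overlapping each position of [region_start, region_end)."""
    size = region_end - region_start
    diff = [0] * (size + 1)  # +1 where a fragment begins, -1 where it ends
    for start, end in fragments:
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo < hi:  # skip fragments that lie outside the region
            diff[lo] += 1
            diff[hi] -= 1
    depth, running = [], 0
    for delta in diff[:size]:
        running += delta
        depth.append(running)
    return depth


# Made-up fragments; the last one lies outside the queried region.
toy_fragments = [(0, 5), (3, 8), (4, 6), (20, 30)]
depth = depth_of_coverage(toy_fragments, 0, 10)
mean_depth = sum(depth) / len(depth)
```

The difference-array approach touches each fragment once and each position once, which keeps the computation linear for large regions.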
Specifically, coverage describes whether or not any fragment sequence aligns with a particular site or region of a reference genome. In another embodiment, the term is also used to describe the x-fold target coverage on average across an entire reference genome.

As used herein, the term “coverage pattern” generally refers to a spatial arrangement of fragment sequences after alignment of read sequences to a reference genome. The coverage pattern identifies the extent and depth of coverage of next-generation sequencing methods.

The term “fragmentation profile” as used herein refers to evaluation of fragmentation patterns of cfDNA across the genome. Such an evaluation can include cfDNA fragment lengths, positions of aligned fragments relative to the reference genome sequence, relative to a specific point on the reference genome, or alignment positions of multiple fragments relative to each other, the ratio between cfDNA fragments with different lengths (e.g., ratio between all cfDNA below a certain length (e.g., 150 bp) vs. all fragments above this length), or whether the nucleosome patterns computed from the cfDNA fragments correspond to nucleosome patterns of a particular cell type, such as white blood cells.

In another embodiment, the fragmentation profile of cfDNA fragments is used to generate a nucleosome map that identifies the position of nucleosomes in the sample. The nucleosome map displays positions of nucleosome peaks, indicating open and closed chromatin regions in the subject’s genome. Open chromatin regions indicate regions of the genome that do not contain nucleosomes. These open regions are able to be bound by various protein factors and regulatory elements and transcribed. Closed chromatin regions are regions of the genome that surround nucleosomes and are inaccessible to protein factors, regulatory elements, and other molecules. These closed chromatin regions are not able to be transcribed.
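By way of non-limiting illustration only, the ratio between short and long cfDNA fragments mentioned above as one possible component of a fragmentation profile may be sketched as follows. The default cutoff of 150 bp is the example value given in the text; assigning fragments of exactly the cutoff length to the long group, and the name `short_long_ratio`, are assumptions of this sketch. The fragment lengths are made up.

```python
from typing import Iterable


def short_long_ratio(lengths: Iterable[int], cutoff: int = 150) -> float:
    """Count of fragments shorter than `cutoff` divided by the count of all others."""
    short = long_ = 0
    for n in lengths:
        if n < cutoff:
            short += 1
        else:  # assumption: a fragment of exactly `cutoff` bp counts as long
            long_ += 1
    if long_ == 0:
        raise ValueError("no fragments at or above the cutoff")
    return short / long_


# Hypothetical fragment lengths in bp.
toy_lengths = [120, 140, 145, 150, 167, 167, 180, 320]
ratio = short_long_ratio(toy_lengths)
```

The same function can be applied per genomic window or per gene group by passing only the lengths of fragments aligned there.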
The term “network” as used herein refers to a collection of connected objects. Objects are referred to as nodes or vertices. The connections between the nodes are referred to as edges. In other words, the objects are the points connected in a network and the lines between the points are the edges. The term “graph” may be used synonymously with the term “network”.

The term “regulation network” as used herein refers to the regulatory network of gene regulation and comprises the correlation, interaction, cooperation, co-expression, co-regulation and/or co-occurrence of different factors involved in gene regulation. The term “interaction network” can be used interchangeably herein for the term “regulation network”.

According to one embodiment, at least one of the following regulation networks is determined in the methods described herein: TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network.

In the context of the TF-gene network described herein, genes and TFs are the nodes. The TF-to-gene connections are also referred to as edges.

According to one embodiment, a TF-gene network is reconstructed from cfDNA data. Thereby, the regulatory connections from TFs to their target genes within the gene regulatory network (GRN) are reconstructed.

The term “transcription factor”, abbreviated herein as “TF”, generally refers to a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence. Transcription factors are proteins that bind to DNA-regulatory sequences (e.g., enhancers and silencers), usually localized in the 5′-upstream region of target genes, to modulate the rate of gene transcription. This may result in increased or decreased gene transcription, protein synthesis, and subsequent altered cellular function (for example, cells changing in response to the environment (normal or pathological), for example during atrophy, hypertrophy, hyperplasia, metaplasia, or dysplasia).
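By way of non-limiting illustration only, the network terminology defined above (nodes, edges, TF-to-gene connections) may be represented in software as sketched below. The class name `RegulationNetwork`, its methods, and the node labels are hypothetical and serve only to make the definitions concrete.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class RegulationNetwork:
    """Minimal directed graph: TFs and genes are nodes, TF-to-gene links are edges."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.targets: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_edge(self, tf: str, gene: str) -> None:
        """Register both endpoints as nodes and store the TF-to-gene edge."""
        self.nodes.update((tf, gene))
        self.targets[tf].add(gene)

    def edges(self) -> Set[Tuple[str, str]]:
        """All edges as (TF, gene) pairs."""
        return {(tf, gene) for tf, genes in self.targets.items() for gene in genes}


# Placeholder labels, not taken from any dataset.
net = RegulationNetwork()
net.add_edge("TF_1", "GENE_A")
net.add_edge("TF_1", "GENE_B")
net.add_edge("TF_2", "GENE_B")
```

Storing the targets per TF makes it straightforward to look up the gene set regulated by a given TF, while `edges()` yields the flat edge set used when two networks are compared.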
As used herein, specific transcription factors are referred to by a nomenclature although other synonyms may also be used for the transcription factors recited herein.

In general, TFs bind to specific DNA motifs to activate or repress gene expression. However, many TFs do not activate a single gene but a GRN in a tissue-dependent manner. Each TF has a canonical regulatory profile, and its target genes have distinct co-expression patterns. The TF-to-target-gene connections are also referred to as network edges, whereas genes are network nodes and TFs are regulating nodes. In general, edges are often called as specific in only one tissue. At the same time, tissue-specific genes often have a high multiplicity, meaning that they are identified as specific in more than one tissue. TFs primarily participate in tissue-specific regulatory processes via alterations in their targeting patterns. TFs regulate tissue-specific biological processes by subtle differences in the regulatory connections between genes and TFs. Hence, even edges may have a higher multiplicity indicative of shared regulatory processes between tissues.

The term “gene” as used herein refers to a DNA sequence which is transcribed into RNA. The RNA can be directly functional or be an intermediate template for a protein that performs a function.

According to one embodiment, data on TFs and regulated genes can be retrieved from various sources, e.g., from the PANDA (Passing Attributes between Networks for Data Assimilation) (Guebila et al., 2022b) or the GRAND websites (https://grand.networkmedicine.org) (Guebila et al., 2022a). As an example, Figure 2 illustrates downstream-regulated genes for the TFs HNF4G and HNF4A in whole blood and colon adenocarcinoma. In the respective tissues, different genes are affected by these two TFs.
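By way of non-limiting illustration only, the notion of edge multiplicity discussed above may be sketched as follows, taking the multiplicity of an edge to be the number of tissues in which that edge is present (an assumption of this sketch). The function names, tissue labels and edges are made-up placeholders and do not reproduce data from PANDA, GRAND or Figure 2.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (TF, target gene)


def edge_multiplicity(tissue_edges: Dict[str, Set[Edge]]) -> Counter:
    """Count in how many tissues each TF-to-gene edge occurs."""
    counts: Counter = Counter()
    for edges in tissue_edges.values():
        counts.update(edges)
    return counts


def tissue_specific_edges(tissue_edges: Dict[str, Set[Edge]], tissue: str) -> Set[Edge]:
    """Edges of `tissue` that occur in no other tissue (multiplicity 1)."""
    counts = edge_multiplicity(tissue_edges)
    return {edge for edge in tissue_edges[tissue] if counts[edge] == 1}


# Hypothetical tissue-wise edge sets.
toy_edges = {
    "tissue_x": {("TF_1", "GENE_1"), ("TF_2", "GENE_2")},
    "tissue_y": {("TF_1", "GENE_1"), ("TF_1", "GENE_3"), ("TF_2", "GENE_4")},
}
multiplicity = edge_multiplicity(toy_edges)
specific_y = tissue_specific_edges(toy_edges, "tissue_y")
```

Edges with a multiplicity above one, such as the TF_1 to GENE_1 edge in this toy input, correspond to regulatory connections shared between tissues.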
According to one embodiment, at the TF level, the transcribed status is reflected in the typical transcription start site (TSS) pattern, i.e., in the nucleosome depleted region (NDR) and 2K region, in the distances of upstream and downstream nucleosomes (Fourier transformation (FFT), short-time Fourier transformation (STFT)), and in the accessibility of the corresponding transcription factor binding sites (TFBSs). The transcribed status of the targeted genes can again be deduced from the typical TSS pattern. According to a specific embodiment, building the TF-gene interaction network in cfDNA may consist of the steps given in Figure 3a.
The term “transcription start site” or “TSS” as used herein refers to the location where the first DNA nucleotide is transcribed into RNA. In general, TSSs do not harbor many nucleosomes. Chromatin remodeling complexes maintain the nucleosome depleted region (NDR) by sliding nucleosomes away in order to ensure accessibility for transcription factors and polymerase.
The term “NDR” as used herein refers to the nucleosome depleted region at the TSS, specifically to the region between TSS-150bp and TSS+50bp.
The term “2K” as used herein refers to the region of 2 kilobases around the TSS, spanning upstream and downstream of the TSS, i.e., to the region between TSS-1000bp and TSS+1000bp.
According to one embodiment, the “Fourier transformation” or “FFT” informs about the frequencies, here the distances between nucleosomes close to a TSS. According to one embodiment, the “short-time Fourier transformation” or “STFT” informs about the location of the frequencies, i.e., the position where a signal changes. Hence, the STFT informs not only about the distances between nucleosomes but also about the positions downstream and upstream of a TSS where the distances change.
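As a non-limiting illustration of how a Fourier transformation may inform about nucleosome distances, the following minimal sketch applies a plain discrete Fourier transform to a synthetic coverage signal over a 2K region and reports the dominant period. The function name `dominant_period`, the `min_period` cut-off of 100 bp, and the synthetic 200 bp repeat length are assumptions made for this example; real cfDNA coverage is noisier and would typically be processed with an optimized FFT library.

```python
import cmath
import math


def dominant_period(signal, min_period=100):
    """Return the period (in bp) of the strongest periodic component.

    signal: coverage values sampled at 1 bp resolution.
    min_period: shortest period considered (an assumption of this sketch).
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]  # remove the constant offset
    best_k, best_power = 1, 0.0
    # Frequency index k corresponds to a period of n / k base pairs.
    for k in range(1, n // min_period + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic coverage from TSS-1000bp to TSS+1000bp with an assumed
# nucleosome repeat length of 200 bp.
coverage = [10 + 3 * math.cos(2 * math.pi * pos / 200) for pos in range(2000)]
period = dominant_period(coverage)
```

Applying the same transform to sliding windows of the signal would correspond to the short-time variant, which additionally indicates where along the region the spacing changes.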
The term “transcription factor binding site” or “TFBS” as used herein refers to the DNA region to which a TF binds.
According to one embodiment, transcription factor binding sites (TFBSs) are identified from the Gene Transcription Regulation Database (GTRD: a database on gene transcription regulation-2019 update. I. S. Yevshin, R. N. Sharipov, S. K. Kolmykov, Y. V. Kondrakhin, F. A. Kolpakov. Nucleic Acids Res. 2019 Jan. 8; 47(D1):D100-D105) using statistical thresholds for use in the herein described methods and systems. According to a specific embodiment, the identified TFBSs are informative for machine learning models and classifier generation. In some examples, the associated pathways and classes of transcription factors are similarly useful and informative for machine learning models and classifier generation.
The term “transcription factor binding profile” (TFBP) generally refers to a multi-factor information profile for a given transcription factor that includes both tissue contributions and biological processes. The TFBP also includes an “accessibility score” and a z-score statistic to objectively compare across different plasma samples for significant changes in TFBS accessibility. The profile may allow identification of lineage-specific TFs suitable for both tissue-of-origin and tumor-of-origin identification.
As used herein, the term “accessibility score” generally refers to a measure for the accessibility of each transcription factor binding site. Since transcription factor binding may open or “prime” its target enhancers without necessarily activating them per se, the rank values are termed “accessibility score.” The accessibility score may be used to objectively compare the accessibility of TFBSs in serial analyses from the same person or among different individuals.
This score provides a robust assessment of TFBS accessibility with particular utility for the use of cfDNA in clinical diagnostics, cancer detection, treatment monitoring, and other applications described herein.
According to one embodiment, a list of actively transcribed TFs is generated from cfDNA data. Thereby, the activity, i.e., the transcription, of TFs is deduced from the coverage patterns at the TSS, i.e., the NDR and 2K regions, combined with the position of upstream and downstream nucleosomes to establish the transcription status of the respective TFs precisely. Furthermore, for each TF, the accessibility of the respective TFBSs should be increased, which represents another parameter for TF status assessment included in the model. Hence, the activity of a TF is assessed by a combination of several factors: TSSs, i.e., NDR/2K, nucleosome positions, and TFBS accessibility. Further parameters, such as the (relative) entropy at TSSs and TFBSs, may be included in the evaluation. Thereby, all TFs with evidence for active transcription in the analyzed cfDNA sample are revealed.
According to a specific embodiment, the activity (transcription) of TFs may be deduced from the coverage pattern at the TSS, i.e., the NDR and 2K regions, as described by Ulz et al., 2016. According to a specific embodiment, the nucleosome positions may be derived from cfDNA data by determining the coverage pattern and/or by the nucleosome priors approach described herein.
The term “nucleosome position” as used herein refers to the position of a nucleosome on cfDNA. In other words, this also refers to the presence of a nucleosome at a base position of the cfDNA fragment.
According to one embodiment, the methods described herein comprise determining the nucleosome position. According to a specific embodiment, determining the nucleosome position may be performed by determining the probability of the presence of a nucleosome or a nucleosomal dyad for a base position of the cfDNA fragments.
Specifically, this probability may be determined by determining the dyad count distribution for specific fragment lengths, performing a fragment length-based truncation, determining probability density functions, and removing the non-informative portion. This probability is also termed “nucleosome dyad prior distribution”, “nucleosome prior distribution”, or “nucleosome prior” herein.
According to one embodiment, the methods described herein comprise determining fragmentation profiles. According to a specific embodiment, these fragmentation profiles are determined for some or for all sites of interest. According to an even more specific embodiment, cfDNA fragment length and the variability of the fragments may be included in the methods described herein. According to a specific embodiment, the variability of fragment lengths is reflected in the term “entropy” or “entropy of TSS and TFBS” as described elsewhere herein.
According to one embodiment, nucleosome positions and open chromatin sites are mapped. According to a specific embodiment, the co-occurrence of nucleosome positions and open chromatin sites is determined. According to a specific embodiment, the resulting atlas of open chromatin sites may consist of co-regulated open chromatin regions which correlate to a high extent, of open chromatin regions which are related only to a subset of other regions, and/or of open chromatin regions which may not be associated with any other region.
According to one embodiment, maps of open chromatin regions may represent a mixed collection of regulatory regions such as TSSs, TFBSs, or any other regulatory regions, such as enhancers. According to one embodiment, open chromatin regions are assigned to their function based on publicly available databases, literature, or other publicly available resources. According to one embodiment, there may be open chromatin sites with currently no known function or any established association with other regions.
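As a non-limiting and deliberately simplified illustration of a nucleosome dyad prior, the following sketch counts assumed dyad positions per base and normalizes the counts into a probability distribution. It is not the procedure described above: the assumption that the dyad lies at the fragment midpoint, the length filter of 120 to 180 bp (a stand-in for the fragment length-based truncation), and the function name `dyad_prior` are all hypothetical choices made for this example.

```python
from collections import Counter


def dyad_prior(fragments, min_len=120, max_len=180):
    """Estimate P(dyad at position) from cfDNA fragment coordinates.

    fragments: iterable of (start, end) half-open coordinates.
    min_len, max_len: assumed length window for informative fragments.
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start
        if min_len <= length <= max_len:
            # Assumption: the nucleosome dyad sits at the fragment midpoint.
            counts[start + length // 2] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {position: count / total for position, count in counts.items()}


# Hypothetical fragments; the last one is too long and is filtered out.
fragments = [(100, 267), (102, 269), (300, 466), (500, 900)]
prior = dyad_prior(fragments)
```

Positions with a high prior value would be read as likely nucleosome centers, and intermediate values as partial occupancy within the cfDNA pool.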
According to one embodiment, cfDNA multi-dimensional data is integrated and analyzed to identify potential regulatory interactions between transcription factors, target genes, and other regulatory elements.
According to one embodiment, after the assignment of open chromatin regions to their function, the involved regulatory networks are determined. According to a specific embodiment, this involves determining that whenever a subset of open chromatin regions is accessible, it follows, based on known regulatory interactions, that another subset of open chromatin regions must also be accessible, in order to characterize conditions such as a health state or a defined disease state.
According to one embodiment, regulatory networks reflect the complex molecular mechanisms that govern gene expression and cellular processes. Specifically, they are essential for understanding normal development and disease progression and for identifying potential therapeutic targets.
According to one embodiment, incorporation of regulatory networks in a model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method described herein provides an increased informative value of regulatory networks.
According to a specific embodiment, the increase of accessibility of the respective TFBSs for each TF is described by Ulz et al., 2019.
According to one embodiment, the tissue specificity of actively transcribed TFs may be determined from resources such as Lambert et al. (2018), the TF-Marker database described in Xu et al. (2022), or other sources.
According to one embodiment, from actively transcribed TFs and the tissue specificity of the actively transcribed TFs, the gene sets are analyzed which are activated by the TFs in the different tissues.
This analysis comprises the analysis of the TSS, e.g., the NDR/2K region, and the nucleosome position, e.g., the distances between nucleosomes. In other words, for each TF, a separate evaluation is performed of the gene set activated by this TF in tissue A, then the gene set activated in tissue B, etc. These gene sets can be retrieved from various resources (e.g., the GRAND database).
According to one embodiment, the analyzed gene sets are evaluated for the most substantial evidence of whether they are active, i.e., transcribed. According to a specific embodiment, an example of evaluating the analyzed gene sets for the most substantial evidence of whether they are active, i.e., transcribed, is establishing a ranking order according to the evidence to which tissue they correspond.
The ranking of these gene sets makes it possible to estimate the cell abundance in cfDNA. As cfDNA represents a mixture of DNA released from different tissues, the composition may change depending on physiologic or pathological conditions. Hence, multiple ranking lists of gene networks, i.e., one ranking list for each TF, are generated. Numerous data sets are obtained to reconstruct which tissue contributed what percentage to the cfDNA pool. An alternative to such ranking lists could be other bioinformatics approaches such as neural networks or autoencoders.
According to one embodiment, these gene lists are compared and filtered for the genes common or different in the lists (Figure 3b). The purpose is that the gene networks regulated by tissue-specific TFs (e.g., hematopoietic or GI-specific TFs, as exemplarily shown in Figure 3b) will have similarities and overlap. Specifically, establishing such similarities further increases the resolution and specifies the contribution of specific tissues to the cfDNA pool with improved precision.
In the context of the TF-TF network described herein, the TFs are the nodes. The TF to TF connections are also referred to as edges.
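As a non-limiting illustration of comparing and filtering per-TF gene lists for common and differing genes, the following minimal sketch uses set operations. The function name `compare_gene_lists` and all TF and gene names are hypothetical placeholders.

```python
def compare_gene_lists(gene_lists):
    """Split genes into those shared by all lists and those unique to one list.

    gene_lists: mapping of TF name -> list of genes regulated by that TF.
    Returns (common, unique), where unique maps each TF to the genes that
    appear in no other TF's list.
    """
    sets = {tf: set(genes) for tf, genes in gene_lists.items()}
    common = set.intersection(*sets.values())
    unique = {
        tf: genes - set().union(
            *(other for name, other in sets.items() if name != tf)
        )
        for tf, genes in sets.items()
    }
    return common, unique


# Hypothetical gene lists for two tissue-specific TFs.
gene_lists = {
    "TF_1": ["GENE_A", "GENE_B", "GENE_C"],
    "TF_2": ["GENE_B", "GENE_C", "GENE_D"],
}
common, unique = compare_gene_lists(gene_lists)
```

Genes in `common` reflect overlap between the gene networks of the compared TFs, whereas genes in `unique` may help to separate the contributions of individual tissues.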
According to one embodiment, in the TF-TF network it is determined which TFs cooperate in a cfDNA sample. According to one embodiment, a TF-TF interaction network is established as described herein. Thereby, cooperative interactions between TFs in a cfDNA sample are established. In other words, the TF networks which cooperate in each cfDNA sample are deciphered.
Regarding TF-TF networks, as a general pattern, TFs involved in developmental state specification such as HOXB1, OCT4, and SOX2 preferentially regulate other TF genes. The regulatory process is multi-faceted. The genomic locations where TFs may bind, i.e., the TFBSs, may be computationally estimated using DNA recognition sequences, i.e., motifs. However, focusing only on predicting the genomic locations of TF binding does not help deduce GRN relationships. TFs may work together by forming protein complexes. Consequently, a member of a TF complex may regulate a target gene even without a corresponding binding site in the regulatory region of that gene.
From protein-protein interaction (PPI) data, it is known that TFs often form multi-protein complexes that carry out regulatory functions. Therefore, investigating only an initial set of motif locations does not include cases where TFs bind to the DNA without a corresponding recognition sequence (motif). Furthermore, not all TFBSs are functionally relevant or active.
According to one embodiment, evidence for TF interactions can be established from cfDNA data (Figure 4). According to one embodiment, the accessibility of the respective TFBSs is assessed for each TF. Thereby, different accessibilities are revealed, e.g., high, medium, and low accessibility (Figure 4a).
According to one embodiment, it is analyzed how TFs share TFBSs, e.g., due to sequence homology or other factors. As many TFs bind in large complexes, overlapping binding sites of TFs that regulate the same genes may be found, e.g., in the GTRD database.
Specifically, the most recent GTRD version 21.12 (Kolmykov et al., 2021) (https://gtrd.biouml.org/) harbors information on 1,391 TFs. Hence, the methods described herein include a constantly updated screening for TFBS overlaps based on the latest versions of the respective databases (Figure 4b, lower panel).
According to one embodiment, protein-protein interaction (PPI) data are employed to explore established TF cooperation in cfDNA. According to a specific embodiment, PPIs can be obtained from public interaction databases such as PINA2, STRING, IntAct, and BioGRID or from a recent publication (Göös et al., 2022).
According to one embodiment, there may be two possible scenarios based on the fact that TFs which cooperate within the same tissue should show a high concordance of their accessibility in cfDNA.
According to a specific embodiment, if TFs cooperate exclusively in the same tissue, their accessibility patterns should show a strong correlation. In these cases, it will be possible to deduce relationships of the following kind: if TFn has increased accessibility, so should TFm. As in the data above from Göös and colleagues: if HNF4A has increased accessibility, the accessibility of TYY1 should be likewise increased. Figure 4d illustrates an example of close cooperation between TF1 and TF2, but not TF3, in the hematopoietic system.
Examples of co-expressed, cell-type-specific TFs exist. For example, most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells (Figure 4e). Hence, these transcriptionally defined major cell types correspond broadly, but not precisely, to the basic histological types in which tissues are usually classified (Breschi et al., 2020). The authors of this study manually curated a list of 43 driver genes, i.e., cell-type-specific TFs that are firmly co-expressed in the respective tissues (Breschi et al., 2020) (Figure 4e).
Hence, the cooperative accessibility of these TFs has a high power to evaluate the balance between hematopoietic-derived DNA and epithelial DNA within a cfDNA sample with high precision.
According to a specific embodiment, if TFs cooperate with other TFs in different tissues, the pattern of correlations in cfDNA changes according to the contribution of different tissues to the cfDNA. This is depicted in Figure 4f, where TF1 cooperates with TF2 in hematopoietic cells and with TF3 in the GI tract. In cfDNA samples from individuals without diseases of the GI tract, i.e., without GI-derived DNA in the circulation, the TF1-TF2 pattern will be concordant because of the low contribution of GI-derived DNA to the cfDNA pool. However, if the contribution of DNA from the GI tract increases in the circulation, the correlation between the accessibility patterns of TF1 and TF3 increases, whereas the correlation between TF1 and TF2 decreases.
According to one embodiment, TF correlation matrices can elucidate which TFs cooperate and establish a TF-by-TF “cooperativity network”, particularly if DNA from various tissues contributes to the cfDNA pool. Combined with tissue deconvolution means, i.e., methods that establish which tissues contribute what amount to the cfDNA pool, TF interactions can be set for various tissues which release DNA into the circulation.
In the context of the gene-gene network described herein, the genes are the nodes. The gene to gene connections are also referred to as edges.
According to one embodiment, in the gene-gene network it is estimated whether genes or pairs of genes are co-regulated in a cfDNA sample. An example is shown in Figure 5.
According to one embodiment, the gene-gene network, i.e., the co-regulation of genes from cfDNA, can be determined with two strategies: one strategy involves defining core gene sets and determining their combined expression pattern in cfDNA. Another strategy investigates single genes.
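As a non-limiting illustration of the TF correlation matrices and the TF-by-TF cooperativity network described above, the following minimal sketch computes pairwise Pearson correlations of TF accessibility across samples and keeps strongly correlated pairs as edges. The function names, the threshold of 0.8, and the accessibility values are hypothetical and chosen only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cooperativity_edges(accessibility, threshold=0.8):
    """Return (TF, TF, r) edges for TF pairs correlated at or above threshold.

    accessibility: mapping of TF name -> accessibility scores, one per sample.
    """
    tfs = sorted(accessibility)
    edges = []
    for i, first in enumerate(tfs):
        for second in tfs[i + 1:]:
            r = pearson(accessibility[first], accessibility[second])
            if r >= threshold:
                edges.append((first, second, r))
    return edges


# Hypothetical accessibility scores across four cfDNA samples.
accessibility = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "TF2": [2.1, 3.9, 6.2, 8.0],
    "TF3": [4.0, 1.0, 3.5, 2.0],
}
edges = cooperativity_edges(accessibility)
```

In this toy data only TF1 and TF2 rise and fall together, so they form the single edge. The same pairwise approach may be applied to per-gene signals when estimating gene-gene co-regulation.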
According to a specific embodiment, the co-regulation of genes from cfDNA may utilize the nucleosome priors strategy, where the presence or absence of a nucleosome at the TSS of a gene is used as a proxy for gene expression.
According to a specific embodiment, in the core gene strategy, distinct core gene sets are defined. Thereby, the design of these gene sets depends on the question to be addressed. For example, core gene sets are defined corresponding to major cell types using extensive new maps of RNA transcripts in a broad range of primary cell types (Figure 5a).
In general, core transcriptional programs define the morphology and function common to a few major cellular types, which are at the root of the hierarchy of the many cell types that exist in the human body, i.e., epithelial, endothelial, mesenchymal, neural, and blood cells (Breschi et al., 2020). Genes whose expression is specific to these cell types were identified. From these genes, the contribution of the major cell types to the composition of human tissues was estimated, resulting in 2,871 genes (including 2,463 protein-coding genes, 283 long non-coding RNAs, and 125 pseudogenes) whose expression was specific to epithelial, endothelial, mesenchymal or melanocyte cell types (Breschi et al., 2020).
According to a specific embodiment, meaningful gene sets may be defined, e.g., subsets of PBMC-specific genes according to their expression levels or organ-specific gene sets based on tissue-specificity, e.g., as indicated by the protein atlas.
According to a specific embodiment, in the single gene strategy, the expression of each gene is estimated based on the presence or absence of a nucleosome at the NDR (Figure 5b). The presence of a nucleosome at the NDR means that this gene cannot be expressed, as the nucleosome blocks the bulky transcription machinery from binding. In contrast, the absence of a nucleosome indicates that the gene may be expressed.
In some cases, a gene may be in a poised state, meaning it is not expressed; however, the NDR is nucleosome-free to enable a rapid transcription initiation. According to a specific embodiment, an approach to assess the NDR-nucleosome status is to use the nucleosome priors approach. In any case, it is possible to generate co-regulatory gene networks based on the NDR nucleosome pattern. Specifically, one network consists of the genes with a nucleosome-blocked NDR, and the other network consists of the genes with nucleosome-free NDRs, as co-regulated genes should exhibit similar expression patterns. Specifically, if the NDR nucleosome status is established with the nucleosome priors strategy, not only the two states “NDR-blocked” vs. “NDR-free” are obtained but also intermediates, such as evidence for a blocked NDR in a certain percentage within the cfDNA. This information may be included in the construction of the networks described herein.
According to one embodiment, a regulation network can be computed with various approaches and similarity metrics, such as Pearson Correlation Coefficients (PCC), modified Tanimoto similarity (Tfunction), Euclidean, Squared Euclidean, Standardized Euclidean, City Block, Chebychev, Cosine, or Pearson Correlation.
In the context of the TF-DHS network described herein, TFs and DHSs are the nodes. The TF to DHS connections are also referred to as edges.
According to one embodiment, it is determined whether activated TFs and distal DHSs are co-occurring in the cfDNA sample.
In the context of the DHS-gene network described herein, genes and DHSs are the nodes. The gene to DHS connections are also referred to as edges.
According to one embodiment, it is determined whether activated genes and distal DHSs are co-occurring in the cfDNA sample.
According to one embodiment, the analyses of TF networks include TF-DHS and DHS-gene assignments to characterize different tissue-specific signatures (Georgolopoulos et al., 2021) (Figure 6).
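As a non-limiting illustration of some of the distance and similarity metrics listed above, the following minimal sketch compares the NDR status of two genes across samples. The input vectors are hypothetical fractions of cfDNA with a nucleosome-blocked NDR (0.0 meaning NDR-free, 1.0 meaning NDR-blocked), and the function names are chosen for this example only.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def city_block(u, v):
    """Sum of absolute per-sample differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute per-sample difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical blocked-NDR fractions for two genes in three cfDNA samples.
gene_x = [0.0, 0.5, 1.0]
gene_y = [0.25, 0.5, 0.5]

distances = {
    "euclidean": euclidean(gene_x, gene_y),
    "city_block": city_block(gene_x, gene_y),
    "chebyshev": chebyshev(gene_x, gene_y),
}
similarity = cosine_similarity(gene_x, gene_y)
```

Gene pairs with a small distance or a high similarity would be candidates for an edge in a co-regulatory network; which metric and cut-off are appropriate depends on the data and is not fixed by this sketch.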
In general, regulatory elements can be located TF-agnostically by mapping DNase I hypersensitive sites (DHSs).
The term “DNase hypersensitive site” or “DHS” as used herein refers to regions of open or accessible chromatin where DNA is not tightly wrapped within a nucleosome, leaving the sequence accessible to DNA-binding proteins. In general, DHSs are described, e.g., by Sheffield et al. (2013). DHSs mark all significant classes of cis-regulatory elements in their cognate cellular context. The systematic delineation of DHSs across human cell types and states has provided fundamental insights into many aspects of genome control (Vierstra et al., 2014).
According to one embodiment, detailed mapping of DHSs provides detailed snapshots of regulatory element dynamics across the multidimensional landscape of cell types, environmental exposures, and developmental stages.
In general, nucleosomes surrounding accessible promoters and TSS-distal DHSs are well-positioned. Nucleosomes surrounding DHSs are collectively well positioned, but well-positioned nucleosomes are associated mainly with regulatory elements in an actuated state. Hence, nucleosome positioning is dependent mainly on the actuation of regulatory DNA (Stergachis et al., 2020). Promoter elements are larger and more accessible elements; even though they represent the minority of elements, they dominate the top end of the quantitative accessibility landscape. Promoters also exhibit far less cell type selectivity. TF binding in the proximal promoter region regulates gene expression by forming the preinitiation complex. Distal regulatory elements influence the rate of gene transcription by acting as activators or repressors.
According to one embodiment, distal regulatory factors are included in interaction network models for a comprehensive assessment of gene regulation.
According to one embodiment, distal elements, which constitute the vast majority, exhibit considerable lineage- and cell type-selectivity, typically with ‘on/off’ behavior, i.e., most elements are completely ‘off’ in most cell and tissue types.
According to one embodiment, given the biological differences between proximal and distal DHSs, and as the promoters, i.e., the TSS regions, are already included in the TF-gene and gene-gene interaction networks, the TF-DHS and DHS-gene interaction networks focus on distal DHS sites. Specifically, they may overlap with some TFBSs; however, this strategy ensures that all regulatory regions, such as enhancers, will be included in the analyses.
In general, regarding enhancer regions, it is not well-established which genes are targeted by these distal elements through mechanisms such as DNA looping. An approach named ELMER (Enhancer Linking by Methylation/Expression Relationships) used DNA methylation to identify enhancers and correlated enhancer states with the expression of nearby genes to identify transcriptional targets. ELMER represents a statistical framework for identifying cancer-specific enhancers and paired gene promoters.
According to one embodiment, DHSs can be obtained from, e.g., the ENCODE project. There are 3.6 million consensus DHSs with an average width of 204 bp (median 196 bp, interquartile range (IQR) 151–240 bp), which collectively span 665.57 Mb (21.55%) of the reference human genome sequence (Meuleman et al., 2020) and which can be utilized for the methods described herein. Furthermore, there are thousands of tissue-specific DHSs. Several strategies for defining the tissue-specific and universal DHS region-sets exist. For example, the Regulatory Elements Database (http://dnase.genome.duke.edu/celltype.php) can be screened for highly specific (or universal) clusters.
The sites can then be merged (concatenated) to consolidate each cluster into a single region-specific (or universally accessible) dataset (Peneder et al., 2021). Chromatin accessibility landscapes have also been mapped in solid tumors, including breast cancer, colon cancer, glioblastoma, gastric cancer, and lung cancer (Minnoye et al., 2021).
At present, TF-DHS and DHS-gene networks are largely unexplored. Georgolopoulos and colleagues identified developmentally regulated DHSs and analyzed corresponding transcripts. Distal DHSs were linked to their target gene promoters and individual TFs to their target DHSs. The vast majority of gene-DHS links occurred within 50kb of the TSS. While, on average, 93.6 DHSs reside within ±1 Mb of a given gene, only 9 DHSs (±8 SD) were found to be linked to a changing gene (Georgolopoulos et al., 2021).
According to one embodiment, it is established which TFs and which genes are actuated to build the DHS-gene network. At the same time, maps of accessible distal DHSs are generated (Figure 6a).
According to a specific embodiment, available public data (e.g., Georgolopoulos et al., 2021) can be used to test whether putative TF-DHS and DHS-gene interactions are present in an analyzed cfDNA sample, e.g., by aligning the TF and gene status with the accessibility of the respective DHSs (Figure 6a). Specifically, in the herein described methods, the relationships between TFs and genes via distal DHSs are modeled by integrating available DNase-seq data and generating edges where chromatin structure indicates that TFs are likely to bind and regulate gene expression. Specifically, an overlap of TF motif locations and gene expression status deduced from coverage pattern and nucleosome positioning with epigenetic data (open chromatin locations, here distal DHSs) is performed. Then the appropriate regions are connected with edges to construct a TF-DHS-gene regulatory network.
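As a non-limiting illustration of connecting regions with edges to construct a TF-DHS-gene regulatory network, the following minimal sketch links a TF to a DHS when a TF motif location overlaps the DHS, and links a DHS to a gene when the DHS center lies within a maximum distance of the gene's TSS. The function names, the coordinates, and the use of 50 kb as default distance (motivated by the observation above that most gene-DHS links occur within 50kb of the TSS) are assumptions of this example. The sketch also omits the TF and gene activity status, which the described method takes into account.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_tf_dhs_gene_edges(motifs, dhss, tss, max_distance=50_000):
    """Return TF->DHS and DHS->gene edges as a list of (source, target).

    motifs: list of (tf_name, (chrom, start, end)) motif locations.
    dhss: mapping of DHS id -> (chrom, start, end).
    tss: mapping of gene name -> (chrom, position).
    """
    edges = []
    for dhs_id, region in dhss.items():
        for tf, site in motifs:
            if overlaps(site, region):
                edges.append((tf, dhs_id))
        center = (region[1] + region[2]) // 2
        for gene, (chrom, position) in tss.items():
            if chrom == region[0] and abs(center - position) <= max_distance:
                edges.append((dhs_id, gene))
    return edges


# Hypothetical coordinates for illustration only.
motifs = [("TF_1", ("chr1", 1000, 1012)), ("TF_2", ("chr1", 90000, 90012))]
dhss = {"DHS_1": ("chr1", 900, 1100), "DHS_2": ("chr2", 500, 700)}
tss = {"GENE_A": ("chr1", 30000), "GENE_B": ("chr1", 200000)}

edges = build_tf_dhs_gene_edges(motifs, dhss, tss)
```

Chaining a TF-DHS edge with a DHS-gene edge that shares the same DHS yields a putative TF-DHS-gene path, here from `TF_1` via `DHS_1` to `GENE_A`.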
As used herein, the term “cohort” or “cohort of subjects” shall refer to a group of subjects having a specific classification and may specifically refer to the samples received from said subjects. The number of subjects of a cohort can vary, i.e., it may comprise 2, 3, 4, 5, 6, 7 or more subjects; however, it may also be a larger group of subjects, for example but not limited to 10, 50, 100 or more subjects. According to an embodiment of the invention, the cohort may also comprise large cohorts of 500 or more subjects. Specifically, the cohort of subjects as described herein shall refer to a group of subjects being associated with or having a condition. These subjects of a cohort can thereby be assigned to a specific classification or status, e.g., displaying a certain condition, such as a clinical, physiologic, or pathologic condition, specifically selected from but not limited to health status, aging status, cell type, tissue type, and specific disease status. Specifically, the cohort of subjects shall refer to a group of subjects being healthy, unhealthy, of a certain age, and/or having a specific disease. Markers for specific conditions may be, but are not limited to, network patterns indicating a specific condition of a subject or a cohort of subjects.
According to a specific embodiment, cfDNA sample sets from well-defined cohorts can be employed and tested for recurrent patterns. For example, accessible TFBSs located outside of proximal promoters are mapped. Regulatory regions, i.e., accessible TFBSs for each actuated gene, may be investigated at various distances upstream and downstream of the TSS (Figure 6b). Different distances can be studied for such potential regulatory regions, e.g., ±5-10kb, ±20-25kb, ±45-50kb, ±70-75kb, or ±95-100kb. This makes it possible to predict an epigenetically informed gene regulatory network.
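As a non-limiting illustration of investigating regulatory regions at various distances from the TSS, the following minimal sketch assigns a site to one of the exemplary distance windows named above. The function name `assign_window` and the coordinates are hypothetical, and treating both window boundaries as inclusive is an assumption of this example.

```python
# Exemplary distance windows in kilobases, as listed in the text above.
WINDOWS_KB = [(5, 10), (20, 25), (45, 50), (70, 75), (95, 100)]


def assign_window(tss, site):
    """Return the (low_kb, high_kb) window containing |site - tss|, else None.

    The absolute distance is used, so upstream and downstream sites are
    treated alike (the plus/minus in the window notation).
    """
    distance_kb = abs(site - tss) / 1000
    for low, high in WINDOWS_KB:
        if low <= distance_kb <= high:
            return (low, high)
    return None


tss_position = 100_000
downstream_site = assign_window(tss_position, 107_500)  # 7.5 kb away
upstream_site = assign_window(tss_position, 52_000)     # 48 kb away
outside_site = assign_window(tss_position, 112_000)     # 12 kb away
```

Grouping accessible TFBSs by window in this way would allow recurrent patterns to be compared between cohorts at each distance range separately.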
Specifically, after the actuated TFs have been deduced from cfDNA data, distal regulatory DHSs upstream or downstream of the TSS are investigated in the next step. As these regulatory DHSs may not comprise thousands of DHSs but only a few, unique methods to establish their accessibility may be needed. For example, the distal DHSs of a core gene set may be combined to increase their number, or alternatively, a unique strategy, such as nucleosome priors, may be applied.
According to one embodiment, the four different interaction networks are combined (Figure 7).
According to one embodiment, TF-gene, TF-TF, and gene-gene networks overlap with TF-DHS or DHS-gene networks. All TSSs of genes and all TFBSs are also DHS sites. Therefore, the TF-gene, TF-TF, and gene-gene networks may also include TF-DHS or DHS-gene networks. However, DHSs may include some additional cCREs, such as enhancers.
According to one embodiment, TFs, genes, and/or DHSs are selected for determining a regulation network described herein. The selection of these TFs, genes, and/or DHSs may depend on the specific further purpose of the regulation network, e.g., for determining whether a given cfDNA sample is from a healthy individual or a person with cancer, or for determining the specific cancer type. According to a specific embodiment, the most differentially active TFs, genes, and/or DHSs between different specific cohorts are selected. Specifically, these are selected for determining a specific standard regulation network or a marker.
According to one embodiment, an in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample is described herein comprising the steps of:
i. extracting cfDNA fragments from the sample;
ii. performing whole genome sequencing on the extracted cfDNA fragments; and
iii. determining at least one of the regulation networks selected from the group consisting of:
  a. a transcription factor (TF)-gene network;
  b. a TF-TF network;
  c. a gene-gene network;
  d. a TF-DNase hypersensitive site (DHS) network; and
  e. a DHS-gene network,
and comparing the at least one network with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof.
According to one embodiment, a computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample is described herein comprising the steps of:
i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample;
ii. determining at least one of the regulation networks selected from the group consisting of:
  a. a TF-gene network;
  b. a TF-TF network;
  c. a gene-gene network;
  d. a TF-DNase hypersensitive site (DHS) network; and
  e. a DHS-gene network,
and comparing the at least one network with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof.
According to a specific embodiment, defined sets of cfDNA samples, i.e., samples of well-defined cell and/or tissue origin, are used herein for building standard regulation networks or regulation models characteristic for a specific tissue or cell.
According to a specific embodiment, the methods described herein comprise the determination of at least one, two, three, four, or all of the regulation networks selected from the group consisting of:
  a. a transcription factor (TF)-gene network;
  b. a TF-TF network;
  c. a gene-gene network;
  d. a TF-DNase hypersensitive site (DHS) network;
  e. a DHS-gene network,
and any combination thereof.
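As a non-limiting illustration of comparing a determined network with standard regulation networks characteristic for specific tissues, the following minimal sketch scores the overlap of edge sets with the Jaccard index and reports the best-matching standard. The Jaccard index is a choice made for this example and is not prescribed by the methods described herein; the function names, tissue labels, and edges are hypothetical.

```python
def jaccard(first, second):
    """Jaccard index of two edge collections (shared edges / all edges)."""
    first, second = set(first), set(second)
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def best_matching_standard(sample_edges, standards):
    """Return (name of best-matching standard, all scores).

    standards: mapping of tissue label -> edge collection of its
    standard regulation network.
    """
    scores = {
        name: jaccard(sample_edges, edges) for name, edges in standards.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical TF-gene edges determined from a cfDNA sample.
sample_edges = {("TF_1", "GENE_A"), ("TF_1", "GENE_B"), ("TF_2", "GENE_C")}

# Hypothetical standard networks for two tissues.
standards = {
    "tissue_X": {("TF_1", "GENE_A"), ("TF_1", "GENE_B")},
    "tissue_Y": {("TF_2", "GENE_C"), ("TF_3", "GENE_D")},
}

best, scores = best_matching_standard(sample_edges, standards)
```

Because cfDNA is a mixture of DNA from several tissues, the individual scores rather than only the single best match may be the more informative output in practice.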
According to one embodiment, an in vitro method for determining the health status of a subject is described herein comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. According to one embodiment, a computer-implemented method for determining the health status of a subject is described herein comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii. comparing the at least one regulation network of ii. with one or more standard regulation networks or regulation models derived from healthy subjects and/or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. According to a specific embodiment, the standard regulation network or regulation model derived from unhealthy subjects is derived from subjects suffering from a condition selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases; coronary disease; acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; asthma; thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; diabetes; and processes associated with aging. A standard regulation network may also be derived from specific cell types or tissue types. According to a specific embodiment, a congruence of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from healthy subjects is characteristic for a healthy status. In a certain embodiment, a difference of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for a healthy status.
According to another specific embodiment, a congruence of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from unhealthy subjects is characteristic for an unhealthy status. In a certain embodiment, a difference of 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. According to a specific embodiment, the subject is considered healthy if the deviation between the regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for a healthy subject is less than the deviation between the regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for an unhealthy subject. Specifically, said deviation between the regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for a healthy subject is 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, or 5% of the deviation between the regulation network obtained from the sample and a standard regulation network or a regulation model characteristic for an unhealthy subject. According to one embodiment, a machine learning model for binary classification between healthy and unhealthy regarding a specific disease group or pregnancy can be trained on the set of standard regulation networks from samples of both groups to learn patterns of networks that signify an unhealthy sample. Multiple such models can be combined to achieve multi-class classification.
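The comparison rule above (healthy if the deviation from the healthy standard is smaller than the deviation from the unhealthy standard) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each regulation network is reduced to a set of (regulator, target) edges and that congruence is measured as the percentage of shared edges (Jaccard index); the text does not prescribe this particular measure, and all TF and gene names below are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (regulator, target), e.g. a TF-gene edge


def congruence(sample: Set[Edge], standard: Set[Edge]) -> float:
    """Percentage of shared edges (Jaccard index x 100). Assumed measure."""
    union = sample | standard
    if not union:
        return 100.0  # two empty networks are treated as identical
    return 100.0 * len(sample & standard) / len(union)


def deviation(sample: Set[Edge], standard: Set[Edge]) -> float:
    """Deviation defined here as the complement of congruence, in percent."""
    return 100.0 - congruence(sample, standard)


def health_status(sample: Set[Edge], healthy: Set[Edge], unhealthy: Set[Edge]) -> str:
    """Label the sample by the standard network it deviates from least."""
    d_healthy = deviation(sample, healthy)
    d_unhealthy = deviation(sample, unhealthy)
    if d_healthy < d_unhealthy:
        return "healthy"
    if d_unhealthy < d_healthy:
        return "unhealthy"
    return "undetermined"  # equal deviations: no call is made


# Hypothetical toy networks
healthy_std = {("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "GENE_3"), ("TF_C", "GENE_4")}
unhealthy_std = {("TF_B", "GENE_3"), ("TF_D", "GENE_5")}
sample_net = {("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "GENE_3")}

status = health_status(sample_net, healthy_std, unhealthy_std)
```

In this toy case the sample shares three of four edges with the healthy standard and one of four with the unhealthy standard, so it is labelled healthy. A weighted measure (e.g., on edge scores) could replace the set overlap without changing the decision rule.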
According to an alternative embodiment, a machine learning model may be used to learn classification from multiple algorithms. As used herein, the term “derived from” generally refers to an origin or source, and may include naturally occurring, recombinant, unpurified or purified molecules. A nucleic acid derived from an original nucleic acid may comprise the original nucleic acid, in part or in whole, and may be a fragment or variant of the original nucleic acid. A nucleic acid derived from a biological sample may be purified from that sample. According to a specific embodiment of the invention, a health status may be diagnosed. Such a health status can be an unhealthy status. Thereby, a certain disease, health condition, or also a predisposition may be diagnosed. As used herein, the term “diagnose” or “diagnosis” of a status or outcome generally refers to predicting or diagnosing the status or outcome, determining predisposition to a status or outcome, monitoring treatment of a subject (e.g., a patient), diagnosing a therapeutic response of a subject (e.g., a patient), and prognosis of status or outcome, progression, and response to particular treatment. According to one embodiment, the standard regulation networks or a regulation model characteristic for a specific health status comprising a TF-gene, a TF-TF, a TF-DHS, a DHS-gene, a TF-DHS-gene network, gene-gene networks, and any combinations thereof, are derived from cfDNA data employing well-defined cohorts. According to a specific embodiment, one cohort comprises healthy controls of both sexes and all age groups. According to a specific embodiment, healthy subjects may be understood as subjects not having the symptoms that the subject to be tested is suffering from.
In general, healthy subjects are not suffering from cancer, inflammation, coronary disease, acute tissue damage, chronic disease, complications during pregnancy, beginning sepsis, sepsis, hypertension, obesity, diabetes, processes associated with aging, and/or unhealthy aging. "Aging" according to this invention is a combination of processes of deterioration that follow the period of development of an organism. Aging is generally characterized by a declining adaptability to stress, increased homeostatic imbalance, an increase in senescent cells, and an increased risk of disease. Because of this, death is the ultimate consequence of aging. Unhealthy aging may be induced by stress conditions including, but not limited to, chemical, physical, and biological stresses. Unhealthy aging is also referred to as “inflammaging”. For example, accelerated aging can be induced by stresses caused by UV and IR irradiation; drugs and other chemicals; chemotherapy; intoxicants, such as but not limited to DNA intercalating and/or damaging agents, oxidative stressors, etc.; mitogenic stimuli; oncogenic stimuli; toxic compounds; hypoxia; oxidants; caloric restriction; exposure to environmental pollutants, for example, silica; and exposure to an occupational pollutant, for example, dust, smoke, asbestos, or fumes.
According to one embodiment, in the methods described herein the standard regulation network or regulation model characteristic for an unhealthy subject is derived from subjects suffering from a condition selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases; coronary disease; acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; asthma; thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; diabetes; and processes associated with aging. According to a specific embodiment, cohorts of well-defined diseases or conditions, e.g., chronic diseases, involving mainly specific organs may be generated. For example, individuals with colorectal cancer or chronic diseases affecting the GI tract, e.g., Crohn’s disease or ulcerative colitis, are suited to evaluate GI-specific interaction networks in cfDNA. According to a specific embodiment, depending on the specific application of the methods described herein, a cohort of healthy controls may also comprise the data from samples of healthy individuals with common “co-morbidities”, such as hypertension, obesity, or diabetes. Specifically, in this case, such co-morbidities do not lead to the result of an unhealthy status of a subject. In general, standard regulation networks or regulation models characteristic for a specific health status may be adapted to the specific application. For example, depending on the specific aim of the health status determination, data from specific physiological conditions may be incorporated into the healthy control or the unhealthy control. According to one embodiment, described herein is also the establishment of standard regulation networks and regulation models.
In a specific embodiment, these standard regulation networks and regulation models are established from samples of healthy and/or unhealthy subjects as described herein. According to a specific embodiment, defined sets of cfDNA samples are used herein, i.e., samples from well-defined cohorts which represent not only two states (disease X vs. healthy) but also individuals with clinically annotated diseases (disease A, B, C, …) or physiologic conditions (e.g., age, obesity, and so on). Thereby, by combining the herein described networks, their different network topologies are established. According to a specific embodiment, artificial intelligence approaches, e.g., machine learning or convolutional neural networks, may be applied to identify cCRE signatures capable of predicting the respective biological/medical condition and to achieve generalized models, which also work with a smaller number of samples (Figure 7). According to another specific embodiment, a specific disease may be diagnosed by the methods described herein. As used herein, the term “subject” generally refers to an individual, entity or a medium that has or is suspected of having testable or detectable genetic information or material. A subject can be a person, individual, or patient. The subject can be a vertebrate, such as, for example, a mammal. Non-limiting examples of mammals include humans, simians, farm animals, sport animals, rodents, and pets. The subject may be displaying a symptom(s) indicative of a health or physiological state or condition of the subject, such as a cancer or a stage of a cancer of the subject. As an alternative, the subject can be asymptomatic with respect to such health or physiological state or condition. According to one embodiment, an in vitro method for monitoring the treatment success of a patient is described herein comprising the steps of: i. extracting cfDNA fragments from a sample of said patient; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one network of iii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iv. provide information on the treatment success of the patient. According to one embodiment, a computer-implemented method for monitoring the treatment success of a patient is described herein comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii. comparing the at least one network of ii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iii. provide information on the treatment success of the patient.
According to a specific embodiment, the treatment success is determined for diseases selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases; coronary disease; acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; asthma; thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; diabetes; and processes associated with aging. Non-limiting examples of the diagnosed, monitored, or treated diseases include neurodegenerative diseases, cancers, chemotherapy-related toxicities, irradiation-induced toxicities, organ failures, organ injuries, organ infarcts, ischemia, acute vascular events, a stroke, graft-versus-host-disease (GVHD), graft rejections, sepsis, systemic inflammatory response syndrome (SIRS), cytokine releasing syndrome (CRS), multiple organ dysfunction syndrome (MODS), traumatic injuries, aging, diabetes, atherosclerosis, autoimmune disorders, eclampsia, preeclampsia, infertility, pregnancy-associated complications, coagulation disorders, asphyxia, drug intoxication, poisoning, and infections. In one specific embodiment, the disease is a cancer. Numerous cancers may be detected, monitored, or treated using the methods described herein. Cancer cells, as most cells, can be characterized by a rate of turnover, in which old cells die and are replaced by newer cells. Generally, dead cells in contact with the vasculature in a given patient may release DNA or fragments of DNA into the bloodstream. This is also true of cancer cells during various stages of the disease. This phenomenon may be used to detect the presence or absence of cancers in individuals using the methods described herein.
For example, blood from patients at risk for cancer is drawn, or urine is collected, and the sample is prepared as described herein to generate a population of cfDNA. The methods of the disclosure are employed to detect cfDNA fragment patterns and features that may be unique to certain cancers present. The method may detect the presence of cancerous cells in the body, despite the absence of symptoms or other hallmarks of disease. The method may also help to detect different subtypes of cancer based on the features of the cfDNA fragments detected in the patient sample. The types and number of cancers that are detected, monitored, or treated include, but are not limited to, blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogeneous tumors and the like. In certain embodiments, the methods provided herein may be used to monitor already known cancers or other diseases in a particular patient. This allows a practitioner to adapt treatment options in accordance with the progress of the disease. In this example, the methods described herein may track cfDNA or ctDNA in a particular patient over the course of the disease. In some instances, cancers can progress, i.e., become more aggressive and genetically unstable. In other examples, cancers remain benign, inactive, dormant or in remission. The methods of this disclosure may be useful in determining disease progression, remission or recurrence and the appropriate adjustments in treatment that are required for the disease state. Further, the systems and methods described herein may be useful in determining the efficacy of a particular treatment option.
Biological samples are collected longitudinally over time from a single patient, and comparison of the cfDNA profiles in all of the different samples collected illustrates how the cancer or disease is progressing or diminishing. The term “candidate cis-regulatory elements” or “cCREs” as used herein refers to regions of non-coding DNA which regulate the transcription of neighboring genes, i.e., promoters, enhancers, silencers, and operators. In general, cCREs are typically devoid of nucleosomes to allow binding of transcription factors. The term “open chromatin” as used herein refers to DNA regions which are often associated with regulatory factor binding and correspond to nucleosome-depleted regions (NDRs). Such open chromatin regions are associated with DNA regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. According to one embodiment, open chromatin regions, i.e., cCREs, are analyzed as described herein and networks are deduced therefrom. Such networks are regulation networks or may be interaction networks. According to one embodiment, in the methods described herein open chromatin features are used to infer information, particularly at the network level. According to one embodiment, combinations of multiple tissue-specific cCRE markers in cfDNA are analyzed and specific interaction networks are built. According to one embodiment, prior knowledge about cCREs is used to construct sets of thousands of cCREs informative about disease and certain physiological conditions. According to one embodiment, maps of candidate cis-regulatory elements (“cCREs”) are publicly available, e.g., through the ENCODE data portal or other publicly accessible databases.
According to one embodiment of the invention, cCREs can be selected from publicly available databases (Encode Project Consortium et al., 2020a; Meuleman et al., 2020; Vierstra et al., 2020; Zhang et al., 2021) or from selected publications, e.g., for tissue-specific DHSs (Zhang et al., 2021). Further features may be derived from the expanding knowledge of tissue-specific gene expression (Breschi et al., 2020; Uhlen et al., 2015; Yao et al., 2015) or databases which are constantly updated, such as the Human Protein Atlas (www.proteinatlas.org), the Genotype-Tissue Expression (GTEx) project (gtexportal.org), or others. According to one embodiment, the interplay between chromatin accessibility and gene expression dynamics is leveraged in the methods described herein. According to one embodiment, the cfDNA analysis focuses on regulatory network connections, which are more tissue-specific than genes or transcription factors investigated alone. Thereby, context-dependent, non-canonical regulatory pathways are captured, which, in addition to specifying the tissue, are informative about altered pathways and therefore about diseases and physiological changes. According to one embodiment of the invention, an integrative analysis of chromatin accessibility and gene expression is described herein which corresponds to an application of GRN-cCREs. It is based to a large extent on tissue-specific TFs derived from publicly available databases, such as the resources provided by (Lambert et al., 2018) or the TF-Marker database (Xu et al., 2022). Thereby, thousands of links between individual regulatory elements and their target genes are considered and viewed within their biological context. For example, a transcribed, i.e., active, TF affects several proximal and distal open chromatin regions/cCREs (summarized in Figure 1):
- The nucleosome depleted region (NDR) of the respective TF has a distinctive coverage pattern.
- The distances between nucleosomes downstream of the TSS of this TF are different compared to silent genes, which can be assessed by fast Fourier transformation (FFT; informs about the frequencies) and short-time Fourier transformation (STFT; gives the location of the frequencies, i.e., the position when a signal changes).
- The associated TFBSs show increased accessibility (TFBSs are selected from the GTRD database).
- Interaction networks are built and trained.
Figure 1 describes the effect of transcribed TFs on open chromatin regions and nucleosome positions. It shows how transcribed, i.e., active, TFs affect proximal and distal open chromatin regions/cCREs and nucleosome positions. Thereby it is described in Figure 1 that: (Left upper panel): The nucleosome depleted region (NDR), which is upstream of the TSS, has a decreased coverage. The NDR is flanked by oscillating coverage patterns where the peaks indicate the position of the nucleosomes upstream and downstream of the NDR. (Left lower panel): The distances between nucleosomes downstream of the TSS are narrower than those between nucleosomes of silent genes. These differences can be determined by employing fast Fourier transformation (FFT; informs about the frequencies) and short-time Fourier transformation (STFT; gives the location of the frequencies, i.e., the position when a signal changes). (Center, bottom): Associated TFBSs may show different accessibilities. In the example shown here, the high accessibility at the binding sites of the TF REST is visible as an oscillating pattern in healthy controls (gray). In the first plasma sample from a patient with prostate cancer (P148_1), the accessibility of the REST binding sites is comparable to healthy controls. However, in a later sample (P148_3), the accessibility is decreased, as indicated by an almost flat line. (Right lower panel): Each TF influences the expression of several downstream genes.
The downstream genes may differ significantly depending on the tissue or cell type. It was suggested that gene regulatory network connections are more tissue-specific than genes or TFs (Sonawane et al., 2017) (see Figure 2). (Right upper panel): Schematic diagram of TF-DHS and DHS-gene assignments. Differentially expressed TFs influence DHS densities, and in particular distal DHSs are linked to their downstream target genes and affect their expression. Such networks are integral in generating specific signatures for cell lineages. According to one embodiment, in the methods described herein many open chromatin features are integrated to infer information at the network level. According to one embodiment, cCRE sets and combinations thereof are made according to the methods described herein. According to a specific embodiment, signatures are used to train neural network models to predict medical information. According to one embodiment, building of four interaction networks is described herein, i.e., TF-gene, TF-TF, gene-gene, and TF-DHS-gene. Furthermore, the combination of these networks into a comprehensive model is described herein. According to a specific embodiment, the method described herein includes the elucidation of TF-DHS-gene interaction networks within cfDNA samples by employing strategies to combine multiple parameters. According to a specific embodiment, the method described herein includes the use of edges (TF-gene, TF-TF, TF-DHS, DHS-gene) that are cell-type specific. Some of them are registered in the same network. Data from multiple diseases are combined, i.e., not only is a disease state compared with the healthy state, but multiple diseases or multiple conditions are also compared with healthy states. For example, multiple diseases or multiple conditions are compared with sex- and age-matched control populations. Thereby, it is possible to determine cell-type/disease-specific components of the graphs that are edge-disjoint.
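The Fourier-based comparison of nucleosome spacing described for Figure 1 (left lower panel) can be sketched as follows. This is a minimal illustration, not the implementation used herein: it applies a plain discrete Fourier transform (rather than an optimized FFT or an STFT) to a coverage profile downstream of a TSS and reports the dominant period in base pairs. The function name, the search window of 100 to 300 bp, and the synthetic spacings of 180 bp and 200 bp are assumptions chosen only for demonstration.

```python
import cmath
import math
from typing import Sequence


def dominant_period(coverage: Sequence[float],
                    min_period: float = 100.0,
                    max_period: float = 300.0) -> float:
    """Return the period (bp) with the largest DFT magnitude within the window.

    Returns 0.0 if no DFT frequency falls inside [min_period, max_period].
    """
    n = len(coverage)
    mean = sum(coverage) / n
    centered = [c - mean for c in coverage]  # remove the constant offset
    best_period, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        period = n / k  # frequency bin k corresponds to a period of n/k bp
        if not (min_period <= period <= max_period):
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_power:
            best_period, best_power = period, abs(coeff)
    return best_period


# Synthetic coverage profiles (assumed spacings, for illustration only):
# a narrower spacing for an active gene, a wider one for a silent gene.
active = [10 + 3 * math.cos(2 * math.pi * t / 180) for t in range(1800)]
silent = [10 + 3 * math.cos(2 * math.pi * t / 200) for t in range(2000)]

active_period = dominant_period(active)
silent_period = dominant_period(silent)
```

On real cfDNA coverage the resolution of the period estimate is limited by the window length, so longer windows or averaging over many genes would likely be needed; an STFT would additionally localize where along the gene body the spacing changes.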
According to a specific embodiment, the methods described herein include that these components are an approximation and serve for cell type/disease identification. There are enough specific edges to get power, especially in low-coverage (integration over many sites) situations. According to a specific embodiment, the method described herein includes the generation of generalizable models capable of identifying different disease stages and physiological conditions. According to a specific embodiment, the method described herein includes that, due to the multitude of data points, the herein described approach can handle smaller numbers of samples and samples sequenced with relatively low coverage. According to a specific embodiment, the method described herein includes inferring functional and biological information from cfDNA. The mere presence of DNA released from specific organs is not informative about the biological relevance, e.g., whether it indicates disease and, if so, which pathways are altered. The herein described approach enables the inclusion of the underlying systems biology in cfDNA applications. According to one embodiment, the methods described herein are minimally invasive or non-invasive. According to one embodiment, the target groups for adopting the methods described herein are physicians, patients, and professionals in the life science sectors. Furthermore, nucleosomes surrounding these highly accessible DNA sequences are characterized by specific features, such as histone modifications and different nucleosome distances to each other. In some examples, TFBS accessibility scores are used as input features in machine learning models to find correlations between sequence composition and subject (e.g., patient) groups. Examples of such patient groups include presence of diseases or conditions, stages, subtypes, responders vs. non-responders, and progressors vs. non-progressors.
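The idea of cell-type/disease-specific, edge-disjoint graph components used for identification can be sketched as follows. The sketch assumes that each reference network is given as a set of (source, target) edges and takes the specific component of a label to be the edges that occur in no other label's network; the scoring by the fraction of specific edges recovered in a sample is likewise an assumption for illustration. All labels, TF names, and gene names are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # e.g. (TF, gene), (TF, DHS) or (DHS, gene)


def specific_edges(networks: Dict[str, Set[Edge]]) -> Dict[str, Set[Edge]]:
    """For each label, keep only the edges found in no other label's network."""
    result: Dict[str, Set[Edge]] = {}
    for label, edges in networks.items():
        others: Set[Edge] = set()
        for other_label, other_edges in networks.items():
            if other_label != label:
                others |= other_edges
        result[label] = edges - others
    return result


def score_sample(sample: Set[Edge],
                 specific: Dict[str, Set[Edge]]) -> Dict[str, float]:
    """Fraction of each label's specific edges that is present in the sample."""
    return {
        label: (len(sample & edges) / len(edges) if edges else 0.0)
        for label, edges in specific.items()
    }


# Hypothetical reference networks sharing one common edge
reference = {
    "colon": {("TF_1", "GENE_1"), ("TF_2", "GENE_2"), ("TF_3", "GENE_3")},
    "prostate": {("TF_1", "GENE_1"), ("TF_4", "GENE_4")},
    "blood": {("TF_1", "GENE_1"), ("TF_5", "GENE_5")},
}
components = specific_edges(reference)

sample_edges = {("TF_1", "GENE_1"), ("TF_2", "GENE_2"), ("TF_3", "GENE_3")}
scores = score_sample(sample_edges, components)
```

By construction the resulting components share no edges, so a shared edge such as the common TF_1 edge above contributes to no label. Integrating over many specific edges per label is what would give such a score its power at low coverage.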
In some examples, feature matrices are generated to compare samples obtained from individuals with known conditions or characteristics. In some examples, samples are obtained from healthy individuals or individuals who do not have any of the known indications, and samples from patients known to have cancer. As used herein, as it relates to machine learning and pattern recognition, the term “feature” refers to an individual measurable property or characteristic of a phenomenon being observed. Features are usually numeric, but structural features such as strings and graphs may be used in syntactic pattern recognition. The concept of “feature” is related to that of explanatory variable used in statistical techniques such as for example, but not limited to, linear regression. In some examples, the feature is a transcription factor binding profile. In some examples, the feature is an accessibility score calculated from a transcription factor binding profile. In some examples, the features are inputted into a feature matrix for machine learning analysis. In some examples, the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between healthy subjects and cancer patients, or between disease progressors and non-progressors. In some examples, the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between a plurality of disease subtypes, or a plurality of disease stages.
In some examples, the accessibility scores of at least 2, or at least 5, or at least 10, or at least 15, or at least 20, or at least 25 transcription factor binding sites are determined and inputted into a machine learning model to train a classifier capable of distinguishing between disease treatment responders and non-responders. For a plurality of assays, the system identifies feature sets to accept as inputs to a machine learning model. The system performs an assay on each molecule class and forms a feature vector from the measured values. The system accepts as inputs the feature vector into the machine learning model and generates an output classification of whether the biological sample has a specified property. In some examples, the machine learning model generates a classifier capable of distinguishing between two or more groups or classes of individuals or features in a population of individuals or features of the population. For example, the classifier may be a binary classifier capable of distinguishing between two groups or classes of individuals or features in a population of individuals or features of the population. As another example, the classifier may be a multi-class classifier capable of distinguishing between more than two groups or classes of individuals or features in a population of individuals or features of the population. In some examples, the classifier is a trained machine learning classifier. In some examples, the informative loci or features of biomarkers in a cancer tissue are assayed to form a profile. In the case of a binary classifier, receiver operating characteristic (ROC) curves may be generated for plotting the performance of a particular feature (e.g., any of the biomarkers described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., individuals responding and not responding to a therapeutic agent). 
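How a ROC curve summarizes the ability of a single feature to separate two populations can be shown with a short, self-contained sketch. The feature values and labels below are invented for illustration (for example, an accessibility score per sample with label 1 for cases and 0 for controls); the helper names are hypothetical. Here samples are swept from the highest to the lowest feature value, which is one of several equivalent conventions.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float],
              labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points for one feature.

    labels: 1 for cases, 0 for controls. Higher scores are taken to indicate
    cases. Samples with tied scores are added to the curve together.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: six samples, three cases and three controls
feature_values = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
case_labels = [1, 1, 0, 1, 0, 0]

curve = roc_curve(feature_values, case_labels)
area = auc(curve)
```

For this toy data, eight of the nine case/control pairs are ranked correctly, which is exactly what the area under the curve expresses; an area of 0.5 would correspond to a feature with no discriminative value.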
In some examples, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. In some examples, the specified property is selected from healthy vs. cancer, a disease subtype among a plurality of disease subtypes, a disease stage among a plurality of disease stages, progressor vs. non-progressor, responder vs. non-responder, or a combination thereof. According to a specific embodiment, the probability of the presence of a nucleosome or a nucleosomal dyad for a base position of the cfDNA fragments is determined by determining the dyad count distribution for specific fragment lengths, performing a fragment length-based truncation, determining probability density functions, and removing the non-informative portion. This probability is also termed “nucleosome dyad prior distribution”, “nucleosome prior distribution”, or “nucleosome prior” herein. According to one embodiment, the fragment length-specific prior probability P(H) gives the probability of a nucleosome, which is represented by its dyad in our model, being positioned relative to each base of the fragment. Based on the knowledge that nucleosome dyads confer by far the highest cleaving resistance to cfDNA fragments, the probability distribution of the dyad location across a fragment can be approximated by the associated cleaving resistance distribution. The maximum, or the most pronounced local maxima, of this cleaving resistance distribution gives the expected location of the nucleosome dyad, or the locations of multiple nucleosome dyads from multiple DNA-associated histone complexes (i.e., di-nucleosomal fragments), relative to the fragment before all of the cfDNA fragmentation evidence of the alignment locus of that fragment has been taken into account. The process of computing fragment length-dependent prior distributions is also referred to herein as “Creating prior knowledge”.
According to a specific embodiment, the prior knowledge thus created can further be used for computing the positions of nucleosome dyads based on coverage maxima and cfDNA fragmentation by using Bayes' theorem. The theorem is shown in equation I.
$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \qquad \text{(I)}$$
In equation I, H is the hypothesis and E is the evidence. P(H) is the prior probability, P(E|H) is the likelihood, P(E) is the model evidence or marginal likelihood, and P(H|E) is the posterior probability, which is computed according to the methods described herein. For the problem at hand, the hypothesis is that the position of a nucleosome, represented by the position of its dyad, can be derived from the location of an observed cfDNA fragment, which originates from that very same nucleosome, by taking into account the length of the fragment and prior knowledge about the relationship between the dyad’s location and the fragment length. According to one embodiment, the evidence E is the combined information about cfDNA fragments gained from read alignment against the reference sequence, e.g., a high-quality human reference genome, after sequencing. The sequence alignment step produces the length and position information for each fragment. In this context, the evidence E at a specific locus will also be called “observed fragmentation” or “fragmentation evidence”. According to one embodiment, the likelihood P(E|H) is the probability of observing a cfDNA fragment locally under the hypothesis that nucleosomal DNA in immediate genomic vicinity was the origin of the fragment before degradation. The likelihood reduces to the observed local fragmentation after taking into account that observing unprotected fragments by chance is highly unlikely. Observation of cfDNA fragments in bodily fluids of living mammals can only be justified by DNA being in a protective nucleosomal structure before fragmentation that hinders rapid clearance and recycling of cellular debris. According to one embodiment, the denominator P(E) is called either marginal likelihood or model evidence.
According to a specific embodiment, for the factor P(E), other parameters like genomic locus or fragment length have been “integrated out” so that the probability does not depend on them anymore. If this marginal likelihood factor is omitted, the posterior probability is only proportional to the combination of observed fragmentation and prior knowledge (equation II).
$$P(H \mid E) \propto P(E \mid H)\, P(H) \qquad \text{(II)}$$
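The combination of observed fragmentation and prior knowledge can be sketched in code as follows. This is a minimal sketch, assuming that each aligned fragment is represented by a (start, length) pair within a locus, that a prior is available per fragment length, and that the posterior signal is obtained by summing the priors of all fragments at each base and dividing by the number of fragments. The function names, the data layout, and the simple strict-neighbor peak rule are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def posterior_signal(fragments: Sequence[Tuple[int, int]],
                     priors: Dict[int, Sequence[float]],
                     locus_length: int) -> List[float]:
    """Unnormalized posterior dyad signal for a locus.

    fragments: (start, length) pairs, 0-based start within the locus.
    priors: fragment length -> prior distribution over the fragment's bases.
    """
    if not fragments:
        return [0.0] * locus_length
    signal = [0.0] * locus_length
    for start, length in fragments:
        # Each fragment is replaced by the prior for its length.
        for offset, value in enumerate(priors[length]):
            pos = start + offset
            if 0 <= pos < locus_length:
                signal[pos] += value
    # Dividing by a constant does not move the local maxima.
    return [value / len(fragments) for value in signal]


def local_maxima(signal: Sequence[float]) -> List[int]:
    """Positions that are strictly higher than both direct neighbors."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]
```

Multiplying the signal by any positive constant leaves the output of `local_maxima` unchanged, which illustrates why the omitted factor P(E) is not needed for locating peaks.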
According to this specific embodiment related to equation II, it is not possible to integrate over the result to compute an actual probability between 0 and 1. However, this is negligible as only the local maxima of the posterior nucleosome signal are of interest, and these can be found from a scaled version of the posterior probability, independent of a constant scaling factor or a factor that varies significantly only on a large scale, i.e., not locally. According to one embodiment, the posterior probability P(H|E) is the probability of the hypothesis H being true after observing E. In other words, it is the average resistance to cleaving by DNases across all cfDNA pool tissue sources at a respective base of the genome given the local fragmentation evidence. Finding local maxima/calling peaks of this signal yields positions that show relatively high probability of harboring a nucleosome dyad in at least one of the contributing tissues since cleaving resistance maxima are considered to be conferred by nucleosomes, i.e., each maximum is the resulting average expected location of the nucleosomal dyad at that locus. According to one embodiment, local peaks of the posterior probability in equation II refer to the base positions in the reference genome sequence where a nucleosomal dyad is most likely to be present as determined in the methods described herein. The observed fragmentation refers to the cfDNA fragmentation profile obtained by aligning the DNA sequences of the cfDNA fragments with a reference genome sequence. The prior knowledge refers to the probability of the presence of a nucleosomal dyad for each base position of the cfDNA fragments. According to one embodiment, classifiers and predictors may be used in the methods described herein. Models for classification of health status of a patient and prediction of certain health parameters (e.g.
response to therapy, development of tumor resistance to treatment, recurrence free survival, time to recurrence, tumor metastasized or not, time to sepsis, etc.) are trained on features and feature sets using machine learning methods. Features and feature sets may be selected and reduced in dimensions using principal component analysis (PCA) to extract combinations of features that explain variability in the data, non-negative matrix factorization (NMF) to extract recurring signatures in homogeneous sample groups, random forests or gradient boosting machines (also to limit the number of allowed decisions) to assess important binary decisions for classification, auto-encoders to reduce feature space to important hyperparameters (also de-noising) or similar methods and/or any combination of these. Suppression of batch effects on the feature selection procedure is achieved by applying standard controlling procedures involving computing of correlation metrics, regression analysis, (hierarchical) clustering of features based on similarity and testing resulting clusters against known possible confounding variables like sequencing batch, sequencing technology, depth of coverage, age of sample, sample sex (wherever applicable) and any other such variable. According to one embodiment, tissue deconvolution is used or performed in a method described herein. Tissue deconvolution refers to the inference of the cfDNA contribution of individual tissues and/or cell types to the cfDNA pool and uses a reference catalog of tissue-/cell type-specific feature signatures. The catalog is created from existing sequencing data sets. Signatures may consist of single features or combinations of features and sets of these as described above. Features and sets may be restricted to certain regions or sets of regions of the genome, especially in the case of chromatin-associated features.
To name a possible approach, non-negative matrix factorization (NMF) can be used to extract recurrent feature signatures from homogeneous groups of samples. Using the reference catalog, NMF can also be used to compute a “best fit” linear combination of signatures from a sequencing dataset. Signatures may not scale linearly with the abundance of their corresponding cfDNA releasing cells. Therefore, methods other than NMF might be used to achieve a more accurate deconvolution. Tissue deconvolution yields an estimate of the contribution of each tissue/cell type described by the reference catalog as a value between 0 and 1. Minor contributions might be ignored and only a ranked list of top contributing tissues used for further model training and/or use in regression or classification tasks. According to one embodiment, a computer program is provided comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method described herein. According to a further embodiment, a computer-readable medium is provided having stored thereon the computer program described herein for performing the computer-implemented method. According to a specific embodiment, the computer-implemented method described herein comprises the step of receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample of a subject. Specifically, said data may be generated using a sequencing device connected to the computer or apparatus used for performing the computer-implemented method described. According to one embodiment, other features of the methods described herein, especially the features described herein in the context of determining regulation networks and regulation models, can be combined with the computer-implemented methods described.
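Returning to the tissue deconvolution described above, the “best fit” linear combination of reference signatures can be sketched as follows. This is a minimal sketch, assuming a fixed reference catalog of non-negative signatures and the standard multiplicative update rule of NMF applied only to the mixing weights; the function name, the tissue labels, and the final normalization of the weights to a sum of one are assumptions made for illustration. As noted above, signatures may not scale linearly with cell abundance, so a linear fit of this kind is only an approximation.

```python
from typing import Dict, List, Sequence


def deconvolve(sample: Sequence[float],
               signatures: Dict[str, Sequence[float]],
               n_iter: int = 500,
               eps: float = 1e-12) -> Dict[str, float]:
    """Estimate non-negative tissue weights such that the weighted sum of
    signatures approximates the sample feature vector.

    signatures: tissue name -> non-negative feature vector (hypothetical catalog).
    Returns weights normalized to sum to one.
    """
    tissues: List[str] = list(signatures)
    n_features = len(sample)
    weights = [1.0 / len(tissues)] * len(tissues)
    for _ in range(n_iter):
        # Current reconstruction of the sample from the catalog.
        recon = [sum(weights[k] * signatures[t][i] for k, t in enumerate(tissues))
                 for i in range(n_features)]
        # Multiplicative update keeps every weight non-negative.
        for k, t in enumerate(tissues):
            numerator = sum(signatures[t][i] * sample[i] for i in range(n_features))
            denominator = sum(signatures[t][i] * recon[i] for i in range(n_features))
            weights[k] *= numerator / (denominator + eps)
    total = sum(weights)
    return {t: weights[k] / total for k, t in enumerate(tissues)}
```

The returned values lie between 0 and 1 and could be ranked so that only the top contributing tissues are passed on, in line with the ranked list mentioned above.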
MUG003P -58- According to one embodiment, a data processing apparatus comprising means for carrying out the computer-implemented methods described herein is provided by the invention. According to a specific embodiment, said data processing apparatus may be connected to an apparatus or device capable of sequencing cfDNA fragments. According to a specific embodiment, said data processing apparatus may be connected to an apparatus or device capable of extracting cfDNA from a sample. Specifically, said data processing apparatus is further connected to an apparatus or device capable of sequencing cfDNA fragments. According to one embodiment, a computer program is provided comprising instructions which, when the program is executed by a computer, cause the computer to carry out a computer-implemented method described herein. Specifically, said computer program may be combined with a computer program comprising instructions to cause the device capable of extracting cfDNA from a sample to execute its function of extracting cfDNA from a sample. Specifically, said computer program may be further combined with a computer program comprising instructions to cause the device capable of sequencing cfDNA fragments to execute its function of sequencing cfDNA fragments. Alternatively, said computer program may be combined with a computer program comprising instructions to cause the device capable of sequencing cfDNA fragments to execute its function of sequencing cfDNA fragments. According to yet another embodiment, an apparatus is used for performing a method described herein. 
Such apparatus may be characterized by the following features: (a) a sequencer configured to (i) receive DNA extracted from a sample of the bodily fluid comprising DNA, and (ii) sequence the extracted DNA under conditions that produce DNA fragment sequences; and (b) a computational apparatus configured to (e.g., programmed to) instruct one or more processors to perform various operations such as those described with two or more of the method operations described herein. In some embodiments, the computational apparatus is configured to perform one or more of the steps of the computer-implemented method described herein. In certain embodiments, the apparatus also includes a tool for extracting DNA from the sample under suitable conditions. In some embodiments, the apparatus includes a module configured to extract cfDNA obtained from plasma for sequencing in the sequencer. MUG003P -59- In some examples, the apparatus includes a database of reference genome sequences and/or standard regulation networks and regulation models. The computational apparatus may be further configured to instruct the one or more processors to map the cfDNA fragments obtained from the blood of the individual to the database of reference genome. The computational apparatus may be configured to instruct the one or more processors to determine at least one regulation network obtained from the analysis of cfDNA in a sample as described herein. The computational apparatus may be configured to compare the at least one regulation network obtained from the analysis of cfDNA with one or more standard regulation networks or regulation models as described herein. In general, the computational apparatus may perform all steps of the method described herein that can be performed by such an apparatus. Analysis of the sequencing data and the results derived therefrom are typically performed using computer hardware operating according to defined algorithms and programs. 
Therefore, certain embodiments employ processes involving data stored in or transferred through one or more computer systems or other processing systems. Embodiments of the invention also relate to an apparatus for performing these operations. This apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer. In some embodiments, a group of processors performs some or all of the recited analytical operations collaboratively (e.g., via a network or cloud computing) and/or in parallel. A processor or group of processors for performing the methods described herein may be of various types including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and other devices such as gate array ASICs, digital signal processors, and/or general purpose microprocessors. In addition, certain embodiments relate to tangible and/or non-transitory computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices, magnetic media such as disk drives, magnetic tape, optical media such as CDs, magneto-optical media, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer readable MUG003P -60- media may be directly controlled by an end user or the media may be indirectly controlled by the end user. Examples of directly controlled media include the media located at a user facility and/or media that are not shared with other entities. 
Examples of indirectly controlled media include media that is indirectly accessible to the user via an external network and/or via a service providing shared resources such as the "cloud." Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. According to one embodiment, the use of a computer-implemented method is described herein, wherein said computer-implemented method is used in a method described herein, specifically in an in vitro method described herein. Specifically, a computer-implemented method is used for the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; and ii. performing at least one of the steps of a method described herein, specifically of an in vitro method described. According to one embodiment, the computer-implemented method described herein comprises the step of receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample. This step is to be understood as being equal to the steps of extracting cfDNA fragments from the sample, and performing whole genome sequencing on the extracted cfDNA fragments, as described in the in vitro methods described herein. The following items are described herein: 1. An in vitro method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. extracting cfDNA fragments from the sample; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; MUG003P -61- d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and iv. 
comparing the at least one network of iii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof. 2. A computer-implemented method for analyzing the cell and/or tissue origin of cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network, and iii. comparing the at least one network of ii. with one or more standard regulation networks or regulation models characteristic for a specific tissue or cell comprising at least one network selected from a TF-gene network, a TF-TF network, a TF-DHS network, a DHS-gene network, and any combination thereof. 3. An in vitro method for determining the health status of a subject comprising the steps of: i. extracting cfDNA fragments from a sample from the subject; ii. performing whole genome sequencing on the extracted cfDNA fragments; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; MUG003P -62- iv. comparing the at least one regulation network of iii. with one or more standard regulation networks or regulation models derived from healthy subjects, and/ or unhealthy subjects; wherein a. 
congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. 4. A computer-implemented method for determining the health status of a subject comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iii. comparing the at least one regulation network of ii. with one or more standard regulation networks or regulation models derived from healthy subjects, and/ or unhealthy subjects; wherein a. congruence with the standard regulation network or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or b. congruence with the standard regulation network or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status. 5. 
The in vitro method of item 3 or the computer-implemented method of item 4, wherein the standard regulation network or regulation model derived from unhealthy subjects is derived from subjects suffering from a condition selected from cancer, MUG003P -63- specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer; inflammation; autoinflammatory diseases, coronary disease, acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; and/or asthma, or thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; and diabetes; processes associated with aging. 6. An in vitro method for monitoring the treatment success of a patient comprising the steps of: i. extracting cfDNA fragments from a sample of said patient; ii. performing whole genome sequencing on the extracted cfDNA fragments; iii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; iv. comparing the at least one network of iii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iv. provide information on the treatment success of the patient. 7. A computer-implemented method for monitoring the treatment success of a patient comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from a sample; ii. determining at least one of the regulation networks selected from the group consisting of: a. a TF-gene network; b. a TF-TF network; c. 
a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network; MUG003P -64- iii. comparing the at least one network of ii. with one or more regulation networks of a previous result from said patient and/or with one or more standard regulation networks characteristic for the treatment success, wherein differences and/or congruences obtained in iii. provide information on the treatment success of the patient. 8. The in vitro method of item 6 or the computer-implemented method of item 7, wherein the treatment success is determined for diseases selected from cancer, specifically colorectal cancer, prostate cancer, colon cancer, breast cancer, bladder cancer, and/or lung cancer inflammation; autoinflammatory diseases, coronary disease, acute tissue damage; chronic disease, specifically a chronic disease affecting the gastrointestinal tract, more specifically Crohn’s disease or ulcerative colitis, or chronic obstructive pulmonary disease; and/or asthma, or thyroiditis; complications during pregnancy; beginning sepsis; sepsis; hypertension; obesity; and diabetes; processes associated with aging. 9. The in vitro method or the computer-implemented method of any one of the preceding claims, wherein determining the TF-gene network comprises the steps of: a. determining the actively transcribed TFs; b. determining the tissue-specificity of the actively transcribed TFs from a.; c. determining the gene sets which the TFs from a. activate in each tissue determined in b.; d. evaluating if the gene sets are transcribed; e. determining the intersect for identical and different genes from the gene sets; and f. determining the network from the data obtained from e.. 10. 
The in vitro method or the computer-implemented method of any one of the preceding items, wherein the actively transcribed TFs are deduced from the coverage pattern at the transcription start site (TSS), preferably said coverage pattern comprises the nucleosome depleted region (NDR) and the 2K regions; the position of upstream and downstream nucleosome position patterns; the transcription factor binding sites (TFBS) accessibility; and optionally further from the relative entropy at TSS and TFBS. 11. The in vitro method or the computer-implemented method of any one of the preceding items, wherein determining the TF-TF network comprises the steps of: a. assessing the accessibility of the respective TFBS of each TF; MUG003P -65- b. optionally determining overlapping binding sites in TFs; c. determining the TF-TF interaction; d. correlating the accessibility obtained from a. with the interaction obtained from c.; and e. determining the network from the data obtained from d.. 12. The in vitro method or the computer-implemented method of any one of the preceding items, wherein determining the gene-gene network comprises the steps of: a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. correlating the genes according to their expression status; and c. determining the network from the data obtained from b.. 13. The in vitro method or the computer-implemented method of any one of the preceding items, wherein determining the TF-DHS network comprises the steps of: a. determining the actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the network from the data obtained from c.. 14. 
The in vitro method or the computer-implemented method of any one of the preceding items, wherein determining the DHS-gene interaction network from DNA sequences of cfDNA fragments comprises the steps of: a. determining the gene expression status by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. determining maps of accessible distal DHSs; c. correlating the gene expression status of a. with the maps of accessible distal DHSs of b.; and d. determining the network from the data obtained from c.. 15. The in vitro method or the computer-implemented method of any one of the preceding items, wherein the TF-DHS network and the DHS-gene network are combined into a TF-DHS-gene network. MUG003P -66- 16. A model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the method of any one of the preceding items. 17. A data processing apparatus comprising means for carrying out the computer-implemented method of any one of the preceding items. 18. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method of any one of the preceding items. 19. A computer-readable medium having stored thereon the computer program of item 18. EXAMPLES Example 1: sample preparation For this molecular testing, 10-20mL of blood are drawn from the subject into specialized tubes for the stabilization of cfDNA. These tubes contain a proprietary blend of reagents that both prevents blood coagulation and stabilizes white blood cells. This is critical, as white blood cells can release their DNA into the circulation, which masks the organ- or tumor-specific signal that is to be detected and profiled downstream. 
As an alternative to the specialized tubes, standard tubes such as EDTA tubes can be used; in such cases, processing of the blood has to be started within a defined time range. To separate plasma from whole blood, the tubes are centrifuged in two steps at 1,900 x g for 10 minutes each. DNA is isolated from 2-6 mL of plasma using a cfDNA-specific extraction kit. Double-stranded DNA (dsDNA) originating from plasma is then quantified. On average, 1 mL of plasma from a patient with cancer contains approximately 1,500 diploid genome equivalents (GE) (~10 ng of DNA), with considerably higher amounts often observed in patients with metastatic cancer. A typical 10 mL blood draw yields on average 4 mL plasma containing 6,000 GE (12 x 10^3 molecules per region or gene). For library preparation, 10 ng of DNA are used as input and PCR is performed to selectively amplify library fragments containing adapters for subsequent sequencing. Libraries are then quantified and sequenced in paired-end mode (150 bp x 2 or 100 bp x 2) at high coverage (~30x). For humans, 30x coverage can be achieved with 600 million reads of 150 bp (or 300M paired-end reads). After quality control and pre-processing of sequencing reads (e.g. adapter trimming, base quality filtration, GC-correction), the data are further analyzed, e.g., at least one of the regulation networks described elsewhere herein is determined.
Example 2: Computation of nucleosome dyad prior distribution
The basic principle is illustrated in Figure 8. (left side: assumption 1) The lower coverage plot illustrates three regions where different numbers of cfDNA fragments map. To the left is a locus with high coverage of sequencing reads, indicating high resistance to enzymatic digestion. The region on the right side has fewer sequencing reads, suggesting moderate resistance to cleaving. In contrast, the region in the center hardly overlaps with any cfDNA sequencing reads, which indicates no protection from enzymatic digestion.
As nucleosomes offer protection from enzymatic digestion during apoptosis and as the nucleosomal dyad is the region where nucleosomal DNA is most tightly bound, it is possible to translate the sequencing depths into nucleosome position maps where the position of maximum coverage overlaps with the nucleosome dyad (dashed light grey line in the upper coverage plot, the position of the inferred nucleosome dyad axis). Hence, nucleosomal dyad positions are inferred from sequencing read depth analysis for each cfDNA fragment in this step. (right side: assumption 2) The individual cfDNA fragment nucleosomal dyad information is then used to infer within each cfDNA fragment the relative position of the dyad (block arrows; left panel “Nucleosomal Fragments”). The nucleosomal dyad may map in the center of a cfDNA fragment, or may be somewhere off the center, or may not be determinable at all. The next step involves fragment length-specific dyad statistics. The inferred nucleosome dyad positions are recorded for all fragments that map to the same locus and have the same length (center panel, black triangles). These statistics are then translated into inferred (hypothetical) nucleosome positions over cfDNA fragments with a specific length (right panel). The empiric distribution of nucleosome dyad locations per fragment length is referred to as “nucleosome prior distribution” (yi). After this step, nucleosomal prior distributions for all cfDNA fragments with a particular length are available. Transformation of empiric count distributions to nucleosome prior distributions The principle is shown in Figure 9. 
The initial dyad count distribution for a specific fragment length is first truncated according to a certain strategy (step 2; the strategy shown is fragment length-based truncation), then normalized to an area under the curve of one to resemble a probability density function (step 3) and finally, the non-informative constant portion of counts is removed by adjusting the zero level (step 4).
Overview heatmap of dyad count distributions with different distribution truncation strategies. The principle is shown in Figure 10. Count distributions are shown relative to the central base(s) (medium gray vertical line) of fragments. Counts for fragment lengths from 50 bp to 300 bp are depicted. Inferred dyads were counted beyond fragment ends, which are indicated by the medium gray dashed lines. The distributions shown in the figure are used for computing prior distributions of nucleosome dyads. Medium gray areas indicate low counts. The transition from medium gray to darker gray and further to lighter grays up to white, as in the center of the figure, indicates increasing counts. The darker spots to the far left and right of the center mark increases in counts that can be attributed to neighboring nucleosomes of the observed fragment. How precisely neighboring nucleosomes are positioned relative to the nearest one (based on fragment length) can be derived from the spread of the spot for that fragment length. The most accurate neighboring dyad positioning seems to exist for fragments between 200 bp and 220 bp. Horizontal bands of different grays appearing approximately every 10 bp vertically for smaller fragments up to a length of about 160 bp originate from the increasing cleaving resistance based on the DNA helix twisting with an approximate 10 bp periodicity, which causes steric hindrance of the cleaving process by making the DNA backbone face towards the histone complex in the same periodic manner.
White lines indicate the count minima that are closest to the fragment center, which would likely be chosen by the short-range truncation method. Fragment length truncation would terminate count distributions at the fragment ends instead (medium gray dashed lines). The uniform truncation strategy would use an identical distance from the center of each fragment to the truncated bases at both sides (end-to-end distance here: 170bp; white dotted lines). An unmarked version of the count heatmap is shown in the small panel on the top right.
Nucleosome occupancy pattern from nucleosome priors. The principle is shown in Figure 11. Illustration of how the average dyad signal (^), i.e., the nucleosome posterior signal (bottom panel), is computed from the previously computed fragment-specific prior distributions (yi; center panel). In the first step, DNA fragments are replaced by their characteristic nucleosome prior distributions, and the per-base average across these distributions yields the nucleosome posterior signal. After this step, a detailed map of posterior nucleosomal dyad positions across the human genome can be computed via peak calling.
Example 3: TF-gene interaction network
Elucidation of TF to gene regulatory relationships from cfDNA data is described. The aim is to reconstruct the TF-gene interaction network from cfDNA data, i.e., the regulatory connections from TFs to their target genes within the GRN. Data on TFs and regulated genes can be retrieved from various sources, e.g., from the PANDA (Passing Attributes between Networks for Data Assimilation) (Guebila et al., 2022b) or the GRAND websites (https://grand.networkmedicine.org) (Guebila et al., 2022a). As an example, Figure 2 illustrates downstream-regulated genes for the TFs HNF4G and HNF4A in whole blood and colon adenocarcinoma. In the respective tissues, different genes are affected by these two TFs.
Figure 2 shows the following: To explore the GRN of TFs in different tissues, HNF4G (a) and HNF4A (b) were analyzed as examples in whole blood (left panel) and colon adenocarcinoma (right panel) employing the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). In both tissues, these two TFs regulate the expression of different gene sets. Figure 2a (panel HNF4G, blood) shows the following: The GRN of the TF HNF4G in whole blood was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight. The following Table 1 displays the top twenty-one genes associated with HNF4G in blood: Table 1
Figure imgf000070_0001
Figure imgf000071_0001
Figure 2a (panel HNF4G, colon adenocarcinoma (TCGA)) shows the following: The GRN of the TF HNF4G in colon adenocarcinoma (TCGA) was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight. The following Table 2 displays the top twenty-one genes associated with HNF4G in colon adenocarcinoma (TCGA). Table 2
Figure imgf000071_0002
Figure 2b (panel HNF4A, blood) shows the following: The GRN of the TF HNF4A in whole blood was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight. The following Table 3 displays the top twenty-one genes associated with HNF4A in blood. Table 3
Figure imgf000072_0001
Figure 2b (panel HNF4A, colon adenocarcinoma (TCGA)) shows the following: The GRN of the TF HNF4A in colon adenocarcinoma (TCGA) was established using the GRAND database (https://grand.networkmedicine.org) (Guebila et al., 2022a). The graph shows the results for the top 10 genes; the thickness of the arrows reflects the edge weight. The following Table 4 displays the top twenty-one genes associated with HNF4A in colon adenocarcinoma (TCGA). Table 4
Figure imgf000073_0001
As mentioned above, at the TF level, the transcribed status is reflected in the typical TSS pattern, i.e., in the NDR and 2K region (Ulz et al., 2016), in the distances of upstream and downstream nucleosomes (FFT, STFT), and in the accessibility of the corresponding TFBSs. The transcribed status of the targeted genes can again be deduced from the typical TSS pattern. Accordingly, building the TF-gene interaction network in cfDNA consists of several steps (Figure 3a). First, a list of actively transcribed TFs is generated from cfDNA data. The activity (transcription) of TFs is deduced from the coverage pattern at the TSS, i.e., the NDR and 2K regions. In addition to the TSS data, the positions of upstream and downstream nucleosomes are included to establish the transcription status of the respective TFs. Furthermore, for each TF, the accessibility of the respective TFBSs should be increased (Ulz et al., 2019), which represents another parameter for TF status assessment included in the model. Hence, the activity of a TF is assessed by a combination of several factors (TSSs (i.e., NDR/2K), nucleosome positions, TFBS accessibility). Further parameters, such as the (relative) entropy at TSSs and TFBSs, may be included in the evaluation. This step reveals all TFs with evidence for active transcription in the analyzed cfDNA sample. Second, from the actively transcribed TFs, the TFs with reliable information on tissue-specificity are selected. The information about tissue-specificity can be taken from resources such as Lambert et al. (2018), the TF-Marker database (Xu et al., 2022), or other sources. Third, for each transcribed TF, the gene sets that the TF activates in different tissues are analyzed (TSS (NDR/2K) and nucleosome positions (distances between nucleosomes)). Hence, for each TF, the gene set activated by this TF in tissue A, then the gene set activated in tissue B, and so on, are evaluated separately.
These gene sets can be retrieved from various resources (e.g., the GRAND database). Fourth, once these analyses are complete, the gene sets analyzed in the third step are evaluated for the strongest evidence that they are active, i.e., transcribed. For example, one option is to establish a ranking order according to the evidence for the tissue to which they correspond. The ranking of these gene sets makes it possible to estimate the cell abundance in cfDNA. As mentioned above, cfDNA represents a mixture of DNA released from different tissues, and the composition may change depending on physiologic or pathological conditions. Hence, multiple ranking lists of gene networks, i.e., one ranking list for each TF, are generated. Numerous data sets are thus obtained to reconstruct which tissue contributed what percentage to the cfDNA pool. Fifth, these gene lists are compared and filtered for the genes that are common to or differ between the lists (Figure 3b). The rationale is that the gene networks regulated by tissue-specific TFs (e.g., hematopoietic or GI-specific TFs, as shown by way of example in Figure 3b) will have similarities and overlap (Sonawane et al., 2017). Establishing such similarities further increases the resolution and specifies the contribution of individual tissues to the cfDNA pool with improved precision.
Figure 3 (Construction of TF-gene interaction networks) describes the following: (a) The individual steps in building the TF-gene interaction network in cfDNA are indicated in the rectangles. First, a list of actively transcribed TFs is generated (for details, see text). Second, from the actively transcribed TFs, the TFs with reliable information on tissue-specificity are selected. Third, for each transcribed TF, the specific gene sets that the TF activates in different tissues are analyzed. Each gene set is intersected with a BED file containing the regulatory regions of the genes.
In addition, the TSS (NDR/2K) and nucleosome positions (distances between nucleosomes) are evaluated. Fourth, the tissue-specific gene sets are evaluated for the strongest evidence that they are active, i.e., transcribed. One option for further processing is to establish a ranking order according to the evidence for the tissue to which they correspond. Other options may include employing neural networks or autoencoders. As these analyses will be repeated for each TF, several such ranking lists will be generated, providing an accurate pattern of tissue contributions. (b) Finally, the gene sets are intersected for similarities and differences. Illustrated are examples of TFs with high specificity in hematopoiesis and the GI tract. For example, TF1 has a high specificity for hematopoietic cells. It regulates gene set 2 in “tissue B”, which may, in this example, be neutrophils. Furthermore, it regulates gene set 13 in “tissue M”, e.g., lymphocytes. TF1 may also be involved in controlling gene sets in other organs, e.g., gene set 15 in “tissue O”, which may, in this example, be the kidney. If the kidney does not contribute DNA to the cfDNA pool (as expected in healthy controls), the signature specific to gene set 15 should not be detectable in the respective DNA sample. The gene sets for hematopoietic TFs should have higher similarities to each other than to those for TFs specific to the GI tract (in this example, TF1 and TF2 both control the same gene set, 13). The intersection of similarities and differences between TFs with similar tissue-specificity further increases our strategy's resolution.
Example 4: TF-TF interaction network
Establishment of cooperative interactions between TFs in each cfDNA sample, i.e., deciphering of the TF networks that cooperate in each cfDNA sample, is described. Regarding TF-TF networks, the regulatory process is multi-faceted.
The genomic locations where TFs may bind, i.e., the TFBSs, are usually computationally estimated using DNA recognition sequences, i.e., motifs. However, focusing only on predicting the genomic locations of TF binding does not help deduce GRN relationships. TFs may work together by forming protein complexes. Consequently, a member of a TF complex may regulate a target gene even without a corresponding binding site in the regulatory region of that gene. From protein-protein interaction (PPI) data, it is known that TFs often form multi-protein complexes that carry out regulatory functions. Therefore, investigating only an initial set of motif locations does not include cases where TFs bind to the DNA without a corresponding recognition sequence (motif). Furthermore, not all TFBSs are functionally relevant or active. In the following, it is outlined how evidence for TF interactions can be established from cfDNA data (Figure 4). First, for each TF, the accessibility of the respective TFBSs is assessed. This reveals different accessibilities, e.g., high, medium, or low accessibility (Figure 4a). Second, an important issue is how TFs share TFBSs, e.g., due to sequence homology or other factors. As many TFs bind in large complexes, overlapping binding sites of TFs that regulate the same genes are expected to be found in the GTRD database. The most recent GTRD version 21.12 (Kolmykov et al., 2021) (https://gtrd.biouml.org/) harbors information on 1,391 TFs. Hence, the procedure includes a constantly updated screening for TFBS overlaps based on the latest versions of the respective databases (Figure 4b, lower panel). Third, PPI data are employed to explore established TF cooperation in cfDNA. PPIs can be obtained from public interaction databases such as PINA2, STRING, IntAct, and BioGRID or from publications, e.g., Göös et al. (2022). It is suggested that TFs prefer to form more transient or proximal interactions than stable protein complexes.
Furthermore, marked differences in the number of detected PPIs between different TF families are observed. There are two possible scenarios based on the fact that TFs that cooperate within the same tissue should show a high concordance of their accessibility in cfDNA:
1. If TFs cooperate exclusively in the same tissue, their accessibility patterns should show a strong correlation. In these cases, it will be possible to deduce relationships such as: if TFn has increased accessibility, so should TFm. For example, if HNF4A has increased accessibility, the accessibility of TYY1 should likewise be increased. Figure 4d illustrates an example of close cooperation between TF1 and TF2, but not TF3, in the hematopoietic system. Examples of co-expressed, cell-type-specific TFs exist. For example, most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells (Figure 4e). These transcriptionally defined major cell types correspond broadly, but not precisely, to the basic histological types in which tissues are usually classified (Breschi et al., 2020). Hence, the cooperative accessibility of these TFs has a high power to evaluate the balance between hematopoietic-derived DNA and epithelial DNA within a cfDNA sample with high precision.
2. If TFs cooperate with other TFs in different tissues, the pattern of correlations in cfDNA changes according to the contribution of different tissues to the cfDNA. This is depicted in Figure 4f, where TF1 cooperates with TF2 in hematopoietic cells and with TF3 in the GI tract. In cfDNA samples from individuals without diseases of the GI tract, i.e., without GI-derived DNA in the circulation, the TF1-TF2 pattern will be concordant because of the low contribution of GI-derived DNA to the cfDNA pool.
However, if the contribution of DNA from the GI tract increases in the circulation, the correlation between the accessibility patterns of TF1 and TF3 increases, whereas the correlation between TF1 and TF2 decreases. Hence, TF correlation matrices can elucidate which TFs cooperate and establish a TF-by-TF “cooperativity network”, particularly if DNA from various tissues contributes to the cfDNA pool. Combined with tissue deconvolution means, i.e., methods that establish which tissues contribute what amount to the cfDNA pool, TF interactions can be established for the various tissues that release DNA into the circulation.
Figure 4 (Construction of TF-TF interaction networks) describes the following: (a) TFBSs may show different accessibilities, ranging from high to medium or low or not accessible. (b) The present procedure includes regular screening for potential TFBS overlaps based on the latest versions of the respective databases. (c) The TF HNF4A is depicted as an example of establishing TF cooperations by incorporation of PPI data. The scheme is based on data from Göös et al. (2022) and reveals the TFs NFIB, NFIA, ELF2, TYY1, CREB1, and P53 as cooperation partners of HNF4A. (d) In this example, PPI data suggest strong cooperation between TF1 and TF2 but not with TF3 in hematopoietic lineages. Hence, TF1 and TF2 show concordant TFBS accessibilities in most cfDNA samples, whereas the accessibility of TF3 is independent of the TF1 and TF2 patterns. (e) Left panel: Correspondence between transcriptionally derived major cell types and classical histological types. Right panel: Network of the most strongly co-expressed cell-type-specific TFs, shown here as an example for endothelial cells. Data based on Breschi et al. (2020). Node shapes indicate the availability of a sequence motif: (square) available; (circle) not available. (f) In this example, TF1 cooperates with TF3 in addition to TF2.
However, the TF1-TF2 cooperation is confined to the hematopoietic system, whereas the TF1-TF3 cooperation is in the GI tract. In cfDNA samples from individuals without diseases of the GI tract, the TF1-TF2 pattern will still be concordant due to the low contribution of GI-derived DNA to the cfDNA pool. However, if the contribution of DNA from the GI tract increases in the circulation, the correlation between the accessibility patterns of TF1 and TF3 increases, whereas the correlation between TF1 and TF2 decreases.
Example 5: Gene-gene interaction network
Estimation of whether (pairs of) genes are co-regulated (Figure 5) is described. Figure 5 (Construction of gene-gene interaction networks) describes the following: (a) Core gene set approach: Distinct core gene sets are defined. The combined coverage profile of these genes reveals information about their expression status. A decreased coverage at the NDR and an oscillating coverage pattern upstream and downstream indicate high expression. In contrast, a relatively flat pattern indicates that these genes are unexpressed (upper panels). Another representation is to plot the data in a coordinate system (lower panels). The y-axis represents the coverage at the NDR, and the x-axis the coverage within the 2K region, i.e., the 2 kb flanking the NDR, the region with the oscillating coverage pattern. This results in one dot for each gene group, referred to as a centroid. For example, centroids for core gene sets that correspond to the major cell types epithelial, endothelial, mesenchymal, neural, and blood cells are shown. In addition, centroids for housekeeping genes (HK genes), which are usually highly expressed, and for unexpressed genes according to the protein atlas (PAU, protein atlas unexpressed genes) are used as references to estimate the expression status of gene sets.
In a healthy individual, the centroid for a blood core gene set should be close to the HK gene set because the vast majority of cfDNA is derived from the hematopoietic system. In contrast, as epithelial cells do not contribute significant quantities of DNA to the cfDNA pool, the epithelial centroid should be in the vicinity of the PAU centroid. In a cfDNA sample from a patient with cancer, the contribution of epithelial DNA increases, and the relative proportion of blood-derived DNA decreases, which becomes apparent by shifts of these centroids. Hence, a core gene set approach can estimate the various contributions of different tissues to the cfDNA pool. (b) Single-gene strategy: The expression of each gene is estimated based on the presence or absence of a nucleosome at the NDR. The presence of a nucleosome at the NDR means that this gene cannot be expressed, as the nucleosome blocks the bulky transcription machinery from binding. In contrast, the absence of a nucleosome indicates that the gene may be expressed. In some cases, a gene may be in a poised state, meaning it is not expressed although the NDR is nucleosome-free to enable rapid transcription initiation. One option to build a gene-gene interaction network is to generate co-regulatory gene networks based on the NDR nucleosome pattern. One network consists of genes with a nucleosome-blocked NDR, whereas the other consists of genes with a nucleosome-free NDR.
Co-regulation of genes from cfDNA can be determined with two strategies: one strategy involves defining core gene sets and determining their combined expression pattern in cfDNA. Another strategy investigates single genes. This may utilize the nucleosome priors strategy, where the presence or absence of a nucleosome at a gene's TSS is used as a proxy for gene expression. Core gene set strategy: Distinct core gene sets are defined; the design of these gene sets depends on the question to be addressed.
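The centroid representation of the core gene set approach (Figure 5a) can be sketched as follows. This is a minimal illustration under stated assumptions: each gene is summarized by a hypothetical pair (2K coverage, NDR coverage), the centroid of a gene set is taken as the mean of these pairs, and proximity to the HK or PAU reference is measured by Euclidean distance. None of these choices is prescribed by the text, and all coverage values below are invented.

```python
from math import dist
from statistics import fmean

def centroid(gene_points):
    """Mean (2K coverage, NDR coverage) over all genes of one set."""
    xs, ys = zip(*gene_points)
    return (fmean(xs), fmean(ys))

def closer_reference(point, hk_centroid, pau_centroid):
    """Report whether a centroid lies nearer to the HK or the PAU reference."""
    if dist(point, hk_centroid) <= dist(point, pau_centroid):
        return "HK"
    return "PAU"

# Hypothetical normalized coverages: (x = 2K region, y = NDR).
hk = centroid([(0.90, 0.50), (0.90, 0.50)])         # highly expressed reference
pau = centroid([(1.00, 1.00), (1.00, 1.00)])        # unexpressed reference
blood = centroid([(0.90, 0.55), (0.92, 0.60)])      # blood core gene set
epithelial = centroid([(1.00, 0.95), (0.98, 1.00)]) # epithelial core gene set

blood_status = closer_reference(blood, hk, pau)
epithelial_status = closer_reference(epithelial, hk, pau)
```

With these invented numbers the blood centroid sits next to the HK reference and the epithelial centroid next to the PAU reference, which is the pattern the text expects for a healthy individual; a shift of the epithelial centroid towards HK would correspond to an increased epithelial contribution.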
For example, core gene sets are defined corresponding to major cell types using extensive new maps of RNA transcripts in a broad range of primary cell types (Figure 5a). Core transcriptional programs define the morphology and function common to a few major cellular types, which are at the root of the hierarchy of the many cell types that exist in the human body, i.e., epithelial, endothelial, mesenchymal, neural, and blood cells (Breschi et al., 2020). Genes whose expression is specific to these cell types are identified. From these genes, the contribution of the major cell types to the composition of human tissues was estimated, resulting in 2,871 genes (including 2,463 protein-coding genes; 283 long non-coding RNAs, and 125 pseudogenes) whose expression was specific to epithelial, endothelial, mesenchymal or melanocyte cell types (Breschi et al., 2020). There are many other options to define meaningful gene sets, e.g., subsets of PBMC specific genes according to their expression levels or organ-specific gene sets based on tissue-specificity, e.g., as indicated by the protein atlas. Single gene strategy: The expression of each gene is estimated based on the presence or absence of a nucleosome at the NDR (Figure 5b). The presence of a nucleosome at the NDR means that this gene cannot be expressed, as the nucleosome blocks the bulky transcription machinery from binding. In contrast, the absence of a nucleosome indicates that the gene may be expressed. In some cases, a gene may be in a poised state, meaning it is not expressed; however, the NDR is nucleosome-free to enable a rapid transcription initiation. One approach to assess the NDR-nucleosome status is to use the nucleosome priors approach. In any case, it is possible to generate co-regulatory gene networks based on the NDR nucleosome pattern. 
One network consists of the genes with a nucleosome-blocked NDR, and the other network of genes with nucleosome-free NDRs, because co-regulated genes should exhibit similar expression patterns. If the NDR nucleosome status was established with the nucleosome priors strategy, not only the two states “NDR-blocked” vs. “NDR-free” are obtained but also intermediates, such as evidence for a blocked NDR in a certain percentage of the cfDNA. This information is included in the construction of our networks. The final regulatory network can be computed with various approaches and similarity metrics, such as Pearson correlation coefficients (PCC), modified Tanimoto similarity (Tfunction), Euclidean, squared Euclidean, standardized Euclidean, city block, Chebyshev, or cosine distance.
Example 6: TF-DHS and DHS-gene interaction network
The aim is to capture all relevant regulatory regions. The analyses of TF networks include TF-DHS and DHS-gene assignments to characterize different tissue-specific signatures (Georgolopoulos et al., 2021) (Figure 6). Figure 6 (TF-DHS and DHS-gene interaction network to include all (distant) relevant regulatory regions) describes the following: (a) In each cfDNA sample, we establish which TFs and genes are actuated. In parallel, maps of distal DHSs are generated. These TF, gene, and DHS signatures can be aligned with publicly available data to establish the interaction. In addition, recurrent TF-DHS-gene patterns can be selected and verified from cfDNA samples using well-defined cohorts. (b) Distal DHSs can be investigated at various distances to the NDR. This example displays DHSs 20-25 kb downstream of the TSS, with increased accessibility to a TF. These distal DHSs modulate the expression status of the gene. A detailed comparison between such distal DHSs and the NDR regions allows us to establish the interaction from cfDNA data.
Building the network: Regulatory elements can be located TF-agnostically by mapping DNase I hypersensitive sites (DHSs). DHSs indicate open or accessible chromatin where DNA is not tightly wrapped within a nucleosome, leaving the sequence accessible to DNA-binding proteins (Sheffield et al., 2013). DHSs mark all significant classes of cis-regulatory elements in their cognate cellular context. The systematic delineation of DHSs across human cell types and states has provided fundamental insights into many aspects of genome control (Vierstra et al., 2014). Detailed mapping of DHSs provides snapshots of regulatory element dynamics across the multidimensional landscape of cell types, environmental exposures, and developmental stages. Nucleosomes surrounding accessible promoters and TSS-distal DHSs are generally well-positioned. Nucleosomes surrounding DHSs are collectively well-positioned, but well-positioned nucleosomes are associated mainly with regulatory elements in an actuated state. Hence, nucleosome positioning depends mainly on the actuation of regulatory DNA (Stergachis et al., 2020). Promoter elements are larger and more accessible; even though they represent the minority of elements, they dominate the top end of the quantitative accessibility landscape. Promoters also exhibit far less cell type selectivity. TF binding in the proximal promoter region regulates gene expression by forming the preinitiation complex. Distal regulatory elements influence the rate of gene transcription by acting as activators or repressors. Therefore, inclusion of these distal regulatory factors in interaction network models provides a comprehensive assessment of gene regulation. Furthermore, distal elements, which constitute the vast majority, exhibit considerable lineage- and cell type-selectivity, typically with ‘on/off’ behavior, i.e., most elements are completely ‘off’ in most cell and tissue types.
Because of the biological differences between proximal and distal DHSs, and because the promoter regions, i.e., the TSS regions, are already included in the TF-gene and gene-gene interaction networks, the TF-DHS and DHS-gene interaction networks focus on distal DHS sites. They may overlap with some TFBSs; however, this strategy ensures that all regulatory regions, such as enhancers, are included in our analyses. Regarding enhancer regions, it is not well-established which genes are targeted by these distal elements through mechanisms such as DNA looping. An approach named ELMER (Enhancer Linking by Methylation/Expression Relationships) used DNA methylation to identify enhancers and correlated enhancer states with the expression of nearby genes to identify transcriptional targets. ELMER represents a statistical framework for identifying cancer-specific enhancers and paired gene promoters. There is a wealth of information about DHSs, e.g., from the ENCODE project. There are 3.6 million consensus DHSs with an average width of 204 bp (median 196 bp, interquartile range (IQR) 151–240 bp), which collectively span 665.57 Mb (21.55%) of the reference human genome sequence (Meuleman et al., 2020) and which can be utilized for our purposes. Furthermore, there are thousands of tissue-specific DHSs. Several strategies for defining the tissue-specific and universal DHS region-sets exist. For example, the Regulatory Elements Database (http://dnase.genome.duke.edu/celltype.php) can be screened for highly specific (or universal) clusters. The sites can then be merged (concatenated) to consolidate each cluster into a single region-specific (or universally accessible) dataset (Peneder et al., 2021). Chromatin accessibility landscapes have also been mapped in solid tumors, including breast cancer, colon cancer, glioblastoma, gastric cancer, and lung cancer (Minnoye et al., 2021). At present, TF-DHS and DHS-gene interaction networks are largely unexplored.
To build the network, it is first established which TFs and which genes are actuated. At the same time, maps of accessible distal DHSs are generated (Figure 6a). Then, available public data can be used (e.g., Georgolopoulos et al., 2021) to test whether putative TF-DHS and DHS-gene interactions are present in an analyzed cfDNA sample, e.g., by aligning the TF and gene status with the accessibility of the respective DHSs (Figure 6a). The relationships between TFs and genes via distal DHSs are modeled by integrating available DNase-seq data and generating edges where chromatin structure indicates that TFs are likely to bind and regulate gene expression. Hence, TF motif locations and the gene expression status deduced from coverage patterns and nucleosome positioning are overlapped with epigenetic data (open chromatin locations, here distal DHSs). Then the appropriate regions are connected with edges to construct a TF-DHS-gene regulatory network. Alternatively, cfDNA sample sets from well-defined cohorts can be employed and tested for recurrent patterns. For example, accessible TFBSs located outside of proximal promoters are mapped. Regulatory regions, i.e., accessible TFBSs for each actuated gene, are investigated at various distances upstream and downstream of the TSS (Figure 6b). Different distances can be studied for such potential regulatory regions, e.g., ±5-10kb, ±20-25kb, ±45-50kb, ±70-75kb, or ±95-100kb. This makes it possible to predict an epigenetically informed gene regulatory network. Hence, after the actuated TFs have been deduced from cfDNA data, distal regulatory DHSs upstream or downstream of the TSS are investigated in the next step. As these regulatory DHSs do not comprise thousands of DHSs but only a few, dedicated methods to establish their accessibility may be needed. For example, the distal DHSs of a core gene set can be combined to increase their number, or alternatively, a dedicated strategy, such as nucleosome priors, can be applied.
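The assignment of distal DHSs to the distance windows listed above can be sketched as follows. This is a minimal illustration, not the procedure used in the examples: the function names, the gene and DHS identifiers, the use of a single DHS center coordinate, and the strand-aware definition of "downstream" are assumptions made for this sketch. Only the window boundaries are taken from the text.

```python
# Distance windows from the text, in kb on either side of the TSS.
WINDOWS_KB = [(5, 10), (20, 25), (45, 50), (70, 75), (95, 100)]

def distal_window(tss, dhs_center, strand="+"):
    """Return (direction, (lo_kb, hi_kb)) if the DHS lies in a window, else None."""
    offset = dhs_center - tss
    if strand == "-":
        offset = -offset  # downstream follows the direction of transcription
    distance = abs(offset)
    for lo_kb, hi_kb in WINDOWS_KB:
        if lo_kb * 1000 <= distance <= hi_kb * 1000:
            direction = "downstream" if offset > 0 else "upstream"
            return direction, (lo_kb, hi_kb)
    return None

def dhs_gene_edges(genes, dhss):
    """Candidate DHS-gene edges for genes and DHSs on the same chromosome.

    genes -- {gene_name: (tss_position, strand)} for actuated genes
    dhss  -- {dhs_id: center_position} for accessible distal DHSs
    """
    edges = []
    for gene, (tss, strand) in genes.items():
        for dhs_id, center in dhss.items():
            hit = distal_window(tss, center, strand)
            if hit is not None:
                edges.append((dhs_id, gene, *hit))
    return edges

# Hypothetical coordinates for illustration only.
example_edges = dhs_gene_edges(
    {"GENE_A": (100_000, "+")},
    {"dhs1": 122_000, "dhs2": 93_000, "dhs3": 101_000},
)
```

In this toy input, `dhs1` falls into the 20-25 kb downstream window and `dhs2` into the 5-10 kb upstream window, while `dhs3` is too close to the TSS to count as distal. A TF-DHS edge could be added in the same way by testing whether a TF motif overlaps the DHS, which is omitted here.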
Example 7: Building a comprehensive model
The four different interaction networks are combined (Figure 7). Figure 7 (Building a comprehensive model from cfDNA) describes the following: Well-defined cohorts will be scrutinized by employing the four interaction networks. By combining these four networks, their different network topologies are established. A variety of machine learning / artificial intelligence approaches, such as neural networks or autoencoders, may be employed.
Defined sets of cfDNA samples are employed, i.e., samples from well-defined cohorts that represent not only two states (disease X vs. healthy) but individuals with clinically annotated diseases (disease A, B, C,…) or physiologic conditions (e.g., age, obesity, and so on). By combining the four described networks, their different network topologies are established. Artificial intelligence approaches, e.g., machine learning or convolutional neural networks, are applied to identify cCRE signatures capable of predicting the respective biological/medical condition and to achieve generalized models that also work with a smaller number of samples (Figure 7).
Example 8: Learning data (to identify patterns and regularities)
The cooperating TF-gene, TF-TF, TF-DHS, gene-gene, and DHS-gene interaction networks, and the combined TF-DHS-gene network, are generated from cfDNA data employing well-defined cohorts. One cohort comprises healthy controls of both sexes and all age groups. “Healthy” is a relative term, and therefore, samples are also collected from “healthy individuals” with common “co-morbidities”, such as hypertension, obesity, diabetes, and so on. Furthermore, cohorts of well-defined diseases or conditions (e.g., chronic diseases) involving mainly specific organs are needed. For example, individuals with colorectal cancer or chronic diseases affecting the GI tract (e.g., Crohn’s disease, ulcerative colitis) are suited to evaluate GI-specific interaction networks in cfDNA.
Example 9: Generating or determining TF-TF and TF-gene networks by developing a similarity matrix from all TF profiles
Establishing various networks from cfDNA is described in the present invention. In the following example, details of determining or generating TF-TF and TF-gene networks are described and results are provided. In general, interacting TFs should have highly similar cfDNA patterns. To assess TF similarity, a similarity matrix was generated using correlation coefficients or Gaussian kernel-transformed distance metrics. As an example, the following algorithm 1 can be used for establishing such a similarity matrix (D = distance matrix; S = similarity matrix; C = correlation matrix; 1058: the number of currently used TFs; 4000: the number of positions around the center of provided genomic intervals; T: transcription factor profile matrix; R: set of real numbers; X: matrix of size n x m (in the context of f, this means that the function maps a matrix of this size to a square matrix of size n x n); Y: matrix of size n x n returned by function f):
Figure imgf000084_0001
Figure imgf000084_0002
Figure imgf000084_0004
Figure imgf000084_0003
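Algorithm 1 is reproduced only as an image in the publication, so the following Python sketch is merely one plausible reading of the description above, not the authors' implementation. It builds a correlation matrix C and a Gaussian-kernel similarity matrix S from a small TF profile matrix T. The function names, the use of Euclidean distance for D, the bandwidth `sigma`, and the toy profiles are all assumptions made for illustration; a real T would have 1058 rows and 4000 columns.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: correlation undefined, treated as 0 here
    return cov / math.sqrt(vx * vy)

def correlation_matrix(T):
    """C[i][j] = Pearson correlation between TF profiles i and j (n x n)."""
    return [[pearson(a, b) for b in T] for a in T]

def distance_matrix(T):
    """D[i][j] = Euclidean distance between TF profiles i and j (assumed metric)."""
    return [[math.dist(a, b) for b in T] for a in T]

def gaussian_similarity(D, sigma=1.0):
    """S[i][j] = exp(-D[i][j]^2 / (2 * sigma^2)), mapping distances into (0, 1]."""
    return [[math.exp(-(d * d) / (2.0 * sigma ** 2)) for d in row] for row in D]

# Toy profile matrix T: 3 TFs x 5 positions (illustrative values only).
T = [
    [1.0, 0.8, 0.4, 0.8, 1.0],  # coverage dip at the centre
    [1.1, 0.9, 0.5, 0.9, 1.1],  # same shape, shifted upwards
    [0.4, 0.8, 1.0, 0.8, 0.4],  # inverted shape
]
C = correlation_matrix(T)
S = gaussian_similarity(distance_matrix(T), sigma=0.5)
```

In this toy input, the first two profiles have the same shape and therefore correlate almost perfectly, while the third, inverted profile correlates negatively with the first and receives a much lower kernel similarity.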
With this similarity matrix, based on nucleosome positions and open chromatin regions within a given cfDNA sample, it can be established which TFs show similar patterns, i.e., accessibilities. Those TFs with a similar/identical accessibility pattern belong to the same TF-TF network. To further confirm such a TF-TF network, it may be correlated with protein-protein interaction (PPI) networks, which can be retrieved from publicly available resources, such as the STRING database (https://string-db.org/). For example, the cfDNA-derived TF-TF network can be aligned to the PPI network by using an AND operation to select only the edges that are present in both networks. Furthermore, for each disease network, a community detection algorithm can be applied to identify communities in the network/graph that are densely connected. We used a Louvain community detection algorithm in our examples, but other strategies may be applicable. Finally, enrichment analyses can be applied to investigate the function, process, pathway, and so on for each community. Furthermore, prototype disease networks can be generated, identifying modules commonly present in a specific disease. This is achieved by merging sample-specific networks (like co-activation networks generated with algorithm 1), representing each edge between two nodes in the final network as the number of sample-specific networks having that edge divided by the total number of sample-specific networks used (mean of edges between two nodes). Disease-specific networks become extremely useful for classification because a distance to these prototype networks can be calculated and used to assign a sample-specific network to a disease represented by a prototype network. To obtain disease-specific networks, one can calculate differences between networks. For example, condition A (Ga) may represent a disease state, e.g., prostate cancer (but not limited to cancer), and condition B (Gb) a healthy state.
To facilitate the identification of differences between these networks, the two networks can be subtracted from each other (Ga-Gb) by removing common edges as shown in the following example:
Figure imgf000085_0001
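The network operations described above (the AND operation against a PPI network, merging sample-specific networks into a prototype, and the subtraction Ga-Gb) can be sketched with plain Python sets. This is a minimal illustration under stated assumptions: networks are treated as unweighted, undirected edge sets, the edges listed are made up and are not results from the publication, and `distance_to_prototype` is a hypothetical distance chosen for simplicity, since the text does not specify one.

```python
from collections import Counter

def edge(a, b):
    """Undirected edge stored as an order-independent tuple."""
    return tuple(sorted((a, b)))

def intersect(g1, g2):
    """AND operation: keep only the edges present in both networks."""
    return g1 & g2

def subtract(ga, gb):
    """Ga - Gb: remove from Ga every edge that also occurs in Gb."""
    return ga - gb

def prototype(sample_networks):
    """Edge weight = fraction of sample-specific networks containing the edge."""
    counts = Counter(e for g in sample_networks for e in g)
    n = len(sample_networks)
    return {e: c / n for e, c in counts.items()}

def distance_to_prototype(g, proto):
    """Hypothetical distance: sum of |edge indicator - prototype weight|."""
    edges = set(proto) | g
    return sum(abs((1.0 if e in g else 0.0) - proto.get(e, 0.0)) for e in edges)

# Toy sample-specific networks (illustrative edges only).
pc1 = {edge("AR", "FOXA1"), edge("AR", "NKX3-1"), edge("MYC", "MAX")}
pc2 = {edge("AR", "FOXA1"), edge("MYC", "MAX")}
healthy = {edge("MYC", "MAX")}
ppi = {edge("FOXA1", "AR"), edge("MAX", "MYC")}

pc_proto = prototype([pc1, pc2])        # prototype of the disease cohort
disease_only = subtract(pc1, healthy)   # Ga - Gb
ppi_supported = intersect(pc1, ppi)     # edges confirmed by the PPI network
```

With these toy inputs, the subtraction keeps only the two AR edges, and the healthy network lies farther from the disease prototype than either disease sample does.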
The following results were obtained with these strategies: Figure 12 shows the complete TF-TF network, derived from cfDNAs from a patient with prostate cancer and a healthy individual. In this example, 24 communities, i.e., groups of TFs that are very densely connected, were identified altogether. This enables the closer investigation of particular TF-TF subnetworks or communities of interest. For example, prostate cancer is hormone-dependent, and the activation of the androgen receptor (AR) (in Figure 12, shown in the centermost cluster of communities) has a central role in disease initiation and progression, whereas communities of decreasing importance can be seen at the outermost layer of the plot. Figure 13 displays the TF-TF interaction network for the community with AR from the example in Figure 12. Figure 13 refers to the data in the following table 5:
Table 5
Figure imgf000086_0001
As a proof of concept of determining a regulation network, the algorithm found previously reported connections between AR and other TFs, such as FOXA1, FOXA2, and NKX3-1 (Figure 13), which verifies the capability of the present invention to establish TF-TF networks from cfDNA. The size of the circle indicates the importance of the respective TF in the subnetwork. Furthermore, to confirm that this TF-TF network is prostate cancer-specific, one can compare it with the connectivity between these TFs in cfDNA from a healthy individual. If the TF-TF network is specific for prostate cancer, it should be different or not present in healthy individuals. Indeed, an analysis with the same TFs as in Figure 2 confirms that in cfDNA from a healthy person, the latter is correct (Figure 14). Figure 14 illustrates that a completely different result is obtained if a cfDNA sample from a healthy individual is analyzed. In the cells that release their DNA into the circulation of a healthy individual, AR is usually not active. Hence, AR should not have increased accessibility. Therefore, there is no connection to AR, since the correlations with the other transcription factor signals are low in this case, which confirms the absence of an AR-related TF-TF network. Hence, the TF-TF network displayed in Figure 13 is indeed disease-specific. Figure 15 displays another TF-TF network established from the cfDNA of a prostate cancer patient with the TF STAT1 in the center. Furthermore, for each community/TF-TF network, it can be quickly established which pathways are likely affected by interrogating the involved TFs with publicly available databases. Figure 16 illustrates a TF-TF network with HDAC1 as the main TF. This network affects the cell cycle and transcriptional regulation in cancer. Another example is displayed in Figure 17, where the TFs TCF7L1, MYC, and ASH2L are central. This network affects transcriptional regulation in cancer, several pathways, and WNT signaling.
It can be used for subclassifications of prostate cancer types, e.g., at the castration-resistant stage. A cfDNA-based approach should have the potential to monitor the course of a disease. For example, in patients with cancer, selection pressure invoked by a given treatment (e.g., chemotherapy, radiation, immunotherapy, or others) may result in profound alterations of the tumor genome. Identifying these alterations is of utmost importance as they may explain evolving resistance against a given treatment and may require treatment change. Community 1 displayed in Figure 18 refers to the following table 6:
Table 6
Figure imgf000088_0001
Figure imgf000089_0001
The following example illustrates a prostate cancer case (Figure 18). The tumor was initially an adenocarcinoma (P148_1), the most common prostate cancer subtype (Figure 18B). Prostate adenocarcinomas are AR-dependent, and the cfDNA-based TF-TF network analysis reveals an AR network with established TF partners of AR, such as FOXA1, GRHL2, NKX3-1, or GATA3. However, within 12 months, the time interval between the collection of the samples P148_1 and P148_3, the prostate adenocarcinoma transdifferentiated to a treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC). Prostate adenocarcinomas and neuroendocrine prostate cancers have fundamental differences in their tumor genomes and biology, as reflected by the network analyses (Figure 18). The t-SCNC is no longer an androgen-dependent stage of prostate cancer, and the AR network was switched off in sample P148_3 (Figure 18A). Furthermore, the transdifferentiation from an adenocarcinoma to a neuroendocrine tumor means a change in the cell-type identity, which becomes apparent in our network analyses as vanishing edges between TFs such as HOXB13, NKX3-1, and GRHL2 (Figure 18A). The high clinical relevance of identifying such changes is apparent, but currently no standard non-invasive approach for doing so exists: the standard treatment of prostate cancer is hormone therapy or androgen deprivation therapy (ADT), as prostate cancer is a hormone-dependent tumor. However, this dependency no longer exists after transdifferentiation to the neuroendocrine state, rendering the hormone treatment ineffective. Hence, the treatment must be changed to chemotherapy. Figure 19 illustrates an example in which cfDNA-based TF-TF analyses reveal fundamental insights into tumor biology. The left plot shows the signal for the TF AR, and the right plot shows the signal for the TF FOXA1, i.e., two TFs with vital roles in prostate cancer tumorigenesis.
In each plot, the dark grey line displays an early disease stage at which the tumor depends on AR and where the AR binding sites usually have high accessibility. In contrast, the light grey line reflects an advanced stage called castration-resistant prostate cancer (CRPC). At this advanced stage, the status of the AR has usually changed, and the tumor no longer responds to hormone therapy or ADT. These differences in the biology of the tumor genomes at these stages are visible in the plots. In the early stage, the AR binding sites are accessible, which is reflected by decreased coverage at the transcription factor binding site (TFBS) center. However, at the CRPC stage, this accessibility is lost (Figure 19, left panel). In contrast, high accessibility is still given in both disease stages for the TF FOXA1, which is also an established driver in prostate cancer tumorigenesis (Figure 19, right panel). Hence, it can be concluded that in the CRPC stage, the edge between AR and FOXA1 is lost and that the biology of the tumor has fundamentally changed. Figure 20 is a further example, similar to Figures 13 and 14, illustrating that in the present invention completely different results are obtained depending on whether the TF subnetwork analyses are conducted with a cfDNA sample from a patient with prostate cancer (Figure 20, upper panel) or with a cfDNA sample from a healthy individual (Figure 20, lower panel). Figure 21 shows the result of a subtraction operation between the PC-specific subnetwork and the equivalent subnetwork in healthy individuals. In the following, examples are shown for establishing TF-gene networks from cfDNA: Each TF controls several genes, which can also be investigated from cfDNA, e.g., by the coverage patterns of the respective TSSs of the genes. Figures 22 and 23 show applications of the described approach, in which it was established which genes are co-regulated by specific TFs in cfDNA from patients with prostate cancer.
Example 10: Determining or generating gene-gene and TF-gene networks by the inclusion of fragmentation patterns at the +1 and +2 nucleosomes, within gene bodies, and other regulatory regions
Nucleosomes and open chromatin regions may determine the fragmentation patterns at TSSs and gene bodies and are related to transcriptional gene activity. Therefore, they may be used to establish gene-gene, TF-gene, and DHS-gene networks. Signals may be derived from cfDNA analyses, i.e., by determining nucleosome positions, open chromatin regions, coverage patterns at nucleosome locations and open chromatin regions, fragmentation patterns (e.g., length of cfDNA fragments) at nucleosome locations and open chromatin regions, and transcriptional activities of genes based on TSS patterns. In the following, it is described how a particular feature of fragmentation patterns, i.e., the fragment length, is employed to further improve inferring gene expression from cfDNA, significantly affecting our ability to determine TF-gene, gene-gene, and DHS-gene networks. Furthermore, these patterns also differ depending on age and thus allow establishing gene-related networks in an age-dependent manner.
Promoters of transcriptionally active genes in eukaryotic cells are characterized by a nucleosome-depleted region (NDR) with two flanking nucleosomes commonly known as the –1 (the last upstream) and the +1 (the first downstream) nucleosomes. The +1 nucleosome is well positioned downstream of the transcription start site (TSS) and is commonly known as a barrier of transcription. The +1 nucleosome displays the tightest positioning (or phasing) of all the nucleosomes found in and around genes. The +1 nucleosome often contains histone variants (H2A.Z and H3.3) and histone tail modifications (methylation and acetylation). The +2 nucleosome is located immediately downstream of the +1 nucleosome.
It shares some properties with the +1 nucleosome but contains less H2A.Z and displays less methylation, acetylation, and phasing. The +3 nucleosome and the nucleosomes further downstream display these properties to a progressively lesser extent than the respective upstream nucleosome. To define actively transcribed genes from cfDNA with improved precision, the cfDNA fragmentation patterns at the +1 and +2 nucleosomes were investigated. The following examples refer to the analyses of these first two nucleosomes downstream of the TSS; however, these analyses can be extended to more nucleosomes downstream of the TSS. Figure 24 illustrates calculations at the +1 nucleosome (top panel) and the +2 nucleosome (bottom panel). Each dot represents a ratio value between short and long cfDNA fragments. In this example, “short” refers to cfDNA fragments <250bp, whereas “long” refers to cfDNA fragments ≥250bp. However, other differentiators between short and long, e.g., 150bp, 200bp, or other values, should also be applicable. The Y-axis indicates different groups of genes. Two reference gene sets were established, one consisting of housekeeping genes (HK genes) from the Housekeeping and Reference Transcript Atlas (https://housekeeping.unicamp.br/) and the second comprising an approximately equal number of unexpressed genes according to the Protein Atlas (Protein Atlas unexpressed, PAU genes) (www.proteinatlas.org/). Furthermore, gene sets with different expression levels in peripheral blood mononuclear cells (PBMCs) were defined because the vast majority (>90%) of plasma DNA is derived from hematopoietic cells, and gene expression in PBMCs has been extensively investigated so that reliable and verified gene expression data are available. Data were used from the Genotype-Tissue Expression (GTEx) portal (https://gtexportal.org), and protein-coding genes were split into ten different groups based on their RNA synthesis levels.
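The short/long fragment ratio described above can be sketched as follows. This is a hedged illustration, not the procedure used for Figure 24: it assumes that fragments are given as (start, end) coordinates relative to the TSS, that a fragment is assigned to a window by its midpoint, that the ratio is the count of short fragments divided by the count of long fragments, and that the +1 and +2 nucleosome windows are 0-200 bp and 200-400 bp downstream of the TSS. All of these, including the toy fragments, are assumptions.

```python
def short_long_ratio(fragments, window, cutoff=250):
    """Ratio of short (< cutoff bp) to long (>= cutoff bp) fragments whose
    midpoint falls inside the half-open window [start, end).
    Returns None when the window contains no long fragment."""
    w_start, w_end = window
    short = 0
    long_ = 0
    for start, end in fragments:
        midpoint = (start + end) // 2
        if w_start <= midpoint < w_end:
            if end - start < cutoff:
                short += 1
            else:
                long_ += 1
    if long_ == 0:
        return None  # ratio undefined without long fragments
    return short / long_

# Toy fragments as (start, end) positions relative to the TSS.
fragments = [
    (10, 177), (20, 190), (40, 330),      # midpoints inside the +1 window
    (210, 380), (150, 520), (180, 460),   # midpoints inside the +2 window
]
PLUS1 = (0, 200)    # assumed +1 nucleosome window
PLUS2 = (200, 400)  # assumed +2 nucleosome window

ratio_plus1 = short_long_ratio(fragments, PLUS1)
ratio_plus2 = short_long_ratio(fragments, PLUS2)
ratio_plus1_150 = short_long_ratio(fragments, PLUS1, cutoff=150)
```

The `cutoff` parameter corresponds to the differentiator between short and long fragments, so alternative values such as 150 or 200 can be tried without changing the function.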
Based on their TPM (Transcripts Per Million) values, 10 PBMC gene groups were formed, i.e., PBMC_0.05 (TPM<0.05), PBMC_0.05_0.1 (TPM 0.05-0.1), PBMC_0.1_0.3 (TPM 0.1-0.3), PBMC_0.3_0.5 (TPM 0.3-0.5), PBMC_0.5_1 (TPM 0.5-1), PBMC_1_2 (TPM 1-2), PBMC_2_4 (TPM 2-4), PBMC_4_10 (TPM 4-10), PBMC_10_50 (TPM 10-50), and PBMC_50 (TPM >50). These examples were generated with cfDNA samples from different cohorts: healthy individuals older than 55 years of age (“Older_controls”), healthy individuals between 20-30 years of age (“Healthy_controls”), pregnant females, and individuals with colorectal cancer (CRC) or prostate cancer (PC). The calculations of the ratios between short and long cfDNA fragments reveal several features (Figure 24):
1. The ratios differ depending on the transcriptional activity of genes: transcriptionally highly active genes (e.g., HK or PBMC_50 genes) have lower ratios than genes with low transcriptional activity (e.g., PAU and PBMC_0.05 genes). In other words, less transcribed genes have longer cfDNA fragments.
2. The ratios differ between the +1 nucleosome and the +2 nucleosome, i.e., the ratio values for the +1 nucleosome are usually higher than those at the +2 nucleosome.
3. The ratio values are different for the various cohorts. For example, healthy persons older than 55 years of age have lower ratio values than younger healthy individuals (20-30 years of age), which is caused by an increase of longer cfDNA fragments with age.
Similarly, the ratios are different within gene bodies. Figure 25 displays a similar plot as Figure 24; however, the ratio calculations were not done for the +1 and +2 nucleosomes but for the entire gene body. Similar analyses were performed for transcriptionally inactive regions. For example, the human genome contains several regions with exceptionally firmly positioned nucleosome arrays. An example is near the centromere of chromosome 12 (12p11.1; chr12:34,376,000–34,452,000; hg18).
Very similar ratio values were observed for such a region among the different cohorts. Hence, a gene’s activity/expression status can be inferred based on the coverage at the TSS. In addition, particular cfDNA fragmentation patterns can be included. An example is provided of how ratio values between short and long cfDNA fragments (e.g., ratios with different differentiators, such as 150bp, 200bp, 250bp, …) can improve the inference of the expression status. For example, a gene-gene network can now be established by scrutinizing a cfDNA sample for all genes with the same ratio values. A gene-gene network of genes with low ratio values would represent a network of highly actively transcribed genes. In contrast, a gene-gene network with high ratio values would represent those genes with low or absent transcriptional activity. Of course, any intermediate ratio values could be used to establish gene-gene networks with various transcriptional activities. These gene-related networks can then be aligned with other networks, e.g., a TF-related network described above, to build a TF-gene network. Furthermore, these examples illustrate how these networks can be used to estimate the (biological) age of the cfDNA donor. As illustrated above, the ratio values differ between young and older persons at the +1 nucleosome, +2 nucleosome, and gene body. Hence, the age can also be estimated by establishing the gene-gene networks and including ratio values.
Example 11: Selection of TFs
For determining a network, a question may be how to select the most appropriate TFs, genes, or DHSs. What is “most appropriate” depends on the question to be addressed. For example, one application is determining whether a given cfDNA sample is from a healthy individual or a person with cancer. If the cfDNA is derived from a cancer patient, it would be desirable to determine the cancer type through the network analysis.
For such a scenario, a procedure is needed to identify the most differentially active transcription factors between several tumor entities and healthy cohorts. In the following, an example is shown in which the most suitable TFs were selected to distinguish between cfDNA samples from patients with breast cancer (BC), prostate cancer (PC), or colorectal cancer (CRC) and from healthy individuals. To conduct such analyses, confounding factors were reduced by diluting all cfDNA samples to the same coverage, e.g., 20x, and to the same tumor fraction, e.g., 0.2. For each sample, the respective TFBS coverages were calculated for ~1050 TFs and summarized as the signal's amplitude. Then the Mann-Whitney U test was applied with Holm-Sidak correction to find the statistically significant TFs or genes. Finally, the top 10 TFs per cohort were selected based on the adjusted p-value (Algorithm 2) and were used as features in the training of a classifier. The selection and evaluation of the classifier are described in the workflow presented in Figure 26. The following shows the outline of the algorithm (algorithm 2) to find the most differentially active transcription factors between any groups defined by the sample labels l. This was used to calculate the differentially active transcription factors between the CRC, BC, PC, and healthy cohorts.
Figure imgf000095_0001
Figure imgf000095_0002
Figure imgf000095_0003
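Algorithm 2 is available only as an image, so the sketch below is an approximation of the selection step described above rather than a reproduction of it. It assumes a one-vs-rest comparison of each cohort against all other samples, computes the two-sided Mann-Whitney U p-value with a normal approximation (no tie or continuity correction, which is crude for small samples), and applies a step-down Holm-Sidak adjustment. The function names and the toy amplitudes are hypothetical. In practice, an established routine such as SciPy's `mannwhitneyu` would normally be preferred over this hand-written version.

```python
import math

def rank_with_ties(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def holm_sidak(pvals):
    """Step-down Holm-Sidak adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = 1.0 - (1.0 - pvals[idx]) ** (m - rank)
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted

def top_tfs(amplitudes, labels, cohort, k=10):
    """Rank TFs by adjusted p-value for `cohort` versus all other samples."""
    names = list(amplitudes)
    pvals = []
    for tf in names:
        inside = [v for v, l in zip(amplitudes[tf], labels) if l == cohort]
        outside = [v for v, l in zip(amplitudes[tf], labels) if l != cohort]
        pvals.append(mann_whitney_p(inside, outside))
    adjusted = holm_sidak(pvals)
    return sorted(zip(names, adjusted), key=lambda t: t[1])[:k]

# Toy TFBS signal amplitudes per sample (illustrative values only).
labels = ["PC"] * 5 + ["healthy"] * 5
amplitudes = {
    "AR":    [0.90, 0.80, 0.85, 0.95, 0.70, 0.20, 0.10, 0.15, 0.30, 0.25],
    "FOXA1": [0.80, 0.75, 0.90, 0.60, 0.70, 0.30, 0.35, 0.20, 0.40, 0.10],
    "STAT1": [0.50, 0.40, 0.60, 0.30, 0.55, 0.45, 0.52, 0.35, 0.58, 0.42],
}
selected = top_tfs(amplitudes, labels, "PC", k=2)
```

The returned (TF, adjusted p-value) pairs could then serve as the feature list for classifier training, with `k=10` matching the number of TFs selected per cohort in the text.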
Figure 26 shows the outline of the algorithm to find the best model to classify patients based on the top differentially active transcription factors per cohort. Each dataset Di is split into test and train sets. The training set is used to select the best model and hyperparameters. This is achieved through cross-validation on each training set i. The best models are refit on the full training set i, and their final performances are evaluated on an independent test set i. Through this process, multiple models and parameters are evaluated (Figure 27), and the best model with the best parameter set is selected and reevaluated on an independent test set. This model yields outstanding results with high F1 scores, as illustrated in the confusion matrix (Figure 28). Most errors occurred in differentiating the PC from the BC cohort, which was expected as both tumor entities are hormone-dependent, so their biology partly overlaps. Figure 28 shows the confusion matrix demonstrating that most cfDNA samples can be classified correctly by TF-TF network analyses after selection of the most differentially active TFs.
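Two building blocks of the evaluation workflow described above, the cross-validation split and the confusion matrix with per-class F1 scores, can be sketched in a few lines. This is an illustrative sketch only: the contiguous, unshuffled folds, the function names, and the toy predictions are assumptions and do not reproduce the models or results of Figures 26 to 28.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def f1_per_class(matrix):
    """F1 = 2*TP / (2*TP + FP + FN), computed for each class."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels: one BC sample and one PC sample are confused with each other.
classes = ["BC", "PC", "CRC", "healthy"]
y_true = ["BC", "BC", "PC", "PC", "CRC", "CRC", "healthy", "healthy"]
y_pred = ["BC", "PC", "PC", "BC", "CRC", "CRC", "healthy", "healthy"]

cm = confusion_matrix(y_true, y_pred, classes)
f1 = f1_per_class(cm)
macro_f1 = sum(f1) / len(f1)
folds = list(k_fold_splits(5, 2))
```

In a full workflow, each candidate model and hyperparameter set would be scored on the validation folds of the training set, and only the selected model would be refit and scored once on the held-out test set.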
REFERENCES
Bochkis, I.M., et al. (2014). Changes in nucleosome occupancy associated with metabolic alterations in aged mammalian liver. Cell reports 9, 996-1006.
Breschi, A., et al. (2020). A limited set of transcriptional programs define major cell types. Genome Res 30, 1047-1059.
Dixon, J.R., et al. (2015). Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331-336.
Encode Project Consortium (2011). A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9, e1001046.
Encode Project Consortium, Moore, J.E., et al. (2020a). Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699-710.
Encode Project Consortium, Snyder, M.P., et al. (2020b). Perspectives on ENCODE. Nature 583, 693-698.
Georgolopoulos, G., et al. (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation. Nature communications 12, 6790.
Glass, K., et al. (2013). Passing messages between biological networks to refine predicted interactions. PloS one 8, e64832.
Göös, H., et al. (2022). Human transcription factor protein interaction networks. Nature communications 13, 766.
Guebila, M.B., et al. (2022a). GRAND: a database of gene regulatory network models across human conditions. Nucleic Acids Res 50, D610-D621.
Guebila, M.B., et al. (2022b). gpuZoo: Cost-effective estimation of gene regulatory networks using the Graphics Processing Unit. NAR Genomics and Bioinformatics 4.
Ho, L., and Crabtree, G.R. (2010). Chromatin remodelling during development. Nature 463, 474-484.
Kolmykov, S., et al. (2021). GTRD: an integrated view of transcription regulation. Nucleic Acids Research 49, D104-D111.
Lai, B., et al. (2018). Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281-285.
Lambert, S.A., et al. (2018). The Human Transcription Factors. Cell 172, 650-665.
Meuleman, W., et al. (2020). Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244-251.
Minnoye, L., et al. (2021). Chromatin accessibility profiling methods. Nature Reviews Methods Primers 1, 10.
Peneder, P., et al. (2021). Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden. Nature communications 12, 3230.
Roadmap Epigenomics Consortium, Kundaje, A., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330.
Sheffield, N.C., et al. (2013). Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res 23, 777-788.
Snyder, M.W., et al. (2016). Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell 164, 57-68.
Sonawane, A.R., et al. (2017). Understanding Tissue-Specific Gene Regulation. Cell reports 21, 1077-1088.
Stergachis, A.B., et al. (2020). Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449-1454.
Stergachis, A.B., et al. (2013). Developmental fate and cellular maturity encoded in human regulatory DNA landscapes. Cell 154, 888-903.
Stergachis, A.B., et al. (2014). Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365-370.
Uhlen, M., et al. (2015). Proteomics. Tissue-based map of the human proteome. Science 347, 1260419.
Ulz, P., et al. (2019). Inference of transcription factor binding from cell-free DNA enables tumor subtype prediction and early detection. Nature communications 10, 4666.
Ulz, P., et al. (2016). Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat Genet 48, 1273-1278.
Vierstra, J., et al. (2020). Global reference mapping of human transcription factor footprints. Nature 583, 729-736.
Vierstra, J., et al. (2014). Coupling transcription factor occupancy to nucleosome architecture with DNase-FLASH. Nature methods 11, 66-72.
Vijg, J. (2014). Somatic mutations, genome mosaicism, cancer and aging. Current opinion in genetics & development 26, 141-149.
Vijg, J., and Campisi, J. (2008). Puzzles, promises and a cure for ageing. Nature 454, 1065-1071.
Xu, M., et al. (2022). TF-Marker: a comprehensive manually curated database for transcription factors and related markers in specific cell and tissue types in human. Nucleic Acids Res 50, D402-D412.
Yao, L., et al. (2015). Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome biology 16, 105.
Yevshin, I., et al. (2017). GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments. Nucleic Acids Res 45, D61-D67.
Zhang, K., et al. (2021). A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985-6001 e5919.
Zhang, Q., et al. (2016). Systems-level analysis of human aging genes shed new light on mechanisms of aging. Hum Mol Genet 25, 2934-2947.
Zhu, G., Guo, et al. (2021). Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden. Nature communications 12, 2229.

Claims

MUG003P -99- CLAIMS 1. A computer-implemented method for determining a regulation network from cell-free DNA (cfDNA) fragments from a sample comprising the steps of: i. receiving data representing the DNA sequences of cfDNA fragments acquired by sequencing of cfDNA fragments extracted from the sample; ii. determining nucleosome positions; and iii. determining at least one of the regulation networks selected from the group consisting of: a. a transcription factor (TF)-gene network; b. a TF-TF network; c. a gene-gene network; d. a TF-DNase hypersensitive site (DHS) network; and e. a DHS-gene network. 2. The computer-implemented method of claim 1, further comprising determining open chromatin regions in ii.. 3. The computer-implemented method of claim 1 or 2, further comprising determining coverage patterns at nucleosome positions and/or open chromatin regions in ii.. 4. The computer-implemented method of any one of claims 1 to 3, further comprising determining fragmentation patterns at nucleosome positions and/or at open chromatin regions in ii.. 5. The computer-implemented method of claim 4, further comprising determining the length of cfDNA fragments. 6. The computer-implemented method of any one of claims 1 to 5, further comprising determining transcription start site (TSS) patterns in ii.. 7. The computer-implemented method of claim 6, further comprising determining transcriptional activities of genes based on TSS patterns. 8. The computer-implemented method of any one of claims 1 to 7, wherein determining the TF-gene network comprises the steps of: a. determining actively transcribed TFs; b. determining tissue-specificity of the actively transcribed TFs from a.; c. determining gene sets which the TFs from a. activate in each tissue determined in b.; d. evaluating if the gene sets are transcribed; MUG003P -100- e. determining the intersect for identical and different genes from the gene sets; and f. determining the TF-gene network from the data obtained from e.. 
9. The computer-implemented method of any one of claims 1 to 7, wherein determining the TF-TF network comprises the steps of: a. assessing accessibility of the respective TFBS of each TF; b. optionally determining overlapping binding sites in TFs; c. determining TF-TF interaction; d. correlating the accessibility obtained from a. with the interaction obtained from c.; and e. determining the TF-TF network from the data obtained from d.. 10. The computer-implemented method of any one of claims 1 to 7, wherein determining the gene-gene network comprises the steps of: a. determining the expression status of pre-selected genes or gene-sets, wherein the expression status is determined by i. determining the coverage pattern at the nucleosome depleted region (NDR) and/or at the region of 2 kilobases upstream and downstream of the TSS (2K) region, or ii. determining if a nucleosome is present at the NDR; b. correlating the genes according to their expression status; and c. determining the gene-gene network from the data obtained from b.. 11. The computer-implemented method of any one of claims 1 to 7, wherein determining the TF-DHS network comprises the steps of: a. determining actively transcribed TFs; b. determining maps of accessible distal DHSs; c. correlating the actively transcribed TFs with the maps of distal DHSs; and d. determining the TF-DHS network from the data obtained from c.. 12. The computer-implemented method of any one of claims 1 to 7, wherein determining the DHS-gene network from DNA sequences of cfDNA fragments comprises the steps of: a. determining gene expression status by i. determining the coverage pattern at the NDR and/or at the 2K region, or ii. determining if a nucleosome is present at the NDR; b. determining maps of accessible distal DHSs; MUG003P -101- c. correlating the gene expression status of a. with the maps of accessible distal DHSs of b.; and d. determining the DHS-gene network from the data obtained from c.. 13. 
The computer-implemented method of any one of claims 1 to 12, wherein at least two, three, four, or five of the regulation networks are determined.
14. The computer-implemented method of any one of claims 1 to 13, wherein the sample is a biological sample from a subject or from a cohort of subjects.
15. The computer-implemented method of any one of claims 1 to 14, further comprising comparing the at least one of the regulation networks with one or more standard regulation network, or regulation model selected from TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network.
16. The computer-implemented method of any one of claims 1 to 15, further comprising screening for a correlation of the at least one of the regulation networks with one or more standard regulation network, or regulation model selected from TF-gene network, TF-TF network, gene-gene network, TF-DHS network, and DHS-gene network.
17. The computer-implemented method of claim 15 or 16, wherein the one or more standard regulation network, or regulation model is determined for one or more cohorts of subjects having a specific classification.
18. The computer-implemented method of claim 17, wherein the specific classification is associated with a condition.
19. The computer-implemented method of claim 18, wherein the condition is selected from the group consisting of health status, aging status, cell type, tissue type, and specific disease status.
20. The computer-implemented method of any one of claims 15 to 19, wherein markers for specific conditions are defined.
21. The computer-implemented method of any one of claims 1 to 20, wherein the most differently active TFs, genes, or DHSs are determined.
22. The computer-implemented method of any one of claims 15 to 21, further comprising determining whether a subject has a specific condition.
23. The computer-implemented method of any one of claims 15 to 22, wherein the cell and/or tissue origin of cfDNA fragments is determined.
24. The computer-implemented method of any one of claims 15 to 23, wherein the one or more standard regulation network, or regulation model is derived from healthy subjects, and/or unhealthy subjects.
25. The computer-implemented method of claim 24, wherein
    a. congruence with the standard regulation network, or regulation model derived from healthy subjects and difference with the standard network or model derived from unhealthy subjects is characteristic for a healthy status; and/or
    b. congruence with the standard regulation network, or regulation model derived from unhealthy subjects and difference with the standard regulation network or regulation model derived from healthy subjects is characteristic for an unhealthy status.
26. The computer-implemented method of any one of claims 24 or 25, wherein the health status of a subject is determined.
27. The computer-implemented method of any one of claims 15 to 22, wherein the subject is a patient undergoing treatment of a health condition.
28. The computer-implemented method of claim 27, wherein the one or more standard regulation network, or regulation model is derived from a previous result from said patient and/or a standard regulation networks characteristic for treatment success.
29. The computer-implemented method of claim 28, wherein differences and/or congruences provide information on the treatment success of the patient.
30. The computer-implemented method of claim 28 or 29, wherein the treatment success of a patient is monitored.
31. The computer-implemented method of any one of claims 15 to 22, wherein congruence with the standard regulation network derived from a specific cohort of subjects having a specific aging status is characteristic for a specific aging status.
32. The computer-implemented method of claim 33, wherein the aging status of a subject is determined.
33. The computer-implemented method of claim 31 or 32, wherein the cohort of subjects having a specific aging status is selected from healthy subjects older than 55 years, healthy subjects between 20 and 30 years, pregnant females, and subjects having a disease.
34. The computer-implemented method of claim 33, wherein the disease is cancer, specifically selected from colorectal cancer and prostate cancer.
35. A model comprising at least one of the regulation networks selected from TF-gene, TF-TF, gene-gene, TF-DHS, and DHS-gene networks obtained from cfDNA according to the computer-implemented method of any one of claims 1 to 34.
36. A data processing apparatus comprising means for carrying out the computer-implemented method of any one of claims 1 to 34.
37. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the computer-implemented method of any one of claims 1 to 34.
38. A computer-readable medium having stored thereon the computer program of claim 37.
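For illustration only, the comparison recited in claims 15, 16 and 25 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation: a regulation network is assumed to be a mapping from (regulator, target) pairs to edge weights, congruence is assumed to be the Pearson correlation of edge weights over the union of edges, and missing edges are assumed to have weight 0.0. The names `congruence`, `classify`, `TF_A`, `GENE_1` and all weights are hypothetical.

```python
from math import sqrt
from typing import Dict, Tuple

# Hypothetical representation: (regulator, target) -> edge weight.
Network = Dict[Tuple[str, str], float]


def congruence(sample: Network, reference: Network) -> float:
    """Pearson correlation of edge weights over the union of edges.

    Assumption: an edge absent from one network has weight 0.0.
    Returns 0.0 when the correlation is undefined (no edges or no variance).
    """
    edges = sorted(set(sample) | set(reference))
    n = len(edges)
    if n == 0:
        return 0.0
    x = [sample.get(e, 0.0) for e in edges]
    y = [reference.get(e, 0.0) for e in edges]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify(sample: Network, healthy_ref: Network, unhealthy_ref: Network) -> str:
    """Assign the status whose standard network the sample is more congruent with."""
    c_healthy = congruence(sample, healthy_ref)
    c_unhealthy = congruence(sample, unhealthy_ref)
    if c_healthy > c_unhealthy:
        return "healthy"
    if c_unhealthy > c_healthy:
        return "unhealthy"
    return "undetermined"


# Made-up TF-gene networks, used purely to exercise the functions.
healthy_ref = {("TF_A", "GENE_1"): 0.9, ("TF_A", "GENE_2"): 0.1, ("TF_B", "GENE_1"): 0.2}
unhealthy_ref = {("TF_A", "GENE_1"): 0.1, ("TF_A", "GENE_2"): 0.8, ("TF_B", "GENE_1"): 0.7}
sample = {("TF_A", "GENE_1"): 0.8, ("TF_A", "GENE_2"): 0.2, ("TF_B", "GENE_1"): 0.3}

print(classify(sample, healthy_ref, unhealthy_ref))
```

The same `congruence` function could in principle be applied to a patient's earlier network (claims 27 to 30) or to an age-cohort network (claims 31 to 34); a real analysis would need a congruence measure and decision thresholds validated on cohort data, which this sketch does not attempt.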
PCT/EP2023/075125 2022-09-13 2023-09-13 Determining the health status with cell-free dna using cis-regulatory elements and interaction networks WO2024056722A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22195320.1 2022-09-13
EP22195320 2022-09-13

Publications (1)

Publication Number Publication Date
WO2024056722A1 (en) 2024-03-21

Family

ID=83319319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/075125 WO2024056722A1 (en) 2022-09-13 2023-09-13 Determining the health status with cell-free dna using cis-regulatory elements and interaction networks

Country Status (1)

Country Link
WO (1) WO2024056722A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160004814A1 (en) 2012-09-05 2016-01-07 University Of Washington Through Its Center For Commercialization Methods and compositions related to regulation of nucleic acids

Non-Patent Citations (44)

* Cited by examiner, † Cited by third party
Title
ALBERTS ET AL.: "Molecular Biology of the Cell", 2022, article "Vogel and Motulsky's Human Genetics: Problems and Approaches"
BOCHKIS, I.M. ET AL.: "Changes in nucleosome occupancy associated with metabolic alterations in aged mammalian liver", CELL REPORTS, vol. 9, 2014, pages 996 - 1006
BRESCHI, A. ET AL.: "A limited set of transcriptional programs define major cell types", GENOME RES, vol. 30, 2020, pages 1047 - 1059
DIXON, J.R. ET AL.: "Chromatin architecture reorganization during stem cell differentiation", NATURE, vol. 518, 2015, pages 331 - 336
ENCODE PROJECT CONSORTIUM: "A user's guide to the encyclopedia of DNA elements (ENCODE)", PLOS BIOL, vol. 9, 2011, pages e1001046
ENCODE PROJECT CONSORTIUM; MOORE, J.E. ET AL.: "Expanded encyclopaedias of DNA elements in the human and mouse genomes", NATURE, vol. 583, 2020, pages 699 - 710
ENCODE PROJECT CONSORTIUM; SNYDER, M.P. ET AL.: "Perspectives on ENCODE", NATURE, vol. 583, 2020, pages 693 - 698
GEORGOLOPOULOS, G. ET AL.: "Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation", NATURE COMMUNICATIONS, vol. 12, 2021, pages 6790
GLASS, K. ET AL.: "Passing messages between biological networks to refine predicted interactions", PLOS ONE, vol. 8, 2013, pages e64832
GOOS, H. ET AL.: "Human transcription factor protein interaction networks", NATURE COMMUNICATIONS, vol. 13, 2022, pages 766
GUEBILA, M.B. ET AL.: "gpuZoo: Cost-effective estimation of gene regulatory networks using the Graphics Processing Unit", NAR GENOMICS AND BIOINFORMATICS, 2022, pages 4
GUEBILA, M.B. ET AL.: "GRAND: a database of gene regulatory network models across human conditions", NUCLEIC ACIDS RES, vol. 50, 2022, pages D610 - D621
HEITZER ET AL., NAT REV GENET, vol. 20, 2019, pages 71 - 88
HEITZER ET AL., TRENDS MOL MED, vol. 26, 2020, pages 519 - 528
HO, L.; CRABTREE, G.R.: "Chromatin remodelling during development", NATURE, vol. 463, 2010, pages 474 - 484, XP037158932, DOI: 10.1038/nature08911
I. S. YEVSHIN, R. N. SHARIPOV, S. K. KOLMYKOV, Y. V. KONDRAKHIN, F. A. KOLPAKOV: "GTRD: a database on gene transcription regulation-2019 update", NUCLEIC ACIDS RES, vol. 47, no. D1, 8 January 2019 (2019-01-08), pages D100 - D105
KOLMYKOV, S. ET AL.: "GTRD: an integrated view of transcription regulation", NUCLEIC ACIDS RESEARCH, vol. 49, 2021, pages D104 - D111
LAI, B. ET AL.: "Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing", NATURE, vol. 562, 2018, pages 281 - 285, XP036953693, DOI: 10.1038/s41586-018-0567-3
LAMBERT, S.A. ET AL.: "The Human Transcription Factors", CELL, vol. 172, 2018, pages 650 - 665, XP085347128, DOI: 10.1016/j.cell.2018.01.029
MARKUS ET AL., SCI REP, vol. 12, 2022, pages 1928
MEULEMAN, W. ET AL.: "Index and biological spectrum of human DNase I hypersensitive sites", NATURE, vol. 584, 2020, pages 244 - 251, XP037218281, DOI: 10.1038/s41586-020-2559-3
MINNOYE, L. ET AL.: "Chromatin accessibility profiling methods", NATURE REVIEWS METHODS PRIMERS, vol. 1, 2021, pages 10
PENEDER, P. ET AL.: "Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden", NATURE COMMUNICATIONS, vol. 12, 2021, pages 3230
ROADMAP EPIGENOMICS CONSORTIUM; KUNDAJE, A. ET AL.: "Integrative analysis of 111 reference human epigenomes", NATURE, vol. 518, 2015, pages 317 - 330, XP055434136, DOI: 10.1038/nature14248
SHEFFIELD, N.C. ET AL.: "Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions", GENOME RES, vol. 23, 2013, pages 777 - 788
SNYDER, M.W. ET AL.: "Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin", CELL, vol. 164, 2016, pages 57 - 68
SONAWANE, A.R. ET AL.: "Understanding Tissue-Specific Gene Regulation", CELL REPORTS, vol. 21, 2017, pages 1077 - 1088
STERGACHIS, A.B. ET AL.: "Conservation of trans-acting circuitry during mammalian regulatory evolution", NATURE, vol. 515, 2014, pages 365 - 370, XP037474510, DOI: 10.1038/nature13972
STERGACHIS, A.B. ET AL.: "Developmental fate and cellular maturity encoded in human regulatory DNA landscapes", CELL, vol. 154, 2013, pages 888 - 903
STERGACHIS, A.B. ET AL.: "Single-molecule regulatory architectures captured by chromatin fiber sequencing", SCIENCE, vol. 368, 2020, pages 1449 - 1454
UHLEN, M. ET AL.: "Proteomics. Tissue-based map of the human proteome", SCIENCE, vol. 347, 2015, pages 1260419, XP055393269, DOI: 10.1126/science.1260419
ULZ ET AL., NAT COMMUN, vol. 10, 2019, pages 4666
ULZ, P. ET AL.: "Inference of transcription factor binding from cell-free DNA enables tumor subtype prediction and early detection", NATURE COMMUNICATIONS, vol. 10, 2019, pages 4666, XP055892459, DOI: 10.1038/s41467-019-12714-4
ULZ, P. ET AL.: "Inferring expressed genes by whole-genome sequencing of plasma DNA", NAT GENET, vol. 48, 2016, pages 1273 - 1278
VIERSTRA, J. ET AL.: "Coupling transcription factor occupancy to nucleosome architecture with DNase-FLASH", NATURE METHODS, vol. 11, 2014, pages 66 - 72
VIERSTRA, J. ET AL.: "Global reference mapping of human transcription factor footprints", NATURE, vol. 583, 2020, pages 729 - 736, XP037212584, DOI: 10.1038/s41586-020-2528-x
VIJG, J.: "Somatic mutations, genome mosaicism, cancer and aging", CURRENT OPINION IN GENETICS & DEVELOPMENT, vol. 26, 2014, pages 141 - 149
VIJG, J.; CAMPISI, J.: "Puzzles, promises and a cure for ageing", NATURE, vol. 454, 2008, pages 1065 - 1071
XU, M. ET AL.: "TF-Marker: a comprehensive manually curated database for transcription factors and related markers in specific cell and tissue types in human", NUCLEIC ACIDS RES, vol. 50, 2022, pages D402 - D412
YAO, L. ET AL.: "Inferring regulatory element landscapes and transcription factor networks from cancer methylomes", GENOME BIOLOGY, vol. 16, 2015, pages 105, XP021224294, DOI: 10.1186/s13059-015-0668-3
YEVSHIN, I. ET AL.: "GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments", NUCLEIC ACIDS RES, vol. 45, 2017, pages D61 - D67
ZHANG, K. ET AL.: "A single-cell atlas of chromatin accessibility in the human genome", CELL, vol. 184, 2021, pages 5985 - 6001
ZHANG, Q. ET AL.: "Systems-level analysis of human aging genes shed new light on mechanisms of aging", HUM MOL GENET, vol. 25, 2016, pages 2934 - 2947
ZHU, G.; GUO ET AL.: "Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden", NATURE COMMUNICATIONS, vol. 12, 2021, pages 2229, XP055954818, DOI: 10.1038/s41467-021-22463-y

Similar Documents

Publication Publication Date Title
AU2016370835B2 (en) Distinguishing methylation levels in complex biological samples
JP2023123420A (en) Methods of determining tissues and/or cell types giving rise to cell-free dna, and methods of identifying disease or disorder using the same
AU2022224861A1 (en) Analysis of fragmentation patterns of cell-free DNA
AU2022201026A1 (en) Non-invasive determination of methylome of fetus or tumor from plasma
EP3529377B1 (en) Gestational age assessment by methylation and size profiling of maternal plasma dna
US20210071262A1 (en) Method of detecting cancer through generalized loss of stability of epigenetic domains and compositions thereof
JP5938484B2 (en) Method, system, and computer-readable storage medium for determining presence / absence of genome copy number variation
CN110800063A (en) Detection of tumor-associated variants using cell-free DNA fragment size
KR20230017169A (en) Method and system for colorectal cancer detection through nucleic acid methylation analysis
JP2023521308A (en) Cancer classification with synthetic training samples
US20230175058A1 (en) Methods and systems for abnormality detection in the patterns of nucleic acids
US20180066323A1 (en) Biomarkers in Cancer, Methods, and Systems Related Thereto
US20180371553A1 (en) Methods and compositions for the analysis of cancer biomarkers
AU2022255198A1 (en) Cell-free dna sequence data analysis method to examine nucleosome protection and chromatin accessibility
WO2024056722A1 (en) Determining the health status with cell-free dna using cis-regulatory elements and interaction networks
JP2022527316A (en) Stratification of virus-related cancer risk
US20240296920A1 (en) Redacting cell-free dna from test samples for classification by a mixture model
US20240233872A9 (en) Component mixture model for tissue identification in dna samples
US20240055073A1 (en) Sample contamination detection of contaminated fragments with cpg-snp contamination markers
US20240309461A1 (en) Sample barcode in multiplex sample sequencing
US20240312564A1 (en) White blood cell contamination detection
US20230272477A1 (en) Sample contamination detection of contaminated fragments for cancer classification
US20240170099A1 (en) Methylation-based age prediction as feature for cancer classification
WO2024118500A2 (en) Methods for detecting and treating ovarian cancer
WO2024155681A1 (en) Methods and systems for detecting and assessing liver conditions

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23772162

Country of ref document: EP

Kind code of ref document: A1