WO2013144663A2 - Method of determination of neutral dna sequences in the genome, system for targeting sequences obtained thereby and methods for use thereof - Google Patents



Publication number
WO2013144663A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
dna
genome
sequence
sequences
Prior art date
Application number
PCT/HR2013/000003
Other languages
French (fr)
Other versions
WO2013144663A3 (en
Inventor
Ivica RUBELJ
Tomislav DOMAZET-LOSO
Robert BAKARIC
Milena IVANKOVIC
Nikolina SKROBOT VIDACEK
Andrea CUKUSIC KALAJZIC
Original Assignee
Rudjer Boskovic Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rudjer Boskovic Institute filed Critical Rudjer Boskovic Institute
Publication of WO2013144663A2 publication Critical patent/WO2013144663A2/en
Publication of WO2013144663A3 publication Critical patent/WO2013144663A3/en


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations

Definitions

  • the present invention relates to the method for identification of neutral DNA sequences in a given genome, to the neutral DNA sequences suitable for integration of the DNA of interest and methods for use thereof.
  • the method itself has major drawbacks.
  • the self-replicating vector is readily diluted from the cell in the absence of the drug used for the selection. Additionally, replication of the vector poses a burden for the cell and might eventually result in the deletion of the DNA of interest from the vector even in the presence of the selection drug.
  • a safe harbor site as a locus that fulfils the following five criteria: it is located (i) at least 50 kbp from the 5' end of a gene, (ii) at least 300 kbp from any cancer-related gene, (iii) at least 300 kbp from any miRNA, (iv) outside a transcription unit and (v) outside ultraconserved regions of the human genome (Papapetrou et al. (2011) Genomic safe harbours permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat. Biotechnology 29: 73-78).
  • rDNA is well conserved among organisms, especially a 50-bp region of 28S rDNA that is a target for sequence-specific, non-LTR retroposons (LINE). This short region is highly conserved in chordates and arthropods and in all eukaryotes. This could constitute a good universal integrative site.
  • LINE sequence-specific, non-LTR retroposons
  • Ribosomal DNA integrating rAAV-rDNA vectors allow for stable transgene expression., Molecular Therapy 20: 1912-1923.
  • ribosomal RNAs are among the most important and essential genes in a living cell, and interruption of some of these genes may result in a transcription product that could interfere with normal ribosomal function.
  • the second solution is to drive the integration to specific loci already reported to be nonmutagenic. In this approach several loci have been identified. The two most prominent sites used nowadays are: mouse ROSA26 locus and human AAVS1 site.
  • mouse ROSA26 locus has become established as one of the preferred docking sites for the ubiquitous expression of transgenes, as it can be targeted with high efficiency and is expressed in most cell types.
  • transgenic constructs harboring different exogenous promoters or different transgene cassettes including reporters, site-specific recombinases and, recently, non-coding RNAs have been positioned at the ROSA26 locus (Casola S (2010) Mouse models for miRNA expression: the ROSA26 locus. Methods Mol Biol 667:145-163; Chen C et al. (2011) A comparison of exogenous promoter activity at the ROSA26 locus using PhiC31 integrase mediated cassette exchange approach in mouse ES cells. PLoS ONE 6(8): e23376).
  • AAV adeno-associated virus
  • AAVS1 adeno-associated virus integration site 1
  • PPP1R12C protein phosphatase 1 regulatory subunit 12C
  • AAV has features that make it attractive as an integration site: natural persistence in the human chromosome and no known adverse effects reported until now. Therefore, AAVS1 is considered to be a safe harbor for adding a transgene into the human genome (DeKelver et al. (2010) Functional genomics, proteomics, and regulatory DNA analysis in isogenic setting using zinc finger nuclease-driven transgenesis into safe harbour locus in the human genome. Genome Research 20:1133-1142).
  • this site can be used only for human transgenesis. Moreover, this site constitutively expresses a protein named p84 with no known function and therefore the integration of the DNA of interest at this site does not provide an isogenic setting (Linden RM et al. (1996) Site-specific integration by adeno-associated virus, PNAS, USA 93: 11288-11294). This is even more important because studies on the long-term toxicity of the integration of DNA of interest at this site are lacking.
  • the third solution is to screen clones that have only one copy of the integrated transgene in order to characterize putative universal genomic safe harbours, a method developed by Papapetrou et al. (Papapetrou et al. (2011) Genomic safe harbours permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat. Biotechnology 29: 73-78). They hypothesized that the screening of clones of induced pluripotent stem cells (iPS) harboring a single vector copy would facilitate the discovery of new secure harbor sites. For this purpose, they used iPS cells with a well-characterized globin LV vector in the expectation that this would meet the five criteria for safe harbour locus described above.
  • iPS induced pluripotent stem cells
  • the present invention provides a new method by which we can identify neutral sequences in any partially or completely annotated genome and/or sequence and which gives fast and reliable results in a short period of time. Neutral sequences identified by such method are then used as the best safe loci for integration of DNA of interest into any selected genome for purposes of creation of transgenic animals, experimental biology and medicine as well as gene therapy.
  • the present invention provides the method for identification of neutral DNA sequences in a given genome, to the neutral DNA sequences suitable for integration of the DNA of interest and methods for use thereof.
  • the invention provides the method for identification of neutral DNA sequences in which the neutral DNA sequences are filtered out by a data processing system as the ones not having any known functional features selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having functional feature.
  • present invention relates to the method of identification of neutral sequences in any partially or completely annotated genome and/or sequence.
  • the invention further relates to the neutral DNA sequences identified thereby.
  • neutral sequences identified in human genome selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16.
  • the invention further relates to the system for targeting of neutral sequences determined according to the invention, namely method for the introduction of the DNA of interest into neutral sequences determined according to the invention by making use of recombinant vector containing the DNA of interest, suitable selection marker, regulatory elements, gene insertion system and neutral sequence according to the invention.
  • the invention further relates to the use of neutral sequences according to the invention for producing cell lines, transgenic plants and animals and for gene therapy.
  • neutral sequences refers to any particular portion of the genome and of DNA sequence that is located far enough from all functional genetic elements so that integration of DNA of interest into this place does not disturb the normal physiology of the cell or the changes are kept at lowest possible level.
  • DNA of interest refers to desired DNA fragments which encode proteins or non-coding RNAs (such as genes, regulatory elements or other functional sequences of interest) that is chosen by the man skilled in the art to be integrated into the host cell.
  • vector refers to an extrachromosomal element that is used to introduce any DNA material into cells for either expression or replication thereof. Both terms "vector" and "plasmid" are used indifferently.
  • predetermined site refers to any of the sequence selected as a place for integration of DNA of interest.
  • cell line refers to normal cell lines, telomerase immortalized cell lines, immortal transformed cell lines and stem cell lines.
  • the invention provides a method of integration of DNA of interest at a predetermined site in the genome, the method comprising following steps: a) determination of neutral DNA sequence as predetermined site in the genome (insertion loci) for the insertion of DNA of interest,
  • Step a) - determination of neutral DNA sequence as predetermined site in the genome (insertion loci) for the insertion of DNA of interest - consists of a method to operate a data-processing system for determination of one or more neutral DNA sequences of a selected genome serving as an insertion locus. Sub-steps of above cited step a) are:
  • step A loading of the genome into said data-processing system in the form of an array
  • step B marking the reported functional features according to the one or more auxiliary database's records from the array obtained in step A by tagging the start and stop positions of each known functional feature selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having functional feature in auxiliary database;
  • step C filtering out the reported functional features marked in step B. from the array in step A. and forming set of independent sequences characterized by the length;
  • step D discarding the sequences having less than 35 kb in order to obtain sequences of 35 kb or longer;
  • step F for each sequence extracted in step E the regions of at least 7 kb from the left end and at least 7 kb from the right end of said sequence were cut out from the sequence;
  • step G the BLAST analysis is executed over the narrowed sequence obtained in step F. to confirm their unique status in the selected genome.
  • kb stands for "kilobase" - a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.
  • This invention further comprises isolation and purification of the neutral sequences obtained by the step a).
  • the invention also provides a method of construction of a vector capable of integrating into predetermined site in the given genome of the cell, the method comprising: b) Isolation and purification of selected sequences
  • Suitable parts of identified neutral sequences are amplified by standard techniques such as PCR method. PCR products are inserted into plasmid vector for further amplification. Purification of the desired sequence from the PCR products may be done by the procedures known in the art. Verification of the purified sequence can be achieved by sequencing of the products. c) Construction of recombinant vector comprising neutral sequence
  • a recombinant vector is constructed by insertion of neutral sequence and suitable cassette in the vector.
  • suitable cassette depends on the system of gene insertion used.
  • the techniques used to increase efficiency and specificity of integration are selected from site-specific recombination systems Cre/Lox, FLP/FRT, SIRT. Red/ET, TAGIT, modified Gin/gix, Zinc finger nucleases and meganucleases and the like which are well known in the art (e.g. Vasquez KM et al. (2001) Manipulating the mammalian genome by homologous recombination. Proc Natl Acad Sci U S A 98(15): 8403-10; Orban PC et al.
  • a recombinant vector includes a promoter, choice of which depends on the intended application of the vector, and may be tissue specific, ubiquitously expressed, or a promoter that allows conditional expression.
  • the recombinant vector according to present invention may include a selection marker such as Neomycin, Kanamycin, Ampicillin or other selection markers depending on the system used.
  • the recombinant vector according to present invention may include enhancer.
  • Enhancer is a short region of DNA that can be bound with proteins (the trans-acting factors) to enhance transcription levels of genes in a gene cluster.
  • An enhancer may be selected from any suitable commercial source depending on particular system used, e.g. SV40 enhancer, plasmid PCAT3-Enhancer (Smallwood A and Ren B (2013) Genome organisation and long range regulation of gene expression by enhancers. Current Opinion in Cell biology 25: 1-8)
  • the recombinant vector according to present invention may include reporter gene such as green fluorescent protein (GFP), luciferase or beta-galactosidase.
  • reporter gene such as green fluorescent protein (GFP), luciferase or beta-galactosidase.
  • the recombinant vector according to present invention may include suitable multi-cloning site containing restriction sites of various restriction enzymes.
  • the recombinant vector according to present invention may include start codon for the beginning of transcription of DNA of interest.
  • the recombinant vector according to present invention may include stop codon to end transcription of DNA of interest.
  • the recombinant vector according to present invention may include poly-A sequence at the end of gene sequence so it can be recognized as messenger RNA in the cell.
  • the host cells are transfected with recombinant vectors comprising our preselected neutral DNA sequence.
  • Various transfection systems currently known in the art may be used (FuGENE, Lipofectamine 2000, electroporation, etc.). The efficiency of transfection with any particular system is cell type specific. Cell types of interest include a large range of normal cells from different tissues and from different plant and animal species. Immortal and/or tumour derived cells are also of interest. Transfection efficiencies can be monitored by insertion of GFP or beta-galactosidase into the vector used.
  • the ratio of homologous/non-homologous recombination is determined by digestion with restriction enzymes and subsequent Southern blot analysis.
  • the gene expression profile analysis can be done by methods such as northern blotting, macroarray or microarray analysis in order to check if there is disturbance in endogenous gene expression caused by the insertion.
  • neutral sequences were determined on the human genome and sequences SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16 were retrieved by the said method. These sequences were isolated and purified, and vectors containing said neutral DNA sequence, a promoter, a DNA of interest, a selection marker and an enhancer were constructed.
  • the neutrality screening procedure for a given organism can be divided into two phases: an initial "data retrieval" phase where all relevant data is downloaded from the adequate database, followed by a filtration phase.
  • the selected genome or part of the genome of particular organism is loaded into data-processing system in the form of array from any available public or commercial database containing said genome.
  • the reported functional features of the said genome are extracted from one or more auxiliary databases and also loaded into data-processing system.
  • an early step is to map a large set of reported functional features, that is, their associated start and stop positions, onto a corresponding reference sequence. This process is carried out via data-processing system and is explained below under "Approach". Once all features have been mapped, genome is scanned for complementary or "feature-free" regions. These positions are then marked as safe spots for integrating a DNA sequence of interest.
  • Example 1 - Human genome and EnsEMBL database This example discloses the method to operate a data-processing system on the human genome using EnsEMBL database in order to determine neutral DNA sequences in said genome.
  • marking the reported functional features of the human genome can be performed by combining one or more sources of information, i.e. auxiliary databases, without limitation.
  • a neutral locus site is considered to be a featureless region of DNA.
  • a pipeline was developed to merge publicly available data provided by EnsEMBL database with the Homo sapiens genome.
  • Homo sapiens genome is represented in the form of an array suitable for computational biology.
  • EnsEMBL API Application programming interface
  • the start and stop positions of every reported feature associated with a particular region in the genome were downloaded.
  • Process of marking the reported functional features according to the EnsEMBL database is carried out via data-processing system.
  • Each region having known functional feature selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having functional feature; is cut out from the array that represents Homo sapiens genome.
  • the portions of said array without known functional features, i.e. one or more perspective sequences with variable lengths, are randomly distributed across the said array.
  • each sequence obtained in the above cited manner is finally subjected to the BLAST analysis to confirm their unique status in the selected genome.
  • the BLAST procedure is given in the reference S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25:3389-3402, 1997.
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Reverse primer with sequence for BamHI restriction enzyme digestion 5'-GGCGGATCC-ACCTGTATGTTCTTCCAACGA-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Reverse primer with sequence for BamHI restriction enzyme digestion 5'-GGCGGATCC-TCAGGAGATACCAGTGTACTA-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template (genomic DNA from MJhTERT cells) 4 500 ng
  • Reverse primer 5'-GTGGTGATCTTCCTGTCAGC-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template (genomic DNA from MJhTERT cells) 8,6 500 ng
  • Forward primer 5' -CAGACTGGAGAAGCAGCATC-3' (Sigma).
  • Reverse primer (R primer) 5'-CTGCAACTCTCATACCAGGA-3' (Sigma) DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Reverse primer 5'- CACACACAGCAGGCTGTCTT-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Chromosome position contig 4:111879550-111889350; q25 according to Ensembl release 60 - Nov 2010
  • F primer Forward primer with sequence for KpnI restriction enzyme digestion: 5'-CGCGGTACC-AGAAGGAATCCTCATGATTGC-3' (Sigma).
  • Reverse primer with sequence for BamHI restriction enzyme digestion 5'-GGCGGATCC-TCATGGTATTGTATTAGGCTC-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Sequence ID NO 9 was not amplified using primers with or without restriction sites (KpnI and XbaI), F: 5'-CGCGGTACC-CACCTCCTGGAGTAGTGTTC-3' and R: 5'-CTAGTCTAG-CTGAGAGTGAGCACTGCACC-3';
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template (genomic DNA from MJhTERT cells) 5,4 450 ng
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • Tissue Kit (Qiagen) according to manufacturer's instructions.
  • Example 13 Sequence ID No.12 Chromosome position contig 4:71178426-71189339, 4p16.1 according to Ensembl release 60 - Nov 2010
  • Reverse primer 5'-CCGTGTGATCCAGTGGAAGA-3' (Sigma)
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • F primer Forward primer with sequence for KpnI restriction enzyme digestion 5'-CGCGGTACC-TTCTGATTCATGTGGTCGTTC-3' (Sigma).
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template (genomic DNA from MJhTERT cells) 7,27 450 ng
  • Reverse primer (R primer) with sequence for NotI restriction enzyme digestion
  • DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
  • DNA template (genomic DNA from MJhTERT cells) 6,36 450 ng
  • Sequence amplification by Polymerase chain reaction (PCR): Sequence 8-3 could not be amplified by PCR using primers (FP) 5'-AATTCCAGCAGGTCTTGTCC-3' and (RP) 5'-CTATGAGAAGCTGCTCCTGA - primers with restriction site sequence.
  • Plasmids that contained cloned sequences were isolated from bacterial cultures using QIAGEN kit according to manufacturer's instructions. Nucleotide sequence of sequences cloned into pCR II-TOPO was determined using kit ABI PRISM BigDye Terminator v3.1 in 1 B DNA-servis. For every sequencing reaction plasmid concentration was 500 ng/µl. Using M13 forward primer (CTGGCCGTCGTTTTAC) and M13 reverse primer (CAGGAAACAGCTATGAC) (Invitrogen), nucleotide sequence from both ends was determined. Results of sequencing were verified using NCBI database. After determining sequence identity, their unique presence in the genome was checked using Southern blot hybridization.
  • 300 ng of PCR product for every sequence was diluted in 15 µl reH2O and incubated for 10 min in a thermoblock at 100°C to denature DNA, and then chilled for 5 min on ice, followed by addition of: 2 µl 10X buffer, 2 µl nucleotides mixture with DIG-11-dUTP and 1 µl Klenow enzyme. Mixture was incubated at 37°C overnight. To stop the reaction and to precipitate DNA, 2 µl 0,2 M EDTA, 2,5 µl 4 M LiCl, 75 ml absolute cold ethanol and 1 µl glycogen were added and the reaction was incubated for 2 h at -20°C. Efficiency of probe labeling was checked by dot blot test that compared labeled probes with labeled control DNA. Labeled DNA probes were stored at -20°C and were used over several months.
  • Suitable amounts (from 7 to 10 µg) of genomic DNA were digested with restriction enzymes; for every sequence two different digestion combinations that give fragments of different lengths were chosen (see table).
  • Digested DNAs were then separated by 0,8% agarose gel electrophoresis at 120 V for 3 h, followed by gel depurination in 0,25 M HCl for 5 min, denaturation in 0,5 M NaOH and 1,5 M NaCl 2x15 min, and neutralization in 0,5 M Tris HCl pH 7,5 and 3 M NaCl 2x15 min. Gel was then washed in 2 x SSC (3 M NaCl, 0,3 M sodium citrate), and DNA transfer onto positively charged nylon membrane (Roche) was performed overnight in 20 x SSC.
  • TOPO plasmids that contained chosen sequences were digested using proper restriction enzymes in order to reclone sequences into phCMV-cGFP plasmid. Sequence 4.4 was cloned directly into phCMV-cGFP plasmid.
  • Plasmid DNAs were isolated from bacteria using DNA Qiagen kit for plasmid DNA isolation.
  • MJ90hTERT cells normal human skin fibroblasts with active telomerase
  • DMEM medium Sigma
  • 10% fetal bovine serum Gibco
  • Cells were passaged when they reached 80% confluency.
  • Example 25 Electroporation of MJ90hTERT cells with phCMV-cGFP plasmids that contain target sequences
  • MJ90hTERT cells were subcultured 24 h prior to electroporation in order to be ~50-60% confluent at the time of electroporation. 10⁷ cells were collected in 0,4 ml DMEM medium without serum and transferred into a cold cuvette with 0,4 cm space between electrodes. 20 µg of phCMV-cGFP plasmid diluted in 20 µl TE buffer was added to the cuvette. The phCMV-cGFP plasmids used contained chosen sequences and were linearized using proper restriction enzymes. Cells were electroporated under these conditions: voltage 220 V, time constant 30 ms, capacitance 960 µF.
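
The filtering sub-steps of step a) listed among the definitions above (marking features, forming feature-free sequences, keeping those of 35 kb or longer, trimming 7 kb from each end) can be illustrated with a minimal Python sketch. It is not the applicants' pipeline: the function names, the 0-based half-open coordinates and the representation of features as plain (start, stop) tuples are assumptions made here for illustration, and the final BLAST uniqueness check is left out.

```python
from typing import List, Tuple

# Assumption: intervals are 0-based, half-open [start, stop) coordinates.
Interval = Tuple[int, int]

MIN_LENGTH = 35_000  # keep only feature-free sequences of 35 kb or longer
TRIM = 7_000         # cut 7 kb from the left end and 7 kb from the right end


def merge_features(features: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching feature intervals into a sorted list."""
    merged: List[Interval] = []
    for start, stop in sorted(features):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def feature_free_regions(length: int, features: List[Interval]) -> List[Interval]:
    """Return the parts of [0, length) not covered by any reported feature."""
    regions: List[Interval] = []
    cursor = 0
    for start, stop in merge_features(features):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, stop)
    if cursor < length:
        regions.append((cursor, length))
    return regions


def candidate_neutral_regions(
    length: int,
    features: List[Interval],
    min_length: int = MIN_LENGTH,
    trim: int = TRIM,
) -> List[Interval]:
    """Apply the length filter, then trim both ends of each surviving region."""
    candidates: List[Interval] = []
    for start, stop in feature_free_regions(length, features):
        if stop - start >= min_length:
            candidates.append((start + trim, stop - trim))
    return candidates
```

For a hypothetical 200 kb array with features at 10-30 kb, 70-90 kb and 100-195 kb, only the 40 kb gap between 30 kb and 70 kb passes the length filter, and trimming narrows it to 37-63 kb. Each narrowed candidate would still need the BLAST uniqueness check described in step G.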

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to the method for identification of neutral DNA sequences suitable for integration of DNA of interest in any partially or completely annotated sequence or genome. The invention further relates to the identified neutral sequences which are suitable for integration of DNA of interest, and to the method for inserting a DNA of interest into the genome of interest. Furthermore, the invention relates to the cell and/or organism which have integrated DNA of interest in the neutral sequence and their use in the production of cell lines, transgenic plants and animals and for gene therapy.

Description

METHOD OF DETERMINATION OF NEUTRAL DNA SEQUENCES IN THE GENOME, SYSTEM FOR TARGETING SEQUENCES OBTAINED THEREBY AND METHODS FOR USE THEREOF
FIELD OF THE INVENTION
The present invention relates to the method for identification of neutral DNA sequences in a given genome, to the neutral DNA sequences suitable for integration of the DNA of interest and methods for use thereof.
BACKGROUND OF THE INVENTION
Introduction and expression of DNA of various origins in cells and/or organisms is an essential procedure in almost every field of molecular biology, molecular genetics, biotechnology, pharmaceutical research, experimental medicine, and preclinical and clinical stage gene therapy, including proteomics, functional genomics, protein structure-function studies and cell-based drug discovery.
In doing so, it is of utmost importance to ensure the isogenic setting of the targeted cell and/or organism. Ideally, any physical disruption of individual host genes and/or transcription units, any changes in the gene expression of the host cell, and any introduction of a gene and/or transcription unit other than the DNA of interest have to be avoided, and at the same time stable expression of the transgene has to be ensured.
Currently available procedures are the following: transformation with a self-replicating vector carrying the DNA of interest, and transgenesis using a vector capable of integration into the host cell genome.
Although the physical constitution of the host genome is not disturbed in the first method, the method itself has major drawbacks. The self-replicating vector is readily diluted from the cell in the absence of the drug used for the selection. Additionally, replication of the vector poses a burden for the cell and might eventually result in the deletion of the DNA of interest from the vector even in the presence of the selection drug.
So far, two well-characterized integrative systems have been described that rely on viral and non-viral vectors respectively. According to currently available data, numerous adverse effects, some of them very serious, are associated with their use. These issues involve the efficiency of delivery, insertional mutagenesis/oncogenesis, transgene remobilization and postintegrative promoter silencing. None of these risks can be completely avoided using currently available technology.
For example, most of the currently used protocols rely on the random insertion of appropriate expression vectors. Such a random insertion approach poses a high risk of disrupting a functionally active gene or its regulatory sequence. The concerns regarding the random insertion approach increase dramatically when experiments or applications of gene therapy are performed. The cause for concern is the possibility that the experimental or therapeutic DNA integrates, by chance, into a sensitive genomic site such as an oncogene (Yanez RJ and Porter ACG (1998) Gene Ther. 5(2): 149-59; Hacein-Bey-Abina S et al. (2003) Science 302: 415-419; Hacein-Bey-Abina S et al. (2003) N. Engl. J. Med. 348: 255-256).
To avoid the problems of the random insertion approach, scientists have turned to site-directed insertion into predetermined sites chosen because they have favourable properties compared to random integration sites.
Therefore, the primary focus in optimising protocols for the introduction and expression of DNA of various origins in cells and/or organisms is presently to improve the vector design so as to ensure both site-directed integration into a safe locus and long-term expression of the transgene. The issues that are still open in setting up these protocols are: what is an ideal vector of choice and what defines a good integration site. Regarding the optimisation of the vector architecture, major improvements have been made up to now.
Several efforts during the last decade led to the development of nonviral approaches based on DNA-modifying enzymes by exploiting the cellular mechanisms of cut-and-paste transposition and homologous recombination (HR). These methods include the use of zinc finger nucleases, meganucleases, site-specific recombinases such as ΦC31 integrase, Cre and Flp recombinases and transposase-based systems to achieve the integration of foreign DNA at a desired genomic position. Presently, strategies on how to alter the site-specificity of these enzymes and precisely target safe insertion sites are successfully implemented (e.g. Lombardo A et al. (2007) Gene editing in human stem cells using zinc finger nucleases and integrase defective lentiviral vector delivery. Nat. Biotechnol. 25:1298-1306).
Therefore, an imperative in defining an optimal protocol is now to find an approach that would minimize all the above mentioned risks and which is based on targeting specific safe loci in the genome. Since methods which precisely target the DNA of interest to a specific locus are already known in the state of the art (e.g. the most promising emerging zinc finger genome targeting technology), it is now essential to identify the safe target site in the genome of choice.
Presently there are several criteria that define a safe target site or safe harbour locus. Integration at this site needs to support sufficient and stable gene expression in the modified cells. In addition, the integration should not interfere with normal cell function or gene regulation (a locus with no proto-oncogene or essential genes in its vicinity). Some authors have described a safe harbour site as a locus that fulfils the following five criteria: it is located (i) at least 50 kbp from the 5' end of a gene, (ii) at least 300 kbp from any cancer-related gene, (iii) at least 300 kbp from any miRNA, (iv) outside a transcription unit and (v) outside ultraconserved regions of the human genome (Papapetrou et al. (2011) Genomic safe harbours permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat. Biotechnology 29: 73-78).
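The five criteria cited above can be expressed as a simple predicate. The following is only a minimal sketch: the class name, field names and the idea of precomputed distances are illustrative assumptions, not part of the cited work or of the present method.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Hypothetical description of a candidate integration site (distances in bp)."""
    dist_to_nearest_gene_5prime: int
    dist_to_nearest_cancer_gene: int
    dist_to_nearest_mirna: int
    inside_transcription_unit: bool
    inside_ultraconserved_region: bool


def meets_safe_harbour_criteria(site: CandidateSite) -> bool:
    """Return True only if all five criteria (i)-(v) quoted in the text hold."""
    return (
        site.dist_to_nearest_gene_5prime >= 50_000        # (i) at least 50 kbp
        and site.dist_to_nearest_cancer_gene >= 300_000   # (ii) at least 300 kbp
        and site.dist_to_nearest_mirna >= 300_000         # (iii) at least 300 kbp
        and not site.inside_transcription_unit            # (iv)
        and not site.inside_ultraconserved_region         # (v)
    )
```

A site that fails any single criterion is rejected, which is why only a fraction of randomly recovered integration sites passes such a filter.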
In the present state of the art, three types of solutions have been described for identifying a safe harbour locus for the insertion of the DNA of interest.
The first solution advises the use of repetitive sequences. There are several repetitive genes that are widely transcribed in the genome. It is supposed that disruption of one or a few copies can be tolerated without any deleterious effects. For example, the human ribosomal DNA or rDNA locus contains ~400 copies per haploid genome clustered on five chromosomes (Sakai K et al. (1995) Human ribosomal RNA cluster: identification of the proximal end containing a novel tandem repeat sequence. Genomics 26: 521-526). The expression of these genes is robust, since rRNA accounts for nearly 80% of the total RNA in growing mammalian cells. rDNA is well conserved among organisms, especially a 50-bp region of 28S rDNA that is a target for sequence-specific, non-LTR retroposons (LINE). This short region is highly conserved in chordates and arthropods and in all eukaryotes. This could constitute a good universal integrative site (Lisowski L. et al. (2012) Ribosomal DNA integrating rAAV-rDNA vectors allow for stable transgene expression. Molecular Therapy 20: 1912-1923). However, one should keep in mind that ribosomal RNAs are among the most important and essential genes in a living cell, and interruption of some of these genes may result in a transcription product that could interfere with normal ribosomal function.
The second solution is to drive the integration to specific loci already reported to be nonmutagenic. In this approach several loci have been identified. The two most prominent sites used nowadays are the mouse ROSA26 locus and the human AAVS1 site.
Presently, the mouse ROSA26 locus has become established as one of the preferred docking sites for the ubiquitous expression of transgenes, as it can be targeted with high efficiency and is expressed in most cell types. In addition to ubiquitous transgene expression via the endogenous promoter, transgenic constructs harbouring different exogenous promoters or different transgene cassettes, including reporters, site-specific recombinases and, recently, non-coding RNAs, have been positioned at the ROSA26 locus (Casola S (2010) Mouse models for miRNA expression: the ROSA26 locus. Methods Mol Biol 667: 145-163; Chen C et al. (2011) A comparison of exogenous promoter activity at the ROSA26 locus using PhiC31 integrase mediated cassette exchange approach in mouse ES cells. PLoS ONE 6(8): e23376).
However, the transcriptional complexity of the ROSA26 locus (Zambrowicz BP et al. (1997) Disruption of overlapping transcripts in the ROSA beta geo 26 gene trap strain leads to widespread expression of β-galactosidase in mouse embryos and hematopoietic cells. Proc Natl Acad Sci U S A 94(8): 3789-94) indicates orientation-dependent effects (Strathdee D et al. (2006) Expression of transgenes targeted to the Gt(ROSA)26Sor locus is orientation dependent. PLoS ONE 1: e4). The lack of systematic studies has limited the general application of this method with respect to exogenous promoters. Although the human homologue of the mouse ROSA26 locus has recently been identified on chromosome 3 (Irion S et al. (2007) Identification and targeting of the ROSA26 locus in human embryonic stem cells. Nat Biotechnol 25: 1477-1482), it is not expected to exist in all genomes that have to be targeted and therefore lacks universal character. Moreover, it has been shown that both DNA strands of the ROSA26 genomic region are transcribed, producing convergent and antisense RNAs (Zambrowicz BP et al. (1997), cited above). It is known that disruption of a gene transcript may cause metabolic changes in some tissues.
In its latent state, AAV (adeno-associated virus) is found integrated into the host genome at a specific site, designated AAVS1. This site has been mapped within the PPP1R12C gene (protein phosphatase 1 regulatory subunit 12C). This site has features that make it attractive as an integration site: natural persistence in the human chromosome and no adverse effects reported until now. Therefore, AAVS1 is considered to be a safe harbour for adding a transgene into the human genome (DeKelver et al. (2010) Functional genomics, proteomics, and regulatory DNA analysis in isogenic setting using zinc finger nuclease-driven transgenesis into safe harbour locus in the human genome. Genome Research 20: 1133-1142). However, because of its specificity for the human genome, this site can be used only for human transgenesis. Moreover, this site constitutively expresses a protein named p84 with no known function, and therefore the integration of the DNA of interest at this site does not provide an isogenic setting (Linden RM et al. (1996) Site-specific integration by adeno-associated virus. PNAS USA 93: 11288-11294). This is even more important because studies on the long-term toxicity of the integration of DNA of interest at this site are lacking.
The third solution is to screen clones that have only one copy of the integrated transgene in order to characterize putative universal genomic safe harbours, a method developed by Papapetrou et al. (Papapetrou et al. (2011) Genomic safe harbours permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat. Biotechnology 29: 73-78). They hypothesized that screening clones of induced pluripotent stem cells (iPS) harbouring a single vector copy would facilitate the discovery of new safe harbour sites. For this purpose, they used iPS cells with a well-characterized globin LV vector in the expectation that this would meet the five criteria for a safe harbour locus described above. They examined 5,840 integration sites and found that 17% of them met all five criteria. This method for de novo discovery of genomic safe harbours can be applied to all genomes. However, because it involves very laborious and expensive laboratory work, which then has to be repeated in every experimental setting that uses a different genome as target, it is difficult and time-consuming to apply.
Consequently, there is still a need in the state of the art for a method of identifying sequences which can serve as the best places for integration of DNA of interest and which can be applied to any genome of interest for genetic experiments. The integration of DNA of interest into these loci should preserve the isogenic background of the targeted cell/organism, causing no effects other than the introduction and expression of the transgene.
The present invention provides a new method by which neutral sequences can be identified in any partially or completely annotated genome and/or sequence and which gives fast and reliable results in a short period of time. Neutral sequences identified by such a method are then used as the best safe loci for integration of DNA of interest into any selected genome for the purposes of creating transgenic animals, experimental biology and medicine as well as gene therapy.
SUMMARY OF THE INVENTION
The present invention relates to a method for identification of neutral DNA sequences in a given genome, to the neutral DNA sequences suitable for integration of the DNA of interest and to methods for use thereof.
More precisely, the invention provides a method for identification of neutral DNA sequences in which the neutral DNA sequences are filtered out by a data-processing system as the ones not having any known functional features selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having a functional feature.
In a particular aspect, the present invention relates to the method of identification of neutral sequences in any partially or completely annotated genome and/or sequence.
The invention further relates to the neutral DNA sequences identified thereby, in particular to neutral sequences identified in the human genome, selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16.
The invention further relates to the system for targeting of neutral sequences determined according to the invention, namely method for the introduction of the DNA of interest into neutral sequences determined according to the invention by making use of recombinant vector containing the DNA of interest, suitable selection marker, regulatory elements, gene insertion system and neutral sequence according to the invention.
The invention further relates to the use of neutral sequences according to the invention for producing cell lines, transgenic plants and animals and for gene therapy.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a method for identification of neutral DNA sequences in a given genome, to the neutral DNA sequences suitable for integration of the DNA of interest and to methods for use thereof. The term "neutral sequences" refers to any particular portion of the genome or of a DNA sequence that is located far enough from all functional genetic elements so that integration of DNA of interest into this place does not disturb the normal physiology of the cell, or the changes are kept at the lowest possible level.
The term "DNA of interest" refers to desired DNA fragments which encode proteins or non-coding RNAs (such as genes, regulatory elements or other functional sequences of interest) and which are chosen by the person skilled in the art to be integrated into the host cell.
The term "vector" as used herein refers to an extrachromosomal element that is used to introduce any DNA material into cells for either expression or replication thereof. The terms "vector" and "plasmid" are used interchangeably.
The term "predetermined site" as used herein refers to any sequence selected as a place for integration of DNA of interest.
The term "cell line" as used herein refers to normal cell lines, telomerase immortalized cell lines, immortal transformed cell lines and stem cell lines.
All molecular biology techniques used for realizing the invention are fully described in "Molecular cloning: a laboratory manual", 3rd ed., Cold Spring Harbor Laboratory Press, 2001, by Sambrook, Fritsch and Maniatis and "Metode u molekularnoj biologiji", 2007, Ambriovic Ristov A (Ed. in Chief), Ruder Boskovic Institute, Zagreb.
The invention provides a method of integration of DNA of interest at a predetermined site in the genome, the method comprising the following steps:
a) determination of neutral DNA sequence as predetermined site in the genome (insertion loci) for the insertion of DNA of interest,
b) isolation and purification of neutral DNA sequence,
c) construction of recombinant vector comprising neutral DNA sequence and DNA of interest,
d) transfection of the cells with a recombinant vector containing a neutral DNA sequence and DNA of interest,
e) screening positive clones, and
f) gene expression analysis of cells after insertion of DNA of interest.
Step a) - determination of neutral DNA sequence as predetermined site in the genome (insertion loci) for the insertion of DNA of interest - consists of a method to operate a data-processing system for determination of one or more neutral DNA sequences of a selected genome serving as an insertion locus. Sub-steps of the above cited step a) are:
A. loading of the genome into said data-processing system in the form of an array;
B. marking the reported functional features according to the records of one or more auxiliary databases on the array obtained in step A. by tagging the start and stop positions of each known functional feature selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having a functional feature in the auxiliary database;
C. filtering out the reported functional features marked in step B. from the array in step A. and forming a set of independent sequences characterized by their length;
D. sorting the obtained sequences by length;
E. filtering out the sequences from step D. having less than 35 kb in order to obtain sequences of 35 kb or longer;
F. for each sequence extracted in step E., cutting out regions of at least 7 kb from the left end and at least 7 kb from the right end of said sequence; and
G. executing the BLAST analysis over the narrowed sequences obtained in step F. to confirm their unique status in the selected genome.
The acronym "kb" used above stands for "kilobase" - a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.
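Sub-steps A. to F. can be sketched in a few lines of code. This is a minimal illustration only: it represents one chromosome by its length and the functional features by 1-based inclusive (start, stop) pairs, all function names are hypothetical, and step G. (the BLAST uniqueness check) is left out because it depends on an external tool.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, stop)


def merge_features(features: List[Interval]) -> List[Interval]:
    """Step B: collapse overlapping or adjacent feature annotations."""
    merged: List[Interval] = []
    for start, stop in sorted(features):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def feature_free_regions(length: int, features: List[Interval]) -> List[Interval]:
    """Step C: return the stretches of the sequence not covered by any feature."""
    gaps: List[Interval] = []
    cursor = 1
    for start, stop in merge_features(features):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, stop + 1)
    if cursor <= length:
        gaps.append((cursor, length))
    return gaps


def candidate_neutral_regions(length: int, features: List[Interval],
                              min_len: int = 35_000,
                              margin: int = 7_000) -> List[Interval]:
    """Steps D-F: sort by length, keep regions >= min_len, trim both ends."""
    gaps = feature_free_regions(length, features)
    gaps.sort(key=lambda g: g[1] - g[0] + 1, reverse=True)           # step D
    long_gaps = [g for g in gaps if g[1] - g[0] + 1 >= min_len]      # step E
    return [(s + margin, e - margin) for s, e in long_gaps]          # step F


# Toy example: a 100 kb sequence with three annotated features.
toy = candidate_neutral_regions(
    100_000, [(1, 10_000), (5_000, 12_000), (50_000, 60_000)])
print(toy)
```

In this toy input the two feature-free stretches are about 38 kb and 40 kb long, so both pass the 35 kb threshold and are narrowed by 7 kb at each end.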
This invention further comprises isolation and purification of the neutral sequences obtained by step a).
The invention also provides a method of construction of a vector capable of integrating into a predetermined site in the given genome of the cell, the method comprising:
b) Isolation and purification of selected sequences
Suitable parts of identified neutral sequences are amplified by standard techniques such as the PCR method. PCR products are inserted into a plasmid vector for further amplification. Purification of the desired sequence from the PCR products may be done by procedures known in the art. Verification of the purified sequence can be achieved by sequencing of the products.
c) Construction of recombinant vector comprising neutral sequence
Confirmed unique neutral DNA sequences are cloned into suitable vector for mammalian expression.
A recombinant vector is constructed by insertion of the neutral sequence and a suitable cassette in the vector. The choice of suitable cassette depends on the system of gene insertion used. The techniques used to increase efficiency and specificity of integration are selected from site-specific recombination systems Cre/Lox, FLP/FRT, SIRT, Red/ET, TAGIT, modified Gin/gix, zinc finger nucleases and meganucleases and the like, which are well known in the art (e.g. Vasquez KM et al. (2001) Manipulating the mammalian genome by homologous recombination. Proc Natl Acad Sci U S A 98(15): 8403-10; Orban PC et al. (1992) Tissue- and site-specific DNA recombination in transgenic mice. Proc. Natl. Acad. Sci. USA (89): 6861-6865; Porteus MH and Carroll D (2005) Gene targeting using zinc finger nucleases. Nat. Biotechnol. 23: 967-973; Renault S and Duchateau Philippe (2012) Site-directed insertion of transgenes: 23 (Topics in Current Genetics) Springer. Kindle Edition).
A recombinant vector includes a promoter, the choice of which depends on the intended application of the vector, and which may be tissue specific, ubiquitously expressed, or a promoter that allows conditional expression.
The recombinant vector according to present invention may include a selection marker such as Neomycin, Kanamycin, Ampicillin or other selection markers depending on the system used.
The recombinant vector according to present invention may include an enhancer. An enhancer is a short region of DNA that can be bound by proteins (the trans-acting factors) to enhance transcription levels of genes in a gene cluster. An enhancer may be selected from any suitable commercial source depending on the particular system used, e.g. SV40 enhancer, plasmid pCAT3-Enhancer (Smallwood A and Ren B (2013) Genome organisation and long range regulation of gene expression by enhancers. Current Opinion in Cell Biology 25: 1-8).
The recombinant vector according to present invention may include reporter gene such as green fluorescence protein (GFP), luciferase or beta-galactosidase.
The recombinant vector according to present invention may include suitable multi-cloning site containing restriction sites of various restriction enzymes.
The recombinant vector according to present invention may include start codon for the beginning of transcription of DNA of interest.
The recombinant vector according to present invention may include stop codon to end transcription of DNA of interest.
The recombinant vector according to present invention may include poly-A sequence at the end of gene sequence so it can be recognized as messenger RNA in the cell.
Transfection of the cells with a recombinant vector
The host cells are transfected with recombinant vectors comprising our preselected neutral DNA sequence. Various transfection systems currently known in the art may be used (Fugene, Lipofectamine 2000, electroporation, etc.). The efficiency of transfection with any particular system is cell type specific. Cell types of interest include a large range of normal cells from different tissues and from different plant and animal species. Immortal and/or tumour derived cells are also of interest. Transfection efficiencies can be monitored by insertion of GFP or beta-galactosidase into the vector used.
Screening positive clones
The ratio of homologous/non-homologous recombination is determined by digestion with restriction enzymes and subsequent Southern blot analysis.
Gene expression profile of cells after gene insertion
After insertion of the DNA in preselected unique loci is completed and positive clones are selected, the gene expression profile analysis can be done by methods such as northern blotting, macroarray or microarray analysis in order to check whether the insertion has disturbed endogenous gene expression.
In concrete examples of the present invention, neutral sequences were determined on the human genome and sequences SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16 were retrieved by the said method. These sequences were isolated and purified, and vectors containing said neutral DNA sequence, a promoter, a DNA of interest, a selection marker and an enhancer were constructed. Furthermore, cells were transfected with the said vector and the gene expression of the cells after insertion of DNA of interest was analysed. The present invention will now be described in greater detail by means of the following examples. The following examples are for illustrative purpose and are not intended to limit the scope of the invention.
EXAMPLES
The neutrality screening procedure for a given organism can be divided into two phases: an initial "data retrieval" phase where all relevant data is downloaded from the adequate database, followed by a filtration phase. According to the present invention, the selected genome or part of the genome of particular organism is loaded into data-processing system in the form of array from any available public or commercial database containing said genome. The reported functional features of the said genome are extracted from one or more auxiliary databases and also loaded into data-processing system.
In the second phase, an early step is to map a large set of reported functional features, that is, their associated start and stop positions, onto a corresponding reference sequence. This process is carried out via the data-processing system and is explained below under "Approach". Once all features have been mapped, the genome is scanned for complementary or "feature-free" regions. These positions are then marked as safe spots for integrating a DNA sequence of interest.
Good examples of such databases containing genomes and/or corresponding reported functional features of said genome are:
• EnsEMBL - http://www.ensembl.org/index.html
• NCBI - http://www.ncbi.nlm.nih.gov/
• UCSC - http://genome.ucsc.edu/
• Vega - http://vega.sanger.ac.uk/index.html
• MBGD - http://mbgd.genome.ad.jp/
• MetaCyc - http://www.metacyc.org/
• BioCyc - http://biocyc.org/
• Hymenoptera Genome Database - http://hymenopteragenome.org/
• ArkDB - http://www.thearkdb.org/arkdb/
• Mitozoa db - http://mi.caspur.it/mitozoa/
• Non-B db - http://nonb.abcc.ncifcrf.gov/apps/site/default
Example 1 - Human genome and EnsEMBL database
This example discloses the method to operate a data-processing system on the human genome using EnsEMBL database in order to determine neutral DNA sequences in said genome. For the skilled person in the art it is evident that marking the reported functional features of the human genome can be performed by combining one or more sources of information, i.e. auxiliary databases, without limitation.
Approach: A neutral locus site is considered to be a featureless region of DNA. To identify such regions, a pipeline was developed to merge publicly available data provided by EnsEMBL database with the Homo sapiens genome. In the data-processing system Homo sapiens genome is represented in the form of an array suitable for computational biology.
Using the EnsEMBL API (application programming interface), the start and stop positions of every reported feature associated with a particular region in the genome were downloaded. The process of marking the reported functional features according to the EnsEMBL database is carried out via the data-processing system. Each region having a known functional feature selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having a functional feature, is "cut" out from the array that represents the Homo sapiens genome. In this representation the portions of said array without known functional features, i.e. one or more prospective sequences with variable lengths, are randomly distributed across the said array.
The prospective sequences without reported functional features are sorted according to their length. The sequences with less than "L" kb were filtered out. In this example L = 35. It is evident that lowering the value "L" increases the number of prospective sequences for further processing and vice versa. Bearing in mind that one aspect of the invention is to find neutral sequences serving as insertion loci, restriction enzymes used during insertions should recognize the desired palindrome within the neutral sequence. If the prospective sequence is too short, i.e. much less than 35 kb, it is not likely that such an insertion place exists within the said sequence. On the other hand, as L increases, the number of sequences available for further analysis tends to zero. It was concluded that a good experimental value for the human genome is a value of L close to 35.
Furthermore, in order to make a possible insertion place as safe as possible, i.e. far enough from places having reported functional features that these cannot be affected by restriction enzymes, an additional 7 kb were cut out from both ends of each sequence having at least 35 kb. This prevents an insertion within the isolated sequence from lying too close to a region having reported functional features. According to the example, every retained sequence therefore has 35 kb - 2 x 7 kb = 21 kb or more.
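The trade-off between the threshold L and the number of remaining candidates, and the effect of the 7 kb margins, can be illustrated numerically. The sketch below uses an invented list of feature-free region lengths; these values are hypothetical and are not results obtained on the human genome.

```python
def count_candidates(gap_lengths_bp, min_len_kb):
    """Number of feature-free regions at least min_len_kb kilobases long."""
    return sum(1 for n in gap_lengths_bp if n >= min_len_kb * 1000)


def trimmed_length_bp(length_bp, margin_kb=7):
    """Length left after cutting margin_kb kilobases from both ends."""
    return length_bp - 2 * margin_kb * 1000


# Hypothetical lengths (bp) of feature-free regions, for illustration only.
gap_lengths = [12_000, 28_000, 36_500, 41_000, 80_000]

for threshold_kb in (20, 35, 50, 100):
    print(threshold_kb, count_candidates(gap_lengths, threshold_kb))

# A region exactly at the threshold keeps 35 kb - 2 x 7 kb = 21 kb.
print(trimmed_length_bp(35_000))
```

With these invented lengths the count falls from 4 candidates at L = 20 to 0 at L = 100, which mirrors the behaviour described in the text.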
Finally, each sequence obtained in the above cited manner is subjected to the BLAST analysis to confirm its unique status in the selected genome. The BLAST procedure is given in the reference S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25:3389-3402, 1997.
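One possible way to turn BLAST output into a uniqueness decision is sketched below. It assumes the hits are available as tab-separated lines in the 12-column tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The identity and coverage thresholds and the function name are illustrative assumptions; the text does not specify the criteria actually applied.

```python
def is_unique(query_len: int, hits_tsv: str,
              min_identity: float = 90.0, min_cover: float = 0.5) -> bool:
    """Treat a sequence as unique if exactly one strong hit (its own locus) exists.

    A hit counts as strong when its percent identity is at least min_identity
    and its alignment length covers at least min_cover of the query.
    Both thresholds are hypothetical.
    """
    strong_hits = 0
    for line in hits_tsv.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        percent_identity = float(fields[2])
        alignment_length = int(fields[3])
        if (percent_identity >= min_identity
                and alignment_length >= min_cover * query_len):
            strong_hits += 1
    return strong_hits == 1


self_hit = "q1\tchr1\t100.0\t5624\t0\t0\t1\t5624\t2718140\t2723763\t0.0\t10386\n"
second_hit = "q1\tchr7\t95.0\t5000\t250\t0\t1\t5000\t100\t5099\t0.0\t8000\n"
print(is_unique(5624, self_hit))               # only the self hit
print(is_unique(5624, self_hit + second_hit))  # a second strong hit elsewhere
```

Short, low-identity hits are ignored by this rule, so a sequence is rejected only when a second long, highly similar copy is found elsewhere in the genome.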
The results of the above cited operation of the data-processing system performed over the human genome, followed by isolation and purification of selected sequences by techniques known in the art, are: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16; wherein said DNA sequences are neutral sequences of the human genome serving as insertion loci.
From the above it is evident that the described method for identification of neutral DNA sequences is not limited to any particular genome and can be easily extended to every completely or partially annotated genome sequence.
Example 2: Sequence ID No.1
Chromosome position contig 1:2716140-2725763; p36.32 according to Ensembl release 60 - Nov 2010
Inside contig is unique 5624 bp; 1:2718140-2723763
Primer positions: 397 bp and 5534 bp inside unique sequence
Product length: 5138 bp
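The coordinates quoted for this sequence are mutually consistent, as the short check below shows. It assumes 1-based inclusive coordinates and that the unique region is the contig with 2000 bp removed from each end; the 2000 bp figure is inferred from the numbers given here, not stated in the text. The function names are illustrative.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def trim_flanks(start: int, end: int, flank: int):
    """Remove `flank` bases from both ends of a region."""
    return start + flank, end - flank


def product_length(forward_pos: int, reverse_pos: int) -> int:
    """Genomic span between the outermost primer positions (tails excluded)."""
    return reverse_pos - forward_pos + 1


unique = trim_flanks(2716140, 2725763, flank=2000)
print(unique)                       # (2718140, 2723763)
print(region_length(*unique))       # 5624
print(product_length(397, 5534))    # 5138
```

The product length computed this way counts only the genomic span; any restriction-site tails added to the 5' ends of the primers are not included.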
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-AGCCTCCTCTCCACATAGAT-3' (Sigma)
Reverse primer (R primer) with sequence for XbaI restriction enzyme digestion:
5'-CGCTCTAGA-CCATCACCTGCTAACAGCACC-3' (Sigma)
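Each of the two primers above consists of a short 5' tail, a restriction enzyme recognition site and the part that anneals to the genomic template. The sketch below splits the primers accordingly, using the well-known recognition sequences of KpnI (GGTACC) and XbaI (TCTAGA); the function names are illustrative, and reading the leading "CGC" as extra bases that help the enzyme cut near the end of the PCR product is an assumption based on common practice.

```python
SITES = {"KpnI": "GGTACC", "XbaI": "TCTAGA"}


def split_primer(primer: str, site: str):
    """Split a primer into (5' tail, restriction site, annealing part)."""
    seq = primer.replace("-", "").upper()
    index = seq.find(site)
    if index < 0:
        raise ValueError("restriction site not found in primer")
    return seq[:index], site, seq[index + len(site):]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


forward = split_primer("CGCGGTACC-AGCCTCCTCTCCACATAGAT", SITES["KpnI"])
reverse = split_primer("CGCTCTAGA-CCATCACCTGCTAACAGCACC", SITES["XbaI"])
print(forward)
print(reverse)

# Both recognition sites are palindromic: they equal their reverse complement.
print(all(reverse_complement(s) == s for s in SITES.values()))
```

Because the sites are carried on the primers, the amplified product can be digested with both enzymes and cloned directionally into a vector cut with the same pair.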
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000015_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 15 s | 30
Annealing | 61 °C | 30 s | 30
Extension/elongation | 68 °C | 4 min 10 s | 30
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
Specificity and amount of amplified PCR product was verified using agarose gel electrophoresis and purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 3: Sequence ID No.2
Chromosome position contig 4:71185537-71194274; q13.3 according to Ensembl release 60 - Nov 2010
Inside contig is unique 4738 bp; 4:71187537-71192274
Primer positions: 38 bp and 4627 bp inside unique sequence
Product length: 4069 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) 5'-CTACTCTAGTGCTGATGCCT-3' (Sigma)
Reverse primer (R primer) 5'-CTGGATGTGATAGTGTCTGC-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000016_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 15 s | 30
Annealing | 59 °C | 30 s | 30
Extension/elongation | 68 °C | 4 min | 30
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
Specificity and amount of amplified PCR product was verified using agarose gel electrophoresis and by cutting with BglII restriction enzyme and purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 4: Sequence ID No.3
Chromosome position contig 4:78115056-78127653; q21.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8598 bp; 4:78117056-78125653
Primer positions: 145 bp and 8489 bp inside unique sequence
Product length: 8345 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion 5'-CGCGGTACC-TGAGTCCTATGGAACTGATGT-3' (Sigma)
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion 5'-GGCGGATCC-ACCTGTATGTTCTTCCAACGA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000017_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 10 s | 10
Annealing | 58 °C | 30 s | 10
Extension/elongation | 68 °C | 6 min | 10
Denaturation | 94 °C | 15 s | 20
Annealing | 58 °C | 30 s | 20
Extension/elongation | 68 °C | 6 min + 20 s for every next step | 20
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
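The two-phase cycling program above can be written down as data to estimate its total hold time. This is a rough sketch under stated assumptions: the extension time in the first 10 cycles is read as 6 min, the first of the following 20 cycles also uses 6 min with 20 s added for each subsequent cycle, ramp times between temperatures are ignored, and the final cooling hold is excluded. Class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int
    increment_s: int = 0  # seconds added for each successive cycle


@dataclass
class Phase:
    steps: List[Step]
    cycles: int


def phase_seconds(phase: Phase) -> int:
    """Total hold time of one phase, including per-cycle increments."""
    total = 0
    for cycle in range(phase.cycles):
        for step in phase.steps:
            total += step.seconds + step.increment_s * cycle
    return total


program = [
    Phase([Step("initial denaturation", 94, 180)], cycles=1),
    Phase([Step("denaturation", 94, 10),
           Step("annealing", 58, 30),
           Step("extension", 68, 360)], cycles=10),
    Phase([Step("denaturation", 94, 15),
           Step("annealing", 58, 30),
           Step("extension", 68, 360, increment_s=20)], cycles=20),
    Phase([Step("final elongation", 68, 420)], cycles=1),
]

total_s = sum(phase_seconds(p) for p in program)
print(total_s, "s =", total_s // 3600, "h", (total_s % 3600) // 60, "min")
```

Under these assumptions the extension step in the last cycle lasts 6 min plus 19 increments of 20 s, and the whole program holds for roughly four and a half hours before cooling.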
Specificity and amount of amplified PCR product was verified using agarose gel electrophoresis and by cutting with AflII restriction enzyme and purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 5: Sequence ID No.4
Chromosome position contig 8:140505913-140518109; q24.3 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8197 bp; 8:140507913-140516109
Primer positions: 7 bp and 4403 bp inside unique sequence
Product length: 4397 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion 5'-CGCGGTACC-AGTAATTAGGAACAGCACATC-3' (Sigma)
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion 5'-GGCGGATCC-TCAGGAGATACCAGTGTACTA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Components | Volume (μl) | Final concentration
H2O | 35 |
10x buffer with 27,5 mM MgCl2 | 5 | 1x; 2,75 mM
F primer (10 μM) | 1,5 | 0,3 μM
R primer (10 μM) | 1,5 | 0,3 μM
DNA template (genomic DNA from MJhTERT cells) | 4 | 500 ng
dNTP mix (10 mM each) | 1,4 | 350 μM
Enzyme mix (5 U/μl) | 0,2 | 1 U
DMSO | 1 |
Total volume | 50 |
PCR conditions:
Figure imgf000019_0001
Specificity and amount of amplified PCR product was verified using agarose gel electrophoresis and by cutting with AflII restriction enzyme and purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 6: Sequence ID No.5
Chromosome position contig 10:72902162-72914163; q22.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8002 bp; 10:72904162-72912163
Primer positions: 5 bp and 7938 bp inside unique sequence
Product length: 7934 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) 5'-GTTAGGATGGCACACTGGAG-3' (Sigma)
Reverse primer (R primer) 5'-GTGGTGATCTTCCTGTCAGC-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Components | Volume (μl) | Final concentration
H2O | 29,7 |
10x buffer with 27,5 mM MgCl2 | 5 | 1x; 2,75 mM
F primer (10 μM) | 1,5 | 0,3 μM
R primer (10 μM) | 1,5 | 0,3 μM
DNA template (genomic DNA from MJhTERT cells) | 8,6 | 500 ng
dNTP mix (10 mM each) | 2,5 | 500 μM
Enzyme mix (5 U/μl) | 0,2 | 1 U
DMSO | 1 |
Total volume | 50 |
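The final concentrations in the reaction mix above follow from simple dilution: final concentration = stock concentration x added volume / total volume. The short check below reproduces the tabulated values for this 50 μl reaction; the dictionary keys and function name are illustrative.

```python
def final_concentration(stock: float, volume_ul: float,
                        total_ul: float = 50.0) -> float:
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


# Volumes (in microlitres) taken from the table for this example.
volumes_ul = {
    "H2O": 29.7, "10x buffer": 5.0, "F primer": 1.5, "R primer": 1.5,
    "DNA template": 8.6, "dNTP mix": 2.5, "enzyme mix": 0.2, "DMSO": 1.0,
}

print(round(sum(volumes_ul.values()), 6))                         # total volume
print(final_concentration(10.0, volumes_ul["F primer"]))          # primer, in uM
print(final_concentration(27.5, volumes_ul["10x buffer"]))        # MgCl2, in mM
print(final_concentration(10.0, volumes_ul["dNTP mix"]) * 1000)   # each dNTP, in uM
print(5.0 * volumes_ul["enzyme mix"])                             # enzyme units added
```

The same function can be used to scale the recipe to another reaction volume by changing `total_ul` and the individual volumes proportionally.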
PCR conditions:
Figure imgf000020_0001
Specificity and amount of amplified PCR product was verified using agarose gel electrophoresis and purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 7: Sequence ID No.6
Chromosome position contig 8:140505913-140518109; q24.3 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8197 bp; 8:140507913-140516109
Primer positions: 35 bp and 8026 bp inside unique sequence
Product length: 7992 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) 5'-CAGACTGGAGAAGCAGCATC-3' (Sigma).
Reverse primer (R primer) 5'-CTGCAACTCTCATACCAGGA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000021_0001
PCR conditions:
Figure imgf000021_0002
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 8: Sequence ID No.7
Chromosome position contig 4:10148242-10161859; p16.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 9618 bp; 4:10150242-10159859
Primer positions: 297 bp and 9123 bp inside unique sequence
Product size: 8827 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) 5'-GCACATGAGTGGACTCAGGT-3' (Sigma).
Reverse primer (R primer) 5'-CACACACAGCAGGCTGTCTT-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000022_0001
PCR conditions:
Figure imgf000022_0002
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 9: Sequence ID No.8
Chromosome position contig 4:111879550-111889350; q25 according to Ensembl release 60 - Nov 2010
Inside contig is unique 5801 bp; 4:111881550-111887350
Primer positions: 172 bp and 5749 bp inside unique sequence
Product length: 5578 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion: 5'-CGCGGTACC-AGAAGGAATCCTCATGATTGC-3' (Sigma).
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion: 5'-GGCGGATCC-TCATGGTATTGTATTAGGCTC-3' (Sigma)
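Primers of this kind carry a short 5' tail with the restriction site, followed (after the hyphen) by the part that anneals to the genome. A minimal sketch of a sanity check is shown below: it confirms that the KpnI (GGTACC) and BamHI (GGATCC) recognition sequences are present in the tail and occur only once in the whole primer. The function name and the returned fields are illustrative assumptions, not part of the source.

```python
# Recognition sequences of the two enzymes named in this example.
RESTRICTION_SITES = {"KpnI": "GGTACC", "BamHI": "GGATCC"}


def check_tailed_primer(primer: str, enzyme: str) -> dict:
    """Inspect a primer written as 'TAIL-ANNEALING' for the given enzyme's site."""
    tail, annealing = primer.split("-", 1)
    site = RESTRICTION_SITES[enzyme]
    full = tail + annealing
    return {
        "site_in_tail": site in tail,              # site sits in the 5' tail
        "site_count_in_primer": full.count(site),  # should be exactly 1
        "annealing_length": len(annealing),        # genome-matching part, in nt
    }


forward = check_tailed_primer("CGCGGTACC-AGAAGGAATCCTCATGATTGC", "KpnI")
reverse = check_tailed_primer("GGCGGATCC-TCATGGTATTGTATTAGGCTC", "BamHI")
print(forward)
print(reverse)
```

A complete check would also scan the amplified product for internal KpnI or BamHI sites, which is not possible here because the target sequence itself is not reproduced in this section.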
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000023_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 15 s | 30
Annealing | 52 °C | 25 s | 30
Extension/elongation | 68 °C | 4 min 20 s | 30
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
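As a rough planning aid, the programmed hold time of this cycling protocol can be added up as below. The sketch assumes that the 30 cycles apply to denaturation, annealing and extension together, and it ignores ramp times between temperatures and the final 4 °C hold, so the real run takes longer. The variable names are illustrative.

```python
# Hold times in seconds, taken from the PCR conditions for Sequence ID No.8.
INITIAL_DENATURATION_S = 3 * 60
DENATURATION_S = 15
ANNEALING_S = 25
EXTENSION_S = 4 * 60 + 20
FINAL_ELONGATION_S = 7 * 60
N_CYCLES = 30

cycle_s = DENATURATION_S + ANNEALING_S + EXTENSION_S
total_s = INITIAL_DENATURATION_S + N_CYCLES * cycle_s + FINAL_ELONGATION_S

# Extension time available per kilobase of the 5578 bp product.
extension_s_per_kb = EXTENSION_S / (5578 / 1000)

print(f"one cycle: {cycle_s} s, programmed total: {total_s / 60:.0f} min")
print(f"extension: {extension_s_per_kb:.1f} s per kb")
```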
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 10: Sequence ID No.9
Chromosome position contig 2:239444761-239452562, 2q37.3 according to Ensembl release 60 - Nov 2010
Inside contig is unique 3800 bp, 2:239446761-239450560
Primer positions: 3 bp and 3787 bp inside unique sequence
Product length: 3784 bp
Sequence amplification by Polymerase chain reaction (PCR):
Sequence ID No.9 was not amplified using primers with or without restriction sites (KpnI and XbaI). Primers with restriction sites: F 5'-CGCGGTACC-CACCTCCTGGAGTAGTGTTC-3' and R 5'-CTAGTCTAG-CTGAGAGTGAGCACTGCACC-3'; primers without restriction sites: F 5'-CACCTCCTGGAGTAGTTGTTC-3' and R 5'-CTGAGAGTGAGCACTGCACC-3'.
Example 11: Sequence ID No.10
Chromosome position contig 3:13744468-13756039, 3p25.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 7572 bp, 13746468-13754039
Primer positions: 2440 bp and 5415 bp inside unique sequence
Product length: 2705 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-GGCACATAGAGGAACCTGAG-3' (Sigma).
Reverse primer (R primer) with sequence for XbaI restriction enzyme digestion:
5'-CTAGTCTAGA-CATTCCTAATGGTAGGCCAG-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Components | Volume (μl) | Final concentration
H2O | 33,4 |
10x buffer with 27,5 mM MgCl2 | 5 | 1x
F primer (10 μM) | 1,5 | 0,3 μM
R primer (10 μM) | 1,5 | 0,3 μM
DNA template (genomic DNA from MJhTERT cells) | 5,4 | 450 ng
dNTP mix (10 mM each) | 2 | 400 μM
Enzyme mix (5 U/μl) | 0,2 | 1 U
DMSO | 1 |
Total volume | 50 |
PCR conditions:
Figure imgf000025_0001
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 12: Sequence ID No.11
Chromosome position contig X:36780658-36784689, Xp21.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8220 bp; 36791886-36800105
Primer positions: 2333 bp and 8043 bp inside unique sequence
Product length: 5810 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-GGCTGAATGCTCACTCTTCC-3' (Sigma).
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion:
5'-GGCGGATCC-GATGGAGAATCCAGGCTAAG-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000026_0001
PCR conditions:
Figure imgf000026_0002
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 13: Sequence ID No.12
Chromosome position contig 4:71178426-71189339, 4p16.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 6914 bp: 71180426-71187339
Primer positions: 117 bp and 6443 bp inside unique sequence
Product length: 6327 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) 5'-GATCCTAGTCCTGACTGTGA-3' (Sigma).
Reverse primer (R primer) 5'-CCGTGTGATCCAGTGGAAGA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000027_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 15 s | 30
Annealing | 57 °C | 30 s | 30
Extension/elongation | 68 °C | 6 min | 30
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
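The 57 °C annealing step used for Sequence ID No.12 can be compared with a rough melting-temperature estimate for the two 20-nt primers. The sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a common rule of thumb for short oligonucleotides. It is an illustration only: the source does not say how the annealing temperatures were chosen, and the function name is an assumption.

```python
def wallace_tm(primer: str) -> int:
    """Wallace rule estimate: Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


forward_tm = wallace_tm("GATCCTAGTCCTGACTGTGA")  # F primer of Example 13
reverse_tm = wallace_tm("CCGTGTGATCCAGTGGAAGA")  # R primer of Example 13
annealing_c = 57                                 # annealing temperature from the table

print(forward_tm, reverse_tm, min(forward_tm, reverse_tm) - annealing_c)
```

With these primers the estimate gives 60 °C and 62 °C, so the 57 °C annealing step lies a few degrees below the lower value, which is in line with general practice.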
Example 14: Sequence ID No.13
Chromosome position contig 4:11187164-111883093, 4q25 according to Ensembl release 60 - Nov 2010
Inside contig is unique 7339 bp: 111873764-111881093
Primer positions: 2608 bp and 6573 bp inside unique sequence
Product length: 3965 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-TTCTGATTCATGTGGTCGTTC-3' (Sigma).
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion:
5'-GGCGGATCC-AAGAACTCAGACTGTTGCAGG-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Figure imgf000028_0001
PCR conditions:
Step | Temperature | Time | Number of cycles
Initial denaturation | 94 °C | 3 min | 1
Denaturation | 94 °C | 15 s | 30
Annealing | 56 °C | 30 s | 30
Extension/elongation | 68 °C | 4 min | 30
Final elongation | 68 °C | 7 min | 1
Cooling | 4 °C | infinite | 1
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 15: Sequence ID No.14
Chromosome position contig 18:75828111-75836980, according to Ensembl release 60 - Nov 2010
Inside contig is unique 8870 bp: 75826111-75838980
Primer positions: 41 bp and 8536 bp inside unique sequence
Product length: 8495 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-AACAACTACAGGAGTCGCAA-3' (Sigma).
Reverse primer (R primer) with sequence for BamHI restriction enzyme digestion:
5'-CGCGGATCC-TCCTCATGTCTCGTTCTTCA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood & Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Components | Volume (μl) | Final concentration
H2O | 30,73 |
10x buffer with 27,5 mM MgCl2 | 5 | 1x
F primer (10 μM) | 1,5 | 0,3 μM
R primer (10 μM) | 1,5 | 0,3 μM
DNA template (genomic DNA from MJhTERT cells) | 7,27 | 450 ng
dNTP mix (10 mM each) | 2,5 | 500 μM
Enzyme mix (5 U/μl) | 0,5 | 2,5 U
DMSO | 1 |
Total volume | 50 |
PCR conditions:
Figure imgf000030_0001
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 16: Sequence ID No.15
Chromosome position contig 10:72902162-72914463, 10q22.1 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8002 bp: 72902162-72914163
Primer positions: 28 bp and 4561 bp inside unique sequence
Product length: 4533 bp
Sequence amplification by Polymerase chain reaction (PCR):
Forward primer (F primer) with sequence for KpnI restriction enzyme digestion:
5'-CGCGGTACC-AGATAATCCAGTCTCCAAGG-3' (Sigma).
Reverse primer (R primer) with sequence for NotI restriction enzyme digestion:
5'-TAAAGCGGCCGC-GCAGAAGAGCCTTTAGAAAA-3' (Sigma)
DNA template is genomic DNA from MJ90hTERT cells isolated using DNeasy Blood &
Tissue Kit (Qiagen) according to manufacturer's instructions.
Expand Long Template PCR System (Roche) was used for amplification.
Components | Volume (μl) | Final concentration
H2O | 32,68 |
10x buffer with 27,5 mM MgCl2 | 5 | 1x
F primer (10 μM) | 1,5 | 0,3 μM
R primer (10 μM) | 1,5 | 0,3 μM
DNA template (genomic DNA from MJhTERT cells) | 6,36 | 450 ng
dNTP mix (10 mM each) | 1,75 | 350 μM
Enzyme mix (5 U/μl) | 0,2 | 1 U
DMSO | 1 |
Total volume | 50 |
PCR conditions:
Figure imgf000031_0001
Specificity and amount of the amplified PCR product were verified by agarose gel electrophoresis and by cutting with AflII restriction enzyme, and the product was purified using QIAquick PCR Purification Kit (Qiagen) according to manufacturer's instructions.
Example 17: Sequence ID No.16
Chromosome position contig 8:140505913-140518109, 8q24.3 according to Ensembl release 60 - Nov 2010
Inside contig is unique 8197 bp: 140507913-140516109
Primer positions: 129 bp and 8104 bp inside unique sequence
Product length: 7976 bp
Sequence amplification by Polymerase chain reaction (PCR):
Sequence 8-3 could not be amplified by PCR using primers (FP) 5'-AATTCCAGCAGGTCTTGTCC-3' and (RP) 5'-CTATGAGAAGCTGCTCCTGA-3' - primers with restriction site sequence.
Example 18:
Cloning of target sequences
All sequences amplified by PCR were cloned into pCR II-TOPO vector (Invitrogen) for further amplification in bacteria and subsequent sequencing. Sequence ID No 13 was directly cloned into phCMV-cGFP vector for amplification and sequencing.
Figure imgf000032_0001
Protocol:
1. Incubate ligation mixture for 5 min at room temperature.
2. Add 2 μl of ligation mixture to 25 μl One Shot F10 electrocompetent bacteria (Invitrogen), mix with the pipette tip and transfer into a cold electroporation cuvette.
3. Electroporate bacteria at 2,5 kV in a 0,2 cm cuvette, then add 1 ml SOC media.
4. Incubate 1 h at 37°C with shaking at 250 rpm.
5. Centrifuge bacteria, decant SOC, add 250 μl of sterile reH2O and resuspend.
6. Seed 5 μl and 50 μl of bacterial suspension on pre-warmed plates containing 34 μg/ml kanamycin and coated with 80 μl of 20 mg/ml X-Gal in DMF.
7. Overnight incubation at 37 °C.
Example 19:
Sequencing of target sequences for positive identification
Plasmids that contained cloned sequences (pCR II-TOPO + sequence) were isolated from bacterial cultures using a QIAGEN kit according to manufacturer's instructions. The nucleotide sequence of the sequences cloned into pCR II-TOPO was determined using the ABI PRISM BigDye Terminator v3.1 kit in 1 B DNA-servis. For every sequencing reaction the plasmid concentration was 500 ng/μl. Using the M13 forward primer (CTGGCCGTCGTTTTAC) and the M13 reverse primer (CAGGAAACAGCTATGAC) (Invitrogen), the nucleotide sequence from both ends was determined. Results of sequencing were verified using the NCBI database. After the identity of the sequences was determined, their unique presence in the genome was checked using Southern blot hybridization.
Example 20:
Labeling of DNA sequence specific probe
300 ng of PCR product for every sequence was diluted in 15 μl reH2O and incubated for 10 min in a thermoblock at 100°C to denature the DNA, then chilled for 5 min on ice, followed by addition of 2 μl 10X buffer, 2 μl nucleotide mixture with DIG-11-dUTP and 1 μl Klenow enzyme. The mixture was incubated at 37°C overnight. To stop the reaction and to precipitate the DNA, 2 μl 0,2 M EDTA, 2,5 μl 4 M LiCl, 75 μl cold absolute ethanol and 1 μl glycogen were added and the reaction was incubated for 2 h at -20°C. Efficiency of probe labeling was checked by a dot blot test that compared labeled probes with labeled control DNA. Labeled DNA probes were stored at -20°C and were used over several months.
Example 21:
Southern blot hybridization
Suitable amounts (from 7 to 10 μg) of genomic DNA were digested with restriction enzymes; for every sequence, two different digestion combinations that give fragments of different lengths were chosen (see Table 1). Digested DNAs were then separated by electrophoresis on a 0,8% agarose gel at 120 V for 3 h, followed by gel depurination in 0,25 M HCl for 5 min, denaturation in 0,5 M NaOH and 1,5 M NaCl 2 x 15 min, and neutralization in 0,5 M Tris HCl pH 7,5 and 3 M NaCl 2 x 15 min. The gel was then washed in 2 x SSC (3 M NaCl, 0,3 M sodium citrate), and DNA transfer onto a positively charged nylon membrane (Roche) was performed overnight in 20 x SSC. After that the membrane was washed in 2 x SSC and fixed for 30 min at 120°C. The membrane was prehybridized in prehybridization buffer (0,25 M Na2HPO4 pH 7,2, 1 mM EDTA, 20% SDS) at ~68°C for 2 h and hybridized at 68°C overnight in hybridization buffer that contained digoxigenin-labeled probe (~15 ng/ml) specific for every sequence. The membrane was washed in buffer (20 mM Na2HPO4, 1 mM EDTA, 1% SDS) 3 x 20 min at 65°C. After that the membrane was washed in washing buffer pH 8 (0,1 M maleic acid, 3 M NaCl, 0,3% Tween 20) and incubated 1 h in blocking buffer (0,5% blocking reagent in washing buffer). The membrane was incubated with anti-DIG-AP conjugate in blocking buffer in ratio 1:20000 for 30 min. Then the membrane was washed 5 x 10 min in washing buffer and incubated 2 x 5 min in substrate buffer (0,1 M NaCl, 0,1 M Tris HCl pH 9,5). CDP-Star (Roche Diagnostics) in substrate buffer (ratio 1:100) was used for detection. The membrane was exposed to chemiluminescent film (Roche Diagnostics) and films were developed using universal developer and fixer. Films were scanned and densitometry analysis was performed. Results confirmed the presence of unique DNA fragments of these sequences (see Table 1).
Table 1: Identification and verification of specific fragments of unique sequences following restriction enzyme digestion and Southern blot hybridization
Figure imgf000035_0001
Example 22:
Recloning of sequences from TOPO into phCMV-cGFP plasmid
TOPO plasmids that contained the chosen sequences were digested using the appropriate restriction enzymes in order to reclone the sequences into phCMV-cGFP plasmid. Sequence 4.4 was cloned directly into phCMV-cGFP plasmid.
Seq. - Restr. enzyme
1 - KpnI
10 - KpnI
2 - KpnI and NotI
3 - KpnI and BamHI
4 - KpnI and BamHI
15 - KpnI and NotI
16 - KpnI and NotI
14 - KpnI and BamHI
11 - KpnI and BamHI
After digestion, fragments were separated on agarose gels and the desired bands (sequences) were isolated from the gels using a Qiagen gel extraction kit. Sequences were ligated with previously digested phCMV-cGFP plasmids. Vector:insert ratios in ligation mixtures were 1:8, 1:1, 1:3 and 3:1. Ligation lasted 3 h at 22 °C and at 4 °C overnight.
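The vector:insert ratios above are molar ratios, so the mass of insert needed depends on the relative lengths of insert and vector. A minimal sketch of the usual conversion is given below. The vector mass (50 ng), vector length (6000 bp) and insert length (3000 bp) are placeholder values chosen for illustration; none of them is given in the source, and the function name is hypothetical.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   vector_parts: int, insert_parts: int) -> float:
    """Insert mass for a given vector:insert molar ratio.

    mass_insert = mass_vector * (insert_bp / vector_bp) * (insert_parts / vector_parts)
    """
    return vector_ng * (insert_bp / vector_bp) * (insert_parts / vector_parts)


# Placeholder inputs (assumptions, not values from the source).
VECTOR_NG = 50.0
VECTOR_BP = 6000
INSERT_BP = 3000

# Vector:insert ratios listed in the example.
RATIOS = [(1, 8), (1, 1), (1, 3), (3, 1)]

for vector_parts, insert_parts in RATIOS:
    ng = insert_mass_ng(VECTOR_NG, VECTOR_BP, INSERT_BP, vector_parts, insert_parts)
    print(f"{vector_parts}:{insert_parts} -> {ng:.1f} ng insert")
```

For real use, the placeholder lengths would be replaced by the actual size of the digested plasmid backbone and of the insert being recloned.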
Example 23:
Chemical transformation of XL-Gold bacteria using ligation mixture (Stratagene protocol)
1. Chill 15 ml tubes on ice and heat NZY media to 42 °C.
2. Dilute XL-Gold chemically competent bacteria on ice and aliquot 100 μl of bacteria into the chilled 15 ml tubes.
3. Add 4 μl β-mercaptoethanol to each bacteria aliquot.
4. Mix gently and incubate 10 min on ice. Mix gently every 2 min.
5. Add 3 μl of ligation mixture to every bacteria aliquot.
6. Mix gently and incubate on ice for 30 min.
7. Heat-shock in a water bath at 42 °C for 30 s.
8. Incubate tubes on ice for 2 min.
9. Add 0,9 ml heated NZY media and incubate 1 h at 37 °C with shaking at 225-250 rpm.
10. Seed bacteria on kanamycin LB plates.
11. Incubate plates at 37 °C overnight.
Grown colonies were seeded in LB liquid media and incubated at 37 °C overnight with shaking at 225-250 rpm. Plasmid DNAs were isolated from bacteria using DNA Qiagen kit for plasmid DNA isolation.
Example 24:
Cell growth
MJ90hTERT cells (normal human skin fibroblasts with active telomerase) were maintained in DMEM medium (Sigma) supplemented with 10% fetal bovine serum (Gibco), in a humidified atmosphere with 5% CO2 at 37 °C. Cells were passaged when they reached 80% confluency.
Example 25:
Electroporation of MJ90hTERT cells with phCMV-cGFP plasmids that contain target sequences
MJ90hTERT cells were subcultured 24 h prior to electroporation in order to be ~50-60% confluent at the time of electroporation. 10^7 cells were collected in 0,4 ml DMEM media without serum and transferred into a cold cuvette with a 0,4 cm gap between electrodes. 20 μg of phCMV-cGFP plasmid diluted in 20 μl TE buffer was added to the cuvette. The phCMV-cGFP plasmids used contained the chosen sequences and were linearized using the appropriate restriction enzymes. Cells were electroporated under these conditions: voltage 220 V, time constant 30 ms, capacitance 960 μF. After electroporation, cells were transferred to 6 T150 flasks with DMEM media supplemented with 10% FBS and penicillin and streptomycin (100 IU/ml pen, 100 strep). Media was changed 24 h after electroporation and G418 selection antibiotic was added at a final concentration of 500 ng/ml.
After two to three weeks of selection, individual clones were isolated using glass cloning rings and transferred into 6-well plates. After reaching confluency, clones were subcultured in T25 flasks for freezing and in T150 flasks for DNA isolation.
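The electroporation settings of this example can be turned into a few derived quantities that are often used when comparing protocols. The sketch below assumes an exponential-decay pulse, for which the time constant equals resistance times capacitance; the pulse type is an assumption, since the source gives only the voltage, time constant, capacitance and cuvette gap. Variable names are illustrative.

```python
# Settings quoted in Example 25.
voltage_v = 220.0        # applied voltage
gap_cm = 0.4             # distance between cuvette electrodes
time_constant_s = 0.030  # 30 ms
capacitance_f = 960e-6   # 960 uF
cells = 1e7              # number of cells per cuvette
volume_ml = 0.4          # DMEM volume holding the cells

# Nominal field strength across the cuvette.
field_v_per_cm = voltage_v / gap_cm

# For an exponential-decay pulse tau = R * C, so the effective sample
# resistance follows from the reported time constant (assumed pulse type).
effective_resistance_ohm = time_constant_s / capacitance_f

# Cell density in the cuvette before the plasmid solution is added.
cells_per_ml = cells / volume_ml

print(f"{field_v_per_cm:.0f} V/cm, {effective_resistance_ohm:.2f} ohm, {cells_per_ml:.1e} cells/ml")
```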

Claims

1. A method to operate a data-processing system for determination of one or more neutral DNA sequences of a selected genome serving as an insertion locus, comprising steps of:
A. loading of the genome into said data-processing system in the form of an array;
B. marking the reported functional features according to the one or more auxiliary database's records from the array obtained in step A by tagging the start and stop positions of each known functional feature selected from: regulatory or coding regions, functional genes, satellite DNA, introns, exons, microRNAs, pseudogenes, heterochromatin, repetitive sequences, transposable elements, and exceptions marked as having functional feature in auxiliary database;
C. filtering out the reported functional features marked in step B. from the array in step A. and forming set of independent sequences characterized by the length;
D. sorting the obtained sequences by length;
E. filtering out the sequences from step D. having less than 35 kb in order to obtain sequences of 35 kb or longer;
F. for each sequence extracted in step E the regions of at least 7 kb from the left end and at least 7 kb from the right end of said sequence were cut-out from the sequence; and
G. the BLAST analysis is executed over the narrowed sequence obtained in step F. to confirm their unique status in the selected genome.
2. A method as defined in claim 1, wherein the auxiliary databases in step B. are:
EnsEMBL, NCBI, UCSC, Vega, MBGD, MetaCyc, BioCyc, Hymenoptera Genome Database, ARdb, MitoZoa, or non-B database.
3. A method as defined in claim 1 or 2, wherein selected genome is any completely or partially annotated genome sequence.
4. A method as defined in claims 1-3, wherein the obtained sequences in step G. are further tested for design of primers for PCR amplification, and eliminated from the list if not suitable for PCR amplification.
5. A data processing system for determination of one or more neutral DNA sequences of a selected genome, comprising means for carrying out the method of claims 1-4.
6. A DNA sequence obtained via method claimed in claims 1-5, wherein said DNA sequence is neutral DNA sequence of the selected genome serving as an insertion locus.
7. A DNA sequence according to claim 6, wherein genome is selected to be human genome.
8. A method for inserting a DNA of interest into the genome of interest, having neutral sequence serving as an insertion locus; comprising transfection of selected cell or cells containing said genome with the recombinant vector containing the DNA of interest, suitable selection marker, regulatory elements, gene insertion system and neutral sequence according to claim 6 or 7.
9. A method according to claim 8, wherein DNA of interest encodes a protein, or non-coding RNA.
10. A method according to claims 8-9, wherein gene insertion system is site specific recombinant system.
11. A gene insertion system according to claims 8-9, wherein gene insertion is selected from:
Cre-Lox, R/RS, modified Gin/gix system, FLP/FRT, S1RT, Red/ET, TAG IT, Zinc finger nuclease or meganuclease.
12. A cell which has integrated DNA of interest into the predetermined site in the genome, wherein the predetermined site in the genome is a neutral sequence according to claim 6 or
13. A set of DNA sequences consisting of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16; wherein said DNA sequences are neutral sequences of the human genome serving as insertion loci.
14. A human cell which has integrated DNA of interest into the predetermined site in the genome being selected from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16.
15. An organism with one or more determined neutral DNA sequences of the genome serving as an insertion locus in accordance with claim 6, into which the DNA of interest is integrated.
Use of neutral sequences obtained in claim 6 or 7 for producing cell lines, transgenic plants and animals and for gene therapy.
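Steps A to F of claim 1 amount to interval arithmetic on annotated genome coordinates. The sketch below is one possible reading of those steps, not the applicants' implementation: it works on a single sequence of known length, represents annotated features as 0-based half-open (start, end) pairs, and trims exactly 7 kb from each end (the claim says "at least 7 kb"). All function names are illustrative. Step G (the BLAST uniqueness check) requires an external tool and is not covered.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end); an assumption of this sketch


def merge_features(features: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching annotated features (step B bookkeeping)."""
    merged: List[Interval] = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unannotated_gaps(genome_length: int, features: List[Interval]) -> List[Interval]:
    """Step C: regions left over after removing every annotated feature."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_features(features):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps


def candidate_loci(genome_length: int, features: List[Interval],
                   min_len: int = 35_000, trim: int = 7_000) -> List[Interval]:
    """Steps C to F: gaps sorted by length, kept if >= min_len, trimmed at both ends."""
    gaps = unannotated_gaps(genome_length, features)
    gaps.sort(key=lambda g: g[1] - g[0], reverse=True)                # step D
    long_gaps = [g for g in gaps if g[1] - g[0] >= min_len]           # step E
    return [(start + trim, end - trim) for start, end in long_gaps]   # step F


# Toy input: a 200 kb sequence with four annotated features, two of which overlap.
toy_features = [(10_000, 20_000), (15_000, 30_000), (70_000, 80_000), (150_000, 160_000)]
loci = candidate_loci(200_000, toy_features)
print(loci)
```

In this toy case the 10 kb gap at the start of the sequence is discarded in step E, and the three remaining gaps are each narrowed by 7 kb on both sides. Each narrowed interval would then be passed to the BLAST check of step G.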
PCT/HR2013/000003 2012-03-27 2013-03-26 Method of determination of neutral dna sequences in the genome, system for targeting sequences obtained thereby and methods for use thereof WO2013144663A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261616284P 2012-03-27 2012-03-27
US61/616,284 2012-03-27

Publications (2)

Publication Number Publication Date
WO2013144663A2 true WO2013144663A2 (en) 2013-10-03
WO2013144663A3 WO2013144663A3 (en) 2014-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13723948

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13723948

Country of ref document: EP

Kind code of ref document: A2