WO2015008245A2 - Methods of identification of ethnic origin based on differentiated transcription profiles and genetic markers used in those methods - Google Patents


Info

Publication number
WO2015008245A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
human
uts2
identification
chi3l2
Prior art date
Application number
PCT/IB2014/063179
Other languages
French (fr)
Other versions
WO2015008245A3 (en)
Inventor
Izabela SABAŁA
Rafał WIERZCHOSŁAWSKI
Ewa ZIĘTKIEWICZ
Michał WITT
Patrycja DACA-ROSZAK
Barbara JARZĄB
Michał JARZĄB
Małgorzata OCZKO-WOJCIECHOWSKA
Jadwiga ŻEBRACKA-GALA
Dagmara RUSINEK
Małgorzata KOWALSKA
Aleksandra Pfeifer
Michał ŚWIERNIAK
Monika KOWAL
Tomasz TYSZKIEWICZ
Original Assignee
Międzynarodowy Instytut Biologii Molekularnej I Komórkowej
Centrum-Onkologii-Instytut Im. M. Skłodowskiej-Curie W Warszawie, Oddział W Gliwicach
Instytut Genetyki Człowieka Polskiej Akademii Nauk
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Międzynarodowy Instytut Biologii Molekularnej I Komórkowej, Centrum-Onkologii-Instytut Im. M. Skłodowskiej-Curie W Warszawie, Oddział W Gliwicach, Instytut Genetyki Człowieka Polskiej Akademii Nauk
Publication of WO2015008245A2
Publication of WO2015008245A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6881: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the object of the invention is the use of selected genetic markers showing transcriptional variability in human cells in terms of their Caucasian or Asian ethnic origin in methods of identification of human biological material and human cell separation, in particular their mixtures as well as uses thereof, in particular in forensics.
  • the aim of forensic examination is to identify persons that left biological traces at a crime scene.
  • the most common method of identifying persons based on biological traces is the analysis of genetic material (DNA).
  • the most popular DNA profiling method used by forensic laboratories for identification purposes is the multiplex analysis of STR polymorphism (STR, short tandem repeats, i.e. repeated short non-coding DNA sequences), based on differences in the number of examined STRs between individuals.
  • STR polymorphism short tandem repeats
  • in the newest generation of STR polymorphism analysis sets, STR regions in 17 loci, located on various chromosomes of the human genome, are evaluated simultaneously. This ensures a significant power of discrimination, i.e. a probability bordering on certainty that any particular STR profile will not repeat in a random individual.
  • Literature provides information about mRNA markers that are characteristic for the Caucasian, Asian or African ethnic groups (Spielman et al. 2007; Storey et al. 2007); however, these data have not been verified in terms of their usefulness in forensic analyses.
  • the aim of the present invention is to overcome the described difficulties and to enable identification of the ethnic origin based on transcription markers through the provision of innovative tools allowing for the determination of marker mRNA transcription profiles in single human cells, thus enabling the segregation of cells with regard to their Caucasian or Asian origin; based on the above the characterization of forensic traces and precise separation of mixed traces allowing further identification of particular components using common forensic methods.
  • the aim of the invention is, therefore, to provide a set of genetic markers allowing the classification and separation of single human cells according to their ethnic Caucasian or Asian origin.
  • the inventors performed a comprehensive evaluation of the cellular transcriptome for the Caucasian and Asian populations using the technology of hybridization of expression microarrays.
  • 26 genetic markers were identified, with particularly differentiating results obtained for 20 genes (genetic markers) whose expression level is correlated with the ethnic origin and differentiated within the examined populations.
  • a classifier validation was also carried out in order to confirm the usefulness of individual genes for ethnic identification of the examined cells.
  • the identified marker genes had not been characterized before in terms of the correlation of their expression with ethnic origin nor the possibility of being used for separation of biological material, which implies innovative character of the obtained results.
  • transcriptome profiling projects have so far used cultured lymphocyte cell lines as research material, no doubt owing to the ease of obtaining and propagating such material.
  • the universal character of the population-specific transcriptome markers in other human tissues has not been confirmed.
  • the markers identified in the model cell cultures were compared with those characteristic for tissues most commonly found in forensic evidence material, i.e. peripheral blood, dermis, epidermis or buccal epithelium.
  • RNAs extracted from forensic traces up to 547 days old were proven useful in forensic analysis (sequencing).
  • the invention relates to a method of identification and separation of human cell mixture with regard to the Caucasian or Asian ethnic origin, wherein the method comprises the following:
  • a labeled probe being a nucleotide sequence complementary to the mRNA sequence of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, or, more preferably, at least two or more various genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1
  • the probe is complementary to at least one genetic marker, more preferably to at least two genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
  • the probe is fluorescently labeled, whereas the separation of human cells is carried out by means of laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS).
  • FACS Fluorescence-Activated Cell Sorter
  • a probe being a nucleic acid complementary to the particular genetic marker is a Stellaris type probe, Singer type probe or a combination thereof.
  • a probe complementary to the genetic marker is complementary to the sequence selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ
  • the human cell mixture is a forensic trace.
  • the invention also relates to an identification method of the ethnic origin of human biological material, in particular a human cell with regard to the Caucasian or Asian origin, which includes a determination of the mRNA quantity in human biological material, in particular a human cell, of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157; more preferably, the mRNA quantity is assessed for at least one additional genetic marker other than selected previously, selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7
  • the means for determining the mRNA quantity is a microarray analysis, TaqMan Low-Density Array (TLDA), Real Time PCR (also called Quantitative PCR, QPCR) amplification or Fluorescence In Situ Hybridization (FISH).
  • TLDA Taqman Low-Density Array
  • QPCR Real Time PCR
  • FISH Fluorescence In Situ Hybridization
  • the means for determining the mRNA quantity in the method of the identification of the ethnic origin of human biological material, in particular a human cell, is FISH hybridization, and the marker is preferably selected from: CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2.
  • when the means for determining the mRNA quantity is a microarray analysis, the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157; or when the means for determining the mRNA quantity is the TaqMan Low-Density Array, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75; or when the means for determining the mRNA quantity is PCR amplification, the marker is selected from human UTS2.
  • the genetic marker is selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No
  • the biological material originates from a forensic trace.
  • the invention also relates to use of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157 for the identification and separation of a mixture of human cells with regard to the Caucasian or Asian ethnic origin, based on the differences in the mRNA quantities for that genetic marker in human cells.
  • at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S
  • in the use of a genetic marker for the identification and separation of a human cell mixture with regard to the Caucasian or Asian ethnic origin, an assessment of the difference in mRNA quantities is additionally applied for at least one additional genetic marker, other than that previously selected, from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • the genetic marker is selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
  • the mRNA quantity is determined by hybridizing the nucleic acids in human cells with a labeled probe being a nucleotide sequence complementary to the mRNA sequence of the selected genetic marker.
  • Such a probe is preferably fluorescently labeled and the separation of the cell mixture is carried out by using laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS).
  • the genetic marker is selected from the group comprising: CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.1379
  • the cell mixture is a forensic trace.
  • the invention also relates to use of a genetic marker for the identification of the ethnic origin of human biological material, in particular a human cell, with regard to the Caucasian or Asian origin, wherein by means for determining the mRNA quantity, the mRNA quantity is determined in human biological material, for at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • a genetic marker for the identification of the ethnic origin of human biological material in particular a human cell, with regard to the Caucasian or Asian origin
  • the mRNA quantity is determined in human biological material, for at least one genetic marker selected from human CYP1B1,
  • the mRNA quantity is further determined for at least one additional genetic marker other than selected previously from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • a preferable means for determining the mRNA quantity is a microarray analysis, TaqMan Low Density Array analysis (TLDA), Real-Time PCR (QPCR) amplification or Fluorescence In Situ Hybridization (FISH).
  • the marker is preferably selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2.
  • the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157; or if the means for determining the mRNA quantity is the TaqMan Low Density Array analysis, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75; or if the means for determining the mRNA quantity is the PCR amplification, the marker is selected from human UTS2.
  • the genetic marker is preferably selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID
  • in a preferable use for the identification of the ethnic origin of the biological material, in particular a human cell, with regard to the Caucasian or Asian population origin, the human biological material originates from a forensic trace.
  • the invention also relates to a set for the identification of the ethnic origin of human biological material, in particular a human cell, with regard to the Caucasian or Asian population origin or for the identification and separation of cell mixture with regard to the Caucasian or Asian origin that includes molecules of nucleic acid complementary to the sequence of at least one genetic marker, more preferably at least two, even more preferably at least nine markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, wherein the molecules of nucleic acid complementary to the sequence of such a genetic marker allow for determination of mRNA quantity for that marker by means of suitable means for
  • a preferable set will include molecules of nucleic acid complementary to the sequence of at least one genetic marker, more preferably to at least two genetic markers selected from human CYP1B1, MOXD1, CHI3L2, SLC2A5, UTS2, DBNDD2, ROBO1.
  • a preferable set will for example include molecules of nucleic acid which are QPCR amplification primers or will be a microarray comprising short ligated nucleic acid fragments complementary to the selected markers, or a targeted array (for TLDA), or a fluorescently labeled nucleic acid, in particular a Stellaris-type or Singer-type probe.
  • the molecules of nucleic acid will be complementary to the sequence of the genetic marker selected from the group comprising CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1
  • any means allowing for specific probe labeling: radioactive, fluorescent, labeling by incorporation of modified bases, labeling using dendrimers, labeling using biotin and streptavidin, etc.
  • probes may be labeled and the separation carried out using the means for reading/measuring of such signals as radioactive or fluorescent signals.
  • Stellaris probes were developed and described by Raj et al. (2008). In the described method, single transcripts can be detected following in situ mRNA hybridization with sets of probes fluorescently labeled at one of the termini. Usually, the probe sets comprise around 30 to 50 oligonucleotides, each about 25-35 nucleotides in length. The probes should be designed so that the melting temperatures of the individual oligonucleotides are in the range of 40-50°C. Hybridization with such probes is performed in a hybridization buffer containing around 10-20% formamide for several to more than ten hours, at a temperature of 20-40°C. The unhybridized probes are rinsed off by incubating the samples with a wash buffer containing 10-20% formamide.
  • Singer probes were developed by Singer (Femino et al, 1998) and later modified by other authors.
  • the probe sets usually comprise 4 to 5 oligonucleotides, each about 45-55 nucleotides in length.
  • Probe sequences should be designed in a way ensuring that each oligonucleotide contains at least four thymidines separated from each other by stretches of at least 10 nucleotides. Those thymidines are fluorescently labeled.
  • the hybridization using such probes is performed in a hybridization buffer containing about 50% formamide for several to more than ten hours, at a temperature of 20 to 40°C.
  • the unhybridized probes are rinsed off by incubation of the samples with a wash buffer containing 50% formamide.
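The Singer-type design rules stated above (oligonucleotides of about 45-55 nucleotides, each with at least four thymidines separated by stretches of at least 10 nucleotides) can be checked programmatically. The following is only a minimal sketch: the function names are hypothetical, the length range is treated as a hard bound, and "separated by at least 10 nucleotides" is read as at least 10 bases lying between two labeled thymidines. Both readings are assumptions, not part of the source.

```python
def labelable_thymidines(oligo: str, min_gap: int = 10) -> list[int]:
    """Return 0-based positions of thymidines that could carry a label.

    Positions are picked greedily from the 5' end so that at least
    `min_gap` nucleotides lie between any two consecutive picks.
    Picking the earliest admissible T each time maximizes the count.
    """
    picks: list[int] = []
    for i, base in enumerate(oligo.upper()):
        if base != "T":
            continue
        # Bases strictly between the previous pick p and i: i - p - 1
        if not picks or i - picks[-1] - 1 >= min_gap:
            picks.append(i)
    return picks


def meets_singer_rules(oligo: str, min_len: int = 45, max_len: int = 55,
                       min_t: int = 4, min_gap: int = 10) -> bool:
    """Check length and thymidine-spacing rules (assumed hard limits)."""
    if not (min_len <= len(oligo) <= max_len):
        return False
    return len(labelable_thymidines(oligo, min_gap)) >= min_t


if __name__ == "__main__":
    # Synthetic example sequence, not taken from the patent.
    candidate = "T" + "A" * 10 + "T" + "A" * 10 + "T" + "A" * 10 + "T" + "A" * 10 + "ACGAC"
    print(labelable_thymidines(candidate), meets_singer_rules(candidate))
```

A real probe design would also have to consider specificity against the target mRNA and melting behavior, which this sketch does not address.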
  • a genetic marker should be understood as a recognizable nucleic acid fragment used for identification.
  • An example of a genetic marker may be a transcription marker.
  • a forensic trace should be understood as biological material of human or other origin useful in forensic analysis.
  • Biomaterial should be understood as material of biological origin containing nucleic acids.
  • Examples of biological material are, e.g., isolated nucleic acids, in particular mRNA, DNA, mixtures thereof, cells, cell cultures, etc.
  • the performed analyses resulted in the identification of 26 genetic markers (transcription markers), 20 of which were particularly preferable, and for which a significant difference in mRNA expression levels between human cells from the Caucasian and Asian populations was shown.
  • the genetic markers showing differences in mRNA expression levels relate to mRNAs for the genes: CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • the variability in transcription profiles between the Asian and Caucasian populations for genetic markers of the invention may be used in any way and, in order to increase the test reliability, the differences in mRNA quantities may be shown for at least two markers, more preferably for between 3 and 26 genetic markers of the invention, selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • a person skilled in the art will be able to select suitable means for showing the differences in quantities of a given mRNA, allowing for differentiation of a human cell with regard to its Asian or Caucasian origin.
  • the means for showing differences in mRNA quantities may be, for example, chosen from the microarray analysis, quantitative PCR or labeling by means of the FISH technique, which may for example be further combined with separation by laser microdissection or FACS sorting.
  • in microarray analysis, the application of a two-gene classifier makes it possible to discriminate between the two populations with almost 90% accuracy, whereas a nine-gene classifier will have an almost 100% accuracy in distinguishing between the Caucasian and Asian populations. Therefore, when using microarray analysis as the means for the identification of the differences in the transcription levels, it is preferable to select at least two markers, more preferable to select between three and twenty-six markers, and most preferable to select at least nine markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • for QPCR, amplification primers and fluorescently labeled probes for marker genes should be designed according to standard principles. In order to properly identify the ethnic origin of the examined samples it is sufficient to carry out an analysis of a single selected marker gene.
  • the expression of the examined genes should be normalized relative to the reference index obtained by calculating the geometric mean of the expression of the control genes (the so-called housekeeping genes, HKG).
  • At least one marker preferably selected from UTS2 and UGT2B17, more preferably between two and twenty-six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
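The normalization described for QPCR, in which a reference index is computed as the geometric mean of the housekeeping-gene expression values, can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and it assumes that expression values are positive relative quantities on a linear scale (not raw Ct values), which the source does not specify.

```python
import math
from typing import Iterable


def reference_index(hkg_expression: Iterable[float]) -> float:
    """Geometric mean of housekeeping-gene (HKG) expression values.

    Assumes linear-scale, strictly positive relative quantities.
    """
    values = list(hkg_expression)
    if not values:
        raise ValueError("at least one housekeeping gene is required")
    if any(v <= 0 for v in values):
        raise ValueError("expression values must be positive")
    # exp(mean(log(x))) avoids overflow from multiplying many values
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_expression(target: float, hkg_expression: Iterable[float]) -> float:
    """Express a marker-gene quantity relative to the HKG reference index."""
    return target / reference_index(hkg_expression)


if __name__ == "__main__":
    # Made-up numbers for illustration; not measured data.
    print(normalize_expression(12.0, [2.0, 8.0]))
```

The geometric mean is less sensitive than the arithmetic mean to a single highly expressed control gene, which is the usual reason for choosing it for a multi-gene reference.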
  • TLDA targeted microarrays
  • amplification primers and fluorescently labeled probes should be designed for discriminatory transcripts and reference genes (control genes, the so-called housekeeping genes, HKG).
  • control genes: the so-called housekeeping genes (HKG)
  • when using TLDA as the means for the identification of the differences in the transcription levels, it is preferable to select at least one marker, more preferably two selected from UTS2 and UGT2B17, more preferably between three and twenty-six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • a set of Stellaris or Singer probes should be designed according to the described principles for at least one marker gene with a high discrimination power, more preferably, for a set of two genes indicated in statistical analyses.
  • when using FISH as the means for the identification of the differences in the transcription levels, it is preferable to select at least one marker, more preferably two, even more preferably between three and twenty-six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
  • FISH technique it is particularly preferable to use the markers CYP1B1,
  • the appropriate means for the identification of the differences in the quantities of a given mRNA will utilize the genetic markers exhibiting the difference in the mRNA transcription levels between the biological material from the Caucasian population and the biological material from the Asian population, for markers selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, U
  • Cell separation based on variable levels of marker mRNA expression may be carried out using any technique facilitating the discrimination and separation of cells according to the differences in expression levels of marker mRNA(s).
  • Laser microdissection is a precise and automated method of cell isolation, which additionally enables quantitative evaluation of the levels of marker mRNA expression with a single-molecule precision.
  • the technique carries some disadvantages such as high time consumption, high depreciation costs (due to the need to purchase expensive equipment), as well as the necessity to employ highly qualified and experienced staff.
  • the cells can be isolated directly from the surface of the forensic adhesive tapes used to collect forensic traces. The performed tests showed that the technology of laser microdissection, combined with organic DNA extraction, enables obtaining a genetic profile using only a few cells.
  • An alternative method of separating fluorescently labeled cells is Fluorescence-Activated Cell Sorting. It is a modern and fully automated technology that makes it possible to measure physical and chemical properties of the cells, to determine the fluorescence levels in particular cells and to separate them physically, based on precisely defined parameters. A great advantage of this method is its efficiency. Modern devices can analyze and separate up to a few thousand cells per second. Moreover, they are equipped with several lasers and fluorescence detectors, which allows for cell labeling using a whole variety of fluorescent dyes and for simultaneous application of several probes. The use of FACS for the separation of cells based on their transcriptional signature significantly simplifies and accelerates the identification process. The described method constitutes an innovative approach, in particular to the analysis of complex forensic traces collected at the crime scene.
  • Both methods of cell separation, laser microdissection and FACS sorting, enable an a priori examination of the material, e.g. forensic traces. This means that the mixture components are identified and segregated before isolation of their genetic material and analysis of STR polymorphism (genetic profiling). In forensic sciences it is undoubtedly an innovative approach that enables obtaining pure DNA profiles for individual components of the mixture.
  • cellular DNA is subjected to PCR amplification using multiplex STR polymorphism analysis kits in order to determine the genetic profiles of the groups of cells segregated with the use of genetic markers discriminating between the Caucasian and Asian population, and to identify the donors.
  • the performed tests showed that the technology of laser microdissection, combined with organic DNA extraction, makes it possible to obtain a genetic profile using as few as 4 to 10 human cells, which constitutes substantial progress when compared with traditional methods requiring around 150 cells, not to mention the material loss resulting from the isolation process.
  • FIG. 1 illustrates the selection of the optimal size of the classifier, based on the cross-validation.
  • the determined optimal size of the classifier was nine genes.
  • FIG. 2 Results obtained in TLDA analyses were subjected to a statistical analysis with a Student's t-test and a Volcano plot with the Data Assist software (Life Technologies). Out of 19 examined transcription-differentiating genes, 5 showed statistically significant differences (p < 0.05) in the expression levels between the examined populations, Caucasian (CEU) and Asian (CHB).
  • CEU Caucasian
  • CHB Asian
  • FIG. 3 Hybridization of lymphocyte cells representing Asian population (A) or Caucasian population (B) with probes specific for DBNDD2 gene.
  • FIG. 4 The analysis of post-hybridization signals using BlobFinder software. Within the field of view there is a single HeLa cell, which has been hybridized with the probes specific for the GAPDH reference gene. The circles indicate the hybridization signals, while the line demarcates the cell nucleus.
  • the table presents the software-generated report, including the number of signals, number of nuclei and the total areas of the nuclei in analyzed cells.
  • FIG. 5 The chart illustrating mean values distribution - for markers CHI3L2 vs. CYP1B1, o
  • FIG. 6 The chart illustrating mean values distribution - for markers MOXD1 vs. CYP1B1, o
  • FIG. 7 Projection of variables (values for markers UTS2, MOXD1, DBNDD2, CHI3L2, CYP1B1) on the PC1 (ca. 70%) and PC2 (ca. 15%) planes.
  • FIG. 8 Projection of events (Lines) on the PC1 (ca. 70%) and PC2 (ca. 15%) planes (o - Caucasian population, ⁇ - Asian population).
  • FIG. 9 Fluorescence profile obtained using a FACS sorter for single population lines, (A) Asian, (B) Caucasian, and (C) a mixture of lymphocytes of the two lines, following the hybridization with the DBNDD2 probe. The sort gates set up for acquiring the cells exhibiting low (P9) and high (P13) fluorescence are marked.
  • FIG. 10 Fragments of genetic profiles obtained by using the STR polymorphism analysis method for cells sorted out of a mixture (Fig. 9C), corresponding to P9 (A) and P13 (B) fractions.
  • FIG. 11 Separation of a tissue mixture (vaginal epithelial cells + buccal epithelial cells) hybridized with the STATH probe (specific for the buccal epithelium), carried out using the FACS sorter.
  • FIG. 12 Separation of a tissue mixture (vaginal epithelium from individual A + buccal epithelium from individual B) hybridized with the MUC7 probe (specific for the buccal epithelium), carried out using a FACS sorter.
  • FIG. 13 Separation of a tissue mixture (vaginal epithelium from individual A + epidermis from individual B) hybridized with the LCE1C probe (specific for the epidermis) + the STATH probe (specific for the buccal epithelium), carried out using a FACS sorter.
  • B) Cells sorted at the FAM-specific wavelength (detected STATH probe). The P4 fraction (the highest fluorescence), gate size: 16%; sort size: ca. 250 cells; profile: pure - individual B.
  • FIG. 14 A) Schematic representation of the MMI membrane subjected to hybridization or laser microdissection. The areas of deposition of the cell mixture from the forensic stain (SK/BS), positive controls (SK and BS) and negative control are marked with circles (the area of the membrane without biological material, subjected to hybridization, is marked as zero-liability). B) Microscopic image of cells, following hybridization with the MUC7 probes (continuous circles) and LCE1C probes (dashed circles). The arrows indicate the cells which were qualified for laser microdissection.
  • FIG. 15 Genetic profiles of the fractions of buccal epithelial cells and epidermal cells, each of 10 cells, obtained by means of cell mixture separation using laser microdissection. The profiling was carried out using NGM SElect kit.
  • FIG. 16 Microscopic image of the L32/L33 lymphocyte lines mixture, following hybridization with CHI3L2 and CYP1B1 probes - classification of cells into certain intensity intervals was carried out with NIS-Elements software.
  • FIG. 17 The results of the genetic profiling of the mixture of lymphocytes representing the Caucasian (L32) and Asian (L33) population, hybridized with the CHI3L2 and CYP1B1 probes.
  • RNAs isolated from 127 cell lines of lymphocytes from Caucasian and Asian populations were hybridized on HumanHT-12v4 Expression BeadChip Kit expression arrays produced by Illumina.
  • the analysis included such parameters as signal intensity (including basic metabolism genes), background level, noise level, the number of detected actively expressed genes, or hybridization control, as well as mutual relations thereof. Two samples were excluded based on the quality analysis.
  • RNA samples including:
  • Unsupervised analysis was performed by means of hierarchical clustering, the results of which are usually presented in a dendrogram.
  • the hierarchical clustering (in each step of the analysis) consists of the calculation of the matrix of distances between all objects, the creation of clusters by combining the objects and/or clusters created in the previous step, and the calculation of the distances between clusters combining several objects.
  • the Euclidean distance was used to calculate the distances between samples, while the Ward's method was applied to combining individual clusters.
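The clustering steps described above (Euclidean distances between samples, clusters merged by Ward's method) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software used in the analysis: the function names and the toy expression vectors are hypothetical, and the merge cost is the standard Ward criterion (increase in within-cluster sum of squares).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ward_clustering(samples):
    """Agglomerative clustering; returns the merge history as
    (members_a, members_b, ward_cost) tuples, i.e. the dendrogram order."""
    clusters = {i: [i] for i in range(len(samples))}
    n_dims = len(samples[0])

    def centroid(members):
        return [sum(samples[i][d] for i in members) / len(members)
                for d in range(n_dims)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        dist = euclidean(centroid(a), centroid(b))
        return len(a) * len(b) / (len(a) + len(b)) * dist ** 2

    merges = []
    next_id = len(samples)
    while len(clusters) > 1:
        ids = sorted(clusters)
        cost, i, j = min((ward_cost(clusters[i], clusters[j]), i, j)
                         for k, i in enumerate(ids) for j in ids[k + 1:])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), cost))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Hypothetical toy data: four samples, two probes each.
toy_samples = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9]]
merge_history = ward_clustering(toy_samples)
```

In the toy data the two pairs of similar samples are merged first and the two resulting clusters are joined last, which is the behavior one would expect for biological replicates grouping together.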
  • the aim of this step of the analysis was to detect and eliminate any technical factors and to filter out, based on biological replicates of cell lines, the genes displaying high measurement reliability.
  • the clustering performed for all the probes on the microarray revealed that the main source of variability in the analyzed dataset was the variability between individual microarrays.
  • the so called data centering method was applied (i.e. bringing the mean expression of each gene within one microarray to a common value). This step is performed as follows: for each gene, the mean expression in particular arrays is calculated by the following formula:
  • the measurability coefficient was calculated as the quotient of the total variation and the third quartile of the variation coefficient of this gene : vc,
  • the genes deemed "measurable" were those for which the "coefficient of measurability" mc > 1.5. A total of 3732 genes met this criterion and were retained. The hierarchical clustering of the analyzed samples based on the "measurable" genes showed an ideal grouping of all available biological replicates.
  • a supervised analysis was carried out using the Student's t test for independent samples.
  • a total of 67 male cell lines were analyzed, with 35 males from the Caucasian and 32 from the Asian population.
  • the test was performed for 3732 probes that met the measurability criterion mc > 1.5.
  • p-value a statistical significance
  • FDR False Discovery Rate
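The supervised comparison described above can be sketched as a two-sample Student's t statistic followed by a false discovery rate correction. The text does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, as are the function names. Converting the t statistic to a p-value requires the t-distribution CDF, which is outside the standard library and is omitted here.

```python
import math

def student_t(x, y):
    """Student's t statistic for two independent samples (pooled variance)."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    return (mean_x - mean_y) / math.sqrt(pooled_var * (1 / nx + 1 / ny))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical expression values for one probe in two groups.
t_stat = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# Hypothetical raw p-values for four probes.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In an analysis of 3732 probes, the adjustment step matters because several hundred probes would pass an uncorrected p < 0.05 threshold by chance alone.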
  • the expression profile of the most significant gene, UTS2, indicates its high classification potential, allowing correct identification of 30 (93.7%) Asian samples and 32 (91.4%) Caucasian samples.
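The percentages quoted for UTS2 follow from the group sizes given earlier (32 Asian and 35 Caucasian male lines). The short check below reproduces them; the overall rate is derived here and is not stated in the text. Note that 30/32 is exactly 93.75%, which the text reports as 93.7%.

```python
# Counts taken from the text: correctly identified samples per population.
asian_total, asian_correct = 32, 30
caucasian_total, caucasian_correct = 35, 32

asian_rate = asian_correct / asian_total              # 0.9375
caucasian_rate = caucasian_correct / caucasian_total  # about 0.914
# Overall rate is a derived figure, not one reported in the source.
overall_rate = (asian_correct + caucasian_correct) / (asian_total + caucasian_total)

print(f"Asian {asian_rate:.2%}, Caucasian {caucasian_rate:.2%}, overall {overall_rate:.2%}")
```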
  • the classification was carried out using Diagonal Linear Discriminant Analysis (DLDA).
  • DLDA Diagonal Linear Discriminant Analysis
  • the Student's t test was used for the selection of genes.
  • the effectiveness of the obtained classifier was tested both using the LOO cross-validation method (Leave One Out - i.e., in each iteration one sample is the test sample, whereas the remaining samples form the training set), and by using microarray data of an independent set of 10 female lines (5 from each of the Asian and Caucasian populations).
  • the optimal size of the classifier has been determined to be nine genes (Fig. 1, Tab. 2).
  • the obtained classifier proved highly effective by correctly classifying all 10 samples of the independent set of female lines.
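The classification scheme described above (DLDA evaluated by leave-one-out cross-validation) can be sketched as follows. This is a simplified illustration, not the inventors' pipeline: it assumes equal class priors, uses hypothetical toy data with two features, and omits the t-test gene selection step, which in a rigorous setup would be repeated inside each cross-validation fold.

```python
def fit_dlda(X, y):
    """Fit DLDA: per-class feature means plus pooled per-feature variances
    (a diagonal covariance matrix shared by all classes)."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    means = {}
    pooled_ss = [0.0] * n_feat
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        for r in rows:
            for j in range(n_feat):
                pooled_ss[j] += (r[j] - means[c][j]) ** 2
    dof = len(X) - len(classes)
    variances = [max(s / dof, 1e-12) for s in pooled_ss]  # guard against zero
    return means, variances

def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, variances = model
    def score(c):
        return sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, means[c], variances))
    return min(means, key=score)

def leave_one_out_accuracy(X, y):
    """Each sample is held out once; the model is trained on the rest."""
    hits = 0
    for i in range(len(X)):
        model = fit_dlda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict_dlda(model, X[i]) == y[i]
    return hits / len(X)

# Hypothetical expression values for two marker genes.
X = [[2.0, 5.0], [2.2, 5.1], [1.9, 4.8], [4.0, 3.0], [4.1, 3.2], [3.8, 2.9]]
y = ["CEU", "CEU", "CEU", "CHB", "CHB", "CHB"]
loo_accuracy = leave_one_out_accuracy(X, y)
```

Testing on an independent set, as done with the 10 female lines, corresponds to calling `fit_dlda` once on all training samples and `predict_dlda` on each held-back sample.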
  • QPCR Real Time PCR
  • Example 3 Ethnic identification of the donor, based on analysis of the biological material using the transcriptional markers and targeted miniarrays (TaqMan Low Density Arrays - TLDA)
  • RNA was isolated from 80 cell lines with the use of RNeasy Mini Kit and RNeasy Plus Mini Kit (Qiagen). RNA isolates corresponding to the cell lines that had not been previously subjected to microarray analysis were selected for TaqMan Array analysis. Nucleic acid concentration was assessed by spectrophotometric analysis, i.e., UV absorbance measurement at 260 nm (A260) using the NanoDrop ND-1000 apparatus. RNA quality was determined by capillary electrophoresis in the Agilent 2100 Bioanalyzer, by applying the RNA 6000 Nano Assay (Agilent Technologies) and the suitable RNA 6000 Nano Marker. RNA integrity was determined by applying the RNA Integrity Number (RIN).
  • RIN RNA Integrity Number
  • RNA isolates were of good quality as confirmed by the observed RIN values in the range 8-9.5, and fulfilled the quality criteria for further analyses.
  • Aliquots of RNAs obtained from 72 cell lines were reverse-transcribed into cDNA by using the Enhanced Avian RT First Strand Synthesis Kit (Sigma).
  • Validation of the selected genes of the classifier obtained in microarray analysis was performed using TLDA Array Cards (Life Technologies). This method is based on estimation of gene expression using 384-well reaction plates with anchored TaqMan MGB probes. The sample expression level is assessed relative to the calibrator and is recorded as a fold-change of expression level in relation to the calibrator (RQ value).
  • a calibrator may be made from any randomly selected sample or a mix of samples. Any potential fluctuations in expression levels, stemming from laboratory errors or unequal quantities of the input material are normalized against a selected gene characterized by constitutive expression, i.e., expressed continuously, regardless of the tissue type or experimental conditions (so-called housekeeping gene). It is currently recommended that more than one housekeeping gene be applied within any particular cycle of experiments (Sorby et al. 2010).
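The relation between a sample, the calibrator and the housekeeping genes can be illustrated with the comparative Ct calculation. The text does not give the formula used by the software; the sketch below assumes the widely used 2^(-ΔΔCt) method with 100% amplification efficiency, and averages the Ct values of several housekeeping genes. The function name and the Ct values are hypothetical.

```python
from statistics import mean

def relative_quantity(ct_target, ct_refs, cal_ct_target, cal_ct_refs):
    """RQ of a target gene in a sample relative to the calibrator.

    ct_refs / cal_ct_refs hold the Ct values of the housekeeping genes
    measured in the sample and in the calibrator, respectively.
    """
    delta_ct_sample = ct_target - mean(ct_refs)
    delta_ct_calibrator = cal_ct_target - mean(cal_ct_refs)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)

# Hypothetical values: the target crosses the threshold two cycles earlier
# in the sample than in the calibrator after normalization.
rq = relative_quantity(24.0, [18.0, 18.0], 26.0, [17.5, 18.5])
```

With these numbers ΔΔCt is -2, so the sample expresses the target at four times the calibrator level.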
  • the measurement of relative gene expression using the TLDA cards was conducted in three repetitions for each of the genes tested.
  • the wells of the 384-well TLDA cards were loaded with 100 µl aliquots of the reaction mix containing cDNA, Gene Expression Master mix (Life Technologies) and water.
  • the TLDA cards were centrifuged twice at 1200 rpm for 1 min in a Heraeus centrifuge (Thermo Scientific), and sealed with a card sealer to prevent well-to-well contamination.
  • the TLDA cards containing 8 analyzed samples and microarray classifier genes were processed using the Real-Time PCR 7900 HT system (Life Technologies).
  • Classifier_1 card set containing 13 out of 19 microarray classifier genes was applied to testing of 33 Caucasian lines and 39 Asian lines.
  • the calibrator for Classifier_1 was made up of 5 Caucasian lines.
  • Two different calibrators were used in Classifier_2 analyses, that is: i) 5 Caucasian lines for cultured cell line samples and ii) epidermis mixture from two Asian donors for the epidermis samples.
  • Amplification curves were analyzed with RQ Manager software. For each of the genes tested, the threshold cycle (Ct) value was determined, indicative of the reaction entering the logarithmic phase of product increase and used for determination of the input quantity of product in the reaction mix. In the next step, all samples were subjected to further statistical analysis, wherein the Student's t-test and the Volcano plot were performed with Data Assist software (Life Technologies) (Fig. 2), whereas the Mann-Whitney U test was performed with Statistica v. 9.0 software.
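As an illustration of the non-parametric test mentioned above, the Mann-Whitney U statistic for two groups of RQ values can be computed as follows. This is a sketch only, not the Statistica implementation; it returns the U statistic (with average ranks for ties) and leaves out the p-value calculation. The function name and the inputs are hypothetical.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    combined = sorted((value, group) for group, values in enumerate((x, y))
                      for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j)
                                         if combined[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

# Hypothetical RQ values: complete separation between groups gives U = 0.
u_separated = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
u_tied = mann_whitney_u([1.0, 2.0], [2.0, 3.0])
```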
  • the experiments were aimed at determining the optimal conditions for in situ detection of single transcripts as well as suitability of particular genetic markers for ethnic identification of cells using FISH technique.
  • a set of genes was selected for further analysis of transcription levels in lymphocyte cell lines derived from donors of Caucasian and Asian origin, with the use of FISH technique.
  • the Stellaris probes (Probe Designer, Biosearch Technologies, USA) were designed for genes with sufficiently long transcripts (>1000 nts), such as: CYP1B1, CHI3L2, DBNDD2, MOXD1, UGT2B17, ROBO1 and SLC2A5.
  • the probe sets consisted of 30-48 terminally, fluorescently labeled oligonucleotides, each 20-30 nts in length (Tab. 5).
  • the Singer probe sets consisting of five internally labeled probes, each ~50 nts in length, were designed for the UTS2 gene, for which the length of the transcript was insufficient for designing the Stellaris probes. For comparison purposes, the Singer probe sets were also designed for the CD47 and SMC6 genes.
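To illustrate why transcript length limits the design of multi-oligonucleotide probe sets, the sketch below tiles short oligonucleotides complementary to an mRNA. It is not the Probe Designer algorithm: the 20-nt length and the cap of 48 oligonucleotides come from the ranges quoted above, while the 2-nt spacing, the function names and the toy sequence are assumptions, and real designs also filter candidates (for example by GC content).

```python
# Maps each mRNA base to the complementary DNA base of the probe.
COMPLEMENT = str.maketrans("ACGU", "TGCA")

def reverse_complement(mrna):
    """DNA oligonucleotide (5'->3') complementary to an mRNA segment."""
    return mrna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]

def tile_probes(mrna, oligo_len=20, gap=2, max_probes=48):
    """Tile non-overlapping complementary oligonucleotides along a transcript."""
    probes = []
    start = 0
    while start + oligo_len <= len(mrna) and len(probes) < max_probes:
        probes.append(reverse_complement(mrna[start:start + oligo_len]))
        start += oligo_len + gap  # assumed spacing between neighbouring oligos
    return probes

# Hypothetical 60-nt target: only two 20-nt oligos fit with a 2-nt gap.
toy_transcript = "AUGGCC" * 10
toy_probes = tile_probes(toy_transcript)
```

With these assumed settings, a set of 30 oligonucleotides needs at least 30 × 20 + 29 × 2 = 658 nt of usable sequence, which is consistent with short transcripts being unsuitable for this probe type.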
  • the probes were synthesized by Biosearch Technologies (USA). Hybridization was performed on lymphocytes from cultured cell lines obtained from Coriell Cell Repositories (USA). The cells were immobilized on poly-lysine coated coverslips, followed by fixation in 4% formaldehyde solution. After cell fixation, the coverslips were stored in 70% ethyl alcohol for 1 to 30 days. Prior to hybridization, the specimens were washed in the wash buffer (2xSSC; 10-50% formamide, depending on the probe used). After 10-15 min, the wash buffer was replaced with hybridization buffer (2xSSC; 10% dextran sulphate; 5 nM probe).
  • Fig. 3 shows an exemplary post-hybridization image.
  • the microscopic imaging was carried out with the use of the confocal scanning NIKON TiE Eclipse A1 microscope (NIKON), equipped with diode lasers (405 nm, 561 nm, 638 nm), an argon ion laser (457 nm/488 nm/514 nm) as well as a visible light source.
  • the cells were imaged using two types of oil immersion objectives: Apo TIRF 100x Oil DIC N2 and Plan Apo VC 60x Oil DIC N2. Imaging was carried out at the resolution of 512x512; at the speed of 1 frame / 24 s; using the linear scanning mode "Integrate". During the three-dimensional (3D) scanning, the "z" layers were acquired at intervals of 0.2 µm.
  • the quantitative analysis of fluorescent signals was performed with the use of the BlobFinder software, allowing for preliminary analysis, i.e., identification and quantification of the fluorescent signals resulting from hybridization, whose numbers correspond to the numbers of transcript molecules present in the analyzed cells.
  • the BlobFinder allows for identification of single cells, separation of unspecific background from specific fluorescent signals and quantification of signals within any particular cell (Fig. 4).
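The idea of separating specific signals from background and counting them can be illustrated with a simple threshold followed by connected-component counting. This is not BlobFinder's algorithm, which is not described in the text; it is a generic sketch on a hypothetical intensity grid, with an assumed threshold and 4-neighbour connectivity.

```python
def count_spots(image, threshold):
    """Count connected groups of pixels brighter than the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            spots += 1  # new spot found; flood-fill all of its pixels
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
    return spots

# Hypothetical 4x4 intensity grid with three separate bright spots.
toy_image = [
    [0, 9, 0, 0],
    [0, 8, 0, 7],
    [0, 0, 0, 0],
    [6, 0, 0, 0],
]
n_spots = count_spots(toy_image, threshold=5)
```

Restricting the count to pixels inside a segmented cell outline would give the per-cell signal numbers of the kind reported in Fig. 4.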
  • Probe name Fluorescent dye Probes (number, length)
  • MOXD1 St2 CAL Fluor 610 Stellaris: 48, 20 nts
  • the data on expression levels of the marker genes were statistically analyzed (Tab. 6,7).
  • the analysis covered the selected marker genes identified as a result of microarray analyses; the GAPDH housekeeping gene typically used in various gene expression studies; and the ROBO1 and SLC2A5 differential genes selected based on a review of the literature (Spielman et al. 2007).
  • the probes were ranked by their differentiation potential as follows:
  • CHI3L2 St > CYP1B1 St > SLC2A5 St > MOXD1 St2 > UGT2B17 St bis > UTS2 Si > DBNDD2 St > MOXD1 St1 > ROBO1 St.
  • the probes were ranked from the highest to the lowest spread value ( ⁇ ), as follows:
  • CYP1B1 St > MOXD1 St2 > DBNDD2 St > MOXD1 St1 > CHI3L2 St > UTS2 Si > UGT2B17 St bis > SLC2A5 St > ROBO1 St.
  • This example aimed at demonstrating the suitability of one of the transcription markers, DBNDD2, for separation of cell mixtures composed of donor cells of different ethnic origin, with the use of FACS sorting.
  • the cell lines were cultured according to standard methods. Aliquots of approx. 10 ml of dense cell suspension from each cell line were centrifuged, the growth medium was discarded and the cell pellet was washed in 1xPBS. Next, the cells were fixed in 4% formaldehyde in 1xPBS for 30 minutes. After fixation, the cells were washed twice in 1xPBS and stored in 70% ethyl alcohol for 1 to 30 days at +4°C. Prior to hybridization, the cells were washed twice in the wash buffer (2xSSC; 15% formamide) and then incubated in 100 µl of hybridization buffer (containing the DBNDD2 probe at a final concentration of 5 nM).
  • Hybridization was carried out for 16-18 hrs, at 37°C, in darkness. After the hybridization period, the unattached probes were removed by washing in the wash buffer (2 x 30 min, at 37°C). After the final wash, the cells were suspended in 1xPBS.
  • the samples prepared in this way were sorted using BD FACSCalibur flow cytometer, equipped with 488nm and 645nm lasers.
  • the fluorescence profiles of the tested lines were determined (Fig. 9A, B).
  • 10000 cells of each line were used. Additional experiments showed that it was feasible to limit the number of cells required for obtaining the DNA profile to 1000.
  • the fluorescently labeled cells were mixed at a 1:1 ratio and the fluorescence profile of the mixture was determined (Fig. 9C).
  • the sort gates (P) were set, based on fluorescence profiles of single and mixed cell lines.
  • the sort gates were set to cover the entire fluorescence intensity spectrum to enable separation of cells displaying the lowest and highest specific fluorescence.
  • the gate sizes (percentage of cell population displaying the lowest and highest fluorescence intensity) were determined experimentally and they varied depending on the probe used. In the experiment described herein, 3%, 5% and 10% gating was applied (to both ends of the fluorescence spectrum). The 5% gating allowed for unequivocal separation of populations.
  • 500 events (ca. 250 cells) were separated from the mixture within each of the gates.
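The gating logic described above, in which a fixed percentage of events is collected from each end of the fluorescence distribution, can be sketched as follows. This illustrates the principle only and does not reproduce the cytometer software; the function names, the gate labels "low" and "high" (corresponding to gates such as P9 and P13) and the toy intensities are assumptions.

```python
def sort_gates(intensities, gate_fraction=0.05):
    """Return (low_cut, high_cut): intensity limits enclosing the dimmest
    and the brightest gate_fraction of events."""
    ordered = sorted(intensities)
    n_gate = max(1, round(len(ordered) * gate_fraction))
    return ordered[n_gate - 1], ordered[-n_gate]

def assign_gate(value, low_cut, high_cut):
    """Label an event as 'low', 'high', or None (not sorted)."""
    if value <= low_cut:
        return "low"
    if value >= high_cut:
        return "high"
    return None

# Hypothetical fluorescence intensities of 100 events in a 1:1 mixture.
toy_intensities = list(range(1, 101))
low_cut, high_cut = sort_gates(toy_intensities, gate_fraction=0.05)
```

Narrower gates give purer fractions at the cost of fewer collected cells, which is why the gate size has to be tuned per probe.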
  • the sorted cells were subjected to genetic identification in order to confirm a successful separation of populations.
  • a standard method of polymorphism analysis of 17 STR loci was applied using NGM SElect forensic identification kit (Applied Biosystems).
  • Example 5B Separation of mixture of two forensic traces (saliva, vaginal swabs and epidermis with MUC7, STATH, LCE1C markers) using FACS sorter
  • the aim of this example was to demonstrate the possibility of separation and identification of cells within the mixture deposited as a forensic trace, with the use of methods disclosed in the invention, based on the detection of differential transcription levels of the marker genes and on sorting of the labeled cells in the FACS sorter.
  • vaginal epithelial cells and buccal epithelial cells ( ⁇ each) was deposited on the canvas fabric, which was left to dry out for 7 days at room temperature. Then, the fragment of fabric containing biological material was excised and incubated in 1% PBS for 10 min, at 37°C in order to rinse off the cells. Next, the incubate was centrifuged at 900 rev./min, followed by resuspending the cell pellet in 200 µl of 1xPBS. In case of mixtures composed of buccal epithelial cells and epidermis, the cells were collected by swabbing the inside of the cheek and the palm/forearm, respectively.
  • the aim of this example was to demonstrate the possibility of separation and identification of cells within the mixture deposited as a forensic trace, with the use of methods disclosed in the invention, based on the detection of differential transcription levels of the marker genes and on laser microdissection-assisted separation of labeled cells from the mixture.
  • the object of the study was a simulated forensic trace set up in a way that the perpetrator had bitten the inside of victim's forearm.
  • Biological material anticipated in such a stain is a mixture composed of victim's epidermis (SK) and perpetrator's buccal epithelial cells (BS).
  • SK victim's epidermis
  • BS buccal epithelial cells
  • a sample was collected by a swabbing method using a sterile cotton swab moistened with F water (Ambion).
  • a wad of cotton containing biological material was incubated in 1xPBS for 10 min, at 37°C in order to rinse off the cells. Next, the incubate was centrifuged at 900 rev/min. The obtained cell suspension was deposited onto the MMI membrane by pipetting. To provide controls of the hybridization reaction, apart from the cell suspension from the trace, the epidermis and buccal epithelial cell suspensions were deposited onto separate areas of the membrane (Fig. 14). The hybridization was carried out with the use of two probes: LCE1C, specific for epidermal mRNA and labeled with the Q560 fluorochrome, and MUC7, specific for buccal epithelium and labeled with the FAM fluorochrome.
  • the specimens were imaged with fluorescent microscopy, at the excitation wavelengths specific for the fluorochromes used for labeling of the FISH probes and the fluorescence signal intensities were assessed.
  • the intensity assessment was performed by NIS Elements software, using the "Mean Intensity" parameter for calculation of mean signal intensity, expressed in absolute units, within a specified Region Of Interest (ROI) corresponding approximately to the area of the imaged cell.
  • ROI Region Of Interest
  • the cells were classified into individual tissues (SK or BS) by assuming for both probes the fluorescence intensity threshold of 1100 units, which meant that to be classified into either tissue the cell had to exhibit the fluorescence intensity greater than 1100 units at emission spectrum characteristic for either Q560 or FAM fluorochrome.
  • two groups of 10 cells each, corresponding to both fluorochromes were isolated from the specimen by using laser microdissection. In order to confirm separation accuracy, both groups were subjected to DNA profiling with the use of NGM SElect kit (Fig. 15).
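The threshold rule used to assign cells to a tissue before microdissection can be written down as a small decision function. The 1100-unit threshold and the probe-to-fluorochrome pairing (LCE1C with Q560 for epidermis, MUC7 with FAM for buccal epithelium) come from the text; the treatment of cells exceeding the threshold in both channels, which the text does not address, is an assumption, as is the function name.

```python
INTENSITY_THRESHOLD = 1100  # absolute "Mean Intensity" units, from the text

def classify_cell(q560_intensity, fam_intensity, threshold=INTENSITY_THRESHOLD):
    """Return 'SK' (epidermis), 'BS' (buccal epithelium) or None.

    Cells above the threshold in both channels, or in neither, are left
    unclassified here; this handling is an assumption of the sketch.
    """
    above_q560 = q560_intensity > threshold
    above_fam = fam_intensity > threshold
    if above_q560 and not above_fam:
        return "SK"
    if above_fam and not above_q560:
        return "BS"
    return None

# Hypothetical mean intensities measured within four regions of interest.
labels = [classify_cell(q, f) for q, f in
          [(1500, 300), (200, 1800), (900, 950), (1300, 1400)]]
```

Only cells with an unambiguous label would then be marked for laser microdissection and pooled into the two groups submitted to DNA profiling.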
  • Example 5D Separation of mixture of two fluorescently labeled B lymphocyte cell lines - Caucasian and Asian (CHI3L2 + CYP1B1 markers) with the use of laser microdissection.
  • This example demonstrates the possibility of separating a mixture of cells originating from two donors representing different ethnic populations by using fluorescent microscopy imaging to detect differential transcription of marker genes at the single cell level, followed by physical segregation of cells using laser microdissection.
  • Lymphocytes were obtained directly from suspended cultured cell lines. Following a cell count using the Fuchs-Rosenthal counting chamber, the suspensions were brought to the desired densities by adding 1xPBS buffer. Cell mixtures at a 1:1 ratio intended for laser microdissection were deposited onto the MMI membrane in 10 µl aliquots.
  • FISH Hybridization
  • NIKON confocal scanning NIKON TiE Eclipse A1 microscope
  • diode lasers 405nm, 561nm, 638nm
  • argon ion laser 457nm/488nm/514nm
  • the specimens were subjected to fluorescence microscopy imaging at the excitation wavelength specific for the Quasar 670 fluorochrome used for labeling the CHI3L2 and CYP1B1 probes.
  • the intensity assessment was performed by NIS Elements software, using the "Mean Intensity" parameter for calculation of mean signal intensity, expressed in absolute units, within a specified Region Of Interest (ROI) corresponding approximately to the area of imaged cell.
  • ROI Region Of Interest
  • a characteristic fluorescence intensity gradient was observed, reflecting, in addition to population-specific transcription signatures, also various biological side effects, e.g., different phases of the cellular life cycle, levels of vitality, hybridization effectiveness, transcriptional bursting, etc.
  • the fluorescence intensity thresholds were experimentally set in order to separate putative single components of the mixture, i.e., cells belonging to either L32 or L33 line.
  • the adopted intensity gradient thresholds are schematically presented in Fig. 17.
  • MMI CellCut Plus® microdissection system (Molecular Machines & Industries AG), coupled with the NIKON A1R confocal microscope.
  • MMI CapLift automatic cell collection system based on adhesion of cells to the isolation caps (MMI IsolationCap), i.e., adhesive lids of standard reaction tubes, which can be used in further analysis.
  • Table 9 A table summarizing the genetic markers, microarray probe IDs and microarray probe sequences.
  • CYP1B1 2120053 27 GCTTTCATGTCCCAGAACTTAGCCTTTACCTGTGAAGTGTTACTACAGCC
  • UTS2 6290228 32 AGCTTCCCTTCTACAGATACTGCCAGAGATGCTGGGTGCAGAAAGAGGGG
  • PEX6 3440086 39 TCCAGGAGATCCCAGGGTGCAAAGTGGCATTGAGACAGCAGCAACAGCTC
  • RABEP1 240110 40 AGCTCAGTTGGGTTTCACGAGTGTTCCTGTGCTTATATTCAGTCTGTGCC
  • UGT2B7 5420450 44 GAGCTAAACACCTTCGGGTTGCAGCCCACGACCTCACCTGGTTCCAGTAC
  • GPR56 5490768 45 GTAGATTGCTGGCCTGTTGTAGGTGGTAGGGACACAGATGACCGACCTGG
  • Table 10 A table summarizing genetic markers, according to the invention, applicable for distinguishing between Caucasian and Asian population.
  • cacgaggcag gggccatttt acctccaggt tggccctgct caggaccagg aggaacacc 60 tccagcccgc gacctcctc cacaggggga aaaggaagc aggaggacca cagaagcttt 120 ggcaccgagg atcccgcag tcttcacccg cggagattcc ggctgaaggagga gctgtccagc 180 gactacaccg ctaagcgcag ggagcccaag cctccgcacc ggattccgga gcacaagctc 240 caccgcgcat gcgcacacgc cccagaccca ggctcaggag gactgagaat ttctgaccg 300 cagtgcacca tg
  • n is a, c, g, or t

Abstract

The present invention relates to the application of genetic markers, exhibiting differential transcription with regard to ethnic origin from Caucasian or Asian population, in methods of identification of human biological material and separation of human cells, especially mixtures thereof, and the uses thereof, especially in forensic analyses.

Description

Methods of identification of ethnic origin based on differentiated transcription profiles and genetic markers used in those methods
DESCRIPTION
TECHNICAL FIELD
The object of the invention is the use of selected genetic markers showing transcriptional variability in human cells in terms of their Caucasian or Asian ethnic origin in methods of identification of human biological material and human cell separation, in particular their mixtures as well as uses thereof, in particular in forensics.
BACKGROUND ART
The aim of forensic examination is to identify persons who left biological traces at a crime scene. The most common method of identifying persons based on biological traces is analysis of the genetic material, DNA. Currently, the most popular DNA profiling method used by forensic laboratories for identification purposes is the multiplex analysis of STR polymorphism (STR, short tandem repeats, i.e. repeated short non-coding DNA sequences), based on differences in the number of examined STRs between individuals. In the newest generation of STR polymorphism analysis sets, STR regions in 17 loci located on various chromosomes of the human genome are evaluated simultaneously. This ensures a significant power of discrimination, i.e. a probability bordering on certainty that any particular STR profile will not repeat in a random individual. Unfortunately, there are serious limitations to the use of this method, the crucial ones being: i) serious problems in the interpretation of mixed DNA profiles from two or more individuals (mixed genotypes); ii) the need for a relatively large amount of biological material to carry out a test; iii) no possibility of determining the phenotypic properties of donors, e.g. ethnic origin or age; iv) no tool for a precise analysis of forensic material, e.g. the age of biological traces.
This means that the usefulness of standard methods is limited with regard to specific types of evidence in forensic practice, e.g. semen traces after a gang rape, mixed blood traces left on surfaces, etc. A particularly difficult identification problem relates to investigating crime scenes where the victim's traces prevail significantly over the perpetrator's traces. The analysis of such biological material becomes a serious technical problem that requires a non-standard analytical approach, since current knowledge and forensic science have no tools for the identification of the minor components of such a mixture. On the other hand, the issue of ethnic origin identification is given priority by international security services in the face of an increasing global terrorist threat. Also, the possibility of determining the type of tissue in the samples, including forensic traces, plays in many cases a crucial role in proving the guilt or innocence of a suspect and in providing a complete reconstruction of events.
Literature provides information about mRNA markers that are characteristic for the Caucasian, Asian or African ethnic groups (Spielman et al. 2007; Storey et al. 2007), however, these data have not been verified in terms of their usefulness in forensic analyses.
Therefore, the usefulness of the above-mentioned marker mRNAs is limited in practice and it is only hypothetical knowledge that needs confirmation. The known markers of differential transcription profiles correlated with a specific ethnic group, Caucasian or Asian, did not show appropriate sensitivity and specificity, which made them useless for practical applications, especially as markers for identification and separation of cell mixtures in terms of their various ethnic origins. Moreover, they have never been utilized in practice for the identification of materials of different ethnic origin using other auxiliary techniques, such as Q-PCR, TLDA, FISH, and in particular FACS or laser microdissection.
Hence, there are no effective methods for the identification of biological material in terms of its ethnic origin in samples recovered from two or more individuals. Especially, there are no known methods utilizing differential transcription profiles for particular genetic markers that would enable identification of biological materials in terms of the ethnic origin of samples retrieved from two or more individuals, which in turn would make it possible to separate biological traces (e.g. forensic traces with mixed genotypes) into groups of cells of different ethnic origins. Also no known methods exist for the separation of biological samples according to the ethnic origin of a single human cell in the mixture containing a small number of cells of two or more individuals, especially in cell mixtures of representatives of Caucasian and Asian population.
DISCLOSURE OF THE INVENTION
The aim of the present invention is to overcome the described difficulties and to enable identification of ethnic origin based on transcription markers by providing innovative tools for determining marker mRNA transcription profiles in single human cells, thus enabling the segregation of cells with regard to their Caucasian or Asian origin. On this basis, forensic traces can be characterized and mixed traces precisely separated, allowing further identification of particular components using common forensic methods. The aim of the invention is, therefore, to provide a set of genetic markers allowing the classification and separation of single human cells according to their Caucasian or Asian ethnic origin.
For this purpose, the inventors performed a comprehensive evaluation of the cellular transcriptome for the Caucasian and Asian populations using expression microarray hybridization technology. As a result of the analysis, 26 genetic markers were identified, with particularly differentiating results obtained for 20 genes (genetic markers) whose expression level is correlated with ethnic origin and differs between the examined populations. A classifier validation was also carried out in order to confirm the usefulness of individual genes for ethnic identification of the examined cells. The identified marker genes had not been characterized before in terms of the correlation of their expression with ethnic origin or of the possibility of being used for separation of biological material, which implies the innovative character of the obtained results.
Most transcriptome profiling projects so far have used cultured lymphocyte cell lines as research material, due no doubt to the ease of obtaining and propagating such material. However, the universal character of the population-specific transcriptome markers in other human tissues has not been confirmed. For that purpose, the markers identified in the model cell cultures were compared with those characteristic for tissues most commonly found in forensic evidence material, i.e. peripheral blood, dermis, epidermis or buccal epithelium.
An important issue in the context of the analysis of the transcriptome in forensic traces is the stability of mRNA as the material for forensic examination. The stability of RNA was extensively tested with regard to forensic trace collection methods (Juusola & Ballantyne, 2003; Alvarez et al., 2004; Bauer et al., 2003; Nussbaumer et al., 2006; Setzer et al. 2008; Conti T & Buel E 2009; Vennemann & Koppelkamm, 2010); the RNAs extracted from the forensic traces, which were even up to 547 days old, were proven useful in forensic analysis (sequencing).
An important issue is the variability of a transcriptome which is commonly known to be influenced by external factors, such as diet or lifestyle. This element was taken into consideration in the design of microarray analysis, which was carried out using the data filtering method appropriate for selecting markers exhibiting transcription level significantly higher/lower in a particular population, in comparison to the reference sample and the remaining populations, as well as independent from environmental factors. Thus, the invention relates to a method of identification and separation of human cell mixture with regard to the Caucasian or Asian ethnic origin, wherein the method comprises the following:
contacting a cell mixture under conditions facilitating nucleic acid hybridization in human cells with a labeled probe being a nucleotide sequence complementary to the mRNA sequence of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, or, more preferably, at least two or more various genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157;
separation of the cell mixture with regard to the differences in the signals of the hybridized probe between a human cell of Caucasian ethnic origin and a human cell of Asian ethnic origin.
In a preferable method of identification and separation of human cell mixture the probe is complementary to at least one genetic marker, more preferably to at least two genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
In a preferable method of identification and separation of human cell mixture the probe is fluorescently labeled, whereas the separation of human cells is carried out by means of laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS).
In a preferable method of identification and separation of a human cell mixture a probe being a nucleic acid complementary to the particular genetic marker, is a Stellaris type probe, Singer type probe or a combination thereof.
In a particularly preferable method of identification and separation of human cell mixture a probe complementary to the genetic marker is complementary to the sequence selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
In the most preferable method of identification and separation of a human cell mixture the human cell mixture is a forensic trace.
The invention also relates to an identification method of the ethnic origin of human biological material, in particular a human cell with regard to the Caucasian or Asian origin, which includes a determination of the mRNA quantity in human biological material, in particular a human cell, of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157; more preferably, the mRNA quantity is assessed for at least one additional genetic marker other than selected previously, selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, wherein the mRNA quantity is determined by means for determining the mRNA quantity for that marker, as well as a comparison of such determined mRNA quantity with the mRNA quantity established for that marker for human biological material of a Caucasian and/or Asian origin, and based on that an identification of the ethnic origin of human biological material.
In a preferable method of identification of the ethnic origin of human biological material, in particular a human cell, the means for determining the mRNA quantity is a microarray analysis, Taqman Low-Density Array (TLDA), Real Time PCR (also called Quantitative PCR, QPCR) amplification or Fluorescence In Situ Hybridization (FISH).
When the means for determining the mRNA quantity in the method of the identification of the ethnic origin of human biological material, in particular a human cell, is the FISH hybridization, preferably the marker is selected from: CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2. In a preferable method, when the means for determining the mRNA quantity is a microarray analysis, the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157; or when the means for determining the mRNA quantity is the Taqman Low-Density Array, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75; or when the means for determining the mRNA quantity is PCR amplification, the marker is selected from human UTS2.
In a preferable method of identification of the ethnic origin of human biological material, in particular a human cell, the genetic marker is selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
In a preferable method of identification of the ethnic origin of human biological material, in particular a human cell, the biological material originates from a forensic trace.
The invention also relates to use of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157 for the identification and separation of a mixture of human cells with regard to the Caucasian or Asian ethnic origin, based on the differences in the mRNA quantities for that genetic marker in human cells.
In a preferable use of a genetic marker for the identification and separation of human cell mixture with regard to the ethnic Caucasian or Asian origin, additionally an assessment of a difference in mRNA quantities for at least one additional genetic marker other than previously selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157 is applied.
In a particularly preferable use of a genetic marker for the identification and separation of the cell mixture with regard to the Caucasian or Asian origin, the genetic marker is selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
In a preferable use of a genetic marker for the identification and separation of the cell mixture with regard to the Caucasian or Asian origin, the mRNA quantity is determined by hybridizing the nucleic acids in human cells with a labeled probe being a nucleotide sequence complementary to the mRNA sequence of the selected genetic marker.
Such a probe is preferably fluorescently labeled and the separation of cell mixture is carried out by using laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS). Particularly preferable are the fluorescently labeled Stellaris-type probes, Singer-type probes or the mixture thereof.
In a preferable use of a genetic marker for the identification and separation of the cell mixture with regard to the Caucasian or Asian ethnic origin, the genetic marker is selected from the group comprising: CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
In a preferable use of a genetic marker for the identification or separation of the cell mixture according to the ethnic Caucasian or Asian origin, the cell mixture is a forensic trace.
The invention also relates to use of a genetic marker for the identification of the ethnic origin of human biological material, in particular a human cell, with regard to the Caucasian or Asian origin, wherein by means for determining the mRNA quantity, the mRNA quantity is determined in human biological material, for at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
Preferably, in the case of such use for the identification of the ethnic origin of human biological material with regard to the Caucasian or Asian origin of human biological material, in particular a human cell, the mRNA quantity is further determined for at least one additional genetic marker other than selected previously from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157. In the use for the identification of the human cell ethnic origin with regard to the Caucasian or Asian population origin, a preferable means for determining the mRNA quantity is a microarray analysis, TaqMan Low Density Array analysis (TLDA), Real-Time PCR (QPCR) amplification or Fluorescent In Situ Hybridization.
When the means for determining the mRNA quantity is the FISH hybridization, the marker is preferably selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2. In the equally preferable use, when the means for determining the mRNA quantity is the microarray analysis, the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157; or if the means for determining the mRNA quantity is the TaqMan Low Density Array analysis, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75; or if the means for determining the mRNA quantity is the PCR amplification, the marker is selected from human UTS2.
In the use for the identification of the human cell ethnic origin with regard to the Caucasian or Asian population origin, the genetic marker is preferably selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
In a preferable use for the identification of the ethnic origin of the biological material, in particular a human cell, with regard to the Caucasian or Asian population origin, the human biological material originates from a forensic trace.
The invention also relates to a set for the identification of the ethnic origin of human biological material, in particular a human cell, with regard to the Caucasian or Asian population origin or for the identification and separation of cell mixture with regard to the Caucasian or Asian origin that includes molecules of nucleic acid complementary to the sequence of at least one genetic marker, more preferably at least two, even more preferably at least nine markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, wherein the molecules of nucleic acid complementary to the sequence of such a genetic marker allow for determination of mRNA quantity for that marker by means of suitable means for determining the mRNA quantity.
A preferable set will include molecules of nucleic acid complementary to the sequence of at least one genetic marker, more preferably to at least two genetic markers selected from human CYP1B1, MOXD1, CHI3L2, SLC2A5, UTS2, DBNDD2, ROBO1.
Therefore, a preferable set will for example include molecules of nucleic acid which are QPCR amplification primers or will be a microarray comprising short ligated nucleic acid fragments complementary to the selected markers, or a targeted array (for TLDA), or a fluorescently labeled nucleic acid, in particular a Stellaris-type or Singer-type probe.
A person skilled in the art will be able to easily select the appropriate length, type and method of labeling of the nucleic acid included in such a set, depending on the means for determining the differences in mRNA quantities which will be used with such a set.
In a preferable set for the identification of the ethnic origin of human biological material, in particular a human cell, with regard to the Asian or Caucasian population origin or the identification and separation of a cell mixture with regard to its Caucasian or Asian origin, the molecules of nucleic acid will be complementary to the sequence of the genetic marker selected from the group comprising CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
In equally preferable methods, uses and sets for the identification of the ethnic origin of human biological material, especially a human cell, or the identification and separation of a human cell mixture, any means allowing for specific probe labeling for single transcripts (radioactive, fluorescent, labeling by incorporation of modified bases, labeling using dendrimers, labeling using biotin and streptavidin, etc.) may be applied, and a properly selected method suitable for these means, enabling the detection of single transcripts labeled in this way, may be selected. For example, probes may be labeled and the separation carried out using the means for reading/measuring such signals as radioactive or fluorescent signals.
Stellaris probes were developed and described by Raj et al. (2008). In the described method the detection of single transcripts is possible following an in situ mRNA hybridization with sets of probes fluorescently labeled at one of the termini. Usually, the probe sets comprise around 30 to 50 oligonucleotides, each about 25-35 nucleotides in length. The probes should be designed in such a way that the melting temperatures of the individual oligonucleotides are in the range of 40-50°C. Hybridization with such probes is performed in a hybridization buffer containing around 10-20% formamide for several to more than ten hours, at a temperature of 20-40°C. The unhybridized probes are rinsed off by incubation of the samples with a wash buffer containing 10-20% formamide.
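As a hedged illustration of the probe-set dimensions given above, the following minimal Python sketch tiles non-overlapping antisense oligonucleotides along a target transcript and checks that the resulting set falls within the stated size range. The function names, the 2-nucleotide spacing between oligonucleotides and the default length of 30 nucleotides are assumptions made for illustration; the melting temperature criterion is deliberately not modeled here and would have to be checked separately.

```python
def reverse_complement(seq: str) -> str:
    """Return the antisense DNA sequence for an mRNA (or DNA) target segment."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tile_probes(mrna: str, oligo_len: int = 30, gap: int = 2) -> list[str]:
    """Tile non-overlapping antisense oligos along the transcript.

    oligo_len and gap are illustrative assumptions; the description only
    states that each oligonucleotide is about 25-35 nucleotides long.
    """
    step = oligo_len + gap
    return [
        reverse_complement(mrna[start:start + oligo_len])
        for start in range(0, len(mrna) - oligo_len + 1, step)
    ]


def set_size_ok(probes: list[str], low: int = 30, high: int = 50) -> bool:
    """Check that the set contains around 30 to 50 oligonucleotides."""
    return low <= len(probes) <= high


if __name__ == "__main__":
    # Artificial placeholder transcript, not a real marker sequence.
    transcript = "ACGU" * 300
    probes = tile_probes(transcript)
    print(len(probes), set_size_ok(probes))
```

A transcript that is too short to yield enough oligonucleotides would fail the set-size check, in which case the spacing or oligonucleotide length would need to be adjusted.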
Singer probes were developed by Singer (Femino et al, 1998) and later modified by other authors. The probe sets usually comprise 4 to 5 oligonucleotides, each about 45-55 nucleotides in length. Probe sequences should be designed in a way ensuring that each oligonucleotide contains at least four thymidines separated from each other by stretches of at least 10 nucleotides. Those thymidines are fluorescently labeled. The hybridization using such probes is performed in a hybridization buffer containing about 50% formamide for several to more than ten hours, at a temperature of 20 to 40°C. The unhybridized probes are rinsed off by incubation of the samples with a wash buffer containing 50% formamide.
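The thymidine spacing rule stated above can be checked mechanically. The following minimal Python sketch is one possible reading of that rule, assuming that "separated by stretches of at least 10 nucleotides" means at least 10 nucleotides lie between two labeled thymidines; the function names and the greedy selection strategy are illustrative assumptions, not part of the described method.

```python
def labelable_thymidines(oligo: str, min_gap: int = 10) -> list[int]:
    """Greedily pick thymidine positions with at least min_gap bases between them.

    Taking the earliest admissible thymidine each time yields the largest
    possible number of labeling positions for this spacing constraint.
    """
    positions: list[int] = []
    for i, base in enumerate(oligo.upper()):
        if base != "T":
            continue
        # i - previous - 1 nucleotides lie between the two thymidines.
        if not positions or i - positions[-1] > min_gap:
            positions.append(i)
    return positions


def is_singer_candidate(
    oligo: str, min_len: int = 45, max_len: int = 55, min_labels: int = 4
) -> bool:
    """Check oligo length (about 45-55 nt) and the four-thymidine rule."""
    if not min_len <= len(oligo) <= max_len:
        return False
    return len(labelable_thymidines(oligo)) >= min_labels


if __name__ == "__main__":
    # Artificial placeholder oligonucleotide, not a real probe sequence.
    oligo = ("T" + "ACGACGACGA") * 4 + "ACGAC"
    print(labelable_thymidines(oligo), is_singer_candidate(oligo))
```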
A genetic marker should be understood as a recognizable nucleic acid fragment used for identification. An example of a genetic marker may be a transcription marker.
A forensic trace should be understood as biological material of human or other origin useful in forensic analysis.
Biological material should be understood as material of biological origin containing nucleic acids. Examples of biological material include isolated nucleic acids, in particular mRNA, DNA and mixtures thereof, cells, cell cultures, etc.
The tests that led to the identification of marker genes distinguishing between the Caucasian and Asian population, and based on that, allowing for the separation of human cells from two or more individuals, included several steps and alternative procedures:
1. Identification of marker mRNA sequences (marker genes) based on whole-genome microarray analyses;
2. QPCR-based ethnic identification of the examined biological material for the selected genetic (transcription) markers;
3. TLDA-based ethnic identification of the examined biological material for the selected genetic markers;
4. Demonstrating a difference in the mRNA quantities for marker mRNA sequences, in particular by using the in situ fluorescent labeling of discriminatory genes (marker genes) in the examined biological material (FISH technique). The labeling targets were newly identified mRNA marker sequences distinguishing between the donors' ethnicities, as well as the markers described in literature, determining the types of bodily fluids and tissues.
5. Cell separation based on characteristic transcription patterns, in particular using laser microdissection or FACS;
6. Genetic identification of the separated cell populations.
The performed analyses resulted in the identification of 26 genetic markers (transcription markers), 20 of which were particularly preferable, and for which a significant difference in mRNA expression levels between human cells from the Caucasian and Asian populations was shown. The genetic markers showing differences in mRNA expression levels relate to mRNAs for the genes: CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157. It was found that the differences in mRNA expression levels for as few as a single genetic marker selected out of the CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2 genes made it possible to discriminate between the Asian and Caucasian origin of a given cell. In order to determine the ethnic origin of a cell with the highest probability, the variability in transcription profiles between the Asian and Caucasian populations for genetic markers of the invention may be used in any way, and in order to increase the test reliability, the differences in mRNA quantities may be shown for at least two markers, more preferably for between 3 and 26 genetic markers of the invention, selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157. A person skilled in the art will be able to select suitable means for showing the differences in quantities of a given mRNA, allowing for differentiation of a human cell with regard to its Asian or Caucasian origin. The means for showing differences in mRNA quantities may be, for example, chosen from the microarray analysis, quantitative PCR or labeling by means of the FISH technique, which may for example be further combined with separation by laser microdissection or FACS sorting.
A person skilled in the art, depending on the particular means applied to determine the differences in mRNA quantities, will be able to select, based on the information disclosed herein, appropriate genetic markers of the invention and their number, in order to determine on these grounds whether a given biological material originates from the Caucasian or Asian population.
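As a hedged illustration of how measured mRNA quantities for several markers might be compared with population reference values, the following minimal Python sketch uses a per-marker nearest-reference vote. The voting rule, the function name `classify`, the generic marker names and all numeric values are assumptions introduced for illustration only; they are neither the classifier nor the data of the invention.

```python
# Hypothetical reference mean quantities per marker and population.
# The marker names and numbers are invented placeholders.
REFERENCE = {
    "marker_1": {"Caucasian": 2.0, "Asian": 8.0},
    "marker_2": {"Caucasian": 9.0, "Asian": 3.0},
    "marker_3": {"Caucasian": 4.0, "Asian": 7.0},
}


def classify(sample: dict[str, float], reference: dict[str, dict[str, float]]) -> str:
    """Assign a population by majority vote over the measured markers.

    Each marker votes for the population whose reference quantity is
    closest to the measured quantity; a tie is reported as undetermined.
    """
    votes = {"Caucasian": 0, "Asian": 0}
    for marker, value in sample.items():
        means = reference[marker]
        closest = min(means, key=lambda pop: abs(value - means[pop]))
        votes[closest] += 1
    if votes["Caucasian"] == votes["Asian"]:
        return "undetermined"
    return max(votes, key=votes.get)


if __name__ == "__main__":
    sample = {"marker_1": 2.5, "marker_2": 8.0, "marker_3": 6.5}
    print(classify(sample, REFERENCE))
```

Using more markers reduces the chance that a single atypical measurement decides the result, which is in line with the statement that test reliability increases when differences are shown for at least two markers.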
If the means for the identification of the differences in the transcription levels of individual markers, between samples representing various ethnic populations is:
- a microarray analysis, the application of a two-gene classifier makes it possible to discriminate between the two populations with almost 90% accuracy, whereas a nine-gene classifier will have an almost 100% accuracy in distinguishing between the Caucasian and Asian populations. Therefore, while using microarray analysis as the means for the identification of the differences in the transcription levels, it is preferable to select at least two markers, more preferable to select between three and twenty six markers, and the most preferable to select at least nine markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157;
- QPCR, amplification primers and fluorescently labeled probes for marker genes should be designed according to standard principles. In order to properly identify the ethnic origin of the examined samples it is sufficient to carry out an analysis of a single selected marker gene. The expression of the examined genes should be normalized relative to the reference index obtained by calculating the geometric mean of the expression of the control genes (the so-called housekeeping genes, HKG). Hence, in the case of the application of QPCR as the means for the identification of the differences in the transcription levels, it is preferable to select at least one marker preferably selected from UTS2 and UGT2B17, more preferably between two and twenty six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157;
- TLDA (targeted microarrays), amplification primers and fluorescently labeled probes should be designed for discriminatory transcripts and reference genes (control genes, the so-called housekeeping genes, HKG). In order to properly identify the ethnic origin of the examined samples it is preferable to carry out an analysis of one or two marker genes. Hence, in the case of the application of the TLDA as the means for the identification of the differences in the transcription levels it is preferable to select at least one marker, more preferably two selected from UTS2 and UGT2B17, more preferably between three and twenty six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157. In the case of TLDA it is particularly preferable to use the markers UTS2 and UGT2B17, CHI3L2, C1ORF115 and C16ORF75;
- FISH, a set of Stellaris or Singer probes should be designed according to the described principles for at least one marker gene with a high discrimination power, more preferably, for a set of two genes indicated in statistical analyses. Hence, in the case of the application of FISH as the means for the identification of the differences in the transcription levels, it is preferable to select at least one marker, more preferably two, even more preferably between three and twenty six markers selected from CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157. In the case of FISH technique, it is particularly preferable to use the markers CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
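The normalization described for QPCR in the list above, in which marker expression is related to a reference index computed as the geometric mean of the control (housekeeping) genes, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the inputs are taken to be relative expression quantities on a linear scale (not raw Ct values), and the function name and example numbers are hypothetical.

```python
from statistics import geometric_mean


def normalize_expression(marker_quantity: float, hkg_quantities: list[float]) -> float:
    """Normalize a marker quantity to the housekeeping-gene reference index.

    The reference index is the geometric mean of the control gene
    quantities; all inputs are assumed to be positive, linear-scale values.
    """
    reference_index = geometric_mean(hkg_quantities)
    return marker_quantity / reference_index


if __name__ == "__main__":
    # Invented placeholder quantities for one marker and two control genes.
    print(normalize_expression(8.0, [2.0, 8.0]))
```

The geometric mean is less sensitive than the arithmetic mean to a single control gene with a much higher expression level, which is one common reason for preferring it when several control genes are combined.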
In particularly preferred embodiments of the invention, the appropriate means for the identification of the differences in the quantities of a given mRNA will utilize the genetic markers exhibiting the difference in the mRNA transcription levels between the biological material from the Caucasian population and the biological material from the Asian population, for markers selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
Cell separation using marker genes discriminating between Caucasian and Asian population
Cell separation, based on variable levels of marker mRNA expression may be carried out using any technique facilitating the discrimination and separation of cells according to the differences in expression levels of marker mRNA(s).
The exemplary methods are as follows:
(a) laser microdissection, i.e. cell excision by means of a laser beam;
(b) separation in a Fluorescence-Activated Cell Sorter (FACS).
Laser microdissection is a precise and automated method of cell isolation, which additionally enables quantitative evaluation of the levels of marker mRNA expression with a single-molecule precision. At the same time, the technique carries some disadvantages such as high time consumption, high depreciation costs (due to the need to purchase expensive equipment), as well as the necessity to employ highly qualified and experienced staff. By applying laser microdissection, the cells can be isolated directly from the surface of the forensic adhesive tapes used to collect forensic traces. The performed tests showed that the technology of laser microdissection, combined with organic DNA extraction, enables obtaining a genetic profile using only a few cells.
An alternative method of the separation of fluorescently labeled cells is Fluorescence-Activated Cell Sorting. It is a modern and fully automated technology that makes it possible to measure physical and chemical properties of the cells, to determine the fluorescence levels in particular cells and to separate them physically, based on precisely defined parameters. A great advantage of this method is its efficiency. Modern devices can analyze and separate up to a few thousand cells per second. Moreover, they are equipped with several lasers and fluorescence detectors, which allows for cell labeling using a whole variety of fluorescent dyes and for simultaneous application of several probes. The use of FACS for the separation of cells based on their transcriptional signature significantly simplifies and accelerates the identification process. The described method constitutes an innovative approach, in particular to the analysis of complex forensic traces collected at the crime scene. The performed analyses have been very promising. The experiments involving model cell lines with different expression levels of differentiating genes allowed for their precise separation and identification. The use of the FACS sorter has so far made it possible to obtain a full profile using a few dozen human cells (following sorting).
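The principle of separating cells by fluorescence level according to predefined parameters can be sketched in Python as a simple two-gate selection. This is an illustration only, under the assumption that one fluorescence intensity value per cell is available and that each gate captures a fixed fraction of cells at the low and high ends of the distribution; the function name, the gate fraction and the example values are hypothetical, and real sorters set gates interactively on multi-parameter data.

```python
def gate_cells(intensities: list[float], fraction: float = 0.05) -> tuple[list[int], list[int]]:
    """Return indices of cells falling into a low and a high fluorescence gate.

    Each gate collects the given fraction of cells (at least one cell)
    from the lowest and the highest end of the intensity distribution.
    """
    n_gate = max(1, int(len(intensities) * fraction))
    # Cell indices ordered from the dimmest to the brightest cell.
    order = sorted(range(len(intensities)), key=intensities.__getitem__)
    return order[:n_gate], order[-n_gate:]


if __name__ == "__main__":
    # Invented placeholder intensities for 20 cells.
    cells = [float(v) for v in range(20)]
    low_gate, high_gate = gate_cells(cells, fraction=0.25)
    print(low_gate, high_gate)
```

Keeping the gates narrow and far apart trades yield for purity, since cells with intermediate fluorescence are left unsorted.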
Although FACS is a known and commonly used cell segregation method, its application to separate the cells in forensic traces based on their transcriptional profiles, especially with regard to the differences in expression patterns of the marker genes discriminating between the Caucasian and Asian populations, is an entirely new approach.
Both methods of cell separation, laser microdissection and FACS sorting, enable an a priori examination of the material, e.g. forensic traces. This means that the mixture components are identified and segregated before isolation of their genetic material and analysis of STR polymorphism (genetic profiling). In forensic sciences it is undoubtedly an innovative approach that enables obtaining pure DNA profiles for individual components of the mixture.
Identification of cells following separation
Following cell separation, cellular DNA is subjected to PCR amplification using multiplex STR polymorphism analysis kits in order to determine the genetic profiles of the groups of cells segregated with the use of genetic markers discriminating between the Caucasian and Asian population, and to identify the donors.
By means of the separation methods described above it is possible to avoid losses related to DNA isolation from the material, commonly collected on swabs and isolated using the purification column kits.
By applying laser microdissection, the cells can be isolated directly from the surface of the forensic adhesive tapes used to collect forensic traces. The performed tests showed that the technology of laser microdissection, combined with organic DNA extraction, makes it possible to obtain a genetic profile using as few as 4 to 10 human cells, which constitutes substantial progress when compared with traditional methods requiring around 150 cells, not to mention the material loss resulting from the isolation process.
The use of the FACS sorter has so far made it possible to obtain a full profile using fewer than a few dozen human cells (following sorting). However, tests are in progress aiming to determine a minimum detection threshold for this method of mixture separation. The publications cited in the description and the references thereof are hereby also incorporated by reference.
BRIEF DESCRIPTION OF DRAWINGS
In order to provide a better understanding of the invention, it has been illustrated by the embodiments and accompanying drawings, wherein:
FIG. 1. illustrates the selection of the optimal size of the classifier, based on the cross-validation. The determined optimal size of the classifier was nine genes.
FIG. 2. Results obtained in TLDA analyses were subjected to a statistical analysis with a Student's t-test and a Volcano plot with the Data Assist software (Life Technologies). Out of 19 examined transcription-differentiating genes, 5 showed statistically significant differences (p < 0.05) in the expression levels between the examined populations, Caucasian (CEU) and Asian (CHB).
FIG. 3. Hybridization of lymphocyte cells representing Asian population (A) or Caucasian population (B) with probes specific for DBNDD2 gene. A confocal image taken using a Nikon A1 microscope, graphically processed using Imaris software. Grey areas indicate locations of cell nuclei, bright spots correspond to the signals obtained as a result of FISH with single transcripts. The number of signals corresponds to the number of transcript molecules within the cell.
FIG. 4. The analysis of post-hybridization signals using BlobFinder software. Within the field of vision there is a single HeLa cell, which has been hybridized with the probes specific for the GAPDH reference gene. The circles indicate the hybridization signals, while the line demarcates the cell nucleus. The table presents the software-generated report, including the number of signals, number of nuclei and the total areas of the nuclei in analyzed cells.
FIG. 5. The chart illustrating mean values distribution for markers CHI3L2 vs. CYP1B1 (o - Caucasian population, □ - Asian population).
FIG. 6. The chart illustrating mean values distribution for markers MOXD1 vs. CYP1B1 (o - Caucasian population, □ - Asian population).
FIG. 7. Projection of variables (values for markers UTS2, MOXD1, DBNDD2, CHI3L2, CYP1B1) on the PC1 (ca. 70%) and PC2 (ca. 15%) planes.
FIG. 8. Projection of events (Lines) on the PC1 (ca. 70%) and PC2 (ca. 15%) planes (o - Caucasian population, □ - Asian population).
FIG. 9. Fluorescence profile obtained using a FACS sorter for single population lines, (A) Asian, (B) Caucasian, and (C) a mixture of lymphocytes of the two lines, following the hybridization with the DBNDD2 probe. The sort gates set up for acquiring the cells exhibiting low (P9) and high (P13) fluorescence are marked.
FIG. 10. Fragments of genetic profiles obtained by using the STR polymorphism analysis method for cells sorted out of a mixture (Fig. 9C), corresponding to P9 (A) and P13 (B) fractions.
FIG. 11. Separation of a tissue mixture (vaginal epithelial cells + buccal epithelial cells) hybridized with the STATH probe (specific for the buccal epithelium), carried out using the FACS sorter. A) the P5 fraction (the lowest fluorescence), gate size: 4.7%; sort size: ca. 100 cells; profile: pure - vaginal epithelium. B) the P4 fraction (the highest fluorescence), gate size: 5%, sort size: ca. 100 cells, profile: pure - buccal epithelium.
FIG. 12. Separation of a tissue mixture (vaginal epithelium from individual A + buccal epithelium from individual B) hybridized with the MUC7 probe (specific for the buccal epithelium), carried out using a FACS sorter. A) the P5 fraction (the lowest fluorescence), gate size: 5%; sort size: ca. 100 cells; profile: pure - individual A. B) the P4 fraction (the highest fluorescence), gate size: 5%, sort size: ca. 100 cells, profile: pure - individual B.
FIG. 13. Separation of a tissue mixture (vaginal epithelium from individual A + epidermis from individual B) hybridized with the LCE1C probe (specific for the epidermis) + the STATH probe (specific for the buccal epithelium), carried out using a FACS sorter. A) Cells sorted at FAM - specific wavelength (LCE1C probe specific). The P5 fraction (the lowest fluorescence), gate size: 5%; sort size: ca. 250 cells; profile: pure -individual A. B) Cells sorted at FAM - specific wavelength (detected STATH probe). The P4 fraction (the highest fluorescence), gate size: 16%; sort size: ca. 250 cells; profile: pure - individual B.
FIG. 14. A) Schematic representation of the MMI membrane subjected to hybridization or laser microdissection. The areas of deposition of the cell mixture from the forensic stain (SK/BS), positive controls (SK and BS) and negative control are marked with circles (the area of the membrane without biological material, subjected to hybridization, is marked as "-"). B) Microscopic image of cells, following hybridization with the MUC7 probes (continuous circles) and LCE1C probes (dashed circles). The arrows indicate the cells which were qualified for laser microdissection.
FIG. 15. Genetic profiles of the fractions of buccal epithelial cells and epidermal cells, each of 10 cells, obtained by means of cell mixture separation using laser microdissection. The profiling was carried out using NGM SElect kit. A) Fraction of buccal epithelium cells (BS) - pure profile. B) Fraction of epidermal cells (SK) - pure profile.
FIG. 16. Microscopic image of the L32/L33 lymphocyte lines mixture, following hybridization with CHI3L2 and CYP1B1 probes - classification of cells into certain intensity intervals was carried out with NIS-Elements software.
FIG. 17. The results of the genetic profiling of the mixture of lymphocytes representing the Caucasian (L32) and Asian (L33) population, hybridized with the CHI3L2 and CYP1B1 probes. A) Experimentally determined intensity intervals along with marked mixture components. B) Genetic profiles of the cell fractions classified into particular intensity intervals: B1 - Mixed genetic profile obtained for the intensity intervals: 0-500 and 1100-2000; B2 - Pure genetic profile of the L32 line obtained for the intensity interval: 500-1100; B3 - Pure genetic profile of the L33 line obtained for the intensity interval: >2000.
The following examples are presented merely to illustrate the invention and not to be limiting and should not be equated with all its scope, which is defined in the appended claims.
EXAMPLES
In the examples below, unless specified otherwise, standard materials and methods described in Sambrook J., Russell D.W. Molecular cloning: A laboratory manual. Cold Spring Harbor Laboratory Press, New York were used or procedures in accordance with the manufacturers' recommendations were followed regarding specific materials and methods.
Example 1
Identification of mRNA marker sequences based on whole-genome microarray analysis
Biological material and quality assessment
The total RNAs isolated from 127 cell lines of lymphocytes from Caucasian and Asian populations were hybridized on HumanHT-12v4 Expression BeadChip Kit expression arrays produced by Illumina. For technical quality evaluation of hybridized RNA samples the GenomeStudio V2010.1 program by Illumina was used. The analysis included such parameters as signal intensity (including basic metabolism genes), background level, noise level, the number of detected actively expressed genes, or hybridization control, as well as mutual relations thereof. Two samples were excluded based on the quality analysis.
Further analysis included a set of 125 RNA samples, including:
-67 male samples (35 of Caucasian origin, 32 of Asian origin);
-10 female samples (5 of Caucasian origin and 5 of Asian origin),
-2x24 biological repeats of the male population.
All samples were normalized jointly using the quantile method.
Unsupervised analysis
Unsupervised analysis was performed by means of hierarchical clustering, the results of which are usually presented in a dendrogram. The hierarchical clustering (in each step of the analysis) consists of the calculation of the matrix of distances between all objects, the creation of clusters by combining the objects and/or clusters created in the previous step, and the calculation of the distances between clusters combining several objects. In the clustering the Euclidean distance was used to calculate the distances between samples, while the Ward's method was applied to combining individual clusters.
The aim of this step of the analysis was to detect and eliminate any technical factors and to filter out, based on biological replicates of cellular lines, the genes displaying high measurement reliability.
The clustering performed for all the probes on the microarray revealed that variability between individual microarrays was the main source of variability in the analyzed dataset.
In order to eliminate the effect of individual microarrays on the obtained gene expression profiles, the so-called data centering method was applied (i.e., bringing the mean expression of each gene within one microarray to a common value). This step is performed as follows:
-for each gene, the mean expression in each array is calculated by the following formula:
$$\bar{g}_{ib} = \frac{1}{B} \sum_{j=1}^{B} g_{ij}$$
where $g_{ij}$ is the expression of the i-th gene in the j-th sample and the sum runs over the B samples hybridized on array b;
-for each gene and each sample, the mean value of the gene obtained in the array on which a given sample was hybridized is subtracted:
$$g'_{ij} = g_{ij} - \bar{g}_{ib}$$
Based on the "centered" probes, a hierarchical clustering was repeated, which in result proved the effectiveness of the applied method in eliminating the effects of individual microarrays.
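The centering step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `center_by_array`, the dictionary layout and the toy values are assumptions made for the example and do not come from the original analysis.

```python
from collections import defaultdict


def center_by_array(expr, array_of):
    """Subtract, for each gene, the mean of the array a sample was hybridized on.

    expr:     {gene: {sample: expression value}}   (hypothetical layout)
    array_of: {sample: array identifier}           (hypothetical layout)
    """
    centered = {}
    for gene, values in expr.items():
        # Collect this gene's values separately for every array.
        per_array = defaultdict(list)
        for sample, value in values.items():
            per_array[array_of[sample]].append(value)
        array_mean = {b: sum(v) / len(v) for b, v in per_array.items()}
        # Subtract the mean of the array on which each sample was hybridized.
        centered[gene] = {
            s: v - array_mean[array_of[s]] for s, v in values.items()
        }
    return centered


# Toy data (invented): one gene, two arrays differing by a systematic offset.
expr = {"GENE_X": {"s1": 8.0, "s2": 9.0, "s3": 10.0, "s4": 11.0}}
array_of = {"s1": "chip_A", "s2": "chip_A", "s3": "chip_B", "s4": "chip_B"}
centered = center_by_array(expr, array_of)
print(centered["GENE_X"])
```

After centering, the offset between the two toy arrays disappears while the within-array differences between samples are preserved.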
Filtering genes displaying high measurement reliability
Despite the elimination of the effect of individual microarrays, a small number of samples for which biological replicates had been created did not group together, which implied insufficient reliability of the expression measurement for a sizeable group of genes. Therefore, a novel method of filtering out genes with high measurement reliability ("measurable" genes) was developed, which followed the procedure below:
-for each gene, based on all samples, the coefficient of variation was calculated:
$$vc_i = \frac{\mathrm{var}(g_i)}{\bar{g}_i}$$
where $\bar{g}_i$ - mean expression of the i-th gene;
-for each gene the coefficient of variation was calculated for each line with three biological repetitions:
$$vc_{ij} = \frac{\mathrm{var}(g_{ijk})}{\bar{g}_{ij}}$$
where j = 1, 2, ..., 23 - lines that have three repetitions,
k = 1, 2, 3 - k-th repetition of the j-th line,
$\bar{g}_{ij}$ - mean expression of the i-th gene in the j-th line;
-for each gene the third quartile of the coefficient of variation was calculated:
$$\mathrm{vc.3q}_i = \mathrm{centile}((vc_{i1}, vc_{i2}, \ldots, vc_{i23}), 0.75)$$
-for each gene the measurability coefficient was calculated as the quotient of the total variation and the third quartile of the variation coefficient of this gene:
$$mc_i = \frac{vc_i}{\mathrm{vc.3q}_i}$$
By an arbitrary decision, the genes deemed "measurable" were those for which the "coefficient of measurability" mc > 1.5. A total of 3732 genes fulfilled this criterion. The hierarchical clustering of the analyzed samples based on the "measurable" genes showed an ideal grouping of all available biological replicates.
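The filtering procedure above can be illustrated with a short Python sketch. It assumes the sample variance for var() and the "inclusive" quantile definition, neither of which is specified in the description; the function name and the toy expression values are likewise invented for the example.

```python
from statistics import mean, quantiles, variance


def measurability(all_values, replicate_groups):
    """Measurability coefficient mc of a single gene.

    all_values:       expression of the gene in all samples
    replicate_groups: one list of replicate measurements per cell line
    """
    # Total coefficient of variation, written as var/mean as in the text.
    vc_total = variance(all_values) / mean(all_values)
    # Coefficient of variation within each line that has replicates.
    vc_lines = [variance(g) / mean(g) for g in replicate_groups]
    # Third quartile (0.75 centile) of the within-line coefficients.
    vc_3q = quantiles(vc_lines, n=4, method="inclusive")[2]
    return vc_total / vc_3q


# Toy genes (invented values): replicates agree for the first, not the second.
stable = [[6.0, 6.1, 5.9], [10.0, 10.1, 9.9], [6.0, 6.2, 5.8]]
noisy = [[6.0, 9.0, 7.5], [7.0, 10.0, 6.0], [8.0, 6.0, 9.5]]

mc_stable = measurability([v for g in stable for v in g], stable)
mc_noisy = measurability([v for g in noisy for v in g], noisy)
print(mc_stable > 1.5, mc_noisy > 1.5)
```

A gene whose variability between lines clearly exceeds its variability between replicates passes the mc > 1.5 threshold, whereas a gene dominated by replicate noise does not.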
Supervised analysis - selection of genes discriminating between the Caucasian and Asian populations.
A supervised analysis was carried out using the Student's t test for independent samples. A total of 67 male cell lines were analyzed, with 35 males from the Caucasian and 32 from the Asian population. The test was performed for the 3732 probes that met the measurability criterion mc > 1.5. For each probe, the statistical significance (p-value), the value adjusted for multiple testing by the False Discovery Rate (FDR) correction, and the fold-change in expression level between the analyzed populations were determined.
As a result of the supervised analysis, 189 genes were selected that met the FDR<5% criterion. Twenty of those genes were strongly deregulated, with eleven genes exhibiting an increased (fold-change > 1.5) and nine genes exhibiting a decreased (fold-change < 0.67) expression in the Asian population as compared to the Caucasian population. The sequences of the microarray probes on the HumanHT-12v4 Expression BeadChip Kit expression arrays for 26 genetic markers, for which differences between populations were observed, are presented in Table 9. A list of the 20 strongly deregulated genes with their statistical significance values and expression level fold-changes is shown in Table 1.
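The selection logic can be sketched as follows. The description does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption; the gene names, p-values and fold-changes are invented, and only the thresholds (FDR < 5%, fold-change > 1.5 or < 0.67) are taken from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input (invented): p-values from a t test and Asian/Caucasian fold-changes.
genes = ["g1", "g2", "g3", "g4", "g5"]
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
fold_changes = [1.8, 0.5, 1.2, 2.0, 1.0]

fdr = benjamini_hochberg(p_values)
strongly_deregulated = [
    g
    for g, q, fc in zip(genes, fdr, fold_changes)
    if q < 0.05 and (fc > 1.5 or fc < 0.67)
]
print(strongly_deregulated)
```

Note that the fourth toy gene has a large fold-change but is dropped because its adjusted value exceeds 5%, which is why both criteria are applied together.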
Table 1. Genes strongly deregulated between the Asian and Caucasian populations
Figure imgf000022_0001
The expression profile of the most significant gene, UTS2, indicates its high classification potential: it allows correct identification of 30 (93.7%) Asian samples and 32 (91.4%) Caucasian samples.
Classification
The classification was carried out using Diagonal Linear Discriminant Analysis (DLDA). The Student's t test was used for the selection of genes. The effectiveness of the obtained classifier was tested both using the LOO cross-validation method (Leave One Out - i.e., in each iteration one sample is a test sample, whereas the remaining samples form the training set), and by using microarray data of an independent set of 10 female lines (5 from each of the Asian and Caucasian populations). In the first step of the classification (based on cross-validation), the optimal size of the classifier was determined to be nine genes (Fig. 1, Tab. 2).
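For illustration, a minimal DLDA classifier with leave-one-out cross-validation might look as follows. This is a sketch under simplifying assumptions (equal class priors, no gene selection inside the loop, invented two-gene data); it is not the software used in the study.

```python
def dlda_fit(X, y):
    """Fit DLDA: per-class means and a pooled per-feature variance."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    # Diagonal covariance: pooled within-class variance of each feature.
    pooled = []
    for j in range(n_feat):
        ss = sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y))
        pooled.append(ss / (len(X) - len(classes)))
    return means, pooled


def dlda_predict(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, pooled = model

    def dist(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], pooled))

    return min(means, key=dist)


def loo_accuracy(X, y):
    """Leave-one-out: each sample is tested on a model trained on the rest."""
    hits = 0
    for i in range(len(X)):
        model = dlda_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += dlda_predict(model, X[i]) == y[i]
    return hits / len(X)


# Toy data (invented): expression of two genes in 4 CEU and 4 CHB lines.
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [1.1, 5.2],
     [3.0, 2.0], [3.2, 2.1], [2.9, 1.8], [3.1, 2.2]]
y = ["CEU"] * 4 + ["CHB"] * 4
accuracy = loo_accuracy(X, y)
print(accuracy)
```

In a full analysis the t-test gene selection would normally be repeated inside each leave-one-out iteration, so that the held-out sample never influences which genes enter the classifier.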
Next, the effectiveness of the nine-gene classifier was evaluated by assessing the following parameters in the cross-validation:
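P (positive) - Asian population; N (negative) - Caucasian population; TP = 30
Accuracy: $\frac{TP + TN}{TP + FP + TN + FN} = 97\%$
Sensitivity: $\frac{TP}{TP + FN} = 94\%$
Specificity: $\frac{TN}{TN + FP} = 100\%$
Positive predictive value: $\frac{TP}{TP + FP} = 100\%$
Negative predictive value: $\frac{TN}{TN + FN} = 95\%$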
The obtained classifier proved highly effective by correctly classifying all 10 samples of the independent set of female lines.
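The reported cross-validation parameters can be reproduced from a confusion matrix, as in the sketch below. Only TP = 30 is stated explicitly; the remaining counts (FN = 2, TN = 35, FP = 0) are inferred here from the 32 Asian and 35 Caucasian male lines and the reported percentages, and should be read as an assumption.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Standard performance measures computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
    }


# Positive = Asian population, negative = Caucasian population.
# TP is given in the text; FP, TN and FN are inferred (assumption).
metrics = classifier_metrics(tp=30, fp=0, tn=35, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts the rounded values agree with the percentages given above, which supports the inferred confusion matrix without proving it.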
Table 2. Classifier genes
Figure imgf000023_0001
Example 2
Ethnic identification of the biological material using the transcriptional markers and QPCR.
Validation of transcripts selected based on the previously performed microarray analyses was carried out using the Real Time PCR (QPCR) technique on a group of 68 RNA samples isolated from blood drawn from 37 Caucasian donors and 31 Asian donors. Out of the list of 189 genes that significantly differentiated Caucasian and Asian population at the established significance threshold of FDR = 5%, five genes with elevated expression in the Asian population and 4 genes with the lowest expression in Asian population as compared with the Caucasian population were selected for further analysis (Tab. 1).
Among the 5 selected genes with elevated expression in the Asian population, the statistically significant differences in expression levels between the examined populations were observed for UTS2 gene (p<0.001; Tab.3). The expression of this gene was 13 times more abundant in Asian population as compared with Caucasian population.
No statistically significant differences were observed between populations for the remaining studied genes (Table 3).
Table 3. Relative values reflecting gene expression in Asian and Caucasian population.
Figure imgf000024_0001
Among the genes displaying the lowest expression in Asian population, only UGT2B17 yielded statistically significant difference (Table 4). The UGT2B17 expression was 6 times lower in Asian population as compared with Caucasian population.
Table 4. Relative values reflecting gene expression in Asian and Caucasian population
Figure imgf000024_0002
Example 3
Ethnic identification of the donor, based on analysis of the biological material using the transcriptional markers and targeted miniarrays (TaqMan Low Density Arrays - TLDA)
Total RNA was isolated from 80 cell lines with the use of RNeasy Mini Kit and RNeasy Plus Mini Kit (Qiagen). RNA isolates corresponding to the cell lines that had not been previously subjected to microarray analysis were selected for TaqMan Array analysis. Nucleic acid concentration was assessed by spectrophotometric analysis, i.e., UV absorbance measurement at 260 nm (A260) using the NanoDrop ND-1000 apparatus. RNA quality was determined by capillary electrophoresis in the Agilent 2100 Bioanalyzer, by applying the RNA 6000 Nano Assay (Agilent Technologies) and the suitable RNA 6000 Nano Marker. RNA integrity was determined by applying the RNA Integrity Number (RIN). The obtained RNA isolates were of good quality, as confirmed by the observed RIN values in the range 8-9.5, and fulfilled the quality criteria for further analyses. Aliquots of RNAs obtained from 72 cell lines were reverse-transcribed into cDNA by using the Enhanced Avian RT First Strand Synthesis Kit (Sigma).
Validation of the selected genes of the classifier obtained in microarray analysis was performed using TLDA Array Cards (Life Technologies). This method is based on estimation of gene expression using 384-well reaction plates with anchored TaqMan MGB probes. The sample expression level is assessed relative to the calibrator and is recorded as a fold-change of expression level in relation to the calibrator (RQ value). A calibrator may be made from any randomly selected sample or a mix of samples. Any potential fluctuations in expression levels, stemming from laboratory errors or unequal quantities of the input material, are normalized against a selected gene characterized by constitutive expression, i.e., expressed continuously, regardless of the tissue type or experimental conditions (a so-called housekeeping gene). It is currently recommended that more than one housekeeping gene be applied within any particular cycle of experiments (Sorby et al. 2010).
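The relation between the calibrator, the housekeeping genes and the RQ value can be illustrated with the widely used 2^-ΔΔCt calculation. The description does not state which formula the analysis software applied, so this method, the assumption of 100% amplification efficiency, the function names and the Ct values below are all illustrative assumptions.

```python
def delta_ct(target_ct, reference_cts):
    """Normalize the target Ct against the mean Ct of the reference genes."""
    reference = sum(reference_cts) / len(reference_cts)
    return target_ct - reference


def relative_quantity(sample, calibrator):
    """RQ = 2^-(dCt_sample - dCt_calibrator), assuming 100% PCR efficiency."""
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** (-ddct)


# Invented Ct values: (target gene Ct, [Ct of three housekeeping genes]).
sample = (24.0, [18.0, 20.0, 19.0])
calibrator = (27.0, [18.5, 20.5, 19.5])

rq = relative_quantity(sample, calibrator)
print(f"RQ = {rq:.2f}")
```

Averaging the Ct values of several reference genes corresponds to taking the geometric mean of their quantities, which is one common way of combining more than one housekeeping gene.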
In the first step of validation, in order to select the reference genes stably expressed in the examined system, a test reaction was carried out using the TLDA card (Life Technologies) containing 16 HKG (housekeeping) genes typically used in gene expression studies. Based on the above experiment, the three most stable genes: GAPDH, IPO8 and PPIA were selected for further analyses. Validation with the use of the 384-well TLDA cards involved 19 genes selected based on the microarray analysis, that is: UTS2, CHI3L2, C1ORF115, PLA2G4C, CDC42EP5, UGT2B7, TBC1D4, MOXD1, UGT2B17, SLC7A7, S1PR4, IFITM3, CYP1B1, SMC6, CD47, C16ORF75, RABEP1, PEX6, HSPC157, for which the probe sets and primers were selected from the Life Technologies database, so as to identify the same transcripts as those detected by the probes used in microarray analysis. Due to the limited capacity of the TLDA cards, the validation of the aforesaid genes, including the reference genes, was carried out using two sets of cards, referred to as Classifier_1 and Classifier_2. Beside the probes specific for the aforesaid genes, one of the cards additionally contained the probes specific for 8 genes which, in the literature, have been reported to have been exposed to selection pressure. Predominantly, the latter belong to the pigmentation genes and they differentiate the analyzed populations in terms of allele frequency.
The measurement of relative gene expression using the TLDA cards was conducted in three repetitions for each of the genes tested. The wells of the 384-well TLDA cards were loaded with 100 µl aliquots of the reaction mix containing cDNA, Gene Expression Master Mix (Life Technologies) and water. Next, the TLDA cards were centrifuged twice at 1200 rpm for 1 min in a Heraeus centrifuge (Thermo Scientific), and sealed with a card sealer to prevent well-to-well contamination. Finally, the TLDA cards containing 8 analyzed samples and microarray classifier genes were processed using the Real-Time PCR 7900 HT system (Life Technologies). In total, the Classifier_1 card set containing 13 out of 19 microarray classifier genes was applied to testing of 33 Caucasian lines and 39 Asian lines. The calibrator for Classifier_1 was made up of 5 Caucasian lines. A second set of TLDA cards - Classifier_2, which contained the remaining 6 out of 19 microarray classifier genes and the pigmentation genes, was applied to testing of isolates from 49 lines (21 Caucasian and 28 Asian) as well as epidermal samples collected from 14 donors representing the Caucasian population and 13 donors representing the Asian population. Two different calibrators were used in Classifier_2 analyses, that is: i) 5 Caucasian lines for cultured cell line samples and ii) an epidermis mixture from two Asian donors for the epidermis samples. Amplification curves were analyzed with RQ Manager software. For each of the genes tested, the threshold cycle (Ct) value was determined, indicative of the reaction proceeding into the logarithmic phase of product increase, and used for determination of the input quantity of product in the reaction mix. In the next step, all samples were subjected to further statistical analysis, wherein the Student's t-test and the Volcano plot were performed with Data Assist software (Life Technologies) (Fig. 2), whereas the Mann-Whitney U test was performed with Statistica v.9.0 software.
On the basis of the obtained results, it was found that 5 genes out of 19 tested, that is: UTS2, UGT2B17, CHI3L2, C1ORF115 and C16ORF75, exhibited statistically significant differences (p < 0.05) in transcription levels between the Caucasian and Asian population. Out of the above genes, the greatest fold-change was observed for UTS2 and UGT2B17. UTS2 gene expression is 33 times more abundant in Asian population than in Caucasian population, whereas UGT2B17 gene expression is 30 times more abundant in Caucasian population, as compared to Asian population. CHI3L2 and C1ORF115 genes are expressed more abundantly in Asian population (3.7 and 2.9 times, respectively), whereas C16ORF75 gene expression is 2.5 times more abundant in Caucasian population.
Example 4
Detection of transcriptional variability of selected markers in biological material using FISH.
The experiments were aimed at determining the optimal conditions for in situ detection of single transcripts as well as suitability of particular genetic markers for ethnic identification of cells using FISH technique.
Based on microarray analysis results and on literature data, a set of genes was selected for further analysis of transcription levels in lymphocyte cell lines derived from donors of Caucasian and Asian origin, with the use of FISH technique. The Stellaris probes (Probe Designer, Biosearch Technologies, USA) were designed for genes with sufficiently long transcripts (>1000 nts), such as: CYP1B1, CHI3L2, DBNDD2, MOXD1, UGT2B17, ROBO1 and SLC2A5. The probe sets consisted of 30-48 terminally, fluorescently labeled oligonucleotides, each 20-30 nts in length (Tab. 5). The Singer probe sets, consisting of five internally labeled probes, each ~50 nts in length, were designed for UTS2 gene, for which the length of the transcript was insufficient for designing the Stellaris probes. For comparison purposes, the Singer probe sets were also designed for CD47 and SMC6 genes.
The probes were synthesized by Biosearch Technologies (USA). Hybridization was performed on lymphocytes from cultured cell lines obtained from Coriell Cell Repositories (USA). The cells were immobilized on poly-lysine coated coverslips, followed by fixation in 4% formaldehyde solution. After cell fixation, the coverslips were stored in 70% ethyl alcohol for 1 to 30 days. Prior to hybridization, the specimens were washed in the wash buffer (2xSSC; 10-50% formamide - depending on the probe used). After 10-15 min the wash buffer was replaced with hybridization buffer (2xSSC; 10% dextran sulphate; 5 nM probe). The microscopic slides with specimens were covered with coverslips in order to prevent evaporation of the buffer and incubated at 30-37°C for a period of few to several hours. After incubation, the samples were washed in the wash buffer (2 x 30-40 min) and then fixed in the final buffer (0.1% glucose; 0.01M Tris pH 8.0; 2xSSC). Next, the slides were covered with coverslips and sealed with a colourless nail varnish. Fig. 3 shows an exemplary post-hybridization image.
Post-hybridization (FISH) microscopic imaging
The microscopic imaging was carried out with the use of the confocal scanning NIKON Ti-E Eclipse A1 microscope (NIKON), equipped with diode lasers (405 nm, 561 nm, 638 nm), an argon ion laser (457 nm/488 nm/514 nm) as well as a visible light source. The cells were imaged using two types of oil immersion objectives: Apo TIRF 100x Oil DIC N2 and Plan Apo VC 60x Oil DIC N2. Imaging was carried out at the resolution of 512x512; at the speed of 1 frame / 24 s; using the linear scanning mode "Integrate". During the three-dimensional (3D) scanning, the "z" layers were acquired at intervals of 0.2 µm. The quantitative analysis of fluorescent signals was performed with the use of the BlobFinder software, allowing for preliminary analysis, i.e., identification and quantification of the fluorescent signals resulting from hybridization, the numbers of which correspond to the numbers of transcript molecules present in the analyzed cells. The BlobFinder allows for identification of single cells, separation of unspecific background from specific fluorescent signals and quantification of signals within any particular cell (Fig. 4).
Table 5. A table summarizing the probes and fluorochromes used in the study
Probe name | Fluorescent dye | Probes (number, length)
GAPDHst | TAMRA | Stellaris: 32, 20 nts
ROBO1st | TAMRA | Stellaris: 35, 30 nts
SLC2A5st | Cal Fluor Orange 560 | Stellaris: 35, 25 nts
UGT2B17st | Cal Fluor Red 610 | Stellaris: 48, 20 nts
UGT2B17st bis | Cal Fluor Red 610 | Stellaris: 30, 30 nts
UTS2Si | Cal Fluor Red 610 | Singer: 5, 38-48 nts
CHI3L2st | Quasar 670 | Stellaris: 48, 25 nts
DBNDD2st | FAM | Stellaris: 40, 20 nts
CYP1B1st | Quasar 670 | Stellaris: 48, 25 nts
MOXD1st1 | Cal Fluor 610 | Stellaris: 48, 25 nts
MOXD1st2 | Cal Fluor 610 | Stellaris: 48, 20 nts
CD47st | Quasar 670 | Stellaris: 36, 25 nts
CD47Si | Quasar 670 | Singer: 5, 49-54 nts
SMC6Si | FAM | Singer: 5, 47-52 nts
Statistical analysis of the expression level determined based on post-hybridization imaging
The transcription level was assessed in 16-18 cell lines for each population investigated. The mean number of transcripts per cell was calculated, based on measurement data collected from at least 10 lymphocytes, whereby the measurements were repeated at least twice (Tab. 6, 7).
Table 6. Descriptive statistics of the results obtained for Caucasian population.
Figure imgf000029_0002
x̄ - mean value; SD - standard deviation
Table 7. Descriptive statistics of the results obtained for Asian population.
Figure imgf000029_0001
SLC2A5st 91.9 80.0 103.8 92.2 50.9 141.0 23.1
MOXD1st1 163.5 141.2 185.8 167.2 127.4 213.4 26.7
x̄ - mean value; SD - standard deviation
The data on expression levels of the marker genes: GAPDH, UGT2B17, ROBO1, SLC2A5, CYP1B1, UTS2, MOXD1, DBNDD2, CHI3L2, obtained from FISH experiments, were statistically analyzed (Tab. 6, 7). The analysis covered the selected marker genes identified as a result of microarray analyses, the GAPDH housekeeping gene typically used in various gene expression studies, as well as the ROBO1 and SLC2A5 differential genes selected based on a review of the literature (Spielman et al. 2007).
A preliminary analysis showed that the data conformed to normal distribution (according to three different tests). Additionally, the analysis of homogeneity of variance was performed using three different tests. In some of the cases (shown in bold in Table 8), the p-value was lower than the applied significance threshold (0.05), which gave grounds for rejection of the null hypothesis assuming homogeneity of variance. Homogeneity of variance is one of the criteria for applying the t test. Hence, in addition to the t test, nonparametric tests, which are not subject to this criterion, were performed.
Table 8. The results of F-test, Levene's test and Brown-Forsythe test:
Figure imgf000030_0001
Based on the statistical tests it was concluded that the GAPDH probe (reference gene with a constitutive high expression level) was not differential in respect of the two populations (Caucasian and Asian), which confirmed the literature data. For the remaining cases, involving both the probes being the subject of the study and those drawn from the literature, the t test and the nonparametric tests yielded statistically significant differences between mean values. The obtained p-values, all below the applied significance threshold (0.05), indicate that random differences between mean values may occur at the following frequencies:
for UTS2Si probe: 6 in 100 000 cases,
for MOXD1St2 probe: 4 in 1 000 000 cases,
for CYP1B1St probe: 8 in 100 000 000 cases,
for DBNDD2St probe: 2 in 10 000 cases,
for CHI3L2St probe: 4 in 100 000 000 cases,
for UGT2B17St bis probe: 1 in 100 000 cases,
for ROBO1St probe: 9 in 1 000 cases,
for SLC2A5St probe: 3 in 1 000 000 cases,
for MOXD1St1 probe: 1 in 1 000 cases.
Selecting the probes (genetic markers) with the highest differentiation potential
Based on the obtained results, the probes were ranked by their differentiation potential as follows:
CHI3L2St > CYP1B1St > SLC2A5St > MOXD1St2 > UGT2B17St bis > UTS2Si > DBNDD2St > MOXD1St1 > ROBO1St.
With regard to the spread between mean values, the probes were ranked from the highest to the lowest spread value (Δ), as follows:
CYP1B1St > MOXD1St2 > DBNDD2St > MOXD1St1 > CHI3L2St > UTS2Si > UGT2B17St bis > SLC2A5St > ROBO1St.
With regard to the differences between confidence interval limits the differentiation potentials of the probes were ranked as follows:
CYP1B1St > MOXD1St2 > CHI3L2St > UGT2B17St bis > MOXD1St1 > DBNDD2St > SLC2A5St > UTS2Si > ROBO1St.
Based on the results of the statistical analysis it was concluded that the single most differentiating probe was CYP1B1St. Other highly differentiating probes were MOXD1St2 and CHI3L2St. The occurrence of a random difference between mean values is less probable for the CHI3L2St probe than for the MOXD1St2 probe. However, the MOXD1St2 probe exhibited a greater spread between mean values. Additionally, for the combinations CHI3L2St vs CYP1B1St (Fig. 5) and MOXD1St2 vs CYP1B1St (Fig. 6), the dispersions of values in the Asian and Caucasian populations were plotted on graphs. A full separation of samples into two groups was observed for the combined CHI3L2 and CYP1B1 probes.
In the next step, a chemometric analysis was performed in order to carry out a comprehensive evaluation of all variables. As a result of the principal component analysis (PCA), the relations between variables (values for particular probes) (Fig. 7) and between samples (lines representing particular populations) (Fig. 8) were visualized.
Based on the projection of events (individual lines - L) on the plane defined by the two principal components (Fig. 8), it was shown that the dataset consisted of two groups of elements (events), which reflected the ethnic origins attributed to them. Moreover, based on the PCA results (presented in Fig. 7), it was found that the results for the probe sets DBNDD2 / UTS2Si and MOXD1St2 / CHI3L2 were highly correlated, which means that measuring both probes of a pair yields partly overlapping information rather than information specific to each probe alone.
In summary, as a result of the study, it was concluded that by measuring the probes specific for selected genetic markers, for example the CYP1B1 probe or the CHI3L2 probe, preferably at least two probes, it was possible to distinguish the Caucasian population from the Asian population. Additionally, it was shown that the marker genes selected as a result of the microarray analyses being a part of the current study performed better in terms of population differentiation than those selected based on literature data (ROBO1, SLC2A5, UGT2B17). The analysis showed that, in accordance with expectations, the GAPDH reference gene did not differentiate populations.
Example 5
Separation of cells based on characteristic transcriptional patterns using laser microdissection or FACS sorting followed by genetic profiling of separated cells
Example 5A
Separation of mixture of two fluorescently labeled lymphocyte lines representing Caucasian and Asian population using DBNDD2 marker and FACS sorting
This example aimed at demonstrating the suitability of one of the transcription markers - DBNDD2 - for separation of cell mixtures composed of donor cells of different ethnic origin, with the use of FACS sorting.
The cell lines were cultured according to standard methods. Aliquots of approx. 10 ml of dense cell suspension from each cell line were centrifuged, the growth medium was discarded and the cell pellet was washed in 1xPBS. Next, the cells were fixed in 4% formaldehyde in 1xPBS for 30 minutes. After fixation, the cells were washed twice in 1xPBS and stored in 70% ethyl alcohol, for 1 up to 30 days, at +4°C. Prior to hybridization, the cells were washed twice in the wash buffer (2xSSC; 15% formamide) and then incubated in 100 µl hybridization buffer (containing the DBNDD2 probe at a final concentration of 5 nM). Hybridization was carried out for 16-18 hrs, at 37°C, in darkness. After the hybridization period, the unattached probes were removed by washing in the wash buffer (2 x 30 min, at 37°C). After the final wash, the cells were suspended in 1xPBS.
The samples prepared in this way were sorted using a BD FACSCalibur flow cytometer, equipped with 488 nm and 645 nm lasers. In the first step, the fluorescence profiles of the tested lines were determined (Fig. 9A, B). For this purpose, 10000 cells of each line were used. Additional experiments showed that it was feasible to limit the number of cells required for obtaining the DNA profile to 1000. Next, the fluorescently labeled cells were mixed at a 1:1 ratio and the fluorescence profile of the mixture was determined (Fig. 9C). The sort gates (P) were set based on the fluorescence profiles of single and mixed cell lines. The sort gates were set to cover the entire fluorescence intensity spectrum to enable separation of cells displaying the lowest and highest specific fluorescence. The gate sizes (percentage of cell population displaying the lowest and highest fluorescence intensity) were determined experimentally and they varied depending on the probe used. In the experiment described herein, the 3%, 5% and 10% gating was applied (to both ends of the fluorescence spectrum). The 5% gating allowed for unequivocal separation of populations.
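The idea of gating a fixed percentage of events at both ends of the fluorescence spectrum can be sketched as below. This is a simplified, hypothetical illustration of percentile-based gating on a one-dimensional intensity list; it does not reproduce the cytometer software, and the function names and intensity values are invented.

```python
def sort_gates(intensities, fraction=0.05):
    """Return (low_cut, high_cut) enclosing the dimmest and brightest events."""
    ordered = sorted(intensities)
    k = max(1, round(len(ordered) * fraction))
    low_cut = ordered[k - 1]   # upper bound of the low-fluorescence gate
    high_cut = ordered[-k]     # lower bound of the high-fluorescence gate
    return low_cut, high_cut


def assign_gate(intensity, low_cut, high_cut):
    """Classify a single event as low, high, or outside both gates."""
    if intensity <= low_cut:
        return "low"
    if intensity >= high_cut:
        return "high"
    return "unsorted"


# Toy data (invented): 100 events with fluorescence intensities 1..100.
events = list(range(1, 101))
low_cut, high_cut = sort_gates(events, fraction=0.05)
labels = [assign_gate(e, low_cut, high_cut) for e in events]
print(low_cut, high_cut, labels.count("low"), labels.count("high"))
```

A narrower gate trades cell yield for purity, which is consistent with the gate size being determined experimentally for each probe.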
As a result of the sorting, 500 events (ca. 250 cells) were separated from the mixture within each of the gates. The sorted cells were subjected to genetic identification in order to confirm a successful separation of populations. For identification purposes, a standard method of polymorphism analysis of 17 STR loci was applied using NGM SElect forensic identification kit (Applied Biosystems).
The genetic profiles obtained by STR analysis method for P9 and P13 cell populations confirmed unequivocally that each population contained solely the cells of a single donor and that the contamination was marginal (Fig. 10A, B).
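The comparison of an STR profile of a sorted fraction with the reference profiles of the donors can be pictured with the following minimal sketch. It is an assumption-laden illustration only: the allele values are invented, only two loci are shown, and the rule used (every allele observed in the fraction must be explained by one donor, with allele drop-out tolerated) is a simplification chosen for the sketch rather than the evaluation procedure of the kit.

```python
def consistent_donors(fraction, references):
    """Return the donors whose reference profile explains every allele
    observed in the fraction at every reference locus.

    fraction:   dict locus -> set of alleles observed in the sorted fraction
    references: dict donor -> dict locus -> set of reference alleles
    A locus with no observed alleles counts as not explained.
    """
    hits = []
    for donor, profile in references.items():
        if all(fraction.get(locus) and fraction[locus] <= alleles
               for locus, alleles in profile.items()):
            hits.append(donor)
    return sorted(hits)


# Invented reference profiles for two hypothetical donors.
references = {
    "donor_A": {"TH01": {6, 9.3}, "vWA": {16, 18}},
    "donor_B": {"TH01": {7, 9.3}, "vWA": {17}},
}

pure_fraction = {"TH01": {6, 9.3}, "vWA": {16, 18}}
mixed_fraction = {"TH01": {6, 7, 9.3}, "vWA": {16, 17, 18}}

pure_result = consistent_donors(pure_fraction, references)    # one donor
mixed_result = consistent_donors(mixed_fraction, references)  # no single donor
```

A fraction would be regarded as a pure component when exactly one donor is returned; an empty result indicates alleles that no single donor accounts for.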
Furthermore, successful separations of Caucasian and Asian populations were also obtained for lymphocyte cell mixtures labeled with single probes (UTS2, MOXD1, CYP1B1St, CHI3L2St) and with combinations of sets of two different probes.
Example 5B
Separation of mixture of two forensic traces (saliva, vaginal swabs and epidermis with MUC7, STATH, LCE1C markers) using FACS sorter
The aim of this example was to demonstrate the possibility of separation and identification of cells within the mixture deposited as a forensic trace, with the use of methods disclosed in the invention, based on the detection of differential transcription levels of the marker genes and on sorting of the labeled cells in the FACS sorter.
Setting up and hybridization of cell mixtures
The mixture of vaginal epithelial cells and buccal epithelial cells (100 μl each) was deposited on canvas fabric, which was left to dry for 7 days at room temperature. Then, the fragment of fabric containing biological material was excised and incubated in 1xPBS for 10 min at 37°C in order to rinse off the cells. Next, the incubate was centrifuged at 900 rev./min, and the cell pellet was resuspended in 200 μl 1xPBS. In the case of mixtures composed of buccal epithelial cells and epidermis, the cells were collected by swabbing the inside of the cheek and the palm/forearm, respectively. Following incubation of the swabs and spinning down the cells (as above), the pellets were resuspended and the suspensions combined. The prepared cell mixtures were fixed in 4% formaldehyde in 1xPBS for approx. 30 min, washed twice in 1xPBS and dehydrated in 70% ethyl alcohol. Immediately prior to hybridization, the cells were washed twice in the wash buffer (2xSSC; 15% formamide) and then incubated in 100 μl hybridization buffer containing the appropriate probes. Hybridization was carried out for 16-18 hrs at 37°C, in darkness. After the hybridization period, the unattached probes were removed by washing in the wash buffer (2 x 30 min, at 37°C). After the final wash, the cells were suspended in 1xPBS and the prepared samples were used in sorting experiments. Hybridizations were carried out in the following combinations:
-vaginal epithelium + buccal epithelium - STATH probe (Fig. 11)
-vaginal epithelium + buccal epithelium - MUC7 probe (Fig. 12)
-buccal epithelium + epidermis - LCE1C and STATH probes (Fig. 13)
Sorting of tissue mixtures and genetic profiling
As a result of sorting experiment on buccal epithelium/vaginal epithelium tissue mixtures, hybridized with STATH and MUC7 probes, the fractions of 100 cells each were obtained, corresponding to the defined values (sizes) of the sort gates (P). Cell fractions were subjected to DNA extraction by applying the standard column method, followed by genetic profiling of DNA isolates with the use of NGM SElect Kit. For both the STATH probe and MUC7 probe hybridization experiments, the separation yielded pure mixture components in fractions P4 and P5 (Fig. 11, 12).
For hybridization of epidermis/buccal epithelium mixture, a duplex probe approach was applied, in which two probes were used, each specific for a different component of the mixture (LCE1C for epidermis; STATH for buccal epithelium). Next, two independent sorting experiments were carried out in the FAM fluorescence channel (corresponding to the label on LCE1C probe) and CFR610 fluorescence channel (corresponding to the label on STATH probe), respectively. From both sorts, only the P4 fraction was collected, i.e., corresponding to the sort gate delimiting the cells exhibiting the strongest fluorescence signal. As a result, a successful separation of mixture into both components was observed, i.e., the P4 fraction collected in the FAM channel contained pure epidermis cells profile, whereas the P4 fraction collected in the Texas Red channel (equivalent to CFR610) contained pure buccal epithelium cells profile (Fig. 13).
It was shown that sorting of morphologically identical tissues after FISH hybridization with tissue-specific probes, with the use of the FACS sorter, is a fast and efficient method of separating cell mixtures into pure components. While applying this method, it is important to bear in mind that the pure mixture components are delimited by specific fluorescence intensity intervals, defined by the settings of the sort gates. The fluorescence intensity intervals need to be adjusted individually for each of the tested probes. Physical separation of particular intensity intervals guarantees obtaining pure mixture components.
Example 5C
Separation of mixture of human tissues from a forensic trace (epidermis/buccal epithelium - LCE1C/MUC7bis markers) using laser microdissection
The aim of this example was to demonstrate the possibility of separation and identification of cells within the mixture deposited as a forensic trace, with the use of methods disclosed in the invention, based on the detection of differential transcription levels of the marker genes and on laser microdissection-assisted separation of labeled cells from the mixture. The object of the study was a simulated forensic trace set up as if the perpetrator had bitten the inside of the victim's forearm. Biological material anticipated in such a stain is a mixture composed of the victim's epidermis (SK) and the perpetrator's buccal epithelial cells (BS). A sample was collected by swabbing with a sterile cotton swab moistened with F water (Ambion). A wad of cotton containing biological material was incubated in 1xPBS for 10 min at 37°C in order to rinse off the cells. Next, the incubate was centrifuged at 900 rev/min. The obtained cell suspension was deposited onto the MMI membrane by pipetting. To provide controls of the hybridization reaction, apart from the cell suspension from the trace, the epidermis and buccal epithelial cell suspensions were deposited onto separate areas of the membrane (Fig. 14). The hybridization was carried out with the use of two probes: LCE1C, specific for epidermal mRNA and labeled with the Q560 fluorochrome, and MUC7, specific for buccal epithelium and labeled with the FAM fluorochrome. The specimens were imaged by fluorescence microscopy at the excitation wavelengths specific for the fluorochromes used for labeling the FISH probes, and the fluorescence signal intensities were assessed. The intensity assessment was performed in NIS Elements software, using the "Mean Intensity" parameter for calculation of mean signal intensity, expressed in absolute units, within a specified Region Of Interest (ROI) corresponding approximately to the area of the imaged cell.
Next, the cells were classified into individual tissues (SK or BS) by assuming, for both probes, a fluorescence intensity threshold of 1100 units. This meant that, to be classified into either tissue, a cell had to exhibit a fluorescence intensity greater than 1100 units at the emission spectrum characteristic for either the Q560 or the FAM fluorochrome. Out of the classified cells (i.e., those with fluorescence intensity above the 1100 threshold), two groups of 10 cells each, corresponding to the two fluorochromes, were isolated from the specimen by laser microdissection. In order to confirm separation accuracy, both groups were subjected to DNA profiling with the use of the NGM SElect kit (Fig. 15).
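The threshold-based classification described above may be sketched as follows. This is an illustrative sketch only: the cell records are hypothetical, and the handling of a cell exceeding the threshold in both channels (treated here as unclassified) is an assumption, since the text does not address that case.

```python
THRESHOLD = 1100  # intensity threshold (absolute units) stated in the text


def classify_cell(q560_intensity, fam_intensity, threshold=THRESHOLD):
    """Assign a cell to SK (LCE1C probe, Q560 label) or BS (MUC7 probe,
    FAM label) when exactly one channel exceeds the threshold."""
    above_q560 = q560_intensity > threshold
    above_fam = fam_intensity > threshold
    if above_q560 and not above_fam:
        return "SK"
    if above_fam and not above_q560:
        return "BS"
    # Below threshold in both channels, or above in both (assumed ambiguous).
    return None


def pick_cells(cells, per_tissue=10):
    """Collect up to per_tissue cell identifiers for each tissue.

    cells: iterable of (cell_id, q560_intensity, fam_intensity) tuples.
    """
    picked = {"SK": [], "BS": []}
    for cell_id, q560, fam in cells:
        tissue = classify_cell(q560, fam)
        if tissue is not None and len(picked[tissue]) < per_tissue:
            picked[tissue].append(cell_id)
    return picked


# Hypothetical measurements: (cell id, Q560 intensity, FAM intensity).
measured = [
    ("c1", 1500, 300),
    ("c2", 200, 1101),
    ("c3", 1100, 1100),
    ("c4", 2000, 2000),
    ("c5", 1800, 90),
]
groups = pick_cells(measured, per_tissue=10)
```

The identifiers collected for each tissue would then correspond to the cells to be excised by laser microdissection.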
The experiment demonstrated that by using tissue-specific mRNA markers in combination with laser microdissection technique it was possible to separate morphologically indistinguishable cell mixtures and to obtain pure genetic profiles of the mixture components. High sensitivity and specificity of this method allow for a range of potential applications in the field of forensic analyses.
Example 5D
Separation of mixture of two fluorescently labeled B lymphocyte cell lines - Caucasian and Asian (CHI3L2 + CYP1B1 markers) with the use of laser microdissection
This example demonstrates the possibility of separating the mixture of cells originated from two donors representing different ethnic populations by using fluorescent microscopy imaging in order to detect differential transcription of marker genes at the single cell level, followed by physical segregation of cells using laser microdissection.
Setting up lymphocyte mixtures
Lymphocytes were obtained directly from suspended cultured cell lines. Following a cell count using the Fuchs-Rosenthal counting chamber, the suspensions were brought to the desired densities by adding 1xPBS buffer. Cell mixtures at a 1:1 ratio intended for laser microdissection were deposited onto the MMI membrane in 10 μl aliquots.
Hybridization (FISH) in cell mixtures:
A mixture of a Caucasian line (L32) and an Asian line (L33), prepared as described above, was deposited onto the MMI membrane. Pure cell lines L32 and L33 were used as positive controls of the hybridization reaction, whereas the membrane area without any biological material served as the negative control. Hybridization of samples intended for microscopic imaging and laser microdissection was performed directly on membranes in a two-day cycle. On the first day, the cell line mixtures and the control cell lines were prepared, deposited onto the membranes and incubated for 3 hrs at 4°C in order to allow for adhesion of cells to the surface. Next, the cells were fixed in 4% formaldehyde in 1xPBS for approx. 30 min, washed twice in 1xPBS and dehydrated in 70% ethyl alcohol. Immediately prior to hybridization, the cells were washed twice in the wash buffer (2xSSC; 15% formamide) and then incubated in 100 μl hybridization buffer (containing CHI3L2 and CYP1B1 probes). Hybridization was carried out for 16 hrs at 37°C, in darkness. On the next day, the unattached probes were removed by washing in the wash buffer (2 x 30 min, at 37°C). After hybridization, the preparations were subjected to nuclear staining with DAPI, followed by washing in 1xPBS. Stained preparations were dehydrated in a series of decreasing concentrations of ethyl alcohol in order to prevent cell degradation and to reduce crystallization of PBS buffer salts.
Microscopy imaging of cell mixtures after FISH hybridization
The microscopic imaging was carried out with the use of a confocal scanning NIKON TiE Eclipse A1 microscope (NIKON), equipped with diode lasers (405 nm, 561 nm, 638 nm), an argon ion laser (457 nm/488 nm/514 nm) as well as a visible light source.
Measurement of posthybridization signal intensities
After hybridization, the specimens were subjected to fluorescence microscopy imaging at the excitation wavelength specific for the Quasar 670 fluorochrome used for labeling the CHI3L2 and CYP1B1 probes. The intensity assessment was performed in NIS Elements software, using the "Mean Intensity" parameter for calculation of mean signal intensity, expressed in absolute units, within a specified Region Of Interest (ROI) corresponding approximately to the area of the imaged cell.
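For orientation only, the notion of a mean signal intensity within a region of interest can be sketched as below. This is a generic calculation under stated assumptions (a circular ROI, an image given as a plain grid of pixel values, hypothetical function name) and is not a reproduction of the algorithm used by the imaging software.

```python
def mean_intensity(image, center_row, center_col, radius):
    """Average the pixel values lying inside a circular ROI.

    image: list of rows, each row a list of numeric pixel intensities.
    A pixel belongs to the ROI when its distance from the center does not
    exceed radius (all values in pixel units).
    """
    total = 0
    count = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                total += value
                count += 1
    if count == 0:
        raise ValueError("ROI contains no pixels")
    return total / count


# Hypothetical 3 x 3 image with one bright central pixel.
image = [
    [0, 10, 0],
    [10, 50, 10],
    [0, 10, 0],
]
roi_mean = mean_intensity(image, center_row=1, center_col=1, radius=1)
```

With a radius of one pixel the ROI covers the central pixel and its four direct neighbours, so the bright center is averaged with the dimmer surroundings.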
Determining the threshold values for posthybridization signal intensities
Within analyzed cell lines, a characteristic fluorescence intensity gradient was observed, reflecting, in addition to population-specific transcription signatures, also various biological side effects, e.g., different phases of the cellular life cycle, levels of vitality, hybridization effectiveness, transcriptional bursting, etc. Based on earlier observations, the fluorescence intensity thresholds were experimentally set in order to separate putative single components of the mixture, i.e., cells belonging to either L32 or L33 line. The adopted intensity gradient thresholds are schematically presented in Fig. 17.
Classification of cells into signal intensity intervals
Based on microscopic images of posthybridization preparations of cell mixtures (Fig. 16), on measurements of mean intensity and on the adopted threshold values, the cells were classified into one of four fractions (Fig. 17A):
- fraction of 10 cells with mean signal intensity ranging between 0 and 500 units
- fraction of 10 cells with mean signal intensity ranging between 500 and 1100 units
- fraction of 10 cells with mean signal intensity ranging between 1100 and 2000 units
- fraction of 10 cells with mean signal intensity above 2000 units
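The assignment of cells to the four intensity intervals listed above can be sketched as follows. The sketch assumes that a value lying exactly on a boundary belongs to the lower interval (the text does not specify this), and the cell identifiers and intensities are hypothetical.

```python
from bisect import bisect_left

THRESHOLDS = (500, 1100, 2000)  # interval boundaries given in the text
LABELS = ("0-500", "500-1100", "1100-2000", ">2000")


def intensity_interval(mean_intensity):
    """Return the label of the interval containing mean_intensity.

    Boundary values are assigned to the lower interval (an assumption).
    """
    if mean_intensity < 0:
        raise ValueError("intensity must be non-negative")
    return LABELS[bisect_left(THRESHOLDS, mean_intensity)]


def build_fractions(cells, size=10):
    """Group (cell_id, mean_intensity) pairs into fractions of at most
    `size` cells per interval, in the order the cells are given."""
    fractions = {label: [] for label in LABELS}
    for cell_id, intensity in cells:
        bucket = fractions[intensity_interval(intensity)]
        if len(bucket) < size:
            bucket.append(cell_id)
    return fractions


# Hypothetical measurements.
cells = [("a", 120), ("b", 640), ("c", 1500), ("d", 2600), ("e", 480)]
fractions = build_fractions(cells)
```

Each resulting fraction would correspond to one group of cells to be collected separately by laser microdissection.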
Laser microdissection of classified cell fractions
Separation of the 10-cell fractions from the membrane was carried out by laser microdissection, using the MMI CellCut Plus® microdissection system (Molecular Machines & Industries AG) coupled with a NIKON A1R confocal microscope. The isolated cells were lifted off the membrane by using the MMI CapLift automatic cell collection system, based on adhesion of cells to the isolation caps (MMI IsolationCap), i.e., adhesive lids of standard reaction tubes, which can be used in further analysis.
Genetic profiling of separated cell fractions
Separated cell fractions were subjected to DNA extraction with the use of PicoPure™ DNA Extraction Kit. Next, separation accuracy was confirmed by DNA profiling within 17 STR loci using NGM SElect Kit (Applied Biosystems). Separation was considered successful wherever a pure genetic profile was obtained within all analyzed DNA loci (Fig. 17 B).
It was shown that by applying FISH hybridization towards identification of the selected population-specific mRNA markers, followed by laser microdissection, it was possible to separate a morphologically indistinguishable B lymphocyte mixture composed of cells of the donors representing Caucasian and Asian population. Furthermore, it was possible to obtain pure genetic profiles of both donors. Until now, there were no practical methods of separation of cell mixtures with regard to ethnic origin of the donors, and therefore the above technique should be considered an innovation.
While applying the method of mixture separation based on posthybridization signal intensity, it is important to bear in mind that the pure mixture components are delimited by specific fluorescence intensity intervals, defined by intensity thresholds. The fluorescence intensity intervals need to be adjusted individually for each of the tested probes. Separation of particular intensity intervals guarantees obtaining pure mixture components.
Table 9. A table summarizing the genetic markers, microarray probe IDs and microarray probe sequences.
Gene name   Microarray probe ID   SEQ ID NO   Microarray probe sequence
CYP1B1 2120053 27 GCTTTCATGTCCCAGAACTTAGCCTTTACCTGTGAAGTGTTACTACAGCC
CHI3L2 4830202 28 GTGAGCACACCCACATTTTCACTGCCATTATCTGGGACAGCAGAACCAGG
MOXD1 3310520 29 CCGTGTGAGCCAGTCCAGGAGGGTGTAAGTTCTGAATGGTTCCTTGCTGA
DBNDD2 6420168 30 AGGTGGTACTCAAGCCATGCTGCCTCCTTACATCCTTTTTGGAACAGAGC
UGT2B17 6290189 31 TCCAAAAGTTACCCCACACAAAAGTTACTGAGCTTCCTTATGTTTCACAC
UTS2 6290228 32 AGCTTCCCTTCTACAGATACTGCCAGAGATGCTGGGTGCAGAAAGAGGGG
CD47 6270286 33 CAGGTAACTTTGAAGAGATGAGCAGTGAGTGACCAGGCAGTTTTTCTGCC
SMC6 3460091 34 GTTGTGCTCTTTCATACAGAACGGGAAACATAATCCTCAGGTATCCCAGC
PLA2G4C 1990672 35 TGCCTTCCACTGCTCCTTTATGACTGCACTTCTAGCCAGTAGCTCTGCAC
C16ORF75 4540072 36 CGGCCCCTGAAAGACAACAGCTCCCTTTCTGCTTCGGACACCACTCAAAC
C1ORF115 770564 37 GGGCTGCCCTGGGTTTCTCTTACTCAATCCCTGGAGTGTAAGCATTTGGA
SLC7A7 4830632 38 CCTGATGTGGAAAGCAGGGGTTTCTGGTCTACTGGCTAGAGCTAAGGAAG
PEX6 3440086 39 TCCAGGAGATCCCAGGGTGCAAAGTGGCATTGAGACAGCAGCAACAGCTC
RABEP1 240110 40 AGCTCAGTTGGGTTTCACGAGTGTTCCTGTGCTTATATTCAGTCTGTGCC
S1PR4 3370075 41 AGCGTGCGGAGCATCTGAAGTTGCAGTCTTGCGTGTGGATGGTGCAGCCA
SNHG8 2060181 42 CGAGAACCGTCAGTTTGAGCCAGATGGAAGCTGAGCTGAACACATTACGA
TBC1D4 7650669 43 GATGAACAGATGACAAACATCTGAAACCCCCTCCGCACTGTTACCCAGTG
UGT2B7 5420450 44 GAGCTAAACACCTTCGGGTTGCAGCCCACGACCTCACCTGGTTCCAGTAC
GPR56 5490768 45 GTAGATTGCTGGCCTGTTGTAGGTGGTAGGGACACAGATGACCGACCTGG
HS.137971 6020692 46 GTTTGGGGTGTCAATTTTCTATCAACGCAAACACCTTAGGGTACCCAGAC
IFITM3 6650242 47 GATCTTCCAGGCCTATGGATAGATCAGGAGGCATCACTGAGGCCAGGAGC
LOC644936 3850168 48 CATGGGATGCATTGTTACAGGAAGTCCCTTGCCCTCCTAAAAGCCACCTC
LOC729708 7400193 49 GTGAACCACCCACGTGAGGGAATAAACCTGGCACTAGGTCTTGTGAAAAA
CDC42EP5 3370730 50 CTTCCCGCCCGGCACCCCACTTCTGTATACATAAACGGCCAAGGTGTGTG
GAPDHL6 5270541 51 TCGACTTCAACAGCAACACCCACTCTTCTACCTTCGATGCTGGGGCAGCC
HSPC157 3460022 52 GCCTGGAATCAGAATCACTGGCTAAGTCTTGCTGCTTGCTTCTTGCTCAG
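As an illustrative cross-check of Table 9 against the sequence listing, the position of a microarray probe within a transcript can be located by a simple string search. The sketch below uses the CYP1B1 probe from Table 9 and the row of SEQ ID NO 1 that ends at position 4800 in the listing; the function name and the 1-based, inclusive coordinate convention are assumptions of the sketch.

```python
# CYP1B1 microarray probe sequence as given in Table 9.
CYP1B1_PROBE = "GCTTTCATGTCCCAGAACTTAGCCTTTACCTGTGAAGTGTTACTACAGCC"

# Row of SEQ ID NO 1 covering positions 4741-4800 of the sequence listing.
FRAGMENT_START = 4741
FRAGMENT = "tttctaaaag ctttcatgtc ccagaactta gcctttacct gtgaagtgtt actacagcct"


def locate_probe(probe, fragment, fragment_start=1):
    """Return the 1-based (start, end) position of probe in fragment,
    or None if the probe does not occur.

    Whitespace in the fragment is ignored and the comparison is
    case-insensitive; fragment_start is the listing position of the
    first residue of the fragment.
    """
    sequence = "".join(fragment.split()).upper()
    offset = sequence.find(probe.upper())
    if offset < 0:
        return None
    start = fragment_start + offset
    return start, start + len(probe) - 1


position = locate_probe(CYP1B1_PROBE, FRAGMENT, FRAGMENT_START)
```

Under the numbering printed in the listing, the search places the 50-nucleotide probe at positions 4750 to 4799 of SEQ ID NO 1.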
Table 10. A table summarizing genetic markers, according to the invention, applicable for distinguishing between Caucasian and Asian population.
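[Table 10, initial rows: reproduced as images in the original publication (imgf000040_0001, imgf000041_0001); only the final rows are available as text.]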
SEQ ID NO 25   GAPDHL6   Homo sapiens glyceraldehyde-3-phosphate dehydrogenase-like 6 (GAPDHL6), mRNA   XM_001726954.1   847-897
SEQ ID NO 26   HSPC157   Homo sapiens long intergenic non-protein coding RNA 339 (LINC00339), transcript variant 2, non-coding RNA   NR_023919.1   539-589
LITERATURE
The literature listed below is incorporated herein by reference in its entirety:
Chao Tian, et al. European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing among Diverse European Ethnic Groups. Mol Med. 2009 November; 15(11-12): 371-383.
Juusola J, Ballantyne J. Messenger RNA profiling: a prototype method to supplant conventional methods for body fluid identification. 2003, Forensic Sci Int 135:85-96
Juusola J., Ballantyne J., Multiplex mRNA profiling for the identification of body fluids. Forensic Science International. 2005, 152, 1-12
Juusola J., Ballantyne J., mRNA profiling for body fluid identification by multiplex quantitative RT-PCR, Journal of Forensic Sciences. 2007, 52, 1252-1262
Hanson E. K., Lubenow H., Ballantyne J., Identification of forensically relevant body fluids using a panel of differentially expressed microRNAs, Analytical Biochemistry. 2009, 387, 303-314.
Haas C, Hanson E., Kratzer A. [et al.], Selection of highly specific and sensitive mRNA biomarkers for the identification of blood, Forensic Science International: Genetics. 2011, 5, 449-458.
Nussbaumer C., Gharehbaghi-Schnell E., Korschineck I., Messenger RNA profiling: A novel method for body fluid identification by Real-Time PCR, Forensic Science International. 2006, 157, 181-186.
Richard ML, Harper KA, Craig RL, Onorato AJ, Robertson JM, Donfack J. Evaluation of mRNA marker specificity for the identification of five human body fluids by capillary electrophoresis. Forensic Sci Int Genet. 2012 Jul;6(4):452-60
Raj A., van den Bogaard P., Rifkin S., van Oudenaarden A., Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 2008 5(10) pp. 877-9
Femino A., Fay F., Fogarty K., Singer R. 1998. Visualization of single RNA transcripts in situ. Science, 280(5363) pp. 585-90
Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. 2007. Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 2007 Feb;39(2):226-31
Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, Akey JM. 2007. Gene-expression variation within and among human populations. Am J Hum Genet. 2007 Mar;80(3):502-9.
The sequence listing below is consistent with the descriptions and data presented in Tables 9 and 10.
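Each row of the sequence listing ends with a running residue count, which allows a simple consistency check when the rows are joined into a continuous sequence. The following sketch illustrates such a check on the first two rows of SEQ ID NO 2; the function name and the error handling are assumptions of the sketch, and it is not part of any official listing software.

```python
def parse_listing(text):
    """Join sequence-listing rows into one lowercase string, verifying the
    running residue count printed at the end of each row.

    Alphabetic tokens are treated as residues; numeric tokens must equal
    the number of residues read so far.
    """
    residues = []
    length = 0
    for token in text.split():
        if token.isdigit():
            if int(token) != length:
                raise ValueError(
                    f"row ends at residue {length}, listing says {token}")
        elif token.isalpha():
            residues.append(token.lower())
            length += len(token)
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return "".join(residues)


# First two rows of SEQ ID NO 2 as printed in the sequence listing.
SEQ2_ROWS = (
    "cacagcagct gtggctgggg agcccagatg aagtgtggct ctatcttgta tgtgagcaca 60 "
    "cccacatttt cactgccatt atctgggaca gcagaaccag gtttggctca acagatttct 120"
)
sequence = parse_listing(SEQ2_ROWS)
```

Applied to a complete entry, the length of the joined sequence could additionally be compared with the value given in the <211> field.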
SEQUENCE LISTING
<110> IMDIK
<120> Methods of identification of ethnic origin based on differentiated transcription profiles and genetic markers used in those methods
<130> PZ/2067/AGR/PCT
<160> 26
<170> PatentIn version 3.5
<210> 1
<211> 5160
<212> DNA
<213> Homo sapiens
<400> 1
aaaacccgga ggagcgggat ggcgcgcttt gactctggag tgggagtggg agcgagcgct 60 tctgcgactc cagttgtgag agccgcaagg gcatgggaat tgacgccact caccgacccc 120 cagtctcaat ctcaacgctg tgaggaaacc tcgactttgc caggtcccca agggcagcgg 180 ggctcggcga gcgaggcacc cttctccgtc cccatcccaa tccaagcgct cctggcactg 240 acgacgccaa gagactcgag tgggagttaa agcttccagt gagggcagca ggtgtccagg 300 ccgggcctgc gggttcctgt tgacgtcttg ccctaggcaa aggtcccagt tccttctcgg 360 agccggctgt cccgcgccac tggaaaccgc acctccccgc agcatgggca ccagcctcag 420 cccgaacgac ccttggccgc taaacccgct gtccatccag cagaccacgc tcctgctact 480 cctgtcggtg ctggccactg tgcatgtggg ccagcggctg ctgaggcaac ggaggcggca 540 gctccggtcc gcgcccccgg gcccgtttgc gtggccactg atcggaaacg cggcggcggt 600 gggccaggcg gctcacctct cgttcgctcg cctggcgcgg cgctacggcg acgttttcca 660 gatccgcctg ggcagctgcc ccatagtggt gctgaatggc gagcgcgcca tccaccaggc 720 cctggtgcag cagggctcgg ccttcgccga ccggccggcc ttcgcctcct tccgtgtggt 780 gtccggcggc cgcagcatgg ctttcggcca ctactcggag cactggaagg tgcagcggcg 840 cgcagcccac agcatgatgc gcaacttctt cacgcgccag ccgcgcagcc gccaagtcct 900 cgagggccac gtgctgagcg aggcgcgcga gctggtggcg ctgctggtgc gcggcagcgc 960 ggacggcgcc ttcctcgacc cgaggccgct gaccgtcgtg gccgtggcca acgtcatgag 1020 tgccgtgtgt ttcggctgcc gctacagcca cgacgacccc gagttccgtg agctgctcag 1080 ccacaacgaa gagttcgggc gcacggtggg cgcgggcagc ctggtggacg tgatgccctg 1140 gctgcagtac ttccccaacc cggtgcgcac cgttttccgc gaattcgagc agctcaaccg 1200 caacttcagc aacttcatcc tggacaagtt cttgaggcac tgcgaaagcc ttcggcccgg 1260 ggccgccccc cgcgacatga tggacgcctt tatcctctct gcggaaaaga aggcggccgg 1320 ggactcgcac ggtggtggcg cgcggctgga tttggagaac gtaccggcca ctatcactga 1380 catcttcggc gccagccagg acaccctgtc caccgcgctg cagtggctgc tcctcctctt 1440 caccaggtat cctgatgtgc agactcgagt gcaggcagaa ttggatcagg tcgtggggag 1500 ggaccgtctg ccttgtatgg gtgaccagcc caacctgccc tatgtcctgg ccttccttta 1560 tgaagccatg cgcttctcca gctttgtgcc tgtcactatt cctcatgcca ccactgccaa 1620 cacctctgtc ttgggctacc acattcccaa ggacactgtg gtttttgtca accagtggtc 1680 tgtgaatcat gacccactga 
agtggcctaa cccggagaac tttgatccag ctcgattctt 1740 ggacaaggat ggcctcatca acaaggacct gaccagcaga gtgatgattt tttcagtggg 1800 caaaaggcgg tgcattggcg aagaactttc taagatgcag ctttttctct tcatctccat 1860 cctggctcac cagtgcgatt tcagggccaa cccaaatgag cctgcgaaaa tgaatttcag 1920 ttatggtcta accattaaac ccaagtcatt taaagtcaat gtcactctca gagagtccat 1980 ggagctcctt gatagtgctg tccaaaattt acaagccaag gaaacttgcc aataagaagc 2040 aagaggcaag ctgaaatttt agaaatattc acatcttcgg agatgaggag taaaattcag 2100 tttttttcca gttcctcttt tgtgctgctt ctcaattagc gtttaaggtg agcataaatc 2160 aactgtccat caggtgaggt gtgctccata cccagcggtt cttcatgagt agtgggctat 2220 gcaggagctt ctgggagatt tttttgagtc aaagacttaa agggcccaat gaattattat 2280 atacatactg catcttggtt atttctgaag gtagcattct ttggagttaa aatgcacata 2340 tagacacata cacccaaaca cttacaccaa actactgaat gaagcagtat tttggtaacc 2400 aggccatttt tggtgggaat ccaagattgg tctcccatat gcagaaatag acaaaaagta 2460 tattaaacaa agtttcagag tatattgttg aagagacaga gacaagtaat ttcagtgtaa 2520 agtgtgtgat tgaaggtgat aagggaaaag ataaagacca gaaattccct tttcaccttt 2580 tcaggaaaat aacttagact ctagtattta tgggtggatt tatccttttg ccttctggta 2640 tacttcctta cttttaagga taaatcataa agtcagttgc tcaaaaagaa atcaatagtt 2700 gaattagtga gtatagtggg gttccatgag ttatcatgaa ttttaaagta tgcattatta 2760 aattgtaaaa ctccaaggtg atgttgtacc tcttttgctt gccaaagtac agaatttgaa 2820 ttatcagcaa agaaaaaaaa aaaagccagc caagctttaa attatgtgac cataatgtac 2880 tgatttcagt aagtctcata ggttaaaaaa aaaagtcacc aaatagtgtg aaatatatta 2940 cttaactgtc cgtaagcagt atattagtat tatcttgttc aggaaaaggt tgaataatat 3000 atgccttgta taatattgaa aattgaaaag tacaactaac gcaaccaagt gtgctaaaaa 3060 tgagcttgat taaatcaacc acctattttt gacatggaaa tgaagcaggg tttcttttct 3120 tcactcaaat tttggcgaat ctcaaaatta gatcctaaga tgtgttctta tttttataac 3180 atctttattg aaattctatt tataatacag aatcttgttt tgaaaataac ctaattaata 3240 tattaaaatt ccaaattcat ggcatgctta aattttaact aaattttaaa gccattctga 3300 ttattgagtt ccagttgaag ttagtggaaa tctgaacatt ctcctgtgga aggcagagaa 3360 atctaagctg tgtctgccca atgaataatg 
gaaaatgcca tgaattacct ggatgttctt 3420 tttacgaggt gacaagagtt ggggacagaa ctcccattac aactgaccaa gtttctcttc 3480 tagatgattt tttgaaagtt aacattaatg cctgcttttt ggaaagtcag aatcagaaga 3540 tagtcttgga agctgtttgg aaaagacagt ggagatgagg tcagttgtgt tttttaagat 3600 ggcaattact ttggtagctg ggaaagcata aagctcaaat gaaatgtatg cattcacatt 3660 tagaaaagtg aattgaagtt tcaagtttta aagttcattg caattaaact tccaaagaaa 3720 gttctacagt gtcctaagtg ctaagtgctt attacatttt attaagcttt ttggaatctt 3780 tgtaccaaaa ttttaaaaaa gggagttttt gatagttgtg tgtatgtgtg tgtggggtgg 3840 ggggatggta agagaaaaga gagaaacact gaaaagaagg aaagatggtt aaacattttc 3900 ccactcattc tgaattaatt aatttggagc acaaaattca aagcatggac atttagaaga 3960 aagatgtttg gcgtagcaga gttaaatctc aaataggcta ttaaaaaagt ctacaacata 4020 gcagatctgt tttgtggttt ggaatattaa aaaacttcat gtaattttat tttaaaattt 4080 catagctgta cttcttgaat ataaaaaatc atgccagtat ttttaaaggc attagagtca 4140 actacacaaa gcaggcttgc ccagtacatt taaatttttt ggcacttgcc attccaaaat 4200 attatgcccc accaaggctg agacagtgaa tttgggctgc tgtagcctat ttttttagat 4260 tgagaaatgt gtagctgcaa aaataatcat gaaccaatct ggatgcctca ttatgtcaac 4320 caggtccaga tgtgctataa tctgttttta cgtatgtagg cccagtcgtc atcagatgct 4380 tgcggcaaaa ggaaagctgt gtttatatgg aagaaagtaa ggtgcttgga gtttacctgg 4440 cttatttaat atgcttataa cctagttaaa gaaaggaaaa gaaaacaaaa aacgaatgaa 4500 aataactgaa tttggaggct ggagtaatca gattactgct ttaatcagaa accctcattg 4560 tgtttctacc ggagagagaa tgtatttgct gacaaccatt aaagtcagaa gttttactcc 4620 aggttattgc aataaagtat aatgtttatt aaatgcttca tttgtatgtc aaagctttga 4680 ctctataagc aaattgcttt tttccaaaac aaaaagatgt ctcaggtttg ttttgtgaat 4740 tttctaaaag ctttcatgtc ccagaactta gcctttacct gtgaagtgtt actacagcct 4800 taatattttc ctagtagatc tatattagat caaatagttg catagcagta tatgttaatt 4860 tgtgtgtttt tagctgtgac acaactgtgt gattaaaagg tatactttag tagacattta 4920 taactcaagg ataccttctt atttaatctt ttcttatttt tgtactttat catgaatgct 4980 tttagtgtgt gcataatagc tacagtgcat agttgtagac aaagtacatt ctggggaaac 5040 aacatttata tgtagccttt actgtttgat ataccaaatt 
aaaaaaaaat tgtatctcat 5100 tacttatact gggacaccat taccaaaata ataaaaatca ctttcataat cttgaaaaaa 5160
<210> 2
<211> 1516
<212> DNA
<213> Homo sapiens
<400> 2
cacagcagct gtggctgggg agcccagatg aagtgtggct ctatcttgta tgtgagcaca 60 cccacatttt cactgccatt atctgggaca gcagaaccag gtttggctca acagatttct 120 ctttccaccc atctattgca ggtgtagtgg tcttgctgct tctccaggga ggatctgcct 180 acaaactggt ttgctacttt accaactggt cccaggaccg gcaggaacca ggaaaattca 240 cccctgagaa tattgacccc ttcctatgct ctcatctcat ctattcattc gccagcatcg 300 aaaacaacaa ggttatcatc aaggacaaga gtgaagtgat gctctaccag accatcaaca 360 gtctcaaaac caagaatccc aaactgaaaa ttctcttgtc cattggaggg tacctgtttg 420 gttccaaagg gttccaccct atggtggatt cttctacatc acgcttggaa ttcattaact 480 ccataatcct gtttctgagg aaccataact ttgatggact ggatgtaagc tggatctacc 540 cagatcagaa agaaaacact catttcactg tgctgattca tgagttagca gaagcctttc 600 agaaggactt cacaaaatcc accaaggaaa ggcttctctt gactgcgggc gtatctgcag 660 ggaggcaaat gattgataac agctatcaag ttgagaaact ggcaaaagat ctggatttca 720 tcaacctcct gtcctttgac ttccatgggt cttgggaaaa gccccttatc actggccaca 780 acagccctct gagcaagggg tggcaggaca gagggccaag ctcctactac aatgtggaat 840 atgctgtggg gtactggata cataagggaa tgccatcaga gaaggtggtc atgggcatcc 900 ccacatatgg gcactccttc acactggcct ctgcagaaac caccgtgggg gcccctgcct 960 ctggccctgg agctgctgga cccatcacag agtcttcagg cttcctggcc tattatgaga 1020 tctgccagtt cctgaaagga gccaagatca cgcggctcca ggatcagcag gttccctacg 1080 cagtcaaggg gaaccagtgg gtgggctatg atgatgtgaa gagtatggag accaaggttc 1140 agttcttaaa gaatttaaac ctgggaggag ccatgatctg gtctattgac atggatgact 1200 tcactggcaa atcctgcaac cagggccctt accctcttgt ccaagcagtc aagagaagcc 1260 ttggctccct gtgaaggatt aacttacaga gaagcaggca agatgacctt gctgcctggg 1320 gcctgctctc tcccaggaat tctcatgtgg gattcccctt gccaggctgg cctttggatc 1380 tctcttccaa gcctttcctg acttcctctt agatcataga ttggacctgg ttttgttttc 1440 ctgcagctgt tgacttgttg ccctgaagta caataaaaaa aattcatttt gctccagtaa 1500 aaaaaaaaaa aaaaaa 1516
<210> 3
<211> 3039
<212> DNA
<213> Homo sapiens
<400> 3
ggtccctggg ctcccgctcg ccgccgctgc cgctcctcgt tctgctcctc actccccagc 60 ggctggaggc cggtaccggc gggcaggagg cgcccgagga tgtgctgctg gccgctgctc 120 ctgctgtggg ggctgctccc cgggacggcg gcggggggct cgggccgaac ctatccgcac 180 cggaccctcc tggactcgga gggcaagtac tggctgggct ggagccagcg gggcagccag 240 atcgccttcc gcctccaggt gcgcactgca ggctacgtgg gcttcggctt ctcgcccacc 300 ggggccatgg cgtccgccga catcgtcgtg ggcggggtgg cccacgggcg gccctacctc 360 caggattatt ttacaaatgc aaatagagag ttgaaaaaag atgctcagca agattaccat 420 ctagaatatg ccatggaaaa tagcacacac acaataattg aatttaccag agagctgcat 480 acatgtgaca taaatgacaa gagtataacg gatagcactg tgagagtgat ctgggcctac 540 caccatgaag atgcaggaga agctggtccc aagtaccatg actccaatag gggcaccaag 600 agtttgcggt tattgaatcc tgagaaaact agtgtgctat ctacagcctt accatacttt 660 gatctggtaa atcaggacgt ccccatccca aacaaagata caacatattg gtgccaaatg 720 tttaagattc ctgtgttcca agaaaagcat catgtaataa aggttgagcc agtgatacag 780 agaggccatg agagtctggt gcaccacatc ctgctctatc agtgcagcaa caactttaac 840 gacagcgttc tggagtccgg ccacgagtgc tatcacccca acatgcccga tgcattcctc 900 acctgtgaaa ctgtgatttt tgcctgggct attggtggag agggcttttc ttatccacct 960 catgttggat tatcccttgg cactccatta gatccgcatt atgtgctcct agaagtccat 1020 tatgataatc ccacttatga ggaaggctta atagataatt ctggactgag gttattttac 1080 acaatggata taaggaaata tgatgctggg gtgattgagg ctggcctctg ggtgagcctc 1140 ttccatacca tccctccagg gatgcctgag ttccagtctg agggtcactg cactttggag 1200 tgcctggaag aggctctgga agccgaaaag ccaagtggaa ttcatgtgtt tgctgttctt 1260 ctccatgctc acctggctgg cagaggcatc aggctgcgtc attttcgaaa agggaaggaa 1320 atgaaattac ttgcctatga tgatgatttt gacttcaatt tccaggagtt tcagtatcta 1380 aaggaagaac aaacaatctt accaggagat aacctaatta ctgagtgtcg ctacaacacg 1440 aaagatagag ctgagatgac ttggggagga ctaagcacca ggagtgaaat gtgtctctca 1500 taccttcttt attacccaag aattaatctt actcgatgtg caagtattcc agacattatg 1560 gaacaacttc agttcattgg ggttaaggag atctacagac cagtcacgac ctggcctttc 1620 attatcaaaa gtcccaagca atataaaaac ctttctttca tggatgctat gaataagttt 1680 aaatggacta aaaaggaagg 
tctctccttc aacaagctgg tcctcagcct gccagtgaat 1740 gtgagatgtt ccaagacaga caatgctgag tggtcgattc aaggaatgac agcattacct 1800 ccagatatag aaagacccta taaagcagaa cctttggtgt gtggcacgtc ttcttcctct 1860 tccctgcaca gagatttctc catcaacttg cttgtttgcc ttctgctact cagctgcacg 1920 ctgagcacca agagcttgtg atcaaaattc tgttggactt gacaatgttt tctatgatct 1980 gaacctgtca tttgaagtac aggttaaaga ctgtgtccac tttgggcatg aagagtgtgg 2040 agacttttct tccccatttt ccctccctcc tttttccttt ccatgttaca tgagagacat 2100 caatcaggtt ctcttctctt tcttagaaat atctgatgtt atatatacat ggtcaataaa 2160 ataaaactgg cctgacttaa gataaccatt ttaaaaaatt gggctgtcat gtgggaataa 2220 aagaattctt tctttcctac tacattctgt tttatttaaa tactcattgt tgctatttca 2280 ctttttgact tgacttttat atttctttaa aaaattcctt ccttttaaaa aatataaaag 2340 ggactactgt tcattccagt tttcttcttc tttgttgttc ttctagtgtg acttttcaag 2400 tgtaacagcc attcttcctg actttaatat tgtccagttc tggtcttttc tgtgaattac 2460 cactgggccc cttacctcaa tgctttttgt tgatgcccac tctggttccc ttgtttatct 2520 gagtctgttg gtaccccaaa tgaccccaca cccatcttaa agtacttttt ttcaccttcc 2580 ctgtttagta ctggccagat gagttttttc tagagctctg tcactatctg aaaagaaaga 2640 ggctatggga aacatagaaa tggtatgtat taataactga tcataggctg aggagaaaaa 2700 atgtagctgg ctgcaaaccc agtgctgtga ggtgacttat atgaggttcc agatcaaaga 2760 caggccgtgt gagccagtcc aggagggtgt aagttctgaa tggttccttg ctgactttgg 2820 gtgacacatg taccacatac tggctcagtt taagtcatgg ttctattgta gatttatttt 2880 tatattagtt aataaatgac tttaaattgt caccaattga aaatcttgtc actcttttgg 2940 ttttctttat atagctcagc caaatcttgt tttatgtcct gtcctcatct cttaagctaa 3000 atctgtttgg atcatattaa taaactaaat gaaattaca 3039
<210> 4
<211> 1486
<212> DNA
<213> Homo sapiens
<400> 4
cgagtgtggc caagggtgcc ggaggcaggg ttcgggtgcg tagtcgttgc gtgggcgctg 60 cccaaaaggc gcagagcatc aagtgtgcgt gggcagaacc ggcgcgggcg cccgccgcgg 120 gtctgcgcgg ggcgggggcg cagcaagtgc atccgagcga gcggagacta gcgcaccggc 180 gtcggtggcg agggtggtgc agaggagtcc ggctgggcgg agggaggaag gatgggtgcg 240 ggtaactttt tgaccgcctt ggaagtacca gtagccgcgc tcgcaggggc tgcctccgac 300 cgccgggcga gctgcgagcg agtgagcccg ccaccgcccc tcccccactt ccgcctcccg 360 cctcttcctc gttcccggct cccagggccc gtgtccaggc cggagccagg ggccccactg 420 ttgggatgct ggctgcagtg gggcgcccca agcccaggtc ccctctgtct tctctttcga 480 ctttgcagct gtacttgttt tgctcctcta cccgcaggag ctgacatgga cccaaatcct 540 cgggccgccc tggagcgcca gcagctccgc cttcgggagc ggcaaaaatt cttcgaggac 600 attttacagc cagagacaga gtttgtcttt cctctgtccc atctgcatct cgagtcgcag 660 agacccccca taggtagtat ctcatccatg gaagtgaatg tggacacact ggagcaagta 720 gaacttattg accttgggga cccggatgca gcagatgtgt tcttgccttg cgaagatcct 780 ccaccaaccc cccagtcgtc tgggatggac aaccatttgg aggagctgag cctgccggtg 840 cctacatcag acaggaccac atctaggacc tcctcctcct cctcctccga ctcctccacc 900 aacctgcata gcccaaatcc aagtgatgat ggagcagata cgcccttggc acagtcggat 960 gaagaggagg aaaggggtga tggaggggca gagcctggag cctgcagcta gcagtgggcc 1020 cctgcctaca gactgaccac gctggctatt ctccacatga gaccacaggc ccagccagag 1080 cctgtcggga gaagaccaga ctctttactt gcagtaggca ccagaggtgg gaaggatggt 1140 gggattgtgt acctttctaa gaattaaccc tctcctgctt tactgctaat tttttcctgc 1200 tgcaaccctc ccaccagttt ttggcttact cctgagatat gatttgcaaa tgaggagaga 1260 gaagatgagg ttggacaaga tgccactgct tttcttagca ctcttccctc ccctaaacca 1320 tcccgtagtc ttctaataca gtctctcaga caagtgtctc tagatggatg tgaactcctt 1380 aactcatcaa gtaaggtggt actcaagcca tgctgcctcc ttacatcctt tttggaacag 1440 agcacggtat aaataataaa ctaataataa tatgccaacc aaaaaa 1486
<210> 5
<211> 2099
<212> DNA
<213> Homo sapiens
<400> 5
gaaagaaaca acaactggaa aagaagcatt gcataagacc aggatgtctc tgaaatggat 60 gtcagtcttt ctgctgatgc agctcagttg ttactttagc tctgggagtt gtggaaaggt 120 gctggtgtgg cccacagaat acagccattg gataaatatg aagacaatcc tggaagagct 180 tgttcagagg ggtcatgagg tgattgtgtt gacatcttcg gcttctattc ttgtcaatgc 240 cagtaaatca tctgctatta aattagaagt ttatcctaca tctttaacta aaaatgattt 300 ggaagatttt tttatgaaaa tgttcgatag atggacatat agtatttcaa aaaatacatt 360 ttggtcatat ttttcacaac tacaagaatt gtgttgggaa tattctgact ataatataaa 420 gctctgtgaa gatgcagttt tgaacaagaa acttatgaga aaactacaag agtcaaaatt 480 tgatgtcctt ctggcagatg ccgttaatcc ctgtggtgag ctgctggctg aactacttaa 540 catacccttt ctgtacagtc tccgcttctc tgttggctac acagttgaga agaatggtgg 600 aggatttctg ttccctcctt cctatgtacc tgttgttatg tcagaattaa gtgatcaaat 660 gattttcatg gagaggataa aaaatatgat atatatgctt tattttgact tttggtttca 720 agcatatgat ctgaagaagt gggaccagtt ttatagtgaa gttctaggaa gacccactac 780 attatttgag acaatgggga aagctgaaat gtggctcatt cgaacctatt gggattttga 840 atttcctcgc ccattcttac caaatgttga ttttgttgga ggacttcact gtaaaccagc 900 caaacccttg cctaaggaaa tggaagagtt tgtgcagagc tctggagaaa atggtattgt 960 ggtgttttct ctggggtcga tgatcagtaa catgtcagaa gaaagtgcca acatgattgc 1020 atcagccctt gcccagatcc cacaaaaggt tctatggaga tttgatggca agaagccaaa 1080 tactttaggt tccaatactc gactgtataa gtggttaccc cagaatgacc ttcttggtca 1140 tcccaaaacc aaagctttta taactcatgg tggaaccaat ggcatctatg aggcgatcta 1200 ccatgggatc cctatggtgg gcattccctt gtttgcggat caacatgata acattgctca 1260 catgaaagcc aagggagcag ccctcagtgt ggacatcagg accatgtcaa gtagagattt 1320 gctcaatgca ttgaagtcag tcattaatga ccctatctat aaagagaata tcatgaaatt 1380 atcaagaatt catcatgatc aaccggtgaa gcccctggat cgagcagtct tctggattga 1440 gtttgtcatg cgccataaag gagccaagca ccttcgggtc gcagcccaca acctcacctg 1500 gatccagtac cactctttgg atgtgatagc attcctgctg gcctgcgtgg caactatgat 1560 atttatgatc acaaaatgtt gcctgttttg tttccgaaag cttgccaaaa caggaaagaa 1620 gaagaaaagg gattagttat atcaaaagcc tgaagtggaa tgaccaaaag atgggactcc 1680 tcctttattc cagcatggag 
ggttttaaat ggaggatttc ctttttcctg cgacaaaacg 1740 tcttttcaca acttaccctg ttaagtcaaa atttattttc caggaattta atatgtactt 1800 tagttggaat tattctatgt caatgatttt taagctatga aaaataataa tataaaacct 1860 tatgggctta tattgaaatt tattattcta atccaaaagt taccccacac aaaagttact 1920 gagcttcctt atgtttcaca cattgtattt gaacacaaaa cattaacaac tccactcata 1980 gtatcaacat tgttttgcaa atactcagaa tattttggct tcattttgag cagaattttt 2040 gtttttaatt ttgccaatga aatcttcaat aattaaaaaa aaaaaaaaaa aaaaaaaaa 2099
<210> 6
<211> 648
<212> DNA
<213> Homo sapiens
<400> 6
gggatggcag ccctaaacac agcatggcaa ctcatctact cactcatgaa agattaaaaa 60 atggaaacca acgtatttca tcttatgctc tgcgtcactt ctgctcggac tcataaatcc 120 acgtctcttt gctttggcca cttcaactca tatccaagcc ttcctttaat tcatgattta 180 ttgctggaaa tatcctttca actctcagca cctcatgaag acgcgcgctt aactccggag 240 gagctagaaa gagcttccct tctacagata ctgccagaga tgctgggtgc agaaagaggg 300 gatattctca ggaaagcaga ctcaagtacc aacattttta acccaagagg aaatttgaga 360 aagtttcagg atttctctgg acaagatcct aacattttac tgagtcatct tttggccaga 420 atctggaaac catacaagaa acgtgagact cctgattgct tctggaaata ctgtgtctga 480 agtgaaataa gcatctgtta gtcagctcag aaacacccat cttagaatat gaaaaataac 540 acaatgcttg atttgaaaac agtgtggaga aaaactaggc aaactacacc ctgttcattg 600 ttacctggaa aataaatcct ctatgttttg cacaaaaaaa aaaaaaaa 648
<210> 7
<211> 5288
<212> DNA
<213> Homo sapiens
<400> 7
ggggagcagg cgggggagcg ggcgggaagc agtgggagcg cgcgtgcgcg cggccgtgca 60 gcctgggcag tgggtcctgc ctgtgacgcg cggcggcggt cggtcctgcc tgtaacggcg 120 gcggcggctg ctgctccaga cacctgcggc ggcggcggcg accccgcggc gggcgcggag 180 atgtggcccc tggtagcggc gctgttgctg ggctcggcgt gctgcggatc agctcagcta 240 ctatttaata aaacaaaatc tgtagaattc acgttttgta atgacactgt cgtcattcca 300 tgctttgtta ctaatatgga ggcacaaaac actactgaag tatacgtaaa gtggaaattt 360 aaaggaagag atatttacac ctttgatgga gctctaaaca agtccactgt ccccactgac 420 tttagtagtg caaaaattga agtctcacaa ttactaaaag gagatgcctc tttgaagatg 480 gataagagtg atgctgtctc acacacagga aactacactt gtgaagtaac agaattaacc 540 agagaaggtg aaacgatcat cgagctaaaa tatcgtgttg tttcatggtt ttctccaaat 600 gaaaatattc ttattgttat tttcccaatt tttgctatac tcctgttctg gggacagttt 660 ggtattaaaa cacttaaata tagatccggt ggtatggatg agaaaacaat tgctttactt 720 gttgctggac tagtgatcac tgtcattgtc attgttggag ccattctttt cgtcccaggt 780 gaatattcat taaagaatgc tactggcctt ggtttaattg tgacttctac agggatatta 840 atattacttc actactatgt gtttagtaca gcgattggat taacctcctt cgtcattgcc 900 atattggtta ttcaggtgat agcctatatc ctcgctgtgg ttggactgag tctctgtatt 960 gcggcgtgta taccaatgca tggccctctt ctgatttcag gtttgagtat cttagctcta 1020 gcacaattac ttggactagt ttatatgaaa tttgtggctt ccaatcagaa gactatacaa 1080 cctcctagga ataactgaag tgaagtgatg gactccgatt tggagagtag taagacgtga 1140 aaggaataca cttgtgttta agcaccatgg ccttgatgat tcactgttgg ggagaagaaa 1200 caagaaaagt aactggttgt cacctatgag acccttacgt gattgttagt taagttttta 1260 ttcaaagcag ctgtaattta gttaataaaa taattatgat ctatgttgtt tgcccaattg 1320 agatccagtt ttttgttgtt atttttaatc aattaggggc aatagtagaa tggacaattt 1380 ccaagaatga tgcctttcag gtcctagggc ctctggcctc taggtaacca gtttaaattg 1440 gttcagggtg ataactactt agcactgccc tggtgattac ccagagatat ctatgaaaac 1500 cagtggcttc catcaaacct ttgccaactc aggttcacag cagctttggg cagttatggc 1560 agtatggcat tagctgagag gtgtctgcca cttctgggtc aatggaataa taaattaagt 1620 acaggcagga atttggttgg gagcatcttg tatgatctcc gtatgatgtg atattgatgg 1680 agatagtggt cctcattctt 
gggggttgcc attcccacat tcccccttca acaaacagtg 1740 taacaggtcc ttcccagatt tagggtactt ttattgatgg atatgttttc cttttattca 1800 cataacccct tgaaaccctg tcttgtcctc ctgttacttg cttctgctgt acaagatgta 1860 gcaccttttc tcctctttga acatggtcta gtgacacggt agcaccagtt gcaggaagga 1920 gccagacttg ttctcagagc actgtgttca cacttttcag caaaaatagc tatggttgta 1980 acatatgtat tcccttcctc tgatttgaag gcaaaaatct acagtgtttc ttcacttctt 2040 ttctgatctg gggcatgaaa aaagcaagat tgaaatttga actatgagtc tcctgcatgg 2100 caacaaaatg tgtgtcacca tcaggccaac aggccagccc ttgaatgggg atttattact 2160 gttgtatcta tgttgcatga taaacattca tcaccttcct cctgtagtcc tgcctcgtac 2220 tccccttccc ctatgattga aaagtaaaca aaacccacat ttcctatcct ggttagaaga 2280 aaattaatgt tctgacagtt gtgatcgcct ggagtacttt tagactttta gcattcgttt 2340 tttacctgtt tgtggatgtg tgtttgtatg tgcatacgta tgagataggc acatgcatct 2400 tctgtatgga caaaggtggg gtacctacag gagagcaaag gttaattttg tgcttttagt 2460 aaaaacattt aaatacaaag ttctttattg ggtggaatta tatttgatgc aaatatttga 2520 tcacttaaaa cttttaaaac ttctaggtaa tttgccacgc tttttgactg ctcaccaata 2580 ccctgtaaaa atacgtaatt cttcctgttt gtgtaataag atattcatat ttgtagttgc 2640 attaataata gttatttctt agtccatcag atgttcccgt gtgcctcttt tatgccaaat 2700 tgattgtcat atttcatgtt gggaccaagt agtttgccca tggcaaacct aaatttatga 2760 cctgctgagg cctctcagaa aactgagcat actagcaaga cagctcttct tgaaaaaaaa 2820 aatatgtata cacaaatata tacgtatatc tatatatacg tatgtatata cacacatgta 2880 tattcttcct tgattgtgta gctgtccaaa ataataacat atatagaggg agctgtattc 2940 ctttatacaa atctgatggc tcctgcagca ctttttcctt ctgaaaatat ttacattttg 3000 ctaacctagt ttgttacttt aaaaatcagt tttgatgaaa ggagggaaaa gcagatggac 3060 ttgaaaaaga tccaagctcc tattagaaaa ggtatgaaaa tctttatagt aaaatttttt 3120 ataaactaaa gttgtacctt ttaatatgta gtaaactctc atttatttgg ggttcgctct 3180 tggatctcat ccatccattg tgttctcttt aatgctgcct gccttttgag gcattcactg 3240 ccctagacaa tgccaccaga gatagtgggg gaaatgccag atgaaaccaa ctcttgctct 3300 cactagttgt cagcttctct ggataagtga ccacagaagc aggagtcctc ctgcttgggc 3360 atcattgggc cagttccttc tctttaaatc 
agatttgtaa tggctcccaa attccatcac 3420 atcacattta aattgcagac agtgttttgc acatcatgta tctgttttgt cccataatat 3480 gctttttact ccctgatccc agtttctgct gttgactctt ccattcagtt ttatttattg 3540 tgtgttctca cagtgacacc atttgtcctt ttctgcaaca acctttccag ctacttttgc 3600 caaattctat ttgtcttctc cttcaaaaca ttctcctttg cagttcctct tcatctgtgt 3660 agctgctctt ttgtctctta acttaccatt cctatagtac tttatgcatc tctgcttagt 3720 tctattagtt ttttggcctt gctcttctcc ttgattttaa aattccttct atagctagag 3780 cttttctttc tttcattctc tcttcctgca gtgttttgca tacatcagaa gctaggtaca 3840 taagttaaat gattgagagt tggctgtatt tagatttatc actttttaat agggtgagct 3900 tgagagtttt ctttctttct gttttttttt tttgtttttt tttttttttt tttttttttt 3960 ttttttgact aatttcacat gctctaaaaa ccttcaaagg tgattatttt tctcctggaa 4020 actccaggtc cattctgttt aaatccctaa gaatgtcaga attaaaataa cagggctatc 4080 ccgtaattgg aaatatttct tttttcagga tgctatagtc aatttagtaa gtgaccacca 4140 aattgttatt tgcactaaca aagctcaaaa cacgataagt ttactcctcc atctcagtaa 4200 taaaaattaa gctgtaatca accttctagg tttctcttgt cttaaaatgg gtattcaaaa 4260 atggggatct gtggtgtatg tatggaaaca catactcctt aatttacctg ttgttggaaa 4320 ctggagaaat gattgtcggg caaccgttta ttttttattg tattttattt ggttgaggga 4380 tttttttata aacagtttta cttgtgtcat attttaaaat tactaactgc catcacctgc 4440 tggggtcctt tgttaggtca ttttcagtga ctaataggga taatccaggt aactttgaag 4500 agatgagcag tgagtgacca ggcagttttt ctgcctttag ctttgacagt tcttaattaa 4560 gatcattgaa gaccagcttt ctcataaatt tctctttttg aaaaaaagaa agcatttgta 4620 ctaagctcct ctgtaagaca acatcttaaa tcttaaaagt gttgttatca tgactggtga 4680 gagaagaaaa cattttgttt ttattaaatg gagcattatt tacaaaaagc cattgttgag 4740 aattagatcc cacatcgtat aaatatctat taaccattct aaataaagag aactccagtg 4800 ttgctatgtg caagatcctc tcttggagct tttttgcata gcaattaaag gtgtgctatt 4860 tgtcagtagc catttttttg cagtgatttg aagaccaaag ttgttttaca gctgtgttac 4920 cgttaaaggt tttttttttt atatgtatta aatcaattta tcactgttta aagctttgaa 4980 tatctgcaat ctttgccaag gtactttttt atttaaaaaa aaacataact ttgtaaatat 5040 taccctgtaa tattatatat acttaataaa acattttaag 
ctattttgtt gggctatttc 5100 tattgctgct acagcagacc acaagcacat ttctgaaaaa tttaatttat taatgtattt 5160 ttaagttgct tatattctag gtaacaatgt aaagaatgat ttaaaatatt aattatgaat 5220 tttttgagta taatacccaa taagctttta attagagcag agttttaatt aaaagtttta 5280 aatcagtc 5288
<210> 8
<211> 5188
<212> DNA
<213> Homo sapiens
<400> 8
gagggcgggg cgcaggcgcg gttagtaccg cggtgggcgc cggggctccc gggaatctac 60 cttctcctgc ggccggcacg cggttcccag ggggccagcg gcggtcagcc gaggtcgaga 120 cgcccgcagg gtggccttag cggccggtcg taccacggca gccccgccga tcaggttcct 180 ttgggagact tcgacttgtt ggcgacctga tggccaaaag aaaggaagaa aatttttcct 240 ctcctaaaaa tgccaaaagg ccaagacaag aagaattgga ggattttgat aaagatggtg 300 acgaagacga atgtaaaggt actactttga ctgcagcaga agttggaata attgagagta 360 ttcacctaaa aaacttcatg tgtcattcaa tgcttggacc ttttaagttt ggttctaatg 420 tcaactttgt tgttggcaac aatggaagtg ggaagagtgc agtactcaca gctctcatag 480 tcggtcttgg tggaagagca gttgctacta atagaggatc ctctttaaaa ggttttgtga 540 aagatggaca gaactctgca gatatctcaa taacattgag gaacagagga gatgatgcct 600 ttaaagccag tgtgtatggt aactctatac ttatacagca acacatcagc atagatggaa 660 gtcgatctta taaacttaaa agtgcaacag gctccgtggt ttccacgagg aaagaagagc 720 tgattgcaat tcttgatcat tttaacatcc aggtggataa tccagtttct gttttaacac 780 aagaaatgag caagcagttc ttacagtcta aaaatgaagg agacaaatac aaattcttca 840 tgaaagcaac gcaacttgaa cagatgaagg aagattattc atacattatg gaaacgaaag 900 aaagaacaaa ggagcagata catcaaggag aagagcggct tactgaacta aagcgccagt 960 gtgtagagaa agaggaacgt tttcaaagta ttgctggttt aagtacaatg aagactaatt 1020 tagagtcctt gaaacatgaa atggcttggg cagtggtcaa tgaaattgaa aaacaattga 1080 atgccatcag agataatatc aaaattggag aagatcgtgc tgctagactt gacaggaaaa 1140 tggaagaaca gcaggtcaga cttaatgagg cagaacaaaa gtacaaggat attcaagaca 1200 aactagaaaa gattagtgaa gagacaaatg cacgagcacc agaatgtatg gcattgaaag 1260 cagatgttgt tgctaagaaa agggcctata atgaagctga ggttttatat aaccgatcct 1320 taaacgaata taaagcatta aagaaagatg atgagcagct ttgtaaacga attgaagagc 1380 tgaaaaaaag tactgaccaa tctttggaac ctgaacggtt ggaaagacaa aaaaaaatat 1440 cttggttaaa agagagagta aaggcctttc aaaatcaaga aaattcagtc aatcaagaga 1500 tcgaacagtt tcagcaagcc atagaaaagg acaaagaaga acatggcaaa attaagagag 1560 aagaattaga tgtgaagcat gcactgagct acaatcagag gcaactgaaa gaattgaaag 1620 atagtaaaac tgatcgactc aaaagatttg gccctaatgt tccagctctt cttgaagcca 1680 tagatgatgc ttatagacaa 
ggacatttta cctataaacc tgtaggccct ttaggagctt 1740 gcattcatct tcgggaccca gaacttgctt tggctattga atcttgctta aaagggcttc 1800 tgcaggccta ttgttgccat aatcatgctg atgaaagggt ccttcaggca ctcatgaaaa 1860 ggttttattt accagggacc tcacggccac cgataatagt ttctgagttt cggaatgaga 1920 tatatgatgt aagacacaga gctgcttatc atccagactt tccaacagtt ctgacagctt 1980 tagaaataga taatgcggtt gtggcaaata gcctaattga catgagaggc atagagacag 2040 tgctactaat caaaaataat tctgtagctc gtgcagtaat gcagtcccaa aagccaccca 2100 aaaattgtag agaagctttt actgctgatg gtgatcaagt ttttgcagga cgttattatt 2160 catctgaaaa tacaagacct aagttcctaa gcagagatgt ggattctgaa ataagtgact 2220 tggagaatga ggttgaaaat aagacggccc agatattaaa tcttcagcaa catttatctg 2280 cccttgaaaa agatattaaa cacaatgagg aacttcttaa aaggtgccaa ctacattata 2340 aagaactaaa gatgaaaata agaaaaaata tttctgaaat tcgggaactt gagaacatag 2400 aagaacacca gtctgtagat attgcaactt tggaagatga agctcaggaa aataaaagca 2460 aaatgaaaat ggttgaggaa catatggagc aacaaaaaga aaatatggag catcttaaaa 2520 gtctgaaaat agaagcagaa aataagtatg atgcaattaa attcaaaatt aatcaactat 2580 cggagctagc agacccactt aaggatgaat taaaccttgc tgattctgaa gtggataacc 2640 aaaaacgagg gaaacgacat tatgaagaaa aacaaaaaga acacttggat accttaaata 2700 aaaagaaacg agaactggat atgaaagaga aagaactaga ggagaaaatg tcacaagcaa 2760 gacaaatctg cccagagcgt atagaagtag aaaaatctgc atcaattctg gacaaagaaa 2820 ttaatcgatt aaggcagaag atacaggcag aacatgctag tcatggagat cgagaggaaa 2880 taatgaggca gtaccaagaa gcaagagaga cctatcttga tctggatagt aaagtgagga 2940 ctttaaaaaa gtttattaaa ttactgggag aaatcatgga gcacagattc aagacatatc 3000 aacaatttag aaggtgtttg actttacgat gcaaattata ctttgacaac ttactatctc 3060 agcgggccta ttgtggaaaa atgaattttg accacaagaa tgaaactcta agtatatcag 3120 ttcagcctgg agaaggaaat aaagctgctt tcaatgacat gagagccttg tctggaggtg 3180 aacgttcttt ctccacagtg tgttttattc tttccctgtg gtccatcgca gaatctcctt 3240 tcagatgcct ggatgaattt gatgtctaca tggatatggt taataggaga attgccatgg 3300 acttgatact gaagatggca gattcccagc gttttagaca gtttatcttg ctcacacctc 3360 aaagcatgag ttcacttcca tccagtaaac 
tgataagaat tctccgaatg tctgatcctg 3420 aaagaggaca aactacattg cctttcagac ctgtgactca agaagaagat gatgaccaaa 3480 ggtgatttgt aacttaacat gccttgtcct gatgttgaag gatttgtgaa gggaaaaaaa 3540 attctggact ctttgatata ataaaatgag actggaggca ttctgaaatg aaagaaactc 3600 ctttatatat ccaaccacaa tcaaacatat aaataagcct ggaaaaccaa ctacaacctg 3660 caatttaaga ttactattac tttaagaaaa tcaatttcat agtattggtt ttaaatcttt 3720 ttaagttttt ttaatacgat ctatttttat aggttctttt tcagaagtaa aattttgtac 3780 atatatacat gtacatatct gtttagtttg ggttcatttc tataacattt tgtaagaaaa 3840 taaaagtttg agcacctgat tatatttagt tttgcttttc cagatattac attctatagt 3900 taccaaaaat ggttgaaggg agggatttct cattgcagag ggtggggtgc aagggaataa 3960 gacacttgta cggaacactg aagctttgcc aacttctaca catgcctttt ttgcagtcct 4020 ttaactgtcc accctaccaa gagcttataa ccagtatcag aactggataa tgacgcagtt 4080 tttcactctg acctccatca tgcttgcctg atttaaaagc cctcagtttg cagtccaggg 4140 actgttcagg cttgtcctca gctgagagga cacaggctag agggactgtg cagaaccagg 4200 ctgggagaag ggctgggaaa actgggagtg gagggtggat cctcatggag caggagagta 4260 gctcatggct ccaggagcct gaggccatgc agttgatggt gagctgacat caattctaag 4320 actcatccta attgaggggt gttaaaaagt gtgctgctta gaatgaccaa atatagttat 4380 tgtaaaaaat gatatttatg aactttttat tttagaaaac atgaatttta ttgctccctg 4440 tattatttgt ttgatactag gattcatgct aaacttttta agaatgtatt ggatatcaag 4500 aagcattcct tacattagta gcaataaata ttagaataaa tatgaaattg aactattttc 4560 agaaaaaggg cagtatatta agagcaggga ctgttctcta gttattgagg aaaactggac 4620 tttgtttgtg tttttggtgg aggaagaagt ttaagatact ttagtcttaa attgaggttt 4680 gccaaatgag aagttcaaaa acttgggctt tctaatcaga atttccagga ggaggaaagt 4740 gtgtgctgaa tattttaaac atttcccact gatcatacaa agtctgattt ttaaatttac 4800 acttataatg cctttgtatt aaaattattt ttaacatgtg cttttccaaa ttaaaaatga 4860 agtagagtat accaaatgca taaactttca ttagctaagg aactcatgtc tgaattttgt 4920 tgtagttttg aatgttgtgc tctttcatac agaatgggaa acataatcct caggtatccc 4980 agcatctctt gttgaattga agattattca ttgctttggc ctcacaaagt tttgatttca 5040 actatcataa gtgaaaatat cttcctttaa tgttctaagt 
agtgataata ttactagaat 5100 gaaagaataa aaagaaattg ttcttttaaa atatgtgtaa cttctaaaaa taaaacttaa 5160 aatttatatg taaaaaaaaa aaaaaaaa 5188
<210> 9
<211> 2494
<212> DNA
<213> Homo sapiens
<400> 9
cacgaggcag gggccatttt acctccaggt tggccctgct caggaccagg aggaaacacc 60 tccagcccgc gacctcctcc cacaggggga aaaggaaagc aggaggacca cagaagcttt 120 ggcaccgagg atccccgcag tcttcacccg cggagattcc ggctgaagga gctgtccagc 180 gactacaccg ctaagcgcag ggagcccaag cctccgcacc ggattccgga gcacaagctc 240 caccgcgcat gcgcacacgc cccagaccca ggctcaggag gactgagaat tttctgaccg 300 cagtgcacca tgggaagctc tgaagtttcc ataattcctg ggctccagaa agaagaaaag 360 gcggccgtgg agagacgaag acttcatgtg ctgaaagctc tgaagaagct aaggattgag 420 gctgatgagg ccccagttgt tgctgtgctg ggctcaggcg gaggactgcg ggctcacatt 480 gcctgccttg gggtcctgag tgagatgaaa gaacagggcc tgttggatgc cgtcacgtac 540 ctcgcagggg tctctggatc cacttgggca atatcttctc tctacaccaa tgatggtgac 600 atggaagctc tcgaggctga cctgaaacat cgatttaccc gacaggagtg ggacttggct 660 aagagcctac agaaaaccat ccaagcagcg aggtctgaga attactctct gaccgacttc 720 tgggcctaca tggttatctc taagcaaacc agagaactgc cggagtctca tttgtccaat 780 atgaagaagc ccgtggaaga agggacacta ccctacccaa tatttgcagc cattgacaat 840 gacctgcaac cttcctggca ggaggcaaga gcaccagaga cctggttcga gttcacccct 900 caccacgctg gcttctctgc actgggggcc tttgtttcca taacccactt cggaagcaaa 960 ttcaagaagg gaagactggt cagaactcac cctgagagag acctgacttt cctgagaggt 1020 ttatggggaa gtgctcttgg taacactgaa gtcattaggg aatacatttt tgaccagtta 1080 aggaatctga ccctgaaagg tttatggaga agggctgttg ctaatgctaa aagcattgga 1140 caccttattt ttgcccgatt actgaggctg caagaaagtt cacaagggga acatcctccc 1200 ccagaagatg aaggcggtga gcctgaacac acctggctga ctgagatgct cgagaattgg 1260 accaggacct ccctggaaaa gcaggagcag ccccatgagg accccgaaag gaaaggctca 1320 ctcagtaact tgatggattt tgtgaagaaa acaggcattt gcgcttcaaa gtgggaatgg 1380 gggaccactc acaacttcct gtacaaacac ggtggcatcc gggacaagat aatgagcagc 1440 cggaagcacc tccacctggt ggatgctggt ttagccatca acactccctt cccactcgtg 1500 ctgcccccga cgcgggaggt tcacctcatc ctctccttcg acttcagtgc cggagatcct 1560 ttcgagacca tccgggctac cactgactac tgccgccgcc acaagatccc ctttccccaa 1620 gtagaagagg ctgagctgga tttgtggtcc aaggcccccg ccagctgcta catcctgaaa 1680 ggagaaactg gaccagtggt 
gatacatttt cccctgttca acatagatgc ctgtggaggt 1740 gatattgagg catggagtga cacatacgac acattcaagc ttgctgacac ctacactcta 1800 gatgtggtgg tgctactctt ggcattagcc aagaagaatg tcagggaaaa caagaagaag 1860 atccttagag agttgatgaa cgtggccggg ctctactacc cgaaggatag tgcccgaagt 1920 tgctgcttgg catagatgag cctcagcttc cagggcactg tgggcctgtt ggtctactag 1980 ggccctgaag tccacctggc cttcctgttc ttcactccct tcagccacac gcttcatggc 2040 cttgagttca ccttggctgt cctaacaggg ccaatcacca gtgaccagct agactgtgat 2100 tttgatagcg tcattcagaa gaaggtgtcc aaggagctga aggtggtgaa atttgtcctg 2160 caggtccctc gggagatcct ggagctggag catgagtgtc tgacaatcag aagcatcatg 2220 tccaatgtcc agatggccag aatgaatgtg atagttcaga ccaatgcctt ccactgctcc 2280 tttatgactg cacttctagc cagtagctct gcacaagtta gctctgtaga agtaagaact 2340 tgggcttaaa tcatgggcta tctctccaca gccaagtgga gctctgagaa tacaacaagt 2400 gctcaataaa tgcttgctga ttgactgatg aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2460 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaa 2494
<210> 10
<211> 1462
<212> DNA
<213> Homo sapiens
<400> 10
agggtgcggc gaggcggaat ggcggcggct gcggactcgt tctcaggcgg ccccgcgggg 60 gtgcggcttc cgaggtcgcc gccactcaag gtgctggcgg agcagctgcg gcgcgacgcg 120 gagggcggcc cgggcgcgtg gcggctgtca cgggcggcgg cgggccgcgg gccgctggac 180 ctggcggccg tgtggatgca gggcagggta gtgatggcgg accgcggcga ggctcggctg 240 agggacccga gcggggactt ctcggtccgc ggcctggagc gggtgccgcg cgggcggccc 300 tgtctagtcc caggaaagta tgtgatggtg atgggagtgg ttcaggcctg cagccctgag 360 ccctgcctgc aggctgtgaa gatgacagac ctttctgata atcccatcca tgaaagtatg 420 tgggaactgg aggtagaaga tttacacagg aatattcctt agagtatgtt ggaactgtcg 480 ttaaaaacaa ccaaaatccc gaaactattt agaagcttat aatgatgtgg atttcatgga 540 cacttttcaa tgcgtatttt tcaaatgctt ctcagagagc cttgctttgg ttgaccaagg 600 agtccggatg taggaatgtt taaatcctcg gatacttcag tgacacagcc tctgctgccc 660 cttgctttgc ctgtgtttgc tgatgaaaag cagatgcttg tgtttcattt tccttcctgg 720 tttgtgtgtg ttaattctct ctctctctct cagacacaga agtctcatgt tgcattttcc 780 aaattttatg agtgatgata ctttttccat tactgctgcg tccctgtttt acaatgcaaa 840 atttaagtac ggtcattgcc catggtgatt aaagtgtggt tatgggcagg aagacagact 900 gtgtaaaaaa ggaatgacat cctggctcct catcttcttc atcagcaact accataacca 960 gtttgcgagt caaatggcat ttcctaacgg caggcatggc ggcccctgaa agacaacagc 1020 tccctttctg cttcggacac cactcaaaca tttagacgca gctctatccc ttttcctagc 1080 tagagaaggt gatgccttct tccattactc agagatgttg agacgttttc agaatttctt 1140 gttgaaatga aaaacatcaa gataaaggac gcctttcagg cattagctaa acttccactt 1200 cataactttc ggcgagacgt ggtgagcctc ctggtgtaga gttcttttgt ctttgtatgg 1260 aatgactttt tgctgtgatg gttttgaatg ttgggtttct gctgtctgct tagtacccat 1320 gcctgaattt tttgagattg taaatatcaa aggagttaga ttgtgttctg acatggttgt 1380 agactttcac ctggattatt gatattctac ctctaataaa tttttaatag gcaaaaaaaa 1440 aaaaaaaaaa aaaaaaaaaa aa 1462
<210> 11
<211> 2990
<212> DNA
<213> Homo sapiens
<400> 11
agggggcgct gcggcccccc caatcccccg ccccgtccgg gctggggcgg aggagcgggc 60 ggggaccaaa ggttggtgtc tttgcgctcg gaccttcgcc agaggggccg ggacatcatg 120 acggtgggag ccaggctccg aagcaaggcg gagagcagcc tcctgcgccg cgggccccga 180 gggcgagggc gaaccgaggg ggacgaggag gcggccgcca tcctggagca cctggagtac 240 gcggacgagg cggaggcggc ggccgagagc gggacgagcg cggcggacga gcggggcccg 300 gggacccggg gcgcgcggag ggtgcacttc gccctcctgc ccgagcgcta cgagccactg 360 gaggagccgg cgccgagcga gcagcccagg aagaggtacc ggaggaagct gaagaagtac 420 ggcaagaatg tcgggaaggt catcatcaaa ggatgccgct acgtggtcat cggcctgcaa 480 ggcttcgctg cagcctactc cgccccgttt gcggtagcca ccagcgtggt atccttcgtg 540 cgctaatggg agctgctgtg gcaggtgccc ccagagtgaa cgggagcccc tgctgtggga 600 actttgtgaa tcctggagca tctcagactt gaacacacag catatttgga agagaaaaca 660 tgcctttctt tgttgaatca cattagtatg atgagtgagt catccctgcc catctgctga 720 gcttctcaca tctctcagtc acacgtggac ccagtggtca atcctgcaga gaattcggcg 780 gaggttaggt ttgggagtgg agctagcgtg ctaaagccag agccttcacg tgaaggtggc 840 aggcactggg gcggaagcca acactcaaca gatgcaagca gtgtgggtgt gcagcagaac 900 agtgatcttg ggggaggaag aggatgttac tagagtcaga tgatttgctg tattctcctg 960 aaaggtcgta ggctgacagg cgctcacatt ccttggctgc ctcggttctg agggcagcta 1020 aggagctgtt tattcctcaa gtcatgctcc ccgatctcct tcctctacca ctctgtcacc 1080 aggagtttaa ttacaggctt gaggagaaga aaggaagaaa agatatcttg atgctttgaa 1140 aactgtgttg gcagtgtggc atgactgttt aaagtagata aaaccttgtc attttacccc 1200 atccctgcat gactgtgaag ctggcgagga aggaggaaga agggcaagtt cagatgcagg 1260 ctgggtggct gggacaggtt ggctaaggga ctactctgga gggctcttct gcctggcatt 1320 gcccacttcg gcccagccac gtgtttgcag cgaccagagt ccctgcaaag gtgtggctgg 1380 ctgtggtcag ggtgctacta gcaccatcag cgcactcccg ccattggctc agctcctctc 1440 tgccagtcca actaagagtg ctttgtcctg ggtgggacat aggggctgag agagatgggg 1500 ggagacataa cacccaggaa tgaaaataca gatttagaga aggaaccagt aagtaggaga 1560 cagatgtgaa ggaaatggaa atgaggcaag aggacattgg aagagagaag tttgctgtcc 1620 aggagccagg tctggagcat cagtgtgagg gagttcaggt aggctgggcc tgtgcctcta 1680 ggtagggaca agggaggctg 
ggtagccagg gctggtgctt aaaacccctg aggccatgag 1740 ctcattggct gcctttgtag catcctgtct tcttctgtgc tgcctggttt gatctcatct 1800 cacctggatt caaagggtaa ggtgggcatg ggtcttgggc ctgacaccca ccaaggatga 1860 cctgtggact gccatcggat gctgaacagg gagatgaaag gaggtcctct taccataccc 1920 ctctgccaac cccccagtag gccactgttc tgactttgtt tccagaatat ccagaaatcc 1980 aaaggggctg ttgctgaaca gtctgcagga ccagtgacag cacctacctg ttgtcccaag 2040 gcatacaaag gaggcctcaa cgctcatgct tctctaatca agccctacca agacagacag 2100 aaagacagac agaaaaaagg aaggggtaga ggagaaggtt gaagctgtgg agctagactc 2160 tgcttcactt cctgaagctt caacttcatg tcgaagattc actgggaccc aattcctgca 2220 ttgttaatat ttgtgaggaa aagtgaaaca agtgatctgg ttttagccca gatgatgaaa 2280 gtggatatgg cacattttca cacacgtgag ataattacag cttgccccac aacactgggt 2340 gttggagaaa gggagagata gtcataagtg gaagaaaaag ccaagcatag tgagtgggaa 2400 agagagtgag agcctgtgca ggctgctgac gagccccagg cagcccacaa gtttctcgtg 2460 gggagatgga ggcagagccc agggtagggg acagagctgc tggggccttt ccttgcctgg 2520 gaatctgtcc caggaagagc ttccccactc ccatccccca aattggaaaa accgtacatt 2580 caagcctgtt tggccctgaa attcttaaga atctggttaa gaattaactc actaatgtca 2640 aaagtcaaaa cctcctaggg gttgtcctgg gagtcaggtt cacgggtaca gaagatgaat 2700 ctcagatgtc actcaacctg agccgtcatt ctctgtggca gggctgccct gggtttctct 2760 tactcaatcc ctggagtgta agcatttgga ttgtgtcaca gattaccttt ttaccttttc 2820 tttctttttt tttctttttt tcaatatcag tgcccacacc ttactgagta ttgagtttta 2880 gagctttcgc ttgatgtgct tgaccaagag acttcttttg tatccttttc ttgtcctatg 2940 atgtaaataa aagcctcgat ttatgtaatg ttaaaaaaaa aaaaaaaaaa 2990
<210> 12
<211> 2447
<212> DNA
<213> Homo sapiens
<400> 12
gcatggagga cacagagaga gagagtgctg tgtattcctt ccccgctact gtcctgtcct 60 cagctaactt gctctgggac agcttcccca gggctacaga tactgcactc agctgactgt 120 cctttcttct gggcccctgg tcccagagca gagctgacaa aggagattcc tgagagagca 180 ccttcttatc acagaaagtg ctgagccaag agctcctagc tgcccctttt gcagatgtga 240 agggccagtg aaccttggac ccagatggtt gcttaatact cctttccccc tccctcactc 300 cttcctttgc gggctgcctc acctcctcca cccttcttgc ttaaatccat aggcatttgt 360 ctggccttcc cttttactgc tggctgggaa ggaggagcat cagaccacag atcctggaag 420 gcacttctct ccctgactgc cgctcacact gccgtgagaa cctgcttata tccaggacca 480 aggaggcaat gccaggaagc tggtgaaggg tttcctctcc tccaccatgg ttgacagcac 540 tgagtatgaa gtggcctccc agcctgaggt ggaaacctcc cctttgggtg atggggccag 600 cccagggccg gagcaggtga agctgaagaa ggagatctca ctgcttaacg gcgtgtgcct 660 gattgtgggg aacatgatcg gctcaggcat ctttgtttcc cccaagggtg tgctcatata 720 cagtgcctcc tttggtctct ctctggtcat ctgggctgtc gggggcctct tctccgtctt 780 tggggccctt tgttatgtgg aactgggcac caccattaag aaatctgggg ccagctatgc 840 ctatatcctg gaggcctttg gaggattcct tgctttcatc agactctgga cctccctgct 900 catcattgag cccaccagcc aggccatcat tgccatcacc tttgccaact acatggtaca 960 gcctctcttc ccgagctgct tcgcccctta tgctgccagc cgcctgctgg ctgctgcctg 1020 catttgtctc ttaaccttca ttaactgtgc ctatgtcaaa tggggaaccc tggtacaaga 1080 tattttcacc tatgctaaag tattggcact gatcgcggtc atcgttgcag gcattgttag 1140 acttggccag ggagcctcta ctcattttga gaattccttt gagggttcat catttgcagt 1200 gggtgacatt gccctggcac tgtactcagc tctgttctcc tactcaggct gggacaccct 1260 caactatgtc actgaagaga tcaagaatcc tgagaggaac ctgcccctct ccattggcat 1320 ctccatgccc attgtcacca tcatctatat cttgaccaat gtggcctatt atactgtgct 1380 agacatgaga gacatcttgg ccagtgatgc tgttgctgtg acttttgcag atcagatatt 1440 tggaatattt aactggataa ttccactgtc agttgcatta tcctgttttg gtggcctcaa 1500 tgcctccatt gtggctgctt ctaggctttt ctttgtgggc tcaagagaag gccatctccc 1560 tgatgccatc tgcatgatcc atgttgagcg gttcacacca gtgccttctc tgctcttcaa 1620 tggtatcatg gcattgatct acttgtgcgt ggaagacatc ttccagctca ttaactacta 1680 cagcttcagc tactggttct 
ttgtggggct ttctattgtg ggtcagcttt atctgcgctg 1740 gaaggagcct gatcgacctc gtcccctcaa gctcagcgtt ttcttcccga ttgtcttctg 1800 cctctgcacc atcttcctgg tggctgttcc actttacagt gatactatca actccctcat 1860 cggcattgcc attgccctct caggcctgcc cttttacttc ctcatcatca gagtgccaga 1920 acataagcga ccgctttacc tccgaaggat cgtggggtct gccacaaggt acctccaggt 1980 cctgtgtatg tcagttgctg cagaaatgga tttggaagat ggaggagaga tgcccaagca 2040 acgggatccc aagtctaact aaacaccatc tggaatcctg atgtggaaag caggggtttc 2100 tggtctactg gctagagcta aggaagttga aaaggaaagc tcacttcttt ggaggcacct 2160 gtccagaagc ctggcctagg cagcttcaac ctttgaactt actttttgaa atgaaaagta 2220 atttatttgt tttgctacat actgttccag acttttaaag gggacaatga aggtgactgt 2280 ggggaggagc atgtcaggtt tgggcttggt tgttttagaa gcacctgggt gtgcctacct 2340 actcctcttt tcttttaaaa gggcccacaa tgctccaatt tcctgtctcc tttagagaga 2400 catgaaacta tcacaggtgc tggatgacaa taaaagttta tgttcct 2447
<210> 13
<211> 3514
<212> DNA
<213> Homo sapiens
<400> 13
ggaagcggaa gcggccctcg cgcacactag tcgtctggct ctctggctcc ggaagctgcg 60 ctccttcacc ctcctcgttg gtgtcctgtc accatggcgc tggctgtctt gcgggtcctg 120 gagccctttc cgaccgagac acccccgttg gcagtgctgc tgccacccgg gggcccgtgg 180 ccggcggcgg agctgggcct ggtgctggcc ctgaggcctg caggggagag cccggcaggg 240 ccggcgctgc tggtggcagc cctggagggg ccggacgcgg gcaccgaaga gcagggtccc 300 gggccgccgc agctactggt tagccgcgcg ctgctgcggc tcctggcact gggctccggg 360 gcctgggtgc gggcgcgggc ggtgcggcgg cccccggcgc taggttgggc actgcttggc 420 acctcgctgg ggcctgggct cggaccgcga gtcgggccgc tgctggtgag gcgcggagag 480 accctcccag tgcccggacc gcgggtgctg gagacgcggc cggcgttgca agggctgctg 540 ggcccaggga ctcggctggc tgtgactgag ctccgcgggc gggccagact gtgtccagag 600 tctggggaca gcagtcggcc cccacccccg cccgtggtgt cctcctttgc ggtttctggc 660 acagtgcggc gactccaggg agttctggga gggactggag attcactagg ggtgagccgg 720 agctgtctcc gtggccttgg cctcttccag ggcgaatggg tgtgggtggc ccaggccaga 780 gagtcatcga acacttcaca gccgcacttg gctagggtgc aggtcctaga acctcgctgg 840 gacctctctg atagactggg acccggctct ggaccgctgg gagagcccct cgctgacgga 900 ctggcgcttg tccctgccac tttggctttt aatcttggct gtgaccccct ggaaatggga 960 gagctcagaa ttcagaggta cttggaaggc tccatcgccc ctgaagacaa aggaagctgc 1020 tcattgctgc ctgggcctcc atttgccaga gagttacaca tcgaaattgt gtcttctccc 1080 cactacagca ctaatggaaa ttatgacggt gttctttacc ggcactttca gatacccagg 1140 gtagtccagg aaggggatgt tctatgtgtg ccaacaattg ggcaagtaga gatcctggaa 1200 ggaagtccag agaaactgcc caggtggcgg gaaatgtttt ttaaagtgaa gaaaacagtt 1260 ggggaagctc cagatggacc agccagtgcc tacttggccg acaccaccca tacctccttg 1320 tacatggtgg gttctaccct gagccctgtt ccatggctcc cttcagagga atccactctc 1380 tggagcagtt tgtctcctcc aggcctggag gccttggtgt ctgaactctg tgctgtcctg 1440 aagcctcgcc tccagccagg gggtgccctg ctgacaggaa ctagcagtgt ccttctacgg 1500 ggccccccag gctgtgggaa gaccacagta gttgctgctg cctgtagtca ccttgggctc 1560 cacttactga aggtgccctg ctccagcctc tgtgcagaaa gtagtggggc tgtggagaca 1620 aaactgcagg ccatcttctc ccgggcccgc cgttgccggc ctgcagtcct gttgctcaca 1680 gctgtggacc ttctgggccg 
ggaccgtgat gggctgggtg aggatgcccg tgtgatggct 1740 gtgctgcgtc acctcctcct caatgaggac cccctcaaca gctgccctcc cctcatggtt 1800 gtggccacca caagccgggc ccaggacctg cctgctgatg tgcagacagc atttcctcat 1860 gagctcgagg tgcctgctct gtcagagggg cagcggctca gcatcctgcg ggccctcact 1920 gcccaccttc ccctgggcca ggaggtgaac ttggcacagc tagcacggcg gtgtgcaggc 1980 tttgtggtag gggatctcta tgcccttctg acccacagca gccgggcagc ctgcaccagg 2040 atcaagaact caggtttggc aggtggcttg actgaggagg atgaggggga gctgtgtgct 2100 gccggctttc ctctcctggc tgaggacttt gggcaggcac tggagcaact gcagacagct 2160 cactcccagg ccgttggagc ccccaagatc ccctcagtgt cctggcatga tgtgggtggg 2220 ctgcaggagg tgaagaagga gatcctggag accattcagc tccccctgga gcaccctgag 2280 ctactgagcc tgggcctgag acgctcaggc cttctgctcc atgggccccc tggcaccggc 2340 aagacccttc tggccaaggc agtagccact gagtgcagcc ttaccttcct cagcgtgaag 2400 gggccagagc tcattaacat gtatgtgggc caaagtgagg agaatgtgcg ggaagtgttt 2460 gccagggcca gggctgcagc tccatgcatt atcttctttg atgaactgga ctctttggcc 2520 ccaagccggg ggcgaagtgg agattctgga ggagtgatgg acagggtggt gtctcagctc 2580 cttgccgagc tagatgggct gcacagcact caggatgtgt ttgtgattgg agccaccaac 2640 agaccagatc tcctggaccc tgcccttctg cggcctggca gatttgacaa gctggtgttt 2700 gtgggggcaa atgaggaccg ggcctcccag ctacgcgttc taagtgccat cacacgcaaa 2760 ttcaagctag agccatctgt gagcctggta aacgtgctag attgctgccc tccccagctg 2820 acgggcgcgg acctctactc tctctgctct gatgctatga cagctgccct caaacgcagg 2880 gttcatgacc tggaggaagg gctggagcca ggtagctcag cactgatgct caccatggag 2940 gacttgctgc aggctgccgc ccggctgcaa ccctcagtca gtgagcagga gctgctccgg 3000 tacaagcgca tccagcgcaa gtttgctgcc tgctaggagc cccccagggt ctgggacccc 3060 gctcagcatg gctgcaggta ccttgatagc ccacagagag atctgggaag gaagggctcc 3120 tcctcaggct gctgccaacc cacctggagg ccacctccct ccaggagatc ccagggtgca 3180 aagtggcatt gagacagcag caacagctca agagatatct cctgcctact tgcccctcct 3240 tccaggccgg ctctaagaga aaggcccatc tactcaggaa gagggccagg gccttgggtt 3300 ctggggattg ggccctgaga gggctagttc tgtggctgaa aataaagcat gtcccgcccc 3360 ctactggtgt tgtggcatga aaggttggag 
tgagaaagag cagggttgtg ggaggagtaa 3420 gccctgggag agggtggggg ggtgggcagc atgggagcct gatccttctg acataaataa 3480 acacaatgca tgcaggaaaa aaaaaaaaaa aaaa 3514
<210> 14
<211> 5421
<212> DNA
<213> Homo sapiens
<400> 14
gcgcactgct tatttcccgc tgtcaggatg aggaggcgga ggtcggcggt cgggtccgtc 60 tctgcccgcg gctgtggcgg cgccggcgga tccagcctta gcggtttctc tctgggcggc 120 ggcggcggcg gctcggttga cgcctcctcc gccagctgag cccgcgggag cccaggacgc 180 cgcttccccg cccatccccg ctccccgagg ccggccgcct ggtcatggcg cagccgggcc 240 cggcttccca gcctgacgtt tctcttcagc aacgggtagc agaattggaa aaaattaatg 300 cagaattttt acgtgcacaa cagcagcttg aacaagaatt taatcaaaag agagcaaaat 360 ttaaggagtt atatttggct aaagaggagg atctgaagag gcaaaatgca gtattacaag 420 ctgcacaaga tgatttggga caccttcgaa cccagctgtg ggaagctcaa gcagagatgg 480 agaatattaa ggcgattgcc acagtctctg agaacaccaa gcaagaagct atagatgaag 540 tgaaaagaca gtggagagaa gaagttgctt cacttcaggc tgttatgaaa gaaacagttc 600 gtgactatga gcaccagttc caccttaggc tggagcagga gcgaacacag tgggcacagt 660 atagagaatc cgcagagagg gaaatagctg atttaagaag aaggctgtct gaaggtcaag 720 aggaggaaaa tttagaaaat gaaatgaaaa aggcccaaga ggatgctgag aaacttcggt 780 ccgttgtgat gccaatggaa aaggaaattg cagctttgaa ggataaactg acagaggctg 840 aagacaaaat taaagagctg gaggcctcaa aggttaaaga actgaatcat tatctggaag 900 ctgagaaatc ttgtaggact gatctagaga tgtatgtagc tgttttgaat actcagaaat 960 ctgttctaca ggaagatgct gagaaactgc ggaaagaatt gcatgaagtt tgccatctct 1020 tggagcaaga gcgacaacaa cacaaccagt taaaacatac gtggcagaag gccaatgacc 1080 agtttctgga atctcagcgt ttactgatga gagacatgca gcgaatggag attgtgctaa 1140 cttcagaaca gctccgacaa gttgaagaac tgaagaagaa agatcaggag gatgatgaac 1200 aacaaagact caataagaga aaggatcaca aaaaagcaga tgttgaggaa gaaataaaaa 1260 taccagtagt gtgtgcttta actcaagaag aatcttcagc ccagttatca aatgaagagg 1320 agcatttaga cagcacccgt ggctcagttc attccttaga tgcaggcttg ctgttgccat 1380 ctggagatcc tttcagtaaa tcggacaatg acatgtttaa agatggactc aggagagcac 1440 agtctacaga cagcttggga acctcgggct cattgcaatc caaagcttta ggctataact 1500 acaaagcaaa atctgctgga aacctggacg agtcagattt tggaccactg gtaggagcag 1560 attcagtgtc tgagaacttt gatactgcat cccttgggtc actccagatg ccaagtgggt 1620 ttatgttaac caaagatcag gaaagagcaa tcaaggcgat gacaccagaa caagaagaga 1680 cagcgtccct cctctccagc 
gttacccagg gcatggagag tgcctatgtg tcccctagtg 1740 gttatcgttt agttagtgaa acagaatgga atctcttgca gaaagaggta cataatgctg 1800 gaaataaact tggtagacgt tgtgatatgt gttccaatta cgaaaaacag ttacaaggaa 1860 ttcagattca ggaggctgaa acgagagacc aggtgaaaaa actacagctg atgctaaggc 1920 aagctaatga ccagttagag aagacaatga aagataagca ggagctggaa gacttcataa 1980 agcaaagcag cgaagattcg agtcaccaga tctctgcact cgtcctaaga gcccaggcct 2040 ccgagatctt acttgaagag ttacagcagg ggctttccca ggcaaagagg gatgttcagg 2100 aacagatggc ggtgctgatg cagtcacggg aacaggtttc agaagagctg gtgaggttac 2160 agaaagataa tgacagtctc cagggaaagc acagcctgca tgtgtcatta cagcaagcag 2220 aagacttcat cctcccagac actacagagg cactgcggga gttggtatta aaataccgtg 2280 aggacatcat taatgtgcgg acagcagcag accacgtaga agaaaagctg aaggctgaga 2340 tacttttcct aaaagagcag atccaagcag aacagtgttt aaaagaaaat cttgaagaaa 2400 ctctgcaact agaaatagaa aactgcaagg aggaaatagc ttctatttct agcctaaaag 2460 ctgaattaga aagaataaaa gtggaaaaag gacagttgga gtccacatta agagagaagt 2520 ctcaacagct tgagagtctt caggaaataa agatcagttt ggaagagcag ttaaagaaag 2580 agactgctgc taaggctacc gttgaacaac taatgtttga agagaagaat aaagctcaga 2640 gattacagac agaattagat gtcagtgagc aagtccagag ggattttgta aagctttcac 2700 agacccttca ggtgcagtta gagcggatcc ggcaagctga ctccttggag agaatccggg 2760 caattctgaa tgatactaaa ctgacagaca ttaaccagct tcctgagaca tgacaccctc 2820 atggcaggat tctagcctgc actttgggtt tttaactcat ctttagagca acagtaatta 2880 ttatttaact cttaactgaa gaaagagaag tcacaacaaa aggaagactg gagaaatgct 2940 tacttctaga gggagaagac tgtgcggcac aggaaacagc aaacagtggg gtgatctgca 3000 gcccagagac cttcaaatgc gaacactata aactccaggc ttgattccaa caggcgtggg 3060 atcagatttg gtgatggaaa aagcgctgtt tccttgcctg ctttctccaa gacagatttt 3120 cggaacacat ttccctaccc taaagcgaca tcccagtagt gtttggaatt ttctgttcat 3180 agatattgga agagagaatt tgctttatct gttgtctaga gctcatcagt aacagtattc 3240 agttagcaga cagaagtgta gtatgctgta tgaatatttt atattaaaat atgaaggttt 3300 gagagcaggc cattggctac tgactgtatt ttgttttcct tttaccattt tatcttatct 3360 tgctttggag gaccttaaat gctactgaaa 
cttacctgag aaccatagga ctgtggcaaa 3420 caaaaccaat tcatgaagtg aattaatgat gaggatctga aaacttggct ggggctatat 3480 atactggaag ttgaaggtta aaggaaaagc acaagaaact ttggaagcac tttttctgca 3540 tacttagatg atcttcatgg gcccacaggg taccaggata aagccgaact ggtaccatct 3600 actcttttga agtgttttag tctagttatt ggatcagtga aaaacattag tatacgtttt 3660 taaataggct aatttttcaa cttggatcat taggcttacg tactacttgt ttcaaatgtg 3720 tcaaatacaa aaatggtaac taggttgaca gatactttgt atttttcttt tgaattcaga 3780 cctggaatgt aagtaagtga caatgcttat ggaaagccag ttagttagaa ttggaaatct 3840 gtcttgtcat tttacaagca ttagattcct ttcctgtgtg aagaaagcct cagtgaaaca 3900 ggtctttgcc ataactttat gaagtgctac agaaagcaca aagaattgat tcatgttcat 3960 caatacctgc tgagagtact gtcccaggaa tatccagtgg atggattcat catccaggag 4020 gttcaaaagt aagatggttt tcaaatcatt tttgagactg gttgcataac agcagggtac 4080 ctgaaagagc cttctgggag ttagtgaact aggtagattg ttttgttcac ataacgccac 4140 catcaactta aagtgaattg tctttgttat aaatgaggtc actatggact taccctaaag 4200 atcttctgta cttctctctt ccataggaca aatgataagt actacatacc tcatctcttg 4260 ggttattatt gtagtcttgc attcatggtt atgaatttaa aaataaatac caattatgga 4320 aatagtacta aaggcttgcc gcacatgaaa cattatttta attggtttaa agtcccttta 4380 taaagagtgc tacatggttt agataaagga aacatataac tattgagtta caggggattt 4440 tattaattat aaaatgcaat caatttaaat tacgtaggtt taagactagt cccttggata 4500 agccccaagc gaatttgtct tcagattatt aaaattagtg ctgtaaatca gggtgggcaa 4560 ttcacagcct ttctgaactg actgaactag agcttgcagt gaagtgttct gctgagactg 4620 agcaccttac agatattttt ctccagaaga tggtgctggg taataaaatc atcacaatta 4680 gggaatggtt agtggtctct actgtggcaa atgccaactg ttggaattca ctttattgta 4740 gaaaaaccca aactgagact cttaagtttt gtttagcaat gtgtttctgg tatgaaacaa 4800 actactgtgt cactgtccag gtaggaaaca attctttcaa ctgggttttc agcataaatg 4860 ggaactgatg tagaaggcag gatttagccc ttctaggcaa aagaaaagct cagttgggtt 4920 tcacgagtgt tcctgtgctt atattcagtc tgtgcctaca tgttctcatg catgtctaac 4980 ctgatttacc tcttacctgt aacctacctt atcatgtggc ttttaattga cagtcactca 5040 gccatttcta agcagatata gtagtacctt tcagaactca 
cattggcaag tgtaaaaaga 5100 tgacttaagg tgaagtgagg acaaaatcac attctgcata ctaacctatt tttttctccc 5160 tttaaggtgc taaacttgca cctcatgtcc actcagtaac aagtattggg acgtagagca 5220 cagcctcact cagctctgaa aggtaataca gcttgtgagg aagtgagcca gcagtggcct 5280 ttgcaattgt ggatcttgag ctctgctctc agcagatttc aggtgtaacc atttgttaac 5340 tgtactgaag gtgtgtcctc aagaagaaag tgttcaaatt aaaaaagctg ctgccaagta 5400 caaaaaaaaa aaaaaaaaaa a 5421
<210> 15
<211> 1582
<212> DNA
<213> Homo sapiens
<400> 15
cctcagtcag cccccggggg aggccatgaa cgccacgggg accccggtgg cccccgagtc 60 ctgccaacag ctggcggccg gcgggcacag ccggctcatt gttctgcact acaaccactc 120 gggccggctg gccgggcgcg gggggccgga ggatggcggc ctgggggccc tgcgggggct 180 gtcggtggcc gccagctgcc tggtggtgct ggagaacttg ctggtgctgg cggccatcac 240 cagccacatg cggtcgcgac gctgggtcta ctattgcctg gtgaacatca cgctgagtga 300 cctgctcacg ggcgcggcct acctggccaa cgtgctgctg tcgggggccc gcaccttccg 360 tctggcgccc gcccagtggt tcctacggga gggcctgctc ttcaccgccc tggccgcctc 420 caccttcagc ctgctcttca ctgcagggga gcgctttgcc accatggtgc ggccggtggc 480 cgagagcggg gccaccaaga ccagccgcgt ctacggcttc atcggcctct gctggctgct 540 ggccgcgctg ctggggatgc tgcctttgct gggctggaac tgcctgtgcg cctttgaccg 600 ctgctccagc cttctgcccc tctactccaa gcgctacatc ctcttctgcc tggtgatctt 660 cgccggcgtc ctggccacca tcatgggcct ctatggggcc atcttccgcc tggtgcaggc 720 cagcgggcag aaggccccac gcccagcggc ccgccgcaag gcccgccgcc tgctgaagac 780 ggtgctgatg atcctgctgg ccttcctggt gtgctggggc ccactcttcg ggctgctgct 840 ggccgacgtc tttggctcca acctctgggc ccaggagtac ctgcggggca tggactggat 900 cctggccctg gccgtcctca actcggcggt caaccccatc atctactcct tccgcagcag 960 ggaggtgtgc agagccgtgc tcagcttcct ctgctgcggg tgtctccggc tgggcatgcg 1020 agggcccggg gactgcctgg cccgggccgt cgaggctcac tccggagctt ccaccaccga 1080 cagctctctg aggccaaggg acagctttcg cggctcccgc tcgctcagct ttcggatgcg 1140 ggagcccctg tccagcatct ccagcgtgcg gagcatctga agttgcagtc ttgcgtgtgg 1200 atggtgcagc caccgggtgc gtgccaggca ggccctcctg gggtacagga agctgtgtgc 1260 acgcagcctc gcctgtatgg ggagcaggga acgggacagg cccccatggt cttcccggtg 1320 gcctctcggg gcttctgacg ccaaatgggc ttcccatggt caccctggac aaggaggcaa 1380 ccaccccacc tccccgtagg agcagagagc accctggtgt gggggcgagt gggttcccca 1440 caaccccgct tctgtgtgat tctggggaag tcccggcccc tctctgggcc tcagtagggc 1500 tcccaggctg caaggggtgg actgtgggat gcatgccctg gcaacattga agttcgatca 1560 tggtaaaaaa aaaaaaaaaa aa 1582
<210> 16
<211> 651
<212> DNA
<213> Homo sapiens
<400> 16
tttcacattc gggaagcgtc gggattaggt gaaagtacgt agttgtcttt cgtaagttaa 60 aatgataatt gggccgaaac ttactgcctt acctaaaagg cagcgcagtc aggatattgg 120 taggtcgggg gcggctttgg aaacccttaa gtttacaagc atgcgcggac ttgagtgctc 180 attaggtcgc cgggcgtcca cgtgcagccc tggaccctga accccggcgt gcgtgggccg 240 tgggccctcg gggaaaggtt ccgtgcactc ggggactccg gtgaagcctg ttcagccgtc 300 tgtgtcatgt ggccatcttg agtctactct gtcgctcttg tgccctagca ccccgagaac 360 cgtcagtttg agccagatgg aagctgagct gaacacatta cgatggatga tggaaacata 420 agactatcaa gaaatccaag tggtaatggg cgaagtttat tcagcatccg gcaatgaact 480 tatcgtagtt ggggaaacgg gtgttccgaa taatatcctg gaagttatca ggacacctat 540 tttaaatata ggcctgaatt ttgtaaagta atatttaagg tggtccgtga taattaaata 600 aaatgcttaa ttcatgtgac aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa a 651
<210> 17
<211> 6363
<212> DNA
<213> Homo sapiens
<400> 17
gcggccgcgg ggaccctcgg cgtggtcctc tgaccctgca aacccgcgac ggaggaaggg 60 gaggtcctgc ccgaggcgcc agcccaagga ggaggatgcc catttaaccc gccctcgcct 120 gccgggcgct tgcctcggtg cccgccgccg gagcctccga gccgcgcccg tggaagtgct 180 gcatggggca gggctgctga agcgcggagt tcggggtcgc gccgctccca ggcaggcgcg 240 ggagcccggt gcggcagttg gcacagtttc ggcggcgcct tctgcgcggg agtggggggc 300 gcggtgcgcc cggccggcct ccgcggtgcc ctggtgaggc gagagttatg gagccgccca 360 gctgcattca ggatgagccg ttcccgcacc ccctggagcc cgagccgggc gtctcagctc 420 agcccggccc cgggaagcca agcgataagc ggttccggct gtggtacgtt ggggggtcgt 480 gcctggacca caggaccacg ctgcctatgc tgccctggct catggccgag atccgcaggc 540 gcagccagaa gcccgaggcg ggcggctgcg gggcgccggc ggcccgagag gtgatcctgg 600 tgctcagcgc gcccttcctg cgttgcgtcc ccgcgccggg cgctggggcc tcggggggca 660 ctagtccgtc ggccacgcag cccaacccgg cggtattcat cttcgagcac aaggcgcagc 720 atatctcgcg cttcatccac aacagccacg acctcaccta ctttgcctac ctgatcaagg 780 cgcagcccga cgaccccgag tcgcagatgg cctgccacgt tttccgcgcc acagacccca 840 gccaggttcc tgatgttatt agcagcataa ggcaattatc taaagcggcc atgaaagagg 900 atgccaaacc cagcaaagat aatgaggacg ccttttacaa ctctcagaag ttcgaagtcc 960 tgtactgtgg aaaggtgacc gtgacccaca agaaggcccc ctcaagcctc atcgatgact 1020 gcatggagaa gttcagcctg cacgaacagc agcgcctgaa gatccaaggg gagcagcgcg 1080 gtccggaccc aggagaggac ctggctgact tggaggtggt ggtgcccggg tcccccggag 1140 actgcctgcc ggaggaggct gacggcaccg acacccacct tggcttacct gccggggcca 1200 gccagcctgc cctgaccagc tctcgggtct gcttccctga gcggattttg gaagattctg 1260 gctttgatga gcagcaggag tttcggtctc ggtgcagcag tgtcaccggc gtgcaacgga 1320 gagttcacga gggcagccag aaatcccagc cgcgacggag acacgcgagc gcacccagtc 1380 acgtccagcc ctcggactcg gagaagaaca ggaccatgct cttccaggtt gggcgatttg 1440 agattaacct tatcagtcca gacactaaat cagttgtgct agaaaagaat tttaaagata 1500 tctcctcttg ttctcagggt ataaagcatg tggatcactt tggctttatc tgccgggagt 1560 ctccagagcc tggacttagc cagtatattt gttatgtatt ccagtgtgcc agcgaatctc 1620 tggttgatga ggtaatgctg actctgaaac aggccttcag tacggcggct gccctgcaga 1680 gtgccaagac gcagattaaa 
ctgtgtgagg cctgcccgat gcactctttg cataagctct 1740 gtgaaaggat tgaaggtctc tacccaccaa gagccaagct ggtgatacag aggcatctct 1800 catcactgac agataatgag caagctgaca tctttgaaag agttcagaaa atgaagccag 1860 tcagtgacca ggaagaaaat gaacttgtga ttttacacct gaggcagctg tgtgaagcca 1920 agcagaagac acacgtgcac atcggggaag gcccttctac tatttcaaat agtacaatcc 1980 cagaaaatgc aacaagcagt ggaaggttca aacttgacat tctgaaaaat aaagctaaga 2040 gatccttaac tagctccctg gaaaatatct tctcaagggg agctaacaga atgagaggtc 2100 ggcttggaag tgtggacagt tttgaacggt ccaacagtct tgcttcagag aaggactact 2160 caccagggga ttctccacca gggacaccgc cagcgtcccc accgtcctca gcttggcaaa 2220 cgtttcccga agaggattcc gactccccgc agtttcgaag acgggcacac acgttcagcc 2280 acccaccttc aagcacaaag agaaagctga atttgcagga tgggagggct cagggtgtgc 2340 gttcccctct gctgaggcag agctccagtg aacagtgcag caatctttcg tcagttcgac 2400 gcatgtacaa ggagagtaat tcttcctcca gtcttccaag tcttcacact tccttctctg 2460 ccccttcctt cactgccccc tctttcctga aaagctttta ccagaattca ggtagactgt 2520 ccccacagta tgaaaatgaa atcagacaag acactgcttc agaatcaagt gatggagaag 2580 ggagaaaaag gacctcatct acctgcagca atgagtccct aagtgtggga ggaacctctg 2640 tcactcctcg ccggatctcc tggcggcagc gcattttcct cagggttgct tctcccatga 2700 acaaatctcc ctcagcaatg caacagcaag atggattgga caggaacgag ctgctgccac 2760 tgtcccccct ctctccaacc atggaggagg aaccgctggt tgtattcctg tctggggagg 2820 atgacccaga aaagattgaa gaaagaaaga aatcaaaaga actgaggagc ttgtggagaa 2880 aagctataca ccaacaaatc ttgttacttc gaatggaaaa agaaaaccag aaacttgaag 2940 caagcagaga tgaactccag tccagaaaag ttaaattaga ctatgaagaa gttggtgcat 3000 gtcagaaaga ggtcttaata acttgggata agaagttgtt aaactgcaga gctaaaatca 3060 gatgtgatat ggaagatatt catactcttc ttaaagaagg agttcccaaa agtcgacgag 3120 gagaaatttg gcagtttctg gctttacagt accgactcag acacagattg cctaataaac 3180 aacagcctcc tgacatatcc tataaggaac ttttgaagca gctcactgct cagcagcatg 3240 cgattctcgt ggatttagga aggacgtttc ctactcaccc ttacttttca gtacagcttg 3300 ggccaggaca gctgtcactg tttaacctcc tgaaagccta ttctttgctg gacaaagaag 3360 tgggatactg tcaggggatc agctttgtgg 
ctggagtcct gcttctgcac atgagtgaag 3420 agcaagcctt tgaaatgctg aaattcctca tgtatgacct cggcttccgc aagcagtaca 3480 gacctgacat gatgtcgctg cagattcaaa tgtaccagct gtccaggctc cttcatgact 3540 atcacagaga tctctacaat caccttgaag aaaatgaaat cagccccagt ctttatgctg 3600 ccccctggtt cctcacattg tttgcctctc agttttcatt aggatttgta gccagagttt 3660 ttgatattat ttttcttcag ggaactgaag ttatattcaa ggttgcactc agcctactga 3720 gcagccaaga gacacttata atggaatgtg agagctttga aaatattgtt gagtttctta 3780 aaaacacgct acctgatatg aatacctctg aaatggaaaa aattattacc caggtttttg 3840 agatggatat ttctaagcag ttgcatgcct atgaggtgga atatcatgtg ctacaggatg 3900 agcttcagga atcttcatat tcctgtgagg atagtgaaac tttggagaag ctggagaggg 3960 ccaatagcca actgaaaaga caaaacatgg acctcctaga aaaattacag gtagctcata 4020 ctaaaatcca ggccttggaa tcaaacctgg aaaatctttt gacgagagag accaaaatga 4080 agtctttaat ccggaccctg gaacaagaaa aaatggctta tcaaaagaca gtggagcaac 4140 tccggaagct gctgcccgcg gatgctctag tcaattgtga cctgttgctg agagacctaa 4200 actgcaaccc taacaacaaa gccaagatag gaaataagcc ataattgaag aggcacggcc 4260 tcagcagaaa gtgctcctta gaatactaca gagaggaaga gcctgcatgt cgctggccca 4320 aggctggacc ctgaagctga tggaaccacc taatactggt gctgagcgcc tagtcacagc 4380 aggtggacct cgtgctcatc agagcatgcc aatcctaagc cattggacat atgtagactg 4440 gtttttgttg ttgctatgta catataaata tatatataaa atgaacatag ttcatgcttt 4500 cagataaaat gagtagatgt atatttagat taattttttt agtcagaact tcatgaaatc 4560 cacaccaaag gaaaggtaaa ctgaaatttc ccttggacat atgtgaaatc tttttgtctt 4620 tatagtgaaa caaagccaga gcatctttgt atattgcaat atacttgaaa aaaatgaatg 4680 tatttttttc tccaaagaac agcatgtttc actcaatggt gaaaaggtgg aaacatttat 4740 gtaactttat gtgtatctgt cttgatatct actgacattg tctatatgag gaaaatgatt 4800 actggtcatg ctcctgtgag ttttttggga aggtagggtc atttctccct gcctgctttg 4860 tgccaactag catgttgcat ctacatgcat tatgagtctg gttaggcatt actttaaaca 4920 tacataaaga gacagtagga cattgtggct gagtctaccc agctcaaggt aaaggagaat 4980 gttgctaatt ttttagcaaa ctagaccagc attattactc aaactaaaaa tatcacacct 5040 gaaaaattta atttaggacc taaaatgtct agattagctt 
tctgcttttt ttatttgaat 5100 aactcattca gttgtgaatg aattcctctt tatttggtgc cacagtcacc aaatgacaag 5160 gatttgccac tttcccacca aattgtgagt gcttgtaatt taggtctctc taccttaaat 5220 tcagtataag gaaacgtaat tatgattgat tttttccaaa gatgacaagc tgtgttgaaa 5280 tacatttttt cttttgacca attgacagaa tctaataagc tttaataatc ttcccctttt 5340 atgtgaaaag ttttgagaac tgtgaaatgt ttaggaacaa actgttgaaa tccattggaa 5400 gggaaaaaag aaagtggtac cagtgttacc agctcaacta aaacctgcaa ttctgcattt 5460 caactcttca cttcctcagc ctacaaatag ctcattagat gacattcacg catgctgggt 5520 ataggcaagg aaagtaattt tcaaagtaca tttgcagttc tctttttcag agatgattct 5580 atgatagtgc ctctgaaagt tgatgcagca tttttgcctt tccaaaaagt atttatcctc 5640 actgcttttt gcagtacttg tattttcaca gatggattat ctggggtaat tttcttcaaa 5700 gggagtttgt tatacacagt gaaaatgtat tatagagtag aatagtaaag ctctaggggt 5760 ttcagaaagc tttgatgaac agatgacaaa catctgaaac cccctccgca ctgttaccca 5820 gtgtgtatat aatgacttgt tatagctcag tgtgcccttg aatccataca gtttcttaaa 5880 agacaataaa atcttattaa taaagttaat gtaacttcta agttctagaa aatgctgatt 5940 ctgtctgccc cattcaattg ggggctacta attgatttgt tgcttggatt tcctgagaat 6000 ttctctattt gtaggagggg ttttttcttt ttacggtctg ttgatgacaa ttactttatg 6060 ggtgtgatgc accgatggta gccaaggaat ctgttgggga agttcggaaa gaaacctttt 6120 ctttctttta ttcagtttaa agtaaacttt atcctggatg tttagaatca acattaagag 6180 ttatattatg gtgttcagag attaagctga cttggataca atattttctt ttgaaaatga 6240 attttctttt tcatttgtga tttttaaaaa atgttgcacc agttatgctt catgcatcgt 6300 tacatcttca tcaggttaat gtaatgtcta gttcctttgc aataaatata ttgctgcagc 6360 ttt 6363
<210> 18
<211> 2388
<212> DNA
<213> Homo sapiens
<220>
<223> n is a, c, g, or t
<400> 18
acagaaagga acagcaactg gaaaacaagc attgcattgc accaggatgt ctgtgaaatg 60 gacttcagta attttgctaa tacaactgag cttttgcttt agctctggga attgtggaaa 120 ggtgctggtg tgggcagcag aatacagcca ttggatgaat ataaagacaa tcctggatga 180 gcttattcag agaggtcatg aggtgactgt actggcatct tcagcttcca ttctttttga 240 tcccaacaac tcatccgctc ttaaaattga aatttatccc acatctttaa nnnnnnnnnn 300 nnnnnnnnnn tggtcagacc ttccaaaaga tacattttgg ttatattttt cacaagtaca 360 ggaaatcatg tcaatatttg gtgacataac tagaaagttc tgtaaagatg tagtttcaaa 420 taagaaattt atgaaaaaag tacaagagtc aagatttgac gtcatttttg cagatgctat 480 ttttccctgt agtgagctgc tggctgagct atttaacata ccctttgtgt acagtctcag 540 cttctctcct ggctacactt ttgaaaagca tagtggagga tttattttcc ctccttccta 600 cgtacctgtt gttatgtcag aattaactga tcaaatgact ttcatggaga gggtaaaaaa 660 tatgatctat gtgctttact ttgacttttg gttcgaaata tttgacatga agaagtggga 720 tcagttttat agtgaagttc taggaagacc cactacatta tctgagacaa tggggaaagc 780 tgacgtatgg cttattcgaa actcctggaa ttttcagttt cctcatccac tcttaccaaa 840 tgttgatttt gttggaggac tccactgcaa acctgccaaa cccctgccta aggaaatgga 900 agactttgta cagagctctg gagaaaatgg tgttgtggtg ttttctctgg ggtcaatggt 960 cagtaacatg acagaagaaa gggccaacgt aattgcatca gccctggccc agatcccaca 1020 aaaggttctg tggagatttg atgggaataa accagatacc ttaggtctca atactcggct 1080 gtacaagtgg ataccccaga atgaccttct aggtcatcca aagaccagag cttttataac 1140 tcatggtgga gccaatggca tctacgaggc aatctaccat gggatcccta tggtggggat 1200 tccattgttt gccgatcaac ctgataacat tgctcacatg aaggccaggg gagcagctgt 1260 tagagtggac ttcaacacaa tgtcgagtac agacttgctg aatgcattga agagagtaat 1320 taatgatcct tcatataaag agaatgttat gaaattatca agaattcaac atgatcaacc 1380 agtgaagccc ctggatcgag cagtcttctg gattgaattt gtcatgcgcc acaaaggagc 1440 taaacacctt cgggttgcag cccacgacct cacctggttc cagtaccact ctttggatgt 1500 gattgggttc ctgctggtct gtgtggcaac tgtgatattt atcgtcacaa aatgttgtct 1560 gttttgtttc tggaagtttg ctagaaaagc aaagaaggga aaaaatgatt agttatatct 1620 gagatttgaa gctggaaaac ctgataggtg agactacttc agtttattcc agcaagaaag 1680 attgtgatgc aagatttctt 
tcttcctgag acaaaaaaaa aaaaaagaaa aaaaaatctt 1740 ttcaaaattt actttgtcaa ataaaaattt gtttttcaga gatttaccac ccagttcatg 1800 gttagaaata ttttgtggca atgaagaaaa cactacggaa aataaaaaat aagataaagc 1860 cttatgagct cgtattgaaa tttgttgaac ttatattgaa atttgttgtt ctaattcaca 1920 gtatactcac aaaaaaattt actcagctta actatatttc acacattgta cataaacaca 1980 agatcattaa gaagtccact gacagtatca gtactgtttt gcacatactc agaataattt 2040 ggcttcattt tgaacaggat tgtattgttt taactgctgc taaagaaact attacatagt 2100 taaattgtac agaaagtctc tcttcctttt gatattttaa gatgagtagt attgcttggc 2160 ttttataatg catgcagctt tattctcata tttttcctaa aatttatggc caagtgttta 2220 ctgttttaga gcttaagtca tttctcagtg gaaattatgg ggaattagaa atacagcaac 2280 tcttaccttc ttcctactgt aaaattgaac tattttgtaa catctttggt ttcatgagcc 2340 aattctattt tttctagata tttaaaaata tacatctggt tgacttta 2388
<210> 19
<211> 3755
<212> DNA
<213> Homo sapiens
<400> 19
aacccggtcc ctccctctcc gcactagctg tctgccctgc cctgccgtag gagatgggct 60 gggagcctcc cacgctctcc agctcactcg gcaggcagcg gggaccaggg ctggcaggtt 120 aagcctctgg gggtggatcc tgaaaggtgg tccagccgcc tggccctgcg tgggaccctc 180 cacctggcag caggtggtga cttccaagag tgactccgtc ggaggaaaat gactccccag 240 tcgctgctgc agacgacact gttcctgctg agtctgctct tcctggtcca aggtgcccac 300 ggcaggggcc acagggaaga ctttcgcttc tgcagccagc ggaaccagac acacaggagc 360 agcctccact acaaacccac accagacctg cgcatctcca tcgagaactc cgaagaggcc 420 ctcacagtcc atgccccttt ccctgcagcc caccctgctt cccgatcctt ccctgacccc 480 aggggcctct accacttctg cctctactgg aaccgacatg ctgggagatt acatcttctc 540 tatggcaagc gtgacttctt gctgagtgac aaagcctcta gcctcctctg cttccagcac 600 caggaggaga gcctggctca gggccccccg ctgttagcca cttctgtcac ctcctggtgg 660 agccctcaga acatcagcct gcccagtgcc gccagcttca ccttctcctt ccacagtcct 720 ccccacacgg ccgctcacaa tgcctcggtg gacatgtgcg agctcaaaag ggacctccag 780 ctgctcagcc agttcctgaa gcatccccag aaggcctcaa ggaggccctc ggctgccccc 840 gccagccagc agttgcagag cctggagtcg aaactgacct ctgtgagatt catgggggac 900 atggtgtcct tcgaggagga ccggatcaac gccacggtgt ggaagctcca gcccacagcc 960 ggcctccagg acctgcacat ccactcccgg caggaggagg agcagagcga gatcatggag 1020 tactcggtgc tgctgcctcg aacactcttc cagaggacga aaggccggag cggggaggct 1080 gagaagagac tcctcctggt ggacttcagc agccaagccc tgttccagga caagaattcc 1140 agccaagtcc tgggtgagaa ggtcttgggg attgtggtac agaacaccaa agtagccaac 1200 ctcacggagc ccgtggtgct cactttccag caccagctac agccgaagaa tgtgactctg 1260 caatgtgtgt tctgggttga agaccccaca ttgagcagcc cggggcattg gagcagtgct 1320 gggtgtgaga ccgtcaggag agaaacccaa acatcctgct tctgcaacca cttgacctac 1380 tttgcagtgc tgatggtctc ctcggtggag gtggacgccg tgcacaagca ctacctgagc 1440 ctcctctcct acgtgggctg tgtcgtctct gccctggcct gccttgtcac cattgccgcc 1500 tacctctgct ccaggaggaa acctcgggac tacaccatca aggtgcacat gaacctgctg 1560 ctggccgtct tcctgctgga cacgagcttc ctgctcagcg agccggtggc cctgacaggc 1620 tctgaggctg gctgccgagc cagtgccatc ttcctgcact tctccctgct cacctgcctt 1680 tcctggatgg gcctcgaggg 
gtacaacctc taccgactcg tggtggaggt ctttggcacc 1740 tatgtccctg gctacctact caagctgagc gccatgggct ggggcttccc catctttctg 1800 gtgacgctgg tggccctggt ggatgtggac aactatggcc ccatcatctt ggctgtgcat 1860 aggactccag agggcgtcat ctacccttcc atgtgctgga tccgggactc cctggtcagc 1920 tacatcacca acctgggcct cttcagcctg gtgtttctgt tcaacatggc catgctagcc 1980 accatggtgg tgcagatcct gcggctgcgc ccccacaccc aaaagtggtc acatgtgctg 2040 acactgctgg gcctcagcct ggtccttggc ctgccctggg ccttgatctt cttctccttt 2100 gcttctggca ccttccagct tgtcgtcctc taccttttca gcatcatcac ctccttccaa 2160 ggcttcctca tcttcatctg gtactggtcc atgcggctgc aggcccgggg tggcccctcc 2220 cctctgaaga gcaactcaga cagcgccagg ctccccatca gctcgggcag cacctcgtcc 2280 agccgcatct aggcctccag cccacctgcc catgtgatga agcagagatt cggcctcgtc 2340 gcacactgcc tgtggccccc gagcccggcc cagccccagg ccagtcagcc gcagactttg 2400 gaaagcccaa cgaccatgga gagatgggcc gttgccatgg tggacggact cccgggctgg 2460 gcttttgaat tggccttggg gactactcgg ctctcactca gctcccacgg gactcagaag 2520 tgcgccgcca tgctgcctag ggtactgtcc ccacatctgt cccaacccag ctggaggcct 2580 ggtctctcct tacaacccct gggcccagcc ctcattgctg ggggccaggc cttggatctt 2640 gagggtctgg cacatcctta atcctgtgcc cctgcctggg acagaaatgt ggctccagtt 2700 gctctgtctc tcgtggtcac cctgagggca ctctgcatcc tctgtcattt taacctcagg 2760 tggcacccag ggcgaatggg gcccagggca gaccttcagg gccagagccc tggcggagga 2820 gaggcccttt gccaggagca cagcagcagc tcgcctacct ctgagcccag gccccctccc 2880 tccctcagcc ccccagtcct ccctccatct tccctggggt tctcctcctc tcccagggcc 2940 tccttgctcc ttcgttcaca gctgggggtc cccgattcca atgctgtttt ttggggagtg 3000 gtttccagga gctgcctggt gtctgctgta aatgtttgtc tactgcacaa gcctcggcct 3060 gcccctgagc caggctcggt accgatgcgt gggctgggct aggtccctct gtccatctgg 3120 gcctttgtat gagctgcatt gcccttgctc accctgacca agcacacgcc tcagaggggc 3180 cctcagcctc tcctgaagcc ctcttgtggc aagaactgtg gaccatgcca gtcccgtctg 3240 gtttccatcc caccactcca aggactgaga ctgacctcct ctggtgacac tggcctaggg 3300 cctgacactc tcctaagagg ttctctccaa gcccccaaat agctccaggc gccctcggcc 3360 gcccatcatg gttaattctg tccaacaaac 
acacacgggt agattgctgg cctgttgtag 3420 gtggtaggga cacagatgac cgacctggtc actcctcctg ccaacattca gtctggtatg 3480 tgaggcgtgc gtgaagcaag aactcctgga gctacaggga cagggagcca tcattcctgc 3540 ctgggaatcc tggaagactt cctgcaggag tcagcgttca atcttgacct tgaagatggg 3600 aaggatgttc tttttacgta ccaattcttt tgtcttttga tattaaaaag aagtacatgt 3660 tcattgtaga gaatttggaa actgtagaag agaatcaaga agaaaaataa aaatcagctg 3720 ttgtaatcac ctagcaaaaa aaaaaaaaaa aaaaa 3755
<210> 20
<211> 9395
<212> DNA
<213> Homo sapiens
<400> 20
ttggggagct cccgcccctt ccgcctcggc gccccgccca ggcctcgccc ctaggtgttc 60 ccgcccctcc ccctcccgtg tcgctcgctt tctgtcagcc tctctccctc tccctctccc 120 ctctccttcc tctcgcttcc tctctcgcac ctgagcgtac gcacctgccc gggcccggct 180 ccctcctcct ctcccctccc tctttccccg cccggccgcg ggagcctcgt ggctgcgtca 240 ccgccgcccc cccagacaag atggacaccg cggaggaagg ccgcatctac aagtgcttgt 300 ttactggctc cgtgagctca ctactgacgc tgccattaga tatgctgtca acggaaaatt 360 tgttggcaga ttgtttgcag ggttgttttg tggtgacgtg cacactgtgt gcattcatca 420 gcctggtgtg gttgagagag cagatagtcc atgggggagc accaatttgg ttggagcatg 480 ctgccccacc gttcaatgct gcggggcatc accaaaatga ggctccagca ggaggaaatg 540 gtgcagaaaa tgttgctgct gatcagcctg ctaacccacc agctgagaac gcagtggtgg 600 gggaaaaccc tgatgcccag gatgaccagg cagaagagga ggaggaggac aatgaggagg 660 aagatgacgc tggtgtggag gatgcggcag atgctaataa cggagcccag gatgacatga 720 attggaatgc tttagaatgg gaccgagctg ctgaagagct tacatgggaa agaatgctag 780 gacttgatgg atcactagtt tttctggaac atgtcttctg ggtggtatct ttaaatacac 840 tgttcattct tgtttttgca ttttgccctt accatattgg tcatttctcc cttgttggtt 900 tgggatttga agaacacgtc caagcatctc attttgaagg cctaatcaca accatagttg 960 ggtatatact tttagcaata acactgataa tttgtcatgg cttggcaact cttgtgaaat 1020 ttcatagatc tcgtcgctta ctgggagtct gctatattgt tgttaaggtc tctttgttag 1080 tggtggtaga aattggagta ttccctctca tttgtggttg gtggctggat atctgttcct 1140 tggaaatgtt tgatgctact ctgaaagatc gagaactgag ctttcagtcg gctccaggta 1200 ctaccatgtt tctgcattgg ctagtgggaa tggtatatgt cttctacttt gcctccttca 1260 ttctactact gagagaggta cttcgacctg gtgtcctgtg gtttctaagg aatttgaatg 1320 atccagattt caatccagta caggaaatga tccatttgcc aatatatagg catctccgaa 1380 gatttatttt gtcagtgatt gtctttggct ccattgtcct cctgatgctt tggcttccta 1440 tacgtataat taagagtgtg ctgcctaatt ttcttccata caatgtcatg ctctacagtg 1500 atgctccagt gagtgaactg tccctcgagc tgcttctgct tcaggttgtc ttgccagcat 1560 tactcgaaca gggacacacg aggcagtggc tgaaggggct ggtgcgagcg tggactgtga 1620 ccgccggata cttgctggat cttcattctt atttattggg agaccaggaa gaaaatgaaa 1680 acagtgcaaa tcaacaagtt 
aacaataatc agcatgctcg aaataacaac gctattcctg 1740 tggtgggaga aggccttcat gcagcccacc aagccatact ccagcaggga gggcctgttg 1800 gctttcagcc ttaccgccga cctttaaatt ttccactcag gatatttctg ttgattgtct 1860 tcatgtgtat aacattactg attgccagcc tcatctgcct tactttacca gtatttgctg 1920 gccgttggtt aatgtcgttt tggacgggga ctgccaaaat ccatgagctc tacacagctg 1980 cttgtggtct ctatgtttgc tggctaacca taagggctgt gacggtgatg gtggcatgga 2040 tgcctcaggg acgcagagtg atcttccaga aggttaaaga gtggtctctc atgatcatga 2100 agactttgat agttgcggtg ctgttggctg gagttgtccc tctccttctg gggctcctgt 2160 ttgagctggt cattgtggct cccctgaggg ttcccttgga tcagactcct cttttttatc 2220 catggcagga ctgggcactt ggagtcctgc atgccaaaat cattgcagct ataacattga 2280 tgggtcctca gtggtggttg aaaactgtaa ttgaacaggt ttacgcaaat ggcatccgga 2340 acattgacct tcactatatt gttcgtaaac tggcagctcc cgtgatctct gtgctgttgc 2400 tttccctgtg tgtaccttat gtcatagctt ctggtgttgt tcctttacta ggtgttactg 2460 cggaaatgca aaacttagtc catcggcgga tttatccatt tttactgatg gtcgtggtat 2520 tgatggcaat tttgtccttc caagtccgcc agtttaagcg cctttatgaa catattaaaa 2580 atgacaagta ccttgtgggt caacgactcg tgaactacga acggaaatct ggcaaacaag 2640 gctcatctcc accacctcca cagtcatccc aagaataaag tagttgtctc aacaacttga 2700 ccttcccctt tacatgtcct tttttgtgga cttctctctt tggagatttt tcccagtgat 2760 ctctcagcgt tgtttttaag ttaaatgtat ttgacttgtg ttctcagcat tcagagagca 2820 gcggtgtaag attctgctgt tctccctgga tcttctgaca ttactgctgt ctgagatttg 2880 tatatgtgta aatacaagtt ccttgatacc ctaaaacctt ggattaaaca gaatgtgcat 2940 tgtacatctt taaacaaaat gtatattaat ttattaaatc tagttgtcac tttattttgg 3000 acctgctgtg atctcgacag gaaacgtgcc acagagcagt agtgcgcagg caagactttt 3060 cagtgacgcc ttgtggaacg cagttcatga tgtcctagca gctctcacta agggaactgt 3120 acattctttc tttcttggct attcagacct taccaagaac gttaaaggaa acaagtagaa 3180 atcagcagtg gagtgtctgt ggtaagaaaa catgaacttt atgcttcact gttagttgtt 3240 tgtggaagtt attttgtata acaccaaagc tgttgtacat ttcctactgc ctgatttttt 3300 tcatgtgtct gtgtttgtaa tattgtatag tatcttgtgc taggtgagga aattattttt 3360 aattttgata atttaatatt cctagtgtga 
tcagcattgg gagttgggtt tcagtggggc 3420 atgtctatac ttagagaaaa aaagtccaaa tgaagatttt catgagtcag cccccccgcc 3480 cgcccccacc ccacacccac atcctctctt ttccacacac aactatctgt ttattttttg 3540 tagcagtggc cgaaagtcct gcaaggtcat aaatctttca gagtgacatc accaactgta 3600 ctgcatctta ctggatttag gacttctgag atgcttgtga agtatagatg tggttgtggt 3660 cttagattga cagcattaga gaagactggt tagaacatct ggtctcgctg gttagtgcct 3720 cgttggctga ggactaggtg tgcatttctc ctagcttttc atcaggaaat cccaaagttt 3780 ccaaagcttt ttgtttacag aataaaactt caaataaaac caattcatta tttgtccaga 3840 aggaagcttg gctgagctgg ccttttaaca taggaatgta tttcgttgga aacattctga 3900 aaaatctcag agaactgaac ccttacaaac tttgttttcc ctcataacca aagcttcagg 3960 ttagaagttt agaaaaatag aatggttggg tacatgatct aaatgtttaa tgctaaaggt 4020 atatcgtaag ggtagtgttt gtttttgaac gataatttag aagttctcat agaaagcgta 4080 taacataggt cttcagaaac tataaaagaa ttttcatata gtattaaaat ccatagacta 4140 aaatctgaga attttttaac atatgcaagt cagccaaaca taagctacca aaataaagag 4200 caatgtgttc tggctgtttt atacttcaac aattttttcc ctaagtggta agcaattact 4260 ttaaaacata tttttaaaaa catcggtatc gggagctgcg gtggctccgg ccggttgtcc 4320 tggcacacaa ggaggcgagg ctatgcgttc gaggccaacc taggcaaaat tggaaaaaaa 4380 aaaaaaaatc agtatcagaa ataatgcttg acataggatt cagtgttata ctctttggca 4440 ctttagagca gccttttctc tcatttgaaa taaggtcttt gttagcccta cctttactca 4500 gaaatttagt cctactataa aaaattagga ttttaaaata taatgctgcc tgtcatagtg 4560 tatttctaaa ttgctctctt tgggagaaaa acattattat tgaattgagc atataactat 4620 ataactatat aactatatat atatatatat atatatatat atatatatat atatatatat 4680 atatatatat attttttttt tttttttttt tttttttttt ttgcccccga gacagagtct 4740 tactctgttg cccaggctgg agtgcagtgg cacgatcctg gttcactgca ccctctgcct 4800 cccaggttca agcgattctc ctgcctcagc ctcccaagta gctggtacta caggtgcgtg 4860 ccaccacacc tggcttattt tagttatttt agtagagata gggtttcacc atgttggcca 4920 ggctggtctc ggaactcctg actgcaggtg atccgtcctc cttgggctcc caaagtgctg 4980 ggattatagg tgtgagcccc cacgcctggc caacttgatt atattaattt tgggtttttt 5040 gttttgtttt gttttgtttt ttaatttttc aaagaatttg 
tggatcttgg tataaagtca 5100 ttggaacata tctgttttaa cagcaagaga tcctaggcag atacttcaaa aggttaaaaa 5160 ttcttaatcc tacagaattt taaatgaatc ttatcaatgt tttgtaaaca aacaatatga 5220 atggccaaaa aattgccctc cattttactg gcaggtaatt tatattgtct tacttaagaa 5280 ttctccccta gtttttcaat tgtactttat cgtgttgttt tcaatgagat aagtattttc 5340 atagggaaag cattttccag cataatattt gcttgggtag catccgggtt ttagtattta 5400 accaagagcc ttttaaatat tgaaaaccca tagttcagaa aatgttagta ttgctgccct 5460 tcttcacata aatttttttt taaattatac tattattttg cttaatttta tattgggtta 5520 aaacaacctt caagaaggtt aactaggaaa gaagaccttt ttgttttatt tttactattt 5580 atatatagaa gacaaatcag catttggtga tagttttaca tgaccagtta tcaaacggtc 5640 atagtatgaa gtgtgcagtt gttcattatt agtaaattat gtttgatttt taaactattt 5700 agtactaata gttgagatga aaactgaaga aaaatgccaa tgtgacgttt gtgtatagct 5760 agccttaaaa aacttcccat gtttttaggt gacttttttc cccctcttag tactctggag 5820 aaacaatgaa gatgggccat ctcaattcca gatgtaaaca aaaagtaatt tttatttcaa 5880 catttaatgt aactgctatt attgtggatt cttgtcttgt gtattttctt tcccttattc 5940 aagtaatata gaataacttt ccttaaaatg atttgatcca agatacgtca tttctgtatt 6000 ggcaaaatgc cactattaaa gtgtaattct tgaatttagg ttgaattgat gactttttaa 6060 aacaaaacaa acactggaca gttctacatt gtattgcgtt tgcgaatgtg cgtgaacaca 6120 cgcacacgtg caggagaatg tagtgccata agaacactgg cgctttttaa aactttccaa 6180 gctagctact tattttcatt ttcagggttg agtactctaa gctctctact tactgtgaga 6240 agttttctac attgtaaaat taaaagatta tatttaaaat acttctttag gatgttattc 6300 agccatcaaa aaaaaaccca actaaaatgt cttaccagta aagtattatt aaaagcctca 6360 tttgtggaga ttcgctgact tccttggtta agctgtcttg tatttactta atttgtcccc 6420 tcaaccaggt gttttctttt cagcctgtta ctgtcctgtg ccatttttaa aaatagccat 6480 catgagtaag ggcttcttta aaataatccc tgacaatatc tttggcatta atacgacata 6540 aagacaagac agtatatgta tgttttcatt cctgttttgt acgagtgcct cattttgtag 6600 tcatgaaagt aacatagtac cttaaataaa aatgtaatga ctaatctaag gtctaggaat 6660 aactattgta gaagctaagt agaacactta gtttggggtg tcaattttct atcaacgcaa 6720 acaccttagg gtacccagac caattttatg tatatatatt ctgaaatgct 
ccctttttaa 6780 tgtggtcatt tgttgccttc taagttgaga ggacaaaact gaataaagtg aatgttaatt 6840 gtagtctagt ttttttgttt gtttttgtgt tattgcaaga ttgctttaag attgcagaaa 6900 tatgcattat tagccagcat agataatcca gatgttataa taggaaggaa aaacaactcc 6960 agttaaatgt gactgtggcg actccctttt atgtatacat ttaaatgcat agttgcagag 7020 gacacatccc tccccttctg tctccctaac ttgaactgta gaccgtcctc actgggaaga 7080 gtaaggcccc acattttgtg caatagcaag ctactgccca cttctgcatg tctgctacga 7140 gatcagaaag tcagtttcac taaggttgtc ttttagctga atacttagag taaaaccaat 7200 caaatccatt gtacatacct gaccatatgt cccagataaa aaggaaagac ttgcgtgatg 7260 tttttttcct tctgagtgcc ctgaggagat ccttttgctt cagactattt ctaatttttc 7320 tttgtttata ggttcagaat tttttccatg attgttttcc cccaaacaag tgcattttta 7380 attactttaa atgcaggggt gatagaggaa atgctattcg aaccatagac atctttaatc 7440 tgtgaagctg aaatttttca gcaaaggaaa ttgcatggtc aagttcagaa actggccatt 7500 aatttccccc ttccgctcca cagttcatag ctactcttgt tgcaaacatg tagtgataag 7560 gagaactaac gtatcaaggg gctgagaggg taacgttccc ctcctcctcc atgtttttat 7620 tttggtgtct gttttgtttc tagcactgta tttagtactg tagttgaatg aggtatgatg 7680 tctcccttca aggaactcag aatggaacag aagcatgagc aaaagattgt attacagagg 7740 tatgaataaa atgccatggg gaacacagag aaaggttcac ccgggagaga tggcctagcc 7800 acttccttag tgactgtttg gtatatgtag gtccccctgc cattactctt tcagccttct 7860 gcaatctagt tctacttagt cacacacttc tctaagacca ctcatacgta aacactacgt 7920 agaggcccct ttttgcctca ttttacattg tttagttatc attttgaaac ttttcttcac 7980 atatgtaaca gtgccggagt ttttctgctt ctctgtgttt gttcagtaac tcttctttag 8040 gatacaccta aagatgagaa gcttcatacc cagtactcct cttcattcac tcatatgttt 8100 ttgggatcag tcccttctgc tggctgtgca ttggtctaat ggaacagaat agagtccaga 8160 aataacccaa catgtatgtg gacaactggt ttttgacaga ggtgcaaagg tcttgaaaaa 8220 atgatgctgg aataattggg catcagatgc aaaattaaaa acaaattgat ccatatctta 8280 acactggcaa agattaaagt ccaaatggat tatagttccc caaaactgta taatttctag 8340 aagacaacaa ggaaaacgtg ttcagccttg ggttaggaaa agatttctta aatccaacac 8400 caaaagcaca atccatgaag gaaaaatcga taaattgtac tttatcaaaa ttgagaactt 
8460 ctctttgaaa ggtaccataa ggagaacaaa aagacaagct gtagagtggc agaaaaatat 8520 ttgcaaaaca tttctgataa atgagttgca tctagattac ataaagaagt ctcaaaactg 8580 aacaagtaaa cccattgttt acttaatcgc tgtttctcct gagcttgctg cctctgcccc 8640 tgctctctct ccttttccat ttgttttcaa cattgaatcc agaatgttct tcttgagatc 8700 caagtcagat cacaccaacc ctcagaactc tccaatagac gaccatggca ctcaaaagtc 8760 cacaatagcc ttcaatgctg ggcaaaacat gaagcacccc ttttctccct tctctgaccc 8820 catcacctct gtgttcaccc tgctcctgcc gtcctccctg cctccaaaac aggtcaggcc 8880 ttcgtgcctt tgcacttact atttgcaata cccaaatgtt cttcaggctc tttagcctct 8940 tcatttcttt cctgaagtgt cattctcact gaggcttatc taaagctgca gctactgggg 9000 cattcctgtc tcatctccct gctgtatttt gtactcccgg ctctcttttg tacttttaaa 9060 catactatat ggtttacctt tgttgtttat atttgcatgt tgtttcccac ttgaatgtaa 9120 gctccaaaga ttttattttt ttaaactgaa ttattactgt attcccagaa caattccctg 9180 gcaaatattt ggtactcaat agtaatgcta agttagtaaa taaatgatga atttagaatc 9240 aaaataatgt gtctatggcc aaaataaaac ctgaaatccc tgtcctattt cccagaggta 9300 actgctgtta atagtttagt tgtgtgcttc cagacatacc ttcacagaat catttatcac 9360 aataaaggtg tcatactatg caaaaaaaaa aaaaa 9395
<210> 21
<211> 678
<212> DNA
<213> Homo sapiens
<400> 21 aggaaaagga aactgttgag aaaccgaaac tactggggaa agggagggct cactgagaac 60 catcccagta acccgaccgc cgctggtctt cgctggacac catgaatcac actgtccaaa 120 ccttcttctc tcctgtcaac agtggccagc cccccaacta tgagatgctc aaggaggagc 180 acgaggtggc tgtgctgggg gcgccccaca accctgctcc cccgacgtcc accgtgatcc 240 acatccgcag cgagacctcc gtgcccgacc atgtcgtctg gtccctgttc aacaccctct 300 tcatgaaccc ctgctgcctg ggcttcatag cattcgccta ctccgtgaag tctagggaca 360 ggaagatggt tggcgacgtg accggggccc aggcctatgc ctccaccgcc aagtgcctga 420 acatctgggc cctgattctg ggcatcctca tgaccattct gctcatcgtc atcccagtgc 480 tgatcttcca ggcctatgga tagatcagga ggcatcactg aggccaggag ctctgcccat 540 gacctgtatc ccacgtactc caacttccat tcctcgccct gcccccggag ccgagtcctg 600 tatcagccct ttatcctcac acgcttttct acaatggcat tcaataaagt gcacgtgttt 660 ctggtgctaa aaaaaaaa 678
<210> 22
<211> 1456
<212> DNA
<213> Homo sapiens
<400> 22
ctctgaaccc caaggacaac tgcgagaaga tgacttagat catgtttgag accttcacac 60 cccagccata tacgtggcca tccaggccgt gctgtccctg tacgcctctg gccatagctg 120 gcattgtgat ggactccggt gatggggtca ccaacactgt gcccatctat gaagggtaca 180 cccctccccc acaccatcct gtgtctggac ctgactggcc ggaacctgac tgactacctc 240 atgaagatcc tcacccagtg tggctacagc ttcaccacca cagtcatgca ggaaattgtg 300 tgtgacatca agaaaaggct gtgctacatc cccctggact ttgagcagga gatggccatg 360 gtgggctcca gctcctccct ggagaagagc tacaagctgc ccaatggcca ggtcatcacc 420 attagcaaca agtggttctg ctgccccgag gcactcttcc aaacttcctt cgtgggcatg 480 gaatcctgtg gcatccacga aactaccttc aactccatca tgaagtcgga tgtggacatc 540 tacaaagacc tgtatgccaa cacagtgctg tctggcagca ccaccatgta ccctagcatc 600 accaacagga tgcagaagga gattaccgcc ctggcgccca gcgcaatgaa gatcaagatc 660 attgctcctc ctgagtgcaa gtactctgtg tggatcagag gctccatcct ggcctcgctg 720 tccaccttcc agcagatgtg gatcagcaag caggagtaca gtgagtccag cccctccatc 780 gtccaccgca aatgcttcta ggtggactgt gacttagttg cattacactc tttcttgaca 840 aaacctaact tgcactaaaa acaagatgag attggtgtga ctttgttttt tttgtggctg 900 gagttttctt ttggcttgac tcaggattta aaaactggaa cggtgaaggt gacagcagtc 960 ggttggagcg agcatccccc aaagttctac aatgtggccg aggactttga ttgtacgttg 1020 ttcttttttt tgatagtcat tccaaatatc atgggatgca ttgttacagg aagtcccttg 1080 ccctcctaaa agccacctca cttctctcta aggagaatgg cccagtcctc tccggagtcc 1140 acacagggga ggtgatagca ttgctttcgt gtaaattatg taatacaatt tttaaaaaat 1200 cttcacctta atactttttt actttgtttt attttgaatg atcagccttc gtggctcccc 1260 atttcttttc ccccaactta agatgtatga aggcttttgg tctccctgga agtgcgtgga 1320 tgcagccagg acttacctgt acactaactt gagaccagtt gaataaaagt gtacacctta 1380 aaaaaaaaaa aaaagaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 1440 aaaaaaaaaa aaaaaa 1456
<210> 23
<211> 1265
<212> DNA
<213> Homo sapiens
<400> 23
aaaaaaaaaa aaagacactg accttcagcg cctcggctcc agcgccatgg cgccctccag 60 gaagttcttc gttgtgggga actggaagat gaacgggcga aagcagagtc tgcgggagct 120 cgtccgcact ctgaacgcgg ccaaggtgcc ggccgacacc gaggtggttt gtactccgcc 180 tactgcctat atcgacttcg cccggcagaa gctagatccc aagattgctg tggctgcgca 240 gaactgctac aaagtgacta atggggcttt tactggggag atcagccctg gcatgatcaa 300 agactgcaga gccacgtggg tggtcctggg gcactcagag agaaggcatg tctttgggga 360 gtcagatgag ctgattgggc agaaagtggc ccatgctctg gcagaggcac tcggagtaat 420 cgcctgcact ggggagaagc tagatgaaag ggaagctggc atcactgaga aggttgtttt 480 cgagcagaca aaggtcatcg cagataacgt gaaggactgg agcaaggtcg tcctggccta 540 tgagcctgtg tgggccattg gtactggcaa gactgcaaca ccccaacagg accaggaagt 600 acacgacaag ctccgaggat ggcttaagtc caacgtctct gatgcggtgg ctcagagcac 660 ccgtatcatt tatggaggct ctgtgactgg ggcaacctgc aaggagctgg ccagccagcc 720 tggcgtggat ggcttccttg tgggtggtgc ttccctcaag cccgaattcg tggacatcat 780 caatgccaaa caatgagccc catccatctt ccctaccctt cctgccaagc cagggactaa 840 gcagcccaga agcccagtaa ctgcccttcc cctgcacatg cttctgatgg tgtcatctgc 900 tccttcctgt ggcctcatcc aaactgtacc ttcctttact gtttatatct tcaccctgta 960 atggttggga ccaggccaat cccttctcca cttactataa tggttggaac taaatgtcac 1020 caaggtggct tctccttggc tgagagatgg aaggggtggg atttgctcct gggttcccta 1080 ggccctagtg agggcagaag agaaaccatc ctctcccttc ttacactgtg aggccaagat 1140 cccctcagaa ggcaggagtg ctgccctctc ccatggtgcc cgtgcctctg tgctgtgtat 1200 gtgaaccacc cacgtgaggg aataaacctg gcactaggtc ttgtgaaaaa aaaaaaaaaa 1260 aaaga 1265
<210> 24
<211> 917
<212> DNA
<213> Homo sapiens
<400> 24
ccgcccccac actccgccca gaggggcctc agcttttcca ccactgcttt ctagtccttt 60 aactcctaga ggcaaacttt tgggggataa gaaagcctgg gaggggcctg tgccaaaacc 120 ctctctgcct ggggactggg cggtgattcc gcttctgcct gggctcctgc catggccccc 180 gagaggggct gacactttag ctcccggtgc aggtgagaac ccgcccggag gaagaaggaa 240 ggcgcgggcc ggggattagg agacggaggc ggactcggag ccagggaacc aggggtccgg 300 gctagagctg gagtcgtgag cgcgcgcccg ccccgctctg ggaggaccgc gagatgcccg 360 tgctgaagca gctgggcccc gcgcagccca agaagcggcc tgatcgcggc gccctgtcca 420 tctccgcgcc gctcggcgac ttccggcaca cgctgcacgt ggggcgcggc ggcgacgcct 480 tcggggacac ctcgttcctg agccgccacg gcggcgggcc gccccccgag ccccgggcgc 540 cccccgcggg ggccccgcgc tccccgccgc cgcccgccgt cccgcagtcc gcagcgccct 600 cgcctgccga cccgctgctg tccttccacc tggatctggg gccctccatg ctggacgcgg 660 tgctgggcgt catggacgcg gcgcgcccgg aggcggctgc cgccaagccc gacgcggaac 720 cccgccccgg gacgcagccc ccccaggccc gctgccgccc caacgcggac ctcgagctga 780 acgacgtcat cggcctctag gttccctcat tccccgcgcc cttcccgccc ggcaccccac 840 ttctgtatac ataaacggcc aaggtgtgtg cccggaaaaa aaaaaaaaaa aaaaaaaaaa 900 aaaaaaaaaa aaaaaaa 917
<210> 25
<211> 1005
<212> DNA
<213> Homo sapiens
<400> 25
atgggtaagg tgaaggtcag agtcaacaga tttggtcgta ctgggtgcct ggtgcaccat 60 gactgctttt aactctggta aagtggatat tgttgccatc aatgacccct tcattgacct 120 caactacatg gtctacatgt tcctgtatga ttctacacat ggcaaattcc atggcaccgt 180 caaggctgag aacgggacgc ttgtcatcaa tggaaacccc atcacccatt ttccaggacc 240 aagatccctc caaaatcaaa tggggtgatg ctggcgctga gtacgtcgta gagttcactg 300 gtgtcttcac caccatggag aaggctgggg ctcacttgca ggggggagcc aaaagggtca 360 acatctctgt cctctctgtt gatgccccca tgtttgtgat gggcgtgaac catgagaagt 420 atgacaacag cctcaagatc gtcagcaatg cctcctgcat caccaactgc ttagcgcccc 480 agccaaggtc atccatgaca actttggtat catagaagga ttcatgacca cagttcacac 540 catcactgcc acccagacta taaatggccc ctccgggaaa ctgtcacgtg atggctgcag 600 ggctctccag gacatcatcc ctgtctctac tggcgctgcc aagcctgtgg gtaaggtcat 660 ccctgagctg aacgggaagc tcactggcat ggccttccat gtccccactg ccaatgtgtc 720 agtggcggac ctgacctgtc gtctggaaaa acctgccaat atgatgacat caagaaggtg 780 gtgaagtagg catcggaggg ccccctcaag agcatcctgg gctacaatga gcaccagatg 840 gtctccttcg acttcaacag caacacccac tcttctacct tcgatgctgg ggcagccatt 900 gtcctcaagg accactctgt caagctaatt tcctggtatg acaatgaatt tggctacagc 960 aacagggtgg tgcacctcat ggcccacaat gcctccaagg agtaa 1005
<210> 26
<211> 1039
<212> DNA
<213> Homo sapiens
<400> 26
tctccctttc ctggagtcat ttgtgggagt tagggtctta tcctgatctt tcctcgtccc 60 ccgctccccc gtcctccgag gtggcagctg cgtagaagtc tggaacgttt tttcttagag 120 ttccagaatt gagactgaag tgtacccaga gacacaaaga aatacagaag tccagaatga 180 tcaaatgaga tcactaccca atgaaatgat gggccgagga gccagaagtt gtcctactac 240 ctgctttcca ggtccagatt ccacgagagc cttctgggct cccaccagag accctcagtg 300 aagactctgt tgaagacttt aagaccatgc tagaaagcct cccggcagag gttcaggaag 360 ttctgctatg aggatgcagc tggccccagg gatgtcctca ggcatctcta ggaccttgct 420 ggacggtggt ggctgagacc tgatatccac acaaaggagc agactgtgga aatgctggtg 480 caggagcaat tccaggctgt cctgcccgag gagctcagag ctcaagcaca gagatgtcag 540 cctggaatca gaatcactgg ctaagtcttg ctgcttgctt cttgctcagc ctgcatatga 600 gaagtcaacc aagcattctc ttggatgcaa gatccaactg cacccatggg aaatctgaac 660 ttccactttc tttgagcttg tctttggttg taataagaat ccatcaaaac ccatgtctgg 720 atctctcttt ggctgaggca gctttatttc ctacagaaag agtttctctg tcctcttgtt 780 aggacaatca gcctgctgct ttgggctttt ccaagagcct atttttcatc tgctggatgc 840 ctcttttttt tttttttttt ttttttgcac tgtgtgacat gctttaatat atatattttt 900 tatgtatcca aagaggtaat caactttaac tttcagaatt catgaatctt ggtttggtgg 960 catgaatgaa aggttatgta accccaccag taactcattt ggggtgccct tgaaatatta 1020 aagtttggtc ctcaataaa 1039
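The listing entries above follow a tagged layout: <210> gives the sequence identifier, <211> the declared length, <212> the molecule type, <213> the organism, and <400> introduces the sequence itself, printed in blocks of ten bases with a running position count. As an illustrative sketch only, and not part of the application, the following Python reads entries in that layout and compares each declared length with the number of bases actually present. The function names and the toy listing are assumptions introduced here for illustration; the toy sequences are not taken from the listing.

```python
import re


def parse_sequence_listing(text):
    """Parse tagged entries (<210>, <211>, <212>, <213>, <400>) into dicts."""
    entries = []
    # Every entry begins at a <210> tag; split just before each one.
    for chunk in re.split(r"(?=<210>)", text):
        if not chunk.strip().startswith("<210>"):
            continue
        # Map each three-digit tag to the text that follows it.
        fields = dict(re.findall(r"<(\d{3})>\s*([^<]*)", chunk))
        tokens = fields.get("400", "").split()
        # The first token after <400> repeats the sequence identifier.
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        # Keep base letters only; the running position counters are digits.
        sequence = "".join(t for t in tokens if t.isalpha())
        entries.append({
            "seq_id": int(fields["210"].strip()),
            "declared_length": int(fields["211"].strip()),
            "molecule_type": fields["212"].strip(),
            "organism": fields["213"].strip(),
            "sequence": sequence,
        })
    return entries


def length_mismatches(entries):
    """Return (seq_id, declared, actual) for entries whose lengths disagree."""
    return [
        (e["seq_id"], e["declared_length"], len(e["sequence"]))
        for e in entries
        if e["declared_length"] != len(e["sequence"])
    ]


# Toy listing (hypothetical): the second entry is deliberately two bases short.
TOY_LISTING = """<210> 1
<211> 25
<212> DNA
<213> Homo sapiens
<400> 1
acgtacgtac gtacgtacgt 20 acgta 25
<210> 2
<211> 12
<212> DNA
<213> Homo sapiens
<400> 2 acgtacgtac 10
"""

toy_entries = parse_sequence_listing(TOY_LISTING)
toy_problems = length_mismatches(toy_entries)
```

A length check of this kind is one way to spot entries where text extraction dropped or merged characters. The parser accepts the sequence either on the <400> line or on the line after it, since both layouts occur above.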

Claims

1. A method of identification and separation of human cell mixture with regard to the Caucasian or Asian ethnic origin, wherein the method comprises:
- contacting a cell mixture under conditions facilitating nucleic acid hybridization in human cells with a labeled probe being a nucleotide sequence complementary to the mRNA sequence of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, or, more preferably, at least two or more different genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157,
- separation of the cell mixture with regard to the differences in the signals of the hybridized probe between a human cell of Caucasian ethnic origin and a human cell of Asian ethnic origin.
2. The method of identification and separation of human cell mixture according to claim 1, characterized in that the probe is complementary to at least one genetic marker, more preferably to at least two genetic markers selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
3. The method of identification and separation of human cell mixture according to claim 1 or 2, characterized in that the probe is fluorescently labeled and in that the separation of cell mixture is carried out by using laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS).
4. The method of identification and separation of human cell mixture according to claims 1-3, wherein the probe, being a nucleic acid complementary to the particular genetic marker, is a Stellaris type probe, a Singer type probe or a combination thereof.
5. The method of identification and separation of human cell mixture according to claims 1-4, wherein the probe complementary to the genetic marker is complementary to CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
6. The method of identification and separation of human cell mixture according to claims 1-5, characterized in that the human cell mixture is a forensic trace.
7. A method of identification of ethnic origin of human biological material, in particular a human cell, with regard to origin from Caucasian or Asian population, comprising
mRNA quantity determination in human genetic material, of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157; more preferably, of at least one additional genetic marker other than selected previously, selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, by means for determining the mRNA quantity thereof, and
a comparison of the determined mRNA quantity with the established mRNA quantity of this marker, in human biological material originating from Caucasian or Asian population, and identification on this basis of ethnic origin of the human biological material.
8. The method of identification of ethnic origin of human biological material according to claim 7, characterized in that the means for determining the mRNA quantity is a microarray analysis, a targeted array (TLDA) analysis, Real Time PCR (QPCR) amplification or Fluorescence In Situ Hybridization (FISH).
9. The method of identification of ethnic origin of human biological material according to claims 7-8, characterized in that
when the means for determining the mRNA quantity is FISH, the marker is selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, or
when the means for determining the mRNA quantity is microarray analysis, the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157, or
when the means for determining the mRNA quantity is targeted array analysis, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75, or
when the means for determining the mRNA quantity is PCR amplification, the marker is selected from human UTS2.
10. The method of identification of ethnic origin of human biological material according to claims 7-9, characterized in that the genetic marker is selected from: CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
11. The method of identification of ethnic origin of human biological material according to claims 7-10, characterized in that the human biological material originates from a forensic trace.
12. A use of at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, PEX6, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157, for identification and separation of human cell mixture with regard to ethnic origin from the Caucasian or Asian population, based on determination of differences between mRNA quantities of said genetic markers in human cells.
13. Use according to claim 12, characterized in that in addition, determination of a difference in mRNA quantity is applied to at least one additional genetic marker other than selected previously, selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
14. Use according to claims 12-13, characterized in that the genetic marker is selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UGT2B17, UTS2.
15. Use according to claims 12-14, characterized in that the differences in mRNA quantity are determined by hybridizing the nucleic acids in human cells with a labeled probe being a nucleotide sequence complementary to the mRNA sequence of the selected genetic marker.
16. Use according to claim 15, characterized in that the probe is fluorescently labeled and in that the separation of cell mixture is carried out by using laser microdissection or in a Fluorescence-Activated Cell Sorter (FACS).
17. Use according to claims 15-16, characterized in that the probe is a Stellaris type probe, a Singer type probe or a combination thereof.
18. Use according to claims 15-17, characterized in that the genetic marker is selected from a group comprising: CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UGT2B17 of SEQ ID No5, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, PEX6 of SEQ ID No13, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
19. Use according to claims 15-18, characterized in that the human cell mixture is a forensic trace.
20. A use of a genetic marker for identification of ethnic origin of human biological material, in particular a human cell, with regard to the origin from Caucasian or Asian population, wherein, by means for determining the mRNA quantity, the mRNA quantity is determined in human biological material, for at least one genetic marker selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
21. Use according to claim 20, characterized in that for the identification of ethnic origin of human biological material with regard to the origin from Caucasian or Asian population, in addition, the mRNA quantity is assessed for at least one additional genetic marker other than selected previously, selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, CD47, SMC6, PLA2G4C, C16ORF75, C1ORF115, SLC7A7, RABEP1, S1PR4, SNHG8, TBC1D4, UGT2B7, GPR56, HS.137971, IFITM3, LOC644936, LOC729708, CDC42EP5, GAPDHL6, HSPC157.
22. Use according to claims 20-21, characterized in that the means for determining the mRNA quantity is a microarray analysis, a targeted array (TLDA) analysis, Real Time PCR (also called Quantitative PCR, QPCR) amplification or Fluorescence In Situ Hybridization (FISH).
23. Use according to claim 22, characterized in that
when the means for determining the mRNA quantity is FISH, the marker is selected from human CYP1B1, CHI3L2, MOXD1, DBNDD2, UTS2, or
when the means for determining the mRNA quantity is microarray analysis, the marker is selected from human UTS2, SMC6, CD47, HS.137971, C16ORF75, RABEP1, S1PR4, HSPC157, or
when the means for determining the mRNA quantity is targeted array analysis, the marker is selected from human UTS2, CHI3L2, C1ORF115, C16ORF75, or
when the means for determining the mRNA quantity is PCR amplification, the marker is selected from human UTS2.
24. The use according to claims 20-23, characterized in that the genetic marker is selected from CYP1B1 of SEQ ID No1, CHI3L2 of SEQ ID No2, MOXD1 of SEQ ID No3, DBNDD2 of SEQ ID No4, UTS2 of SEQ ID No6, CD47 of SEQ ID No7, SMC6 of SEQ ID No8, PLA2G4C of SEQ ID No9, C16ORF75 of SEQ ID No10, C1ORF115 of SEQ ID No11, SLC7A7 of SEQ ID No12, RABEP1 of SEQ ID No14, S1PR4 of SEQ ID No15, SNHG8 of SEQ ID No16, TBC1D4 of SEQ ID No17, UGT2B7 of SEQ ID No18, GPR56 of SEQ ID No19, HS.137971 of SEQ ID No20, IFITM3 of SEQ ID No21, LOC644936 of SEQ ID No22, LOC729708 of SEQ ID No23, CDC42EP5 of SEQ ID No24, GAPDHL6 of SEQ ID No25, HSPC157 of SEQ ID No26.
25. Use according to claims 20-24, characterized in that the human biological material is collected from a forensic trace.
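Claims 7 and 9 (and, in parallel, claims 20 and 23) describe two steps: choosing the marker panel according to the means of measurement, and comparing the determined mRNA quantity with the quantity established for each reference population. The claims do not state how that comparison is computed. The Python sketch below is therefore only one possible reading, assuming a nearest-reference rule based on the mean absolute difference over the measured markers. The function name, the reference labels and every numeric value are hypothetical placeholders with no biological meaning; only the marker panels are taken from the claims.

```python
# Marker panels per measurement technique, as listed in claims 9 and 23.
MARKERS_BY_METHOD = {
    "FISH": ["CYP1B1", "CHI3L2", "MOXD1", "DBNDD2", "UTS2"],
    "microarray": ["UTS2", "SMC6", "CD47", "HS.137971",
                   "C16ORF75", "RABEP1", "S1PR4", "HSPC157"],
    "targeted_array": ["UTS2", "CHI3L2", "C1ORF115", "C16ORF75"],
    "PCR": ["UTS2"],
}


def closest_reference(measured, references, method):
    """Return the label of the reference profile nearest to the sample.

    Assumption (not from the claims): distance is the mean absolute
    difference over the panel markers that were actually measured.
    """
    markers = [m for m in MARKERS_BY_METHOD[method] if m in measured]
    if not markers:
        raise ValueError("no measured marker belongs to the panel for " + method)

    def distance(profile):
        return sum(abs(measured[m] - profile[m]) for m in markers) / len(markers)

    return min(references, key=lambda label: distance(references[label]))


# Hypothetical reference quantities (arbitrary units); NOT measured data.
PLACEHOLDER_REFERENCES = {
    "reference_A": {"UTS2": 1.0, "CHI3L2": 4.0, "C1ORF115": 2.0, "C16ORF75": 3.0},
    "reference_B": {"UTS2": 5.0, "CHI3L2": 1.0, "C1ORF115": 2.5, "C16ORF75": 1.0},
}

# Hypothetical sample in which only two panel markers were quantified.
sample = {"UTS2": 4.5, "CHI3L2": 1.5}
nearest = closest_reference(sample, PLACEHOLDER_REFERENCES, "targeted_array")
```

Keeping the panels in a single lookup table mirrors the way the claims tie each set of markers to one means of measurement. Any practical use would need established reference quantities and a validated decision rule, neither of which is given in the claims.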
PCT/IB2014/063179 2013-07-18 2014-07-17 Methods of identification of ethnic origin based on differentiated transcription profiles and genetic markers used in those methods WO2015008245A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PLP.404749 2013-07-18
PL404749A PL404749A1 (en) 2013-07-18 2013-07-18 Ways to identify ethnicity based on different transcription profiles and genetic markers used in these methods

Publications (2)

Publication Number Publication Date
WO2015008245A2 (en) 2015-01-22
WO2015008245A3 (en) 2015-10-22

Family

ID=51541112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2014/063179 WO2015008245A2 (en) 2013-07-18 2014-07-17 Methods of identification of ethnic origin based on differentiated transcription profiles and genetic markers used in those methods

Country Status (2)

Country Link
PL (1) PL404749A1 (en)
WO (1) WO2015008245A2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012178166A1 (en) * 2011-06-24 2012-12-27 Arryx, Inc. Method and apparatus for fractionating genetically distinct cells and cellular components

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
CHAO TIAN: "European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing among Diverse European Ethnic Groups", MOL MED, vol. 15, no. 11-12, November 2009 (2009-11-01), pages 371 - 383
FEMINO A.; FAY F.; FOGARTY K.; SINGER R.: "Visualization of single RNA transcripts in situ", SCIENCE, vol. 280, no. 5363, 1998, pages 585 - 90, XP002306891, DOI: doi:10.1126/science.280.5363.585
HAAS C.; HANSON E.; KRATZER A.: "Selection of highly specific and sensitive mRNA biomarkers for the identification of blood", FORENSIC SCIENCE INTERNATIONAL: GENETICS, vol. 5, 2011, pages 449 - 458, XP028275679, DOI: doi:10.1016/j.fsigen.2010.09.006
HANSON E. K.; LUBENOW H.; BALLANTYNE J.: "Identification of forensically relevant body fluids using a panel of differentially expressed microRNAs", ANALYTICAL BIOCHEMISTRY, vol. 387, 2009, pages 303 - 314, XP026007607, DOI: doi:10.1016/j.ab.2009.01.037
JUUSOLA J.; BALLANTYNE J.: "mRNA profiling for body fluid identification by multiplex quantitative RT-PCR", JOURNAL OF FORENSIC SCIENCES, vol. 52, 2007, pages 1252 - 1262
JUUSOLA J.; BALLANTYNE J.: "Multiplex mRNA profiling for the identification of body fluids", FORENSIC SCIENCE INTERNATIONAL., vol. 152, 2005, pages 1 - 12, XP025270307, DOI: doi:10.1016/j.forsciint.2005.02.020
JUUSOLA J; BALLANTYNE J.: "Messenger RNA profiling: a prototype method to supplant conventional methods for body fluid identification", FORENSIC SCI INT, vol. 135, 2003, pages 85 - 96
NUSSBAUMER C.; GHAREHBAGHI-SCHNELL E.; KORSCHINECK I.: "Messenger RNA profiling: A novel method for body fluid identification by Real-Time PCR", FORENSIC SCIENCE INTERNATIONAL, vol. 157, 2006, pages 181 - 186, XP025086072, DOI: doi:10.1016/j.forsciint.2005.10.009
RAJ A.; VAN DEN BOGAARD P.; RIFKIN S.; VAN OUDENAARDEN A.; TYAGI S.: "Imaging individual mRNA molecules using multiple singly labeled probes", NATURE METHODS, vol. 5, no. 10, 2008, pages 877 - 9, XP055041223, DOI: doi:10.1038/nmeth.1253
RICHARD ML; HARPER KA; CRAIG RL; ONORATO AJ; ROBERTSON JM; DONFACK J.: "Evaluation of mRNA marker specificity for the identification of five human body fluids by capillary electrophoresis", FORENSIC SCI INT GENET., vol. 6, no. 4, July 2012 (2012-07-01), pages 452 - 60, XP028922151, DOI: doi:10.1016/j.fsigen.2011.09.007
SPIELMAN RS; BASTONE LA; BURDICK JT; MORLEY M; EWENS WJ; CHEUNG VG.: "Common genetic variants account for differences in gene expression among ethnic groups", NAT GENET., vol. 39, no. 2, February 2007 (2007-02-01), pages 226 - 31, XP002504983, DOI: doi:10.1038/NG1955
STOREY JD; MADEOY J; STROUT JL; WURFEL M; RONALD J; AKEY JM: "Gene-expression variation within and among human populations.", AM J HUM GENET., vol. 80, no. 3, March 2007 (2007-03-01), pages 502 - 9, XP055166624, DOI: doi:10.1086/512017

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019136234A1 (en) * 2018-01-04 2019-07-11 Virginia Commonwealth University Systems and method for rapid identification and analysis of cells in forensic samples
US11513057B2 (en) 2018-01-04 2022-11-29 Virginia Commonwealth University Systems and method for rapid identification and analysis of cells in forensic samples
CN110211639A (en) * 2018-02-13 2019-09-06 中国科学院北京基因组研究所 One kind of groups is distinguished and the construction method and genetic marker reference system of the genetic marker reference system of identification
CN110211639B (en) * 2018-02-13 2023-07-04 中国科学院北京基因组研究所 Construction method of genetic marker reference system for population discrimination and identification and genetic marker reference system

Also Published As

Publication number Publication date
PL404749A1 (en) 2015-01-19
WO2015008245A3 (en) 2015-10-22

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14766212

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 14766212

Country of ref document: EP

Kind code of ref document: A2