KR102624979B1

KR102624979B1 - B4GALT1 variants and their uses

Info

Publication number: KR102624979B1
Application number: KR1020197036608A
Authority: KR
Inventors: May Montasser; Cristopher Van Hout; Alan Shuldiner; Giusy Della Gatta; Matthew Healy; Marja Puurunen
Original assignee: Regeneron Pharmaceuticals, Inc.; University of Maryland, Baltimore
Priority date: 2017-06-05
Filing date: 2018-06-04
Publication date: 2024-01-16
Also published as: MX2019014661A; SG11201911597YA; CN110997906A; WO2018226560A1; US10738284B2; IL271073A; KR20200024772A; US20180346888A1; CN110997906B; CA3065938A1; US20200399617A1; AU2018282072A1; RU2019144018A3; JP2020527329A; JP2023113657A; EP3635102A1; RU2019144018A

Abstract

Provided herein are variant B4GALT1 genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; methods for detecting the presence of such molecules; methods for modulating endogenous B4GALT1 genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; methods for determining the risk of developing a cardiovascular condition by detecting the presence or absence of the variant B4GALT1 genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; and methods for treating a cardiovascular condition.

Description

B4GALT1 variants and their uses

Statement Regarding Government Support

This invention was made with government support under HL121007 awarded by the National Institutes of Health. The Government has certain rights in the invention.

Reference to Sequence Listing

This application contains a sequence listing submitted electronically as a text file named 18923800202SEQ, created on June 4, 2018, and having a size of 161 KB. The sequence listing is incorporated herein by reference.

Technical Field

The present disclosure provides variant B4GALT1 (beta-1,4-galactosyltransferase 1) genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; methods for detecting the presence of such molecules; methods for modulating endogenous B4GALT1 genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; methods for determining the risk of developing a cardiovascular condition by detecting the presence or absence of the variant B4GALT1 genomic, mRNA, and cDNA nucleic acid molecules and polypeptides; and methods for treating cardiovascular conditions.

Various publications, including patents, published applications, accession numbers, technical articles, and scholarly articles, are cited throughout this specification. Each cited publication is incorporated herein by reference in its entirety and for all purposes.

Beta-1,4-galactosyltransferase 1 (B4GALT1) is a member of the beta-1,4-galactosyltransferase gene family, which encodes type II membrane-bound glycoproteins that play a role in the biosynthesis of different glycoconjugates and saccharide structures. The enzyme encoded by B4GALT1 plays an important role in the processing of N-linked oligosaccharide moieties in glycoproteins, and protein-linked sugar chains commonly regulate the biological functions of glycoproteins. Impaired B4GALT1 activity therefore has the potential to alter the structure of all glycoproteins that contain N-linked oligosaccharides. The long form of the B4GALT1 enzyme is localized to the trans-Golgi, where it transfers galactosyl residues to N-acetylglucosamine residues during the biosynthetic processing of high-mannose N-linked oligosaccharides into complex-type N-linked oligosaccharides. Because the addition of galactosyl residues is a prerequisite for the addition of sialic acid, defects in B4GALT1 may exert an indirect effect by blocking the addition of sialic acid residues, thereby altering the half-life of plasma glycoproteins. Defects in glycosylation have been reported to impair the intracellular trafficking of various glycoproteins, including the LDL receptor. Additionally, structural abnormalities in N-linked oligosaccharides have the potential to alter protein folding, which in turn may alter the function and secretion of the glycoprotein. Many proteins carry N-linked glycosylation, including cell surface receptors (e.g., the LDL receptor and the insulin receptor) as well as various circulating plasma proteins (e.g., apolipoprotein B and fibrinogen). Patients with a genetic disorder caused by homozygosity for protein-truncating mutations in the B4GALT1 gene have been reported. One such patient had a severe phenotype characterized by a) severe neurodevelopmental abnormalities (including hydrocephalus), b) myopathy, and c) blood coagulation abnormalities. As expected, oligosaccharides derived from circulating transferrin lacked galactose and sialic acid residues. Two additional patients with the same genetic defect had a milder phenotype characterized by coagulation disturbances, hepatic dysfunction, and dysmorphic features.

Cardiovascular disease is a leading cause of death in the United States and other Western countries. Major risk factors for atherothrombotic cardiovascular disease, such as stroke and myocardial infarction, include elevated blood cholesterol and a tendency toward blood clotting. Many proteins involved in lipid metabolism and coagulation are glycosylated and are therefore subject to regulation by B4GALT1. Knowledge of the genetic factors underlying the onset and progression of cardiovascular conditions can improve risk stratification and provide a foundation for novel treatment strategies.

The present disclosure provides nucleic acid molecules comprising a nucleic acid sequence that is at least about 90% identical to a B4GALT1 variant genomic sequence (comprising the SNP designated rs551564683), provided that the nucleic acid sequence also comprises the nucleotides encoding a serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides nucleic acid molecules comprising a nucleic acid sequence that is at least about 90% identical to a B4GALT1 variant mRNA sequence (comprising the SNP designated rs551564683), provided that the nucleic acid sequence also encodes a serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides cDNA molecules encoding a B4GALT1 polypeptide and comprising a nucleic acid sequence that is at least about 90% identical to a B4GALT1 variant cDNA sequence (comprising the SNP designated rs551564683), provided that the nucleic acid sequence also encodes a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.
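
The two conditions in the preceding paragraphs (overall sequence identity, plus a serine codon at residue 352) can be illustrated with a short check. This is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length (a real comparison would first run a global alignment), residue numbering starts at the initiating methionine of the full-length polypeptide, and "at least about 90%" is read as a plain >= 90 threshold. The sequences are toy placeholders, not the B4GALT1 sequences in the sequence listing, and all function names are hypothetical.

```python
# Sketch: is a candidate coding sequence (CDS) >= 90% identical to a reference CDS,
# and does it encode serine at residue 352 (numbered from the initiating Met)?

BASES = "TCAG"
# Standard genetic code laid out in TCAG order (first base outermost); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' never matches)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def residue_at(cds: str, position: int) -> str:
    """One-letter amino acid encoded at a 1-based residue position of an in-frame CDS."""
    start = 3 * (position - 1)
    codon = cds[start:start + 3].upper().replace("U", "T")  # also accept mRNA letters
    if len(codon) != 3:
        raise ValueError("CDS is too short for the requested residue position")
    return CODON_TABLE.get(codon, "X")  # "X" for ambiguous codons such as "ANT"


def matches_variant_definition(candidate: str, reference: str,
                               min_identity: float = 90.0, position: int = 352) -> bool:
    """True if candidate is >= min_identity % identical to reference and has Ser at position."""
    return (percent_identity(candidate, reference) >= min_identity
            and residue_at(candidate, position) == "S")


# Toy example: 352 codons; in this toy reference, residue 352 is encoded by AAT (Asn).
reference_cds = "ATG" + "GCT" * 350 + "AAT"
variant_cds = "ATG" + "GCT" * 350 + "AGT"   # a single toy A>G change gives Ser352
print(residue_at(reference_cds, 352), residue_at(variant_cds, 352))  # N S
print(matches_variant_definition(variant_cds, reference_cds))        # True
```

The reference sequence itself fails the check because it still encodes Asn at residue 352, which mirrors the "provided that" clause: identity alone is not enough.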

The present disclosure also provides vectors or exogenous donor sequences comprising any one or more of these nucleic acid molecules.

The present disclosure also provides isolated polypeptides comprising an amino acid sequence that is at least about 90% identical to a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides host cells comprising any one or more of these nucleic acid molecules operably linked to a heterologous promoter that is active in the host cell.

The present disclosure also provides methods for producing a B4GALT1 polypeptide, comprising culturing a host cell that contains a nucleic acid molecule encoding the B4GALT1 polypeptide, wherein the nucleic acid molecule is operably linked to a heterologous promoter active in the host cell, such that the nucleic acid molecule is expressed, and recovering the isolated polypeptide.

The present disclosure also provides compositions comprising such nucleic acid molecules or polypeptides and a carrier that increases their stability.

The present disclosure also provides methods for detecting the presence or absence of a B4GALT1 variant nucleic acid molecule (comprising the SNP designated rs551564683) in a human subject, the method comprising performing an assay on a biological sample from the human subject to determine whether nucleic acid molecules in the biological sample comprise a nucleic acid sequence encoding a variant B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.
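
The detection step is not tied to any particular assay. One possible readout is sequencing, and the sketch below, a hypothetical illustration rather than the disclosed method, calls the residue-352 genotype from the codon observed at that position in each aligned read. The depth and allele-fraction thresholds are arbitrary illustrative values, and the function names are assumptions.

```python
# Hypothetical sequencing-based readout: call the residue-352 genotype from the
# codons observed at that position in individual aligned reads.
from collections import Counter

ASN_CODONS = {"AAT", "AAC"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def call_residue_352(read_codons, min_depth=10, het_low=0.2, het_high=0.8):
    """Return 'NN', 'NS', 'SS', or 'no-call' from per-read codons at residue 352.

    Codons encoding neither Asn nor Ser (e.g., sequencing errors) are ignored.
    min_depth, het_low and het_high are illustrative thresholds, not validated values.
    """
    counts = Counter()
    for codon in read_codons:
        codon = codon.upper()
        if codon in ASN_CODONS:
            counts["N"] += 1
        elif codon in SER_CODONS:
            counts["S"] += 1
    depth = counts["N"] + counts["S"]
    if depth < min_depth:
        return "no-call"
    ser_fraction = counts["S"] / depth
    if ser_fraction < het_low:
        return "NN"
    if ser_fraction > het_high:
        return "SS"
    return "NS"


def carries_ser352(genotype: str) -> bool:
    """A subject carries the variant if at least one allele encodes Ser352."""
    return genotype in {"NS", "SS"}


reads = ["AAT"] * 9 + ["AGT"] * 11
print(call_residue_352(reads), carries_ser352(call_residue_352(reads)))  # NS True
```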

The present disclosure also provides methods for detecting, in a human subject, the presence of a variant B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, the method comprising performing an assay on a biological sample from the human subject to determine the presence of the variant B4GALT1 polypeptide.

The present disclosure also provides methods for determining the susceptibility of a human subject to developing a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether nucleic acid molecules in the biological sample comprise a nucleic acid sequence encoding a variant B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if a nucleic acid molecule comprising a nucleic acid sequence encoding a variant B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide is detected in the biological sample, or classifying the human subject as having an increased risk of developing a cardiovascular condition if no such nucleic acid molecule is detected in the biological sample.

The present disclosure also provides methods for determining the susceptibility of a human subject to developing a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether a B4GALT1 polypeptide in the biological sample comprises a serine at the position corresponding to position 352; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide is detected in the biological sample, or classifying the human subject as having an increased risk of developing a cardiovascular condition if no such B4GALT1 polypeptide is detected in the biological sample.
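
Step b) of the two susceptibility methods above reduces to a binary rule. A minimal sketch, with hypothetical names, that encodes that rule and, as a design choice of this sketch rather than anything stated in the text, refuses to classify an inconclusive assay instead of silently treating it as "not detected":

```python
# Sketch of step b): map the assay result onto the two risk classes described above.
from enum import Enum


class CardiovascularRisk(Enum):
    REDUCED = "reduced risk"
    INCREASED = "increased risk"


def classify_risk(ser352_detected: bool) -> CardiovascularRisk:
    """Ser352 detected -> reduced risk; not detected -> increased risk (relative terms)."""
    return CardiovascularRisk.REDUCED if ser352_detected else CardiovascularRisk.INCREASED


def classify_from_genotype(genotype: str) -> CardiovascularRisk:
    """Apply the rule to a hypothetical genotype string such as 'NN', 'NS' or 'SS'."""
    if genotype not in {"NN", "NS", "SS"}:
        # An inconclusive assay should not fall into the 'not detected' branch.
        raise ValueError(f"cannot classify inconclusive genotype: {genotype!r}")
    return classify_risk("S" in genotype)


print(classify_from_genotype("NS").value)  # reduced risk
```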

The present disclosure also provides guide RNA molecules effective to direct a Cas enzyme to bind to or cleave an endogenous B4GALT1 gene, wherein the guide RNA comprises a DNA-targeting segment that hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene that includes, or is proximate to (e.g., within a certain number of nucleotides of, as discussed below), the positions corresponding to positions 53575 to 53577 of the wild-type B4GALT1 gene.

The present disclosure also provides methods for modifying an endogenous B4GALT1 gene in a cell, the method comprising contacting the genome of the cell with: a) a Cas protein; and b) a guide RNA that forms a complex with the Cas protein and hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene, wherein the guide RNA recognition sequence includes, or is proximate to (e.g., within a certain number of nucleotides of, as discussed below), the positions corresponding to positions 53575 to 53577 of the wild-type B4GALT1 gene, and wherein the Cas protein cleaves the endogenous B4GALT1 gene.

The present disclosure also provides methods for modifying an endogenous B4GALT1 gene in a cell, the method comprising contacting the genome of the cell with: a) a Cas protein; and b) a first guide RNA that forms a complex with the Cas protein and hybridizes to a first guide RNA recognition sequence within the endogenous B4GALT1 gene, wherein the first guide RNA recognition sequence includes the start codon of the B4GALT1 gene or is within about 1,000 nucleotides of the start codon, and wherein the Cas protein cleaves the endogenous B4GALT1 gene or alters its expression.
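
Choosing a guide RNA recognition sequence that includes or lies near a given window (positions 53575 to 53577, or the start codon with a window of about 1,000 nucleotides) amounts to scanning the locus for protospacer/PAM pairs. The sketch below makes several assumptions the text does not: the Cas protein is Streptococcus pyogenes Cas9 (20-nt spacer, 5'-NGG-3' PAM); the input is a + strand sequence whose first base is position 1 in the same numbering as the target window; and max_distance is an arbitrary stand-in for the adjacency ranges the specification discusses later. It performs no off-target or on-target efficiency scoring, and all names are hypothetical.

```python
# Sketch: list SpCas9-style guide candidates (20-nt spacer + NGG PAM) whose protospacer
# overlaps, or lies within max_distance nt of, a target window on a + strand sequence.
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class GuideCandidate:
    strand: str    # "+" or "-": strand carrying the protospacer and PAM
    spacer: str    # guide spacer sequence, 5'->3' (DNA letters)
    start: int     # 1-based start of the protospacer on the + strand
    end: int       # 1-based end (inclusive) of the protospacer on the + strand
    distance: int  # 0 if the protospacer overlaps the target window


def _gap(start: int, end: int, t_start: int, t_end: int) -> int:
    if end < t_start:
        return t_start - end
    if start > t_end:
        return start - t_end
    return 0


def find_guides(seq: str, t_start: int, t_end: int, max_distance: int = 10,
                spacer_len: int = 20) -> list:
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - spacer_len - 2):
        # + strand: protospacer seq[i:i+20] immediately followed by N-G-G.
        if seq[i + spacer_len + 1:i + spacer_len + 3] == "GG":
            start, end = i + 1, i + spacer_len
            hits.append(GuideCandidate("+", seq[i:i + spacer_len], start, end,
                                       _gap(start, end, t_start, t_end)))
        # - strand: C-C-N on the + strand is an N-G-G PAM on the - strand.
        if seq[i:i + 2] == "CC":
            start, end = i + 4, i + 3 + spacer_len
            hits.append(GuideCandidate("-", revcomp(seq[i + 3:i + 3 + spacer_len]),
                                       start, end, _gap(start, end, t_start, t_end)))
    return [h for h in hits if h.distance <= max_distance]


# Toy locus (not B4GALT1): one + strand site whose protospacer spans positions 31-50.
toy = "A" * 30 + "ACGTACGTACGTACGTACGT" + "TGG" + "A" * 30
for hit in find_guides(toy, t_start=49, t_end=51, max_distance=0):
    print(hit.strand, hit.spacer, hit.start, hit.end)  # + ACGTACGTACGTACGTACGT 31 50
```

For the methods above, one would pass the wild-type B4GALT1 sequence in the specification's numbering with t_start=53575 and t_end=53577, or the start-codon coordinates with max_distance=1000.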

The present disclosure also provides methods for modifying a cell, the method comprising introducing an expression vector into the cell, wherein the expression vector comprises a recombinant B4GALT1 gene comprising a nucleotide sequence encoding a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides methods for modifying a cell, the method comprising introducing an expression vector into the cell, wherein the expression vector comprises a nucleic acid molecule encoding a polypeptide that is at least about 90% identical to a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, and wherein the polypeptide also comprises a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides methods for modifying a cell, the method comprising introducing a polypeptide or a fragment thereof into the cell, wherein the polypeptide is at least 90% identical to a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, and wherein the polypeptide also comprises a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject: a) a Cas protein or a nucleic acid encoding the Cas protein; b) a guide RNA or a nucleic acid encoding the guide RNA, wherein the guide RNA forms a complex with the Cas protein and hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene, and wherein the guide RNA recognition sequence includes, or is proximate to, the positions corresponding to positions 53575 to 53577 of the wild-type B4GALT1 gene; and c) an exogenous donor sequence comprising a 5' homology arm that hybridizes to a target sequence 5' of the positions corresponding to positions 53575 to 53577 of the wild-type B4GALT1 gene, a 3' homology arm that hybridizes to a target sequence 3' of those positions, and a nucleic acid insert flanked by the 5' and 3' homology arms and comprising a nucleotide sequence encoding a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide; wherein the Cas protein cleaves the endogenous B4GALT1 gene in a cell in the subject, the exogenous donor sequence recombines with the endogenous B4GALT1 gene in the cell, and, upon recombination of the exogenous donor sequence with the endogenous B4GALT1 gene, a serine-encoding codon is inserted at the nucleotides corresponding to positions 53575 to 53577 of the wild-type B4GALT1 gene.
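
The exogenous donor in c) can be pictured as two homology arms flanking the edited codon. A minimal sketch, assuming 1-based coordinates, a simple codon-swap insert (the text also allows a longer insert encoding the Ser352 polypeptide), and hypothetical defaults for the serine codon (AGC) and arm length (500 nt); the function name is likewise hypothetical.

```python
# Hypothetical donor builder: 5' arm + serine codon + 3' arm around a 3-nt codon window.
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def build_ser_donor(seq: str, codon_start: int, codon_end: int,
                    ser_codon: str = "AGC", arm_len: int = 500) -> str:
    """Return a linear donor replacing the codon at 1-based [codon_start, codon_end].

    The 5' arm is the arm_len nt immediately 5' of codon_start and the 3' arm the
    arm_len nt immediately 3' of codon_end (both truncated at the sequence ends).
    """
    ser_codon = ser_codon.upper()
    if codon_end - codon_start != 2:
        raise ValueError("the codon window must span exactly 3 nucleotides")
    if ser_codon not in SER_CODONS:
        raise ValueError(f"{ser_codon} does not encode serine")
    five_arm = seq[max(0, codon_start - 1 - arm_len):codon_start - 1]
    three_arm = seq[codon_end:codon_end + arm_len]
    return five_arm + ser_codon + three_arm


toy = "G" * 10 + "AAT" + "C" * 10              # toy codon at 1-based positions 11-13
print(build_ser_donor(toy, 11, 13, arm_len=5))  # GGGGGAGCCCCCC
```

For the method above, codon_start and codon_end would be 53575 and 53577 in the specification's numbering. Donor designs commonly also introduce silent changes in the PAM or seed region so that the repaired allele is not re-cut; this sketch does not attempt that.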

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject: a) a Cas protein or a nucleic acid encoding the Cas protein; b) a first guide RNA or a nucleic acid encoding the first guide RNA, wherein the first guide RNA forms a complex with the Cas protein and hybridizes to a first guide RNA recognition sequence within the endogenous B4GALT1 gene, and wherein the first guide RNA recognition sequence includes the start codon of the endogenous B4GALT1 gene or is within about 1,000 nucleotides of the start codon; and c) an expression vector comprising a recombinant B4GALT1 gene comprising a nucleotide sequence encoding a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide; wherein the Cas protein cleaves, or alters the expression of, the endogenous B4GALT1 gene in a cell in the subject, and the expression vector expresses the recombinant B4GALT1 gene in a cell in the subject.

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject an antisense DNA, RNA, siRNA, or shRNA that hybridizes to a sequence within the endogenous B4GALT1 gene and reduces expression of the B4GALT1 polypeptide in cells of the subject.
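
An antisense DNA, RNA, siRNA, or shRNA hybridizes to its target by Watson-Crick base pairing, so its targeting sequence is the reverse complement of the chosen target window. The sketch below shows only that base-pairing step; chemistry, target-site accessibility, and off-target screening are outside its scope, and the target window and function name are hypothetical placeholders.

```python
# Sketch: the antisense strand that base-pairs with a chosen target window.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(target: str, as_rna: bool = False) -> str:
    """Reverse complement of a DNA or RNA target window, returned 5'->3'."""
    dna = target.upper().replace("U", "T")
    if set(dna) - set("ACGT"):
        raise ValueError("target may contain only A, C, G, T/U")
    anti = dna.translate(DNA_COMPLEMENT)[::-1]
    return anti.replace("T", "U") if as_rna else anti


target_window = "AUGGCUAGC"  # hypothetical mRNA window, not a real B4GALT1 sequence
print(antisense(target_window))               # GCTAGCCAT
print(antisense(target_window, as_rna=True))  # GCUAGCCAU
```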

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing an expression vector into the subject, wherein the expression vector comprises a recombinant B4GALT1 gene comprising a nucleotide sequence encoding a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, and wherein the expression vector expresses the recombinant B4GALT1 gene in cells of the subject.

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing an expression vector into the subject, wherein the expression vector comprises a nucleic acid molecule encoding a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, and wherein the expression vector expresses the nucleic acid encoding the B4GALT1 polypeptide in cells of the subject.

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing an mRNA into the subject, wherein the mRNA encodes a B4GALT1 polypeptide having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide, and wherein the B4GALT1 polypeptide is expressed from the mRNA in cells of the subject.

The present disclosure also provides methods for treating a subject who is not a carrier of a B4GALT1 variant nucleic acid molecule or polypeptide (comprising the SNP designated rs551564683) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject a B4GALT1 polypeptide, or a fragment thereof, having a serine at the position corresponding to position 352 in the full-length/mature B4GALT1 polypeptide.

In any of the methods described or exemplified herein, the cardiovascular condition can comprise levels of one or more serum lipids that increase atherosclerotic risk. The serum lipids comprise one or more of cholesterol, LDL, HDL, triglycerides, HDL-cholesterol, and non-HDL cholesterol, or subfractions thereof (e.g., HDL2, HDL2a, HDL2b, HDL2c, HDL3, HDL3a, HDL3b, HDL3c, HDL3d, LDL1, LDL2, LDL3, lipoprotein A, Lpa1, Lpa2, Lpa3, Lpa4, or Lpa5). The cardiovascular condition can comprise elevated levels of coronary artery calcification. The cardiovascular condition can comprise elevated levels of pericardial fat. The cardiovascular condition can comprise an atherothrombotic condition, which can in turn comprise elevated levels of fibrinogen or fibrinogen-mediated blood clots. The cardiovascular condition can also comprise elevated levels of fibrinogen, fibrinogen-mediated blood clots, or blood clots formed through the involvement of fibrinogen activity. Such clots can be present in any vein or artery in the body.
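
Non-HDL cholesterol, listed among the serum lipids above, is conventionally computed from a standard lipid panel as total cholesterol (TC) minus HDL-cholesterol (HDL-C). The specification does not state this formula, so the worked example below uses illustrative values only:

```latex
\text{non-HDL-C} = \text{TC} - \text{HDL-C},
\qquad \text{e.g.}\quad 200~\text{mg/dL} - 50~\text{mg/dL} = 150~\text{mg/dL}
```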

도 20은 B4GALT1 N352S의 부(SS) 동형접합체 및 주(NN) 동형접합체의 매치된 쌍으로부터의 당단백질의 N-글리칸 분석의 대표적인 HILIC-FLR-MS 스펙트럼을 나타낸 도면.Figure 1 shows the results of a representative genome-wide association of variant B4GALT1 and LDL.
Figure 2 is a diagram showing the results of representative TOPMed WGS association of variant B4GALT1 and LDL.
Figure 3 shows the results of a representative haplotype structure of the top B4GALT1-associated SNPs.
Figure 4 shows the association with LDL of the variant B4GALT1 gene identified by exome sequencing in the Amish.
Figure 5 shows that the variant B4GALT1 gene is more than 1000-fold enriched in frequency in the Amish.
Figure 6 shows the association between B4GALT1 Asn352Ser and reduced serum lipids.
Figure 7 shows the high degree of association between B4GALT1 Asn352Ser and decreased serum lipids and increased AST.
Figure 8 shows the association of B4GALT1 Asn352Ser with all lipid subclasses.
Figure 9 shows the association between B4GALT1 Asn352Ser and reduced fibrinogen levels.
Figure 10 shows reduced B4GALT1 transcript levels within 5 days post-fertilization in zebrafish larvae injected with antisense morpholino oligonucleotides at the indicated concentrations.
Figure 11 shows diagnostic markers of antisense morpholino oligonucleotide off-target effects within 5 days post-fertilization in zebrafish larvae injected with antisense morpholino oligonucleotides at the indicated concentrations.
Figure 12 shows the mean LDL concentration in 100 homogenates of zebrafish larvae at 5 days post-fertilization, by experiment.
Figure 13 shows rescue of the LDL-c phenotype by co-expression of 50 pg of human B4GALT1 mRNA in zebrafish.
Figure 14 shows the results of genetic association between B4GALT1 N352S and LDL using targeted genotyping.
Figure 15 is a confocal microscopy image of subcellular localization of Flag-352Asn or Flag-352Ser.
Figure 16 is a confocal microscopy image of the subcellular localization of endogenous B4GALT1, Flag-352Asn, and Flag-352Ser relative to the trans-Golgi network marker TGN46.
Figure 17 (Panels A and B) shows the effect of 352Ser on steady-state levels of B4GALT1 protein: (Panel A) COS7 cells expressing Flag-tagged 352Asn or 352Ser protein fusions together with free EGFP; and (Panel B) B4GALT1 mRNA expression levels determined by RT-qPCR analysis.
Figure 18 (Panels A, B and C) shows the effect of the 352Ser mutation on activity; (Panels A and B) COS7 cells expressing 352Asn or 352Ser Flag tag protein fusions and analyzed by Western blot for B4GALT1 or Flag; (Panel C) B4GALT1 activity in immunoprecipitates.
Figure 19 is a diagram showing the tri-sialo/di-oligo ratio by B4GALT1 N352S genotype group.
Figure 20 shows representative HILIC-FLR-MS spectra of N-glycan analysis of glycoproteins from matched pairs of minor (SS) and major (NN) homozygotes of B4GALT1 N352S.

As described herein, sequencing studies identified a variant of B4GALT1 that has a serine, instead of an asparagine, at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide. This variant is present in approximately 11% to 12% of Old Order Amish (OOA) individuals (alternative allele frequency = 6%) and is extremely rare in the general population. The mutation changes asparagine to serine at position 352 of the 398-amino-acid human protein (N352S), or at position 311 of the short isoform. Variant B4GALT1 was found to be associated with lower levels of low-density lipoprotein cholesterol (LDL), total cholesterol, fibrinogen, and eGFR; increased levels of aspartate transaminase (AST) (but not alanine transaminase (ALT)) and increased serum levels of creatine kinase and creatinine; expression in muscle tissue (but not in liver or red blood cells); and decreased basophils. The N352S variant is believed to be protective against one or more cardiovascular conditions. It is further believed that B4GALT1 status, including variant status, can be used to diagnose a patient's risk of developing a cardiovascular condition.
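The carrier fraction (11% to 12%) and the alternative allele frequency (6%) quoted above can be cross-checked with Hardy-Weinberg proportions. The following is a minimal sketch, assuming random mating, which a founder population such as the OOA may not strictly satisfy; the function names are illustrative only.

```python
# Hardy-Weinberg sketch: expected genotype fractions from an allele frequency.
# Assumes random mating (HWE); a founder population may deviate from this.

def genotype_fractions(q: float) -> dict:
    """Return expected NN/NS/SS fractions given alternative-allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {"NN": p * p, "NS": 2.0 * p * q, "SS": q * q}


def carrier_fraction(q: float) -> float:
    """Fraction of individuals carrying at least one alternative (352Ser) allele."""
    g = genotype_fractions(q)
    return g["NS"] + g["SS"]


if __name__ == "__main__":
    q = 0.06  # alternative allele frequency reported for the OOA
    print(f"expected carriers: {carrier_fraction(q):.4f}")
```

With q = 0.06 this gives about 0.1164, i.e. roughly 11.6% of individuals expected to carry at least one 352Ser allele, consistent with the stated 11% to 12%.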

The phrase "corresponding to", when used in connection with the numbering of a given amino acid or polynucleotide sequence, refers to the numbering of the residues of a specified reference sequence when the given amino acid or polynucleotide sequence is compared to that reference sequence (as used herein, the reference sequence is the wild-type/full-length B4GALT1 polynucleotide (gDNA, mRNA, or cDNA sequence) or polypeptide). In other words, the residue number or residue position of a given polymer is designated relative to the reference sequence rather than by the actual numerical position of the residue within the given amino acid or polynucleotide sequence. For example, a given amino acid sequence can be aligned to the reference sequence by introducing gaps to optimize residue matches between the two sequences. In such cases, even though gaps are present, the residues of the given amino acid or polynucleotide sequence are numbered relative to the reference sequence to which it has been aligned.
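The "corresponding to" convention amounts to reading residue numbers off the reference row of a gapped alignment. A minimal Python sketch of that bookkeeping follows, assuming the alignment has already been computed and is supplied as two equal-length strings with "-" for gaps; the function name and the toy sequences are hypothetical.

```python
# Map each residue of a query sequence to reference numbering using a gapped
# pairwise alignment ("-" marks a gap). The alignment strings are assumed to be
# precomputed by some aligner; this only does the coordinate bookkeeping.
from typing import Dict


def query_to_reference_positions(ref_aln: str, qry_aln: str) -> Dict[int, int]:
    """Return {query_position: reference_position}, both 1-based.

    Query residues aligned opposite a reference gap have no corresponding
    reference position and are omitted from the mapping.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")
    mapping: Dict[int, int] = {}
    ref_pos = qry_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1  # advance reference numbering on every non-gap column
        if q != "-":
            qry_pos += 1  # advance query numbering on every non-gap column
        if r != "-" and q != "-":
            mapping[qry_pos] = ref_pos
    return mapping
```

For example, aligning the query `M-LNQVS` against the reference `MKLN-VS` maps query residue 3 (N) to reference position 4, because the query lacks the reference's second residue; the inserted Q (query residue 4) has no reference position.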

As used herein, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.

As used herein, and unless otherwise clear from the context, "about" encompasses values within the standard error of measurement (e.g., SEM) of the stated value.

As used herein, the term "and/or" refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative ("or").

As used herein, the terms "comprising" and "including" mean that, in addition to the one or more recited elements, other elements not specifically recited may be included. For example, a composition "comprising" or "including" a protein may contain the protein alone or in combination with other ingredients. The transitional phrase "consisting essentially of" means that the scope of a claim is to be interpreted to encompass the specified elements recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed invention. Thus, the term "consisting essentially of", when used in a claim of this disclosure, is not intended to be interpreted as equivalent to "comprising."

As used herein, "optional" or "optionally" means that the subsequently described event or circumstance may or may not occur, and that the description includes instances in which the event or circumstance occurs and instances in which it does not.

As used herein, "or" refers to any one member of a particular list and also includes any combination of members of that list.

The designation of a range of values includes all integers within or defining the range (including both endpoint values), and all subranges defined by integers within the range.

It is appreciated that certain features of the disclosure, which are described in the context of separate embodiments for clarity, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are described in the context of a single embodiment for brevity, may also be provided separately or in any suitable subcombination.

The present disclosure provides isolated B4GALT1 genomic and mRNA variants, B4GALT1 cDNA variants, or any complement thereof, and isolated B4GALT1 polypeptide variants. These variants are believed to be associated with a reduced risk of developing various cardiovascular conditions, including but not limited to increased levels of serum lipids, increased levels of fibrinogen, coronary artery calcification, coronary artery disease (CAD), and increased levels of aspartate aminotransferase (AST) but not alanine transaminase (ALT). Without wishing to be bound by any theory, it is believed that the B4GALT1 variant is associated with expression in muscle tissue, but not in liver or red blood cells, as evidenced by the experimentally observed increase in AST but not ALT levels. Also provided herein are compositions comprising B4GALT1 genomic and mRNA variants, B4GALT1 cDNA variants, and isolated B4GALT1 polypeptide variants. Also provided herein are nucleic acid molecules that hybridize to B4GALT1 genomic and mRNA variants and to B4GALT1 cDNA variants. The disclosure also provides vectors and cells comprising B4GALT1 genomic and mRNA variants, B4GALT1 cDNA variants, and B4GALT1 polypeptide variants.

The disclosure also provides methods for detecting the presence and/or level of B4GALT1 genomic and/or mRNA variants, B4GALT1 cDNA variants or complements thereof, and/or B4GALT1 polypeptide variants in a biological sample. Also provided are methods of determining a subject's susceptibility to developing a cardiovascular condition, and methods of diagnosing a subject as having, or being at risk for, a cardiovascular condition. Also provided are methods of modifying a cell using any combination of a nuclease agent, an exogenous donor sequence, a transcriptional activator, a transcriptional repressor, and an expression vector for expressing a recombinant B4GALT1 gene or a nucleic acid encoding a B4GALT1 polypeptide. Also provided are therapeutic and prophylactic methods for treating subjects who have, or are at risk of developing, a cardiovascular condition.

The wild-type human genomic B4GALT1 nucleic acid is approximately 56.7 kb long, contains six exons, and is located on chromosome 9 of the human genome. An exemplary wild-type human genomic B4GALT1 sequence is assigned NCBI accession number NG_008919.1 (SEQ ID NO:1). A variant of human genomic B4GALT1 is set forth in SEQ ID NO:2 and contains a single nucleotide polymorphism (SNP) (A to G at position 53576; referred to herein as variant B4GALT1). The variant SNP results in the encoded B4GALT1 variant polypeptide having a serine, rather than the asparagine encoded in the wild-type B4GALT1 polypeptide, at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide. Whereas wild-type human genomic B4GALT1 has the three bases "aat" at positions 53575 to 53577, the variant human genomic B4GALT1 nucleic acid comprises, for example, three bases encoding serine (e.g., "agt") at the positions corresponding to positions 53575 to 53577 of wild-type human genomic B4GALT1 (compare SEQ ID NO:2 with SEQ ID NO:1). In some embodiments, the isolated nucleic acid molecule comprises SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule consists of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule is the complement of any genomic B4GALT1 nucleic acid molecule disclosed herein.
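The codon change described above can be checked mechanically: substituting G for A at position 53576 turns the codon at positions 53575 to 53577 from "aat" (Asn) into "agt" (Ser). A minimal Python sketch follows; because SEQ ID NO:2 is not reproduced here, it operates on the three-base codon alone, and the deliberately partial codon table and helper names are assumptions for illustration (a complete table would come from a library such as Biopython).

```python
# Check that the A>G SNP at genomic position 53576 converts the codon at
# positions 53575-53577 from "aat" (Asn) to "agt" (Ser).
# Only the Asn/Ser codons are tabulated: this is a deliberately partial table.
CODON_TO_AA = {"AAT": "N", "AAC": "N", "AGT": "S", "AGC": "S"}


def codon_at(seq: str, start_1based: int) -> str:
    """Return the 3-nt codon beginning at a 1-based position (upper-cased)."""
    i = start_1based - 1
    return seq[i:i + 3].upper()


def apply_snv(seq: str, pos_1based: int, ref: str, alt: str) -> str:
    """Substitute alt for ref at a 1-based position, verifying the ref base."""
    i = pos_1based - 1
    if seq[i].upper() != ref.upper():
        raise ValueError(f"expected {ref} at {pos_1based}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


# The codon starts at genomic 53575, so genomic 53576 is local position 2.
wild_type_codon = "aat"
variant_codon = apply_snv(wild_type_codon, 53576 - 53575 + 1, "A", "G")
print(CODON_TO_AA[codon_at(wild_type_codon, 1)], "->",
      CODON_TO_AA[codon_at(variant_codon, 1)])
```

Running it prints `N -> S`, i.e. Asn352Ser at the protein level.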

In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NO:2. In some embodiments, such a nucleic acid sequence also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a portion of SEQ ID NO:2 that comprises exons 1 to 6 of the B4GALT1 gene. In some embodiments, such a nucleic acid sequence also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a portion of SEQ ID NO:2 that comprises exon 5. In some embodiments, such a nucleic acid sequence also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises a nucleic acid sequence that is at least about 90% identical to SEQ ID NO:2, provided that the nucleic acid sequence comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2.
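The embodiments above combine two conditions: a minimum percent identity to SEQ ID NO:2 and retention of the nucleotides at positions 53575 to 53577. A minimal Python sketch of that combined test follows, assuming the candidate and reference are already aligned without gaps (equal length, shared coordinates); real comparisons would first align the sequences, for example with the programs named in the next paragraph. The 10-nt toy reference, with "AGT" standing in for the variant codon, is hypothetical.

```python
# Combined test: "at least X% identical to the reference, provided that the
# nucleotides at the given positions are retained".
# Assumes gap-free, equal-length sequences sharing one coordinate system.


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which the two sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty aligned sequences")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity_with_fixed_positions(candidate: str, reference: str,
                                        min_pct: float,
                                        fixed_positions: list) -> bool:
    """True if identity >= min_pct AND every 1-based fixed position matches."""
    if percent_identity(candidate, reference) < min_pct:
        return False
    return all(candidate[p - 1].upper() == reference[p - 1].upper()
               for p in fixed_positions)


# Hypothetical toy reference; positions 5-7 ("AGT") stand in for 53575-53577.
ref = "ACGTAGTACG"
print(meets_identity_with_fixed_positions("TCGTAGTACG", ref, 90.0, [5, 6, 7]))
print(meets_identity_with_fixed_positions("ACGTAATACG", ref, 90.0, [5, 6, 7]))
```

Both candidates are 90% identical to the toy reference, but only the first keeps the stand-in codon, so the sketch prints `True` and then `False`.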

Percent complementarity between particular stretches of nucleic acid sequence within nucleic acids can be determined routinely using the BLAST (Basic Local Alignment Search Tool) and PowerBLAST programs (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656), or using the Gap program with default settings (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison, Wisconsin, USA), which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
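For readers unfamiliar with the Smith-Waterman approach cited above, the core of local alignment is a dynamic-programming recurrence in which scores are floored at zero so that an alignment can start and end anywhere. A minimal Python sketch that returns only the best local score follows, using a linear gap penalty; the scoring values are illustrative assumptions and are not the defaults of the Gap program or of BLAST.

```python
# Minimal Smith-Waterman local alignment score with a linear gap penalty.
# Scoring values are illustrative assumptions, not any program's defaults.


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    cols = len(b) + 1
    prev = [0] * cols  # previous DP row (row 0 is all zeros)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols  # column 0 stays 0: local alignment may start anywhere
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # floor at 0 ends/restarts alignments
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))
```

The example prints 8: the four-nucleotide exact match ACGT scores 4 x 2, and flanking bases are simply left out of the local alignment. Tools such as BLAST add traceback, affine gap costs, and statistics on top of this basic recurrence.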

In some embodiments, the isolated nucleic acid molecule comprises less than the entire genomic sequence. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 2000, at least about 3000, at least about 4000, at least about 5000, at least about 6000, at least about 7000, at least about 8000, at least about 9000, at least about 10000, at least about 11000, at least about 12000, at least about 13000, at least about 14000, at least about 15000, at least about 16000, at least about 17000, at least about 18000, at least about 19000, or at least about 20000 contiguous nucleotides of SEQ ID NO:2. In some embodiments, such an isolated nucleic acid molecule also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, or at least about 1000 contiguous nucleotides of SEQ ID NO:2. In some embodiments, such an isolated nucleic acid molecule also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, or at least about 1000 contiguous nucleotides of exon 5 of SEQ ID NO:2. In some embodiments, such an isolated nucleic acid molecule also comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2.

For example, in some embodiments, the isolated nucleic acid molecule comprises at least 15 contiguous nucleotides of SEQ ID NO:2, wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises 15 to 50 contiguous nucleotides of SEQ ID NO:2, wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:2.
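A run of k contiguous nucleotides that must include positions 53575 to 53577 can start at only a limited set of positions. A minimal Python sketch that enumerates those windows purely from coordinates follows (no sequence needed). The total length of SEQ ID NO:2 is not stated in this section, so the `seq_len` value used below is a hypothetical placeholder chosen only to be larger than the windows considered.

```python
# Enumerate every window of k contiguous nucleotides (1-based, inclusive)
# that fully contains a target interval, e.g. positions 53575-53577.


def covering_windows(seq_len: int, target_start: int, target_end: int,
                     k: int) -> list:
    """Return (start, end) 1-based windows of length k containing the target."""
    if target_end < target_start or k < target_end - target_start + 1:
        return []  # window too short to hold the whole target
    lo = max(1, target_end - k + 1)              # earliest start still covering target_end
    hi = min(target_start, seq_len - k + 1)      # latest start covering target_start
    return [(s, s + k - 1) for s in range(lo, hi + 1)]


SEQ_LEN = 60000  # hypothetical placeholder for the length of SEQ ID NO:2
w15 = covering_windows(SEQ_LEN, 53575, 53577, 15)
print(len(w15), w15[0], w15[-1])
```

For k = 15 there are 13 such windows, running from (53563, 53577) to (53575, 53589); in general a 3-nt target admits k - 2 windows of length k away from the sequence ends, so the 15-to-50-nucleotide embodiments cover 1098 distinct windows in total.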

In some embodiments, the present disclosure provides an isolated nucleic acid comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:2, wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2 and is at least 15 nucleotides long. In some such embodiments, the portion of SEQ ID NO:2 is at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the present disclosure provides an isolated nucleic acid comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:2, wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2 and is 15 to 50 nucleotides long. In some such embodiments, the portion of SEQ ID NO:2 is at least 20, at least 25, or at least 30 nucleotides long.

In some embodiments, the present disclosure provides an isolated nucleic acid comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:2, wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2 and is at least 15 nucleotides long. In some such embodiments, the portion of SEQ ID NO:2 is at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the present disclosure provides an isolated nucleic acid comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:2, wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2 and is 15 to 50 nucleotides long. In some such embodiments, the portion of SEQ ID NO:2 is at least 20, at least 25, or at least 30 nucleotides long.

Such isolated nucleic acid molecules can be used, for example, to express variant B4GALT1 mRNA and protein, or as exogenous donor sequences. It is understood that gene sequences within a population can vary due to polymorphisms such as SNPs. The examples provided herein are merely exemplary sequences, and other sequences are also possible.

In some embodiments, the isolated nucleic acid molecule comprises a variant B4GALT1 minigene in which one or more non-essential segments of SEQ ID NO:2 have been deleted relative to the corresponding wild-type B4GALT1 gene. In some embodiments, the deleted non-essential segments comprise one or more intron sequences. In some embodiments, the B4GALT1 minigene can comprise exons corresponding to any one or more of exons 1 to 6 of, for example, variant B4GALT1 (SEQ ID NO:2), or any combination of such exons. In some embodiments, the minigene comprises or consists of exon 5 of SEQ ID NO:2. In some embodiments, the B4GALT1 minigene is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a portion of SEQ ID NO:2 comprising any one or more of exons 1 to 6, or any combination of such exons. In some embodiments, the B4GALT1 minigene is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to a portion of SEQ ID NO:2 comprising any one or more of exons 1 to 6, or any combination of such exons, and comprises nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the B4GALT1 minigene is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a portion of SEQ ID NO:2 comprising exon 5.
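Conceptually, a minigene of this kind is the genomic sequence with selected exons kept in order and the intervening intron sequence removed. A minimal Python sketch of that operation follows. The exon boundaries of SEQ ID NO:2 are not given in this section, so the exon coordinates and the toy "genomic" string below are hypothetical inputs for illustration only.

```python
# Build a minigene by concatenating selected exon intervals (in exon order)
# and dropping the intervening intron sequence.
# Exon coordinates are hypothetical inputs, not the real SEQ ID NO:2 boundaries.
from typing import List, Tuple


def build_minigene(genomic: str, exons: List[Tuple[int, int]],
                   keep: List[int]) -> str:
    """Concatenate the kept exons.

    exons: 1-based inclusive (start, end) for each exon, listed in order.
    keep:  1-based exon numbers to retain, e.g. [5] for exon 5 only.
    """
    parts = []
    for n in sorted(set(keep)):          # preserve genomic exon order
        start, end = exons[n - 1]
        parts.append(genomic[start - 1:end])
    return "".join(parts)


# Toy example: upper case = exons, lower case "tttt" = introns.
toy_genomic = "AAAAttttCCCCttttGGGG"
toy_exons = [(1, 4), (9, 12), (17, 20)]
print(build_minigene(toy_genomic, toy_exons, [1, 2, 3]))  # all exons, no introns
print(build_minigene(toy_genomic, toy_exons, [2]))        # a single exon
```

The first call prints `AAAACCCCGGGG` and the second `CCCC`. Exon numbers are sorted before joining so that any chosen combination is assembled in genomic order.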

The present disclosure also provides isolated nucleic acid molecules that hybridize to a variant B4GALT1 genomic sequence or a variant B4GALT1 minigene. In some embodiments, such an isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 2000, at least about 3000, at least about 4000, at least about 5000, at least about 6000, at least about 7000, at least about 8000, at least about 9000, at least about 10000, at least about 11000, at least about 12000, at least about 13000, at least about 14000, at least about 15000, at least about 16000, at least about 17000, at least about 18000, at least about 19000, or at least about 20000 nucleotides. In some embodiments, such an isolated nucleic acid molecule also hybridizes to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule hybridizes to a portion of the variant B4GALT1 genomic sequence or minigene at a segment that includes positions 53575 to 53577 of SEQ ID NO:2, or that lies within about 1000, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of those positions. In some embodiments, the isolated nucleic acid molecule hybridizes to at least about 15 contiguous nucleotides of a nucleic acid molecule that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to variant B4GALT1 genomic DNA or to a variant B4GALT1 minigene. In some embodiments, such an isolated nucleic acid molecule also hybridizes to positions 53575 to 53577 of SEQ ID NO:2. In some embodiments, the isolated nucleic acid molecule comprises or consists of about 15 to about 100 nucleotides, or about 15 to about 35 nucleotides.

For example, in some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising at least 15 nucleotides, wherein the isolated nucleic acid molecule hybridizes to a nucleic acid comprising the sequence of SEQ ID NO:2, wherein the isolated nucleic acid molecule hybridizes to a portion of SEQ ID NO:2, and wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising 15 to 50 nucleotides, wherein the isolated nucleic acid molecule hybridizes to a nucleic acid comprising the sequence of SEQ ID NO:2, wherein the isolated nucleic acid molecule hybridizes to a portion of SEQ ID NO:2, and wherein the portion of SEQ ID NO:2 comprises nucleotides 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides.

In some embodiments, the isolated nucleic acid molecule hybridizes to at least 15 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are at least 90% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the isolated nucleic acid molecule hybridizes to at least 15 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are at least 95% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the isolated nucleic acid molecule hybridizes to at least 15 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are 100% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long.
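The identity criterion in these embodiments can be sketched as follows, assuming an ungapped, position-by-position comparison of two equal-length stretches. Real sequence comparisons normally rely on an alignment tool, which this sketch does not reproduce. The function names, the toy sequences, and the `var_slice` argument (marking where positions 53575 to 53577 fall within the stretch) are hypothetical.

```python
def ungapped_identity(query: str, reference: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def meets_criteria(query, reference, var_slice, min_identity, min_len=15, max_len=None):
    """Length, overall identity, and an exact match at the variant positions must all hold."""
    if len(query) < min_len or (max_len is not None and len(query) > max_len):
        return False
    if ungapped_identity(query, reference) < min_identity:
        return False
    # The stretch must carry the variant nucleotides themselves, not merely be similar overall.
    return query[var_slice].upper() == reference[var_slice].upper()


# Toy 20-nt stretch with a hypothetical variant triplet at 0-based indices 10-12.
reference = "A" * 10 + "AGT" + "C" * 7
query = "G" + "A" * 9 + "AGT" + "C" * 7       # one mismatch, outside the variant
wt_like = "A" * 10 + "AAT" + "C" * 7          # one mismatch, inside the variant

print(ungapped_identity(query, reference))                  # 95.0
print(meets_criteria(query, reference, slice(10, 13), 90))  # True
print(meets_criteria(wt_like, reference, slice(10, 13), 90))  # False
```

Both toy stretches are 95% identical to the reference, but only the one that keeps the variant triplet satisfies the proviso that the contiguous nucleotides include positions 53575 to 53577.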

In some embodiments, the isolated nucleic acid molecule hybridizes to 15 to 50 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are at least 90% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the isolated nucleic acid molecule hybridizes to 15 to 50 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are at least 95% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long. In some embodiments, the isolated nucleic acid molecule hybridizes to 15 to 50 contiguous nucleotides of a nucleic acid, wherein the contiguous nucleotides are 100% identical to a portion of SEQ ID NO:2, and wherein the contiguous nucleotides comprise nucleotides 53575 to 53577 of SEQ ID NO:2 at positions corresponding to positions 53575 to 53577 of SEQ ID NO:2. In some such embodiments, the contiguous nucleotides are at least 20, at least 25, or at least 30 nucleotides long.

Such isolated nucleic acid molecules can be used, for example, as guide RNAs, primers, probes, or exogenous donor sequences.

A representative wild-type B4GALT1 genomic sequence is set forth in SEQ ID NO:1. A representative variant B4GALT1 genomic sequence is set forth in SEQ ID NO:2.

The present disclosure also provides isolated nucleic acid molecules comprising variants of B4GALT1 mRNA. An exemplary wild-type human B4GALT1 mRNA is assigned NCBI accession NM_001497 (SEQ ID NO:3) and consists of 4214 nucleotide bases. A variant of human B4GALT1 mRNA is set forth in SEQ ID NO:4; it contains a SNP (A to G at position 1244; referred to herein as variant B4GALT1) that results in a serine at the position corresponding to position 352 of the encoded B4GALT1 variant polypeptide. Whereas the wild-type human B4GALT1 mRNA has the three bases "aau" (encoding asparagine) at positions 1243 to 1245, the variant human B4GALT1 mRNA comprises, for example, the three bases "agu", encoding serine, at the positions corresponding to positions 1243 to 1245 of the wild-type human B4GALT1 mRNA (compare SEQ ID NO:4 with SEQ ID NO:3). In some embodiments, the isolated nucleic acid molecule comprises SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule consists of SEQ ID NO:4.
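The codon change described above can be checked with a few lines of Python. This is a sketch only: the codon table is a deliberately minimal subset of the standard genetic code, the helper names are hypothetical, and the toy transcript is a placeholder string of `n` characters rather than the NM_001497 sequence.

```python
# Minimal subset of the standard genetic code (RNA alphabet), enough for this example.
CODON_TABLE = {"AAU": "Asn", "AAC": "Asn", "AGU": "Ser", "AGC": "Ser"}


def translate_codon(codon: str) -> str:
    """Translate one codon; a DNA 'T' is accepted and treated as RNA 'U'."""
    return CODON_TABLE[codon.upper().replace("T", "U")]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at 1-based position `pos`, after checking the expected reference base."""
    if seq[pos - 1].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


# Placeholder transcript with the wild-type codon "aau" at positions 1243-1245.
toy_wt = "n" * 1242 + "aau" + "n" * 100
toy_var = apply_snv(toy_wt, 1244, "A", "g")   # A to G at position 1244

wt_codon, var_codon = toy_wt[1242:1245], toy_var[1242:1245]
print(wt_codon, translate_codon(wt_codon))    # aau Asn
print(var_codon, translate_codon(var_codon))  # agu Ser
```

Because the substitution hits the middle base of the codon, the same check applies unchanged to the DNA triplets "aat" and "agt".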

In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NO:4. In some embodiments, such a nucleic acid sequence also comprises the nucleotides corresponding to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleotide sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a portion of SEQ ID NO:4 comprising exons 1 to 6. In some embodiments, such a nucleic acid sequence also comprises the nucleotides corresponding to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule is the complement of any B4GALT1 mRNA molecule disclosed herein.
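A complementary molecule is conventionally written as the reverse complement, so that it reads 5' to 3' when paired antiparallel with the mRNA. The following Python sketch, using a hypothetical helper name, produces that representation in either a DNA or an RNA alphabet; it handles only the unambiguous bases A, C, G, T, and U.

```python
def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA or RNA sequence (A, C, G, T/U only)."""
    if not set(seq) <= set("ACGTUacgtu"):
        raise ValueError("only A, C, G, T and U are supported in this sketch")
    # A pairs with T (DNA) or U (RNA); C pairs with G.
    table = str.maketrans("ACGTUacgtu", "UGCAAugcaa" if rna else "TGCAAtgcaa")
    return seq.translate(table)[::-1]


# Complement of the variant codon "agu" versus the wild-type codon "aau".
print(reverse_complement("agu"))            # act
print(reverse_complement("agu", rna=True))  # acu
print(reverse_complement("aau"))            # att
```

A probe complementary to the variant region therefore carries "act" (DNA) opposite "agu", whereas one complementary to the wild-type region carries "att" opposite "aau"; that single-base difference is what allele-specific detection exploits.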

In some embodiments, the isolated nucleic acid molecule comprises less than the entire mRNA sequence. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 2000, at least about 3000, or at least about 4000 contiguous nucleotides of SEQ ID NO:4. In some embodiments, such isolated nucleic acid molecules also comprise the nucleotides corresponding to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, or at least about 1000 contiguous nucleotides of SEQ ID NO:4. In some embodiments, such isolated nucleic acid molecules also comprise the nucleotides corresponding to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, or at least about 1000 contiguous nucleotides of exons 1 to 6 of SEQ ID NO:4. In some embodiments, such isolated nucleic acid molecules also comprise the nucleotides corresponding to positions 1243 to 1245 of SEQ ID NO:4.

In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises at least 15 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises at least 15 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is 100% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises at least 15 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises 15 to 50 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises 15 to 50 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4. In some embodiments, the present disclosure provides an isolated nucleic acid molecule comprising a nucleic acid sequence that is 100% identical to a portion of SEQ ID NO:4, wherein the portion of SEQ ID NO:4 comprises nucleotides 1243 to 1245 of SEQ ID NO:4, and wherein the portion of SEQ ID NO:4 comprises 15 to 50 nucleotides of SEQ ID NO:4. In some such embodiments, the portion of SEQ ID NO:4 is at least 20, at least 25, or at least 30 nucleotides of SEQ ID NO:4.

Such isolated nucleic acid molecules can be used, for example, to express a B4GALT1 variant polypeptide or as exogenous donor sequences. It is understood that gene sequences within a population can vary due to polymorphisms such as SNPs. The examples provided herein are merely exemplary sequences; other sequences are also possible.

In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to the variant Asn352Ser B4GALT1 polypeptide (SEQ ID NO:8), provided that the polypeptide comprises a serine at the position corresponding to position 352. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 90% identical to SEQ ID NO:8, provided that the polypeptide comprises a serine at the position corresponding to position 352. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 95% identical to SEQ ID NO:8, provided that the polypeptide comprises a serine at the position corresponding to position 352.
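The proviso that the encoded polypeptide keep a serine at position 352 can be expressed as a simple check. In this sketch the comparison is ungapped and assumes equal-length sequences in one-letter code; the function name and the toy 398-residue sequences are hypothetical stand-ins, not SEQ ID NO:8.

```python
def encodes_qualifying_variant(candidate: str, reference: str, min_identity: float,
                               position: int = 352, residue: str = "S") -> bool:
    """True if `candidate` has `residue` at the 1-based `position` and is at least
    `min_identity` percent identical to `reference` (ungapped, equal lengths)."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    if candidate[position - 1] != residue:
        return False  # the serine proviso fails regardless of overall identity
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference) >= min_identity


# Toy 398-residue reference with Ser at position 352 (0-based index 351).
reference = "M" + "A" * 350 + "S" + "G" * 46
candidate = "M" + "V" * 10 + "A" * 340 + "S" + "G" * 46   # 10 substitutions elsewhere
wild_type_like = reference[:351] + "N" + reference[352:]  # Asn at position 352

print(encodes_qualifying_variant(candidate, reference, 95.0))       # True
print(encodes_qualifying_variant(wild_type_like, reference, 90.0))  # False
```

The candidate is about 97.5% identical (388 of 398 residues) and keeps Ser352, so it passes at 95% but not at 98%; the Asn352 sequence fails even though it differs at only one residue.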

For example, in some embodiments, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding a polypeptide having an amino acid sequence at least 10 amino acids in length, wherein the amino acid sequence is 90% identical to a portion of the amino acid sequence of SEQ ID NO:8, and wherein the portion comprises a serine at the position corresponding to position 352 of SEQ ID NO:8. In some such embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence at least 15, at least 20, or at least 25 amino acids in length. In some embodiments, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding a polypeptide having an amino acid sequence at least 10 amino acids in length, wherein the amino acid sequence is 95% identical to a portion of the amino acid sequence of SEQ ID NO:8, and wherein the portion comprises a serine at the position corresponding to position 352 of SEQ ID NO:8. In some such embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence at least 15, at least 20, or at least 25 amino acids in length. In some embodiments, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding a polypeptide having an amino acid sequence 10 to 50 amino acids in length, wherein the amino acid sequence is 90% identical to a portion of the amino acid sequence of SEQ ID NO:8, and wherein the portion comprises a serine at the position corresponding to position 352 of SEQ ID NO:8. In some such embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence at least 15, at least 20, or at least 25 amino acids in length. In some embodiments, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding a polypeptide having an amino acid sequence 10 to 50 amino acids in length, wherein the amino acid sequence is 95% identical to a portion of the amino acid sequence of SEQ ID NO:8, and wherein the portion comprises a serine at the position corresponding to position 352 of SEQ ID NO:8. In some such embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence at least 15, at least 20, or at least 25 amino acids in length. In some embodiments, the isolated nucleic acid molecule comprises or consists of a nucleic acid sequence encoding a polypeptide identical to SEQ ID NO:8.

The present disclosure also provides isolated nucleic acid molecules that hybridize to a variant B4GALT1 mRNA sequence. In some embodiments, such isolated nucleic acid molecules comprise or consist of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 2000, at least about 3000, or at least about 4000 nucleotides. In some embodiments, such isolated nucleic acid molecules also hybridize to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule hybridizes to a portion of the variant B4GALT1 mRNA in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within about 1000, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of those positions.
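One way to read "a segment that comprises positions 1243 to 1245 or lies within about N nucleotides of those positions" is as a distance test on coordinates. The sketch below adopts that reading as an assumption: a target segment qualifies if it overlaps the 1243 to 1245 span, or if the coordinate difference between its nearest end and that span is at most `flank`. The function name and the distance convention are hypothetical.

```python
def segment_near_variant(seg_start: int, seg_end: int, flank: int = 5,
                         var_start: int = 1243, var_end: int = 1245) -> bool:
    """Check a 1-based inclusive target segment against the variant span.

    Returns True if the segment overlaps [var_start, var_end], or if the
    coordinate difference between the segment's nearest end and the span
    is at most `flank` (an assumed reading of "within N nucleotides")."""
    if seg_start > seg_end:
        raise ValueError("segment start must not exceed segment end")
    gap = max(var_start - seg_end, seg_start - var_end, 0)  # 0 when they overlap
    return gap <= flank


print(segment_near_variant(1230, 1250))  # True: contains 1243-1245
print(segment_near_variant(1220, 1238))  # True: ends 5 positions before 1243
print(segment_near_variant(1220, 1237))  # False: ends 6 positions before 1243
```

Larger values such as `flank=1000` correspond to the broader embodiments listed above.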

In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of a variant B4GALT1 mRNA (e.g., SEQ ID NO:4) in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of a variant B4GALT1 mRNA (e.g., SEQ ID NO:4) in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions, and hybridizes to positions 1243 to 1245 of SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises 15 to 50 nucleotides, hybridizes to a portion of a variant B4GALT1 mRNA (e.g., SEQ ID NO:4) in a segment comprising positions 1243 to 1245 of SEQ ID NO:4, and hybridizes to positions 1243 to 1245 of SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides.

In some embodiments, the isolated nucleic acid molecule hybridizes to at least about 15 contiguous nucleotides of a nucleic acid molecule that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to a variant B4GALT1 mRNA (e.g., SEQ ID NO:4). In some embodiments, the isolated nucleic acid molecule also hybridizes to positions 1243 to 1245 of SEQ ID NO:4. In some embodiments, the isolated nucleic acid molecule comprises or consists of about 15 to about 100 nucleotides, or about 15 to about 35 nucleotides.

In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of a variant B4GALT1 mRNA in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions, wherein the variant B4GALT1 mRNA is at least 90% identical to SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of a variant B4GALT1 mRNA in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions, wherein the variant B4GALT1 mRNA is at least 95% identical to SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of a variant B4GALT1 mRNA in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions, and hybridizes to positions 1243 to 1245 of SEQ ID NO:4, wherein the variant B4GALT1 mRNA is at least 90% identical to SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of a variant B4GALT1 mRNA in a segment that comprises positions 1243 to 1245 of SEQ ID NO:4 or lies within 5 nucleotides of those positions, and hybridizes to positions 1243 to 1245 of SEQ ID NO:4, wherein the variant B4GALT1 mRNA is at least 95% identical to SEQ ID NO:4. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 nucleotides. In some embodiments, the isolated nucleic acid molecule comprises or consists of 15 to 100 nucleotides, or 15 to 35 nucleotides.

A representative wild-type B4GALT1 mRNA sequence is set forth in SEQ ID NO:3. A representative variant B4GALT1 mRNA sequence is set forth in SEQ ID NO:4.

The present disclosure also provides nucleic acid molecules comprising variants of B4GALT1 cDNA that encode all or part of a B4GALT1 variant polypeptide. An exemplary wild-type human B4GALT1 cDNA (e.g., the coding region of the mRNA, written as DNA) consists of 1197 nucleotide bases (SEQ ID NO:5). A variant of human B4GALT1 cDNA is set forth in SEQ ID NO:6; it contains a SNP (A to G at position 1055; referred to herein as variant B4GALT1) that results in a serine at the position corresponding to position 352 of the encoded B4GALT1 variant polypeptide. Whereas the wild-type human B4GALT1 cDNA has the three bases "aat" at positions 1054 to 1056, the variant human B4GALT1 cDNA comprises, for example, "agt", encoding serine, at the positions corresponding to positions 1054 to 1056 of the full-length/mature wild-type human B4GALT1 cDNA (compare SEQ ID NO:6 with SEQ ID NO:5). In some embodiments, the nucleic acid molecule comprises SEQ ID NO:6. In some embodiments, the nucleic acid molecule consists of SEQ ID NO:6. In some embodiments, the cDNA molecule is isolated.
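The mRNA and cDNA coordinates quoted above are consistent with each other. The SNP lies at position 1244 of the mRNA and position 1055 of the cDNA, which implies that the coding region begins at mRNA position 190 (an offset of 189). The sketch below derives that offset from the two quoted positions (an inference, not a value stated in the text) and maps a coding position to its codon; the helper names are hypothetical.

```python
MRNA_SNP_POS = 1244  # A to G in SEQ ID NO:4, as stated in the text
CDNA_SNP_POS = 1055  # A to G in SEQ ID NO:6, as stated in the text
OFFSET = MRNA_SNP_POS - CDNA_SNP_POS  # 189, inferred from the two positions above


def mrna_to_cds(mrna_pos: int) -> int:
    """Convert a 1-based mRNA position to a 1-based coding-sequence position."""
    return mrna_pos - OFFSET


def codon_of(cds_pos: int) -> tuple:
    """Return (codon number, 1-based position within that codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def codon_span(codon_number: int) -> tuple:
    """1-based inclusive coding-sequence positions occupied by a codon."""
    return 3 * codon_number - 2, 3 * codon_number


print(OFFSET)                                # 189
print(codon_of(CDNA_SNP_POS))                # (352, 2)
print(codon_span(352))                       # (1054, 1056)
print(mrna_to_cds(1243), mrna_to_cds(1245))  # 1054 1056
```

The SNP thus falls on the second base of codon 352, matching the "aat" to "agt" ("aau" to "agu") change, and mRNA positions 1243 to 1245 map onto cDNA positions 1054 to 1056. The 1197-nucleotide cDNA is also a whole number of codons (399).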

In some embodiments, the cDNA molecule comprises or consists of a nucleic acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NO:6. In some embodiments, the cDNA molecule also comprises nucleotides corresponding to positions 1054 to 1056 of SEQ ID NO:6. In some embodiments, the isolated nucleic acid molecule is the complement of any B4GALT1 cDNA molecule disclosed herein.

In some embodiments, the cDNA molecule comprises less than the entire cDNA sequence. In some embodiments, the cDNA molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, or at least about 1100 contiguous nucleotides of SEQ ID NO:6. In some embodiments, such cDNA molecules also comprise nucleotides corresponding to positions 1054 to 1056 of SEQ ID NO:6. In some embodiments, the cDNA molecule comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, or at least about 500 contiguous nucleotides of SEQ ID NO:6. In some embodiments, such cDNA molecules also comprise nucleotides corresponding to positions 1054 to 1056 of SEQ ID NO:6.

For example, in some embodiments, the cDNA molecule comprises at least 15 contiguous nucleotides of SEQ ID NO:6, wherein the contiguous nucleotides comprise nucleotides 1054 to 1056 of SEQ ID NO:6. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the cDNA molecule comprises 15 to 50 contiguous nucleotides of SEQ ID NO:6, wherein the contiguous nucleotides comprise nucleotides 1054 to 1056 of SEQ ID NO:6. In some such embodiments, the isolated nucleic acid molecule comprises at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6.

In some embodiments, the present disclosure provides a cDNA molecule comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises at least 15 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises at least 15 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises 15 to 50 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises 15 to 50 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6.

In some embodiments, the present disclosure provides a cDNA molecule comprising nucleotides 1054 to 1056 of SEQ ID NO:6 at positions corresponding to nucleotides 1054 to 1056 of SEQ ID NO:6, wherein the cDNA molecule comprises a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises at least 15 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising nucleotides 1054 to 1056 of SEQ ID NO:6 at positions corresponding to nucleotides 1054 to 1056 of SEQ ID NO:6, wherein the cDNA molecule comprises a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises at least 15 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising nucleotides 1054 to 1056 of SEQ ID NO:6 at positions corresponding to nucleotides 1054 to 1056 of SEQ ID NO:6, wherein the cDNA molecule comprises a nucleic acid sequence that is at least 90% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises 15 to 50 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6. In some embodiments, the present disclosure provides a cDNA molecule comprising nucleotides 1054 to 1056 of SEQ ID NO:6 at positions corresponding to nucleotides 1054 to 1056 of SEQ ID NO:6, wherein the cDNA molecule comprises a nucleic acid sequence that is at least 95% identical to a portion of SEQ ID NO:6, wherein the portion of SEQ ID NO:6 comprises nucleotides 1054 to 1056 of SEQ ID NO:6, and wherein the portion of SEQ ID NO:6 comprises 15 to 50 contiguous nucleotides of SEQ ID NO:6. In some such embodiments, the portion of SEQ ID NO:6 is at least 20, at least 25, or at least 30 contiguous nucleotides of SEQ ID NO:6.
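The fragment embodiments above reduce to an interval test: a contiguous fragment qualifies when it lies inside the reference, spans positions 1054 to 1056, and meets the length bounds. The following Python sketch illustrates that test. It assumes 1-based inclusive coordinates and a 1197-nt reference; the function names are hypothetical and only coordinates, not sequence content, are checked.

```python
from typing import List, Optional

# Values taken from the text: variant codon at 1054-1056 of a 1197-nt
# reference (SEQ ID NO:6). Coordinates are 1-based and inclusive.
VAR_FIRST, VAR_LAST, REF_LEN = 1054, 1056, 1197

def fragment_qualifies(start: int, length: int, min_len: int = 15,
                       max_len: Optional[int] = None) -> bool:
    """True if [start, start + length - 1] lies inside the reference,
    spans the whole variant codon and satisfies the length bounds."""
    end = start + length - 1
    if start < 1 or end > REF_LEN:
        return False
    if length < min_len or (max_len is not None and length > max_len):
        return False
    return start <= VAR_FIRST and end >= VAR_LAST

def qualifying_starts(length: int) -> List[int]:
    """All start positions at which a fragment of `length` nt spans the
    variant codon without running off either end of the reference."""
    lo = max(1, VAR_LAST - length + 1)
    hi = min(VAR_FIRST, REF_LEN - length + 1)
    return list(range(lo, hi + 1))

for n in (15, 20, 30, 50):
    starts = qualifying_starts(n)
    print(n, len(starts), starts[0], starts[-1])
```

For a 15-nt fragment, for instance, only starts 1042 through 1054 (13 positions) include all three variant nucleotides; the search space widens by one start per added nucleotide.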

Such cDNA molecules can be used, for example, to express a B4GALT1 variant protein or as exogenous donor sequences. It is understood that gene sequences within a population can vary due to polymorphisms, such as SNPs. The examples provided herein are merely illustrative sequences, and other sequences are also possible.

In some embodiments, the cDNA molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to the variant Asn352Ser B4GALT1 polypeptide (SEQ ID NO:8), provided that the polypeptide comprises a serine at the position corresponding to position 352. In some embodiments, the cDNA molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 90% identical to SEQ ID NO:8, provided that the polypeptide comprises a serine at the position corresponding to position 352. In some embodiments, the cDNA molecule comprises or consists of a nucleic acid sequence encoding a polypeptide that is at least about 95% identical to SEQ ID NO:8, provided that the polypeptide comprises a serine at the position corresponding to position 352. In some embodiments, the cDNA molecule comprises or consists of a nucleic acid sequence encoding a polypeptide identical to SEQ ID NO:8.
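The two-part condition above (a percent-identity threshold plus a required serine at position 352) can be expressed as a small Python sketch. It assumes the sequences are already aligned to equal length without gaps; in practice identity would normally be computed after a proper pairwise alignment. The 398-residue reference below is a toy placeholder, not SEQ ID NO:8, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying identical residues.

    Assumes `a` and `b` are pre-aligned to the same length (no gap handling)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def meets_variant_criterion(candidate: str, reference: str, threshold: float,
                            position: int = 352, residue: str = "S") -> bool:
    """True if `candidate` is at least `threshold` percent identical to
    `reference` AND carries `residue` at the 1-based `position`."""
    return (percent_identity(candidate, reference) >= threshold
            and candidate[position - 1] == residue)

# Toy 398-residue stand-in for the variant polypeptide (NOT the real sequence).
reference = "A" * 351 + "S" + "A" * 46
candidate = "G" * 10 + reference[10:]                     # 10 substitutions, Ser352 kept
wild_type_like = reference[:351] + "N" + reference[352:]  # Asn at 352

print(meets_variant_criterion(candidate, reference, 95))
print(meets_variant_criterion(wild_type_like, reference, 95))
```

The second call shows why the proviso matters: a sequence differing only at position 352 is more than 99% identical to the reference yet fails the criterion because it carries asparagine there.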

The present disclosure also provides isolated nucleic acid molecules that hybridize to variant B4GALT1 cDNA sequences. In some embodiments, such isolated nucleic acid molecules comprise or consist of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, or at least about 1100 nucleotides. In some embodiments, such isolated nucleic acid molecules also hybridize to positions 1054 to 1056 of SEQ ID NO:6. In some embodiments, such isolated nucleic acid molecules hybridize to a portion of the variant B4GALT1 cDNA in a segment that comprises positions 1054 to 1056 of SEQ ID NO:6 or is within about 600, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of those positions. In some embodiments, the isolated nucleic acid molecule hybridizes to at least about 15 contiguous nucleotides of a cDNA molecule that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule also hybridizes to positions 1054 to 1056 of SEQ ID NO:6. In some embodiments, the isolated nucleic acid molecule comprises or consists of about 15 to about 100 nucleotides, or about 15 to about 35 nucleotides.

In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is at least 90% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is at least 95% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides and hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is 100% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, and hybridizes to positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is at least 90% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, and hybridizes to positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is at least 95% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of at least 15 nucleotides, hybridizes to a portion of the variant B4GALT1 cDNA in a segment that comprises, or is within 5 nucleotides of, positions 1054 to 1056 of SEQ ID NO:6, and hybridizes to positions 1054 to 1056 of SEQ ID NO:6, wherein the variant B4GALT1 cDNA is 100% identical to the variant B4GALT1 cDNA (e.g., SEQ ID NO:6). In some embodiments, the isolated nucleic acid molecule comprises or consists of 15 to 100 nucleotides, or 15 to 35 nucleotides.
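A probe that hybridizes across the variant site can be sketched as the reverse complement of a target window that contains the site. The Python below is a minimal illustration, assuming a DNA target, a window centred on the site where the sequence ends allow, and simple Watson-Crick complementarity; it ignores melting temperature, secondary structure, and hybridization stringency, which a real probe or primer design would have to consider. The 40-nt target is synthetic, not a B4GALT1 sequence, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def centered_window(seq_len: int, site_first: int, site_last: int,
                    length: int) -> tuple:
    """1-based inclusive window of `length` nt that contains the site,
    centred on it and shifted inward if it would run off either end."""
    site_len = site_last - site_first + 1
    if length < site_len or length > seq_len:
        raise ValueError("window length incompatible with site or sequence")
    start = site_first - (length - site_len) // 2
    start = max(1, min(start, seq_len - length + 1))
    return start, start + length - 1

def design_probe(target: str, site_first: int, site_last: int,
                 length: int = 20) -> str:
    """Return a probe complementary to the window around the site."""
    start, end = centered_window(len(target), site_first, site_last, length)
    return reverse_complement(target[start - 1:end])

# Synthetic 40-nt target with an "AGT" codon at positions 19-21 (hypothetical).
target = "ACGTTGCAACGTTGCAAC" + "AGT" + "TCGATCGATCGATCGATCG"
probe = design_probe(target, 19, 21, length=15)
print(probe)
```

For an RNA target such as the variant mRNA, the same logic applies with U in place of T; the choice of a centred window simply keeps the mismatch-sensitive variant bases away from the probe ends.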

Such isolated nucleic acid molecules can be used, for example, as guide RNAs, primers, probes, exogenous donor sequences, antisense RNAs, siRNAs, or shRNAs.

A representative wild-type B4GALT1 cDNA sequence is set forth in SEQ ID NO:5. A representative variant B4GALT1 cDNA sequence is set forth in SEQ ID NO:6.

The nucleic acid molecules disclosed herein can comprise the nucleic acid sequence of a naturally occurring B4GALT1 gene or mRNA transcript, or can comprise a non-naturally occurring sequence. In some embodiments, the naturally occurring sequence can differ from the non-naturally occurring sequence by synonymous mutations or by mutations that do not affect the encoded B4GALT1 polypeptide. For example, the sequences can be identical except for synonymous mutations or mutations that do not affect the encoded B4GALT1 polypeptide. A synonymous mutation or substitution is the substitution of one nucleotide for another within an exon of a protein-coding gene such that the encoded amino acid sequence is not altered. This is possible because of the degeneracy of the genetic code, in which some amino acids are encoded by more than one three-nucleotide codon. Synonymous substitutions are used, for example, in codon optimization. The nucleic acid molecules disclosed herein can be codon optimized.
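Whether two coding sequences differ only by synonymous substitutions can be checked by translating both and comparing the proteins. The Python sketch below builds the standard genetic code (NCBI translation table 1) and applies that test to short toy sequences; the function names are hypothetical, and alternative genetic codes, frame shifts, and sequences containing ambiguous bases are not handled.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def synonymous_codons(amino_acid: str) -> list:
    """All codons encoding `amino_acid`, sorted alphabetically."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def differs_only_synonymously(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ but encode the same protein."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)

print(synonymous_codons("S"))
print(differs_only_synonymously("ATGAATAGTTAA", "ATGAACAGCTAA"))
print(differs_only_synonymously("ATGAATAGTTAA", "ATGAGTAGTTAA"))
```

The third call mirrors the variant discussed above: changing AAT to AGT replaces asparagine with serine, so it is not a synonymous substitution, whereas swapping AAT for AAC or AGT for AGC leaves the encoded protein unchanged.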

Also provided herein are functional polynucleotides that can interact with the disclosed nucleic acid molecules. A functional polynucleotide is a nucleic acid molecule that has a specific function, such as binding a target molecule or catalyzing a specific reaction. Examples of functional polynucleotides include, but are not limited to, antisense molecules, aptamers, ribozymes, triplex-forming molecules, and external guide sequences. Functional polynucleotides can act as effectors, inhibitors, modulators, or stimulators of a specific activity possessed by a target molecule, or they can possess de novo activity independent of any other molecule.

Antisense molecules are designed to interact with a target nucleic acid molecule through either canonical or non-canonical base pairing. The interaction of the antisense molecule with the target molecule is designed to promote destruction of the target molecule, for example, through RNase H-mediated cleavage of the RNA-DNA hybrid. Alternatively, the antisense molecule is designed to interrupt a processing function that would normally take place on the target molecule, such as transcription or replication. Antisense molecules can be designed based on the sequence of the target molecule. Numerous methods exist for optimizing antisense efficiency by identifying the most accessible regions of the target molecule; examples include, but are not limited to, in vitro selection experiments and DNA modification studies using DMS and DEPC. Antisense molecules generally bind the target molecule with a dissociation constant (K_d) of about 10⁻⁶ or less, about 10⁻⁸ or less, about 10⁻¹⁰ or less, or about 10⁻¹² or less. Representative examples of methods and techniques that aid in the design and use of antisense molecules can be found in the following non-limiting list of U.S. patents: U.S. Pat. Nos. 5,135,917; 5,294,533; 5,627,158; 5,641,754; 5,691,317; 5,780,607; 5,786,138; 5,849,903; 5,856,103; 5,919,772; 5,955,590; 5,990,088; 5,994,320; 5,998,602; 6,005,095; 6,007,995; 6,013,522; 6,017,898; 6,018,042; 6,025,198; 6,033,910; 6,040,296; 6,046,004; 6,046,319; and 6,057,437. Examples of antisense molecules include, but are not limited to, antisense RNA, short interfering RNA (siRNA), and short hairpin RNA (shRNA).
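To give a feel for what these dissociation constants imply, the Python sketch below computes the equilibrium fraction of target occupied by an antisense molecule under a simple 1:1 binding model, fraction bound = [L] / ([L] + K_d). It assumes the constants are molar (the text does not state units), that the antisense molecule is in excess so its free concentration approximately equals its total concentration, and a hypothetical 10 nM antisense concentration; real hybridization kinetics and cellular uptake are not modeled.

```python
def fraction_bound(ligand_conc: float, kd: float) -> float:
    """Equilibrium fraction of target bound for simple 1:1 binding,
    assuming free ligand ~ total ligand (ligand depletion ignored)."""
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc / (ligand_conc + kd)

antisense_conc = 10e-9  # hypothetical 10 nM antisense molecule (molar units)
for kd in (1e-6, 1e-8, 1e-10, 1e-12):
    print(f"Kd = {kd:.0e} M -> fraction bound = "
          f"{fraction_bound(antisense_conc, kd):.4f}")
```

Under these assumptions, moving from a K_d of about 10⁻⁶ to about 10⁻¹⁰ takes occupancy at 10 nM from roughly 1% to roughly 99%, which is why the tighter ranges are listed as alternatives.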

The isolated nucleic acid molecules disclosed herein can comprise RNA, DNA, or both RNA and DNA. The isolated nucleic acid molecules can also be linked or fused to a heterologous nucleic acid sequence, such as in a vector, or to a heterologous label. For example, an isolated nucleic acid molecule disclosed herein can be present within a vector or an exogenous donor sequence comprising the isolated nucleic acid molecule and a heterologous nucleic acid sequence. The isolated nucleic acid molecules can also be linked or fused to a heterologous label, such as a fluorescent label. Other examples of labels are disclosed elsewhere herein.

The label can be directly detectable (e.g., a fluorophore) or indirectly detectable (e.g., a hapten, an enzyme, or a fluorophore quencher). Such labels can be detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. Such labels include, for example, radiolabels that can be measured with radiation-counting devices; pigments, dyes, or other chromogens that can be visually observed or measured with a spectrophotometer; spin labels that can be measured with a spin-label analyzer; and fluorescent labels (e.g., fluorophores), where the output signal is generated by excitation of a suitable molecular adduct and can be visualized by excitation with light that is absorbed by the dye, or can be measured with a standard fluorometer or imaging system. The label can also be, for example, a chemiluminescent substance, where the output signal is generated by chemical modification of a signal compound; a metal-containing substance; or an enzyme, where enzyme-dependent secondary generation of a signal occurs, such as the formation of a colored product from a colorless substrate. The term "label" can also refer to a "tag" or hapten that can bind selectively to a conjugated molecule such that the conjugated molecule, when added subsequently along with a substrate, is used to generate a detectable signal. For example, biotin can be used as a tag, an avidin or streptavidin conjugate of horseradish peroxidase (HRP) can then be bound to the tag, and a colorimetric substrate (e.g., tetramethylbenzidine (TMB)) or a fluorogenic substrate can then be used to detect the presence of HRP. Exemplary labels include, but are not limited to, myc, HA, FLAG or 3XFLAG, 6XHis or polyhistidine, glutathione-S-transferase (GST), maltose-binding protein, epitope tags, or the Fc portion of an immunoglobulin. Numerous labels are known and include, for example, particles, fluorophores, haptens, enzymes and their colorimetric, fluorogenic, and chemiluminescent substrates, and other labels.

The disclosed nucleic acid molecules can be composed of, for example, nucleotides or non-natural or modified nucleotides, such as nucleotide analogs or nucleotide substitutes. Such nucleotides include nucleotides that contain modified bases, sugars, or phosphate groups, or that incorporate non-natural moieties into their structure. Examples of non-natural nucleotides include, but are not limited to, dideoxynucleotides and biotinylated, aminated, deaminated, alkylated, benzylated, and fluorophore-labeled nucleotides.

The nucleic acid molecules disclosed herein can also comprise one or more nucleotide analogs or substitutions. A nucleotide analog is a nucleotide that contains a modification to the base, sugar, or phosphate moiety. Modifications to the base moiety include, but are not limited to, natural and synthetic modifications of A, C, G, and T/U, as well as different purine or pyrimidine bases, such as pseudouridine, uracil-5-yl, hypoxanthine-9-yl (I), and 2-aminoadenin-9-yl. Modified bases include, but are not limited to, 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine, and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo (particularly 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine, and 3-deazaguanine and 3-deazaadenine. Certain nucleotide analogs, such as 5-substituted pyrimidines, 6-azapyrimidines, and N-2, N-6, and O-6 substituted purines, including but not limited to 2-aminopropyladenine, 5-propynyluracil, 5-propynylcytosine, and 5-methylcytosine, can increase the stability of duplex formation. Base modifications can often be combined with sugar modifications, such as 2'-O-methoxyethyl, to achieve unique properties such as increased duplex stability.

Nucleotide analogs can also include modifications of the sugar moiety. Modifications to the sugar moiety include, but are not limited to, natural modifications of ribose and deoxyribose as well as synthetic modifications. Sugar modifications include, but are not limited to, the following modifications at the 2' position: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl, and alkynyl can be substituted or unsubstituted C1-10 alkyl or C2-10 alkenyl and C2-10 alkynyl. Exemplary 2' sugar modifications also include, but are not limited to, -O[(CH₂)ₙO]ₘCH₃, -O(CH₂)ₙOCH₃, -O(CH₂)ₙNH₂, -O(CH₂)ₙCH₃, -O(CH₂)ₙ-ONH₂, and -O(CH₂)ₙON[(CH₂)ₙCH₃]₂, where n and m are from 1 to about 10.

Other modifications at the 2' position include, but are not limited to, C1-10 alkyl, substituted lower alkyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH₃, OCN, Cl, Br, CN, CF₃, OCF₃, SOCH₃, SO₂CH₃, ONO₂, NO₂, N₃, NH₂, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA-cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. Similar modifications can also be made at other positions on the sugar, particularly the 3' position of the sugar on the 3'-terminal nucleotide or in 2'-5'-linked oligonucleotides, and the 5' position of the 5'-terminal nucleotide. Modified sugars can also include those containing modifications at the bridging ring oxygen, such as CH₂ and S. Nucleotide sugar analogs can also have sugar mimetics, such as cyclobutyl moieties, in place of the pentofuranosyl sugar.

Nucleotide analogs can also be modified at the phosphate moiety. Modified phosphate moieties include, but are not limited to, those that can be modified so that the linkage between two nucleotides contains a phosphorothioate, chiral phosphorothioate, phosphorodithioate, phosphotriester, aminoalkylphosphotriester, methyl and other alkyl phosphonates (including 3'-alkylene phosphonates and chiral phosphonates), phosphinate, phosphoramidates (including 3'-amino phosphoramidate and aminoalkylphosphoramidates), thionophosphoramidate, thionoalkylphosphonate, thionoalkylphosphotriester, or boranophosphate. These phosphate or modified phosphate linkages between two nucleotides can be through a 3'-5' linkage or a 2'-5' linkage, and the linkage can contain inverted polarity, such as 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts, and free acid forms are also included.

Nucleotide substitutes include molecules that have functional properties similar to those of nucleotides but do not contain a phosphate moiety, such as peptide nucleic acid (PNA). Nucleotide substitutes include molecules that recognize nucleic acids in a Watson-Crick or Hoogsteen manner but are linked together through a moiety other than a phosphate moiety. Nucleotide substitutes can conform to a double-helix-type structure when interacting with an appropriate target nucleic acid.

Nucleotide substitutes also include nucleotides or nucleotide analogs in which the phosphate moiety or the sugar moiety has been replaced. In some embodiments, nucleotide substitutes do not contain a standard phosphorus atom. Substitutes for the phosphate can be, for example, short-chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short-chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide, and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene-containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S, and CH₂ component parts.

It is also understood that in a nucleotide substitute, both the sugar and the phosphate moieties of the nucleotide can be replaced, for example, by an amide-type linkage (aminoethylglycine), as in PNA.
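As an illustration only, the distinction drawn above between natural nucleotides, nucleotide analogs (a modified base, sugar, or phosphate) and nucleotide substitutes (a replaced phosphate or sugar, as in PNA) can be expressed as a small data model. The `Monomer` class, the `classify` function, and the short vocabularies of bases, sugars, and phosphorus-containing linkages are hypothetical simplifications; they ignore conjugates and are not a chemical nomenclature standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately small vocabularies; real chemistries are far more varied.
NATURAL_BASES = {"A", "C", "G", "T", "U"}
NATURAL_SUGARS = {"ribose", "deoxyribose"}
PHOSPHORUS_LINKAGES = {
    "phosphodiester", "phosphorothioate", "phosphorodithioate",
    "methylphosphonate", "phosphoramidate", "boranophosphate",
}


@dataclass(frozen=True)
class Monomer:
    """One monomer unit described by its three moieties (hypothetical schema)."""
    base: str             # e.g. "A", "5-me-C", "pseudouridine"
    sugar: Optional[str]  # None means the sugar moiety has been replaced
    linkage: str          # internucleoside linkage to the next unit


def classify(m: Monomer) -> str:
    """Return 'natural', 'analog', or 'substitute' (simplified reading of the text)."""
    # Substitute: the sugar is replaced, or the phosphorus-containing linkage is replaced.
    if m.sugar is None or m.linkage not in PHOSPHORUS_LINKAGES:
        return "substitute"
    # Unmodified base and sugar joined by a phosphodiester: an ordinary nucleotide.
    if (m.base in NATURAL_BASES and m.sugar in NATURAL_SUGARS
            and m.linkage == "phosphodiester"):
        return "natural"
    # Otherwise at least one moiety carries a modification: a nucleotide analog.
    return "analog"


print(classify(Monomer("5-me-C", "deoxyribose", "phosphorothioate")))  # prints: analog
print(classify(Monomer("T", None, "aminoethylglycine amide")))         # prints: substitute
```

Treating any non-phosphorus linkage as marking a substitute mirrors the statement that substitutes are joined through moieties other than phosphate; a sugar mimetic such as cyclobutyl, which the text calls a sugar analog, would need its own vocabulary entry in a fuller model.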

It is also possible to link other types of molecules (conjugates) to nucleotides or nucleotide analogs, for example, to enhance cellular uptake. Conjugates can be chemically linked to the nucleotide or nucleotide analog. Such conjugates include, for example, lipid moieties such as a cholesterol moiety, cholic acid, a thioether such as hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain such as dodecanediol or undecyl residues, a phospholipid such as di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or polyethylene glycol chain, adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety.

The present disclosure also provides vectors comprising any one or more of the nucleic acid molecules disclosed herein. In some embodiments, the vector comprises any one or more of the nucleic acid molecules disclosed herein and a heterologous nucleic acid. The vector can be a viral or non-viral vector capable of transporting a nucleic acid molecule. In some embodiments, the vector is a plasmid or cosmid (e.g., a circular double-stranded DNA into which additional DNA segments can be ligated). In some embodiments, the vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome. In some embodiments, the vector can replicate autonomously in a host cell into which it is introduced (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors). In some embodiments, the vector (e.g., a non-episomal mammalian vector) can integrate into the genome of a host cell upon introduction into the host cell and is thereby replicated along with the host genome. Moreover, certain vectors can direct the expression of genes to which they are operably linked. Such vectors are referred to herein as “recombinant expression vectors” or “expression vectors”. Such vectors can also be targeting vectors (i.e., exogenous donor sequences).

In some embodiments, the proteins encoded by the various genetic variants disclosed herein are expressed by inserting nucleic acid molecules encoding the disclosed genetic variants into expression vectors such that the genes are operably linked to expression control sequences, such as transcriptional and translational control sequences. Expression vectors include, but are not limited to, plasmids, cosmids, retroviruses, adenoviruses, adeno-associated viruses (AAV), plant viruses such as cauliflower mosaic virus and tobacco mosaic virus, yeast artificial chromosomes (YACs), Epstein-Barr virus (EBV)-derived episomes, and the like. In some embodiments, nucleic acid molecules comprising the disclosed genetic variants can be ligated into a vector such that the transcriptional and translational control sequences within the vector serve their intended function of regulating the transcription and translation of the genetic variant. The expression vector and expression control sequences are chosen to be compatible with the expression host cell used. Nucleic acid sequences comprising the disclosed genetic variants can be inserted into separate vectors or into the same expression vector as the variant genetic information. Nucleic acid sequences comprising the disclosed genetic variants can be inserted into the expression vector by standard methods (e.g., ligation of complementary restriction sites on the nucleic acid comprising the disclosed genetic variants and on the vector, or blunt-end ligation if no restriction sites are present).
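The choice between ligating complementary restriction sites and falling back to blunt-end ligation can be sketched as follows. This is a simplified illustration, assuming the insert must carry the same site at both ends (and nowhere else) and the vector must be cut exactly once; only the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are used, and `count_sites`, `choose_enzyme`, and `ligation_plan` are hypothetical names. Practical cloning design also weighs orientation, reading frame, and methylation sensitivity.

```python
import re
from typing import Optional

# Recognition sequences of two common type II restriction enzymes.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of site in seq; the lookahead also counts overlapping hits."""
    return len(re.findall(f"(?={site})", seq.upper()))


def choose_enzyme(insert: str, vector: str) -> Optional[str]:
    """Return an enzyme whose site flanks the insert and cuts the vector exactly once."""
    insert = insert.upper()
    for name, site in ENZYMES.items():
        flanked = (insert.startswith(site) and insert.endswith(site)
                   and count_sites(insert, site) == 2)
        if flanked and count_sites(vector, site) == 1:
            return name
    return None


def ligation_plan(insert: str, vector: str) -> str:
    """Prefer sticky-end ligation; fall back to blunt-end ligation if no usable site exists."""
    enzyme = choose_enzyme(insert, vector)
    return f"sticky-end ligation via {enzyme}" if enzyme else "blunt-end ligation"


insert = "GAATTC" + "ATGAAACCC" + "GAATTC"
print(ligation_plan(insert, "TTTGAATTCTTT"))       # prints: sticky-end ligation via EcoRI
print(ligation_plan("ATGAAACCC", "TTTGAATTCTTT"))  # prints: blunt-end ligation
```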

In addition to the nucleic acid sequences comprising the disclosed genetic variants, the recombinant expression vectors can carry regulatory sequences that control expression of the genetic variants in a host cell. The design of the expression vector, including the selection of regulatory sequences, can depend on such factors as the choice of the host cell to be transformed and the desired level of protein expression. Desired regulatory sequences for mammalian host cell expression can include, for example, viral elements that direct high levels of protein expression in mammalian cells, such as promoters and/or enhancers derived from retroviral LTRs, cytomegalovirus (CMV) (e.g., the CMV promoter/enhancer), simian virus 40 (SV40) (e.g., the SV40 promoter/enhancer), adenovirus (e.g., the adenovirus major late promoter (AdMLP)), and polyoma, as well as strong mammalian promoters such as native immunoglobulin and actin promoters. Methods of expressing polypeptides in bacterial cells or fungal cells (e.g., yeast cells) are also well known.

A promoter can be, for example, a constitutively active promoter, a conditional promoter, an inducible promoter, a temporally restricted promoter (e.g., a developmentally regulated promoter), or a spatially restricted promoter (e.g., a cell-specific or tissue-specific promoter). Examples of promoters can be found, for example, in International Patent Publication No. WO 2013/176772.

Examples of inducible promoters include chemically regulated promoters and physically regulated promoters. Chemically regulated promoters include, for example, alcohol-regulated promoters (e.g., an alcohol dehydrogenase (alcA) gene promoter), tetracycline-regulated promoters (e.g., a tetracycline-responsive promoter, a tetracycline operator sequence (tetO), a tet-On promoter, or a tet-Off promoter), steroid-regulated promoters (e.g., a rat glucocorticoid receptor promoter, an estrogen receptor promoter, or an ecdysone receptor promoter), and metal-regulated promoters (e.g., a metalloprotein promoter). Physically regulated promoters include, for example, temperature-regulated promoters (e.g., a heat shock promoter) and light-regulated promoters (e.g., a light-inducible promoter or a light-repressible promoter).
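The tet-On and tet-Off promoters listed above differ in how the inducer acts on a transactivator bound to tetO: in tet-Off systems the tTA transactivator drives expression only in the absence of tetracycline or doxycycline, while in tet-On systems the reverse transactivator rtTA drives expression only in the presence of doxycycline. The sketch below reduces that to boolean logic; it assumes idealized on/off behavior (real systems show leakiness and dose dependence), and the function name is hypothetical.

```python
def tet_system_expresses(system: str, doxycycline_present: bool) -> bool:
    """Idealized output of a tetO-driven promoter.

    tet-off: tTA binds tetO and activates transcription only when the
             tetracycline/doxycycline inducer is absent.
    tet-on:  rtTA binds tetO and activates transcription only when
             doxycycline is present.
    """
    if system == "tet-off":
        return not doxycycline_present
    if system == "tet-on":
        return doxycycline_present
    raise ValueError(f"unknown tetracycline-regulated system: {system!r}")


for system in ("tet-on", "tet-off"):
    print(system, tet_system_expresses(system, doxycycline_present=True))
# prints "tet-on True" then "tet-off False"
```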

A tissue-specific promoter can be, for example, a neuron-specific promoter, a glia-specific promoter, a muscle cell-specific promoter, a heart cell-specific promoter, a kidney cell-specific promoter, a bone cell-specific promoter, an endothelial cell-specific promoter, or an immune cell-specific promoter (e.g., a B cell promoter or a T cell promoter).

Developmentally regulated promoters include, for example, promoters that are active only during an embryonic stage of development or only in adult cells.

In addition to the nucleic acid sequences comprising the disclosed genetic variants and the regulatory sequences, the recombinant expression vectors can carry additional sequences, such as sequences that regulate replication of the vector in host cells (e.g., origins of replication) and selectable marker genes. A selectable marker gene can facilitate selection of host cells into which the vector has been introduced (see, e.g., U.S. Pat. Nos. 4,399,216; 4,634,665; and 5,179,017). For example, a selectable marker gene can confer resistance to drugs, such as G418, hygromycin, or methotrexate, on a host cell into which the vector has been introduced. Exemplary selectable marker genes include, but are not limited to, the dihydrofolate reductase (DHFR) gene (for use in dhfr- host cells with methotrexate selection/amplification), the neo gene (for G418 selection), and the glutamine synthetase (GS) gene.
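The role of a selectable marker can be pictured as a filter applied to a mixed cell population after drug treatment. The sketch assumes idealized all-or-nothing selection, and the `survivors` helper and cell names are hypothetical. The neo/G418 and DHFR/methotrexate pairings come from the text; hph (hygromycin B phosphotransferase) for hygromycin is an added, well-known pairing.

```python
# Marker gene -> selective agent to which it confers resistance.
MARKER_TO_AGENT = {"neo": "G418", "DHFR": "methotrexate", "hph": "hygromycin"}


def survivors(cells: dict[str, set[str]], agent: str) -> list[str]:
    """Return sorted names of cells carrying a marker that confers resistance to agent.

    Idealized model: a cell either survives or dies. For DHFR it presumes a
    dhfr- host, as the text specifies, so only vector-bearing cells are resistant.
    """
    return sorted(
        name for name, markers in cells.items()
        if any(MARKER_TO_AGENT.get(marker) == agent for marker in markers)
    )


cells = {"untransfected": set(), "vector_neo": {"neo"}, "vector_hph": {"hph"}}
print(survivors(cells, "G418"))  # prints: ['vector_neo']
```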

The present disclosure also provides isolated polypeptides comprising a variant B4GALT1 polypeptide (Asn352Ser). An exemplary wild-type human B4GALT1 polypeptide is assigned UniProt accession number P15291 (SEQ ID NO: 7) and consists of 398 amino acids. The human variant B4GALT1 polypeptide comprises a serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide (SEQ ID NO: 8), in contrast to the asparagine at the same position in wild-type human B4GALT1 (compare SEQ ID NO: 8 with SEQ ID NO: 7). In some embodiments, the isolated polypeptide comprises SEQ ID NO: 8. In some embodiments, the isolated polypeptide consists of SEQ ID NO: 8.
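Because the variant is defined by a single residue change at a 1-based position, a helper that applies a substitution only after confirming the reference residue guards against off-by-one numbering. The sketch below is hypothetical: it uses one-letter shorthand such as `N352S` (HGVS protein notation would write p.Asn352Ser), and it runs on a toy 398-residue sequence, not the actual P15291 / SEQ ID NO: 7 sequence, which is not reproduced here.

```python
import re


def apply_substitution(seq: str, change: str) -> str:
    """Apply a single missense change written as e.g. 'N352S' (1-based position).

    Raises ValueError if the change cannot be parsed, the position lies outside
    the sequence, or the reference residue does not match.
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if m is None:
        raise ValueError(f"unparseable substitution: {change!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    idx = pos - 1  # convert 1-based residue number to a 0-based index
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    if seq[idx] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {seq[idx]}")
    return seq[:idx] + alt + seq[idx + 1:]


toy = "M" + "A" * 350 + "N" + "G" * 46   # toy 398-mer with Asn at position 352
variant = apply_substitution(toy, "N352S")
print(len(variant), variant[351])         # prints: 398 S
```

Calling `apply_substitution(toy, "N351S")` raises `ValueError`, because residue 351 of the toy sequence is Ala rather than Asn.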

In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90% identical to SEQ ID NO: 8 and comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90% identical to SEQ ID NO: 8, provided that the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8.

In some embodiments, the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 95% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 95% identical to SEQ ID NO: 8 and comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 95% identical to SEQ ID NO: 8, provided that the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 98% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 98% identical to SEQ ID NO: 8 and comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 98% identical to SEQ ID NO: 8, provided that the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 99% identical to SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 99% identical to SEQ ID NO: 8 and comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 99% identical to SEQ ID NO: 8, provided that the isolated polypeptide comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8.
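Each of the identity embodiments above pairs a percent-identity floor with the requirement for serine at the position corresponding to position 352. A minimal sketch, assuming an ungapped comparison of equal-length sequences: in practice identity is usually computed over an alignment (for example a global Needleman-Wunsch or a BLAST alignment), which this does not implement. The helper names and the toy reference standing in for SEQ ID NO: 8 are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str, min_identity: float,
                    pos: int = 352, residue: str = "S") -> bool:
    """True if candidate is at least min_identity % identical to reference and
    carries `residue` at 1-based position `pos` (hypothetical helper)."""
    if len(candidate) < pos:
        return False
    return (percent_identity(candidate, reference) >= min_identity
            and candidate[pos - 1] == residue)


ref = "M" + "A" * 350 + "S" + "G" * 46               # toy stand-in for SEQ ID NO: 8
cand = "M" + "V" * 20 + "A" * 330 + "S" + "G" * 46   # 20 substitutions, Ser352 kept
print(round(percent_identity(cand, ref), 2))         # prints: 94.97
print(meets_threshold(cand, ref, 90), meets_threshold(cand, ref, 95))  # prints: True False
```

A sequence that is more than 99% identical but carries asparagine at position 352 fails every check, which is the point of the serine proviso.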

In some embodiments, the isolated polypeptide comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, or at least about 350 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to at least about 8, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, or at least about 350 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to at least about 8, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, or at least about 350 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8.

In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 90% identical to at least 300 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 90% identical to at least 300 contiguous amino acids of SEQ ID NO: 8, and the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 95% identical to at least 300 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 95% identical to at least 300 contiguous amino acids of SEQ ID NO: 8, and the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 98% identical to at least 300 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 98% identical to at least 300 contiguous amino acids of SEQ ID NO: 8, and the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 99% identical to at least 300 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least 99% identical to at least 300 contiguous amino acids of SEQ ID NO: 8, and the isolated polypeptide also comprises a serine at a position corresponding to position 352 of SEQ ID NO: 8.

In some embodiments, the isolated polypeptide comprises or consists of at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, or at least about 100 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at the position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to at least about 8, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, or at least about 100 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at the position corresponding to position 352 of SEQ ID NO: 8. In some embodiments, the isolated polypeptide comprises or consists of an amino acid sequence that is at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to at least about 8, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, or at least about 100 contiguous amino acids of SEQ ID NO: 8. In some embodiments, the isolated polypeptide also comprises a serine at the position corresponding to position 352 of SEQ ID NO: 8.
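
A fragment can carry the serine at position 352 only if it actually covers that position. The following sketch lists the 1-based windows of k contiguous residues that do. The sequence length used in the check (400) is a hypothetical placeholder chosen for illustration, and the function name is likewise hypothetical.

```python
def spanning_windows(seq_len: int, k: int, position: int = 352) -> list[tuple[int, int]]:
    """Return every 1-based (start, end) window of k contiguous residues, in a
    sequence of length seq_len, that includes `position`."""
    if not (1 <= k <= seq_len and 1 <= position <= seq_len):
        return []
    first_start = max(1, position - k + 1)     # window ends exactly at `position`
    last_start = min(position, seq_len - k + 1)  # window starts at `position` or hits the C-terminus
    return [(s, s + k - 1) for s in range(first_start, last_start + 1)]
```

For an 8-residue fragment there are at most 8 such windows. For longer fragments, the count is also limited by how close position 352 lies to the C-terminus.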

A representative wild-type B4GALT1 polypeptide sequence is set forth in SEQ ID NO: 7. A representative B4GALT1 variant polypeptide sequence is set forth in SEQ ID NO: 8.

The isolated polypeptides disclosed herein can comprise the amino acid sequence of a naturally occurring B4GALT1 polypeptide, or they can comprise a non-naturally occurring sequence. In some embodiments, the naturally occurring and non-naturally occurring sequences differ by conservative amino acid substitutions. For example, the two sequences can be identical except for conservative amino acid substitutions.
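
To make "identical except for conservative amino acid substitutions" testable, a grouping of interchangeable residues has to be chosen. The grouping below is one common illustrative choice and is an assumption. The text above does not define which substitutions count as conservative, and under this particular grouping Cys and Pro can only match themselves.

```python
# Illustrative (assumed) groups of mutually conservative residues.
CONSERVATIVE_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
_GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def is_conservative(a: str, b: str) -> bool:
    """True if a == b, or if a and b fall in the same assumed group."""
    if a == b:
        return True
    group_a, group_b = _GROUP_OF.get(a), _GROUP_OF.get(b)
    return group_a is not None and group_a == group_b


def differ_only_conservatively(seq1: str, seq2: str) -> bool:
    """Same length, and every mismatched position is a conservative swap."""
    if len(seq1) != len(seq2):
        return False
    return all(is_conservative(a, b) for a, b in zip(seq1, seq2))
```

Swapping in a different substitution scheme, such as one derived from a scoring matrix, only requires replacing `CONSERVATIVE_GROUPS`.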

In some embodiments, an isolated polypeptide disclosed herein is linked or fused to a heterologous polypeptide, a heterologous molecule, or a label, various examples of which are disclosed elsewhere herein. For example, the protein can be fused to a heterologous polypeptide that increases or decreases its stability. The fused domain or heterologous polypeptide can be located at the N-terminus, at the C-terminus, or internally within the polypeptide. A fusion partner can, for example, help provide T helper epitopes (an immunological fusion partner) or help express the protein at higher yields than the native recombinant polypeptide (an expression enhancer). Certain fusion partners are both immunological and expression-enhancing fusion partners. Other fusion partners can be selected to increase the solubility of the polypeptide or to facilitate its targeting to a desired intracellular compartment. Some fusion partners include an affinity tag, which facilitates purification of the polypeptide.
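
A practical consequence of placing the fusion partner at the N-terminus or internally is that residue numbers downstream of the junction shift, so "position 352" in the unfused protein no longer corresponds to residue 352 of the fusion. The following minimal sketch builds the three kinds of fusion as strings and maps a residue number into the fused sequence. Both helper names are hypothetical, and the sketch ignores details such as initiator methionines and linkers.

```python
def fuse(protein: str, partner: str, where: str = "N", after: int = 0) -> str:
    """Join `partner` to `protein` at the N-terminus, at the C-terminus, or
    internally after residue `after` (1-based)."""
    if where == "N":
        return partner + protein
    if where == "C":
        return protein + partner
    if where == "internal":
        if not 0 < after < len(protein):
            raise ValueError("internal insertion point must lie inside the protein")
        return protein[:after] + partner + protein[after:]
    raise ValueError(f"unknown fusion site: {where!r}")


def fused_position(pos: int, partner_len: int, where: str = "N", after: int = 0) -> int:
    """Map a 1-based residue number of the unfused protein into the fusion."""
    if where not in ("N", "C", "internal"):
        raise ValueError(f"unknown fusion site: {where!r}")
    # Residues downstream of an N-terminal or internal insertion shift by partner_len.
    if where == "N" or (where == "internal" and pos > after):
        return pos + partner_len
    return pos
```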

In some embodiments, the polypeptide is fused directly to the heterologous molecule or is linked to it through a linker, such as a peptide linker. A suitable peptide linker sequence can be selected based on, for example, the following factors: (1) its ability to adopt a flexible, extended conformation; (2) its inability to adopt a secondary structure that could interact with functional epitopes on the first and second polypeptides; and (3) the lack of hydrophobic or charged residues that might react with the functional epitopes of the polypeptides. For example, a peptide linker sequence can contain Gly, Asn, and Ser residues. Other near-neutral amino acids, such as Thr and Ala, can also be used in the linker sequence. Amino acid sequences that can usefully serve as linkers include, for example, those disclosed in Maratea et al., Gene, 1985, 40, 39-46; Murphy et al., Proc. Natl. Acad. Sci. USA, 1986, 83, 8258-8262; and U.S. Patent Nos. 4,935,233 and 4,751,180. The linker sequence can generally be, for example, from 1 to about 50 amino acids in length. A linker sequence is generally not needed when the first and second polypeptides have non-essential N-terminal amino acid regions that can be used to separate the functional domains and prevent steric interference.
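
The composition and length guidance above can be expressed as a simple screen. The sketch builds a linker from repeats of GGGGS, which is a widely used flexible linker. Using it here is an assumption, because the text does not name a specific linker. The screen then flags residues outside Gly/Asn/Ser/Thr/Ala and lengths outside 1 to 50. The helper names are hypothetical.

```python
ALLOWED_LINKER_RESIDUES = set("GNSTA")  # Gly, Asn, Ser plus near-neutral Thr and Ala


def gs_linker(repeats: int, unit: str = "GGGGS") -> str:
    """Concatenate `repeats` copies of a flexible linker unit."""
    return unit * repeats


def check_linker(linker: str, max_len: int = 50) -> list[str]:
    """Return a list of problems; an empty list means the linker passes."""
    problems = []
    if not 1 <= len(linker) <= max_len:
        problems.append(f"length {len(linker)} outside 1-{max_len}")
    bad = sorted(set(linker) - ALLOWED_LINKER_RESIDUES)
    if bad:
        problems.append("residues outside Gly/Asn/Ser/Thr/Ala: " + "".join(bad))
    return problems
```

This screen covers only composition and length. Whether a given linker actually avoids interacting with functional epitopes still has to be established experimentally.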

In some embodiments, the polypeptide is operably linked to a cell-penetrating domain. For example, the cell-penetrating domain can be derived from the HIV-1 TAT protein, the TLM cell-penetrating motif from human hepatitis B virus, MPG, Pep-1, VP22, a cell-penetrating peptide from herpes simplex virus, or a polyarginine peptide sequence (see, e.g., WO2014/089290). The cell-penetrating domain can be located at the N-terminus, at the C-terminus, or anywhere within the protein.

In some embodiments, the polypeptide is operably linked to a heterologous polypeptide, such as a fluorescent protein, a purification tag, or an epitope tag, for ease of tracking or purification. Examples of fluorescent proteins include, but are not limited to, green fluorescent proteins (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP, ZsGreen1), yellow fluorescent proteins (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, ZsYellow1), blue fluorescent proteins (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-Sapphire), cyan fluorescent proteins (e.g., eCFP, Cerulean, CyPet, AmCyan1, Midoriishi-Cyan), red fluorescent proteins (e.g., mKate, mKate2, mPlum, DsRed monomer, mCherry, mRFP1, DsRed-Express, DsRed2, HcRed-Tandem, HcRed1, AsRed2, eqFP611, mRaspberry, mStrawberry, Jred), orange fluorescent proteins (e.g., mOrange, mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, tdTomato), and any other suitable fluorescent protein. Examples of tags include, but are not limited to, glutathione-S-transferase (GST), chitin-binding protein (CBP), maltose-binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, hemagglutinin (HA), nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, histidine (His), biotin carboxyl carrier protein (BCCP), and calmodulin. In some embodiments, the heterologous molecule is an immunoglobulin Fc domain, a peptide tag, a transduction domain, poly(ethylene glycol), polysialic acid, or glycolic acid.
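
Several of the short epitope tags listed above have well-established peptide sequences. The sketch below collects a few of them and appends one to either terminus, optionally through a linker. The helper name is hypothetical. The sketch also does not handle the initiator methionine that an N-terminally tagged construct normally needs.

```python
# Peptide sequences of a few common epitope/affinity tags.
EPITOPE_TAGS = {
    "His6": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "HA": "YPYDVPDYA",
    "myc": "EQKLISEEDL",
    "V5": "GKPIPNPLLGLDST",
    "Strep-tag II": "WSHPQFEK",
}


def add_tag(protein: str, tag_name: str, terminus: str = "C", linker: str = "") -> str:
    """Attach a named tag to the N- or C-terminus, with an optional linker between."""
    tag = EPITOPE_TAGS[tag_name]
    if terminus == "N":
        return tag + linker + protein
    if terminus == "C":
        return protein + linker + tag
    raise ValueError(f"terminus must be 'N' or 'C', got {terminus!r}")
```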

In some embodiments, the isolated polypeptide comprises non-natural or modified amino acids or peptide analogs. For example, there are numerous D-amino acids, as well as amino acids with functional substituents that differ from those of the naturally occurring amino acids. The opposite stereoisomers of naturally occurring peptides, as well as the stereoisomers of peptide analogs, are disclosed. Such amino acids can readily be incorporated into polypeptide chains by charging tRNA molecules with the amino acid of choice and engineering genetic constructs that use, for example, amber codons to insert the analog amino acid into the peptide chain in a site-specific manner.
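
On the DNA side, site-specific amber-codon incorporation starts by replacing the codon of the target residue with TAG. The following minimal sketch performs that edit on an open reading frame. It assumes the ORF begins at its start codon and ends with a stop codon. Reading TAG as the analog amino acid also requires an orthogonal tRNA and aminoacyl-tRNA synthetase pair, which is a property of the host system and is not modeled here. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def introduce_amber(orf: str, residue: int) -> str:
    """Replace the codon for 1-based `residue` of an ORF with the amber codon TAG."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    n_codons = len(orf) // 3
    if not 1 <= residue < n_codons:  # the final codon is assumed to be the stop codon
        raise ValueError("residue must be a sense codon inside the ORF")
    i = 3 * (residue - 1)
    if orf[i:i + 3] in STOP_CODONS:
        raise ValueError("target codon is already a stop codon")
    return orf[:i] + "TAG" + orf[i + 3:]
```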

In some embodiments, the isolated polypeptide is a peptidomimetic, which can be produced to resemble a peptide but is not connected through natural peptide linkages. For example, linkages for amino acids or amino acid analogs include, but are not limited to, -CH₂NH-, -CH₂S-, -CH₂-, -CH=CH- (cis and trans), -COCH₂-, -CH(OH)CH₂-, and -CH₂SO-. Peptide analogs can have more than one atom between the bond atoms, as in β-alanine, γ-aminobutyric acid, and the like. Amino acid analogs and peptide analogs often have enhanced or desirable properties, such as more economical production, greater chemical stability, enhanced pharmacological properties (half-life, absorption, potency, efficacy, etc.), altered specificity (e.g., a broad spectrum of biological activities), reduced antigenicity, and other desirable properties.

In some embodiments, the isolated polypeptide comprises D-amino acids. These can be used to generate more stable peptides because D-amino acids are not recognized by peptidases. Systematic substitution of one or more amino acids of a consensus sequence with a D-amino acid of the same type (e.g., D-lysine in place of L-lysine) can be used to generate more stable peptides. Cysteine residues can be used to cyclize two or more peptides or to attach them to one another. This can be beneficial for constraining a peptide to a particular conformation (see, e.g., Rizo and Gierasch, Ann. Rev. Biochem., 1992, 61, 387).
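
A systematic D-amino acid substitution series can be enumerated before synthesis. The sketch below uses the notational convention of writing a D-residue as the lowercase one-letter code, which is an assumption chosen for illustration. It generates every single D-substitution variant of a peptide, plus the all-D form. Glycine is skipped because it is achiral. Both function names are hypothetical.

```python
def d_scan(peptide: str) -> list[str]:
    """All single D-substitution variants (lowercase letter = D-enantiomer)."""
    variants = []
    for i, aa in enumerate(peptide):
        if aa == "G":  # glycine is achiral, so it has no D form
            continue
        variants.append(peptide[:i] + aa.lower() + peptide[i + 1:])
    return variants


def all_d(peptide: str) -> str:
    """Every chiral residue replaced by its D-enantiomer."""
    return "".join(aa if aa == "G" else aa.lower() for aa in peptide)
```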

The disclosure also provides nucleic acid molecules encoding any of the polypeptides disclosed herein. These include all degenerate sequences related to a specific polypeptide sequence: all nucleic acids having a sequence that encodes one particular polypeptide sequence, as well as all nucleic acids, including degenerate nucleic acids, that encode the disclosed variants and derivatives of the protein sequence. Thus, although each particular nucleic acid sequence may not be written out herein, each and every such sequence is in fact disclosed and described herein through the disclosed polypeptide sequences.
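
The size of the degenerate sequence space can be computed directly. Under the standard genetic code, the number of distinct coding sequences for a protein is the product of the number of synonymous codons at each position, multiplied by the number of stop codons if one is appended. The sketch below derives the synonymous-codon counts from the standard code table rather than hard-coding them. The function name is hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Number of codons per amino acid (e.g., Leu and Ser have 6; Met and Trp have 1).
SYNONYMS = Counter(CODON_TABLE.values())


def count_coding_sequences(protein: str, include_stop: bool = True) -> int:
    """Number of distinct DNA sequences encoding `protein` under the standard code."""
    total = 1
    for aa in protein:
        if aa not in SYNONYMS or aa == "*":
            raise ValueError(f"not a standard amino acid letter: {aa!r}")
        total *= SYNONYMS[aa]
    if include_stop:
        total *= SYNONYMS["*"]
    return total
```

For a full-length protein the count is astronomically large. Python integers handle this without overflow, which illustrates why the individual sequences are disclosed by reference to the polypeptide rather than listed.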

The disclosure also provides compositions comprising any one or more of the nucleic acid molecules and any one or more of the polypeptides disclosed herein. In some embodiments, the composition comprises a carrier. In some embodiments, the carrier increases the stability of the nucleic acid molecule and/or polypeptide. For example, it can extend the period under given storage conditions (e.g., -20°C, 4°C, or ambient temperature) during which degradation products remain below a threshold, such as less than 0.5% by weight of the starting nucleic acid or protein, or it can increase in vivo stability. Examples of carriers include, but are not limited to, poly(lactic acid) (PLA) microspheres, poly(D,L-lactic-co-glycolic acid) (PLGA) microspheres, liposomes, micelles, inverse micelles, lipid cochleates, and lipid microtubules.

The disclosure also provides methods for producing any of the B4GALT1 polypeptides or fragments thereof provided herein. Such B4GALT1 polypeptides or fragments thereof can be produced by any suitable method. For example, a B4GALT1 polypeptide or fragment thereof can be produced from a host cell comprising a nucleic acid molecule (e.g., a recombinant expression vector) encoding it. Such a method can comprise culturing the host cell under conditions sufficient to produce the B4GALT1 polypeptide or fragment thereof, thereby producing it. The nucleic acid can be operably linked to a promoter that is active in the host cell, and the culturing can be carried out under conditions in which the nucleic acid is expressed. Such methods can further comprise recovering the expressed B4GALT1 polypeptide or fragment thereof. The recovering step can further comprise purifying the B4GALT1 polypeptide or fragment thereof.

Examples of suitable systems for protein expression include host cells such as bacterial cell expression systems (e.g., Escherichia coli, Lactococcus lactis), yeast cell expression systems (e.g., Saccharomyces cerevisiae, Pichia pastoris), insect cell expression systems (e.g., baculovirus-mediated protein expression), and mammalian cell expression systems.

Examples of nucleic acid molecules encoding B4GALT1 polypeptides or fragments thereof are disclosed in more detail elsewhere herein. In some embodiments, the nucleic acid molecule is codon-optimized for expression in the host cell. In some embodiments, the nucleic acid molecule is operably linked to a promoter that is active in the host cell. The promoter can be a heterologous promoter (i.e., a promoter other than the naturally occurring B4GALT1 promoter). Examples of promoters suitable for Escherichia coli include, but are not limited to, the arabinose, lac, tac, and T7 promoters. Examples of promoters suitable for Lactococcus lactis include, but are not limited to, the P170 and nisin promoters. Examples of promoters suitable for Saccharomyces cerevisiae include, but are not limited to, constitutive promoters such as the alcohol dehydrogenase (ADH1) or enolase (ENO) promoters, and inducible promoters such as PHO, CUP1, GAL1, and G10. Examples of promoters suitable for Pichia pastoris include, but are not limited to, the alcohol oxidase I (AOX1) promoter, the glyceraldehyde-3-phosphate dehydrogenase (GAP) promoter, and the glutathione-dependent formaldehyde dehydrogenase (FLD1) promoter. An example of a promoter suitable for baculovirus-mediated systems is the strong late viral polyhedrin promoter.
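
The simplest form of codon optimization back-translates the protein using the most frequent codon for each amino acid in the host. The sketch below does exactly that and nothing more. Practical optimization tools also weigh GC content, mRNA secondary structure, and other factors, so this is not a full optimizer. The codon-usage values in the check are made up for illustration and are not real host frequencies, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Pick the highest-frequency codon for each amino acid present in `usage`."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein: str, usage: dict[str, float]) -> str:
    """One-codon-per-amino-acid back-translation ("*" = stop)."""
    best = preferred_codons(usage)
    missing = sorted(set(protein) - set(best))
    if missing:
        raise KeyError(f"no codon usage given for: {''.join(missing)}")
    return "".join(best[aa] for aa in protein)


def translate(dna: str) -> str:
    """Translate complete codons of `dna` with the standard code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
```

Translating the result back, as `translate` does, is a cheap check that the optimized sequence still encodes the intended protein.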

In some embodiments, the nucleic acid molecule encodes a tag in frame with the B4GALT1 polypeptide or fragment thereof to facilitate protein purification. Examples of tags are disclosed elsewhere herein. Such a tag can bind a partner ligand (e.g., one immobilized on a resin), so that the tagged protein can be separated from all other proteins (e.g., host cell proteins). Affinity chromatography, high-performance liquid chromatography (HPLC), and size-exclusion chromatography (SEC) are examples of methods that can be used to improve the purity of the expressed protein.
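
"In frame" can be checked computationally before cloning. The sketch below concatenates a tag coding sequence with the ORF, translates the result with the standard code, and reports a frameshift, a missing tag, a premature stop, or a missing terminal stop. It assumes an N-terminal tag whose coding sequence includes the start codon. The function name and the short test sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def check_tag_fusion(tag_dna: str, orf_dna: str, tag_peptide: str) -> list[str]:
    """Return problems with an N-terminal tag + ORF construct (empty list = OK)."""
    construct = (tag_dna + orf_dna).upper()
    if len(construct) % 3:
        return ["construct length is not a multiple of 3 (frameshift)"]
    protein = "".join(CODON_TABLE[construct[i:i + 3]] for i in range(0, len(construct), 3))
    issues = []
    if not protein.startswith(tag_peptide):
        issues.append("tag is not encoded at the N-terminus")
    if "*" in protein[:-1]:
        issues.append("premature in-frame stop codon")
    if not protein.endswith("*"):
        issues.append("no terminal stop codon")
    return issues
```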

Other methods can also be used to produce B4GALT1 polypeptides or fragments thereof. For example, two or more peptides or polypeptides can be linked together by protein chemistry techniques. Peptides or polypeptides can be chemically synthesized by standard chemical reactions using Fmoc (9-fluorenylmethyloxycarbonyl) or Boc (tert-butyloxycarbonyl) chemistry. For example, one peptide or polypeptide can be synthesized and left uncleaved from its synthesis resin, whereas another fragment of the peptide or protein can be synthesized and subsequently cleaved from the resin, thereby exposing a terminal group that is functionally blocked on that other fragment. By a peptide condensation reaction, these two fragments can be covalently joined through a peptide bond at their carboxyl and amino termini, respectively. Alternatively, the peptides or polypeptides can be synthesized independently in vivo as described herein. Once isolated, these independent peptides or polypeptides can be linked through similar peptide condensation reactions to form a peptide or fragment thereof.

In some embodiments, enzymatic ligation of cloned or synthetic peptide segments allows relatively short peptide fragments to be joined to produce larger peptide fragments, polypeptides, or whole protein domains (Abrahmsen et al., Biochemistry, 1991, 30, 4151). Alternatively, native chemical ligation of synthetic peptides can be used to construct large peptides or polypeptides synthetically from shorter peptide fragments. This method can consist of a two-step chemical reaction (see Dawson et al., Science, 1994, 266, 776-779). The first step can be the chemoselective reaction of an unprotected synthetic peptide thioester with another unprotected peptide segment that contains an amino-terminal Cys residue, giving a thioester-linked intermediate as the initial covalent product. Without any change in the reaction conditions, this intermediate can undergo a spontaneous, rapid intramolecular reaction that forms a native peptide bond at the ligation site.
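
Because each downstream segment in native chemical ligation must begin with Cys, planning a synthesis starts by finding Xaa-Cys junctions that split the target into segments short enough to synthesize. The sketch below places each cut greedily at the farthest Cys within reach. The 50-residue maximum segment length is an assumed parameter, not a value from the text. The function name is hypothetical, and practical considerations such as thioester chemistry at particular C-terminal residues are ignored.

```python
def plan_ncl_segments(seq: str, max_len: int = 50) -> list[str]:
    """Split `seq` into segments of at most `max_len` residues such that every
    segment after the first begins with Cys (the N-terminal Cys needed for
    native chemical ligation). Greedy: each cut goes at the farthest usable Cys."""
    segments = []
    start = 0
    while len(seq) - start > max_len:
        # Cys positions that could begin the next segment without making the
        # current segment longer than max_len residues
        candidates = [p for p in range(start + 1, start + max_len + 1) if seq[p] == "C"]
        if not candidates:
            raise ValueError(f"no Cys junction within {max_len} residues of position {start + 1}")
        cut = candidates[-1]
        segments.append(seq[start:cut])  # this segment would carry the C-terminal thioester
        start = cut
    segments.append(seq[start:])
    return segments
```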

In some embodiments, unprotected peptide segments can be chemically linked such that the bond formed between the segments as a result of the chemical ligation is an unnatural (non-peptide) bond (see Schnolzer et al., Science, 1992, 256, 221).

The disclosure also provides cells (e.g., recombinant host cells) comprising any one or more of the nucleic acid molecules and any one or more of the polypeptides disclosed herein. The cells can be in vitro, ex vivo, or in vivo. A nucleic acid molecule can be linked to a promoter and other regulatory sequences so that it is expressed to produce the encoded protein.

In some embodiments, the cell is a totipotent cell or a pluripotent cell (e.g., an embryonic stem (ES) cell, such as a rodent ES cell, a mouse ES cell, or a rat ES cell). Totipotent cells include undifferentiated cells that can give rise to any cell type, and pluripotent cells include undifferentiated cells that possess the ability to develop into more than one differentiated cell type. Such pluripotent and/or totipotent cells can be, for example, ES cells or ES-like cells, such as induced pluripotent stem (iPS) cells. ES cells include embryo-derived totipotent or pluripotent cells that, upon introduction into an embryo, can contribute to any tissue of the developing embryo. ES cells can be derived from the inner cell mass of a blastocyst and can differentiate into cells of any of the three vertebrate germ layers (endoderm, ectoderm, and mesoderm).

In some embodiments, the cell is a primary somatic cell, or a cell that is not a primary somatic cell. Somatic cells can include any cell that is not a gamete, germ cell, gametocyte, or undifferentiated stem cell. In some embodiments, the cell can also be a primary cell. Primary cells include cells, or cultures of cells, that have been isolated directly from an organism, organ, or tissue. Primary cells include cells that are neither transformed nor immortal. They include any cell obtained from an organism, organ, or tissue that either has not previously been passaged in tissue culture or has been passaged but cannot be passaged indefinitely. Such cells can be isolated by conventional techniques and include, for example, somatic cells, hematopoietic cells, endothelial cells, epithelial cells, fibroblasts, mesenchymal cells, keratinocytes, melanocytes, monocytes, mononuclear cells, adipocytes, preadipocytes, neurons, glial cells, hepatocytes, skeletal myoblasts, and smooth muscle cells. For example, primary cells can be derived from connective tissue, muscle tissue, nervous system tissue, or epithelial tissue.

In some embodiments, the cell is an immortalized cell: a cell that normally would not proliferate indefinitely but that, owing to a mutation or alteration, has evaded normal cellular senescence and can keep dividing. Such mutations or alterations can occur naturally or be intentionally induced. Examples of immortalized cells include, but are not limited to, Chinese hamster ovary (CHO) cells, human embryonic kidney cells (e.g., HEK 293 cells), and mouse embryonic fibroblast cells (e.g., 3T3 cells). Numerous types of immortalized cells are well known. Immortalized or primary cells include cells that are typically used for culturing or for expressing recombinant genes or proteins. In some embodiments, the cell is a differentiated cell, such as a liver cell (e.g., a human liver cell).

The cells can be from any source. For example, the cell can be a eukaryotic cell, an animal cell, a plant cell, or a fungal (e.g., yeast) cell. The cell can be a fish cell or a bird cell, or it can be a mammalian cell, such as a human cell, a non-human mammalian cell, a rodent cell, a mouse cell, or a rat cell. Mammals include, but are not limited to, humans, non-human primates, monkeys, apes, cats, dogs, horses, bulls, deer, bison, sheep, rodents (e.g., mice, rats, hamsters, guinea pigs), and livestock (e.g., bovine species such as cows and steers; ovine species such as sheep and goats; and porcine species such as pigs and boars). Birds include, but are not limited to, chickens, turkeys, ostriches, geese, ducks, and the like. Domesticated animals and agricultural animals are also included. The term "non-human animal" excludes humans.

The disclosure also provides methods for detecting the presence of a B4GALT1 variant gene, mRNA, cDNA, and/or polypeptide in a biological sample from a human subject. It is understood that gene sequences within a population, and the mRNAs and proteins encoded by those genes, can vary due to polymorphisms such as single-nucleotide polymorphisms. The sequences provided herein for the B4GALT1 gene, mRNA, cDNA, and polypeptide are only exemplary sequences. Other sequences for the B4GALT1 gene, mRNA, cDNA, and polypeptide are also possible.

The biological sample can be derived from any cell, tissue, or biological fluid of the subject. The sample can comprise any clinically relevant tissue, such as a bone marrow sample, a tumor biopsy, a fine needle aspirate, or a body fluid such as blood, plasma, serum, lymph, ascitic fluid, cystic fluid, or urine. In some cases, the sample comprises a buccal swab. The sample used in the methods disclosed herein will vary based on the assay format, the nature of the detection method, and the tissues, cells, or extracts used as the sample. A biological sample can be processed differently depending on the assay being employed. For example, when detecting a variant B4GALT1 nucleic acid molecule, preliminary processing designed to isolate or enrich the sample for genomic DNA can be employed, and a variety of known techniques can be used for this purpose. When detecting the level of variant B4GALT1 mRNA, different techniques can be used to enrich the biological sample for mRNA. Various methods can be used to detect the presence or level of an mRNA or the presence of a particular variant genomic DNA locus.

In some embodiments, the present disclosure provides a method of detecting the presence or absence of a variant B4GALT1 nucleic acid molecule, comprising sequencing at least a portion of a nucleic acid in a biological sample to determine whether the nucleic acid comprises nucleotides 53575 to 53577 of SEQ ID NO: 2 at the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2.

In some embodiments, the present disclosure provides a method of detecting the presence or absence of a variant B4GALT1 nucleic acid molecule, comprising sequencing at least a portion of a nucleic acid in a biological sample to determine whether the nucleic acid comprises nucleotides 1243 to 1245 of SEQ ID NO: 4 at the positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4.

In some embodiments, the present disclosure provides a method of detecting the presence or absence of a variant B4GALT1 nucleic acid molecule, comprising sequencing at least a portion of a nucleic acid in a biological sample to determine whether the nucleic acid comprises nucleotides 1054 to 1056 of SEQ ID NO: 6 at the positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6.
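The three sequencing embodiments above share one operation: reading a short, 1-based, inclusive coordinate window from a sequence that is already aligned to the relevant SEQ ID NO and comparing it with that reference. The Python sketch below illustrates only that comparison step, under the assumption that the sample sequence has no insertions or deletions near the site. The names `CODON_COORDS`, `nucleotides_at`, and `matches_reference_at` are hypothetical, and the toy sequences are invented rather than taken from the sequence listing.

```python
# Hypothetical helper names; the coordinates are those recited in the text
# (1-based, inclusive windows).
CODON_COORDS = {
    "genomic (SEQ ID NO: 2)": (53575, 53577),
    "mRNA (SEQ ID NO: 4)": (1243, 1245),
    "cDNA (SEQ ID NO: 6)": (1054, 1056),
}


def nucleotides_at(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive), upper-cased."""
    if start < 1 or start > end or end > len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end].upper()


def matches_reference_at(sample: str, reference: str, start: int, end: int) -> bool:
    """True if the aligned sample carries the reference nucleotides at start..end.

    Assumes the sample is already in the reference coordinate system
    (no insertions or deletions before or within the window).
    """
    return nucleotides_at(sample, start, end) == nucleotides_at(reference, start, end)


# Toy example: positions 4..6 of a 9-nt "reference" stand in for the codon window.
toy_reference = "GGGAGCTTT"
print(matches_reference_at("gggagcttt", toy_reference, 4, 6))  # True
print(matches_reference_at("GGGAACTTT", toy_reference, 4, 6))  # False
```

With real data, the window would be looked up as, for example, `start, end = CODON_COORDS["mRNA (SEQ ID NO: 4)"]` and applied to a sample sequence aligned to SEQ ID NO: 4.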

In some embodiments, a method of detecting a variant B4GALT1 nucleic acid molecule (e.g., a gene, mRNA, or cDNA) in a human subject comprises performing an assay on a biological sample from the human subject to determine whether a nucleic acid molecule in the biological sample comprises a nucleic acid sequence encoding a serine at position 352 of SEQ ID NO: 8. In some embodiments, the biological sample comprises cells or a cell lysate. Such a method can comprise, for example, obtaining from the subject a biological sample comprising a B4GALT1 gene, mRNA, or cDNA, and performing an assay on the biological sample to determine whether the positions of the B4GALT1 gene, mRNA, or cDNA corresponding to positions 53575 to 53577 of SEQ ID NO: 2 (gene), positions 1243 to 1245 of SEQ ID NO: 4 (mRNA), or positions 1054 to 1056 of SEQ ID NO: 6 (cDNA) encode a serine, rather than an asparagine, at the position corresponding to position 352 of the variant B4GALT1 polypeptide. Such an assay can comprise, for example, determining the identity of these positions in a particular B4GALT1 nucleic acid molecule.
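Deciding whether the three-nucleotide window "encodes a serine rather than an asparagine" amounts to translating one sense-strand codon with the standard genetic code. The text does not state which nucleotide change produces Asn352Ser, so this sketch accepts any serine codon; the function name `call_residue_352` is hypothetical, and the input is assumed to be the codon already read from the window, 5' to 3'.

```python
# Standard genetic code: every codon for asparagine and for serine (DNA alphabet).
ASN_CODONS = {"AAT", "AAC"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def call_residue_352(codon: str) -> str:
    """Classify the sense-strand codon read from the residue-352 codon window."""
    codon = codon.strip().upper().replace("U", "T")  # accept mRNA (U) or DNA (T)
    if len(codon) != 3:
        raise ValueError("expected exactly three nucleotides")
    if codon in SER_CODONS:
        return "Ser352 (variant)"
    if codon in ASN_CODONS:
        return "Asn352 (reference)"
    return "other"


for c in ("AGC", "aau", "AAA"):
    print(c, "->", call_residue_352(c))
```

Accepting both U and T lets the same check serve the genomic, mRNA, and cDNA embodiments.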

In some embodiments, the assay comprises: sequencing a portion of the B4GALT1 genomic sequence of a nucleic acid molecule in a biological sample from the human subject, wherein the sequenced portion includes the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2; sequencing a portion of the B4GALT1 mRNA sequence of a nucleic acid molecule in a biological sample from the human subject, wherein the sequenced portion includes the positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or sequencing a portion of the B4GALT1 cDNA sequence of a nucleic acid molecule in a biological sample from the human subject, wherein the sequenced portion includes the positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6.

In some embodiments, the assay comprises: a) contacting the biological sample with a primer that hybridizes to i) a portion of the B4GALT1 genomic sequence adjacent to the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2, ii) a portion of the B4GALT1 mRNA sequence adjacent to the positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4, or iii) a portion of the B4GALT1 cDNA sequence adjacent to the positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6; b) extending the primer at least through i) the positions of the B4GALT1 genomic sequence corresponding to positions 53575 to 53577, ii) the positions of the B4GALT1 mRNA corresponding to positions 1243 to 1245, or iii) the positions of the B4GALT1 cDNA corresponding to positions 1054 to 1056; and c) determining whether the extension product of the primer comprises nucleotides encoding a serine at position 352 of SEQ ID NO: 8 at i) the positions corresponding to positions 53575 to 53577 of the B4GALT1 genomic sequence, ii) the positions corresponding to positions 1243 to 1245 of the B4GALT1 mRNA, or iii) the positions corresponding to positions 1054 to 1056 of the B4GALT1 cDNA. In some embodiments, only B4GALT1 genomic DNA is analyzed. In some embodiments, only B4GALT1 mRNA is analyzed. In some embodiments, only B4GALT1 cDNA is analyzed.
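Steps a) to c) can be pictured as a string operation: locate where the primer binds, then read the bases the polymerase would add through the codon window. The sketch below is an in-silico simplification, not a laboratory protocol; the function name `extend_primer` and the toy sequences are hypothetical, and it assumes the primer has a single binding site and is written as sense-strand sequence ending immediately 5' of the window.

```python
def extend_primer(template_sense: str, primer: str, n_bases: int = 3) -> str:
    """Return the primer plus the next n_bases it would be extended through.

    Simplified model: `primer` is written as sense-strand sequence ending
    immediately 5' of the target window. A forward primer with this sequence
    anneals to the antisense strand and is extended 5'->3', so the extension
    product reads the sense-strand bases of the window.
    """
    template_sense = template_sense.upper()
    primer = primer.upper()
    site = template_sense.find(primer)  # assumes one unique binding site
    if site == -1:
        raise ValueError("primer does not match the template")
    stop = site + len(primer)
    if stop + n_bases > len(template_sense):
        raise ValueError("template too short to extend through the window")
    return primer + template_sense[stop:stop + n_bases]


product = extend_primer("TTGCAGCGGA", "TTGC")
print(product, product[-3:])  # TTGCAGC AGC
```

The last three bases of the product are the codon read-out for step c), which can then be classified as encoding serine or asparagine.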

In some embodiments, the assay comprises contacting the biological sample with a primer or probe that, under stringent conditions, specifically hybridizes to a variant B4GALT1 genomic sequence, mRNA sequence, or cDNA sequence but not to the corresponding wild-type B4GALT1 sequence, and determining whether hybridization has occurred.
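Allele-specific hybridization relies on the probe being a perfect match to the variant site and carrying at least one mismatch against the wild-type site. The sketch below counts those mismatches for an antisense probe; `reverse_complement`, `probe_mismatches`, and the 7-nt toy sites are hypothetical illustrations, and it assumes ungapped, equal-length sites.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, upper-cased."""
    return seq.translate(_COMPLEMENT)[::-1].upper()


def probe_mismatches(probe: str, target_sense: str) -> int:
    """Mismatches between an antisense probe (5'->3') and the sense-strand site it anneals to."""
    if len(probe) != len(target_sense):
        raise ValueError("probe and target site must be the same length")
    expected = reverse_complement(target_sense)  # the perfect-match probe sequence
    return sum(a != b for a, b in zip(probe.upper(), expected))


variant_site = "GCAGCGG"    # toy variant-allele site
wild_type_site = "GCAACGG"  # toy wild-type site, one base different
probe = reverse_complement(variant_site)
print(probe, probe_mismatches(probe, variant_site), probe_mismatches(probe, wild_type_site))
```

The expected output is `CCGCTGC 0 1`: stringent conditions are chosen so that the single mismatch against the wild-type site is enough to prevent stable hybridization.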

In some embodiments, the assays described above comprise RNA sequencing (RNA-Seq). In some embodiments, the assay also comprises reverse transcription polymerase chain reaction (RT-PCR).
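With RNA-Seq, each read covering the codon window contributes one observed codon, so variant detection reduces to counting reads. The following is a minimal sketch that assumes the codon window has already been extracted from aligned reads. The function `variant_read_fraction` is a hypothetical name, and the codons "AGC" (variant) and "AAC" (reference) are placeholders, since the text does not specify the actual nucleotides.

```python
from collections import Counter


def variant_read_fraction(read_codons, variant_codon: str, reference_codon: str) -> float:
    """Fraction of informative reads carrying variant_codon at the codon window.

    Reads showing neither codon (e.g. sequencing errors) are ignored.
    """
    counts = Counter(c.strip().upper().replace("U", "T") for c in read_codons)
    v, r = counts[variant_codon.upper()], counts[reference_codon.upper()]
    return v / (v + r) if (v + r) else 0.0


reads = ["AGC", "AAC", "agc", "AAC", "AGA"]
print(variant_read_fraction(reads, "AGC", "AAC"))  # 0.5
```

Ignoring uninformative reads keeps a stray error codon (here "AGA") from diluting the allele fraction.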

In some embodiments, the methods utilize probes and primers of sufficient nucleotide length to bind to the target nucleic acid sequence and to specifically detect and/or identify a polynucleotide comprising the variant B4GALT1 gene, mRNA, or cDNA. The hybridization or reaction conditions needed to achieve this result can be determined by the operator. The length can be any length sufficient for use in the selected detection method. Generally, lengths of, for example, about 8, about 11, about 14, about 16, about 18, about 20, about 22, about 24, about 26, about 28, about 30, about 40, about 50, about 75, about 100, about 200, about 300, about 400, about 500, about 600, or about 700 nucleotides or more, or about 11 to about 20, about 20 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 100, about 100 to about 200, about 200 to about 300, about 300 to about 400, about 400 to about 500, about 500 to about 600, about 600 to about 700, or about 700 to about 800 or more nucleotides, are used. Such probes and primers can hybridize specifically to a target sequence under highly stringent hybridization conditions. Probes and primers can have complete nucleic acid sequence identity with contiguous nucleotides of the target sequence, although probes that differ from the target nucleic acid sequence but retain the ability to specifically detect and/or identify it can be designed by conventional methods. Accordingly, probes and primers can share about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or 100% sequence identity or complementarity with the target nucleic acid molecule.
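The identity percentages above can be made concrete with an ungapped position-by-position comparison. This sketch makes that simplifying assumption (no alignment gaps). The helper names and the 80% default threshold, taken from the lowest value listed above, are illustrative choices. Percent complementarity can be obtained the same way by comparing a probe with the reverse complement of its target.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that are identical in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


def meets_identity_threshold(probe: str, target_region: str, threshold: float = 80.0) -> bool:
    """True if the probe shares at least `threshold` percent identity with the region."""
    return percent_identity(probe, target_region) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))          # 90.0
print(meets_identity_threshold("ACGTACGTAC", "ACGTTTTTAA"))  # False
```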

In some embodiments, specific primers can be used to amplify the variant B4GALT1 locus and/or B4GALT1 variant mRNA or cDNA to produce an amplicon that can be used as a specific probe, or that can itself be detected, to identify the variant B4GALT1 locus in a biological sample or to determine the level of a particular B4GALT1 mRNA or cDNA. The term "B4GALT1 variant locus" can be used to refer to a genomic nucleic acid sequence comprising the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2. When a probe hybridizes to a nucleic acid molecule in a biological sample under conditions that permit binding of the probe to the nucleic acid molecule, this binding can be detected and can indicate the presence of the variant B4GALT1 locus, or the presence or level of variant B4GALT1 mRNA or cDNA, in the biological sample. Methods for identifying such bound probes have been described. Particular probes can comprise a sequence that is at least about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 100% identical (or complementary) to a particular region of the variant B4GALT1 gene. Particular probes can comprise a sequence that is at least about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 100% identical (or complementary) to a particular region of the variant B4GALT1 mRNA. Particular probes can comprise a sequence that is at least about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 100% identical (or complementary) to a particular region of the variant B4GALT1 cDNA.

In some embodiments, to determine whether the nucleic acid complement of a biological sample comprises nucleotides encoding a serine at positions 53575 to 53577 of the variant B4GALT1 gene locus (SEQ ID NO: 2), the biological sample can be subjected to a nucleic acid amplification method using a primer pair comprising a first primer derived from the 5' flanking sequence adjacent to positions 53575 to 53577 and a second primer derived from the 3' flanking sequence adjacent to positions 53575 to 53577, to produce an amplicon that is diagnostic for the presence of the SNP at positions 53575 to 53577 of the variant B4GALT1 gene locus (SEQ ID NO: 2). In some embodiments, the amplicon can range in length from the combined length of the primer pair plus one nucleotide base pair to any length of amplicon producible by a DNA amplification protocol. This distance can range from one nucleotide base pair up to the limits of the amplification reaction, or about twenty thousand nucleotide base pairs. Optionally, the primer pair flanks a region comprising positions 53575 to 53577 and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides on each side of positions 53575 to 53577. Similar amplicons can be generated from mRNA and/or cDNA sequences.
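The geometry described above (primers on either side, a minimum number of uncovered flanking nucleotides around positions 53575 to 53577, and the resulting amplicon length) can be checked with simple coordinate arithmetic. In the sketch below, `check_primer_pair` is a hypothetical name. The primer coordinates in the example are invented for illustration, and all positions are assumed to be 1-based on the sense strand.

```python
def check_primer_pair(fwd_start: int, fwd_len: int, rev_end: int, rev_len: int,
                      site_start: int, site_end: int, min_flank: int = 1):
    """Return (amplicon_length, flanks_ok) for a primer pair around a target window.

    All coordinates are 1-based positions on the sense strand:
    fwd_start is the forward primer's 5' end; rev_end is the sense-strand
    position matching the reverse primer's 5' end (the amplicon's last base).
    """
    fwd_end = fwd_start + fwd_len - 1       # last base covered by the forward primer
    rev_start = rev_end - rev_len + 1       # first base covered by the reverse primer
    amplicon_len = rev_end - fwd_start + 1
    left_flank = site_start - fwd_end - 1   # uncovered bases 5' of the window
    right_flank = rev_start - site_end - 1  # uncovered bases 3' of the window
    return amplicon_len, left_flank >= min_flank and right_flank >= min_flank


# Hypothetical 20-nt primers around positions 53575-53577 of SEQ ID NO: 2.
print(check_primer_pair(53500, 20, 53650, 20, 53575, 53577, min_flank=10))  # (151, True)
```

Here 55 bases lie between the forward primer and the window and 53 between the window and the reverse primer, so any `min_flank` up to 53 is satisfied.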

Representative methods for making and using probes and primers are described, for example, in Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, ed. Sambrook et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989 (hereinafter "Sambrook et al., 1989"); Current Protocols in Molecular Biology, ed. Ausubel et al., Greene Publishing and Wiley-Interscience, New York, 1992 (with periodic updates) (hereinafter "Ausubel et al., 1992"); and Innis et al., PCR Protocols: A Guide to Methods and Applications, Academic Press: San Diego, 1990. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose, such as the PCR primer analysis tool in Vector NTI version 10 (Informax Inc., Bethesda, Md.); PrimerSelect (DNASTAR Inc., Madison, Wis.); and Primer3 (Version 0.4.0, © 1991, Whitehead Institute for Biomedical Research, Cambridge, Mass.). Additionally, the sequence can be scanned visually and primers identified manually using known guidelines.
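As an illustration of the kind of "known guidelines" used when screening primers by hand, the sketch below applies three common rules of thumb: GC content, the Wallace-rule Tm estimate (2 °C per A/T plus 4 °C per G/C, a rough figure for short oligonucleotides), and a G or C at the 3' end. The thresholds (18 to 25 nt, 40 to 60% GC) are assumed typical values, not values taken from this disclosure, and the function names are hypothetical. Dedicated tools such as those listed above evaluate many more factors.

```python
def gc_content(seq: str) -> float:
    """Percent G + C in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule, Tm ~ 2(A+T) + 4(G+C) deg C; a rough estimate for short oligos."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def screen_primer(seq: str, length_range=(18, 25), gc_range=(40.0, 60.0)) -> list:
    """Return rule-of-thumb problems; an empty list means none were flagged."""
    problems = []
    if not length_range[0] <= len(seq) <= length_range[1]:
        problems.append("length outside %d-%d nt" % length_range)
    gc = gc_content(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        problems.append("GC content %.0f%% outside range" % gc)
    if seq.upper()[-1] not in "GC":
        problems.append("no G/C at the 3' end")
    return problems


candidate = "AGCTAGCTAGCTAGCTAG"  # hypothetical 18-nt primer
print(gc_content(candidate), wallace_tm(candidate), screen_primer(candidate))
```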

As described in more detail below, any conventional nucleic acid hybridization, amplification, or sequencing method can be used to specifically detect the presence of the variant B4GALT1 gene locus and/or the level of variant B4GALT1 mRNA or cDNA. In some embodiments, a nucleic acid molecule can be used as a primer to amplify a region of a B4GALT1 nucleic acid, or as a probe that hybridizes under stringent conditions to a nucleic acid molecule comprising the variant B4GALT1 gene locus or to a nucleic acid molecule comprising variant B4GALT1 mRNA or cDNA.

A variety of nucleic acid techniques are known, including, for example, nucleic acid sequencing, nucleic acid hybridization, and nucleic acid amplification. Illustrative examples of nucleic acid sequencing techniques include, but are not limited to, chain terminator (Sanger) sequencing and dye terminator sequencing.

Other methods include nucleic acid hybridization methods other than sequencing, including the use of labeled primers or probes directed against purified DNA, amplified DNA, and fixed cell preparations (fluorescence in situ hybridization). In some methods, the target nucleic acid can be amplified before or simultaneously with detection. Illustrative examples of nucleic acid amplification techniques include, but are not limited to, polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), and nucleic acid sequence-based amplification (NASBA). Other methods include, but are not limited to, ligase chain reaction, strand displacement amplification, and thermophilic SDA (tSDA).

Any method can be used to detect non-amplified or amplified polynucleotides, including, for example, the Hybridization Protection Assay (HPA), quantitative evaluation of the amplification process in real time, and determination of the quantity of target sequence initially present in a sample without relying on real-time amplification.

Methods for identifying nucleic acids that do not necessarily require sequence amplification are also provided, based, for example, on the known methods of Southern (DNA:DNA) blot hybridization, in situ hybridization (ISH), and fluorescence in situ hybridization (FISH) using appropriate probes. Southern blotting can be used to detect specific nucleic acid sequences. In this method, nucleic acid extracted from a sample is fragmented, electrophoretically separated on a matrix gel, and transferred to a membrane filter. The filter-bound nucleic acid is subjected to hybridization with a labeled probe complementary to the sequence of interest, and the hybridized probe bound to the filter is detected.

In hybridization techniques, stringent conditions can be used so that a probe or primer hybridizes specifically to its target. In some embodiments, under stringent conditions, a polynucleotide primer or probe hybridizes to its target sequence (e.g., the variant B4GALT1 gene locus, mRNA, or cDNA) to a detectably greater degree than to other sequences (e.g., the corresponding wild-type B4GALT1 locus, mRNA, or cDNA), for example at least 2-fold or 10-fold over background. Stringent conditions are sequence-dependent and will differ in different circumstances. By controlling the stringency of the hybridization and/or wash conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of identity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides or less than about 500 nucleotides in length.

Appropriate stringency conditions that promote DNA hybridization, for example, 6× sodium chloride/sodium citrate (SSC) at about 45 °C followed by a wash in 2× SSC at 50 °C, are known or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Typically, stringent conditions for hybridization and detection are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts), at pH 7.0 to 8.3, and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides) and at least about 60 °C for longer probes (e.g., 50 or more nucleotides). Stringent conditions can also be achieved by adding destabilizing agents such as formamide. Exemplary low-stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, and 1% SDS (sodium dodecyl sulfate) at 37 °C, and a wash in 1× to 2× SSC (20× SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55 °C. Exemplary moderate-stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, and 1% SDS at 37 °C, and a wash in 0.5× to 1× SSC at 55 to 60 °C. Exemplary high-stringency conditions include hybridization in 50% formamide, 1 M NaCl, and 1% SDS at 37 °C, and a wash in 0.1× SSC at 60 to 65 °C. Optionally, wash buffers can comprise about 0.1% to about 1% SDS. The duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours. The duration of the wash is at least long enough to reach equilibrium.
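Because the Tm equation in the next paragraph needs the monovalent cation molarity M, it helps to convert an SSC strength into Na+ concentration. The sketch below uses only the composition stated above (20× SSC = 3.0 M NaCl/0.3 M trisodium citrate) and counts three Na+ ions per trisodium citrate. The function name `ssc_sodium_molar` is hypothetical.

```python
# From the text: 20x SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
NACL_20X = 3.0
CITRATE_20X = 0.3


def ssc_sodium_molar(x_ssc: float) -> float:
    """Total Na+ molarity of an x-fold SSC solution (3 Na+ per trisodium citrate)."""
    scale = x_ssc / 20.0
    return scale * NACL_20X + 3 * scale * CITRATE_20X


for x in (6.0, 2.0, 1.0, 0.1):
    print(f"{x}x SSC -> {ssc_sodium_molar(x):.4f} M Na+")
```

For example, 1× SSC works out to 0.15 M NaCl plus 0.015 M trisodium citrate, or about 0.195 M Na+.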

In hybridization reactions, specificity is typically a function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 1984, 138, 267-284: Tm = 81.5 °C + 16.6 (log M) + 0.41 (%GC) - 0.61 (%form) - 500/L, where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. The Tm is reduced by about 1 °C for each 1% of mismatching; thus, the Tm and the hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with 90% or greater identity are sought, the Tm can be decreased by 10 °C. Generally, stringent conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize hybridization and/or a wash at 1, 2, 3, or 4 °C lower than the Tm; moderately stringent conditions can utilize hybridization and/or a wash at 6, 7, 8, 9, or 10 °C lower than the Tm; and low-stringency conditions can utilize hybridization and/or a wash at 11, 12, 13, 14, 15, or 20 °C lower than the Tm. Using the equation, the hybridization and wash compositions, and the desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45 °C (aqueous solution) or 32 °C (formamide solution), it is optimal to increase the SSC concentration so that a higher temperature can be used.
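The Meinkoth and Wahl approximation and the roughly 1 °C per 1% mismatch rule translate directly into code. The sketch implements exactly the terms given above, with "log M" taken as the base-10 logarithm. The function names are hypothetical, and the example inputs (1.0 M monovalent cation, 50% GC, no formamide, 500 bp hybrid) are arbitrary values chosen only to show the arithmetic.

```python
import math


def tm_meinkoth_wahl(monovalent_molar: float, percent_gc: float,
                     percent_formamide: float, hybrid_length_bp: int) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid, Meinkoth and Wahl (1984)."""
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / hybrid_length_bp)


def tm_with_mismatch(tm_perfect: float, percent_mismatch: float) -> float:
    """Apply the rule of thumb that Tm drops about 1 deg C per 1% mismatch."""
    return tm_perfect - percent_mismatch


tm = tm_meinkoth_wahl(1.0, 50.0, 0.0, 500)
print(round(tm, 1), round(tm_with_mismatch(tm, 10.0), 1))  # 101.0 91.0
```

The 10% mismatch case mirrors the example in the text: seeking sequences of 90% or greater identity lowers the target Tm by about 10 °C.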

Methods for detecting the presence and level of variant B4GALT1 polypeptides in a biological sample are also provided, including, for example, protein sequencing and immunoassays. In some embodiments, a method of detecting the presence of B4GALT1 Asn352Ser in a human subject comprises performing an assay on a biological sample from the human subject to determine the presence of B4GALT1 Asn352Ser in the biological sample.

Illustrative, non-limiting examples of protein sequencing techniques include mass spectrometry and Edman degradation. Illustrative examples of immunoassays include, but are not limited to, immunoprecipitation, Western blot, immunohistochemistry, ELISA, immunocytochemistry, flow cytometry, and immuno-PCR. Polyclonal or monoclonal antibodies detectably labeled using various known techniques (e.g., colorimetric, fluorescent, chemiluminescent, or radioactive labels) are suitable for use in immunoassays.
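For mass spectrometry, an Asn-to-Ser substitution shifts the mass of any peptide containing residue 352 by the difference between the monoisotopic residue masses of serine (about 87.0320 Da) and asparagine (about 114.0429 Da), about -27.0109 Da. The sketch below shows only that mass-matching idea. The reference peptide mass of 1500.0 Da, the 10 ppm tolerance, and the function names are hypothetical, and a real workflow would first need the actual peptide sequence, its charge states, and any modifications.

```python
# Monoisotopic residue masses (Da) of asparagine and serine.
ASN_RESIDUE_MONO = 114.04293
SER_RESIDUE_MONO = 87.03203
ASN_TO_SER_SHIFT = SER_RESIDUE_MONO - ASN_RESIDUE_MONO  # about -27.0109 Da


def within_ppm(observed: float, expected: float, tol_ppm: float) -> bool:
    """True if observed lies within tol_ppm parts per million of expected."""
    return abs(observed - expected) / expected * 1e6 <= tol_ppm


def classify_peptide(observed_mass: float, reference_peptide_mass: float,
                     tol_ppm: float = 10.0) -> str:
    """Compare an observed neutral peptide mass with the Asn352 and Ser352 forms."""
    if within_ppm(observed_mass, reference_peptide_mass + ASN_TO_SER_SHIFT, tol_ppm):
        return "Ser352 form"
    if within_ppm(observed_mass, reference_peptide_mass, tol_ppm):
        return "Asn352 form"
    return "no match"


print(classify_peptide(1472.9891, 1500.0))  # Ser352 form
print(classify_peptide(1500.0, 1500.0))     # Asn352 form
```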

The present disclosure also provides methods for determining a human subject's susceptibility to developing a cardiovascular condition, or risk of developing a cardiovascular condition. The subject can be any organism, including, for example, a human, a non-human mammal, a rodent, a mouse, or a rat. In some embodiments, the method comprises detecting the presence of variant B4GALT1 genomic DNA, mRNA, or cDNA in a biological sample from the subject. It is understood that, within a population, gene sequences and the mRNAs encoded by such genes may vary due to polymorphisms such as SNPs. The sequences provided herein for the B4GALT1 gene, mRNA, cDNA, and polypeptide are exemplary only, and other such sequences are also possible.

Non-limiting examples of cardiovascular conditions include elevated levels of one or more serum lipids. Serum lipids include one or more of cholesterol, LDL, HDL, triglycerides, HDL-cholesterol, and non-HDL cholesterol, or subfractions thereof (e.g., HDL2, HDL2a, HDL2b, HDL2c, HDL3, HDL3a, HDL3b, HDL3c, HDL3d, LDL1, LDL2, LDL3, lipoprotein(a), Lpa1, Lpa2, Lpa3, Lpa4, or Lpa5). The cardiovascular condition can comprise elevated levels of coronary artery calcification, congenital disorder of glycosylation type IId (CDG-IId), or elevated levels of pericardial fat. The cardiovascular condition can also comprise coronary artery disease (CAD), myocardial infarction (MI), peripheral artery disease (PAD), stroke, pulmonary embolism, deep vein thrombosis (DVT), and bleeding diatheses and coagulopathies. The cardiovascular condition can comprise an atherothrombotic condition. The atherothrombotic condition, and more generally the cardiovascular condition, can comprise elevated levels of fibrinogen or fibrinogen-mediated blood clots. The cardiovascular condition can also comprise blood clots formed through the involvement of fibrinogen activity. Fibrinogen-mediated blood clots, or blood clots formed through the involvement of fibrinogen activity, can be present in any vein or artery of the body.

In some embodiments, a method of determining the susceptibility of a human subject to developing a cardiovascular condition comprises: a) performing an assay on a biological sample from the human subject to determine whether a nucleic acid molecule in the sample comprises a nucleic acid sequence encoding a serine at the position corresponding to position 352 of the full-length/mature variant B4GALT1 Asn352Ser polypeptide; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if a nucleic acid molecule comprising such a serine-encoding sequence is detected in the biological sample, or as having an increased risk of developing a cardiovascular condition if no such nucleic acid molecule is detected. In some embodiments, the variant B4GALT1 Asn352Ser polypeptide comprises SEQ ID NO: 8. In some embodiments, the nucleic acid molecule in the biological sample is genomic DNA, mRNA, or cDNA.
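Steps a) and b) reduce to a simple decision once the codon at residue 352 has been read from each allele. The sketch below illustrates that decision only; the function names are hypothetical, and because the text does not state which codon change produces Asn352Ser, every standard serine codon is accepted.

```python
# Standard genetic code: the codons for the two residues at issue here.
ASN_CODONS = {"AAT", "AAC"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def residue_for_codon(codon: str) -> str:
    """Return 'Asn', 'Ser', or 'other' for a DNA (T) or RNA (U) codon."""
    normalized = codon.upper().replace("U", "T")
    if normalized in ASN_CODONS:
        return "Asn"
    if normalized in SER_CODONS:
        return "Ser"
    return "other"


def classify_risk(allele_codons: list[str]) -> str:
    """Hypothetical classifier over the codons observed at residue 352.

    Reports 'reduced risk' if any observed allele encodes serine and
    'increased risk' otherwise, mirroring steps a) and b) above.
    """
    residues = {residue_for_codon(c) for c in allele_codons}
    return "reduced risk" if "Ser" in residues else "increased risk"


print(classify_risk(["AAC", "AGC"]))  # heterozygous carrier -> reduced risk
print(classify_risk(["AAC", "AAC"]))  # no serine codon observed -> increased risk
```

Accepting RNA codons (U) lets the same helper serve the genomic DNA, mRNA, and cDNA variants of the method.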

In some embodiments, the present disclosure provides a method of determining the susceptibility of a human subject to developing a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether a nucleic acid molecule in the sample comprises nucleotides 53575 to 53577 of SEQ ID NO: 2 at the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if such a nucleic acid molecule is detected in the biological sample, or as having an increased risk of developing a cardiovascular condition if no such nucleic acid molecule is detected.

In some embodiments, the present disclosure provides a method of determining the susceptibility of a human subject to developing a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether a nucleic acid molecule in the sample comprises nucleotides 1243 to 1245 of SEQ ID NO: 4 at the positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if such a nucleic acid molecule is detected in the biological sample, or as having an increased risk of developing a cardiovascular condition if no such nucleic acid molecule is detected.

In some embodiments, the present disclosure provides a method of determining the susceptibility of a human subject to developing a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether a nucleic acid molecule in the sample comprises nucleotides 1054 to 1056 of SEQ ID NO: 6 at the positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if such a nucleic acid molecule is detected in the biological sample, or as having an increased risk of developing a cardiovascular condition if no such nucleic acid molecule is detected.
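The cDNA and mRNA coordinates used in these methods are consistent with one another. Codon 352 of a coding sequence numbered from the first base of the ATG start codon spans positions 1054 to 1056, which suggests (the text does not say so) that SEQ ID NO: 6 is numbered that way. The mRNA positions 1243 to 1245 then lie 189 nucleotides further along, consistent with 189 nucleotides of 5' sequence preceding the start codon in SEQ ID NO: 4. The sketch below checks this arithmetic; the helper names and the offset are assumptions inferred from the numbers, not definitions from the sequence listing.

```python
def codon_number(cds_pos: int) -> int:
    """1-based codon number for a position that is the first base of a codon.

    Assumes the coding sequence is numbered from the A of the ATG start codon.
    """
    if cds_pos < 1 or (cds_pos - 1) % 3 != 0:
        raise ValueError("position is not the first base of a codon")
    return (cds_pos - 1) // 3 + 1


def codon_span(codon: int) -> tuple[int, int]:
    """1-based inclusive coding-sequence positions of a given codon."""
    first = 3 * (codon - 1) + 1
    return first, first + 2


# Offset implied by the two coordinate sets in the text (1243 - 1054); an
# inference, not a stated property of SEQ ID NO: 4 or SEQ ID NO: 6.
MRNA_TO_CDNA_OFFSET = 1243 - 1054


def mrna_to_cdna(mrna_pos: int) -> int:
    """Map an mRNA position to the assumed cDNA numbering."""
    return mrna_pos - MRNA_TO_CDNA_OFFSET


print(codon_span(352))                     # (1054, 1056)
print(MRNA_TO_CDNA_OFFSET)                 # 189
print(codon_number(mrna_to_cdna(1243)))    # 352
```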

In some embodiments, the method comprises detecting the presence of variant B4GALT1 genomic DNA in a biological sample. In some embodiments, such a method of determining the susceptibility of a human subject to developing a cardiovascular condition, or the subject's risk of developing one, comprises: a) obtaining a biological sample comprising genomic DNA from the subject; b) performing an assay on the genomic DNA to determine the identity of the nucleotides at the positions corresponding to positions 53575 to 53577 of the variant B4GALT1 gene (see, e.g., SEQ ID NO: 2); and c) classifying the subject as having a reduced risk of developing a cardiovascular condition if those positions in the genomic DNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as having an increased risk of developing a cardiovascular condition.

In some embodiments, such methods comprise diagnosing a subject with a cardiovascular condition, the method comprising: a) obtaining a biological sample comprising genomic DNA from the subject; b) performing an assay on the genomic DNA to determine the identity of the nucleotides at the positions corresponding to positions 53575 to 53577 of the variant B4GALT1 gene (see, e.g., SEQ ID NO: 2); and c) classifying the subject as having a cardiovascular condition if those positions in the genomic DNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as not having a cardiovascular condition.

In some embodiments, the method comprises detecting the presence of variant B4GALT1 mRNA in a biological sample. In some embodiments, such a method of determining the susceptibility of a human subject to developing a cardiovascular condition, or the subject's risk of developing one, comprises: a) obtaining a biological sample comprising mRNA from the subject; b) performing an assay on the mRNA to determine the identity of the nucleotides at the positions corresponding to positions 1243 to 1245 of the variant B4GALT1 mRNA (see, e.g., SEQ ID NO: 4); and c) classifying the subject as having a reduced risk of developing a cardiovascular condition if those positions in the mRNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as having an increased risk of developing a cardiovascular condition.

In some embodiments, such methods comprise diagnosing a subject with a cardiovascular condition, the method comprising: a) obtaining a biological sample comprising mRNA from the subject; b) performing an assay on the mRNA to determine the identity of the nucleotides at the positions corresponding to positions 1243 to 1245 of the variant B4GALT1 mRNA (see, e.g., SEQ ID NO: 4); and c) classifying the subject as having a cardiovascular condition if those positions in the mRNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as not having a cardiovascular condition.

In some embodiments, the method comprises detecting the presence of variant B4GALT1 cDNA in a biological sample. In some embodiments, such a method of determining the susceptibility of a human subject to developing a cardiovascular condition, or the subject's risk of developing one, comprises: a) obtaining a biological sample comprising cDNA from the subject; b) performing an assay on the cDNA to determine the identity of the nucleotides at the positions corresponding to positions 1054 to 1056 of the variant B4GALT1 cDNA (see, e.g., SEQ ID NO: 6); and c) classifying the subject as having a reduced risk of developing a cardiovascular condition if those positions in the cDNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as having an increased risk of developing a cardiovascular condition.

In some embodiments, such methods comprise diagnosing a subject with a cardiovascular condition, the method comprising: a) obtaining a biological sample comprising cDNA from the subject; b) performing an assay on the cDNA to determine the identity of the nucleotides at the positions corresponding to positions 1054 to 1056 of the variant B4GALT1 cDNA (see, e.g., SEQ ID NO: 6); and c) classifying the subject as having a cardiovascular condition if those positions in the cDNA encode a serine rather than an asparagine. Alternatively, if those positions do not encode a serine (for example, if they encode an asparagine), the subject can be classified as not having a cardiovascular condition.

In some embodiments, the assay comprises: a) contacting the biological sample with a primer that hybridizes to i) a portion of the B4GALT1 genomic sequence adjacent to the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2; ii) a portion of the B4GALT1 mRNA sequence adjacent to the positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or iii) a portion of the B4GALT1 cDNA sequence adjacent to the positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6; b) extending the primer at least through i) the positions of the B4GALT1 genomic sequence corresponding to positions 53575 to 53577; ii) the positions of the B4GALT1 mRNA corresponding to positions 1243 to 1245; or iii) the positions of the B4GALT1 cDNA corresponding to positions 1054 to 1056; and c) determining whether the extension product of the primer comprises, at i) the positions corresponding to positions 53575 to 53577 of the B4GALT1 genomic sequence, ii) the positions corresponding to positions 1243 to 1245 of the B4GALT1 mRNA, or iii) the positions corresponding to positions 1054 to 1056 of the B4GALT1 cDNA, nucleotides that encode a serine at position 352 of SEQ ID NO: 8.
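The primer-extension assay can be pictured at the string level: a forward primer whose 3' end sits immediately 5' of the interrogated codon anneals to the antisense strand, and the polymerase copies the next template bases into the extension product. The sketch below models only that idea; the sequences are toy strings rather than SEQ ID NO: 2, 4, or 6, and `extend_primer` is a hypothetical helper, not a description of any particular extension chemistry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(antisense: str, primer: str, n_bases: int) -> str:
    """Simulate template-directed extension of a forward primer.

    The primer anneals to the antisense strand, so its sequence matches the
    sense strand; each added base is the complement of the next template base.
    """
    sense = revcomp(antisense)
    start = sense.find(primer)
    if start < 0:
        raise ValueError("primer does not anneal to this template")
    stop = start + len(primer) + n_bases
    if stop > len(sense):
        raise ValueError("template too short for the requested extension")
    return sense[start:stop]


# Toy sense-strand sequences (hypothetical, not from the sequence listing);
# the primer's 3' end sits immediately 5' of the interrogated codon.
variant_sense = "GGCATTCGAACCTAGCAGCTTGACC"
reference_sense = "GGCATTCGAACCTAGCAACTTGACC"
primer = "CATTCGAACCTAGC"

for label, sense in (("variant", variant_sense), ("reference", reference_sense)):
    product = extend_primer(revcomp(sense), primer, 3)
    print(label, product[-3:])  # codon read from the extension product
```

The last three bases of each product (AGC versus AAC in this toy) are what step c) would then interpret as encoding serine or asparagine.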

In some embodiments, the assay comprises contacting the biological sample with a primer or probe that, under stringent conditions, specifically hybridizes to a variant B4GALT1 genomic, mRNA, or cDNA sequence but not to the corresponding wild-type B4GALT1 sequence, and determining whether hybridization has occurred. In some embodiments, the primer or probe specifically hybridizes to the positions in genomic DNA in the biological sample corresponding to positions 53575 to 53577 of SEQ ID NO: 2. In some embodiments, the primer or probe specifically hybridizes to the positions in mRNA in the biological sample corresponding to positions 1243 to 1245 of SEQ ID NO: 4. In some embodiments, the primer or probe specifically hybridizes to the positions in cDNA in the biological sample corresponding to positions 1054 to 1056 of SEQ ID NO: 6.
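Allele-specific hybridization relies on a probe that pairs perfectly with the variant sequence but forms at least one mismatch with the wild-type sequence; stringent conditions are chosen so that the mismatched duplex does not persist. The sketch below only counts mismatches for a probe annealing antiparallel to a sense-strand window. The windows are hypothetical toy sequences, and the sketch does not model duplex stability or hybridization conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(probe: str, target_sense: str) -> int:
    """Count base mismatches when a probe anneals antiparallel to a sense-strand window."""
    expected = revcomp(probe)  # the sense sequence a perfect duplex would require
    if len(expected) != len(target_sense):
        raise ValueError("probe and target window differ in length")
    return sum(a != b for a, b in zip(expected, target_sense))


# Hypothetical windows centred on the interrogated codon (variant AGC vs reference AAC).
variant_window = "CCTAGCAGCTTGA"
reference_window = "CCTAGCAACTTGA"
probe = revcomp(variant_window)  # designed to pair perfectly with the variant

print(mismatches(probe, variant_window))    # 0: perfect duplex
print(mismatches(probe, reference_window))  # 1: single internal mismatch
```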

Other assays that can be used in the methods disclosed herein include, for example, reverse transcription polymerase chain reaction (RT-PCR) and quantitative RT-PCR (qRT-PCR). Still other assays that can be used include, for example, RNA sequencing (RNA-Seq) followed by determination of the presence or amount of variant mRNA or cDNA in the biological sample.
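For RNA-Seq, one simple way to express the presence or amount of variant transcript is the fraction of aligned reads that carry the variant base at the interrogated position. The sketch below assumes the per-read bases at that position have already been extracted from an alignment; the function name and the A/G alleles are illustrative assumptions, not values taken from this document.

```python
from collections import Counter


def variant_allele_fraction(read_bases: list[str], ref: str, alt: str) -> float:
    """Fraction of informative reads carrying `alt` at one position.

    `read_bases` holds the base each aligned read shows at the position;
    bases other than ref/alt (errors, other alleles) are ignored.
    """
    ref, alt = ref.upper(), alt.upper()
    counts = Counter(base.upper() for base in read_bases)
    informative = counts[ref] + counts[alt]
    if informative == 0:
        raise ValueError("no reads support either allele")
    return counts[alt] / informative


# Hypothetical pileup: 30 reads show A, 10 show G, 2 show a sequencing error (T).
bases = ["A"] * 30 + ["G"] * 10 + ["T"] * 2
print(variant_allele_fraction(bases, ref="A", alt="G"))  # 0.25
```

Excluding bases that match neither allele keeps sequencing errors from diluting the estimate; a real pipeline would also filter on base and mapping quality.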

The present disclosure also provides a method of determining the susceptibility of a human subject to developing a cardiovascular condition, or of diagnosing a subject with a cardiovascular condition, the method comprising: a) performing an assay on a biological sample from the human subject to determine whether a B4GALT1 polypeptide in the sample comprises a serine at the position corresponding to position 352 of SEQ ID NO: 8; and b) classifying the human subject as having a reduced risk of developing a cardiovascular condition if a B4GALT1 polypeptide comprising a serine at that position is detected in the biological sample, or as having an increased risk of developing a cardiovascular condition if no such polypeptide is detected. In some embodiments, the method further comprises obtaining the biological sample from the subject.
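The text does not specify which protein assay is used. Once an assay has yielded the relevant sequence, however, step a) reduces to checking a single residue at a 1-based position. The sketch below uses a toy 362-residue string, not SEQ ID NO: 8, and the helper names are hypothetical.

```python
def residue_at(protein: str, position: int) -> str:
    """Residue at a 1-based position in a one-letter protein sequence."""
    if not 1 <= position <= len(protein):
        raise IndexError("position outside the polypeptide")
    return protein[position - 1]


def carries_ser352(protein: str) -> bool:
    """True if residue 352 is serine (one-letter code S)."""
    return residue_at(protein, 352) == "S"


# Toy 362-residue sequence (not SEQ ID NO: 8) with S at position 352.
toy = "M" + "A" * 350 + "S" + "K" * 10
print(carries_ser352(toy))                          # True
print(carries_ser352(toy[:351] + "N" + toy[352:]))  # False (Asn at 352)
```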

In some embodiments, when a subject is diagnosed as having a cardiovascular condition or as having an increased risk of developing a cardiovascular condition, a therapeutic or prophylactic agent that treats or prevents the cardiovascular condition is administered to the subject. Alternatively, the method can further comprise administering a therapeutic agent suitable for preventing or alleviating one or more symptoms associated with progression to a more clinically advanced stage of the cardiovascular condition, particularly in patients who have elevated LDL levels and/or who have had, or are at increased risk of, thrombotic events.

The present disclosure provides methods of modifying a cell using any combination of nuclease agents, exogenous donor sequences, transcriptional activators, transcriptional repressors, antisense molecules (e.g., antisense RNA, siRNA, and shRNA), B4GALT1 polypeptides or fragments thereof, and expression vectors for expressing a recombinant B4GALT1 gene or a nucleic acid encoding a B4GALT1 polypeptide. The methods can be carried out in vitro, ex vivo, or in vivo. These components can be introduced into the cell in any form and by any means described elsewhere herein, and all or some of them can be introduced simultaneously or sequentially in any combination. Some methods involve only altering the endogenous B4GALT1 gene in the cell. Some methods involve only altering expression of the endogenous B4GALT1 gene, through the use of transcriptional activators or repressors or of antisense molecules such as antisense RNA, siRNA, and shRNA. Some methods involve only introducing into the cell a recombinant B4GALT1 gene or a nucleic acid encoding a B4GALT1 polypeptide or fragment thereof. Some methods involve only introducing into the cell a B4GALT1 polypeptide or fragment thereof (any one, or any combination, of the B4GALT1 polypeptides or fragments thereof disclosed herein). Other methods involve both altering the endogenous B4GALT1 gene in the cell and introducing into the cell a B4GALT1 polypeptide or fragment thereof, a recombinant B4GALT1 gene, or a nucleic acid encoding a B4GALT1 polypeptide or fragment thereof. Still other methods involve both altering expression of the endogenous B4GALT1 gene in the cell and introducing into the cell a B4GALT1 polypeptide or fragment thereof, a recombinant B4GALT1 gene, or a nucleic acid encoding a B4GALT1 polypeptide or fragment thereof.

The present disclosure provides methods of modifying the endogenous B4GALT1 gene in the genome of a cell (e.g., a pluripotent cell or a differentiated cell) using a nuclease agent and/or an exogenous donor sequence. The methods can be carried out in vitro, ex vivo, or in vivo. A nuclease agent can be used alone or in combination with an exogenous donor sequence, and an exogenous donor sequence can likewise be used alone or in combination with a nuclease agent.

Repair of double-strand breaks (DSBs) occurs principally through two conserved DNA repair pathways: non-homologous end joining (NHEJ) and homologous recombination (HR) (see Kasparek & Humphrey, Seminars in Cell & Dev. Biol., 2011, 22, 886-897). Repair of a target nucleic acid (e.g., the endogenous B4GALT1 gene) mediated by an exogenous donor sequence can involve any process that exchanges genetic information between two polynucleotides. For example, NHEJ can result in targeted integration of an exogenous donor sequence through direct ligation of the break ends to the ends of the exogenous donor sequence (i.e., NHEJ-based capture). Repair can also occur through homology-directed repair (HDR) or HR. HDR or HR includes a form of nucleic acid repair that can require nucleotide sequence homology, uses a "donor" molecule as a template for repairing a "target" molecule (i.e., the molecule that sustained the double-strand break), and leads to the transfer of genetic information from the donor to the target.

A targeted genetic modification of the endogenous B4GALT1 gene in the genome can be generated by contacting the cell with an exogenous donor sequence comprising a 5' homology arm that hybridizes to a 5' target sequence at a target genomic locus within the endogenous B4GALT1 gene and a 3' homology arm that hybridizes to a 3' target sequence at that locus. The exogenous donor sequence can recombine with the target genomic locus to generate the targeted genetic modification of the endogenous B4GALT1 gene. As an example, the 5' homology arm can hybridize to a target sequence located 5' of the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1, and the 3' homology arm can hybridize to a target sequence located 3' of those positions. Such a method can result, for example, in a B4GALT1 gene containing a nucleotide sequence that encodes a serine at the position corresponding to position 352 of the full-length/mature polypeptide produced from it. Examples of exogenous donor sequences are disclosed elsewhere herein.
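The donor layout described above (a 5' arm, the edited codon, and a 3' arm flanking the positions corresponding to 53575 to 53577) can be assembled directly from a reference sequence. The sketch below is a minimal illustration under stated assumptions: the reference is a toy string rather than SEQ ID NO: 1, the five-nucleotide arms are far shorter than any real design, the replacement codon AGC is chosen only for illustration, and `build_donor` is a hypothetical helper.

```python
def build_donor(reference: str, start: int, end: int, new_seq: str, arm_len: int) -> str:
    """Assemble 5' arm + replacement + 3' arm around 1-based inclusive [start, end]."""
    if end - start + 1 != len(new_seq):
        raise ValueError("replacement must match the length of the replaced span")
    left = start - 1 - arm_len  # 0-based start of the 5' arm
    right = end + arm_len       # 0-based exclusive end of the 3' arm
    if left < 0 or right > len(reference):
        raise ValueError("homology arms extend past the reference")
    five_arm = reference[left:start - 1]
    three_arm = reference[end:right]
    return five_arm + new_seq + three_arm


# Toy reference: the codon to replace (AAC) occupies 1-based positions 11 to 13.
toy_ref = "ACGTACGTAC" + "AAC" + "TGCATGCATG"
print(build_donor(toy_ref, 11, 13, "AGC", arm_len=5))  # CGTACAGCTGCAT
```

With the real genomic sequence, the same call would use start=53575 and end=53577; arm lengths are an experimental design choice that this document leaves open.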

For example, a targeted genetic modification of the endogenous B4GALT1 gene in the genome can be generated by contacting the cell, or the genome of the cell, with a Cas protein and one or more guide RNAs that hybridize to one or more guide RNA recognition sequences within a target genomic locus of the endogenous B4GALT1 gene. For example, such a method can comprise contacting the cell with a Cas protein and a guide RNA that hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene. In some embodiments, the guide RNA recognition sequence is located within the region corresponding to exon 5 of SEQ ID NO: 1. In some embodiments, the guide RNA recognition sequence includes, or is adjacent to, the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1. For example, the guide RNA recognition sequence includes those positions or lies within about 1,000, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of them. As yet another example, the guide RNA recognition sequence can include, or be adjacent to, the start codon or the stop codon of the endogenous B4GALT1 gene; for example, it can lie within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon or stop codon. The Cas protein and the guide RNA form a complex, and the Cas protein cleaves the guide RNA recognition sequence. Cleavage by the Cas protein can produce a double-strand break or, when the Cas protein is a nickase, a single-strand break. Such methods can result, for example, in an endogenous B4GALT1 gene in which the region corresponding to exon 5 of SEQ ID NO: 1 is disrupted, the start codon is disrupted, the stop codon is disrupted, or the coding sequence is deleted. Examples and variants of Cas (e.g., Cas9) proteins and guide RNAs that can be used in these methods are described elsewhere herein.
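Choosing a guide RNA recognition sequence "within about N nucleotides" of the target positions can be sketched as a scan for candidate sites. The example below assumes Streptococcus pyogenes Cas9 conventions: a 20-nucleotide protospacer, an NGG PAM, and a blunt cut 3 nucleotides 5' of the PAM. It scans the forward strand only; a real design would also scan the reverse strand and assess off-target sites. The function name and toy sequence are hypothetical.

```python
import re

PROTOSPACER_LEN = 20  # SpCas9 convention assumed here


def spcas9_sites(seq: str, target_start: int, target_end: int, window: int):
    """Forward-strand SpCas9 sites whose cut lies within `window` nt of a target.

    target_start/target_end are 1-based inclusive. Returns tuples of
    (protospacer, pam, cut, distance), where `cut` is the 0-based boundary
    at which SpCas9 is assumed to cut (3 nt 5' of the PAM).
    """
    seq = seq.upper()
    ts, te = target_start - 1, target_end  # 0-based half-open target span
    sites = []
    # Lookahead so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start < PROTOSPACER_LEN:
            continue  # not enough upstream sequence for a full protospacer
        cut = pam_start - 3
        distance = max(ts - cut, cut - te, 0)  # 0 if the cut falls in the target
        if distance <= window:
            sites.append((seq[pam_start - PROTOSPACER_LEN:pam_start],
                          seq[pam_start:pam_start + 3], cut, distance))
    return sites


# Toy locus: target codon at 1-based positions 26 to 28, one AGG PAM downstream.
toy = "T" * 25 + "AAC" + "T" * 5 + "AGG" + "T" * 10
print(spcas9_sites(toy, 26, 28, window=10))
# [('TTTTTTTTTTTTAACTTTTT', 'AGG', 30, 2)]
```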

In some embodiments, two or more nuclease agents can be used. For example, two nuclease agents can be used, each targeting a nuclease recognition sequence within the region corresponding to exon 5 of SEQ ID NO: 1, or each targeting a nuclease recognition sequence that includes or is adjacent to positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1 (e.g., a position within about 1000, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1). As another example, two or more nuclease agents can be used, each targeting a nuclease recognition sequence that includes or is adjacent to the start codon. As another example, two or more nuclease agents can be used, one targeting a nuclease recognition sequence that includes or is adjacent to the start codon and one targeting a nuclease recognition sequence that includes or is adjacent to the stop codon, wherein cleavage by the nuclease agents can result in deletion of the coding region between the two nuclease recognition sequences. As yet another example, three or more nuclease agents can be used, one or more (e.g., two) targeting a nuclease recognition sequence that includes or is adjacent to the start codon and one or more (e.g., two) targeting a nuclease recognition sequence that includes or is adjacent to the stop codon, wherein cleavage by the nuclease agents can result in deletion of the coding region between the start-codon-proximal and stop-codon-proximal nuclease recognition sequences.
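The expected result of cutting at a start-codon-proximal site and a stop-codon-proximal site can be sketched on a toy sequence. This is a minimal illustration, assuming blunt cuts given as 0-based indices and precise rejoining of the two outer ends; the function name and the toy locus are hypothetical.

```python
def delete_between_cuts(seq, cut_a, cut_b):
    """Return seq with the segment between two blunt cuts removed.

    Cuts are 0-based indices; a cut at index i lies immediately before seq[i].
    Assumes precise end joining with no extra insertions or deletions.
    """
    left, right = sorted((cut_a, cut_b))   # order of the two cuts does not matter
    if not 0 <= left <= right <= len(seq):
        raise ValueError("cut positions must lie within the sequence")
    return seq[:left] + seq[right:]


# Hypothetical toy locus: flanking GGG, a 12-nt "coding region" from ATG to TAA.
locus = "GGG" + "ATGAAACCCTAA" + "GGG"
print(delete_between_cuts(locus, 3, 15))  # coding region removed
```

In real cells the junction is often imprecise; classifying such outcomes is a separate step.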

In some embodiments, the cell can be further contacted with one or more additional guide RNAs that hybridize to additional guide RNA recognition sequences within the target genomic locus in the endogenous B4GALT1 gene. When the cell is contacted with one or more additional guide RNAs (e.g., a second guide RNA that hybridizes to a second guide RNA recognition sequence), cleavage by the Cas protein can create two or more double-strand breaks or two or more single-strand breaks (e.g., if the Cas protein is a nickase).

In some embodiments, the cell can be further contacted with one or more exogenous donor sequences that recombine with the target genomic locus in the endogenous B4GALT1 gene to generate a targeted genetic modification. Examples and variations of exogenous donor sequences that can be used in the methods are disclosed elsewhere herein.

The Cas protein, guide RNA(s), and exogenous donor sequence(s) can be introduced into the cell in any form and by any means described elsewhere herein, and all or some of the Cas protein, guide RNA(s), and exogenous donor sequence(s) can be introduced simultaneously or sequentially in any combination.

In some embodiments, repair of a target nucleic acid (e.g., the endogenous B4GALT1 gene) by an exogenous donor sequence occurs through homology-directed repair (HDR). HDR can occur when the Cas protein cleaves both strands of DNA in the endogenous B4GALT1 gene to create a double-strand break, when the Cas protein is a nickase that cleaves one strand of DNA in the target nucleic acid to create a single-strand break, or when Cas nickases are used to create a double-strand break formed by two offset nicks. In such methods, the exogenous donor sequence comprises 5' and 3' homology arms corresponding to 5' and 3' target sequences. The guide RNA recognition sequence(s) or cleavage site(s) can be adjacent to the 5' target sequence, adjacent to the 3' target sequence, adjacent to both the 5' and 3' target sequences, or adjacent to neither. In some embodiments, the exogenous donor sequence can further comprise a nucleic acid insert flanked by the 5' and 3' homology arms, and the nucleic acid insert is inserted between the 5' and 3' target sequences. If no nucleic acid insert is present, the exogenous donor sequence can serve to delete the genomic sequence between the 5' and 3' target sequences. Examples of exogenous donor sequences are disclosed elsewhere herein.
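The relationship between homology arms, target sequences, and the expected HDR product can be sketched with plain string operations. This is a simplified model, assuming 0-based, half-open coordinates, equal-length arms, and that HDR copies exactly the donor sequence between the arms; the function names, arm length, and toy reference are hypothetical.

```python
def build_donor(reference, target5_end, target3_start, arm_len, insert=""):
    """Assemble a donor as 5' arm + optional insert + 3' arm.

    The 5' target sequence is reference[target5_end - arm_len:target5_end];
    the 3' target sequence is reference[target3_start:target3_start + arm_len].
    """
    if target5_end < arm_len or target3_start + arm_len > len(reference):
        raise ValueError("homology arms run past the ends of the reference")
    if target5_end > target3_start:
        raise ValueError("5' target sequence must lie upstream of the 3' target sequence")
    five_arm = reference[target5_end - arm_len:target5_end]
    three_arm = reference[target3_start:target3_start + arm_len]
    return five_arm + insert + three_arm


def hdr_outcome(reference, target5_end, target3_start, insert=""):
    """Expected locus after HDR: the sequence between the target sequences is
    replaced by the insert, or simply deleted when the insert is empty."""
    return reference[:target5_end] + insert + reference[target3_start:]


ref = "AAAACCCCGGGGTTTT"                # hypothetical locus; CCCC lies between the targets
print(build_donor(ref, 4, 8, 4))        # donor without an insert: deletes CCCC
print(hdr_outcome(ref, 4, 8, "TAG"))    # CCCC replaced by a TAG insert
```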

Alternatively, repair of the endogenous B4GALT1 gene mediated by an exogenous donor sequence can occur through non-homologous end joining (NHEJ)-mediated ligation. In such methods, at least one end of the exogenous donor sequence comprises a short single-stranded region that is complementary to at least one overhang created by Cas-mediated cleavage in the endogenous B4GALT1 gene. The complementary ends of the exogenous donor sequence can flank a nucleic acid insert. For example, each end of the exogenous donor sequence can comprise a short single-stranded region complementary to an overhang created by Cas-mediated cleavage in the endogenous B4GALT1 gene, and these complementary regions of the exogenous donor sequence can flank a nucleic acid insert.
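Whether a donor end can anneal to a genomic overhang reduces to a reverse-complement check. This is a minimal sketch, assuming both overhangs are of the same type (both 5' or both 3') and are each written 5' to 3' on their own strand; the function names and the example overhangs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_anneal(genomic_overhang, donor_overhang):
    """True if the donor's single-stranded end base-pairs with the genomic overhang.

    The two strands pair antiparallel, so one overhang must equal the
    reverse complement of the other when both are read 5' to 3'.
    """
    return donor_overhang.upper() == reverse_complement(genomic_overhang).upper()


print(overhangs_anneal("ACGG", "CCGT"))  # complementary sticky ends
print(overhangs_anneal("ACGG", "ACGG"))  # identical, non-palindromic ends do not pair
```

A palindromic overhang such as AATT is its own reverse complement, so it pairs with an identical overhang.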

Overhangs (i.e., staggered ends) can be created by resection of the blunt ends of a double-strand break created by Cas-mediated cleavage. Such resection can generate the regions of microhomology needed for fragment joining, but it can also create unwanted or uncontrollable changes in the B4GALT1 gene. Alternatively, such overhangs can be created by using paired Cas nickases. For example, the cell can be contacted with a first nickase and a second nickase that cleave opposite strands of DNA, thereby modifying the genome through double nicking. This can be accomplished by contacting the cell with a first Cas protein nickase; a first guide RNA that hybridizes to a first guide RNA recognition sequence within the target genomic locus in the endogenous B4GALT1 gene; a second Cas protein nickase; and a second guide RNA that hybridizes to a second guide RNA recognition sequence within the target genomic locus in the endogenous B4GALT1 gene. The first Cas protein and the first guide RNA form a first complex, and the second Cas protein and the second guide RNA form a second complex. The first Cas protein nickase cleaves the first strand of genomic DNA within the first guide RNA recognition sequence, the second Cas protein nickase cleaves the second strand of genomic DNA within the second guide RNA recognition sequence, and, optionally, an exogenous donor sequence recombines with the target genomic locus in the endogenous B4GALT1 gene to generate the targeted genetic modification.

The first nickase cleaves the first strand of genomic DNA (i.e., the complementary strand), and the second nickase cleaves the second strand of genomic DNA (i.e., the non-complementary strand). The first and second nickases can be created, for example, by mutating a catalytic residue in the RuvC domain of Cas9 (e.g., the D10A mutation described elsewhere herein) or by mutating a catalytic residue in the HNH domain of Cas9 (e.g., the H840A mutation described elsewhere herein). In such methods, double nicking can be used to create a double-strand break with staggered ends (i.e., overhangs). The first and second guide RNA recognition sequences can be positioned so that the nicks created by the first and second nickases on the first and second strands of DNA together create a double-strand break. Overhangs are created when the nicks within the first and second CRISPR RNA recognition sequences are offset. The offset window can be, for example, at least about 5 bp, at least about 10 bp, at least about 20 bp, at least about 30 bp, at least about 40 bp, at least about 50 bp, at least about 60 bp, at least about 70 bp, at least about 80 bp, at least about 90 bp, at least about 100 bp, or more (see, e.g., Ran et al., Cell, 2013, 154, 1380-1389; Mali et al., Nat. Biotech., 2013, 31, 833-838; and Shen et al., Nat. Methods, 2014, 11, 399-404).
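The geometry of two offset nicks can be sketched as follows. This is a minimal model, assuming both nick positions are expressed in top-strand coordinates, with a nick at p lying between bases p and p + 1; it reports the length of the staggered end and whether it is a 5' or a 3' overhang. The function names and the example coordinates are hypothetical.

```python
def paired_nick_overhang(top_nick, bottom_nick):
    """Overhang left when nicks on opposite strands are close enough to separate the duplex.

    If the top-strand nick lies upstream of the bottom-strand nick, each
    fragment keeps a single-stranded 5' end; if it lies downstream, 3' ends.
    Returns (length_in_nt, kind) with kind in {"5'", "3'", "blunt"}.
    """
    offset = bottom_nick - top_nick
    if offset == 0:
        return 0, "blunt"
    return abs(offset), ("5'" if offset > 0 else "3'")


def within_offset_window(top_nick, bottom_nick, min_offset):
    """True if the nicks are offset by at least min_offset bp."""
    return abs(bottom_nick - top_nick) >= min_offset


print(paired_nick_overhang(100, 120))  # 20-nt 5' overhangs
print(paired_nick_overhang(120, 100))  # 20-nt 3' overhangs
```

In practice, which overhang type arises depends on how the two guide RNA recognition sequences are oriented relative to each other; the sketch only captures the arithmetic.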

Various types of targeted genetic modifications can be introduced using the methods described herein. Such targeted modifications can include, for example, the addition of one or more nucleotides, the deletion of one or more nucleotides, the substitution of one or more nucleotides, a point mutation, or a combination thereof. For example, at least 1, at least 2, at least 3, at least 4, at least 5, at least 7, at least 8, at least 9, or at least 10 or more nucleotides can be altered (e.g., deleted, inserted, or substituted) to form the targeted genomic modification.

Such targeted genetic modifications can result in disruption of the target genomic locus. Disruption can include alteration of a regulatory element (e.g., a promoter or enhancer), a missense mutation, a nonsense mutation, a frameshift mutation, a truncation mutation, a null mutation, or the insertion or deletion of a small number of nucleotides (e.g., causing a frameshift mutation), and it can result in inactivation (i.e., loss of function) or loss of the allele. For example, the targeted modification can comprise disruption of the start codon of the endogenous B4GALT1 gene such that the start codon is no longer functional.
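Two of the disruption types above can be screened for with simple checks on an edited coding sequence. This is a simplified sketch, assuming the indel lies within a single exon's coding sequence and ignoring effects on splicing; the function names and example sequences are hypothetical.

```python
def is_frameshift(inserted_nt, deleted_nt):
    """A net indel in a coding sequence shifts the reading frame unless its length is a multiple of 3."""
    return (inserted_nt - deleted_nt) % 3 != 0


def start_codon_intact(cds, start_index=0):
    """True if an ATG is still present at the annotated start position."""
    return cds[start_index:start_index + 3].upper() == "ATG"


print(is_frameshift(1, 0))            # 1-nt insertion: frameshift
print(is_frameshift(0, 3))            # 3-nt deletion: in frame
print(start_codon_intact("ACGGCC"))   # ATG destroyed by a point mutation
```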

In some embodiments, the targeted modification can comprise a deletion between the first and second guide RNA recognition sequences or between the Cas cleavage sites. If an exogenous donor sequence (e.g., a repair template or targeting vector) is used, the modification can comprise a deletion between the first and second guide RNA recognition sequences or between the Cas cleavage sites, together with insertion of a nucleic acid insert between the 5' and 3' target sequences.

In some embodiments, when an exogenous donor sequence is used alone or together with a nuclease agent, the modification can comprise a deletion between the 5' and 3' target sequences, together with insertion of a nucleic acid insert between the 5' and 3' target sequences, on both chromosomes of a pair of first and second homologous chromosomes, thereby resulting in a homozygously modified genome. Alternatively, if the exogenous donor sequence comprises a 5' homology arm and a 3' homology arm without a nucleic acid insert, the modification can comprise a deletion between the 5' and 3' target sequences.

A deletion between the first and second guide RNA recognition sequences, or between the 5' and 3' target sequences, can be a precise deletion, in which the deleted nucleic acid consists only of the sequence between the first and second nuclease cleavage sites, or only of the sequence between the 5' and 3' target sequences, so that there are no additional deletions or insertions at the modified genomic target locus. A deletion between the first and second guide RNA recognition sequences can also be an imprecise deletion that extends beyond the first and second nuclease cleavage sites; this is consistent with imprecise repair by non-homologous end joining (NHEJ) and can result in additional deletions and/or insertions at the modified genomic locus. For example, the deletion can extend about 1 bp, about 2 bp, about 3 bp, about 4 bp, about 5 bp, about 10 bp, about 20 bp, about 30 bp, about 40 bp, about 50 bp, about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, or more beyond the first and second Cas protein cleavage sites. Likewise, the modified genomic locus can comprise additional insertions consistent with imprecise repair by NHEJ, such as insertions of about 1 bp, about 2 bp, about 3 bp, about 4 bp, about 5 bp, about 10 bp, about 20 bp, about 30 bp, about 40 bp, about 50 bp, about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, or more.
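The precise/imprecise distinction can be expressed as a comparison between the expected and observed deletion boundaries. This is a minimal sketch, assuming 0-based, half-open coordinates for both the expected deletion [cut_left, cut_right) and the observed deletion [del_start, del_end), with any bases added at the junction passed as `inserted`; the function name and example coordinates are hypothetical.

```python
def classify_deletion(cut_left, cut_right, del_start, del_end, inserted=""):
    """Compare an observed deletion with the one expected from two cleavage sites.

    Positive extra_5/extra_3 values count nucleotides lost beyond each cut
    site; a deletion is precise only if both are 0 and nothing was inserted.
    """
    extra_5 = cut_left - del_start   # nt lost upstream of the first cut site
    extra_3 = del_end - cut_right    # nt lost downstream of the second cut site
    precise = extra_5 == 0 and extra_3 == 0 and not inserted
    return {"precise": precise, "extra_5": extra_5,
            "extra_3": extra_3, "inserted_nt": len(inserted)}


# Hypothetical NHEJ outcome: 5 extra nt lost upstream, 10 downstream, 1 nt inserted.
print(classify_deletion(100, 200, 95, 210, inserted="A"))
```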

The targeted genetic modification can be, for example, a biallelic modification or a monoallelic modification. Biallelic modifications include events in which the same modification is made to the same locus on corresponding homologous chromosomes (e.g., in a diploid cell), or in which different modifications are made to the same locus on corresponding homologous chromosomes. In some embodiments, the targeted genetic modification is a monoallelic modification. Monoallelic modifications include events in which the modification is made to only one allele (i.e., modification of the endogenous B4GALT1 gene on only one of the two homologous chromosomes). Homologous chromosomes include chromosomes that have the same genes at the same loci but possibly different alleles (e.g., chromosomes that pair during meiosis).

Monoallelic mutations can result in cells that are heterozygous for the targeted endogenous B4GALT1 modification. Heterozygosity includes situations in which only one of the two alleles of the B4GALT1 gene (i.e., the corresponding alleles on the two homologous chromosomes) carries the targeted modification.

Biallelic modifications can result in homozygosity for the targeted modification. Homozygosity includes situations in which both alleles of the B4GALT1 gene (i.e., the corresponding alleles on both homologous chromosomes) carry the targeted modification. Alternatively, biallelic modifications can result in compound heterozygosity (e.g., hemizygosity) for the targeted modification. Compound heterozygosity includes situations in which both alleles of the target locus (i.e., the alleles on both homologous chromosomes) have been modified, but in different ways (e.g., a targeted modification in one allele and inactivation or disruption of the other allele).
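The zygosity categories above can be summarized as a small decision rule for a diploid locus. This is a minimal sketch, assuming each allele is labeled with the name of its modification, or `None` if unmodified; the function name and the labels are hypothetical.

```python
def zygosity(allele1, allele2):
    """Classify a diploid locus; None marks an unmodified (wild-type) allele."""
    modified = [a for a in (allele1, allele2) if a is not None]
    if not modified:
        return "unmodified"
    if len(modified) == 1:
        return "heterozygous (monoallelic)"
    if allele1 == allele2:
        return "homozygous (biallelic)"
    return "compound heterozygous (biallelic)"


print(zygosity("targeted", None))         # modification on one chromosome only
print(zygosity("targeted", "disrupted"))  # both alleles modified, differently
```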

The methods disclosed herein can further comprise identifying a cell having a modified endogenous B4GALT1 gene. Various methods can be used to identify cells having a targeted genetic modification, such as a deletion or an insertion. Such methods can comprise identifying a cell having a targeted genetic modification in the B4GALT1 gene. Screening can be performed to identify such cells with modified genomic loci. The screening step can comprise a quantitative assay for assessing modification of allele (MOA) of a parental chromosome (e.g., a loss-of-allele (LOA) and/or gain-of-allele (GOA) assay).
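One common way to read out an LOA or GOA assay is quantitative PCR, in which the copy number of the targeted sequence is compared with a reference gene. The sketch below is an assumption, not the assay specified in the source: it uses the standard 2^-ΔΔCt calculation, assumes roughly 100% amplification efficiency and a two-copy wild-type calibrator, and the function name and Ct values are hypothetical.

```python
def copy_number(ct_target, ct_reference, ct_target_wt, ct_reference_wt, wt_copies=2):
    """Estimate copies of a target sequence per genome with the 2^-ddCt method.

    ct_target / ct_reference: Ct values of the target and a reference gene in the sample.
    ct_target_wt / ct_reference_wt: the same assays in an unmodified (wild-type) calibrator.
    Assumes ~100% PCR efficiency for both assays.
    """
    d_sample = ct_target - ct_reference
    d_wt = ct_target_wt - ct_reference_wt
    return wt_copies * 2 ** -(d_sample - d_wt)


# Hypothetical LOA readout: the target assay comes up one cycle later than in wild type,
# consistent with one of the two native alleles having been deleted or replaced.
print(copy_number(26.0, 20.0, 25.0, 20.0))
```

Under these assumptions, an LOA probe reading about 1 copy suggests a heterozygous modification and about 0 copies a homozygous one, while a GOA probe placed in the insert would be expected to read about 1 or 2 copies.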

Other examples of suitable quantitative assays include fluorescence-mediated in situ hybridization (FISH), comparative genomic hybridization, isothermal DNA amplification, and quantitative hybridization to an immobilized probe(s), INVADER® probes, TAQMAN® Molecular Beacon probes, or ECLIPSE™ probe technology. Conventional assays for screening for targeted modifications, such as long-range PCR, Southern blotting, or Sanger sequencing, can also be used. Such assays are typically used to obtain evidence of linkage between the inserted targeting vector and the targeted genomic locus. For example, in a long-range PCR assay, one primer can recognize a sequence within the inserted DNA, while the other primer recognizes a target genomic locus sequence beyond the end of the targeting vector's homology arm.
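The primer-placement rule for long-range junction PCR can be written as a check plus an amplicon-size calculation. This is a minimal sketch, assuming 1-based coordinates in the expected modified locus and that the insert lies upstream of the 3' homology arm; the function name and coordinates are hypothetical. Placing the genomic primer beyond the arm matters because a primer inside the arm could also amplify from a randomly integrated copy of the targeting vector, which would not demonstrate linkage to the locus.

```python
def junction_amplicon(insert_primer_pos, arm_end, genomic_primer_pos):
    """Expected amplicon length for a junction PCR across the 3' homology arm.

    insert_primer_pos: 5' end of the forward primer, inside the inserted DNA.
    arm_end: last base of the 3' homology arm in the modified locus.
    genomic_primer_pos: 5' end of the reverse primer, given in top-strand coordinates.
    """
    if genomic_primer_pos <= arm_end:
        raise ValueError("genomic primer must anneal beyond the homology arm")
    if insert_primer_pos >= arm_end:
        raise ValueError("insert primer must lie upstream of the 3' homology arm")
    return genomic_primer_pos - insert_primer_pos + 1


print(junction_amplicon(1000, 5000, 5200))  # hypothetical ~4.2 kb product
```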

Next-generation sequencing (NGS) can also be used for screening. Next-generation sequencing can also be referred to as "NGS," "massively parallel sequencing," or "high-throughput sequencing." In some embodiments, screening for targeted cells using selection markers is not required. For example, the MOA and NGS assays described herein can be performed without selection cassettes.
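Because NGS reads the edited locus directly, no selection marker is needed to estimate how often a modification occurred. This is a deliberately simplified sketch, assuming each read is a trimmed, full-length amplicon of the target region and ignoring sequencing errors (which would inflate the edited fraction); the function name and reads are hypothetical.

```python
from collections import Counter


def edited_fraction(reads, wt_amplicon):
    """Fraction of amplicon reads that differ from the wild-type sequence."""
    wt = wt_amplicon.upper()
    counts = Counter("wild-type" if read.upper() == wt else "edited" for read in reads)
    total = sum(counts.values())
    return counts["edited"] / total if total else 0.0


reads = ["ACGT", "ACGT", "ACT", "AGGT"]  # hypothetical reads: two wild-type, two edited
print(edited_fraction(reads, "ACGT"))
```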

The present disclosure also provides methods of altering the expression of a nucleic acid encoding a B4GALT1 polypeptide. In some embodiments, expression is altered through cleavage with a nuclease agent that disrupts the nucleic acid encoding the endogenous B4GALT1 polypeptide, as described in more detail elsewhere herein. In some embodiments, expression is altered through use of a DNA-binding protein fused or linked to a transcriptional activation domain or a transcriptional repression domain. In some embodiments, expression is altered through use of RNA interference compositions, such as antisense RNA, shRNA, or siRNA.

In some embodiments, expression of the endogenous B4GALT1 gene or of a nucleic acid encoding the B4GALT1 polypeptide can be modified by contacting a cell, or the genome in a cell, with a nuclease agent that induces one or more nicks or double-strand breaks at a recognition sequence in a target genomic locus within the endogenous B4GALT1 gene or within the nucleic acid encoding the endogenous B4GALT1 polypeptide. Such cleavage can disrupt expression of the endogenous B4GALT1 gene or of the nucleic acid encoding the endogenous B4GALT1 polypeptide. For example, the nuclease recognition sequence can include, or be adjacent to, the start codon of the endogenous B4GALT1 gene. For example, the recognition sequence can be within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon, and cleavage by the nuclease agent can disrupt the start codon. In some embodiments, two or more nuclease agents can be used, each targeting a nuclease recognition sequence that includes or is adjacent to the start codon. In some embodiments, two or more nuclease agents can be used, one targeting a nuclease recognition sequence that includes or is adjacent to the start codon and one targeting a nuclease recognition sequence that includes or is adjacent to the stop codon, wherein cleavage by the nuclease agents can result in deletion of the coding region between the two nuclease recognition sequences. In some embodiments, three or more nuclease agents can be used, one or more (e.g., two) targeting a nuclease recognition sequence that includes or is adjacent to the start codon and one or more (e.g., two) targeting a nuclease recognition sequence that includes or is adjacent to the stop codon, wherein cleavage by the nuclease agents can result in deletion of the coding region between the start-codon-proximal and stop-codon-proximal nuclease recognition sequences. Other examples of modifying the endogenous B4GALT1 gene or the nucleic acid encoding the endogenous B4GALT1 polypeptide are disclosed elsewhere herein.

In some embodiments, expression of the endogenous B4GALT1 gene or of a nucleic acid encoding the endogenous B4GALT1 polypeptide can be modified by contacting a cell, or the genome in a cell, with a DNA-binding protein that binds to a target genomic locus within the endogenous B4GALT1 gene. The DNA-binding protein can be, for example, a nuclease-inactive Cas protein fused to a transcriptional activator domain or a transcriptional repressor domain. Other examples of DNA-binding proteins include zinc finger proteins fused to a transcriptional activator domain or a transcriptional repressor domain, and transcription activator-like effector (TALE) proteins fused to a transcriptional activator domain or a transcriptional repressor domain. Examples of such proteins are disclosed elsewhere herein.

The recognition sequence for the DNA-binding protein (e.g., a guide RNA recognition sequence) can be located anywhere within the endogenous B4GALT1 gene, or within the nucleic acid encoding the B4GALT1 polypeptide, that is suitable for altering expression. In some embodiments, the recognition sequence can be within a regulatory element, such as an enhancer or promoter, or can be adjacent to a regulatory element. For example, the recognition sequence can include, or be adjacent to, the start codon of the endogenous B4GALT1 gene. In some embodiments, the recognition sequence can be within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon.
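When the DNA-binding protein is a nuclease-inactive Cas protein, candidate guide RNA recognition sequences near the start codon can be found by scanning for a protospacer-adjacent motif. This sketch assumes a Streptococcus pyogenes-type Cas9 with an NGG PAM and a 20-nt spacer, scans only the given (forward) strand, and reports for each site the position 3 bp upstream of the PAM, where an active SpCas9 would cut; the function name and the toy sequence are hypothetical.

```python
def find_spcas9_sites(seq, center, window, spacer_len=20):
    """List (spacer, pam, site_pos) for NGG sites whose site_pos lies within window nt of center.

    Positions are 0-based; site_pos is the index of the base immediately 3'
    of SpCas9's expected cut, i.e. 3 bp upstream of the PAM. Forward strand only.
    """
    seq = seq.upper()
    hits = []
    for i in range(spacer_len, len(seq) - 2):   # i = index of the PAM's first base
        pam = seq[i:i + 3]
        if pam[1:] == "GG":                      # N-G-G
            spacer = seq[i - spacer_len:i]
            site_pos = i - 3
            if abs(site_pos - center) <= window:
                hits.append((spacer, pam, site_pos))
    return hits


toy = "A" * 20 + "TGG" + "A" * 10   # hypothetical sequence with one NGG PAM
print(find_spcas9_sites(toy, center=17, window=5))
```

A practical design would also scan the reverse complement and screen candidates for off-target matches elsewhere in the genome.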

In some embodiments, antisense molecules can be used to alter expression of the endogenous B4GALT1 gene or of a nucleic acid encoding a B4GALT1 polypeptide. Examples of antisense molecules include, but are not limited to, antisense RNA, siRNA, and shRNA. Such antisense RNA, siRNA, or shRNA can be designed to target any region of the mRNA. For example, the antisense RNA, siRNA, or shRNA can be designed to target a region unique to the B4GALT1 mRNA.
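Designing an antisense strand and checking that its target region is unique can be sketched with two helpers. This is a minimal illustration, assuming exact-match uniqueness against a supplied set of transcript sequences (real designs also consider mismatched off-targets); the function names, target site, and transcripts are hypothetical.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target_mrna):
    """Antisense strand (5' to 3') that base-pairs with the given mRNA region."""
    target = target_mrna.upper().replace("T", "U")   # accept DNA-style input
    return "".join(PAIRS[base] for base in reversed(target))


def occurrences(target, transcripts):
    """Number of transcripts containing the target site (exact match only)."""
    site = target.upper().replace("T", "U")
    return sum(site in t.upper().replace("T", "U") for t in transcripts)


print(antisense_rna("AUGGCC"))
# A site is "unique" here if it occurs in the intended transcript and no other.
print(occurrences("AUGG", ["CCAUGGA", "GGGG"]) == 1)
```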

The nucleic acids and proteins disclosed herein can be introduced into a cell by any means. In some embodiments, when multiple components are introduced, one or more of them (e.g., two of the components, or all of the components) can be introduced into the cell simultaneously or sequentially in any combination. For example, an exogenous donor sequence can be introduced before or after introduction of a nuclease agent (e.g., the exogenous donor sequence can be administered about 1, 2, 3, 4, 8, 12, 24, 36, 48, or 72 hours before or after introduction of the nuclease agent). Contacting the genome of a cell with a nuclease agent or an exogenous donor sequence can comprise introducing into the cell one or more nuclease agents or nucleic acids encoding nuclease agents (e.g., one or more Cas proteins or nucleic acids encoding one or more Cas proteins, and one or more guide RNAs or nucleic acids encoding one or more guide RNAs (i.e., one or more CRISPR RNAs and one or more tracrRNAs)) and/or one or more exogenous donor sequences. Contacting the genome of a cell (i.e., contacting the cell) can comprise introducing only one of the above components, more than one of them, or all of them into the cell.

The nuclease agent can be introduced into the cell in the form of a protein or in the form of a nucleic acid encoding the nuclease agent, such as RNA (e.g., messenger RNA (mRNA)) or DNA. When introduced as DNA, the DNA can be operably linked to a promoter active in the cell. Such DNA can be present in one or more expression constructs.

In some embodiments, the Cas protein can be introduced into the cell as a protein, such as a Cas protein complexed with a gRNA, or as a nucleic acid encoding the Cas protein, such as RNA (e.g., messenger RNA (mRNA)) or DNA. The guide RNA can be introduced into the cell as RNA or as DNA encoding the guide RNA. When introduced as DNA, the DNA encoding the Cas protein and/or guide RNA can be operably linked to a promoter active in the cell. Such DNA can be present in one or more expression constructs. For example, such expression constructs can be components of a single nucleic acid molecule. Alternatively, they can be separated in any combination among two or more nucleic acid molecules (i.e., DNA encoding one or more CRISPR RNAs, DNA encoding one or more tracrRNAs, and DNA encoding a Cas protein can be components of separate nucleic acid molecules).

In some embodiments, DNA encoding nuclease agents (e.g., a Cas protein and a guide RNA) and/or DNA encoding exogenous donor sequences can be introduced into cells via DNA minicircles. DNA minicircles are supercoiled DNA molecules that lack an origin of replication and an antibiotic selection marker and can be used for non-viral gene transfer. DNA minicircles are therefore typically smaller than plasmid vectors. They are devoid of bacterial DNA and thus lack the unmethylated CpG motifs found in bacterial DNA.

The methods described herein do not depend on a particular method of introducing a nucleic acid or protein into the cell, only that the nucleic acid or protein gains access to the interior of at least one cell. Methods for introducing nucleic acids and proteins into various cell types are known and include, but are not limited to, stable transfection methods, transient transfection methods, and virus-mediated methods.

Transfection protocols, as well as protocols for introducing nucleic acids or proteins into cells, can vary. Non-limiting transfection methods include chemical-based transfection methods using liposomes, nanoparticles, calcium, dendrimers, or cationic polymers such as DEAE-dextran or polyethylenimine. Non-chemical methods include electroporation, sonoporation, and optical transfection. Particle-based transfection includes the use of a gene gun or magnet-assisted transfection. Viral methods can also be used for transfection.

Introduction of nucleic acids or proteins into a cell can also be mediated by electroporation, intracytoplasmic injection, viral infection, adenovirus, adeno-associated virus, lentivirus, retrovirus, transfection, lipid-mediated transfection, or nucleofection. Nucleofection is an improved electroporation technology that enables nucleic acid substrates to be delivered not only to the cytoplasm but also through the nuclear membrane and into the nucleus. In addition, the use of nucleofection in the methods disclosed herein typically requires far fewer cells than regular electroporation (e.g., only about 2 million, compared with 7 million for regular electroporation). In some embodiments, nucleofection is performed using the LONZA® NUCLEOFECTOR™ system.

Introduction of nucleic acids or proteins into a cell can also be accomplished by microinjection. mRNA is usually microinjected into the cytoplasm (e.g., to deliver it directly to the translation machinery), whereas a protein or DNA encoding a Cas protein is usually microinjected into the nucleus. Alternatively, microinjection can be carried out by injecting into both the nucleus and the cytoplasm: the needle is first introduced into the nucleus and a first amount is injected, and a second amount is injected into the cytoplasm while the needle is withdrawn from the cell. If a nuclease agent protein is injected into the cytoplasm, the protein can comprise a nuclear localization signal to ensure delivery to the nucleus/pronucleus.

Other methods for introducing nucleic acids or proteins into cells include, for example, vector delivery, particle-mediated delivery, exosome-mediated delivery, lipid-nanoparticle-mediated delivery, cell-penetrating-peptide-mediated delivery, and implantable-device-mediated delivery. Methods of administering nucleic acids or proteins to a subject to modify cells in vivo are disclosed elsewhere herein. Introduction of nucleic acids and proteins into cells can also be accomplished by hydrodynamic delivery (HDD).

In some embodiments, the nucleic acid or protein can be introduced into cells in a carrier, such as poly(lactic acid) (PLA) microspheres, poly(D,L-lactic-co-glycolic acid) (PLGA) microspheres, liposomes, micelles, inverse micelles, lipid cochleates, or lipid microtubules.

Introduction of a nucleic acid or protein into a cell can be performed once or multiple times over a period of time. In some embodiments, introduction can be performed at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 times over a period of time.

In some embodiments, the cells used in the methods and compositions have DNA stably incorporated into their genome. In such cases, contacting can comprise providing a cell that already has the construct stably incorporated into its genome. In some embodiments, the cells used in the methods disclosed herein can have a preexisting Cas-encoding gene stably incorporated into the genome (i.e., Cas-ready cells). In some embodiments, the polynucleotide is integrated into the cell's genome and can be inherited by its progeny. Any protocol can be used for stable incorporation of a DNA construct or of the various components of a targeted genomic integration system.

Any nuclease agent that induces a nick or double-strand break within a desired recognition sequence, or any DNA-binding protein that binds a desired recognition sequence, can be used in the methods and compositions disclosed herein. A naturally occurring or native nuclease agent can be used so long as it induces a nick or double-strand break in the desired recognition sequence. Likewise, a naturally occurring or native DNA-binding protein can be used so long as it binds the desired recognition sequence. Alternatively, a modified or engineered nuclease agent or DNA-binding protein can be used. An engineered nuclease agent or DNA-binding protein can be derived from a native, naturally occurring nuclease agent or DNA-binding protein, or it can be artificially created or synthesized. An engineered nuclease agent or DNA-binding protein can recognize a recognition sequence, for example, one that is not a sequence that would be recognized by a native (non-engineered or non-modified) nuclease agent or DNA-binding protein. A modification of a nuclease agent or DNA-binding protein can be as small as one amino acid in a protein cleavage agent or one nucleotide in a nucleic acid cleavage agent.

The recognition sequence for a nuclease agent comprises a DNA sequence in which a nick or double-strand break is induced by the nuclease agent. Likewise, the recognition sequence for a DNA-binding protein comprises a DNA sequence to which the DNA-binding protein binds. The recognition sequence can be endogenous (or native) to the cell, or it can be exogenous to the cell. The recognition sequence can also be exogenous to the polynucleotide of interest to be positioned at the target locus. In some embodiments, the recognition sequence is present only once in the genome of the host cell.
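Whether a recognition sequence is present only once in a genome can be checked by counting exact matches on both strands. The following is a minimal sketch, assuming the genome is available as a single uppercase DNA string (a toy string here) and that only exact matches matter; mismatch-tolerant recognition is not modeled, and the function names are hypothetical.

```python
# Hypothetical sketch: exact-match counting of a recognition sequence on both strands.
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMP)[::-1]


def count_overlapping(seq: str, window: str) -> int:
    """Count occurrences of window in seq, allowing overlaps."""
    count, start = 0, seq.find(window)
    while start != -1:
        count += 1
        start = seq.find(window, start + 1)
    return count


def site_count(genome: str, site: str) -> int:
    """Matches on the top strand plus matches on the bottom strand.
    A palindromic site equals its own reverse complement, so the set keeps it
    from being counted twice at the same locus."""
    genome, site = genome.upper(), site.upper()
    probes = {site, revcomp(site)}
    return sum(count_overlapping(genome, probe) for probe in probes)


def is_unique_site(genome: str, site: str) -> bool:
    """True if the recognition sequence occurs exactly once across both strands."""
    return site_count(genome, site) == 1


toy_genome = "GGATCCAAAGAATTC"
```

In this toy genome, "GAATTC" is palindromic and occurs once, while "ATC" occurs once on each strand and is therefore not unique.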

Active variants and fragments of the exemplified recognition sequences are also provided. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, or 100% sequence identity to the given recognition sequence, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a nuclease agent in a sequence-specific manner. Assays to measure double-strand breaks of a recognition sequence by a nuclease agent are known (e.g., the TAQMAN™ qPCR assay; Frendewey et al., Methods in Enzymology, 2010, 476, 295-307).
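The percent-identity thresholds above can be illustrated with a short calculation. This sketch assumes the variant and the reference recognition sequence have the same length and are compared position by position without gaps; comparing sequences of different lengths would require an alignment, which is not shown. Identity is only the sequence criterion: whether a variant is "active" (still recognized and cleaved) has to be determined experimentally, for example with a cleavage assay such as the one cited above. Function names are hypothetical.

```python
# Hypothetical sketch: ungapped percent identity against a reference recognition sequence.
def percent_identity(variant: str, reference: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not reference or len(variant) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length (no alignment is done)")
    matches = sum(a == b for a, b in zip(variant.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def passes_identity_cutoff(variant: str, reference: str, cutoff: float = 90.0) -> bool:
    """Check a sequence-identity threshold such as 'at least 90%'."""
    return percent_identity(variant, reference) >= cutoff


reference_site = "ACGTACGTAC"   # toy 10-bp reference, not a sequence from this document
variant_site = "ACGTACGTAA"     # one mismatch at the last position
identity = percent_identity(variant_site, reference_site)  # 9 of 10 positions match
```

One mismatch in a 10-bp site gives 90% identity, so this toy variant meets an "at least 90%" threshold but not an "at least 95%" threshold.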

The length of the recognition sequence can vary and includes, for example, recognition sequences of about 30-36 bp for a zinc finger protein or zinc finger nuclease (ZFN) pair (i.e., about 15-18 bp for each ZFN), about 36 bp for a TALE protein or transcription activator-like effector nuclease (TALEN), and about 20 bp for a CRISPR/Cas9 guide RNA.
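For the roughly 20-bp recognition sequence of a CRISPR/Cas9 guide RNA, candidate sites can be enumerated by scanning both strands for 20-nt windows next to a protospacer adjacent motif (PAM). This sketch assumes the Streptococcus pyogenes Cas9 PAM, NGG, located immediately 3' of the protospacer; other Cas proteins use different PAMs. The function name and the coordinate convention (0-based start on the strand read 5' to 3') are illustrative assumptions.

```python
import re

# Hypothetical sketch: list SpCas9-style sites (20-nt protospacer followed by NGG).
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMP)[::-1]


def spcas9_sites(dna: str, spacer_len: int = 20):
    """Yield (strand, start, protospacer, pam) for each spacer_len-nt window
    immediately 5' of an NGG PAM. start is 0-based on that strand, read 5'->3'."""
    dna = dna.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile("(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for match in pattern.finditer(seq):
            yield strand, match.start(), match.group(1), match.group(2)


forward_hits = list(spcas9_sites("A" * 20 + "TGG"))
reverse_hits = list(spcas9_sites("CCA" + "T" * 20))
```

The second example has no GG on the top strand, so its only site is reported on the bottom strand. Choosing among candidates (for example, by off-target potential) is outside the scope of this sketch.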

The recognition sequence of the DNA-binding protein or nuclease agent can be positioned anywhere in or near the target genomic locus. The recognition sequence can be located within a coding region of a gene (e.g., the B4GALT1 gene) or within a regulatory region that influences expression of the gene. The recognition sequence of a DNA-binding protein or nuclease agent can be located in an intron, an exon, a promoter, an enhancer, a regulatory region, or any non-protein-coding region.

One type of DNA-binding protein that can be used in the various methods and compositions disclosed herein is a transcription activator-like effector (TALE). A TALE can be fused or linked to, for example, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. Examples of such domains are described below with respect to Cas proteins and can also be found, for example, in WO 2011/145121. Correspondingly, one type of nuclease agent that can be used in the various methods and compositions disclosed herein is a TALEN. Transcription activator-like (TAL) effector nucleases are a class of sequence-specific nucleases that can be used to make double-strand breaks at specific target sequences in the genome of a prokaryotic or eukaryotic organism. TAL effector nucleases are created by fusing a native or engineered TAL effector, or a functional part thereof, to the catalytic domain of an endonuclease such as FokI. The unique, modular TAL effector DNA-binding domain allows for the design of proteins with potentially any given DNA recognition specificity. Thus, the DNA-binding domain of a TAL effector nuclease can be engineered to recognize a specific DNA target site and thereby used to make a double-strand break at the desired target sequence. Examples of suitable TAL nucleases, and methods for preparing suitable TAL nucleases, are disclosed, e.g., in US Patent Application Publication Nos. 2011/0239315, 2011/0269234, 2011/0145940, 2003/0232410, 2005/0208489, 2005/0026157, 2005/0064474, 2006/0188987, and 2006/0063231.

In some TALENs, each monomer of the TALEN comprises 33-35 TAL repeats that recognize a single base pair via two hypervariable residues. In some TALENs, the nuclease agent is a chimeric protein comprising a TAL-repeat-based DNA-binding domain operably linked to an independent nuclease, such as a FokI endonuclease. For example, the nuclease agent can comprise a first TAL-repeat-based DNA-binding domain and a second TAL-repeat-based DNA-binding domain, each operably linked to a FokI nuclease, wherein the first and second domains recognize two contiguous target DNA sequences on opposite strands of the target DNA, separated by a spacer sequence of varying length (about 12 to about 20 bp), and the FokI nuclease subunits dimerize to create an active nuclease that makes a double-strand break at the target sequence.
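The paired-TALEN layout described above can be sketched as a small design helper. It assumes the commonly used repeat-variable diresidue (RVD) code for the two hypervariable residues (NI for A, HD for C, NN for G, NG for T), treats the left half-site as read on the top strand and the right half-site as read on the bottom strand, and checks that the spacer falls in the 12-20 bp range given above. The function name, argument names, and the toy target are hypothetical, and other TALE design constraints are not modeled.

```python
# Hypothetical sketch: RVD arrays for a TALEN pair flanking a 12-20 bp spacer.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMP)[::-1]


def talen_pair(target: str, left_len: int, right_len: int, spacer_range=(12, 20)) -> dict:
    """Split target into left half-site, spacer, and right half-site; return RVD arrays."""
    target = target.upper()
    spacer = len(target) - left_len - right_len
    low, high = spacer_range
    if not low <= spacer <= high:
        raise ValueError(f"spacer of {spacer} bp is outside {low}-{high} bp")
    left_site = target[:left_len]                            # bound on the top strand
    right_site = revcomp(target[len(target) - right_len:])   # bound on the bottom strand
    return {
        "left_rvds": [RVD_FOR_BASE[base] for base in left_site],
        "right_rvds": [RVD_FOR_BASE[base] for base in right_site],
        "spacer_len": spacer,
    }


# Toy target with 4-bp half-sites so the output is easy to read (real half-sites are longer).
design = talen_pair("ACGT" + "A" * 14 + "TTGC", left_len=4, right_len=4)
```

Because the two monomers bind opposite strands, the right-hand RVD array is built from the reverse complement of the right half-site.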

Another example of a DNA-binding protein is a zinc finger protein. Such zinc finger proteins can be fused or linked to, for example, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. Examples of such domains are described below with respect to Cas proteins and can also be found, for example, in WO 2011/145121. Correspondingly, another example of a nuclease agent that can be used in the various methods and compositions disclosed herein is a ZFN. In some ZFNs, each monomer of the ZFN comprises three or more zinc finger-based DNA-binding domains, each of which binds a 3 bp subsite. In other ZFNs, the ZFN is a chimeric protein comprising a zinc finger-based DNA-binding domain operably linked to an independent nuclease, such as a FokI endonuclease. For example, the nuclease agent can comprise a first ZFN and a second ZFN, each operably linked to a FokI nuclease subunit, wherein the first and second ZFNs recognize two contiguous target DNA sequences on opposite strands of the target DNA, separated by a spacer of about 5 to 7 bp, and the FokI nuclease subunits dimerize to create an active nuclease that makes a double-strand break.
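The ZFN-pair geometry described above (3 bp per finger, two half-sites on opposite strands, a 5-7 bp spacer) can be expressed as a simple layout check. This sketch assumes both ZFNs carry the same number of fingers and reports the right half-site as read on the bottom strand; it splits each half-site into 3-bp subsites in 5' to 3' order without assigning them to specific fingers. With three fingers each half-site is 9 bp, and larger finger counts give proportionally longer half-sites (e.g., 15-18 bp for five or six fingers). Function and key names are hypothetical.

```python
# Hypothetical sketch: split a ZFN-pair target into 3-bp subsites and a spacer.
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMP)[::-1]


def zfn_pair_layout(site: str, fingers_per_zfn: int = 3, spacer_range=(5, 7)) -> dict:
    """Return the 3-bp subsites of each half-site and the spacer length."""
    site = site.upper()
    half = 3 * fingers_per_zfn              # each finger binds a 3 bp subsite
    spacer = len(site) - 2 * half
    low, high = spacer_range
    if not low <= spacer <= high:
        raise ValueError(f"spacer of {spacer} bp is outside {low}-{high} bp")
    left = site[:half]                      # read on the top strand
    right = revcomp(site[-half:])           # read on the bottom strand
    return {
        "left_subsites": [left[i:i + 3] for i in range(0, half, 3)],
        "right_subsites": [right[i:i + 3] for i in range(0, half, 3)],
        "spacer_len": spacer,
    }


layout = zfn_pair_layout("AAACCCGGG" + "TTTTTT" + "ACGACGACG")  # toy 24-bp target
```

The toy target has two 9-bp half-sites and a 6-bp spacer, which falls inside the 5-7 bp window.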

Other suitable DNA-binding proteins and nuclease agents for use in the methods and compositions described herein include the CRISPR-Cas systems described elsewhere herein.

The DNA-binding protein or nuclease agent can be introduced into the cell by any known means. The DNA-binding protein or nuclease agent can be introduced directly into the cell as a polypeptide. Alternatively, a polynucleotide encoding the DNA-binding protein or nuclease agent can be introduced into the cell. When such a polynucleotide is introduced, the DNA-binding protein or nuclease agent can be expressed transiently, conditionally, or constitutively within the cell. For example, the polynucleotide encoding the DNA-binding protein or nuclease agent can be contained in an expression cassette and operably linked to a conditional promoter, an inducible promoter, a constitutive promoter, or a tissue-specific promoter. Such promoters are discussed in more detail elsewhere herein. In some embodiments, the DNA-binding protein or nuclease agent is introduced into the cell as an mRNA encoding the DNA-binding protein or nuclease agent.

A polynucleotide encoding the DNA-binding protein or nuclease agent can be stably integrated into the genome of the cell and operably linked to a promoter active in the cell. Alternatively, the polynucleotide encoding the DNA-binding protein or nuclease agent can be present in the targeting vector, or in a vector or plasmid separate from the targeting vector comprising the insert polynucleotide.

When the DNA-binding protein or nuclease agent is provided to the cell through introduction of a polynucleotide encoding it, that polynucleotide can be modified to substitute codons that are used at a higher frequency in the cell of interest than the codons of the naturally occurring polynucleotide encoding the DNA-binding protein or nuclease agent. In some embodiments, the polynucleotide can be modified to substitute codons that are used at a higher frequency in a given prokaryotic or eukaryotic cell of interest, such as a bacterial cell, yeast cell, human cell, non-human cell, mammalian cell, rodent cell, mouse cell, rat cell, or any other host cell of interest, than in the naturally occurring polynucleotide sequence.
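The codon substitution described above can be illustrated with a minimal sketch that swaps each codon for the synonymous codon used most often in a host. It assumes the standard genetic code and a host codon-usage table supplied as a dictionary; the frequencies in TOY_USAGE are hypothetical numbers for illustration, not measured data for any organism. The original codon is kept unless a synonym is strictly more frequent, so the encoded protein never changes. Function names are hypothetical.

```python
from collections import defaultdict

# Standard genetic code, with codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group synonymous codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS[amino_acid].append(codon)


def translate(cds: str) -> str:
    """Translate an uppercase coding sequence with the standard code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict) -> str:
    """Replace each codon with its most frequent synonym according to `usage`."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        best = max(SYNONYMS[CODON_TABLE[codon]], key=lambda c: usage.get(c, 0.0))
        if usage.get(best, 0.0) <= usage.get(codon, 0.0):
            best = codon  # keep the original unless a synonym is strictly more frequent
        out.append(best)
    return "".join(out)


# Hypothetical host codon frequencies (illustrative only).
TOY_USAGE = {"AAA": 0.25, "AAG": 0.75, "TTA": 0.05, "CTT": 0.10, "CTG": 0.40,
             "GAA": 0.42, "GAG": 0.58}

original = "ATGAAATTAGAA"          # encodes M-K-L-E
optimized = optimize_codons(original, TOY_USAGE)
```

Real codon optimization relies on measured usage tables for the chosen host and may weigh additional factors; this sketch shows only the "higher-frequency codon" substitution named in the text.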

The methods disclosed herein can use Clustered Regularly Interspersed Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems, or components of such systems, to modify a genome within a cell. CRISPR-Cas systems include transcripts and other elements involved in the expression of, or in directing the activity of, Cas genes. A CRISPR-Cas system can be a type I, type II, or type III system. Alternatively, a CRISPR/Cas system can be, for example, a type V system (e.g., subtype V-A or subtype V-B). The methods and compositions disclosed herein can employ CRISPR-Cas systems by using CRISPR complexes (comprising a guide RNA (gRNA) complexed with a Cas protein) for site-directed cleavage of nucleic acids.

The CRISPR-Cas systems used in the methods disclosed herein do not occur in nature. For example, some CRISPR/Cas systems use non-naturally occurring CRISPR complexes comprising a gRNA and a Cas protein that do not naturally occur together.

Cas proteins generally comprise at least one RNA recognition or binding domain that can interact with guide RNAs (gRNAs, described in more detail below). Cas proteins can also comprise nuclease domains (e.g., DNase or RNase domains), DNA-binding domains, helicase domains, protein-protein interaction domains, dimerization domains, and other domains. A nuclease domain possesses catalytic activity for nucleic acid cleavage, which includes the breakage of the covalent bonds of a nucleic acid molecule. Cleavage can produce blunt ends or staggered ends, and it can be single-stranded or double-stranded. For example, wild-type Cas9 typically creates a blunt cleavage product. Alternatively, a wild-type Cpf1 protein (e.g., FnCpf1) can produce a cleavage product with a 5-nucleotide 5' overhang, with cleavage occurring after the 18th base pair from the PAM sequence on the non-targeted strand and after the 23rd base on the targeted strand. A Cas protein can have full cleavage activity to create a double-strand break in the endogenous B4GALT1 gene (e.g., a double-strand break with blunt ends), or it can be a nickase that creates a single-strand break in the endogenous B4GALT1 gene.
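The blunt versus staggered cleavage described above reduces to simple position arithmetic. In this sketch both strand breaks are expressed in one shared coordinate system along the non-targeted (top) strand, where a value i denotes the break just before 0-based position i. The Cpf1 offsets (18 and 23 bases from the PAM) are taken from the text; the SpCas9 offset of 3 bp 5' of an NGG PAM is an added assumption reflecting the commonly reported cut site. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandCuts:
    """Break positions in one shared coordinate system along the top strand:
    a value i means the break lies between 0-based positions i-1 and i."""
    top: int      # non-targeted strand
    bottom: int   # targeted strand

    @property
    def overhang_nt(self) -> int:
        """Length of the single-stranded overhang left by the two breaks."""
        return abs(self.bottom - self.top)

    @property
    def is_blunt(self) -> bool:
        return self.overhang_nt == 0


def spcas9_cuts(pam_start: int) -> StrandCuts:
    """Assumption: SpCas9 cuts both strands 3 bp 5' of the PAM (PAM lies 3' of the protospacer)."""
    return StrandCuts(top=pam_start - 3, bottom=pam_start - 3)


def cpf1_cuts(pam_end: int) -> StrandCuts:
    """Per the text: cut after the 18th base 3' of the PAM on the non-targeted strand
    and after the 23rd base on the targeted strand (PAM lies 5' of the protospacer)."""
    return StrandCuts(top=pam_end + 18, bottom=pam_end + 23)


cas9 = spcas9_cuts(pam_start=23)   # toy coordinates: PAM begins at position 23
cpf1 = cpf1_cuts(pam_end=4)        # toy coordinates: 4-nt PAM occupies positions 0-3
```

For Cpf1 the top-strand break precedes the bottom-strand break by 23 - 18 = 5 nt, so the downstream fragment's top strand carries 5 unpaired nucleotides at its 5' end: the 5-nucleotide 5' overhang stated above.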

Examples of Cas proteins include, but are not limited to, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, as well as homologs or modified versions thereof.

In some embodiments, the Cas protein is a Cas9 protein or is derived from a Cas9 protein from a type II CRISPR-Cas system. Cas9 proteins are from, or are derived from, type II CRISPR-Cas systems and typically share four key motifs with a conserved architecture: motifs 1, 2, and 4 are RuvC-like motifs, and motif 3 is an HNH motif. Exemplary Cas9 proteins include, but are not limited to, those from Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus, Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicelulosiruptor becscii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, and Acaryochloris marina. Additional examples of Cas9 family members are described in WO 2014/131833. Cas9 from S. pyogenes (assigned SwissProt accession number Q99ZW2) is an exemplary enzyme. Cas9 from S. aureus (assigned UniProt accession number J7RUA5) is another exemplary enzyme.

Another example of a Cas protein is the Cpf1 (CRISPR from Prevotella and Francisella 1) protein. Cpf1 is a large protein (about 1300 amino acids) that contains a RuvC-like nuclease domain homologous to the corresponding domain of Cas9, along with a counterpart to the characteristic arginine-rich cluster of Cas9. However, Cpf1 lacks the HNH nuclease domain that is present in Cas9 proteins, and its RuvC-like domain is contiguous in the Cpf1 sequence, in contrast to Cas9, which contains a long insert that includes the HNH domain. Exemplary Cpf1 proteins include, but are not limited to, those from Francisella tularensis 1, Francisella tularensis subsp. novicida, Prevotella albensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Acidaminococcus sp. BV3L6, Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, and Porphyromonas macacae. Cpf1 from Francisella novicida U112 (FnCpf1; assigned UniProt accession number A0Q7Q2) is an exemplary enzyme.

The Cas protein can be a wild-type protein (i.e., one that occurs in nature), a modified Cas protein (i.e., a Cas protein variant), or a fragment of a wild-type or modified Cas protein. The Cas protein can also be an active variant or fragment of a wild-type or modified Cas protein. An active variant or fragment can comprise at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the wild-type or modified Cas protein or a portion thereof, wherein the active variant retains the ability to cut at a desired cleavage site and therefore retains nick-inducing or double-strand-break-inducing activity. Assays for nick-inducing or double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the Cas protein on DNA substrates containing the cleavage site.
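The identity thresholds above presuppose that the variant has already been aligned to the reference; the text does not specify an alignment method or how gaps are scored. The following Python sketch only illustrates the percent-identity arithmetic under stated assumptions ('-' marks a gap, a residue opposite a gap counts as non-identical, gap/gap columns are ignored); the function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Assumptions (not from the source): '-' marks a gap, a gap opposite a
    residue counts as a non-identical column, and gap/gap columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must already be aligned to the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:  # both are the same residue (gap/gap was skipped above)
            identical += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(variant: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the aligned variant reaches the chosen identity floor (80% is the lowest tier listed)."""
    return percent_identity(variant, reference) >= threshold


# Toy, hypothetical 9-residue alignment with one substitution.
print(percent_identity("MDKKYSIGL", "MDKKYSIGA"))  # about 88.9
```

With one substitution in nine aligned residues, this toy variant clears the 80% tier but not the 90% tier.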

Cas proteins can comprise at least one nuclease domain, such as a DNase domain. For example, a wild-type Cpf1 protein generally comprises a RuvC-like domain that cleaves both strands of the target DNA, possibly as a dimer. Cas proteins can also comprise at least two nuclease domains, such as DNase domains. For example, a wild-type Cas9 protein generally comprises a RuvC-like nuclease domain and an HNH-like nuclease domain. The RuvC and HNH domains can each cleave a different strand of double-stranded DNA to make a double-strand break in the DNA.

Cas proteins (e.g., nuclease-active Cas proteins or nuclease-inactive Cas proteins) can also be operably linked to heterologous polypeptides as fusion proteins. For example, a Cas protein can be fused to a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. Examples of transcriptional activation domains include the herpes simplex virus VP16 activation domain, VP64 (a tetrameric derivative of VP16), the NFκB p65 activation domain, p53 activation domains 1 and 2, the CREB (cAMP response element-binding protein) activation domain, the E2A activation domain, and the NFAT (nuclear factor of activated T cells) activation domain. Other examples include, but are not limited to, activation domains from Oct1, Oct-2A, SP1, AP-2, CTF1, P300, CBP, PCAF, SRC1, PvALF, ERF-2, OsGAI, HALF-1, C1, AP1, ARF-5, ARF-6, ARF-7, ARF-8, CPRF1, CPRF4, MYC-RP/GP, TRAB1PC4, and HSF1 (see US Patent Application Publication No. 2016/0237456, European Patent No. EP3045537, and PCT Publication No. WO 2011/145121).

In some embodiments, a transcriptional activation system can be used that comprises a dCas9-VP64 fusion protein paired with MS2-p65-HSF1. Guide RNAs in such systems can be designed with aptamer sequences appended to the sgRNA tetraloop and stem-loop 2, which are designed to bind dimerized MS2 bacteriophage coat proteins (see, e.g., Konermann et al., Nature, 2015, 517, 583-588). Examples of transcriptional repressor domains include the inducible cAMP early repressor (ICER) domain, the Kruppel-associated box A (KRAB-A) repressor domain, the YY1 glycine-rich repressor domain, Sp1-like repressors, the E(spl) repressor, the IκB repressor, and MeCP2. Other examples include, but are not limited to, transcriptional repressor domains from A/B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, SID4X, MBD2, MBD3, DNMT1, DNMT3A, DNMT3B, Rb, and ROM2 (see, e.g., European Patent No. EP3045537 and PCT Publication No. WO 2011/145121). Cas proteins can also be fused to a heterologous polypeptide that provides increased or decreased stability. The fused domain or heterologous polypeptide can be located at the N-terminus, at the C-terminus, or internally within the Cas protein.

A Cas protein can also be fused to a heterologous polypeptide that provides subcellular localization. Such heterologous polypeptides can include, for example, one or more nuclear localization signals (NLS), such as the SV40 NLS for targeting to the nucleus, a mitochondrial localization signal for targeting to the mitochondria, an ER retention signal, and the like. Such subcellular localization signals can be located at the N-terminus, at the C-terminus, or anywhere within the Cas protein. An NLS can comprise a stretch of basic amino acids and can be a monopartite sequence or a bipartite sequence.

Cas proteins can also be operably linked to a cell-penetrating domain. For example, the cell-penetrating domain can be derived from the HIV-1 TAT protein, the TLM cell-penetrating motif from human hepatitis B virus, MPG, Pep-1, VP22, a cell-penetrating peptide from herpes simplex virus, or a polyarginine peptide sequence. The cell-penetrating domain can be located at the N-terminus, at the C-terminus, or anywhere within the Cas protein.

Cas proteins can also be operably linked to a heterologous polypeptide for ease of tracking or purification, such as a fluorescent protein, a purification tag, or an epitope tag. Examples of fluorescent proteins include green fluorescent proteins (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP, ZsGreen1), yellow fluorescent proteins (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, ZsYellow1), blue fluorescent proteins (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-sapphire), cyan fluorescent proteins (e.g., eCFP, Cerulean, CyPet, AmCyan1, Midoriishi-Cyan), red fluorescent proteins (e.g., mKate, mKate2, mPlum, DsRed monomer, mCherry, mRFP1, DsRed-Express, DsRed2, DsRed-Monomer, HcRed-Tandem, HcRed1, AsRed2, eqFP611, mRaspberry, mStrawberry, Jred), orange fluorescent proteins (e.g., mOrange, mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, tdTomato), and any other suitable fluorescent protein. Examples of tags include glutathione-S-transferase (GST), chitin-binding protein (CBP), maltose-binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, hemagglutinin (HA), nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, histidine (His), biotin carboxyl carrier protein (BCCP), and calmodulin.

A Cas9 protein can also be tethered to an exogenous donor sequence or a labeled nucleic acid. Such tethering (i.e., physical linking) can be achieved through covalent or non-covalent interactions, and the tethering can be direct (e.g., through direct fusion or chemical conjugation, which can be achieved by modification of cysteine or lysine residues on the protein or by intein modification) or can be achieved through one or more intervening linkers or adapter molecules such as streptavidin or aptamers. Non-covalent strategies for synthesizing protein-nucleic acid conjugates include biotin-streptavidin and nickel-histidine methods. Covalent protein-nucleic acid conjugates can be synthesized by connecting appropriately functionalized nucleic acids and proteins using a wide variety of chemistries. Some of these chemistries involve direct attachment of an oligonucleotide to an amino acid residue on the protein surface (e.g., a lysine amine or a cysteine thiol), while other, more complex schemes require post-translational modification of the protein or the incorporation of a catalytic or reactive protein domain. Methods for covalently attaching proteins to nucleic acids can include, for example, chemical cross-linking of oligonucleotides to protein lysine or cysteine residues, expressed protein ligation, chemoenzymatic methods, and the use of photoaptamers. The exogenous donor sequence or labeled nucleic acid can be tethered to the C-terminus, the N-terminus, or an internal region within the Cas9 protein. In some embodiments, the exogenous donor sequence or labeled nucleic acid is tethered to the C-terminus or N-terminus of the Cas9 protein. Likewise, the Cas9 protein can be tethered to the 5' end, the 3' end, or an internal region within the exogenous donor sequence or labeled nucleic acid. In some embodiments, the Cas9 protein is tethered to the 5' end or 3' end of the exogenous donor sequence or labeled nucleic acid.

Cas proteins can be provided in any form. For example, a Cas protein can be provided as a protein, such as a Cas protein complexed with a gRNA. Alternatively, a Cas protein can be provided as a nucleic acid encoding the Cas protein, such as an RNA (e.g., messenger RNA (mRNA)) or DNA. In some embodiments, the nucleic acid encoding the Cas protein can be codon-optimized for efficient translation into protein in a particular cell or organism. For example, the nucleic acid encoding the Cas protein can be modified to substitute codons that are used more frequently in a bacterial cell, yeast cell, human cell, non-human cell, mammalian cell, rodent cell, mouse cell, rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the Cas protein is introduced into a cell, the Cas protein can be transiently, conditionally, or constitutively expressed in the cell.
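The codon substitution described above can be sketched in a few lines of Python. This is a minimal "most-used synonymous codon" sketch, not the optimization procedure used in the disclosure; real tools also weigh GC content, repeats, and mRNA structure. The standard genetic code below is NCBI translation table 1, while `HYPOTHETICAL_USAGE` holds illustrative placeholder values for an imagined host rather than a measured codon-usage table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical relative usage values for one imagined host, for illustration only.
# A real design would use a measured codon-usage table for the intended host cell.
HYPOTHETICAL_USAGE = {
    "CTG": 0.40, "CTC": 0.20, "CTT": 0.13, "TTG": 0.13, "TTA": 0.07, "CTA": 0.07,  # Leu
    "AAG": 0.58, "AAA": 0.42,  # Lys
    "GAG": 0.58, "GAA": 0.42,  # Glu
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence with the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, usage: dict) -> str:
    """Swap each codon for the most-used synonymous codon listed in `usage`.

    Codons whose amino acid has no entry in `usage` are left unchanged, and only
    synonymous codons are ever chosen, so the encoded protein is never altered.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa and c in usage]
        out.append(max(synonyms, key=usage.get) if synonyms else codon)
    return "".join(out)


original = "ATGTTAAAAGAATAA"
optimized = codon_optimize(original, HYPOTHETICAL_USAGE)
print(optimized)  # ATGCTGAAGGAGTAA
assert translate(optimized) == translate(original)  # protein (MLKE*) is unchanged
```

The final assertion expresses the key constraint of codon optimization: the nucleotide sequence changes while the encoded amino acid sequence does not.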

A nucleic acid encoding a Cas protein can be stably integrated into the genome of a cell and operably linked to a promoter active in the cell. Alternatively, a nucleic acid encoding a Cas protein can be operably linked to a promoter in an expression construct. Expression constructs include any nucleic acid construct capable of directing expression of a gene or other nucleic acid sequence of interest (e.g., a Cas gene) and capable of transferring such a nucleic acid sequence of interest to a target cell. For example, the nucleic acid encoding the Cas protein can be in a targeting vector comprising a nucleic acid insert and/or in a vector comprising a DNA encoding a gRNA. Alternatively, it can be in a vector or plasmid that is separate from the targeting vector comprising the nucleic acid insert and/or separate from the vector comprising the DNA encoding the gRNA. Promoters that can be used in an expression construct include, for example, promoters active in one or more of a eukaryotic cell, a human cell, a non-human cell, a mammalian cell, a non-human mammalian cell, a rodent cell, a mouse cell, a rat cell, a hamster cell, a rabbit cell, a pluripotent cell, an embryonic stem (ES) cell, or a zygote. Such promoters can be, for example, conditional promoters, inducible promoters, constitutive promoters, or tissue-specific promoters. In some embodiments, the promoter can be a bidirectional promoter that drives expression of the Cas protein in one direction and expression of a guide RNA in the other direction. Such a bidirectional promoter can consist of (1) a complete, conventional, unidirectional Pol III promoter that contains three external control elements: a distal sequence element (DSE), a proximal sequence element (PSE), and a TATA box; and (2) a second basic Pol III promoter that includes a PSE and a TATA box fused to the 5' end of the DSE in reverse orientation. For example, in the H1 promoter, the DSE is adjacent to the PSE and the TATA box, and the promoter can be rendered bidirectional by creating a hybrid promoter in which transcription in the reverse direction is controlled by appending a PSE and a TATA box derived from the U6 promoter. Using a bidirectional promoter to express the genes encoding a Cas protein and a guide RNA simultaneously allows for the generation of compact expression cassettes that facilitate delivery.
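The element order of the two-part bidirectional promoter is easy to misread, so the following Python sketch models only the arrangement of elements along the top strand (5' to 3') and their orientation. It contains no sequences, does not model which transcript (Cas protein or guide RNA) runs in which direction, and the `Element` type and function name are hypothetical; "-" is simply a label for reverse orientation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    name: str    # "DSE", "PSE" or "TATA"
    strand: str  # "+" = forward orientation, "-" = reverse orientation


def bidirectional_pol3_layout() -> List[Element]:
    """Top-strand (5'->3') order of elements in the hybrid promoter.

    Part (1): the complete unidirectional Pol III promoter, DSE -> PSE -> TATA, forward.
    Part (2): a second PSE + TATA box fused to the 5' end of the DSE in reverse
    orientation, so that, read along the top strand, it appears as TATA(-) then PSE(-).
    The single DSE sits between the two cores and is shared by both.
    """
    reverse_core = [Element("TATA", "-"), Element("PSE", "-")]
    forward_promoter = [Element("DSE", "+"), Element("PSE", "+"), Element("TATA", "+")]
    return reverse_core + forward_promoter


layout = bidirectional_pol3_layout()
print([f"{e.name}({e.strand})" for e in layout])
# ['TATA(-)', 'PSE(-)', 'DSE(+)', 'PSE(+)', 'TATA(+)']
```

Reading the printed list from left to right makes the point of the design visible: one DSE flanked by two core promoters pointing away from each other, which is what keeps the cassette compact.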

The present disclosure also provides guide RNAs (gRNAs) that bind to a Cas protein (e.g., a Cas9 protein) and target the Cas protein to a specific location within a target DNA (e.g., the B4GALT1 gene). In some embodiments, the guide RNA is effective to direct a Cas enzyme to bind to or cleave an endogenous B4GALT1 gene, wherein the guide RNA comprises a DNA-targeting segment that hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene that includes or is proximate to, for example, positions 53575 to 53577 of SEQ ID NO:1. For example, the guide RNA recognition sequence can be within about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of positions 53575 to 53577 of SEQ ID NO:1. Other exemplary guide RNAs comprise a DNA-targeting segment that hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene that is located within the region corresponding to exon 5 of SEQ ID NO:1. Other exemplary guide RNAs comprise a DNA-targeting segment that hybridizes to a guide RNA recognition sequence within the endogenous B4GALT1 gene that includes or is proximate to the start codon of the endogenous B4GALT1 gene, or that includes or is proximate to the stop codon of the endogenous B4GALT1 gene. For example, the guide RNA recognition sequence can be within about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon, or within about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the stop codon. The endogenous B4GALT1 gene can be a B4GALT1 gene from any organism. For example, the endogenous B4GALT1 gene can be a human B4GALT1 gene or an ortholog from another organism, such as a non-human mammal, a rodent, a mouse, or a rat.
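Whether a candidate recognition sequence lies "within about N nucleotides" of a target position can be checked mechanically once the genomic sequence is available. The Python sketch below searches both strands of a sequence and keeps hits whose gap to a target interval is at most N. It is a sketch under assumptions: positions are 1-based and inclusive like a sequence listing, "distance" is taken to mean the number of intervening nucleotides (0 if the site touches or overlaps the interval), and the toy sequence is hypothetical because SEQ ID NO:1 is not reproduced in this section. With the real sequence, the target interval would be positions 53575 to 53577.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(genome: str, site: str):
    """1-based, inclusive (start, end, strand) of every match on either strand."""
    genome, site = genome.upper(), site.upper()
    hits = []
    for strand, probe in (("+", site), ("-", revcomp(site))):
        pos = genome.find(probe)
        while pos != -1:
            hits.append((pos + 1, pos + len(probe), strand))
            pos = genome.find(probe, pos + 1)  # allow overlapping matches
    return sorted(hits)


def gap_to_interval(start: int, end: int, target_start: int, target_end: int) -> int:
    """Nucleotides separating two intervals; 0 if they touch or overlap (assumed convention)."""
    if end < target_start:
        return target_start - end - 1
    if start > target_end:
        return start - target_end - 1
    return 0


def sites_near(genome: str, site: str, target_start: int, target_end: int, max_gap: int):
    """Hits of `site` on either strand lying within `max_gap` nt of the target interval."""
    return [h for h in find_sites(genome, site)
            if gap_to_interval(h[0], h[1], target_start, target_end) <= max_gap]


# Hypothetical 20-nt toy sequence with one forward-strand hit at positions 5-11.
toy = "CCCCGATTACACCCCCCCCC"
print(sites_near(toy, "GATTACA", 14, 16, 5))  # [(5, 11, '+')]
```

Searching the reverse complement as well matters because a guide RNA recognition sequence can lie on either strand of the gene.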

In some embodiments, the guide RNA recognition sequence is at the 5' end of the human B4GALT1 gene. In some embodiments, the guide RNA recognition sequence is adjacent to the transcription start site (TSS) of the human B4GALT1 gene. In some embodiments, the guide RNA recognition sequence is at the 3' end of the human B4GALT1 gene. In some embodiments, the guide RNA recognition sequence is proximate to positions 53575 to 53577 of SEQ ID NO:1. Exemplary guide RNA recognition sequences proximate to positions 53575 to 53577 of SEQ ID NO:1 include, but are not limited to, ATTAGTTTTTAGAGGCATGT (SEQ ID NO:9) and GGCTCTCAGGCCAAGTGTAT (SEQ ID NO:10), both of which are 5' of positions 53575 to 53577 of SEQ ID NO:1, and TACTCCTTCCCCCTTTAGGA (SEQ ID NO:11) and GTCCGAGGCTCTGGGCCTAG (SEQ ID NO:12), both of which are 3' of positions 53575 to 53577 of SEQ ID NO:1.
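As a quick sanity check on the four exemplary recognition sequences, the Python sketch below confirms their length and reports their GC content. The sequences are copied from the paragraph above; GC content is included only as a commonly used guide-screening heuristic and is not a selection criterion stated in this disclosure.

```python
RECOGNITION_SEQUENCES = {
    9: "ATTAGTTTTTAGAGGCATGT",   # 5' of positions 53575-53577 of SEQ ID NO:1
    10: "GGCTCTCAGGCCAAGTGTAT",  # 5' of positions 53575-53577 of SEQ ID NO:1
    11: "TACTCCTTCCCCCTTTAGGA",  # 3' of positions 53575-53577 of SEQ ID NO:1
    12: "GTCCGAGGCTCTGGGCCTAG",  # 3' of positions 53575-53577 of SEQ ID NO:1
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


for seq_id, seq in RECOGNITION_SEQUENCES.items():
    print(f"SEQ ID NO:{seq_id}  length={len(seq)}  GC={gc_percent(seq):.0f}%")
# SEQ ID NO:9  length=20  GC=30%
# SEQ ID NO:10  length=20  GC=55%
# SEQ ID NO:11  length=20  GC=50%
# SEQ ID NO:12  length=20  GC=70%
```

All four are 20 nucleotides long, which falls within the typical S. pyogenes Cas9 DNA-targeting segment length discussed below.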

A guide RNA can comprise two segments: a DNA-targeting segment and a protein-binding segment. Some gRNAs comprise two separate RNA molecules: an activator-RNA (e.g., tracrRNA) and a targeter-RNA (e.g., CRISPR RNA or crRNA). Other gRNAs are a single RNA molecule (a single RNA polynucleotide), which can also be called a single-molecule gRNA, a single-guide RNA, or an sgRNA. For Cas9, for example, a single-guide RNA can comprise a crRNA fused to a tracrRNA (e.g., via a linker). For Cpf1, for example, only a crRNA is needed to achieve cleavage. The term gRNA includes both two-molecule (i.e., modular) gRNAs and single-molecule gRNAs.
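The three guide formats described above (Cpf1 crRNA alone, modular Cas9 crRNA plus tracrRNA, and a Cas9 sgRNA made by fusion) can be summarized with a small Python sketch. The function name, the enzyme labels, and the default `linker` string are hypothetical placeholders chosen for illustration; no actual crRNA, tracrRNA, or linker sequence from the disclosure is implied.

```python
from typing import Optional, Tuple


def assemble_guide(enzyme: str, crrna: str, tracrrna: Optional[str] = None,
                   single_molecule: bool = False, linker: str = "GAAA") -> Tuple[str, ...]:
    """Return the RNA molecule(s) making up a guide, each written 5'->3'.

    - Cpf1: a crRNA alone is sufficient for cleavage.
    - Cas9, two-molecule (modular) format: separate crRNA and tracrRNA.
    - Cas9, single-molecule format (sgRNA): crRNA fused to tracrRNA via a linker.
    The default linker is a placeholder assumption, not a sequence from the source.
    """
    if enzyme == "Cpf1":
        return (crrna,)
    if enzyme != "Cas9":
        raise ValueError(f"unsupported enzyme in this sketch: {enzyme}")
    if tracrrna is None:
        raise ValueError("Cas9 guides need a tracrRNA")
    if single_molecule:
        return (crrna + linker + tracrrna,)
    return (crrna, tracrrna)


# Placeholder strings stand in for real crRNA/tracrRNA sequences.
print(assemble_guide("Cas9", "AAAC", "GUUU"))                        # ('AAAC', 'GUUU')
print(assemble_guide("Cas9", "AAAC", "GUUU", single_molecule=True))  # ('AAACGAAAGUUU',)
```

Returning a tuple makes the molecule count explicit: two entries for a modular guide, one for an sgRNA or a Cpf1 crRNA.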

The DNA-targeting segment (crRNA) of a given gRNA comprises a nucleotide sequence that is complementary to a sequence in the target DNA (i.e., the guide RNA recognition sequence). The DNA-targeting segment of a gRNA interacts with the target DNA (e.g., the B4GALT1 gene) in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting segment can vary, and it determines the location within the target DNA at which the gRNA and the target DNA will interact. The DNA-targeting segment of a subject gRNA can be modified to hybridize to any desired sequence within the target DNA. Naturally occurring crRNAs differ depending on the CRISPR/Cas system and organism but often contain a targeting segment of about 21 to about 72 nucleotides in length, flanked by two direct repeats (DRs) of about 21 to about 46 nucleotides in length. In the case of S. pyogenes, the DRs are 36 nucleotides long and the targeting segment is 30 nucleotides long. The 3'-located DR is complementary to and hybridizes with the corresponding tracrRNA, which in turn binds to the Cas protein.
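Because hybridization is antiparallel, the RNA that pairs with a DNA strand is its reverse complement written in the RNA alphabet. The Python sketch below makes that explicit. It assumes the input is the DNA strand the RNA actually pairs with (the text here does not fix which strand a listed recognition sequence is written on) and it counts only Watson-Crick pairs, ignoring G-U wobble; the function names are hypothetical.

```python
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA base -> pairing RNA base


def targeting_segment_for(target_strand_dna: str) -> str:
    """RNA (5'->3') that base-pairs, antiparallel, with the given DNA strand (5'->3')."""
    return "".join(RNA_PAIR[b] for b in reversed(target_strand_dna.upper()))


def hybridizes_perfectly(rna: str, dna: str) -> bool:
    """True if every position of the antiparallel duplex is a Watson-Crick pair."""
    return len(rna) == len(dna) and targeting_segment_for(dna) == rna.upper()


print(targeting_segment_for("ATGC"))  # GCAU
```

Here 5'-GCAU-3' pairs with 3'-CGTA-5', which is the DNA strand 5'-ATGC-3' read in the opposite direction.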

The DNA-targeting segment can have a length of at least about 12, at least about 15, at least about 17, at least about 18, at least about 19, at least about 20, at least about 25, at least about 30, at least about 35, or at least about 40 nucleotides. Such DNA-targeting segments can be from about 12 to about 100, about 12 to about 80, about 12 to about 50, about 12 to about 40, about 12 to about 30, about 12 to about 25, or about 12 to about 20 nucleotides in length. For example, the DNA-targeting segment can be from about 15 to about 25 nucleotides (e.g., from about 17 to about 20 nucleotides, or about 17, about 18, about 19, or about 20 nucleotides) (see, e.g., US Patent Publication No. 2016/0024523). For Cas9 from S. pyogenes, a typical DNA-targeting segment is about 16 to about 20 nucleotides long or about 17 to about 20 nucleotides long. For Cas9 from S. aureus, a typical DNA-targeting segment is about 21 to about 23 nucleotides long. For Cpf1, a typical DNA-targeting segment is at least about 16 nucleotides long or at least about 18 nucleotides long.
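The nuclease-specific lengths above can be captured as a small lookup. The following Python sketch is illustrative only: the dictionary keys and the function name are hypothetical, the bounds are the typical values quoted in this paragraph (the broader 16 to 20 nt range for S. pyogenes Cas9 and the "at least about 16 nt" floor for Cpf1), and each "about" is treated as a hard cutoff for simplicity.

```python
from typing import Dict, Optional, Tuple

# Typical DNA-targeting segment lengths (nt) quoted in the text.
# Bounds are inclusive; an upper bound of None means none is stated.
TYPICAL_SPACER_LENGTHS: Dict[str, Tuple[int, Optional[int]]] = {
    "SpCas9": (16, 20),   # Cas9 from S. pyogenes: about 16-20 (or 17-20) nt
    "SaCas9": (21, 23),   # Cas9 from S. aureus: about 21-23 nt
    "Cpf1": (16, None),   # Cpf1: at least about 16 (or 18) nt
}


def is_typical_spacer_length(nuclease: str, spacer: str) -> bool:
    """Return True if len(spacer) lies in the typical range for `nuclease`."""
    low, high = TYPICAL_SPACER_LENGTHS[nuclease]
    n = len(spacer)
    return n >= low and (high is None or n <= high)


# Usage: a 20-nt spacer is typical for SpCas9 but short for SaCas9.
print(is_typical_spacer_length("SpCas9", "G" * 20))
print(is_typical_spacer_length("SaCas9", "G" * 20))
```

A table-driven lookup keeps the per-nuclease numbers in one place, so a narrower range (for example 17 to 20 nt for SpCas9) can be substituted without touching the logic.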

The percent complementarity between the DNA-targeting sequence and the guide RNA recognition sequence within the target DNA can be at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, or 100%. This percent complementarity can be at least about 60% over about 20 contiguous nucleotides. As one example, the percent complementarity can be about 100% over about 14 contiguous nucleotides at the 5' end of the guide RNA recognition sequence within the complementary strand of the target DNA and about 0% over the remainder; in that case, the DNA-targeting sequence can be considered to be about 14 nucleotides in length. As another example, the percent complementarity can be about 100% over 7 contiguous nucleotides at the 5' end of the guide RNA recognition sequence within the complementary strand of the target DNA and about 0% over the remainder; in that case, the DNA-targeting sequence can be considered to be about 7 nucleotides in length. In some guide RNAs, at least about 17 nucleotides within the DNA-targeting sequence are complementary to the target DNA. For example, the DNA-targeting sequence can be about 20 nucleotides long and can contain 1, 2, or 3 mismatches with the target DNA (the guide RNA recognition sequence). In some embodiments, the mismatches are not adjacent to the protospacer adjacent motif (PAM) sequence (e.g., the mismatches are at the 5' end of the DNA-targeting sequence, or the mismatches are at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, or at least 19 base pairs away from the PAM sequence).
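Percent complementarity and the position of each mismatch relative to the PAM can be computed directly. The Python sketch below is a simplification under stated assumptions: both sequences are written 5' to 3' in the orientation of the non-complementary (PAM-bearing) strand, so complementarity to the complementary strand reduces to letter identity (with U read as T); the PAM is taken to lie immediately 3' of the protospacer, as for S. pyogenes Cas9; and distance 1 denotes the base directly adjacent to the PAM. The function name and example sequences are hypothetical.

```python
from typing import List, Tuple


def compare_spacer(spacer: str, protospacer: str) -> Tuple[float, List[int]]:
    """Return (percent complementarity, mismatch distances from the PAM).

    Assumes both sequences are written 5'->3' on the PAM-bearing strand and
    that the PAM sits immediately 3' of the protospacer. Distance 1 is the
    base directly adjacent to the PAM; distance len(spacer) is the 5' end.
    """
    spacer = spacer.upper().replace("U", "T")  # treat RNA U as DNA T
    protospacer = protospacer.upper()
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    n = len(spacer)
    mismatch_distances = [
        n - i for i, (a, b) in enumerate(zip(spacer, protospacer)) if a != b
    ]
    percent = 100.0 * (n - len(mismatch_distances)) / n
    return percent, mismatch_distances


# Usage: a single mismatch at the 5' end of a 20-nt spacer.
target = "GACGTTACGATCGGATCCAA"
guide = "CACGTTACGATCGGATCCAA"
percent, distances = compare_spacer(guide, target)
print(percent, distances)  # 95.0 [20]
# "Not adjacent to the PAM" in the sense used above: every distance >= 2.
print(all(d >= 2 for d in distances))
```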

Guide RNAs can include modifications or sequences that provide additional desirable features (e.g., modified or regulated stability; subcellular targeting; tracking with a fluorescent label; a binding site for a protein or protein complex; and the like). Examples of such modifications include a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (i.e., a 3' poly(A) tail); a riboswitch sequence (e.g., to allow regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, and the like); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

Guide RNAs can be provided in any form. For example, the gRNA can be provided in the form of RNA, either as two molecules (separate crRNA and tracrRNA) or as one molecule (sgRNA), and optionally in the form of a complex with a Cas protein. The gRNA can be prepared, for example, by in vitro transcription using T7 RNA polymerase. Guide RNAs can also be prepared by chemical synthesis.

The gRNA can also be provided in the form of DNA encoding the gRNA. The DNA encoding the gRNA can encode a single RNA molecule (sgRNA) or separate RNA molecules (e.g., separate crRNA and tracrRNA). In the latter case, the DNA encoding the gRNA can be provided as one DNA molecule or as separate DNA molecules encoding the crRNA and the tracrRNA, respectively. When the gRNA is provided in the form of DNA, the gRNA can be transiently, conditionally, or constitutively expressed in the cell. The DNA encoding the gRNA can be stably integrated into the genome of the cell and operably linked to a promoter active in the cell. Alternatively, the DNA encoding the gRNA can be operably linked to a promoter in an expression construct. For example, the DNA encoding the gRNA can be in a vector comprising a heterologous nucleic acid. The vector can further comprise an exogenous donor sequence and/or a nucleic acid encoding a Cas protein. Alternatively, the DNA encoding the gRNA can be in a vector or plasmid that is separate from the vector comprising the exogenous donor sequence and/or the vector comprising the nucleic acid encoding the Cas protein. Promoters that can be used in such expression constructs include, for example, promoters active in one or more of a eukaryotic cell, a human cell, a non-human cell, a mammalian cell, a non-human mammalian cell, a rodent cell, a mouse cell, a rat cell, a hamster cell, a rabbit cell, a pluripotent cell, an embryonic stem cell, or a zygote. Such promoters can be, for example, conditional promoters, inducible promoters, constitutive promoters, or tissue-specific promoters. Such promoters can also be, for example, bidirectional promoters. Specific examples of suitable promoters include RNA polymerase III promoters, such as a human U6 promoter, a rat U6 polymerase III promoter, or a mouse U6 polymerase III promoter.

The present disclosure also provides compositions comprising one or more guide RNAs disclosed herein (e.g., 1, 2, 3, 4, or more guide RNAs) and a carrier that increases the stability of the isolated nucleic acid molecule or protein (e.g., by prolonging the period under given storage conditions, such as -20°C, 4°C, or ambient temperature, during which degradation products remain below a threshold, such as below 0.5% by weight of the starting nucleic acid or protein; or by increasing stability in vivo). Examples of such carriers include, but are not limited to, poly(lactic acid) (PLA) microspheres, poly(D,L-lactic-co-glycolic acid) (PLGA) microspheres, liposomes, micelles, inverse micelles, lipid cochleates, and lipid microtubules. Such compositions can further comprise a Cas protein, such as a Cas9 protein, or a nucleic acid encoding a Cas protein. Such compositions can further comprise one or more (e.g., 1, 2, 3, 4, or more) exogenous donor sequences as disclosed elsewhere herein, and/or one or more (e.g., 1, 2, 3, 4, or more) targeting vectors, and/or one or more (e.g., 1, 2, 3, 4, or more) expression vectors.

The guide RNA recognition sequence includes a nucleic acid sequence present in the target DNA (e.g., the B4GALT1 gene) to which the DNA-targeting segment of the gRNA will bind, provided that sufficient conditions for binding exist. For example, the guide RNA recognition sequence includes a sequence to which the guide RNA is designed to be complementary, where hybridization between the guide RNA recognition sequence and the DNA-targeting sequence promotes formation of a CRISPR complex. Full complementarity is not necessarily required, provided that there is sufficient complementarity to cause hybridization and promote formation of the CRISPR complex. The guide RNA recognition sequence also includes a cleavage site for the Cas protein, described in more detail below. The guide RNA recognition sequence can comprise any polynucleotide, which can be located, for example, in the nucleus or cytoplasm of a cell or within an organelle of a cell, such as a mitochondrion or chloroplast.

The guide RNA recognition sequence in the target DNA can be targeted by (i.e., be bound by, hybridize with, or be complementary to) a Cas protein or a gRNA. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions are known.

The Cas protein can cleave the nucleic acid at a site within or outside of the nucleic acid sequence present in the target DNA to which the DNA-targeting segment of the gRNA will bind. The "cleavage site" includes the position in a nucleic acid at which the Cas protein produces a single-strand break or a double-strand break. For example, formation of a CRISPR complex (comprising a gRNA hybridized to a guide RNA recognition sequence and complexed with a Cas protein) can result in cleavage of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs of) the nucleic acid sequence present in the target DNA to which the DNA-targeting segment of the gRNA will bind. The cleavage site can be on only one strand or on both strands of the nucleic acid. Cleavage sites can be at the same position on both strands of the nucleic acid (producing blunt ends) or at different positions on each strand (producing staggered ends, i.e., overhangs). In some embodiments, where a pair of nickases is used, the guide RNA recognition sequence of the nickase on the first strand is separated from the guide RNA recognition sequence of the nickase on the second strand by at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 250, at least 500, or at least 1,000 base pairs.
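Whether two single-strand cuts leave blunt or staggered ends follows from duplex geometry. The Python sketch below uses a hypothetical coordinate convention: cut positions are given in top-strand coordinates, each position denotes the bond just before that base, and the top strand runs 5' to 3' from left to right. It is generic and not specific to any nuclease. The separation between paired nickase recognition sequences described above is not, by itself, the overhang length, because the overhang also depends on where each nickase cuts within its own site.

```python
def describe_ends(top_cut: int, bottom_cut: int) -> str:
    """Classify the ends left by one cut on each strand of a duplex.

    Positions are top-strand coordinates; a cut at 10 breaks the bond
    between bases 9 and 10. The top strand runs 5'->3' left to right,
    so the bottom strand runs 3'->5' left to right.
    """
    offset = bottom_cut - top_cut
    if offset == 0:
        return "blunt"
    # Top cut left of bottom cut: each fragment keeps a single-stranded
    # stretch ending in a 5' terminus. The reverse arrangement leaves 3' ends.
    kind = "5'" if offset > 0 else "3'"
    return f"staggered, {abs(offset)}-nt {kind} overhang"


# Usage
print(describe_ends(10, 10))  # blunt
print(describe_ends(10, 14))  # staggered, 4-nt 5' overhang
print(describe_ends(14, 10))  # staggered, 4-nt 3' overhang
```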

Site-specific cleavage of the target DNA by a Cas protein can occur at positions determined by both (i) base-pairing complementarity between the gRNA and the target DNA and (ii) a short motif in the target DNA called the protospacer adjacent motif (PAM). The PAM can flank the guide RNA recognition sequence. In some embodiments, the guide RNA recognition sequence can be flanked on its 3' end by the PAM. Alternatively, the guide RNA recognition sequence can be flanked on its 5' end by the PAM. For example, the cleavage site of the Cas protein can be about 1 to about 10 or about 2 to about 5 base pairs (e.g., 3 base pairs) upstream or downstream of the PAM sequence. In some cases (e.g., when Cas9 from S. pyogenes or a closely related Cas9 is used), the PAM sequence of the non-complementary strand can be 5'-N₁GG-3', where N₁ is any DNA nucleotide and is immediately 3' of the guide RNA recognition sequence of the non-complementary strand of the target DNA. As such, the PAM sequence of the complementary strand would be 5'-CCN₂-3', where N₂ is any DNA nucleotide and is immediately 5' of the guide RNA recognition sequence of the complementary strand of the target DNA. In some such cases, N₁ and N₂ can be complementary, and the N₁-N₂ base pair can be any base pair (e.g., N₁=C and N₂=G; N₁=G and N₂=C; N₁=A and N₂=T; or N₁=T and N₂=A). For Cas9 from S. aureus, the PAM can be NNGRRT (SEQ ID NO: 13) or NNGRR (SEQ ID NO: 14), where N can be A, G, C, or T, and R can be G or A. In some cases (e.g., for FnCpf1), the PAM sequence can be upstream of the 5' end of the guide RNA recognition sequence and have the sequence 5'-TTN-3'.
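The PAM rules above can be expressed as IUPAC pattern matching. The Python sketch below scans one strand for protospacers lying immediately 5' of a PAM and reports a cut position a fixed number of bases upstream of the PAM. The helper names are hypothetical. The 20 nt spacer length and the 3 bp cut offset are illustrative defaults taken from the examples in this section, and only the IUPAC codes needed here (A, C, G, T, R, Y, N) are included. Systems with a 5' PAM, such as FnCpf1 (5'-TTN-3'), and scanning of the opposite strand (via its reverse complement) are not handled.

```python
from typing import Iterator, Tuple

# Subset of IUPAC nucleotide codes used by the PAMs discussed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def matches_iupac(seq: str, pattern: str) -> bool:
    """True if each base in `seq` is allowed by the IUPAC code in `pattern`."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq.upper(), pattern)
    )


def find_sites(dna: str, pam: str, spacer_len: int = 20,
               cut_offset: int = 3) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (start, protospacer, pam_seq, cut) for protospacers 5' of a PAM.

    `cut` is a 0-based bond index `cut_offset` bp upstream of the PAM.
    Only the given strand is scanned (3'-PAM systems such as SpCas9/SaCas9).
    """
    dna = dna.upper()
    for pam_start in range(spacer_len, len(dna) - len(pam) + 1):
        pam_seq = dna[pam_start:pam_start + len(pam)]
        if matches_iupac(pam_seq, pam):
            start = pam_start - spacer_len
            yield start, dna[start:pam_start], pam_seq, pam_start - cut_offset


# Usage: one SpCas9 site (NGG PAM) and an SaCas9 PAM check (NNGRRT).
example = "GACGTTACGATCGGATCCAA" + "TGG" + "TTTT"
print(list(find_sites(example, "NGG")))
print(matches_iupac("TTGAGT", "NNGRRT"), matches_iupac("TTGACT", "NNGRRT"))
```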

Examples of guide RNA recognition sequences include a DNA sequence complementary to the DNA-targeting segment of the gRNA, or such a DNA sequence together with a PAM sequence. For example, the target motif can be a 20-nucleotide DNA sequence immediately preceding an NGG motif recognized by a Cas9 protein, such as GN₁₉NGG (SEQ ID NO: 15) or N₂₀NGG (SEQ ID NO: 16) (see, e.g., PCT Publication No. WO 2014/165825). A guanine at the 5' end can facilitate transcription by RNA polymerase in cells. Other examples of guide RNA recognition sequences can include two guanine nucleotides at the 5' end (e.g., GGN₂₀NGG; SEQ ID NO: 17) to facilitate efficient transcription by T7 polymerase in vitro (see, e.g., PCT Publication No. WO 2014/065596). Other guide RNA recognition sequences can be about 4 to about 22 nucleotides in length, including a 5' G or GG and a 3' GG or NGG. In some embodiments, other guide RNA recognition sequences can be about 14 to about 20 nucleotides in length.
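The GN₁₉NGG and GGN₂₀NGG motifs can be located with regular expressions. This is a minimal Python sketch, assuming an A/C/G/T input read on a single strand. Zero-width lookaheads are used so that overlapping candidate sites are all reported. The pattern and function names are hypothetical.

```python
import re
from typing import List, Tuple

# G + 19 nt (a 20-nt target starting with G), then an NGG PAM.
GN19NGG = re.compile(r"(?=(G[ACGT]{19})([ACGT]GG))")
# GG + 20 nt (two extra 5' guanines for T7 transcription), then an NGG PAM.
GGN20NGG = re.compile(r"(?=(GG[ACGT]{20})([ACGT]GG))")


def list_targets(dna: str, pattern: "re.Pattern") -> List[Tuple[int, str, str]]:
    """Return (start, target incl. leading G/GG, PAM) for every match."""
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(dna.upper())]


# Usage
print(list_targets("GACGTTACGATCGGATCCAA" + "TGG", GN19NGG))
print(list_targets("GG" + "ACGTACGTACGTACGTACGT" + "AGG", GGN20NGG))
```

Lookaheads matter here because a plain `finditer` consumes characters and would skip a second site that overlaps the first.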

The guide RNA recognition sequence can be any nucleic acid sequence that is endogenous or exogenous to the cell. The guide RNA recognition sequence can be a sequence encoding a gene product (e.g., a protein), a non-coding sequence (e.g., a regulatory sequence), or it can include both.

In some embodiments, the guide RNA recognition sequence can be within a region corresponding to exon 5 of SEQ ID NO: 1. In some embodiments, the guide RNA recognition sequence can include or be proximate to positions 53575 to 53577 of SEQ ID NO: 1. For example, the guide RNA recognition sequence can include positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1, or it can be within about 1000, about 500, about 400, about 300, about 200, about 100, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 15, about 10, or about 5 nucleotides of those positions. In some embodiments, the guide RNA recognition sequence can include or be proximate to the start codon or the stop codon of the endogenous B4GALT1 gene. For example, the guide RNA recognition sequence can be within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon or stop codon.
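Whether a candidate site falls "within about N nucleotides" of positions 53575 to 53577 of SEQ ID NO: 1 reduces to an interval-gap calculation. The Python sketch below assumes 1-based, inclusive coordinates on SEQ ID NO: 1 and interprets "within N nucleotides" as a gap of at most N bases between the site and the reference interval (0 when they overlap). That interpretation, the function names, and the example coordinates are assumptions, not definitions from the text.

```python
# Reference interval from the text: positions 53575-53577 of SEQ ID NO: 1
# (assumed 1-based, inclusive). Candidate-site coordinates are hypothetical.
REF_START, REF_END = 53575, 53577


def gap_to_interval(site_start: int, site_end: int,
                    ref_start: int = REF_START, ref_end: int = REF_END) -> int:
    """Bases strictly between two 1-based inclusive intervals (0 if they overlap)."""
    if site_end < ref_start:
        return ref_start - site_end - 1
    if site_start > ref_end:
        return site_start - ref_end - 1
    return 0


def within_window(site_start: int, site_end: int, window: int) -> bool:
    """True if the candidate site lies within `window` nt of the reference."""
    return gap_to_interval(site_start, site_end) <= window


# Usage: a 20-nt site ending 15 nt before position 53575.
print(gap_to_interval(53540, 53559))      # 15
print(within_window(53540, 53559, 20))    # True
print(within_window(53540, 53559, 10))    # False
```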

The methods and compositions disclosed herein can use an exogenous donor sequence (e.g., a targeting vector or repair template) to modify the endogenous B4GALT1 gene, either without cleavage of the endogenous B4GALT1 gene or after cleavage of the endogenous B4GALT1 gene with a nuclease agent. An exogenous donor sequence refers to any nucleic acid or vector that includes the elements required to enable site-specific recombination with a target sequence. Using an exogenous donor sequence in combination with a nuclease agent can result in more precise modifications within the endogenous B4GALT1 gene by promoting homology-directed repair.

In such methods, the nuclease agent cleaves the endogenous B4GALT1 gene to create a single-strand break (nick) or a double-strand break, and the exogenous donor sequence recombines with the endogenous B4GALT1 gene via non-homologous end joining (NHEJ)-mediated ligation or via a homology-directed repair event. Repair with the exogenous donor sequence removes or disrupts the nuclease cleavage site, so that the altered allele cannot be retargeted by the nuclease agent.

The exogenous donor sequence can comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), can be single-stranded or double-stranded, and can be in linear or circular form. For example, the exogenous donor sequence can be a single-stranded oligodeoxynucleotide (ssODN). Exemplary exogenous donor sequences are from about 50 nucleotides to about 5 kb in length, from about 50 nucleotides to about 3 kb in length, or from about 50 to about 1,000 nucleotides in length. Other exemplary exogenous donor sequences are from about 40 to about 200 nucleotides in length. For example, an exogenous donor sequence can be from about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 110, about 110 to about 120, about 120 to about 130, about 130 to about 140, about 140 to about 150, about 150 to about 160, about 160 to about 170, about 170 to about 180, about 180 to about 190, or about 190 to about 200 nucleotides in length. Alternatively, an exogenous donor sequence can be from about 50 to about 100, about 100 to about 200, about 200 to about 300, about 300 to about 400, about 400 to about 500, about 500 to about 600, about 600 to about 700, about 700 to about 800, about 800 to about 900, or about 900 to about 1,000 nucleotides in length. Alternatively, an exogenous donor sequence can be from about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, about 2 kb to about 2.5 kb, about 2.5 kb to about 3 kb, about 3 kb to about 3.5 kb, about 3.5 kb to about 4 kb, about 4 kb to about 4.5 kb, or about 4.5 kb to about 5 kb in length. Alternatively, an exogenous donor sequence can be, for example, no more than about 5 kb, about 4.5 kb, about 4 kb, about 3.5 kb, about 3 kb, about 2.5 kb, about 2 kb, about 1.5 kb, or about 1 kb in length, or no more than about 900, about 800, about 700, about 600, about 500, about 400, about 300, about 200, about 100, or about 50 nucleotides in length.

In some embodiments, the exogenous donor sequence is an ssODN that is about 80 nucleotides to about 200 nucleotides in length (e.g., about 120 nucleotides in length). In another example, the exogenous donor sequence is an ssODN that is about 80 nucleotides to about 3 kb in length. Such an ssODN can have homology arms that are, for example, each about 40 nucleotides to about 60 nucleotides in length, or each about 30 nucleotides to about 100 nucleotides in length. The homology arms can be symmetrical (e.g., each about 40 nucleotides or each about 60 nucleotides in length), or they can be asymmetrical (e.g., one homology arm about 36 nucleotides in length and the other about 91 nucleotides in length).
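The arm layouts above can be illustrated by assembling a donor from a reference sequence. In this Python sketch, `design_ssodn` is a hypothetical helper, not a method described in the text. `cut` is a 0-based bond index in the reference, the donor is written on the same strand as the reference, and `insert` and `deleted` together describe an insertion, a deletion, or a replacement. Which strand the ssODN should match is not addressed.

```python
def design_ssodn(reference: str, cut: int, insert: str = "",
                 left_arm: int = 60, right_arm: int = 60,
                 deleted: int = 0) -> str:
    """Assemble left homology arm + insert + right homology arm.

    `cut` is a 0-based bond index (between reference[cut-1] and
    reference[cut]). `deleted` bases immediately 3' of the cut are left out
    of the donor, so the edit replaces them with `insert`.
    """
    if cut - left_arm < 0 or cut + deleted + right_arm > len(reference):
        raise ValueError("homology arms run past the reference sequence")
    left = reference[cut - left_arm:cut]
    right = reference[cut + deleted:cut + deleted + right_arm]
    return left + insert + right


# Usage with a toy 200-nt reference.
ref = "ACGT" * 50
symmetric = design_ssodn(ref, 100, left_arm=40, right_arm=40)       # 80 nt
asymmetric = design_ssodn(ref, 100, insert="TTT",
                          left_arm=36, right_arm=91)                 # 130 nt
print(len(symmetric), len(asymmetric))
```

The 40/40 call mirrors the symmetrical arm example in the text, and the 36/91 call mirrors the asymmetrical one. The 3-nt insert is a placeholder.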

The exogenous donor sequence can include modifications or sequences that provide additional desirable features (e.g., modified or regulated stability; tracking or detection with a fluorescent label; a binding site for a protein or protein complex; and the like). The exogenous donor sequence can comprise one or more fluorescent labels, purification tags, epitope tags, or a combination thereof. For example, the exogenous donor sequence can comprise one or more fluorescent labels (e.g., fluorescent proteins or other fluorophores or dyes), such as at least 1, at least 2, at least 3, at least 4, or at least 5 fluorescent labels. Exemplary fluorescent labels include fluorophores such as fluorescein (e.g., 6-carboxyfluorescein (6-FAM)), Texas Red, HEX, Cy3, Cy5, Cy5.5, Pacific Blue, 5-(and-6)-carboxytetramethylrhodamine (TAMRA), and Cy7. A wide range of fluorescent dyes is commercially available for labeling oligonucleotides (e.g., from Integrated DNA Technologies). Such fluorescent labels (e.g., internal fluorescent labels) can be used, for example, to detect an exogenous donor sequence that has been directly integrated into a cleaved endogenous B4GALT1 gene having overhanging ends compatible with the ends of the exogenous donor sequence. The label or tag can be at the 5' end, at the 3' end, or internal to the exogenous donor sequence. For example, the exogenous donor sequence can be conjugated at its 5' end to the IR700 fluorophore from Integrated DNA Technologies (5'IRDye® 700).

The exogenous donor sequence can also comprise a nucleic acid insert comprising a segment of DNA to be integrated into the endogenous B4GALT1 gene. Integration of the nucleic acid insert into the endogenous B4GALT1 gene can result in the addition of a nucleic acid sequence of interest to the endogenous B4GALT1 gene, the deletion of a nucleic acid sequence of interest from the endogenous B4GALT1 gene, or the replacement (i.e., deletion and insertion) of a nucleic acid sequence of interest in the endogenous B4GALT1 gene. Some exogenous donor sequences are designed for insertion of a nucleic acid insert into the endogenous B4GALT1 gene without any corresponding deletion in the endogenous B4GALT1 gene. Other exogenous donor sequences are designed to delete a nucleic acid sequence of interest in the endogenous B4GALT1 gene without any corresponding insertion of a nucleic acid insert. Still other exogenous donor sequences are designed to delete a nucleic acid sequence of interest in the endogenous B4GALT1 gene and replace it with a nucleic acid insert.

The nucleic acid insert and the corresponding nucleic acid in the endogenous B4GALT1 gene that is deleted and/or replaced can be of various lengths. Exemplary nucleic acid inserts, or corresponding nucleic acids in the endogenous B4GALT1 gene that are deleted and/or replaced, are from about 1 nucleotide to about 5 kb in length or from about 1 nucleotide to about 1,000 nucleotides in length. For example, the nucleic acid insert and the corresponding nucleic acid in the endogenous B4GALT1 gene that is deleted and/or replaced can be from about 1 to about 10, about 10 to about 20, about 20 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 110, about 110 to about 120, about 120 to about 130, about 130 to about 140, about 140 to about 150, about 150 to about 160, about 160 to about 170, about 170 to about 180, about 180 to about 190, or about 190 to about 200 nucleotides in length. Likewise, the nucleic acid insert and the corresponding nucleic acid in the endogenous B4GALT1 gene that is deleted and/or replaced can be from about 1 to about 100, about 100 to about 200, about 200 to about 300, about 300 to about 400, about 400 to about 500, about 500 to about 600, about 600 to about 700, about 700 to about 800, about 800 to about 900, or about 900 to about 1,000 nucleotides in length. Likewise, the nucleic acid insert and the corresponding nucleic acid in the endogenous B4GALT1 gene that is deleted and/or replaced can be from about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, about 2 kb to about 2.5 kb, about 2.5 kb to about 3 kb, about 3 kb to about 3.5 kb, about 3.5 kb to about 4 kb, about 4 kb to about 4.5 kb, or about 4.5 kb to about 5 kb in length.

The nucleic acid insert can comprise genomic DNA or any other type of DNA. For example, the nucleic acid insert can comprise cDNA.

The nucleic acid insert can comprise a sequence that is homologous to all or part of the endogenous B4GALT1 gene (e.g., a portion of the gene encoding a particular motif or region of a B4GALT1 polypeptide). For example, the nucleic acid insert can comprise a sequence that, compared with the sequence targeted for replacement in the endogenous B4GALT1 gene, contains one or more point mutations (e.g., 1, 2, 3, 4, 5, or more) or one or more nucleotide insertions or deletions.

The nucleic acid insert, and the corresponding nucleic acid in the endogenous B4GALT1 gene that is deleted and/or replaced, can be a coding region, such as an exon; a non-coding region, such as an intron, an untranslated region, or a regulatory region (e.g., a promoter, an enhancer, or a transcriptional repressor-binding element); or any combination thereof.

The nucleic acid insert can also comprise a polynucleotide encoding a selection marker. Alternatively, the nucleic acid insert can lack a polynucleotide encoding a selection marker. The selection marker can be contained in a selection cassette. In some embodiments, the selection cassette can be a self-deleting cassette. As an example, a self-deleting cassette can comprise a Cre gene (comprising two exons encoding a Cre recombinase, separated by an intron) operably linked to a mouse Prm1 promoter, and a neomycin resistance gene operably linked to a human ubiquitin promoter. Exemplary selection markers include neomycin phosphotransferase (neo^r), hygromycin B phosphotransferase (hyg^r), puromycin-N-acetyltransferase (puro^r), blasticidin S deaminase (bsr^r), xanthine/guanine phosphoribosyl transferase (gpt), herpes simplex virus thymidine kinase (HSV-k), and combinations thereof. The polynucleotide encoding the selection marker can be operably linked to a promoter that is active in the cell being targeted. Examples of promoters are described elsewhere herein.

The nucleic acid insert can also comprise a reporter gene. Exemplary reporter genes include those encoding luciferase, β-galactosidase, green fluorescent protein (GFP), enhanced green fluorescent protein (eGFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (eYFP), blue fluorescent protein (BFP), enhanced blue fluorescent protein (eBFP), DsRed, ZsGreen, MmGFP, mPlum, mCherry, tdTomato, mStrawberry, J-Red, mOrange, mKO, mCitrine, Venus, YPet, Emerald, CyPet, Cerulean, T-Sapphire, and alkaline phosphatase. Such reporter genes can be operably linked to a promoter that is active in the cell being targeted. Examples of promoters are described elsewhere herein.

The nucleic acid insert can also comprise one or more expression cassettes or deletion cassettes. A given cassette can comprise one or more of a nucleotide sequence of interest, a polynucleotide encoding a selection marker, and a reporter gene, along with various regulatory components that influence expression. Examples of selectable markers and reporter genes that can be included are discussed in more detail elsewhere herein.

The nucleic acid insert can comprise a nucleic acid flanked by site-specific recombination recognition sequences. Alternatively, the nucleic acid insert can comprise one or more site-specific recombination recognition sequences. Although the entire nucleic acid insert can be flanked by such site-specific recombination recognition sequences, any region or individual polynucleotide of interest within the nucleic acid insert can also be flanked by such sites. Site-specific recombination recognition sequences that can flank the nucleic acid insert, or any polynucleotide of interest in the nucleic acid insert, include, for example, loxP, lox511, lox2272, lox66, lox71, loxM2, lox5171, FRT, FRT11, FRT71, attP, att, rox, and combinations thereof. In some embodiments, the site-specific recombination sites flank a polynucleotide encoding a selection marker and/or a reporter gene contained within the nucleic acid insert. Following integration of the nucleic acid insert into the endogenous B4GALT1 gene, the sequence between the site-specific recombination sites can be removed. In some embodiments, two exogenous donor sequences can be used, each having a nucleic acid insert comprising a site-specific recombination site. The exogenous donor sequences can be targeted to the 5' and 3' regions flanking a nucleic acid of interest. Following integration of the two nucleic acid inserts into the target genomic locus, the nucleic acid of interest between the two inserted site-specific recombination sites can be removed.
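To make the removal step concrete, the following Python sketch models what a recombinase does to two directly repeated sites at the string level: the intervening sequence (for example, a floxed selection cassette) is removed and a single site remains as a footprint. It assumes both sites are on the same strand in the same orientation; it does not model inverted sites (which give inversion rather than excision), heterospecific sites such as lox2272 or lox511, or recombination efficiency. The function name `excise_between_sites` and the placeholder cassette are hypothetical.

```python
# Canonical 34-bp loxP site: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" "GCATACAT" "TATACGAAGTTAT"


def excise_between_sites(seq: str, site: str = LOXP) -> str:
    """Model excision between two directly repeated recombination sites.

    Everything from the start of the first site up to the start of the second
    site is removed, so exactly one copy of the site is left behind.
    """
    first = seq.find(site)
    if first == -1:
        raise ValueError("no recombination site found")
    second = seq.find(site, first + len(site))
    if second == -1:
        raise ValueError("only one recombination site found")
    return seq[:first] + seq[second:]


# Usage: a hypothetical insert carrying a floxed cassette between flanking sequence.
cassette = "GCGCGCGCGC"  # placeholder standing in for a selection-marker cassette
allele = "AAAA" + LOXP + cassette + LOXP + "TTTT"
print(excise_between_sites(allele) == "AAAA" + LOXP + "TTTT")
```

Returning the remaining single site, rather than deleting both, mirrors the footprint that site-specific recombination leaves in the edited locus.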

The nucleic acid insert can also comprise one or more restriction sites for restriction endonucleases (i.e., restriction enzymes), which include Type I, Type II, Type III, and Type IV endonucleases. Type I and Type III restriction endonucleases recognize specific recognition sequences but typically cleave at a variable position from the nuclease binding site, which can be hundreds of base pairs away from the cleavage site (recognition sequence). In Type II systems, the restriction activity is independent of any methylase activity, and cleavage typically occurs at specific sites within or near the binding site. Most Type II enzymes cut palindromic sequences; however, Type IIa enzymes recognize non-palindromic recognition sequences and cleave outside the recognition sequence, Type IIb enzymes cut the sequence twice, with both sites outside the recognition sequence, and Type IIs enzymes recognize an asymmetric recognition sequence and cleave on one side at a defined distance of about 1 to 20 nucleotides from the recognition sequence. Type IV restriction enzymes target methylated DNA.
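The palindromic versus asymmetric distinction drawn above can be checked directly: a recognition site is palindromic when it equals its own reverse complement. The Python sketch below is illustrative only; it assumes unambiguous A/C/G/T input (no IUPAC degenerate bases) and uses EcoRI (GAATTC, a classic palindromic Type II site) and BsaI (GGTCTC, an asymmetric Type IIs site) as examples. The helper names are hypothetical.

```python
# Watson-Crick complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


for name, site in [("EcoRI", "GAATTC"), ("BsaI", "GGTCTC")]:
    print(name, site, is_palindromic(site))
```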

In some embodiments, the exogenous donor sequence has short single-stranded regions at the 5' end and/or the 3' end that are complementary to one or more overhangs created by nuclease-mediated or Cas-protein-mediated cleavage at the target genomic locus (e.g., in the B4GALT1 gene). These overhangs can also be referred to as 5' and 3' homology arms. For example, some exogenous donor sequences have short single-stranded regions at the 5' end and/or the 3' end that are complementary to one or more overhangs created by Cas-protein-mediated cleavage at 5' and/or 3' target sequences at the target genomic locus. In some embodiments, such an exogenous donor sequence has a region of complementarity only at the 5' end or only at the 3' end. For example, some such exogenous donor sequences have a region of complementarity only at the 5' end, complementary to an overhang created at a 5' target sequence at the target genomic locus, or only at the 3' end, complementary to an overhang created at a 3' target sequence at the target genomic locus. Other such exogenous donor sequences have regions of complementarity at both the 5' and 3' ends, for example, regions complementary to first and second overhangs, respectively, created by Cas-mediated cleavage at the target genomic locus. For example, if the exogenous donor sequence is double-stranded, the single-stranded regions of complementarity can extend from the 5' end of the top strand and from the 5' end of the bottom strand of the donor sequence, creating a 5' overhang at each end. Alternatively, the single-stranded regions of complementarity can extend from the 3' end of the top strand and from the 3' end of the bottom strand of the donor sequence, creating 3' overhangs.
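Whether a donor's single-stranded end matches an overhang at the cut locus can be checked at the sequence level. In the Python sketch below, both overhangs are assumed to be written 5'->3'; two ends can base-pair only if they are of the same type (both 5' or both 3') and one is the reverse complement of the other. The sketch ignores mismatch tolerance and ligation efficiency, and `overhangs_anneal` is a hypothetical helper, not part of any library.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_anneal(donor: str, donor_type: str, genomic: str, genomic_type: str) -> bool:
    """Check whether a donor single-stranded end can pair with a genomic overhang.

    Both overhangs are written 5'->3'. They can anneal only if they are the same
    overhang type and the donor end is the reverse complement of the genomic end.
    """
    if donor_type not in ("5'", "3'") or genomic_type not in ("5'", "3'"):
        raise ValueError("overhang type must be \"5'\" or \"3'\"")
    return donor_type == genomic_type and donor.upper() == reverse_complement(genomic)


# Usage: a hypothetical 5-nt genomic 5' overhang and a matching donor end.
print(overhangs_anneal("TGCAA", "5'", "TTGCA", "5'"))
```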

The region of complementarity can be of any length sufficient to promote ligation between the exogenous donor sequence and the endogenous B4GALT1 gene. Exemplary regions of complementarity are about 1 to about 5 nucleotides, about 1 to about 25 nucleotides, or about 5 to about 150 nucleotides in length. For example, a region of complementarity can be at least about 1, at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 16, at least about 17, at least about 18, at least about 19, at least about 20, at least about 21, at least about 22, at least about 23, at least about 24, or at least about 25 nucleotides in length. Alternatively, the region of complementarity can be about 5 to about 10, about 10 to about 20, about 20 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 110, about 110 to about 120, about 120 to about 130, about 130 to about 140, or about 140 to about 150 nucleotides in length, or longer.

Such regions of complementarity can be complementary to overhangs created by two pairs of nickases. Two double-strand breaks with staggered ends can be created by using a first nickase and a second nickase that cleave opposite strands of DNA to create a first double-strand break, and a third nickase and a fourth nickase that cleave opposite strands of DNA to create a second double-strand break. For example, a Cas protein can be used to nick first, second, third, and fourth guide RNA recognition sequences corresponding to first, second, third, and fourth guide RNAs. The first and second guide RNA recognition sequences can be positioned to create a first cleavage site such that the nicks created by the first and second nickases on the first and second strands of DNA create a double-strand break (i.e., the first cleavage site comprises the nicks within the first and second guide RNA recognition sequences). Likewise, the third and fourth guide RNA recognition sequences can be positioned to create a second cleavage site such that the nicks created by the third and fourth nickases on the first and second strands of DNA create a double-strand break (i.e., the second cleavage site comprises the nicks within the third and fourth guide RNA recognition sequences). In some embodiments, the nicks within the first and second guide RNA recognition sequences and/or within the third and fourth guide RNA recognition sequences can be offset nicks that create overhangs. The offset window can be, for example, at least about 5 bp, at least about 10 bp, at least about 20 bp, at least about 30 bp, at least about 40 bp, at least about 50 bp, at least about 60 bp, at least about 70 bp, at least about 80 bp, at least about 90 bp, at least about 100 bp, or more. In such embodiments, the double-stranded exogenous donor sequence can be designed with single-stranded regions of complementarity that are complementary to the overhangs created by the nicks within the first and second guide RNA recognition sequences and by the nicks within the third and fourth guide RNA recognition sequences. Such an exogenous donor sequence can then be inserted by non-homologous end joining-mediated ligation.
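How an offset between two nicks on opposite strands determines the resulting overhang can be sketched as simple coordinate arithmetic. The Python sketch below assumes both nick positions are given in top-strand coordinates (top strand read 5'->3', left to right), with a nick at position p lying between bases p - 1 and p. When the bottom-strand nick lies to the right of the top-strand nick, each resulting end carries a 5' overhang whose length equals the offset; when it lies to the left, each end carries a 3' overhang. How guide RNA placement sets the nick positions is not modeled, and the function name is hypothetical.

```python
def overhang_from_offset_nicks(top_nick: int, bottom_nick: int) -> tuple[str, int]:
    """Return the overhang type and length left by two offset nicks.

    Positions are in top-strand coordinates; a nick at p lies between bases
    p - 1 and p on its strand.
    """
    offset = bottom_nick - top_nick
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        # Bottom strand extends past the top-strand nick: its protruding end is 5'.
        return ("5'", offset)
    # Top strand extends past the bottom-strand nick: its protruding end is 3'.
    return ("3'", -offset)


# Usage: hypothetical nicks 40 bp apart, inside the offset windows listed above.
print(overhang_from_offset_nicks(100, 140))
```

A donor designed for such a break would then carry single-stranded ends of the same type and length, complementary to these overhangs.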

In some embodiments, the exogenous donor sequence (i.e., the targeting vector) comprises homology arms. If the exogenous donor sequence also comprises a nucleic acid insert, the homology arms can flank the nucleic acid insert. For ease of reference, the homology arms are referred to herein as the 5' and 3' (i.e., upstream and downstream) homology arms. These terms refer to the position of the homology arms relative to the nucleic acid insert within the exogenous donor sequence.

A homology arm and a target sequence correspond to each other when the two regions share a sufficient level of sequence identity to act as substrates for a homologous recombination reaction. The sequence identity between a given target sequence and the corresponding homology arm in the exogenous donor sequence can be any degree of sequence identity that allows homologous recombination to occur. For example, the amount of sequence identity shared by a homology arm of the exogenous donor sequence (or a fragment thereof) and the target sequence (or a fragment thereof) can be at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity, such that the sequences undergo homologous recombination. In addition, the corresponding region of homology between a homology arm and its target sequence can be of any length sufficient to promote homologous recombination. Exemplary homology arms are about 25 nucleotides to about 2.5 kb, about 25 nucleotides to about 1.5 kb, or about 25 to about 500 nucleotides in length. For example, a given homology arm (or each homology arm) and/or the corresponding target sequence can comprise corresponding regions of homology that are about 25 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 150, about 150 to about 200, about 200 to about 250, about 250 to about 300, about 300 to about 350, about 350 to about 400, about 400 to about 450, or about 450 to about 500 nucleotides in length, such that the homology arm has sufficient homology to undergo homologous recombination with the corresponding target sequence in the endogenous B4GALT1 gene. Alternatively, a given homology arm (or each homology arm) and/or the corresponding target sequence can comprise corresponding regions of homology that are about 0.5 kb to about 1 kb, about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, or about 2 kb to about 2.5 kb in length. For example, the homology arms can each be about 750 nucleotides in length. The homology arms can be symmetrical (each about the same length) or asymmetrical (one longer than the other).
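The identity and length criteria above can be expressed as a simple check. The Python sketch below assumes the homology arm and its target have already been aligned without gaps and are the same length; practical designs would use a proper alignment tool. The default thresholds are the broadest figures from the text (at least 50% identity; about 25 nucleotides to about 2.5 kb), and the function names are hypothetical.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two pre-aligned sequences of equal length."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def arm_within_ranges(arm: str, target: str, min_identity: float = 50.0,
                      min_len: int = 25, max_len: int = 2500) -> bool:
    """Check an arm against identity and length ranges (defaults from the text).

    A stricter min_identity (e.g., 90.0 or 99.0) can be passed for a given design.
    """
    return (min_len <= len(arm) <= max_len
            and percent_identity(arm, target) >= min_identity)


# Usage: 9 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```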

A homology arm can correspond to a locus that is native to a cell (e.g., the targeted locus). Alternatively, it can correspond to a region of a heterologous or exogenous segment of DNA that was integrated into the genome of the cell, including, for example, transgenes, expression cassettes, or heterologous or exogenous regions of DNA. In some embodiments, a homology arm of the targeting vector can correspond to a region of a yeast artificial chromosome (YAC), a bacterial artificial chromosome (BAC), a human artificial chromosome, or any other engineered region contained in an appropriate host cell. In some embodiments, a homology arm of the targeting vector can correspond to, or be derived from, a region of a BAC library, a cosmid library, or a P1 phage library, or can be derived from synthetic DNA.

When a nuclease agent is used in combination with an exogenous donor sequence, the 5' and 3' target sequences are generally located in sufficient proximity to the nuclease cleavage site to promote a homologous recombination event between the target sequences and the homology arms upon a single-strand break (nick) or double-strand break at the nuclease cleavage site. The nuclease cleavage site comprises a DNA sequence at which a nick or double-strand break is created by a nuclease agent (e.g., a Cas9 protein complexed with a guide RNA). Target sequences in the endogenous B4GALT1 gene that correspond to the 5' and 3' homology arms of the exogenous donor sequence are "located in sufficient proximity" to a nuclease cleavage site if the distance is such as to promote a homologous recombination event between the 5' and 3' target sequences and the homology arms upon a single-strand break or double-strand break at the nuclease cleavage site. Thus, the target sequences corresponding to the 5' and/or 3' homology arms of the exogenous donor sequence can be, for example, within at least 1 nucleotide of a given nuclease cleavage site, or within at least 10 nucleotides to about 1,000 nucleotides of a given nuclease cleavage site. In some embodiments, the nuclease cleavage site can be immediately adjacent to at least one or both of the target sequences.

The spatial relationship between the nuclease cleavage site and the target sequences corresponding to the homology arms of the exogenous donor sequence can vary. In some embodiments, a target sequence can be located 5' to the nuclease cleavage site, a target sequence can be located 3' to the nuclease cleavage site, or the target sequences can flank the nuclease cleavage site.
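The 5', 3', and flanking arrangements, together with the proximity ranges given above, can be sketched as a coordinate check. The Python sketch below assumes single-strand coordinates with half-open target intervals [start, end) and a cut lying between bases cut - 1 and cut; it uses the about 1,000-nucleotide figure from the text as a default cutoff and does not model how distance actually affects recombination frequency. All names are hypothetical.

```python
def target_position(cut: int, start: int, end: int) -> tuple[str, int]:
    """Classify a target sequence [start, end) relative to a cleavage site.

    Returns the location ("5'", "3'", or "overlapping") and the gap in nucleotides.
    """
    if end <= cut:
        return ("5'", cut - end)
    if start >= cut:
        return ("3'", start - cut)
    return ("overlapping", 0)


def targets_flank_cut(cut: int, five_prime: tuple[int, int],
                      three_prime: tuple[int, int], max_gap: int = 1000) -> bool:
    """True if the targets lie on opposite sides of the cut, each within max_gap nt."""
    loc5, gap5 = target_position(cut, *five_prime)
    loc3, gap3 = target_position(cut, *three_prime)
    return loc5 == "5'" and loc3 == "3'" and gap5 <= max_gap and gap3 <= max_gap


# Usage: hypothetical targets ending 50 nt upstream of, and starting right at, a cut.
print(targets_flank_cut(1000, (200, 950), (1000, 1800)))
```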

The present disclosure also provides methods of treating, and methods of treating or preventing, a cardiovascular condition in a subject having or at risk of having the disease, using the methods disclosed herein for modifying or altering expression of the endogenous B4GALT1 gene. The present disclosure also provides methods of treating, and methods of treating or preventing, a cardiovascular condition in a subject having or at risk of having the disease by reducing expression of endogenous B4GALT1 mRNA, or by providing the subject with a recombinant nucleic acid encoding a B4GALT1 polypeptide, an mRNA encoding a B4GALT1 polypeptide, or a B4GALT1 polypeptide. The methods can comprise introducing one or more nucleic acid molecules or proteins into the subject, into an organ of the subject, or into a cell of the subject (e.g., in vivo or ex vivo).

In some embodiments, the present disclosure provides an mRNA encoding a B4GALT1 polypeptide (e.g., a polynucleotide as discussed herein, such as an mRNA comprising the sequence of SEQ ID NO: 4) for use in therapy. In some such embodiments, the therapy is the treatment or prevention of a cardiovascular condition.

In some embodiments, the present disclosure provides a B4GALT1 polypeptide (e.g., a polypeptide as discussed herein, such as a polypeptide comprising the sequence of SEQ ID NO: 8) for use in therapy. In some such embodiments, the therapy is the treatment or prevention of a cardiovascular condition.

Subjects include human and other mammalian subjects (e.g., feline, canine, rodent, mouse, or rat) or non-mammalian subjects (e.g., poultry) that receive prophylactic or therapeutic treatment. Such a subject can be, for example, a subject (e.g., a human) who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition.

Non-limiting examples of cardiovascular conditions include elevated levels of one or more serum lipids. Serum lipids include one or more of cholesterol, LDL, HDL, triglycerides, HDL-cholesterol, and non-HDL cholesterol, or subfractions thereof (e.g., HDL2, HDL2a, HDL2b, HDL2c, HDL3, HDL3a, HDL3b, HDL3c, HDL3d, LDL1, LDL2, LDL3, lipoprotein A, Lpa1, Lpa2, Lpa3, Lpa4, or Lpa5). The cardiovascular condition can comprise elevated levels of coronary artery calcification. The cardiovascular condition can comprise congenital disorder of glycosylation type IId (CDG-IId). The cardiovascular condition can comprise elevated levels of pericardial fat. The cardiovascular condition can comprise an atherothrombotic condition. The atherothrombotic condition can comprise elevated levels of fibrinogen. The atherothrombotic condition can comprise fibrinogen-mediated blood clots. The cardiovascular condition can comprise elevated levels of fibrinogen. The cardiovascular condition can comprise fibrinogen-mediated blood clots. The cardiovascular condition can comprise blood clots formed from the involvement of fibrinogen activity. Fibrinogen-mediated blood clots, or blood clots formed from the involvement of fibrinogen activity, can be present in any vein or artery in the body.

Such methods can comprise genome editing or gene therapy. For example, an endogenous B4GALT1 gene that is not variant B4GALT1 can be modified to comprise the variation associated with variant B4GALT1 (i.e., replacement of asparagine with serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide). As another example, an endogenous B4GALT1 gene that is not variant B4GALT1 can be knocked out or inactivated. Likewise, an endogenous B4GALT1 gene that is not variant B4GALT1 can be knocked out or inactivated, and a B4GALT1 gene comprising the modification associated with variant B4GALT1 (e.g., a complete variant B4GALT1 gene or a minigene comprising the modification) can be introduced and expressed. Similarly, an endogenous B4GALT1 gene that is not variant B4GALT1 can be knocked out or inactivated, and a recombinant DNA encoding the variant B4GALT1 polypeptide can be introduced and expressed, an mRNA encoding the variant B4GALT1 polypeptide can be introduced and expressed (e.g., intracellular protein replacement therapy), and/or the variant B4GALT1 polypeptide can be introduced (e.g., protein replacement therapy).
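At the codon level, the asparagine-to-serine change can be illustrated with the standard genetic code: asparagine is encoded by AAT or AAC, and the only single-nucleotide route to a serine codon is an A-to-G change at the second codon position (AAT to AGT, AAC to AGC). The Python sketch below applies that change to a toy open reading frame; it does not use the actual B4GALT1 sequence, assumes codon 1 is the start codon (so residue numbering follows the full-length polypeptide), and `introduce_asn_to_ser` is a hypothetical helper.

```python
ASN_CODONS = {"AAT", "AAC"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def introduce_asn_to_ser(cds: str, residue: int) -> str:
    """Change codon `residue` (1-based, codon 1 = start codon) from Asn to Ser.

    Applies the single A->G substitution at the second codon position and
    leaves the rest of the sequence unchanged.
    """
    start = 3 * (residue - 1)
    codon = cds[start:start + 3].upper()
    if codon not in ASN_CODONS:
        raise ValueError(f"codon {residue} is {codon!r}, not an Asn codon")
    new_codon = codon[0] + "G" + codon[2]
    assert new_codon in SER_CODONS  # sanity check against the genetic code
    return cds[:start] + new_codon + cds[start + 3:]


# Usage with a toy ORF whose codon 352 is AAC (not the real B4GALT1 sequence).
toy = "ATG" + "GCT" * 350 + "AAC" + "TAA"
edited = introduce_asn_to_ser(toy, 352)
print(edited[1053:1056])
```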

In some embodiments, the methods comprise introducing and expressing a recombinant B4GALT1 gene comprising the modification associated with the B4GALT1 rs551564683 variant (e.g., a complete variant B4GALT1 gene or a minigene comprising the modification), introducing and expressing a recombinant nucleic acid (e.g., DNA) encoding a variant B4GALT1 polypeptide or fragment thereof, introducing and expressing one or more mRNAs encoding a variant B4GALT1 polypeptide or fragment thereof (e.g., intracellular protein replacement therapy), or introducing a variant B4GALT1 polypeptide or fragment thereof (e.g., protein replacement therapy), without knocking out or inactivating the endogenous B4GALT1 gene that is not variant B4GALT1. In some embodiments, such methods can also be combined with methods in which endogenous B4GALT1 mRNA that is not variant B4GALT1 mRNA is targeted for reduced expression, for example through the use of antisense RNA, siRNA, or shRNA.

DNA encoding the B4GALT1 gene or minigene, or a variant B4GALT1 polypeptide or fragment thereof, can be introduced and expressed in the form of an expression vector that does not modify the genome; it can be introduced in the form of a targeting vector so that it is genomically integrated into the endogenous B4GALT1 locus; or it can be introduced so that it is genomically integrated into a locus other than the endogenous B4GALT1 locus, such as a safe harbor locus. A genomically integrated B4GALT1 gene can be operably linked to a B4GALT1 promoter or to another promoter, such as an endogenous promoter at the integration site. A safe harbor locus is a chromosomal site at which a transgene can be stably and reliably expressed in all tissues of interest without adversely affecting gene structure or expression. A safe harbor locus can have, for example, one or more or all of the following features: (1) a distance of more than about 50 kb from the 5' end of any gene; (2) a distance of more than about 300 kb from any cancer-related gene; (3) a distance of more than about 300 kb from any microRNA; (4) location outside any gene transcription unit; and (5) location outside ultra-conserved regions. Examples of suitable safe harbor loci include, but are not limited to, adeno-associated virus site 1 (AAVS1), the chemokine (C-C motif) receptor 5 (CCR5) gene locus, and the human ortholog of the mouse ROSA26 locus.
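The safe harbor criteria above amount to a set of distance filters. The following is a hedged Python sketch that checks a candidate integration site against all five criteria using the approximate thresholds stated in the text. Its inputs are hypothetical annotation intervals (gene 5' ends, transcription units, cancer-related genes, microRNAs, and ultra-conserved regions); no particular database or file format is implied. The text also permits loci that meet only some of the criteria, whereas this sketch requires all of them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    """Genomic interval, 0-based half-open [start, end)."""
    chrom: str
    start: int
    end: int


def dist(chrom: str, pos: int, iv: Interval) -> float:
    """Distance in bp from `pos` to the nearest base of `iv` (0 if inside)."""
    if chrom != iv.chrom:
        return float("inf")
    if iv.start <= pos < iv.end:
        return 0
    return iv.start - pos if pos < iv.start else pos - (iv.end - 1)


def min_dist(chrom: str, pos: int, ivs: Iterable[Interval]) -> float:
    return min((dist(chrom, pos, iv) for iv in ivs), default=float("inf"))


def is_safe_harbor(chrom, pos, gene_5p_ends, transcription_units,
                   cancer_genes, mirnas, ultraconserved) -> bool:
    """True only if the site passes every safe harbor filter."""
    return (
        min_dist(chrom, pos, gene_5p_ends) > 50_000           # far from any gene 5' end
        and min_dist(chrom, pos, cancer_genes) > 300_000      # far from cancer-related genes
        and min_dist(chrom, pos, mirnas) > 300_000            # far from microRNAs
        and min_dist(chrom, pos, transcription_units) > 0     # outside transcription units
        and min_dist(chrom, pos, ultraconserved) > 0          # outside ultra-conserved regions
    )


# Hypothetical annotation: one plus-strand gene on chr1.
gene = Interval("chr1", 1_000_000, 1_010_000)
tss = Interval("chr1", 1_000_000, 1_000_001)  # its 5' end
print(is_safe_harbor("chr1", 2_000_000, [tss], [gene], [], [], []))  # True
print(is_safe_harbor("chr1", 1_040_000, [tss], [gene], [], [], []))  # False: 40 kb from a 5' end
```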

In some embodiments, the methods include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition. The method comprises introducing into the subject, or into a cell of the subject: a) a nuclease agent (or a nucleic acid encoding it) that binds to a nuclease recognition sequence within the endogenous B4GALT1 gene, wherein the nuclease recognition sequence comprises, or is adjacent to, positions 53575 to 53577 of SEQ ID NO: 1; and b) an exogenous donor sequence comprising a 5' homology arm that hybridizes to a target sequence 5' of positions 53575 to 53577 of SEQ ID NO: 1, a 3' homology arm that hybridizes to a target sequence 3' of positions 53575 to 53577 of SEQ ID NO: 1, and a nucleic acid insert comprising a nucleic acid sequence encoding serine flanked by the 5' homology arm and the 3' homology arm. The nuclease agent can cleave the endogenous B4GALT1 gene in a cell of the subject, and the exogenous donor sequence can recombine with the endogenous B4GALT1 gene in that cell; upon recombination, the nucleic acid sequence encoding serine is inserted at the nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO: 1. Examples of nuclease agents that can be used in these methods (e.g., a Cas9 protein and a guide RNA) are disclosed elsewhere herein.
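As a rough illustration of how the exogenous donor sequence is laid out, the sketch below assembles 5' homology arm + serine codon + 3' homology arm from a reference sequence. The arm length, the choice of AGC as the serine codon, and the function name build_hdr_donor are assumptions; the text does not specify them. Practical donors are often also designed with silent mutations that prevent re-cleavage by the nuclease, which this sketch does not model.

```python
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def build_hdr_donor(ref: str, codon_start: int, new_codon: str, arm_len: int) -> str:
    """Return 5' homology arm + new codon + 3' homology arm.

    ref:         reference sequence as a plain string (e.g. SEQ ID NO: 1)
    codon_start: 1-based position of the first base of the target codon
                 (53575 in the text)
    new_codon:   serine codon written into the locus (choice is an assumption)
    arm_len:     length of each homology arm (hypothetical parameter)
    """
    codon = new_codon.upper()
    if codon not in SERINE_CODONS:
        raise ValueError(f"{new_codon!r} is not a serine codon")
    i = codon_start - 1  # 0-based index of the codon's first base
    if i - arm_len < 0 or i + 3 + arm_len > len(ref):
        raise ValueError("homology arms would run past the reference sequence")
    left_arm = ref[i - arm_len:i]           # sequence 5' of the codon
    right_arm = ref[i + 3:i + 3 + arm_len]  # sequence 3' of the codon
    return left_arm + codon + right_arm


# Toy reference: the Asn codon AAC starts at position 5.
print(build_hdr_donor("GGGGAACTTTT", 5, "AGC", 4))  # GGGGAGCTTTT
```

For a real locus, ref would be the full reference sequence and codon_start would be 53575, with arm lengths chosen by the designer.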

In some embodiments, the methods include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject, or into a cell of the subject, an exogenous donor sequence comprising a 5' homology arm that hybridizes to a target sequence 5' of the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 1, a 3' homology arm that hybridizes to a target sequence 3' of positions 53575 to 53577 of SEQ ID NO: 1, and a nucleic acid insert comprising a nucleotide sequence encoding serine flanked by the 5' and 3' homology arms. The exogenous donor sequence can recombine with the endogenous B4GALT1 gene in the cell; upon recombination, the nucleotide sequence encoding serine is inserted at the nucleotides corresponding to positions 53575 to 53577 of SEQ ID NO: 1.

Some such methods include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject, or into a cell of the subject, a nuclease agent (or a nucleic acid encoding it) that binds to a nuclease recognition sequence within the endogenous B4GALT1 gene, wherein the nuclease recognition sequence comprises the start codon of the endogenous B4GALT1 gene, lies within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon, or is selected from SEQ ID NOs: 9 to 12. The nuclease agent cleaves the endogenous B4GALT1 gene in cells of the subject and disrupts its expression.
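The specific recognition sequences are given as SEQ ID NOs: 9 to 12, which are not reproduced in this section. Assuming the nuclease agent is Streptococcus pyogenes Cas9 (20-nt spacer, NGG PAM, blunt cut about 3 bp 5' of the PAM), a minimal Python sketch can list candidate target sites on either strand whose predicted cut falls within a chosen window of the start codon, mirroring the about-10 to about-1,000-nucleotide windows above. The function names, and the use of the cut position rather than the whole recognition sequence as the distance reference, are illustrative choices.

```python
import re
from typing import List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def spcas9_sites_near(seq: str, site: int, window: int,
                      spacer_len: int = 20) -> List[Tuple[str, str, int]]:
    """Return (strand, spacer, cut) for NGG-PAM sites whose predicted cut is
    within `window` nt of `site`. `cut` is a 0-based boundary on the + strand
    (a cut at k falls between seq[k-1] and seq[k])."""
    seq = seq.upper()
    n = len(seq)
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            cut = m.start() + spacer_len - 3  # ~3 bp 5' of the PAM
            if strand == "-":
                cut = n - cut  # map the boundary back to + strand coordinates
            if abs(cut - site) <= window:
                hits.append((strand, m.group(1), cut))
    return hits


toy = "TTTT" + "ACTACTACTACTACTACTAC" + "AGG" + "TTTT"
print(spcas9_sites_near(toy, site=21, window=0))
```

In practice `site` would be the coordinate of the B4GALT1 start codon in the genomic sequence, and candidate spacers would additionally be screened for off-target matches.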

In some embodiments, the methods include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject, or into a cell of the subject: a) a nuclease agent (or a nucleic acid encoding it) that binds to a nuclease recognition sequence within the endogenous B4GALT1 gene, wherein the nuclease recognition sequence comprises the start codon of the endogenous B4GALT1 gene, lies within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the start codon, or is selected from SEQ ID NOs: 9 to 12; and b) an expression vector comprising a recombinant B4GALT1 gene comprising, at positions 53575 to 53577, a nucleotide sequence encoding serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide. The expression vector can be one that does not integrate into the genome. Alternatively, a targeting vector (i.e., an exogenous donor sequence) comprising such a recombinant B4GALT1 gene can be introduced. The nuclease agent can cleave the endogenous B4GALT1 gene in cells of the subject and disrupt its expression, while the expression vector expresses the recombinant B4GALT1 gene in those cells. Alternatively, a genomically integrated recombinant B4GALT1 gene can be expressed in cells of the subject. Examples of nuclease agents that can be used in these methods (e.g., a nuclease-active Cas9 protein and a guide RNA), and of suitable guide RNAs and guide RNA recognition sequences, are disclosed elsewhere herein. Step b) can alternatively comprise introducing an expression vector or targeting vector comprising a nucleic acid (e.g., DNA) encoding a B4GALT1 polypeptide that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof, and/or a sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Likewise, step b) can comprise introducing an mRNA that encodes a B4GALT1 Asn352Ser polypeptide at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof, and/or whose complementary DNA (or a portion thereof) is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Step b) can also comprise introducing a protein comprising an amino acid sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof.
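The identity thresholds recited in step b) (at least 90% through 100%) presuppose a way of computing percent identity, which this section does not define. Purely as an illustration, the sketch below computes identity over an existing pairwise alignment, counting gap positions as mismatches and dividing by the alignment length. Other conventions, such as dividing by the length of the shorter sequence, can give different values. The function names and toy peptides are hypothetical.

```python
THRESHOLDS = (90, 95, 96, 97, 98, 99, 100)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences already aligned to the same length;
    '-' marks a gap and never counts as a match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def thresholds_met(a: str, b: str) -> list:
    """Which of the recited identity thresholds the aligned pair satisfies."""
    pid = percent_identity(a, b)
    return [t for t in THRESHOLDS if pid >= t]


# Toy 10-residue peptides differing only at the last position (N vs S).
print(percent_identity("ACDEFGHIKN", "ACDEFGHIKS"))  # 90.0
print(thresholds_met("ACDEFGHIKN", "ACDEFGHIKS"))    # [90]
```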

In some embodiments, a second nuclease agent is also introduced into the subject or into a cell of the subject, wherein the second nuclease agent binds to a second nuclease recognition sequence within the endogenous B4GALT1 gene, and the second nuclease recognition sequence comprises the stop codon of the endogenous B4GALT1 gene, lies within about 10, about 20, about 30, about 40, about 50, about 100, about 200, about 300, about 400, about 500, or about 1,000 nucleotides of the stop codon, or is selected from SEQ ID NOs: 9 to 12. The nuclease agents cleave the endogenous B4GALT1 gene in the cell within both the first and the second nuclease recognition sequences, and the cell is thereby modified to comprise a deletion between the first and second nuclease recognition sequences. In some embodiments, the second nuclease agent can be a Cas9 protein and a guide RNA. Suitable guide RNAs, and guide RNA recognition sequences adjacent to the stop codon, are disclosed elsewhere herein.
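The two-cut strategy above can be pictured with a toy model. The sketch assumes that both cuts are blunt, that the intervening fragment is lost, and that the remaining ends are joined precisely. In real cells, end joining frequently leaves small insertions or deletions at the junction, which this model ignores. Cut positions are 0-based boundaries, and the function name is hypothetical.

```python
def delete_between_cuts(seq: str, cut1: int, cut2: int) -> str:
    """Toy model of a two-cut deletion: drop seq[lo:hi] and join the ends.

    cut1, cut2 are 0-based boundaries; a cut at k falls between seq[k-1]
    and seq[k]. Their order does not matter. Junction indels are not modeled.
    """
    lo, hi = sorted((cut1, cut2))
    if lo < 0 or hi > len(seq):
        raise ValueError("cut position outside the sequence")
    return seq[:lo] + seq[hi:]


toy_gene = "AAAA" + "TTTT" + "CCCC"  # region between the two cuts is TTTT
print(delete_between_cuts(toy_gene, 4, 8))  # AAAACCCC
print(delete_between_cuts(toy_gene, 8, 4))  # AAAACCCC
```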

In some embodiments, the methods can also include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject, or into a cell of the subject, an antisense RNA, siRNA, or shRNA that hybridizes to a sequence within a region of the endogenous B4GALT1 mRNA. For example, the antisense RNA, siRNA, or shRNA can hybridize to a sequence within exon 5 of SEQ ID NO: 3 (B4GALT1 mRNA) and reduce expression of B4GALT1 mRNA in cells of the subject. In some embodiments, such methods can further comprise introducing into the subject an expression vector comprising a recombinant B4GALT1 gene comprising a nucleotide sequence encoding serine inserted at positions 53575 to 53577 of SEQ ID NO: 2. The expression vector can be one that does not integrate into the genome. Alternatively, a targeting vector (i.e., an exogenous donor sequence) can be introduced that comprises a recombinant B4GALT1 gene comprising a nucleic acid sequence encoding serine at the positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2. Where an expression vector is used, it can express the recombinant B4GALT1 gene in cells of the subject; where the recombinant B4GALT1 gene is genomically integrated, the integrated gene can be expressed in cells of the subject.
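At the sequence level, an antisense RNA, an siRNA guide strand, or an shRNA targeting a region of B4GALT1 mRNA must be complementary and antiparallel to that region. The sketch below derives the fully complementary strand for a target segment supplied by the user. The coordinates of exon 5 within SEQ ID NO: 3 are not given in this section, so the input shown is hypothetical. Real siRNA or shRNA design involves further considerations, such as length, overhangs, and off-target screening, that are not modeled here.

```python
# Watson-Crick partners for an RNA antisense strand; T is accepted so that
# DNA-letter input is handled too.
_PAIR = {"A": "U", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_rna(target: str) -> str:
    """Return, 5'->3', the RNA strand fully complementary (antiparallel) to
    the sense `target` region, given 5'->3' in RNA (U) or DNA (T) letters."""
    return "".join(_PAIR[b] for b in reversed(target.upper()))


def fully_complementary(antisense: str, target: str) -> bool:
    """True if `antisense` pairs with every base of `target`."""
    return antisense_rna(target) == antisense.upper().replace("T", "U")


print(antisense_rna("AUGGC"))  # GCCAU
```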

In some embodiments, such methods can comprise introducing an expression vector or targeting vector comprising a nucleic acid (e.g., DNA) encoding a B4GALT1 polypeptide that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof, and/or a sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Likewise, such methods can alternatively comprise introducing an mRNA that encodes a polypeptide at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof, and/or whose complementary DNA (or a portion thereof) is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Such methods can also alternatively comprise introducing a polypeptide comprising a sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof.

In some embodiments, such methods can include a method of treating a subject who is not a carrier of variant B4GALT1 (or is only a heterozygous carrier of variant B4GALT1) and who has, or is susceptible to developing, a cardiovascular condition, the method comprising introducing into the subject, or into a cell of the subject, an expression vector comprising a recombinant B4GALT1 gene comprising, at positions 53575 to 53577, a nucleotide sequence encoding serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide, wherein the expression vector expresses the recombinant B4GALT1 gene in cells of the subject. The expression vector can be one that does not integrate into the genome. Alternatively, a targeting vector (i.e., an exogenous donor sequence) can be introduced that comprises a recombinant B4GALT1 gene comprising, at positions 53575 to 53577 of SEQ ID NO: 2, a nucleotide sequence encoding serine at the position corresponding to position 352 of the full-length/mature B4GALT1 polypeptide. Where an expression vector is used, it can express the recombinant B4GALT1 gene in cells of the subject; where the recombinant B4GALT1 gene is genomically integrated, the integrated gene can be expressed in cells of the subject.

Such methods can alternatively comprise introducing an expression vector or targeting vector comprising a nucleic acid (e.g., DNA) encoding a B4GALT1 polypeptide that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof, and/or a sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Likewise, such methods can alternatively comprise introducing an mRNA that encodes a polypeptide at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 polypeptide or a fragment thereof, and/or whose complementary DNA (or a portion thereof) is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to variant B4GALT1 mRNA or a fragment thereof. Such methods can also alternatively comprise introducing a protein comprising a sequence that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the variant B4GALT1 Asn352Ser polypeptide or a fragment thereof.

Expression vectors and recombinant B4GALT1 genes suitable for use in any of the above methods are disclosed elsewhere herein. For example, the recombinant B4GALT1 gene can be a complete B4GALT1 variant gene, or a B4GALT1 minigene in which one or more nonessential segments have been deleted relative to the corresponding wild-type B4GALT1 gene. The deleted segments can, for example, comprise one or more intron sequences, and the minigene can comprise exons 1 to 6. An example of a complete B4GALT1 variant gene is one that is at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to SEQ ID NO: 2.
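A minigene of the kind described, in which intron sequences are deleted while exons 1 to 6 are retained, can be modeled as a concatenation of exon segments. The exon coordinates in the example are hypothetical, since the positions of the exons within SEQ ID NO: 2 are not given here. A real construct may also retain selected introns or regulatory elements. The function name is illustrative.

```python
from typing import List, Tuple


def build_minigene(gene_seq: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon segments, dropping the intervening intron sequence.

    exons: 1-based, inclusive (start, end) coordinates within gene_seq,
    given in any order; overlapping or out-of-range exons are rejected.
    """
    parts = []
    prev_end = 0
    for start, end in sorted(exons):
        if start <= prev_end or end < start or end > len(gene_seq):
            raise ValueError(f"invalid or overlapping exon ({start}, {end})")
        parts.append(gene_seq[start - 1:end])
        prev_end = end
    return "".join(parts)


# Toy gene: uppercase exons, lowercase introns (hypothetical coordinates).
toy_gene = "ATGAAA" + "gtaagt" + "CCCGGG" + "gtaagt" + "TTTTGA"
print(build_minigene(toy_gene, [(13, 18), (1, 6), (25, 30)]))  # ATGAAACCCGGGTTTTGA
```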

In some embodiments, such methods include methods of modifying cells in a subject who has, or is susceptible to developing, a cardiovascular condition. In these methods, the nuclease agent and/or exogenous donor sequence and/or recombinant expression vector can be introduced into the cells by administration in an effective regimen, meaning a dosage, route of administration, and frequency of administration that delays the onset of, reduces the severity of, inhibits further deterioration of, and/or ameliorates at least one sign or symptom of the cardiovascular condition being treated. The term "symptom" refers to subjective evidence of disease as perceived by the subject, and "sign" refers to objective evidence of disease as observed by a physician. If the subject already has the disease, the regimen can be referred to as a therapeutically effective regimen. If the subject is at increased risk of the disease relative to the general population but is not yet experiencing symptoms, the regimen can be referred to as a prophylactically effective regimen. In some instances, therapeutic or prophylactic efficacy can be observed in an individual patient relative to historical controls or past experience in the same subject. In other instances, therapeutic or prophylactic efficacy can be demonstrated in a preclinical or clinical trial in a population of treated subjects relative to a control population of untreated subjects.

Delivery can be by any suitable method, as disclosed elsewhere herein. For example, the nuclease agent, exogenous donor sequence, or recombinant expression vector can be delivered by vector delivery, viral delivery, particle-mediated delivery, nanoparticle-mediated delivery, liposome-mediated delivery, exosome-mediated delivery, lipid-mediated delivery, lipid nanoparticle-mediated delivery, cell-penetrating peptide-mediated delivery, or implantable device-mediated delivery. Specific examples include hydrodynamic delivery, virus-mediated delivery, and lipid nanoparticle-mediated delivery.

Administration can be by any suitable route, including but not limited to parenteral, intravenous, oral, subcutaneous, intra-arterial, intracranial, intrathecal, intraperitoneal, topical, intranasal, or intramuscular administration. A route often used for protein replacement therapy, for example, is intravenous infusion. The frequency and number of doses can depend on, among other factors, the half-life of the nuclease agent, exogenous donor sequence, or recombinant expression vector, the condition of the subject, and the route of administration. Pharmaceutical compositions are preferably sterile and substantially isotonic and are manufactured under GMP conditions. Pharmaceutical compositions can be provided in unit dosage form (i.e., the dose for a single administration). Pharmaceutical compositions can be formulated using one or more physiologically and pharmaceutically acceptable carriers, diluents, excipients, or adjuvants. The formulation depends on the route of administration chosen. The term "pharmaceutically acceptable" means that the carrier, diluent, excipient, or adjuvant is compatible with the other ingredients of the formulation and is not substantially harmful to its recipient.

Other such methods include ex vivo methods performed on cells from a subject who has, or is susceptible to developing, a cardiovascular condition. Cells bearing the targeted genetic modification can then be transplanted back into the subject.

The present disclosure provides methods of reducing LDL in a subject in need thereof by reducing expression of endogenous wild-type B4GALT1 or increasing expression of B4GALT1 Asn352Ser, using any of the methods described herein. The present disclosure likewise provides methods of reducing total cholesterol, reducing fibrinogen, reducing eGFR, increasing AST (but not ALT), or increasing creatinine in a subject in need thereof, in each case by reducing expression of endogenous wild-type B4GALT1 or increasing expression of B4GALT1 Asn352Ser using any of the methods described herein.

The present disclosure also provides a method of diagnosing a risk of developing a cardiovascular condition, or of diagnosing that risk and treating a cardiovascular condition in a subject in need thereof, the method comprising: requesting a test that provides the results of an analysis of a sample from the subject for the presence or absence of a variant B4GALT1 gene, mRNA, cDNA, or polypeptide, as described herein; and, if the subject does not have the variant B4GALT1 gene, mRNA, cDNA, or polypeptide, administering to the subject a therapeutic agent as described herein. Any of the tests described herein for determining the presence or absence of a variant B4GALT1 gene, mRNA, cDNA, or polypeptide can be used.
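The diagnose-then-treat workflow above reduces to a simple decision based on the test result. The following is a hedged sketch that assumes the test reports the number of Asn352Ser alleles in a diploid genome. By default it follows this paragraph and treats only when the variant is absent. The optional include_heterozygous flag reflects the treatment methods described earlier, which also cover subjects who are only heterozygous carriers. All names are hypothetical.

```python
from enum import Enum


class Genotype(Enum):
    NON_CARRIER = 0   # no copies of the Asn352Ser allele
    HETEROZYGOUS = 1  # one copy
    HOMOZYGOUS = 2    # two copies


def classify(alt_allele_count: int) -> Genotype:
    """Map a diploid allele count reported by the test to a genotype."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 copies in a diploid genome")
    return Genotype(alt_allele_count)


def should_administer(alt_allele_count: int, include_heterozygous: bool = False) -> bool:
    """Default: administer the therapeutic agent only if the variant is absent.
    include_heterozygous=True also covers heterozygous carriers."""
    g = classify(alt_allele_count)
    if g is Genotype.NON_CARRIER:
        return True
    return include_heterozygous and g is Genotype.HETEROZYGOUS


for count in (0, 1, 2):
    print(count, should_administer(count), should_administer(count, include_heterozygous=True))
```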

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein in the manufacture of a medicament for reducing LDL, reducing total cholesterol, reducing fibrinogen, reducing eGFR, increasing AST (but not ALT), and increasing creatinine in a subject in need thereof. The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules in the manufacture of a medicament for the treatment of coronary artery disease, coronary artery calcification, and related disorders.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein for reducing LDL, reducing total cholesterol, reducing fibrinogen, reducing eGFR, increasing AST (but not ALT), and increasing creatinine in a subject in need thereof.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules for the treatment of coronary artery disease, coronary artery calcification, congenital disorder of glycosylation type IId (CDG-IId), and related disorders.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein for modifying the B4GALT1 gene in a cell of a subject in need thereof.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein for altering expression of the B4GALT1 gene in a cell of a subject in need thereof.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein for diagnosing a risk of developing any of the cardiovascular conditions disclosed herein.

The present disclosure also provides the use of any of the variant B4GALT1 genes, mRNAs, cDNAs, polypeptides, and hybridizing nucleic acid molecules disclosed herein for diagnosing a subject as having any of the cardiovascular conditions disclosed herein.

All patent filings, websites, other publications, accession numbers, and the like cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or, if applicable, the filing date of a priority application referring to the accession number. Likewise, if different versions of a publication, website, or the like are published at different times, the version published closest to the effective filing date of this application is meant, unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the disclosure can be used in combination with any other unless specifically indicated otherwise. Although the invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

The nucleotide and amino acid sequences referred to herein are presented using standard letter abbreviations for nucleotide bases and one-letter codes for amino acids. Nucleotide sequences follow the standard convention of beginning at the 5' end of the sequence and proceeding toward the 3' end (i.e., from left to right on each line). Although only one strand of each nucleotide sequence is shown, the complementary strand is understood to be included by any reference to the displayed strand. Amino acid sequences follow the standard convention of beginning at the amino terminus of the sequence and proceeding toward the carboxy terminus (i.e., from left to right on each line).

U.S. Application No. 62/659,344 (filed April 18, 2018), U.S. Application No. 62/550,161 (filed August 25, 2017), and U.S. Application No. 62/515,140 (filed June 5, 2017) are each incorporated herein by reference in their entirety.

The following examples are provided to describe the embodiments in greater detail. They are intended to illustrate, not to limit, the claimed embodiments.

Examples

Example 1: Identification of a novel locus on chromosome 9p21 associated with serum lipid traits at genome-wide significance

Materials and Methods:

Chip genotyping and QC: Genomic DNA was extracted from whole blood of Old Order Amish (OOA) individuals and quantified using PicoGreen. Genome-wide genotyping was performed using Affymetrix 500K and 6.0 chips at the University of Maryland Biopolymer Core Facility. The BRLMM algorithm was used for genotype calling. Samples with a call rate below 0.93, high levels of Mendelian error, or sex mismatch were excluded. SNPs with a call rate below 0.95, an HWE p-value below 1.0E-6, or a MAF below 0.01 were excluded, as were SNPs on chromosomes X and Y and in the mitochondrial genome.
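The sample and SNP call-rate and MAF thresholds above can be expressed as simple filters. The sketch below is a minimal illustration, assuming genotypes coded as 0/1/2 copies of the alternate allele with None for a missing call; the encoding and function names are hypothetical, and the HWE, Mendelian-error, and sex-mismatch checks are omitted.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: 0/1/2 = copies of the alternate allele, None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotype calls that are not missing."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed from the non-missing diploid genotype calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passing_samples(matrix: List[List[Genotype]], sample_ids: List[str],
                    min_call_rate: float = 0.93) -> List[str]:
    """Keep samples (rows) whose call rate is at least min_call_rate."""
    return [s for s, row in zip(sample_ids, matrix) if call_rate(row) >= min_call_rate]


def passing_snps(matrix: List[List[Genotype]], snp_ids: List[str],
                 min_call_rate: float = 0.95, min_maf: float = 0.01) -> List[str]:
    """Keep SNPs (columns) that meet both the call-rate and the MAF thresholds."""
    kept = []
    for j, snp in enumerate(snp_ids):
        column = [row[j] for row in matrix]
        if call_rate(column) >= min_call_rate and minor_allele_frequency(column) >= min_maf:
            kept.append(snp)
    return kept
```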

WGS and QC: Library preparation and whole-genome sequencing were performed by the Broad Institute of MIT and Harvard. The NHLBI Informatics Resource Core at the University of Michigan performed alignment, base calling, and sequence quality scoring for all TOPMed samples and delivered BCF files for all variants passing all quality filters, with a read depth of at least 10 used in the analysis. Additional QC was applied to these files, including removal of all sites in LCRs or on the X chromosome. Variants with a missing rate above 5%, an HWE p-value below 1.0E-09, or a MAF below 0.1% were also removed. Sample QC removed samples with a missing rate above 5%, high levels of Mendelian error (in some instances), or membership in a monozygotic (MZ) twin pair (one of each pair).

WES and QC: Exome capture and sequencing were performed at the Regeneron Genetics Center (RGC), as described in more detail below. Briefly, captured libraries were sequenced on an Illumina HiSeq 2500 platform with v4 chemistry using paired-end 75 bp reads. Paired-end sequencing of captured bases was performed such that more than 85% of bases were covered at 20× or greater, which is sufficient to call heterozygous variants across most of the targeted bases. Read alignment and variant calling were performed using BWA-MEM and GATK as implemented in the RGC DNAseq analysis pipeline. Samples with a call rate below 0.90, high levels of Mendelian error, membership in a monozygotic (MZ) twin pair (one of each pair), or sex mismatch were excluded. SNPs with a call rate below 0.90 and monomorphic SNPs were also excluded, as were SNPs on chromosomes X and Y and in the mitochondrial genome.

Association analysis: Fasting blood samples were collected and used for lipid analysis. LDL was calculated using the Friedewald formula; in some analyses, LDL levels of subjects taking lipid-lowering medication were adjusted by dividing by 0.7. Genetic association analysis was performed using linear mixed models to account for familial correlation, using a pedigree-based kinship matrix and/or kinship estimated from WES data. Analyses were also adjusted for age, age squared, sex, cohort, and APOB R3527Q genotype. APOB R3527Q is enriched in the Amish and was previously identified as having a strong effect on LDL levels (58 mg/dL) (Shen et al., Arch. Intern. Med., 2010, 170, 1850-1855); the effect of this variant was therefore accounted for in the LDL analysis. A genome-wide significance threshold of p = 5.0E-08 was used.
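The Friedewald formula estimates LDL cholesterol as total cholesterol minus HDL cholesterol minus one fifth of triglycerides, all in mg/dL. The sketch below shows that calculation and the 0.7 medication adjustment described above; the 400 mg/dL triglyceride cutoff is a commonly cited validity limit for the formula and is included here as an assumption, not a detail stated in this disclosure. Function names are hypothetical.

```python
def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> float:
    """Friedewald LDL-C estimate; all inputs and the result are in mg/dL."""
    # Assumption: the commonly cited validity limit of TG < 400 mg/dL.
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate not applied at triglycerides >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5.0


def adjusted_ldl(ldl: float, on_lipid_lowering_medication: bool) -> float:
    """Divide LDL by 0.7 for subjects taking lipid-lowering medication."""
    return ldl / 0.7 if on_lipid_lowering_medication else ldl


# Usage: TC 200, HDL 50, TG 100 mg/dL gives 130 mg/dL; on medication, about 185.7 mg/dL.
ldl = friedewald_ldl(200, 50, 100)
ldl_on_medication = adjusted_ldl(ldl, True)
```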

Identification of an association between the chromosome 9p region and LDL using a genome-wide association study (GWAS):

To identify causal variants in novel genes associated with cardiovascular risk factors, a genome-wide association analysis was performed in 1,852 Old Order Amish subjects genotyped using Affymetrix 500K and 6.0 chips. Baseline characteristics of these participants are shown in Table 1.

Nearly all (96%) of the WGS fine-mapping samples were also included in the GWAS discovery sample.

Only 30% of the WES samples were also included in the GWAS or WGS samples.

As shown in Figure 1, a strong, novel association signal was found between LDL and a locus on chromosome 9p. The top associated SNP was rs855453 (p = 2.2E-08), with a frequency of 15% in the Amish and 25% in the general population. The minor 'T' allele was associated with 10 mg/dL lower LDL levels. This GWAS SNP is therefore common in both Amish and non-Amish populations and has a large effect size, yet it had not been identified in any of the large GWAS meta-analyses. These features mirror those of previous studies (APOC3 and LIPE), and on this basis it was concluded that this GWAS SNP is not the causal/functional variant in the region but is in linkage disequilibrium (LD) with another variant that is rare in the general population but common in the Amish. In addition, multiple studies based on five independent crosses of multiple rat strains found that the syntenic region of the rat genome, located on rat chromosome 5, harbors QTLs for serum cholesterol and triglyceride levels (The Rat Genome Database (RGD): Scl12, 26, 35, 44 and 54, and Stl28).

Confirmation using whole-exome sequencing (WES):

High-quality, QC-filtered WES data from 4,565 Amish individuals, whose baseline characteristics are shown in Table 1, were then used. A mixed-model exome-wide analysis of LDL identified the B4GALT1 rs551564683 missense variant as the most significant association, with a p-value of 3.3E-18 and an effect size of 14.7 mg/dL lower LDL. The rs551564683 variant had a MAF of 6% in the Amish but was very rare in the general population. The variant is present in dbSNP without frequency or population information, is absent from the ExAC database (60,000 samples), and was found as only a single copy in WGS data from 15,387 non-Amish individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) dataset. In addition, only 79 heterozygotes and 5 homozygotes for this variant were found in a collective dataset of other population cohorts available to the researchers, totaling 125,401 individuals (i.e., the variant is more than 1,000-fold more common in the Amish population). This missense variant lies 500 kb from the GWAS variant, with an LD r² estimate of 0.5 between the two. No variant is in perfect LD with rs551564683; indeed, the next most significant SNP is rs149557496, with a p-value in the E-14 range. The strength of the rs551564683 association therefore confirms that the chromosome 9 GWAS locus is real and that rs551564683 has all of the expected characteristics of a causal variant.
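Carrier counts such as those above can be converted to an allele frequency by counting one alternate allele per heterozygote and two per homozygote, over two alleles per diploid individual. This is a minimal arithmetic sketch; the function name is hypothetical.

```python
def allele_frequency(n_het: int, n_hom_alt: int, n_individuals: int) -> float:
    """Alternate-allele frequency among diploid individuals from genotype counts."""
    alt_alleles = n_het + 2 * n_hom_alt  # one copy per heterozygote, two per homozygote
    return alt_alleles / (2 * n_individuals)


# Collective non-Amish cohorts: 79 heterozygotes and 5 homozygotes among 125,401 people.
af_collective = allele_frequency(79, 5, 125_401)  # 89 of 250,802 alleles, about 0.035%
# TOPMed non-Amish WGS: a single copy among 15,387 people.
af_topmed = allele_frequency(1, 0, 15_387)        # 1 of 30,774 alleles
```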

Fine mapping of the chromosome 9p region using whole-genome sequencing (WGS):

WGS data, available for a smaller sample, were used to fill gaps in the exome sequencing and to provide further evidence that rs551564683 is causal. WGS data for 1,083 OOA individuals were generated as part of the TOPMed program; baseline characteristics of the WGS sample are shown in Table 1. WGS captures all SNPs and indels (insertions/deletions), both coding and noncoding, that may be correlated with the top variant in the region of interest. Because the top variant has a frequency of approximately 6%, it is unlikely that the variant caller would miss a correlated variant because of insufficient sequence reads. However, variants may have been excluded during the QC procedure; examining the variants that did not pass QC led to two additional variants being added to the analysis. Association analysis identified the missense SNP rs551564683 (N352S) in the B4GALT1 gene as the variant most significantly associated with LDL in this region, with a p-value of 2.9E-06 and an effect size of -16.4 mg/dL (see Table 2).

The TOPMed WGS dataset yielded 20 variants associated with LDL (p-values from 2.9E-06 to 2.5E-05) that were highly, but not perfectly, correlated with the top hit rs551564683 (r² = 0.83-0.94) (shown in red in Figure 2). Conditional analysis adjusting for rs551564683 completely abolished the association signals of these 20 variants and revealed no other signal in the region, strongly suggesting a single causal variant.
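The logic of the conditional analysis can be illustrated with simulated data: when a variant is associated with LDL only because it is correlated with a causal variant, adding the causal variant as a covariate drives the proxy's effect estimate toward zero. The sketch below uses ordinary least squares on simulated unrelated individuals, not the kinship-adjusted linear mixed model used in the study; the allele frequency, effect size, noise level, and 10% decorrelation rate are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def _center(xs: Sequence[float]) -> List[float]:
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def marginal_beta(y: Sequence[float], x: Sequence[float]) -> float:
    """OLS slope of y ~ 1 + x (intercept absorbed by centering)."""
    yc, xc = _center(y), _center(x)
    return _dot(xc, yc) / _dot(xc, xc)


def conditional_betas(y, x1, x2) -> Tuple[float, float]:
    """OLS slopes of y ~ 1 + x1 + x2, solving the 2x2 normal equations on centered data."""
    yc, c1, c2 = _center(y), _center(x1), _center(x2)
    s11, s22, s12 = _dot(c1, c1), _dot(c2, c2), _dot(c1, c2)
    s1y, s2y = _dot(c1, yc), _dot(c2, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n: int = 5000, maf: float = 0.06, seed: int = 0):
    """Simulate a causal variant, a correlated proxy, and LDL (illustrative values only)."""
    rng = random.Random(seed)

    def dosage() -> int:
        return sum(rng.random() < maf for _ in range(2))

    causal, proxy, ldl = [], [], []
    for _ in range(n):
        g = dosage()
        # The proxy copies the causal genotype except in ~10% of people (mimicking recombination).
        h = g if rng.random() > 0.10 else dosage()
        causal.append(g)
        proxy.append(h)
        ldl.append(130.0 - 15.0 * g + rng.gauss(0.0, 10.0))
    return causal, proxy, ldl


causal, proxy, ldl = simulate()
proxy_alone = marginal_beta(ldl, proxy)                        # inherits much of the causal effect
proxy_adj, causal_adj = conditional_betas(ldl, proxy, causal)  # proxy effect shrinks toward zero
```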

Careful examination of these 20 variants (shown in red in Figure 2) divided them into two groups: 7 red variants within the shaded triangle and 13 unshaded red variants. The 7 red variants within the shaded triangle were almost perfectly correlated with one another and had an r² of 0.83 with the top hit rs551564683. These 7 variants were confidently excluded as causal/functional for three reasons: 1) they are relatively common outside the OOA (MAF > 1%); 2) they showed no association with LDL in 3,877 samples from the Framingham Heart Study (FHS) within TOPMed; and 3) in WES data from 4,565 OOA subjects, one of these 7 variants had an LDL association p-value of 6.3E-14, compared with 3.3E-18 for the top hit rs551564683.
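The r² values quoted here measure pairwise LD. When phased haplotypes are not at hand, r² is often approximated by the squared Pearson correlation between genotype dosages (0, 1, or 2 copies of an allele). The sketch below assumes that approximation, which can differ from haplotype-based r² estimates; the function name is hypothetical.

```python
from typing import Sequence


def dosage_r2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors (0/1/2 per person)."""
    if len(a) != len(b) or not a:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)


# Usage: identical dosage vectors give r^2 = 1; the second pair is uncorrelated (r^2 = 0).
same = dosage_r2([0, 1, 0, 2, 1], [0, 1, 0, 2, 1])
unrelated = dosage_r2([0, 1, 0, 1], [0, 0, 1, 1])
```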

Another group of variants, within the shaded square in Figure 2, had association p-values of only about 10E-6, were almost perfectly correlated with one another, and had an r² of 0.68 with the top hit rs551564683. This group was also excluded as causal/functional because it is relatively common outside the OOA (MAF about 4%) and showed no association with LDL in the 3,877 FHS samples within TOPMed.

This left the top hit rs551564683 and the 13 unshaded red variants in Figure 2, which span 4 Mb (from 31.5 Mb to 35.5 Mb) on the short arm of chromosome 9. As described above, these 13 variants were almost perfectly correlated with one another and had an r² of 0.91 to 0.94 with the top hit rs551564683. Among them, rs551564683 was the only coding variant, and it was classified as damaging or deleterious by 5 of 9 algorithms that predict the effect of a variant on protein function. The top hit rs551564683 and these 13 variants had a MAF of 6% in the OOA but are almost absent from the general population.

Haplotype analysis:

Incomplete r² between distinct loci results from recombination events, so a detailed analysis of the primary 14-SNP haplotype was performed. Figure 3 shows the three major haplotypes within this 4 Mb region. A total of 115 subjects (1 homozygote and 114 heterozygotes) carried haplotype A, which had identical genotypes at all 14 SNPs and therefore provided no information about which SNP may be causal. Six subjects carried haplotype B, with heterozygous genotypes at rs551564683 and the 4 upstream SNPs, and 7 subjects carried haplotype C, with heterozygous genotypes at rs551564683 and the 9 downstream SNPs. The recombinant haplotypes B and C clustered in related subjects, providing evidence that they are not artifacts of genotyping error. Table 3 shows the p-value for rs551564683 after individuals carrying haplotypes B and C were added to individuals carrying haplotype A as a single group.

Adding haplotype B or C individually improved the p-value, and adding both improved it further. These improvements indicate that both haplotypes B and C carry the causal allele. The only SNP shared by B and C was rs551564683, which was therefore considered the causal variant.
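The reasoning above amounts to intersecting the sets of SNPs carried by the recombinant haplotypes that both appear to carry the causal allele. In the sketch below, the labels for the 4 upstream and 9 downstream SNPs are hypothetical placeholders; only rs551564683 is taken from the text.

```python
from typing import Dict, Iterable, Set

CANDIDATE = "rs551564683"
# Hypothetical placeholder labels for the 4 upstream and 9 downstream SNPs in the block.
UPSTREAM = [f"up{i}" for i in range(1, 5)]
DOWNSTREAM = [f"down{i}" for i in range(1, 10)]

# SNPs at which each haplotype carries the heterozygous (minor-allele) genotype.
HAPLOTYPES: Dict[str, Set[str]] = {
    "A": set(UPSTREAM) | {CANDIDATE} | set(DOWNSTREAM),  # all 14 SNPs
    "B": set(UPSTREAM) | {CANDIDATE},                    # recombinant: upstream side
    "C": {CANDIDATE} | set(DOWNSTREAM),                  # recombinant: downstream side
}


def shared_snps(names: Iterable[str]) -> Set[str]:
    """SNPs carried by every named haplotype, i.e. the candidates consistent with all of them."""
    return set.intersection(*(HAPLOTYPES[name] for name in names))


print(shared_snps(["B", "C"]))  # {'rs551564683'}
```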

B4GALT1 congenital disorder of glycosylation supports a functional role for rs551564683:

A phenome-wide association study (PheWAS) was performed to test the association of rs551564683 with all traits in the Amish database. After LDL (p = 3.3E-18) and total cholesterol (p = 3.0E-18), the strongest association was with aspartate transaminase (AST) (p = 3.0E-8): minor-allele homozygotes had twofold higher AST levels than wild-type homozygotes. Elevated AST has previously been reported in a case of congenital disorder of glycosylation (CDG) caused by a frameshift insertion in B4GALT1 that results in a truncated, dysfunctional protein. A strong association with fibrinogen levels (p = 5.0E-4) was also observed, with levels in minor-allele homozygotes approximately 20% lower than in wild-type homozygotes, consistent with the blood coagulation defects reported in the same CDG case. Furthermore, in a small experiment, a 50% increase in serum creatine kinase levels (p = 0.02) was found in 13 minor-allele homozygotes compared with 13 wild-type homozygotes. This consistency between the phenotypes associated with the missense SNP and those caused by the truncating insertion in B4GALT1 further supports B4GALT1 rs551564683 as the causal/functional gene and variant in this region.

The association between lipid subclasses and rs551564683 was examined in 759 Amish individuals. As shown in Table 4, rs551564683 was associated with lower levels of almost all subclasses, with significant p-values for some and non-significant p-values for others.

Coronary calcification score, aortic calcification score, and pericardial fat showed a trend toward association with lower levels, but the p-values were not significant.

PheWAS also found that rs551564683 was associated with higher creatinine and lower eGFR, as well as with higher hematocrit and lower basophil levels.

Example 2: Sample preparation and sequencing

Concentrated genomic DNA samples were obtained from Amish subjects, transferred to an in-house facility, and stored at -80°C (LiCONiC TubeStore) until sequencing. Sample quantity was determined by fluorescence (Life Technologies), and quality was assessed by running 100 ng of each sample on a 2% precast agarose gel (Life Technologies).

DNA samples were normalized, and each sample was sheared to an average fragment length of 150 base pairs using focused acoustic energy (Covaris LE220). The sheared genomic DNA was prepared for exome capture with a custom reagent kit from Kapa Biosystems, using a fully automated approach developed in-house. A unique 6-base-pair barcode was added to each DNA fragment during library preparation to enable multiplexed exome capture and sequencing. Equal amounts of each sample were pooled and then exome-captured using the xGen design available from IDT, with some modifications. The multiplexed samples were sequenced on an Illumina v4 HiSeq 2500 using 75 bp paired-end sequencing.

Raw sequence data generated on the Illumina HiSeq 2500 platform were uploaded to high-performance computing resources at DNAnexus (DNAnexus Inc., Mountain View, CA, USA), where an automated workflow processed the raw .bcl files into annotated variant calls. Raw reads were assigned to the appropriate samples for analysis on the basis of sample-specific barcodes using CASAVA software (Illumina Inc., San Diego, CA, USA).
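Assignment of reads to samples by barcode can be sketched as a lookup over the 6-bp sample barcodes. This is a simplified illustration, not CASAVA's implementation; the barcodes and function names are hypothetical, and the one-mismatch tolerance (with reads that match more than one sample left unassigned) is an assumption.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode: str, sample_barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample whose barcode matches within max_mismatches, else None."""
    hits = [sample for sample, barcode in sample_barcodes.items()
            if len(barcode) == len(read_barcode)
            and hamming(barcode, read_barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # unmatched or ambiguous reads stay unassigned


# Hypothetical 6-bp barcodes for two multiplexed samples.
BARCODES = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}
```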

Sample-specific reads were then aligned to the reference sequence using BWA-MEM (Li and Durbin, Bioinformatics, 2009, 25, 1754-1760). This produced a binary alignment (BAM) file for each sample containing all of that sample's reads and the genomic coordinates to which each read was mapped. After alignment, duplicate reads were identified and flagged using the Picard MarkDuplicates tool (picard.sourceforge.net), producing an alignment file in which each duplicate read is marked (duplicatesMarked.BAM).
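Duplicate marking groups reads that share the same alignment start and orientation, keeps one representative per group, and flags the rest. The sketch below is a simplified, single-end illustration in the spirit of Picard MarkDuplicates, not its actual implementation: it assumes each read carries its unclipped 5' coordinate and keeps the read with the highest sum of base qualities. The `Read` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    name: str
    chrom: str
    five_prime: int      # unclipped 5' alignment coordinate
    reverse: bool        # strand orientation
    quals: List[int]     # per-base quality scores
    duplicate: bool = False


def mark_duplicates(reads: List[Read]) -> List[Read]:
    """Flag all but the highest-quality read at each (chrom, 5' position, strand)."""
    groups: Dict[Tuple[str, int, bool], List[Read]] = {}
    for read in reads:
        groups.setdefault((read.chrom, read.five_prime, read.reverse), []).append(read)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.quals))
        for read in group:
            read.duplicate = read is not best
    return reads
```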

Local realignment of each sample's aligned, duplicate-marked reads was then performed using the Genome Analysis Toolkit (GATK) (Van der Auwera, Cur. Protocols in Bioinformatics, 2013, 11, 11-33; McKenna, Genome Res., 2010, 20, 1297-1303). The realigned, duplicate-marked reads were then processed with the GATK HaplotypeCaller to identify all exonic positions at which the sample differs from the genome reference, including single-nucleotide variants and INDELs, and to determine the zygosity of the sample's variant at each such position.

At every variant position, the following were output: the numbers of reads assigned to the reference and alternative alleles, the genotype quality (reflecting confidence in the genotype call), and the overall quality of the variant call at that position. GATK Variant Quality Score Recalibration (VQSR), which uses a training dataset, was then applied to re-evaluate and recalculate the overall quality scores of each sample's variants in order to increase specificity. Metric statistics were captured for each sample to evaluate capture performance, alignment performance, and variant calling. After sequencing of the cohort was complete, a project-level VCF was generated by joint genotyping with GATK, producing genotypes and associated metric information for all samples at every site at which any sample in the cohort carried a variant relative to the reference genome. This project-level VCF was used for downstream statistical analysis. In addition to VQSR, variants were annotated with the Quality By Depth (QD) metric using GATK, and biallelic variants with a QD greater than 2.0, a missing rate below 1%, and a Hardy-Weinberg equilibrium p-value greater than 1.0×10^-6 were retained for further analysis.
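The final site-level filter can be sketched as follows. The text does not state which Hardy-Weinberg test was used; this sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test (an exact test is a common alternative), and it does not reproduce VQSR. Function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def hwe_chisq_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_variant(n_alleles: int, qd: float, n_missing: int, n_samples: int,
                 genotype_counts: Tuple[int, int, int]) -> bool:
    """Biallelic, QD > 2.0, missing rate < 1%, and HWE p > 1.0e-6."""
    return (n_alleles == 2
            and qd > 2.0
            and n_missing / n_samples < 0.01
            and hwe_chisq_p(*genotype_counts) > 1.0e-6)
```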

Before downstream analysis of the sequence data, the following were excluded: samples whose reported sex did not match their genetically determined sex; samples with a high rate of heterozygosity, low sequence coverage (defined as fewer than 75% of targeted bases covered at 20×), or an abnormally high level of cryptic relatedness; and genetically identified sample duplicates.
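The coverage definition above (low coverage = fewer than 75% of targeted bases at 20× or more) and the sample exclusion rule can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and because the source gives no numeric cutoffs for heterozygosity or cryptic relatedness, those criteria are passed in as precomputed boolean flags.

```python
from typing import Sequence


def fraction_at_depth(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of targeted bases covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def is_low_coverage(depths: Sequence[int], min_depth: int = 20, min_fraction: float = 0.75) -> bool:
    """Low coverage: fewer than 75% of targeted bases reach 20x."""
    return fraction_at_depth(depths, min_depth) < min_fraction


def exclude_sample(
    reported_sex: str,
    genetic_sex: str,
    depths: Sequence[int],
    high_heterozygosity: bool,
    excess_relatedness: bool,
    is_duplicate: bool,
) -> bool:
    """Return True if any exclusion criterion applies (flags assumed precomputed)."""
    return (
        reported_sex != genetic_sex
        or high_heterozygosity
        or is_low_coverage(depths)
        or excess_relatedness
        or is_duplicate
    )
```

For example, a sample with exactly 75 of 100 targeted bases at 25× is retained, while one with 74 of 100 is flagged as low coverage.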

Sequence variants were annotated with an annotation pipeline that uses ANNOVAR (Wang et al., Nuc. Acids Res., 2010, 38, e164) together with other custom algorithms for annotation and analysis. Variants were classified according to their potential functional effect and then filtered by their observed frequency in publicly available population control databases, and in databases used to filter out common polymorphisms and high-frequency (likely benign) variants. Algorithms for bioinformatic prediction of the functional effect of variants, together with conservation scores based on multi-species alignments, were incorporated into the annotation process and used to assess the potential deleteriousness of the candidate variants identified.
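The classify-then-filter step can be illustrated with a short sketch. It is not the pipeline described above: the set of "functional" consequence classes, the database keys, and the 1% frequency cutoff are all hypothetical placeholders, since the source names neither the classes nor the threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical consequence classes treated as potentially functional;
# the real pipeline's categories are not given in the source.
FUNCTIONAL_CLASSES = {"missense", "stop_gained", "frameshift", "splice_site"}


@dataclass
class AnnotatedVariant:
    variant_id: str
    consequence: str  # functional class assigned during annotation
    # Allele frequency in each public control database (hypothetical keys).
    population_afs: Dict[str, float] = field(default_factory=dict)


def max_population_af(variant: AnnotatedVariant) -> float:
    """Highest frequency seen in any control database (0.0 if never observed)."""
    return max(variant.population_afs.values(), default=0.0)


def candidate_variants(variants: List[AnnotatedVariant], max_af: float = 0.01) -> List[AnnotatedVariant]:
    """Keep potentially functional variants that are rare in every control database.

    max_af = 0.01 is a hypothetical cutoff, not a value stated in the source.
    """
    return [
        v for v in variants
        if v.consequence in FUNCTIONAL_CLASSES and max_population_af(v) < max_af
    ]
```

Using the maximum frequency across databases means a variant is discarded if it is common in any one control population, which is the conservative choice for removing likely benign polymorphisms.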

Example 3: B4GALT1 rs551564683 (N352S) is enriched in the Amish

Exome sequencing and association analysis in approximately 4700 Amish subjects showed that rs551564683 on chromosome 9 was significantly associated with total cholesterol levels (p = 1.3×10^-10) (see Figure 4). rs551564683 encodes a missense variant in which asparagine is changed to serine at position 352 of the B4GALT1 protein. The next most significantly LDL-associated variant in this region was rs149557496, with a p-value of only 10^-5, suggesting that the N352S variant is the causal variant. Specifically, in the exome sequence data shown in Figure 4, the variant in highest LD with B4GALT1 Asn352Ser was rs149557496 (HRCT1; 2.8 Mb away; r² = 0.78), whose P-value for association with LDL in the Amish was 10^-5. Whole-genome sequence data in the Amish (TOPMed) did not identify any variant in this region more strongly associated with LDL-C.

Further analysis showed that the B4GALT1 N352S variant was more than 1,000-fold enriched in the Amish population (see Figure 5). In a cohort of 4725 Amish individuals, 548 heterozygous carriers of the rs551564683-containing allele were identified, and 13 carriers were homozygous for that allele (see Figure 5). By comparison, a collective data set of other population cohorts available to the researchers, totaling 125,401 individuals, was analyzed, and only 79 heterozygotes and 5 homozygotes were identified. The allele frequency was estimated at approximately 0.06 in the Amish cohort, compared with approximately 0.0025 in the collective data set (see Figure 5). Genetic drift may explain the higher frequency of this allele in the Amish.
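As a worked check of the Amish allele frequency, each heterozygote carries one copy and each homozygote two copies of the alternative allele, out of 2N chromosomes. The sketch assumes all 4725 individuals were successfully genotyped, so that non-carriers contribute zero copies; the function name is illustrative.

```python
def allele_frequency(n_het: int, n_hom_alt: int, n_individuals: int) -> float:
    """Alternative-allele frequency in diploid individuals: (het + 2 * hom) / (2 * N)."""
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


amish_af = allele_frequency(548, 13, 4725)  # (548 + 26) / 9450
print(f"{amish_af:.4f}")  # 0.0607
```

The result, about 0.061, is consistent with the "approximately 0.06" reported for the Amish cohort.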

Example 4: B4GALT1 N352S is associated with reduced serum lipids and increased AST

The association of the B4GALT1 N352S variant with various phenotypes, including serum lipids, coronary artery disease (CAD), and liver traits, was evaluated. Association analyses were performed in the Amish cohort, which includes individuals homozygous for the reference allele, individuals heterozygous for the alternative allele, and individuals homozygous for the alternative allele. Genotype-specific means for lipid and liver traits and the risk of CAD were determined, with effect estimates adjusted for subject age, age squared, sex, and study (because the phenotypic data were collected in several studies over several years). For pericardial fat, genotype-specific means were further adjusted for BMI. Effect sizes of the variant on the measured phenotypes were estimated with 95% confidence intervals. The traits and results are provided in Figures 6, 7, and 8.
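A covariate-adjusted additive model of the kind described above can be sketched with ordinary least squares: the trait is regressed on the number of alternative alleles (0, 1, 2) plus age, age squared, sex, and study indicators, and the genotype coefficient is the per-allele effect. This is a minimal pure-Python sketch, not the authors' model. The source does not say which regression was used, and analyses in related founder populations often also account for relatedness, which this sketch omits. All function names are illustrative.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(design: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(alt_copies: int, age: float, male: bool, study: str, studies: Sequence[str]) -> List[float]:
    """Intercept, additive genotype, age, age^2, sex, and study indicators.

    Age is centred at 50 and scaled to decades only to keep the normal
    equations well conditioned; this does not change the genotype estimate.
    The first entry of `studies` is the reference study.
    """
    a = (age - 50.0) / 10.0
    return [1.0, float(alt_copies), a, a * a, 1.0 if male else 0.0] + [
        1.0 if study == s else 0.0 for s in studies[1:]
    ]
```

Usage: build one `design_row` per subject, call `ols(rows, trait_values)`, and read the per-allele effect from index 1 of the returned coefficients. Confidence intervals and p-values would additionally require the residual variance and the inverse of X'X, which are omitted here for brevity.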

As shown in Figure 6, the presence of the N352S variant was generally correlated with reduced serum lipids, in particular total cholesterol (p = 1.3×10^-10) and LDL (p = 1.8×10^-9), both of which reached strong statistical significance. Individuals heterozygous and homozygous for this alteration showed reductions in LDL of 17.3 mg/dL and 31.2 mg/dL, respectively. There was a trend toward an association between the variant and reduced coronary artery calcification. In addition, the variant was correlated with increased aspartate aminotransferase (AST) levels (p = 6.0×10^-8); under a recessive model, the p-value for AST levels was 9×10^-23. The variant was not correlated with increased alanine aminotransferase (ALT), alkaline phosphatase, or liver fat levels. Cholesterol, LDL, and AST levels are plotted in Figure 7 for subjects homozygous for the reference allele (TT), heterozygous for the alternative allele (CT), and homozygous for the alternative allele (CC). The plotted values are unadjusted; values recalculated with adjustment for age, age squared, sex, and study are tabulated at the bottom of Figure 7.
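The additive and recessive p-values quoted above come from coding the same TT/CT/CC genotypes differently before testing. A minimal sketch of the three standard codings follows, assuming C is the alternative allele as in Figure 7; the function name is illustrative. A recessive coding contrasts CC homozygotes with everyone else, so it is better powered than an additive coding when an effect is concentrated in homozygotes.

```python
def encode_genotype(genotype: str, model: str, alt_allele: str = "C") -> int:
    """Code a biallelic genotype string (e.g. "TT", "CT", "CC") for association testing.

    additive:  0/1/2 copies of the alternative allele
    dominant:  1 if at least one alternative allele, else 0
    recessive: 1 only for alternative-allele homozygotes, else 0
    """
    copies = genotype.upper().count(alt_allele.upper())
    if model == "additive":
        return copies
    if model == "dominant":
        return int(copies >= 1)
    if model == "recessive":
        return int(copies == 2)
    raise ValueError(f"unknown genetic model: {model!r}")
```

For example, a CT heterozygote is coded 1 under the additive and dominant models but 0 under the recessive model.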

The effect of the N352S alteration on lipid subclasses was also evaluated; the results are shown in Figure 8. As above, the association analysis was performed in the Amish cohort comprising individuals homozygous for the reference allele, heterozygous individuals, and individuals homozygous for the alternative allele. The results in Figure 8 show that B4GALT1 N352S is associated with reductions in all lipid subclasses tested.

Example 5: Association of B4GALT1 N352S with reduced fibrinogen levels

The association of the B4GALT1 N352S variant with fibrinogen levels was also evaluated in a subset of samples. As for the serum lipids, CAD, and traits assessed in Example 4, the association with fibrinogen levels was analyzed in the Amish cohort comprising individuals homozygous for the reference allele, heterozygous individuals, and individuals homozygous for the alternative allele. Genotype-specific means for fibrinogen levels were determined in two subgroups: individuals not on clopidogrel therapy (drug-naive) and individuals on clopidogrel therapy (on-clopidogrel). As part of the analysis, the mean level in each group was adjusted for subject age, age squared, sex, and study. The effect size of the variant on fibrinogen levels was estimated with 95% confidence intervals. As shown in Figure 9, the presence of the N352S variant was associated with reduced fibrinogen levels in both the drug-naive group (p = 1.15×10^-3) and the on-clopidogrel group (p = 2.74×10^-5). The drug-naive subgroup showed a fibrinogen reduction of approximately 24 mg/dL, and the on-clopidogrel subgroup a reduction of approximately 32.5 mg/dL (see Figure 9).

Example 6: Additional B4GALT1 N352S associations

Within the Amish cohort, associations between the B4GALT1 N352S variant and other traits, including creatinine level, estimated glomerular filtration rate (eGFR), basophil level, and hematocrit percentage, were also evaluated. As shown in Figure 9, the variant was weakly associated with a small increase in creatinine levels but was not significantly associated with eGFR, basophil level, or hematocrit percentage.

Example 7: Knockdown of the B4GALT1 ortholog in zebrafish

As with the cell-based assays, a zebrafish model was used to investigate the effect of B4GALT1 p.Asn352Ser on LDL.

Zebrafish preparation, morpholino injection and validation

Wild-type (Tubingen) zebrafish stocks were used to generate embryos for morpholino injection. Adults were maintained and bred at 27 to 29°C, and embryos were raised at 28.5°C. All animals were housed and maintained under protocols approved by the University of Maryland Institutional Animal Care and Use Committee. Morpholino antisense oligonucleotides (MOs) targeting B4GALT1 were obtained from Gene Tools, Inc., based on previously published MOs (Machingo et al., Dev. Biol., 2006, 297, 471-482). MOs were injected at the 1- to 2-cell stage, and knockdown was verified by qRT-PCR quantification of the wild-type B4GALT1 transcript. Off-target toxicity was measured by qRT-PCR quantification of the delta113 isoform of p53 (Robu et al., PLoS Genet., 2007, 3, e78). For mRNA rescue experiments, human B4GALT1 mRNA was transcribed from a pCS2+ plasmid vector containing the open reading frame (ORF) of either the wild-type or the N352S variant of the gene. The mRNA was mixed with MO at various concentrations and co-injected into 1- to 2-cell stage embryos. For each injection experiment, a total of 200 to 400 embryos were injected, and each experiment was repeated at least three times.
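The source does not state how the qRT-PCR data were normalized. A common approach is the 2^-ΔΔCt (Livak) method, sketched below under the assumptions that a reference gene was measured alongside the target and that both assays amplify with close to 100% efficiency. The function and argument names are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target transcript by the 2^-ddCt method.

    Assumes roughly 100% amplification efficiency for both assays.
    """
    delta_sample = ct_target_sample - ct_reference_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control                 # compare with control embryos
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: MO-injected embryos vs. control embryos.
print(relative_expression(26.0, 18.0, 24.0, 18.0))  # 0.25
```

In this made-up example, the target transcript crosses threshold two cycles later relative to the reference gene, so the remaining wild-type transcript is one quarter of control, i.e. about 75% knockdown.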

LDL quantification in zebrafish

For each experiment, 100 larvae at 5 days post-fertilization (dpf) were homogenized in 400 µL of ice-cold 10 µM butylated hydroxytoluene. The homogenate was filtered through a 0.45 µm Dura PVDF membrane filter (Millipore) in preparation for lipid extraction. Homogenates were processed with an HDL and LDL/VLDL cholesterol assay kit (Cell Biolabs, Inc.) according to the manufacturer's protocol. After precipitation and dilution, samples were analyzed fluorometrically using a SpectraMax Gemini EM plate reader and SoftMax Pro microplate data acquisition and analysis software (Molecular Devices).
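Fluorometric cholesterol assays of this kind are usually read against a standard curve. The sketch below fits a straight line to standards and back-calculates an unknown, correcting for the dilution applied after precipitation. It assumes a linear response over the standard range; the standard concentrations, signals, and dilution factor are hypothetical, and the kit's own curve model may differ.

```python
from typing import Sequence, Tuple


def fit_standard_curve(conc: Sequence[float], signal: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line signal = slope * conc + intercept through the standards."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_concentration(signal: float, slope: float, intercept: float, dilution: float = 1.0) -> float:
    """Back-calculate concentration from a sample signal, then undo the dilution."""
    return (signal - intercept) / slope * dilution


# Hypothetical standards (concentration units as per the kit) and their fluorescence readings.
slope, intercept = fit_standard_curve([0.0, 2.0, 4.0, 8.0], [100.0, 200.0, 300.0, 500.0])
print(interpolate_concentration(300.0, slope, intercept, dilution=2.0))  # 8.0
```

Samples whose signal falls outside the range of the standards should be re-diluted rather than extrapolated.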

A genomic knockout of the zebrafish ortholog (B4GALT1) was generated by CRISPR/Cas9-mediated targeting of exon 2. Consistent with reports of embryonic lethality in knockout mice, the injected F0 animals did not survive to adulthood and consistently died as juveniles. To circumvent this lack of viability, a knockdown approach was used instead, in which a previously reported splice-blocking antisense morpholino oligonucleotide (MO) was injected into embryos (Machingo et al., Dev. Biol., 2006, 297, 471-482). The efficacy of the MO was verified by qRT-PCR at two different doses (see Figure 10), and off-target toxicity was ruled out (see Figure 11).

To quantify changes in LDL levels, 8 ng of MO was injected, the injected embryos were raised to 5 dpf, and larvae at this stage were assayed for total LDL according to a previously published protocol (O'Hare et al., J. Lipid Res., 2014, 55, 2242-2253). A significant reduction in LDL was observed in MO-injected larvae compared with control larvae, consistent with a role for B4GALT1 in LDL homeostasis (see Figure 12). These results were confirmed with a second splice-blocking MO targeting exon 2, whose injection at 2 ng also reduced LDL concentrations (data not shown). To verify the specificity of these observations in zebrafish and to test the function of human B4GALT1, full-length capped mRNA encoding the human gene was generated by in vitro transcription from a pCS2+ plasmid carrying the ORF of the human gene. To assess the ability of wild-type human mRNA to rescue the knockdown phenotype, the mRNA was co-injected into embryos together with the B4GALT1 MO, and LDL was measured in fasted larvae. Three doses of mRNA (10 pg, 25 pg, and 50 pg) were co-injected with 8 ng of MO. Co-injection of 50 pg of B4GALT1 mRNA resulted in LDL levels statistically indistinguishable from those in larvae injected with control MO alone (p = 0.14), suggesting that the human mRNA can rescue the effect of knocking down the zebrafish gene (see Figure 12, in which larvae were treated with MO against B4GALT1, with MO co-injected with wild-type human B4GALT1 mRNA (WT rescue), or with MO co-injected with B4GALT1 mRNA encoding the Asn352Ser mutation (N352S rescue)).

These data support the use of this system for the functional interpretation of variants in human B4GALT1 and suggest that wild-type human B4GALT1 mRNA is functional in zebrafish with respect to the regulation of whole-body LDL levels. The effect of p.Asn352Ser on B4GALT1 function was then investigated. Using site-directed mutagenesis (O'Hare et al., Hepatology, 2017, 65, 1526-1542), the T-to-C change was introduced into the coding sequence of the human B4GALT1 ORF construct, from which full-length mRNA was generated. Co-injection of MO with B4GALT1 p.352Ser mRNA showed a reduced ability to rescue the LDL phenotype: the resulting LDL concentration was 15% lower than that obtained by co-injecting MO with wild-type mRNA, a statistically significant difference (39.9 µM vs. 46.6 µM, p = 0.02). This LDL level was nevertheless significantly higher than that obtained with the B4GALT1 MO alone (p = 0.01) (see Figure 12), suggesting that the missense variant causes a partial loss of function.

Example 8: Targeted genotyping

Targeted SNP genotyping was performed on 3,236 OOA subjects using the QuantStudio system (Thermo Fisher Scientific). Based on the LD structure of 14 SNPs, 7 SNPs were selected for genotyping. The association evidence for rs551564683 was p = 4.1×10^-13, compared with approximately 10^-10 for the other SNPs (Figure 14), confirming that rs551564683 is the causal variant in this region.
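Selecting SNPs from an LD structure rests on pairwise r² values such as the r² of 0.78 quoted in Example 3. When haplotype phase is unknown, r² is often approximated by the squared Pearson correlation of per-individual allele dosages (0, 1, 2). The sketch below uses that approximation for illustration only; the source does not state how LD was computed, and the function name is hypothetical.

```python
from typing import Sequence


def dosage_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of allele dosages at two SNPs (phase-free r^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)
```

Because r² squares the correlation, SNPs whose minor alleles sit on opposite haplotypes still score r² = 1, which is what matters when deciding whether one SNP can stand in for another.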

Example 9: B4GALT1 N352S reduces enzymatic activity without changing protein stability or cellular localization

The properties of B4GALT1 were investigated in COS-7 and Huh7 cells overexpressing human epitope-tagged Flag-B4GALT1 352Asn or Flag-B4GALT1 352Ser (Figures 15 and 16). In Figure 15, confocal microscopy images of Flag-352Asn or Flag-352Ser stained with B4GALT1 or Flag antibodies show the same staining pattern (scale bar = 10 µm). In Figure 16, indirect immunofluorescence of Huh7 cells showed that endogenously expressed B4GALT1 colocalized with TGN46, a Golgi apparatus marker. A similar colocalization pattern was observed when epitope-tagged Flag-B4GALT1 352Asn or Flag-B4GALT1 352Ser was overexpressed: in human hepatocellular carcinoma Huh7 cells, endogenous B4GALT1 and overexpressed Flag-352Asn and Flag-352Ser all colocalized with the trans-Golgi network marker TGN46. Figure 16 shows confocal microscopy images of the subcellular localization of endogenous B4GALT1, Flag-352Asn, and Flag-352Ser relative to TGN46 (scale bar = 10 µm).

COS-7 cells were observed to have low levels of endogenous B4GALT1 (Figure 17, panel B), so this cell line was used to evaluate the effect of the missense mutation on protein stability and/or steady-state levels and on galactosyltransferase activity. Western blotting showed that the missense mutation did not affect protein stability and/or steady-state levels (Figure 17). Figure 17 shows the effect of 352Ser on protein stability and/or steady-state levels. Panel A shows COS-7 cells expressing 352Asn or 352Ser Flag-tagged protein fusions together with free EGFP; cell lysates were analyzed by Western blot for B4GALT1, β-actin, and EGFP using commercial antibodies, and one of four similar experiments is shown. Panel B shows B4GALT1 mRNA expression levels determined by RT-qPCR. Data represent the mean ± S.E. of four experiments.

To determine the catalytic activity of 352Ser, lysates of untransfected COS-7 cells and of COS-7 cells transfected with the empty expression vector or with expression vectors containing wild-type or mutant B4GALT1 cDNA inserts were analyzed for galactosyltransferase activity. When normalized to the expression of the FLAG-tagged protein (immunoblots in Figure 18, panels A and B), the enzymatic activity of 352Ser was reduced by approximately 50% relative to 352Asn (Figure 18, panel C). Figure 18 shows the effect of the 352Ser mutation on activity. Panels A and B show COS-7 cells expressing 352Asn or 352Ser Flag-tagged protein fusions. Cell lysates were incubated with rabbit anti-Flag IgG or rabbit pre-immune control IgG, and the immunoprecipitates were analyzed by Western blot for B4GALT1 or Flag using commercial antibodies; one of four similar experiments is shown. Panel C shows B4GALT1 activity in the immunoprecipitates measured with a commercial kit (R&D). Each data point represents the mean of the calculated ratio of B4GALT1-specific activity to the amount of 352Asn or 352Ser protein recovered in the immunoprecipitate. Western blot ECL signals were quantified by densitometry using ImageJ software. Data represent the mean ± S.E. of four experiments (*, p < 0.05, 352Asn vs. 352Ser).
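The normalization behind panel C can be sketched as follows: each activity reading is divided by the densitometric signal of the FLAG-tagged protein recovered in the same immunoprecipitate, and the per-experiment ratios are summarized as mean ± S.E. This is an illustrative sketch. The input values in the example are hypothetical, S.E. is taken as sample SD / sqrt(n), and the statistical test behind p < 0.05 is not reproduced.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def normalized_activities(activities: Sequence[float], band_intensities: Sequence[float]) -> List[float]:
    """Activity per unit of recovered FLAG-tagged protein, one value per experiment."""
    return [a / b for a, b in zip(activities, band_intensities)]


def mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def percent_of_reference(variant: Sequence[float], reference: Sequence[float]) -> float:
    """Mean normalized activity of the variant as a percentage of the reference."""
    return 100.0 * mean(variant) / mean(reference)


# Hypothetical readings from four experiments (activity units, ImageJ band intensities).
asn = normalized_activities([400.0, 420.0, 380.0, 410.0], [200.0, 210.0, 190.0, 205.0])
ser = normalized_activities([200.0, 210.0, 190.0, 205.0], [200.0, 210.0, 190.0, 205.0])
print(percent_of_reference(ser, asn))  # 50.0
```

Normalizing to the recovered protein rather than to total lysate protein separates a change in catalytic activity from a change in how much enzyme is present.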

These experiments show that this missense mutation does not affect the level of protein expression or its localization, but does lead to lower enzymatic activity.

Example 10: Carbohydrate-deficient transferrin test for congenital disorders of glycosylation (CDG)

CDG testing was performed on 0.1 mL serum samples from 24 subjects in three genotype groups (8 minor-allele homozygotes, 8 heterozygotes, and 8 major-allele homozygotes). Each minor homozygote was matched with a heterozygote and a major homozygote of the same sex who were siblings or closely related individuals, based on the kinship coefficient. Subjects were also matched for age and for carrier status of the major lipid-altering allele APOB R3527Q.
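The matching scheme above can be illustrated with a greedy sketch: for each minor homozygote, pick the heterozygote and the major homozygote of the same sex and APOB R3527Q carrier status with the highest kinship coefficient, without reusing anyone. The procedure is an assumption for illustration; the source describes the criteria but not the algorithm. Here age is used only as a tie-breaker, and the record fields and genotype labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Subject:
    sid: str
    genotype: str       # "SS" minor homozygote, "NS" heterozygote, "NN" major homozygote
    sex: str
    age: float
    apob_carrier: bool  # carries APOB R3527Q


Kinship = Dict[FrozenSet[str], float]


def best_match(case: Subject, pool: Sequence[Subject], kinship: Kinship, used: Set[str]) -> Optional[Subject]:
    """Closest relative of the same sex and APOB carrier status; ties go to the nearest age."""
    eligible = [
        s for s in pool
        if s.sid not in used and s.sex == case.sex and s.apob_carrier == case.apob_carrier
    ]
    if not eligible:
        return None
    return max(
        eligible,
        key=lambda s: (kinship.get(frozenset((case.sid, s.sid)), 0.0), -abs(s.age - case.age)),
    )


def match_trios(subjects: Sequence[Subject], kinship: Kinship) -> List[Tuple[str, str, str]]:
    """Greedily build (minor homozygote, heterozygote, major homozygote) trios without reuse."""
    by_genotype: Dict[str, List[Subject]] = {"SS": [], "NS": [], "NN": []}
    for s in subjects:
        by_genotype.setdefault(s.genotype, []).append(s)
    used: Set[str] = set()
    trios: List[Tuple[str, str, str]] = []
    for case in by_genotype["SS"]:
        het = best_match(case, by_genotype["NS"], kinship, used)
        major = best_match(case, by_genotype["NN"], kinship, used)
        if het is None or major is None:
            continue  # no eligible partner left for this case
        used.update((het.sid, major.sid))
        trios.append((case.sid, het.sid, major.sid))
    return trios
```

A greedy pass is order-dependent; an optimal assignment (for example, maximizing total kinship across all trios) would avoid an early case taking a relative that a later case needs.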

Water-diluted samples were washed twice on an immunoaffinity column. Glycosylation profiling of the eluted proteins was performed with a mass spectrometer operating over two scan ranges specific for ApoCIII and transferrin. Glycosylation deficiency was determined from the glycoform ratios of each protein. CDG testing was performed at the Mayo Medical Laboratories of the Mayo Clinic.

All 24 samples had normal mono-oligosaccharide/di-oligosaccharide transferrin ratios, a-oligosaccharide/di-oligosaccharide transferrin ratios, ApoCIII-1/ApoCIII-2 ratios, and ApoCIII-0/ApoCIII-2 ratios. However, whereas all wild-type samples had normal tri-sialo/di-oligosaccharide transferrin ratios, the ratios in all heterozygotes were in the intermediate range, and those in all minor homozygotes were abnormal and significantly higher than in the matched wild-type and heterozygous subjects (p = 7.6×10^-10) (Figure 19). These results indicate that this missense mutation is associated with defective glycosylation resulting from the reduced enzymatic activity of B4GALT1.

Example 11: Global N-linked glycan analysis of plasma glycoproteins

To determine whether desialylation and hypogalactosylation affect only transferrin or extend to other glycoproteins, a global N-glycan analysis was performed by the analytical chemistry group at Regeneron. Lectin-enriched glycoproteins were extracted in replicate from the sera of five matched pairs of major and minor homozygotes, and the labeled N-linked glycans were separated by hydrophilic interaction chromatography, detected by fluorescence, and analyzed by mass spectrometry (HILIC-FLR-MS) (Figure 20 and Table 5). Figure 20 shows representative HILIC-FLR-MS spectra from the N-glycan analysis of glycoproteins from matched pairs of minor (SS) and major (NN) homozygotes for B4GALT1 N352S. The results showed that minor homozygotes had significantly higher levels of hypogalactosylated, undersialylated glycans: biantennary glycans with only one galactose and one sialic acid (p = 3.1×10^-5), unsialylated biantennary glycans with only one galactose (p = 0.001), and truncated biantennary glycans lacking both galactose and sialic acid (p = 0.005). Conversely, minor homozygotes had significantly lower levels of biantennary glycans with two galactoses and two sialic acids (p = 0.001) (Table 5). Galactosylation (p = 9.2×10^-5) and sialylation (p = 0.001) were significantly lower among minor homozygotes, whereas fucosylation did not differ (p = 0.5). Both the CDT test and the global N-glycan analysis of serum show significantly increased levels of carbohydrate-deficient glycoproteins in minor homozygotes, indicating that B4GALT1 N352S leads to defective protein glycosylation.
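Summary galactosylation and sialylation levels such as those compared above are often derived from relative glycan peak abundances by counting galactose or sialic acid residues per antenna. The source does not give its formula; the per-antenna index below is one common convention, shown as a sketch. The peak compositions and abundances in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GlycanPeak:
    name: str
    rel_abundance: float  # fraction of total fluorescence signal
    antennae: int
    galactoses: int
    sialic_acids: int


def per_antenna_index(peaks: Sequence[GlycanPeak], residue: str) -> float:
    """Abundance-weighted residues per antenna, e.g. residue="galactoses" or "sialic_acids"."""
    numerator = sum(p.rel_abundance * getattr(p, residue) for p in peaks)
    denominator = sum(p.rel_abundance * p.antennae for p in peaks)
    return numerator / denominator if denominator else 0.0


# Hypothetical biantennary profile.
peaks = [
    GlycanPeak("G2S2", 0.5, 2, 2, 2),  # two galactoses, two sialic acids
    GlycanPeak("G2S1", 0.2, 2, 2, 1),
    GlycanPeak("G1", 0.2, 2, 1, 0),    # one galactose, unsialylated
    GlycanPeak("G0", 0.1, 2, 0, 0),    # truncated: no galactose or sialic acid
]
print(round(per_antenna_index(peaks, "galactoses"), 3))    # 0.8
print(round(per_antenna_index(peaks, "sialic_acids"), 3))  # 0.6
```

A shift of abundance from fully processed G2S2 structures toward G1 and G0 structures lowers both indices, which is the direction reported for minor homozygotes.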

The present disclosure is not limited to the embodiments described and illustrated above, but may be varied and modified within the scope of the appended claims. The disclosure is also not limited in any way by the use of any headings herein.

SEQUENCE LISTING <110> Regeneron Pharmaceuticals, Inc. University of Maryland, Baltimore <120> B4GALT1 Variants And Uses Thereof <130> WO/2018/226560 <140> PCT/US2018/035806 <141> 2018-06-04 <150> US 62/515,140 <151> 2017-06-05 <150> US 62/550,161 <151> 2017-08-25 <150> US 62/659,344 <151> 2018-04-18 <160> 17 <170> PatentIn version 3.5 <210> 1 <211> 56718 <212> DNA <213> Homo sapien <220> <223> wild-type B4GALT1 genomic sequence <400> 1 gcgcctcggg cggcttctcg ccgctcccag gtctggctgg ctggaggagt 50 ctcagctctc agccgctcgc ccgcccccgc tccgggccct cccctagtcg 100 ccgctgtggg gcagcgcctg gcgggcggcc cgcgggcggg tcgcctcccc 150 tcctgtagcc cacacccttc ttaaagcggc ggcgggaaga tgaggcttcg 200 ggagccgctc ctgagcggca gcgccgcgat gccaggcgcg tccctacagc 250 gggcctgccg cctgctcgtg gccgtctgcg ctctgcacct tggcgtcacc 300 ctcgtttact acctggctgg ccgcgacctg agccgcctgc cccaactggt 350 cggagtctcc acaccgctgc agggcggctc gaacagtgcc gccgccatcg 400 ggcagtcctc cggggagctc cggaccggag gggcccggcc gccgcctcct 450 ctaggcgcct cctcccagcc gcgcccgggt ggcgactcca gcccagtcgt 500 ggattctggc cctggccccg ctagcaactt gacctcggtc ccagtgcccc 550 acaccaccgc actgtcgctg cccgcctgcc ctgaggagtc cccgctgctt 600 ggtaaggact cgggtcggcg ccagtcggag gattgggacc cccccggatt 650 tccccgacag ggtcccccag acattccctc aggctggctc ttctacgaca 700 gccagcctcc ctcttctgga tcagagtttt aaatcccaga cagaggcttg 750 ggactggatg ggagagaagg tttgcgaggt gggtccctgg ggagtcctgt 800 tggaggcgtg gggccgggac cgcacaggga agtcccgagg cccctctagc 850 cccagaacca gagaaggcct tggagacttc cctgctgtgg cccgaggctc 900 aggaagtttt ggagtttggg tctgcttagg gcttcgagca gccttgcact 950 gagaactctg gtagggacct cgagtaatcc actccctttt ggggactgac 1000 gtgaggctcc cggtggggaa ggagactgac ctctcggttc acgtgtcttg 1050 ccatagagcc actctcctga gtgggttttt ctcctgatcg tttgggccaa 1100 gtgacttctc tctgaacctc atatttctct tctgggataa taaatggtca 1150 ccctttcaag gggttgtttt ggaagatatt gtgaacaatg gtaaataagg 1200 gcttaattaa tgagggtaag ccctcagtaa attgtcactg tgtgttcatt 1250 tcttcctctg tgtggatcgt gaccgagagc ccttccccct agcctcctcc 1300 tggtatgggt 
acccaaaacc taggtgagca gggatctctc ccaggggcag 1350 agagcttgtg tactctgggt gttagagggc taaaatataa ccagtcaaca 1400 ccacgttgcc catttctggt acttccggta gcagcctgag tctcaattat 1450 cttgcccaga tgatctgaac tctgacctct agcctgtttc agcataggca 1500 gagagcttga gtaggtgagt ttgcattcct catagcagct ggctgagcct 1550 agtctggact tctctttgac ctgtaaccta caggcccaca ggcccaaggc 1600 aaccacaggt tgcttccagg gttaccacac aggtggtttc tcatttctaa 1650 tgctaggttt tagataattg ttgtaagtga ggggccctgg caggcaggat 1700 gacatcctgc caataggagt tttctgtcac tttcccacag agccctggct 1750 actacatact cttgctcaat ttcgccagta attgcgtcaa tgtgttcata 1800 tcaagtttgg gaagaacatc ttggaattgg tcagacgtga actgtggtaa 1850 taatgggggc ttgttttttt aagcagataa ttaaattcct ttgcatttga 1900 tgattattct gggaagcaga ctagtcccat aaaatgaaat ggactctgcc 1950 ttgctgctaa gtgtctgact tgagacatgc tatcgagttt ctcaaaatct 2000 cttccttgtg taaaatgtgg ttgtcgatga ttaccttaca ggggtttttt 2050 taagactaaa tgagatcgtg tacattaaat acaggcactc aggctgggca 2100 tggtggctca cgcctgtaat cctagcactt tgggaggctg aggggagtgg 2150 atcacttgag gttaggagtt tgagaccagc ctggccaata tggtgaaaca 2200 ccatcccatc tctacaaaaa tacaaaaaag ttagccaggg gtggtggcat 2250 cgcagctact caggaggccg aggcaggaga attgcttgaa cctgggaggc 2300 agaggttgca gtgagtcaag attgtgccag tacactccag cctgggcgac 2350 gaagcaagac tgtctaaaaa aaaaaaaaaa aaaaaaaata cgggcactca 2400 atacaccgta taataataat atagtaataa tatttgctta ggatctttaa 2450 aaagtttcat tttttcagac tcccacagaa atggctctgc acagcagagt 2500 gaagggggag agagactgag tctccaggcc agaaaaaggc caggtttttt 2550 gcttttgttt ttagttgttg cctggatatt gcacagaaag aaaaaataat 2600 tagcaagtta aacaaaagta ccgcaaagtt gattacattg gtatttgagt 2650 atcacatctt ctctcagaag cgtaagagac aaggtcgtga ccatacctct 2700 gcttagtttt gttttgtaat ggtgttgcta gtgatcggct tgtcaccagt 2750 tactggtgtt tctaaatgga ctataattgg ctacttgaaa ggacttcctg 2800 agaaagaaca ttttggagga cgaggagaga gtgccttctc tattttggct 2850 gctttcatgt gacatgcaag agaccatgac gtttaggctg ctgctgaggc 2900 agccccagaa atgggggccg agaggtcttt tcttcatttt aatagggtct 2950 gtaggtttgg gtggttaggt 
acagttctca gaatggaggt tcctggctat 3000 gaggccttga gaaagctgaa agtctccttg ggagtgtgtg ggtgggggga 3050 gtcgagccca tctgttcatg ggcaggtgtc agccaaagcc cttgcgggtg 3100 gttttgaggt tggtgggaga aagcatccgt ggggtttaga gttgtggcct 3150 tttcactact tgcagttcct ttccccgact tggctttact ttctggtgtc 3200 caggggtctg ggccagatgc tgagattcct ctcagctgac aggtgtgggt 3250 tatgggcaaa cccttccctg gaggacataa ggcaccggat tggactgctg 3300 atgggttgct gttggagttg tcagggcctt ggaatagtct tcagatagac 3350 ttgggttagt gtgacctggg gcaggctgca ggtttggagc catagtaccc 3400 cccgccccca caccgggcac cctgctctgg gctaatgtga ggcttgcagg 3450 agtgagtgat gcagtgggaa ggggggcctt tcctgaggat tctacagctt 3500 tctccaggga atcctcccag gtagtttagg cctgcaggtg ctatgctatc 3550 cttctttcct aaccctgtct caggtcctca gcggggccat gcggcatcca 3600 cttataaccc tgcagcgagg ccctcttttc tggccacctg ggtgtttgcc 3650 tgctgagatg ggaggaacag tggccttggg cttcttcccc cgtcatgttt 3700 atctctgctc agattgggca gcagctcaat gggacttgac cagctgtggc 3750 actgccagtc tgaagatgag tagggtgatg gggggaggtg ggcagtacct 3800 gaagctgaac tggtgagaga ggcaggctgg cctgggggct cagctggggc 3850 ctgggatggt tggtacagtc ccctcagggg ggtaggggag tgagtgttag 3900 actgcttaag cctcagaggc cgctcttgcc cacctatgct ttgaggagat 3950 cctcttcatt tgttcaaagg gaagactctg atctagagat gggcacttgg 4000 accagcaaac agcagctaca ggtagccagg gcacccgagg agcacttgct 4050 catgagccgg tttccctggt ttttatgggg gctgttgctg agcgtctgcc 4100 agggtttgtg tcctagcact tgctggtctt tgctgggctc tcagctctca 4150 ggtgtttctc taccagcacg tttccccctc cctcatatgc acacatgtgg 4200 acacaagcag gctgcccagg acagagtgta ctttgaggct tgggaaagga 4250 ctctctctcg cccttttggg gatgagcctt ggaacctcat caccttccgg 4300 cttggggtgg agcttcatcc tgggggttga agctttaggc tcagataact 4350 agtcttgtaa gccagttttg tcctgttgtt tttttcgtgg aaaataatgt 4400 attgacgtat acacagacat tctttgtcta acagtctgag attgagaaat 4450 accctccatg actatttggt ttgctttcat ggtgaaactt ggtcgctttc 4500 ttagacacag cctatggcaa taagagtgat ccctggctgc tgtaattcat 4550 tccagacttt gagcaaacac aaggcaccgc ctccacctgc agtggagcct 4600 ctgatgaacc aaatggaaac tccttgggga 
atggggagta agagccaaat 4650 gtgggattgg acttaaactg cagcttctta gaactgtagc attccacgat 4700 gggattgtct agtgctcttc ctggaggtta ctattcaata gttggctagt 4750 gcacaggttc aggggtgacc tgatatgccc tagcgtttca gaagatccct 4800 gcaaggtgtg tcttttggtc catctgaagg gtcttgtatg gtgatcttgt 4850 atggatatcc gtgacggcta aggcatctga taacttcatt ccttcagttc 4900 cagcagtgtt cctgtattat gctgggcact agagctacaa agaagaaaac 4950 aaagtgcctc ctcttcagga actcttaatt taggcagggg aggcataatt 5000 gaacagtgct gaggtcatct aggggaacca aagtgtgtat ttatcccctt 5050 ccctatcact cccctccctc cttcatttct tcctttcttc tttcagaaac 5100 tccaagttca tatcaaaatt ctccagccct ggttttattt ggttgtgtga 5150 aaattttcct ctaatttctg aagctatgca ttagttctgc tgagtaatct 5200 ttaacttgct gctttataat gattataatg agatatcact gggtattatg 5250 gtctttgggt agcagcaggg tagggatttc caggctggga ctaagctaat 5300 ttatgggttg ggaattatgg ggcagttaat agcaaggcag tccaagcttt 5350 ccacagattc caccctaggg accatccaga cttaaggaac agggccggca 5400 ggctcatccc ctttgcactc agctgggcta tgggtgtgtg tttgtgaaag 5450 aggtttattc agtagtcata cctgctgatt tccctgctat ctgtttaccc 5500 agtgcctcct gtaccttgtt tcttactctt tgttctctgc tcttactatg 5550 aagaagcaga gactggaatt ctgcttgaac ccacatctac ctggaaattc 5600 cagtttttct tgtccagtgg agcagcaatc cagttgtttt aggacaaatg 5650 gtctgccctt gaagcttaaa tcctttgagg gcctggcatg gtgacagttt 5700 tacatttggc tttggtatag actggtgtgg tccctgggca gtgaggtcac 5750 tgtaaggcca gccagccaga ccctggctcc taggggaatt aacaaggcat 5800 gggattagac tcacagggtc cctcctgtcc ctaaacttgg taggggttcc 5850 tgggagccag actgcgatta agattgtaga gacctgagac ctgagttgta 5900 ggggcctctg tgttgatctg ggccattgcc gggtgagctg aggcggtcac 5950 tagctcaagg agtgatctca ggatattgtt ctgtaagtca gagacctcca 6000 ggttggagag tggggcttgg gggtggggga cagggtttag tggggagctg 6050 gttctgggtg aatgtggcct aaagggattt gtccttagaa gacagagggg 6100 tgagtcacac actcagtgct tcaggttcca ctttgcggct tggcctcagc 6150 ccgccccttc cctgcacaaa tgaaggccag gggctatata attggctgtt 6200 gctgaattct ttggcagtga ttttaaagtc tggtctgggt gtgttatgta 6250 gctgcttctc tatccactcc ccacacccgc tgcttctcca 
gagcccctca 6300 caaagcccag gcagagagag agagagagag agagagaatg acttgcctca 6350 cagagatgtt ggggataggg ataggggtat gggtctttgc ttttgccttt 6400 tgagggggga taatctcttc cttcatttta aaagtaaaaa gtaatgcagg 6450 ctcattgaaa ataatttgaa aagttgaaag agatataaaa gcacacccaa 6500 attcctatca cccaaaagaa acataccggc atatttccta ctagtctttt 6550 tcatgtttaa gaatatagct gatatatttt tttttctttt tctttttgag 6600 acagggtttt tgctctgtca cccaggctgg agtgcagtga tcacggctca 6650 ctgcagcctc gacctctcgg gctaagcgat tctcccactt cagtctcccg 6700 agttgctggg accacaggtg cacaccgcca tgcctgacta atttttgtat 6750 tttttgtaga gatggggttt tgccatgttg cctaggctgg tctcgaactc 6800 cagagctcaa gtgattcacc tgccttggcc tcccaaagcg ctgggattat 6850 aggtgtcagt caccacaccc agtgttatag ctgttgtctt tatagatgaa 6900 cagatagatt gacatagatt catgtagata gcctggtgtt cagcattttt 6950 catttaagat tctgtcacag acttgaccct atacctttaa aaatcacaaa 7000 ggcagtatca tagtctgtca gctgaatatg ccataactta aaaaaatcat 7050 tcaactgttg ctgaacacac acatatacat atatagtttt tgttttttct 7100 tagtgatgta gtgatgcttg tgcagaaagc tttatgtact ttttggatgg 7150 tttctgtagg agagctttct aaaaaaggaa aaaaagtgtt gaatgttttt 7200 tgagaagggc tagattttca agccagtctt acaaaaggat agactcattg 7250 gaaattccag atttgcttag tgctggcaga tgagtatcac ttattgctga 7300 acaatgtgtc tagaattctg attaaaaaag aaactaggtc caggaagtgc 7350 ctgggggcag gggcaaaggg ccaggctgca ggataggctc ttaggatctg 7400 gctgagcaga aatctgctgt gaacagaatc ggtgggggtg atgctttctc 7450 agtaacttct ccatttgttt ctttagcagc taagtccctg tgctggactt 7500 ctgtggacta ctgtggctct ggggctgtgg ttgtgggtga acaacagcta 7550 gctaaaccag tgctgttgac atcattgaga tgtgacgcac aggaaggtgg 7600 gagcaagctt gcaaatcaga ttctgaaaca tatagcacag ctctcccacc 7650 tccaggtggt cctgagatct agggaggagc catagtgaga aactttaggt 7700 ttctaggaat tctcttaggg agaagctctc ttagggagag gcagaacctg 7750 gttctcagtt ggggctgatt caggtgggtt agatcaataa agcctcaggc 7800 cagtgtgcca ggctattccc aaggagtata ctttgaagtt actcccttta 7850 gaatgtcctc agtggagata aattctctct gaggagcagt tttgtctgcc 7900 ggggtcattt ggcacaaagc ctggagtgct agggcgaggt tgcactgagg 7950 
gaaggggcag gattatgtca gcagtgtgac ggatacagtg tgaggtcagg 8000 ctccttcctg ccccaccacg ggggcctaga ggtcatgggg agggtccctg 8050 gcaggggatt caatcattgc ttggccccat gacagagtat attctaaaaa 8100 tgccttaagt ttttttcttt caaagtttct tcctgttttg cataatggcc 8150 ttttgccttt gacatcctga aaccgcagag ctgtcattgg tgttgcagga 8200 cactgccagc ttgaaaaaaa tcaacaacaa aaaaagaaac aggaaaggat 8250 gtggagttca gggtgcggcc tagggaagct ggtatttgcg ttatgggatt 8300 gtggggatgt ggtattaagg tgttgggtag cgcctgacat ttagaggagt 8350 actctgggca gagtccctgc ctgcccaaga ataggtagaa ttgagtcttc 8400 acaccaaagt caggagagac cccctccccc caggaagaga atgaacaggg 8450 actcatttcc tcattcagca aacttttatt ggtaactaca ctatatgaag 8500 tgtgagagat agacatgaac aagagaggcc cccactcttg ggcagtccct 8550 tagtagtagt agatagactc tggcaatatg gtgtggtcag agagaggaag 8600 cctgggtgct ttgagggtac tgaggaggtg cagggagcca aatgggtggt 8650 ctgggccagg gccagagtca gaatgaagga cctctcttcc agacgttgat 8700 tttagcatct ctgtctctca gtatgtttga acagtctccc ttattggaag 8750 ggcaggagtc tactgctaaa agtaacctgc gatttcctct acttgctgtc 8800 atgtggaaag aatactaaag ctgaaattcc aaaagttgca cacctttacc 8850 agcagggcag gagaggaaag gaaatggagg cagagtgagc tgaagatgat 8900 aaaagaaaga gaaggtggtg cagtttggac tgttatggac agaggaagtc 8950 tgagggtagc tggactgagg gatcaaaggg aggcagttga aagggaagag 9000 agctgcagag agggatttct tggtctgcag agggtaggag caagccttga 9050 aggctgctgg agtgaggatt ccgagccctg gtctttattc tttttctaat 9100 tcattacatc attttaggca agtcctaact cctttggtct ctgttgtctt 9150 tctgaaattt gagtgggctg ggcctgctgg tctttagcct ctgtctttct 9200 ctacctccta gattccagtt tggcgagtgg gggggaaaac ctggttgtat 9250 atgcaacgtg aaaggcctct ggaattcctt ttgaagctca ctacccatga 9300 ggcttctgct aaggatttca tcatgtctgt ctaagcagac ataaaaattt 9350 tagcaggtgg atgacccgta gaaatggcac aaggaatgtt tctttctgtc 9400 acactgtggt atttgattta agaaagttgt tatcctctct gtgcctcagt 9450 gttctcactt gtaaaatggc aataacagta tccacctcat agatgttatg 9500 aaatacaggt agtagccacg aaagggctta aaacagtgcc taacacagaa 9550 taagttgtga atatatgtta tttattattg gtagtataat gcttatttgt 9600 gaagattttg 
gcttttgctt tataggacct tttttttttt tagttgaaaa 9650 tacaatgtta ccatgttaaa tgttaaaaaa aattctactt accattgtaa 9700 cagaacatgc tcccacttct gtaacagagc ttgctattac ttttcaaatg 9750 catacatatt ccaatgcata tattccaatg cagttgtaga gtgaaactgt 9800 ttgcatgcag ccatttttat ccaacattat cttataaaat gttatgttgt 9850 ttatgattat cctaattatc ttttgttgct gtctagtatc cttatagata 9900 ttccattagc atacactatt ccaggtttca ctatcgtcga taatctagat 9950 atgaacattt ttgtagtgtg tagctctttg cttcagttga attactttcc 10000 tgggataaat tcctggggaa gaatttctag gccagaggat atggtcatct 10050 tgacaatact gattcacatt gctgcattgc tttccaagag gtttggaatc 10100 attcacaggt tctaaattgg aaaatcctgg cttttgaagt atgtggattc 10150 taagggcgat ttggatctag ctggagcctc acactgacac ttccagccag 10200 tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtagt tccctatgct 10250 ggacaccgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtagttc 10300 cctatgctgg acaccatgtg gcctttctgg acattagggt tttcctgtga 10350 ttgcctcaga gcagttcctg ttgaattcac tctgtgtcca caaaaggagc 10400 cttactgtgg ctctttcaac acccacctac ctttgccaag ttggtttaca 10450 gaaagtaaga acattctttc cttcttcctt gatatgtggc gctaaaccta 10500 tagcatgggg caggctctgg ctttaaaaac ctgacttaaa aataatggtg 10550 ttgatcaaaa agtttgtgga tcagtttttg gaaacactgc atgtagccat 10600 ccatagaaac ttatattctg ttgggctagc ctgggcgcct gatcatttaa 10650 ctcatgtgga tgaacttcta tgtaatagcc ctggtgtatg ggatccagaa 10700 acagggccct aatgaagaaa ggcttttaaa ttatgttgga taaaaataag 10750 ttgttacaat agcccaaagt ctgcaaatat gaattgccag ttctgtcctt 10800 gtagtcatcc accatgtgcc tgcatctttt gtagactctt gtagattcag 10850 aagcccactg aattgcataa atgatggaat gattttagac ttagtgattt 10900 cagtgactaa aagtttacag atcctggccg ggcacagtgg ctcacacccg 10950 tattcccagc actttgggag gccgaggtgg gtggatcacc tgaggtcagg 11000 agtttgagac cagcctggcc aacatggtga aaccttgtct ctactaaaaa 11050 tacaaaaatt agccgggtgt ggtggcatgc acctgttgtc ccagctactt 11100 gggaggctga ggtgggagaa tggcttgaac ctgggaggcg gaggttgcag 11150 tgagcccaca tcaggccact gcactccagc ctgggtgaca gagtgagact 11200 ctgtctccac ctcccccgcc ccccgaaaaa aaaaaaagtt tacagatcca 11250 
gcagatgggg catattcaat ttgtgacagc cactcccttc accttatagc 11300 tatgtcatat gtcttcttct cctttgactg cattctgcag cagtcagttg 11350 tgacttaata tggcactctg ggcccactga attaggtcag agctgctagt 11400 agtatattgt tcctagagac ctagggcaag attttcttac tacataaaat 11450 gagggagata atttcttacc tcaagatgtt ggtaagagga gtgaatgagg 11500 ttagttatat ggtaatatca gtactctgaa tgtcttttga tcaatgccta 11550 actcatcttc ttgggcacaa aaggcataca gtcagcaccc ttaggccaca 11600 tataaaattc ctccaaatgc aggttttcat ctgccttggg gcagagtcaa 11650 gagaaagaag aggaagaggc gtgaggctct gaccacaact tagggacaga 11700 atatagccca aagcgagtac cccaggccac aaggagaagg ccgctatctt 11750 gttgaatcca cagcactgga aacttggagt gtgtgttccc ctgtgtcagt 11800 tacactggaa ttttatggct gctcacattc ttcccttcag gtggacgttg 11850 ttcatcagta tcctgggcaa gaggccatca taaaccacag acagctgagt 11900 gattaggaag aggagctgaa gagggagcat tagatgtttg attgagtctt 11950 aggtgagaaa gtatatcatt aaaacaaaaa gatagatgta ggcgggctca 12000 gtcttgtgtg cctggtgtgt tggtagaaaa actaaagcac aagcctgtag 12050 ataacctgct ttattctacc tcggggctgg tgttggaatc caggatgcca 12100 gaccctaaag tccagctctc tttccaacct actgaataat ccgagagaaa 12150 tcatgttctc tctctgggcc tcagtttgcc catgtataaa atgagatgaa 12200 ggattggctg ggatgctctc cagagtctct tcctgcctgg agttctgacg 12250 tagccatgta ctcctgctca gcatcgctaa atggctttgt ggtaggacca 12300 ttgagtgctg cctccattag ggccagctat gtaatgctgg ggtggctgtc 12350 actgggccct aagagccagg attggtctta ctggagaaat ccacatccac 12400 ctaaacttaa gacccagggg tgtccaatct tttggcttcc ccaggccaca 12450 ctggaagaag aattgtcttg gaccgcatat aaaatacact aattatagcc 12500 gatgaggtta aaaaaaaaaa actcaatatt ttaagagagt tcatgaattt 12550 gtgttgagct gcattcaaag ccatcctggc cgcatgtggc ccatgggcca 12600 tcggttggac atgcttgctt tagacctccc agcaattcta gtctctaaac 12650 aggaaatcaa aagtcaagat gaatagataa gttggtcagt gtgaaaaagt 12700 aattggtggg agccactgta gatgcagggt tctaggctcc atcaacaacc 12750 acctacatca ctgaacgaaa gataatgctt gttcagcact tattacatgc 12800 caaccatggt aaaaatactt cagatgcatt gttttcatga actctcacag 12850 cagctctttt tcttgcctaa atgccccgtt agaacctcca 
gtacaatgtt 12900 aaatagatat gctaagagac aacatatgtg tcttgttagg gggaaaatat 12950 ccagtctttg actattaaga atggtgttag cagtgggttt ttcctaggtg 13000 ccctttatca ggttgaggaa gttcctttct attcctggtt tgttgagtat 13050 ttttatcatg aaaaggtgat gggttttgtc aaatgctttt ctgtgtctgt 13100 tgagatgatc atgttttttt gtcatttatt ctattgatat ggtatattat 13150 acattgattt ttcagatatt aatcttgcat acctgggata aatcccactt 13200 ggtcatggtg tataattctt tttatttgtt gctggattga gtttgctagt 13250 attttgttga tttgtattca taacagatag tggtctgtag tctttccctc 13300 cctccctccc tccctccctc cctcccttcc ttccttcctc tctctctctc 13350 tctctcccct cccctccctt cttttcccct cctctcccct ccccttccct 13400 ttcttctctt tcatagttgt ttaccactgt cagaaaaggt ctgttcgttt 13450 tctttcgtcg tgagatcttt gtttggtttt ggtatcaggg taatactgcc 13500 tcaaaaaatg agtagggaag tgttccttcc tcttctgtat tttgagagag 13550 tttgtggtcg gtttttatta attcttcttt aaatatctgg tagcgttcac 13600 cagtaaagcc atctgggcct gatgttttct ttgtggaaaa ctttttgatt 13650 cctaattcag tttctggtta taggtctatt cagaccttct attttttctt 13700 aagtcagttt tgatagtttg tgtcttccaa ggagtttgct tcatctaagt 13750 catctaattt gttggcatac atttcatagt gattccttat gatccttttt 13800 atttccgtta aagttggtgt agggatagtc cctctttcat tactgattat 13850 aataatttga attttctttt tttcttagtc ttgccaaaag cttgtcattt 13900 ttattgatct tttcagagga ccaactttga gttcattatt tgttctcttt 13950 gttcttattt ttctgcttca ttaacttctc taatctttat tctttcattc 14000 tgcttgcttt tggttaagtt tgctttttct ggtgtcttaa ggtagaaggt 14050 taggttactg atttgagatt taaagatcat gctctttaaa cgttttgata 14100 gatactgtca gtttgccctc tggctttttc tcattaacag tgtataggag 14150 tgcttattcc tcacactcat accagccctg ggtgttacta acctttatat 14200 atttgccagt atcatattca gacatagtat cttgttttaa tatgtttctc 14250 tgattactga tgaagttaag caaattttca cgtgtttatt ggccatctgt 14300 ctttcttttt tcatcctttc tttcaagatg ggagtctttg ccatgttgcc 14350 caggctggac tcgaactcct gggctcaaat gatcttcctg cctcagcctc 14400 ctgagtagct gggactatag gcgtgagcca ccatggctgg cttgcccatt 14450 tgtatttctt atgtgagtat tttttctttt tttttgaagt ggagtctcac 14500 tccatccccc agagtggagt 
gcagttgtcc gatcttggct cactgcaacc 14550 accgcctccc aggttcaagt gattctcaca ccttagcctc ccaagtatct 14600 gggactatag gtgtgtgcca ccacacctgg ctaatatttg tatttttagc 14650 agagatgggg tttcaccatg ttggccaggc tggtttcaaa ctggcctcaa 14700 gtgattcacc tgcctcggcc tcccaaagtg ctgggattac aggtgtgagc 14750 cactgtgccc agctgacttt ttttttcttt tttttaaccc tttttttttt 14800 ttaccctttt tttggcccat ttttttttac cctttttctt ttaacccatt 14850 tttctattag ttttaaaaat atgtttgcag gagcttttta tattgtggat 14900 ttttcttgtt tattacatat catttgtaaa tatggtctct ccatctgtca 14950 ctcttcttta tctctggttt ctttagctat gtagaagttg ttatgttatg 15000 ttatgttatg ttatgttatg ttatgttatg ttatgttatg ttatgttatt 15050 ttttggagag ggagtcttgc tctgtcgccc aggctggagt gcagtggtga 15100 aatctcggct cactgcaacc tctgcctcct gggttcaagc gattctcctg 15150 cctcagcttc ccgagaagct gtgattacag gcacccgcca ccacacccag 15200 ctaatttttg tgttttagta gagacggggt ttcactatgt aggtcaagct 15250 gatctcaaac tcctgatctc aaatgatcct cccaaagtgc tggggttaca 15300 ggcgtgagcc actgcactcg gccagaagtt ttgaattttt atgtgtttaa 15350 atctatgttt tcctttatga cttcaggttg ctttcatact taagcaggtc 15400 ttcaccatcc caaaatgata aaatttttct cctgagtttt cttctaagtt 15450 ggttctttag aagccaccaa cttggcttcg acagcaaaag atgaacagaa 15500 tttctgttca actctcatgc tgcaagaagc tttatgtaat actccaggga 15550 ccctttaagg tcccagagtt ttcctccaaa tctatcagtg attctagtgg 15600 ctaagagtag aaatgtgaaa atttagccat gtgtgctgat agagctgtag 15650 taatttgtaa gctctgaagt tctaaggagt caggggagaa gggaaagtaa 15700 catttattga acatctatta gctcaataag aacatgcgat aagtatgtat 15750 atgtattatt tcacttacat ctgaaaggaa ggcataatta tccccactcc 15800 ttagagaagg aaattggagc tggctacatt taaagtagtc ctgacaccag 15850 agagatattg ccaggagtac ttggctggct gagtgcccag atggcccata 15900 ggagtagtgg gccctccaca gtccaaggtc tggttctagg tggagagaga 15950 aggatgtgct cgtagtcagc accgcagctc cagaaaatct gctggggctc 16000 caaaactgat tagaggggca gctgactcag taataaaact cccaggagac 16050 ttacttacat actggaatgc aaagttgcag ctttactggg aagattagaa 16100 ctgttattga gtagcttaga aatctctggc tgaattcact gcaagggaag 16150 
ccgcaggata agctaactgc tggtgagtca gcagtcagag cagggaagtg 16200 aatttaacat tagatgggtc agtctctcgt ggctgatgaa ttcatcccca 16250 caatactgta cacctgcctt agggaccttt gtctggacta ggggttgggg 16300 tccccctcct ttgtacagcc ctggaaggac acatccagct ccatccgcca 16350 tctctccctt acttatttcc ttccttcctt ccttctttcc atccagccat 16400 caagcttcct ttcatggcca ataatcatca ttggggtcta ctcatggact 16450 ctcttgcctc atgtatttgt tttattttgt cctcattccc acttctattt 16500 cccaggtata tcacaggcaa ctattctaac gtatttatag tttgtgtatc 16550 tgtttttgct cttgccaaaa tggaagccac tgctttatac atagatgtat 16600 tcttaacttt aaaaaaaatt tttttagatt aacctacaat aaaattggct 16650 ttttggcata tagtctataa attttaacac atacatattt ttgtgtatct 16700 accaccacaa tcaggataca gaacagttcc atcaccccaa aaaaatccct 16750 cttgtagtca cattctcctc ccacccttaa tcccaggcaa ccactgatct 16800 attcttcatt actattgttt tgtctttttg aggatgtcac ataaatggag 16850 tcacacagta tatatacatt tttttaaaca tatgtaaatg gcattttata 16900 gctcattttg attatatgtt tttcatccag ttctgttttt tttttttatt 16950 tttaaaaagt ttgacataac ttcagactta cagaaaagtt gttagactaa 17000 tacaaagaat tcctggatat cctttggagt ccctaaatgt taacatttta 17050 ctatatttac tttttccttc tctctctctc tctctctcgc tctgtgtgtg 17100 tgtgtgtgtg tgtgtgtgtg tgtgtatcta cctgtagata gatagatatt 17150 aatataattt tagatagatg tatctagatc tctctctctc atatatatgt 17200 gtgtgtgtat atatctatat ctatatctat atatatctcc ttttaccctt 17250 aaatattcag tgtatatttc ctaacaacaa ggtgatttaa aaatatatat 17300 ataaacatag tataattaac aatcaggaca tcaacattga aacatttctg 17350 ctatgtcatc tacaggcctt aggaagactt tgtcaggtgc cccaataata 17400 gccttgatgg tagaagaaaa ccatgtgttg tattcagttg tcatgtctct 17450 tagtgtcttg taatctgaaa taattcccaa gccctttgga tttcatgaca 17500 gtgacattgt tgaagagtac aggccagtta ttttgtagaa ggtctctcag 17550 tttaggtctg tctgatgttt cctcctgatc agattcaggt tattcacttt 17600 tgacaggaat accactgaaa tgatgctgag ttcttctcag tgtaacgaga 17650 tctagagaca cacactgtca gtttgttcct tattggcagt gtgaaccttg 17700 aggatttcat tgtagtggca tttggcatta ctccattata gttactattt 17750 taccatttta aattaaaact atctggccgg gcgtagtagc 
tcatgtctgt 17800 aatcccagca ctttaggagg ctgaggcggg caaattgctt gaggtcagaa 17850 gtttgaaacc atcctagcca acataacatg gtgaaacgcc atctctataa 17900 aaaatacaaa aaattagcct ggcgtggtgg cgcatttgta gttccagcta 17950 ctcaggaggc tgaggcacaa ggcttgcttg agcctgggag gcggaggttg 18000 cagtgagctg aaatcacgcc actgcactct agccagggtg acagagtgag 18050 actctgtctc aaaaaaaaaa agtaaataaa taaaaaaatt ttttaagtat 18100 cttatgggca tatacttgtc ctgttactcc tcaaactttc atccactttt 18150 ttttttttaa attttttttc ttacctttca tcgttttctt gatatccact 18200 gggttttagc atctacaaat gattcttgcc tgaatcagtt attatggtag 18250 ttgatggttt tctaattcca ttattccttc tatgtttgtt aattttggca 18300 ttcttctata aggaagagct tacccttttt ccctattaat taattcatat 18350 attaatgcag acctatgcat tcttacttca ttaaatcata atcctttact 18400 atcattatgt attctgatgt tcagactatc ccagatttag ccaataagat 18450 ccccttcagg ggaatggtct ttgggattcc tctttagagg ttcctggttc 18500 ctgttttctt ttgacatatc ctattactct ttgagcattt tttttttttt 18550 ttttactttt aggcacagca agaagttcca tggtcctctt gttctttccc 18600 caactcagcc ctagagtcag tcacttctcc aatgagctct agttcctttt 18650 agtagagaat cataattaga aaacaagaat cagtgccaag tgtgcacctt 18700 tgtttttaag gtccatccac gttgccgtgt atatgtccag catgttgatt 18750 ctaactgctg aataatacct catgattgtc atccatccca gtgtttcttt 18800 ttcccttctg taatgaggga ctcctggact gcctccagca ttaccttcac 18850 aaatattgct gtgaggaaaa tccttaaacg tttcctttat gggcaacgtg 18900 tgagcatgtt tatgttgatt caggggtgcc agacacagct ccagaatggc 18950 tgcctcagtt tacatttcca ccagcagagc atgacaggct ctgtgtctcc 19000 gtgaataatc agcattaacc agcttcctat tttttgccaa actaatagat 19050 gtgctaggat aactctttgt tttaacttgt ttttctctga ttaccaatga 19100 gctggagcat ttcttcatat gcctgatggt ctttgggatt cctcttaggt 19150 aaattgctta ttcattataa tcctttgcct gtttttcact ggagttctta 19200 tatttttctt gaagatatgc aggaattcct tatacatcct agatattaat 19250 cccttcctgg tctcagacat tgcagatatc ttctgaatct gttatttact 19300 tatttattta caattttttt tttaagagtt ggggttttgc tctgtcaccc 19350 agactggagt gcagtggtat gatcatgact cattgtggcc tcgcaatcct 19400 gggcttaagc gatcctccca 
cctcagcctc ctgagtagtt gggactacag 19450 gtatgcacca ccagacttgg ctaattttat tttatttttt agagatggaa 19500 gtcttaatat gttgctcagg ccaatcttga actcctggcc tcaagcaatc 19550 tttccacctc agcctcctgc atctattata tatatgttca ctttgctcat 19600 gctgtatttt gttgcaacat aaaactattt ttcccattgt tttgtgcagt 19650 ctctcaccag cactcttctt tttctgtaac tgtgttaatg ccctttgttc 19700 ttccatatgt taggtatgct ggtatagttg aactctgctg actctcctca 19750 gtaaacagtc tctttttatg acaccttatc ctctactgaa ttctctctat 19800 caagaatgac ttggccgggc atgggggctc atgcctgtaa tcccagcatt 19850 ctgggaggcc gaggtgggca gatcacccga ggtcagaagt tcaagaccag 19900 cccggccaac acggtgaaac cctgtctcta tgaaaataca aaaatcagct 19950 gggcgtggtg gcaggtgcct gtaatcccag ctacttggga ggctgaggcg 20000 ggagaatcac ttgaacctga gggggaggtt gcagtaagcc gggatggcac 20050 attgcactcc agactgggtg atggagaaac tccatctcag ggggaaaaaa 20100 aaaaaaaaaa aaagaatgac ttgtcttcct cttagagtgt gaggtctaca 20150 tacaaatatt attcttgtat tcagcaaatg tatgtcatag gcctagtgtg 20200 tgttaggaac tgtgctgtca ccaacaaagt ttagagaggt tataaaactt 20250 gactgtagct ttttagaggt ggaggagtga tttgaaacct aggctgtaat 20300 tccttcctcc tgtgattcct tcctactgtg ttgccttccc ttgaaaattg 20350 catttggggg ccaggtgtgg tggctctcgc ctgtaatccc agcactttgg 20400 gaggctgagg cgggtggatc acctgaggtc aggagttcaa gaccagcctg 20450 gccaacatgg cgaaaccccg tctttactaa aaatacaaaa attagctgga 20500 tgtggtgtgt ggtgacatgc acctatattc ccaggtactc agtaggctga 20550 ggcaagagaa tcacttgaac ccaggaggca gaggctgcag tgagctgaaa 20600 ttgcaccact gcactccagc ctgagtgaca gagtgagact ctgtctcaaa 20650 aaaaaaaaaa agaaaagaaa gaaaattgca tttagttcct gtagactgtg 20700 tgtcaaatgt ctaaatctct tctaacaaat ggcctaagga ggtgcaaagc 20750 gaagcatcct caccagcatc ctgacttggc agtgaggcat gggaccctgg 20800 agggagtagt ggtaagtgtg actctggaat tcttcctggg ctacttgtca 20850 gtgactggct ccagattgag aggagagccc agaggacaca ggtggctgcc 20900 ccagcctgga ggtgaaagtc ttaaaataaa atgccagatg cctagaccat 20950 tctaaacctt tctgagaagc tgaaatcatc ccttctggaa gcgctctagt 21000 tctaaaagga cagatataca gcaagatctt cctggggcta atatggagtt 21050 
tataggcaag taggcctcag aacctttccc tggtagtgat atctgtgggc 21100 aggcacagtt tccacacttt ccagaaattc cagcggaagg agtgagaagg 21150 aggaatctgc ccttgagtga ggaccaaaga aagcagaaat tcctcttggg 21200 aatttttcct ccagagacca aacactactt gggagcttgt ttactgggct 21250 ttaaaagctt gtgaccccca gtcactcttt cttgacccca aggctttgca 21300 tttctgtggc ttccccactg gacagaagtg gaactgtcat gctgcctgtt 21350 ctggggtctc ccagaggttt ccccatgtcc tctccttgct tctactgccc 21400 cacagaattg gggatctgtg accacatatg gtatagaatt aatgcttgag 21450 aatggtttag ttcagtgatg tcaaataaga ttcactttta tgccacctcc 21500 atcagttgaa ggcccccctg gcccctaaat tggaaaagat tctgagacag 21550 aatccccgtg ggtacagcgc agggacagta aaggcacgtg tgctgtgatt 21600 tgctatccac tgtgtggatg catccaggaa tatcagaacc ctggaagatt 21650 atttaagggg aagttaggac agcttttttg ccaatccaag ggtgttcttg 21700 aggaagtctg tcttcctgta tggccttcag tttctttcct gtgtaaccat 21750 ggggccaaca cataattccc acagctctat tggcccttgt ctgccaggat 21800 tctctagggt ctgattcgag gtggatcctg gccctttgag gtggcagaat 21850 ctgatcatgg tgctgtttcc ttagatttag gccttgatac ccttggcgag 21900 agcatcctgg gctgagtgac cacctgaggt ttttctggtg attttgtgac 21950 ccatgtaaaa ctttgagctt tgggattatt ctctcaagga aatagtgaca 22000 tttggtgaag agcctgtttg gtgtggctat gtgaggctta gccaagaaaa 22050 tgcaccattt ttattaggag gttaggccat ccgttgccac aaagtgtcag 22100 atgctaggcc tagagcctgg agaaaactta ttttaaaatt gatggggtgc 22150 tggaggggtt ggggggtggt ggctgtagct catgaatcag gtgctaaacc 22200 tagaaacaaa aggcctcatg tggcagactg tttctgagca cagatgaatg 22250 gatgagcaac tggcgcaact ttgcccagtt ggtccagctt cccacttggc 22300 cacctaggct tgctgtgaag acctcgtctg gcagaaatga gagtgttttt 22350 gccccatctt gatcttaact gtaatttaag actaaaatct tagattctaa 22400 aacatcaaag gcaagatggc tcccagctct gtgagctcag cttctcacct 22450 cttagttgaa caagtgcagt gtgggtcaat acatgattgc tgctcttgct 22500 gccaggaact gtcccagcat agaaaggaat gggacacaat ccctgccgtc 22550 aagattctaa gggaggaagc aggcaggtcg actggtgcct catctctgca 22600 gggctccagc caaggtttgt gaaggatttt gcaggcatat ggagtgggga 22650 ctgattgatc ccgagagggg actggggaaa gctctgaaga 
ggggatgaca 22700 tttggtttga actccaaaaa atggttgctt tacctgtttc ctgaagtttt 22750 tgaggtggct tataagaaca tataccataa aaaggaccaa tataaattta 22800 aaatcagaaa aagagaaaat gggctgggca tggtggctca tgcctgtaat 22850 cccagcactt tgggaggcca aggtgggtgg atcgtgaggt caggagatcg 22900 agaccatcct gcctggccaa catggtgaaa ccccggctct actaaaaata 22950 caaaaaatta gctgggtgtg gtggcacatg cctgtagtcc cacctacttg 23000 ggaggctgag gcaggagaat cgcttgaaac ctgggaggcg gaggttgcag 23050 tgagctgaga tcgcaccact gcactccagc ctgggcgaca gagtgagact 23100 cctcctcaaa aataaataaa taaagagaaa atggaactta gaaaattaag 23150 aggaagagtg aaaaggtaga tatttagtca ggcacagtgg ctcatgcctg 23200 taatcccaac actttgggag gccaagacag gaaaatctct tgagaccagg 23250 agcttgagac ttgcctggca acatctcagg tgagacctta tctctacaaa 23300 aaatttaaaa attagctgag ctgtgtggct cgtgactgtg atcccagcta 23350 ctcaggaggc cgagaccaca gcccaggagg atcgcttggg cccagcagtt 23400 tgaggctgca gtgagctggc accactgcaa ttcagcctgg gctacagagc 23450 aagacccagt ttaaaaaaaa aaaaaaagat attcaaacca tgggtcccaa 23500 cgtagttatt atatttgacc atttgcaaaa gctgaaagca aaacatgtta 23550 cacattttca gagaggaaaa tacacagtag ttcctgagtg taagttgttt 23600 ttcttgacct cattcttaaa ttgcttcatg agggtgggag ggaagtggta 23650 gttaataagt gaacctgtaa accagcgttt ctcaaaatgt agtccaggga 23700 attgcatcaa aattgcagtt acctacagtg cttgttaaaa tgcagattcc 23750 tgggcccctg ccccaggctt atcaaatcaa tctggtgagt aggactcaag 23800 aacctgtaaa ttcacatact tctgcagatg attcttcttg cactgcacag 23850 catgaaagcc tctgcaatag acagaaagct accagcattg cgaaagcaac 23900 ttgagtgctt ggcctttgaa ggttgagtgg gactttaatg agggagagag 23950 taaggcatga gaaatggcag ttccactgag gtcagtcagt ggttcattgc 24000 tgacgaagtc acttttaagt catgttttag aagaactacc aagtgtggca 24050 ggtcaggcat gtggcaggac tgtttctgag cacagatgaa tggatgagca 24100 cctggcccca ctgtgcccag ttggtctagc ttcccacttg gccacctacg 24150 gtctgctgtg tggaccttgt ctggcagtct cctttaattt attttttatt 24200 atttttttct ttttgagatg gagtcttgct ttgttgccca ggctagagtg 24250 cagtggcatg atctcggctc actgcagcct ccacttccca ggttccagcg 24300 attctcctgc ctcagcctcc 
caggtagctg ggatcacagg caagtgccac 24350 cacgcccagc taatttttgt atttttaata gagacatggt tttaccatgt 24400 tggccaggct ggtctcgaac tcctgacctc aggtgatcca cccatctcag 24450 cctcccaaaa tgctggaatt acaggtgtga gccaccgcac ctggcctatt 24500 ttttttcagc aaattctttg tttttctctc tgttcccaaa tgcagggtac 24550 tgagaccaca gatgtattct gtttcctgtt gaaaaaatgt ttctcactta 24600 gctgggtgtg gtagcatgca ctgcagtccc acgggaggct gaggcgagag 24650 gattgcttga gcccaggagt tcgataatca tgccattgca ctctggtctg 24700 ggtaacagag cgagaaactg tctcttaaaa aaaagaaaaa gaaaaagagg 24750 tcctagggaa agaaacaaat agtggcttgg atggtgagtt ggtggaaaga 24800 acagtgggtg ttgggggtgt tgaacttgtg tttgtgtgtg gtgtacccaa 24850 gacatatcat gtcagcatta agaatagact attcctgttt tctggtcact 24900 gagttgtatg ttttgacatc cttattttgg aagatacttc cttactagga 24950 atgggatagg gagggggtca cctttcccat ctgtgggtca tattttaaaa 25000 tatttattgt tcaagtttaa agatataacc aaaggtataa agaaaaatac 25050 cacaaacatc tgatttaaga aacaaaccag ccgagcgcgg tggctcgtgc 25100 ctgtaatccc agcactgtgg gaggccgagg caggcagatc atgaggtcaa 25150 gagatcgaga ccatcctggc caacatggtg aaaccccgtc tctactgaaa 25200 atacaaaaat taactggtca tggtggtgtg tgcctgtagt cccagctact 25250 cgggaggctg tggcaggaga atcgcttgaa cccaggaggc ggaggttgta 25300 gtgagccaag attgtgccac tgcattctag cctggcgaca gagtgagact 25350 ccgtctcaaa aagaaaaaaa aaagaaagaa atcatttcct acaccttcga 25400 agccttcatg agttagattt tgaaacagtg caaaatgctt cacgtgagaa 25450 tcgagagtcc cttctggtgg ctctccatcc cctgctcttc tgtcaggttt 25500 tcttgtaggt ttatggaaac ctttgttact tgtgcaggtg gcagagaagc 25550 agagaggata gctgcgcgcc acccacacag ctaggattta ttggcgtact 25600 cccacgtgca tggcagccaa gtggacacaa ctctgtgatg aatcctccca 25650 agagaactga ggggccctga tggaggagct gcttctttgc aaagctttcc 25700 ttgactctct tcctgtcccc tagttgattc cccttctgtg ctagttttag 25750 cttattgttt gttacctgtc acacttagca gtactgttgg ctttgctggt 25800 ctccttgact actgggggta aagacctttt gttgttgttg ttgagacaga 25850 gtcttgctct gtcgcccagg ctggagtgca atggcgtgat ttcggctcac 25900 tgcaaccttc acctcccagg ttcaagagat tctcctgcct cagcctccta 25950 
agtagctggg attacagcta caccacaccc ggttaatttt tgtattttta 26000 atagagatgg ggtttagtag agatggggtt tcaccatgtt ggccaggctg 26050 gtctcaagcc cctgacctca aggtgacctg cctgtctcag cctcccaaag 26100 tgctgggatt acagacatga gccaccatgc ccagcctcaa agacctcttc 26150 tttacttgct caccctgccg cccactcccc taccaacccc tgcatgccct 26200 ataccacctg gcacatgata catactaact gggtacatgt ttgaatatga 26250 atggatgtgg tgctgtgaat gcttagggga agtgggtgaa atgcttaaga 26300 accaaccttg agtggtctgg gaaggcttcc tgggagggtg gtgtttgagc 26350 taaggccagg cagctgttag atttgttaga ctgaagccct tgcagactta 26400 gagagcttgt gctcttccca gaatgacggg tgagccacgt acagtaaatg 26450 gtgcttctca tttctagccc aaggggcctc aaggggcacc gtgatttcac 26500 gagaatgctg caagcaaatc ttttctcaag ctggggaatt tggtggtaat 26550 gcctggctca gcttgcggtg cgcacctggc ctttggaaga ttggtacaga 26600 gagaagcggc ccatccacat gagcctgtgg aacagcactg gtgggggagc 26650 tgatttgtga agaggggctg tgcagtgtac tgtcaggtct gagacccagg 26700 aagaaattcc agtatcccag ctctcagaat cacagagttc taggcactgc 26750 ctagttccac gtgttcccaa atgtttcctg aatacttgga tttcctgtcc 26800 agagaatttt caaaacaaac ttagaggcct gacccatggc tgccaaggaa 26850 ggattttttt tttaaattaa attttaaaaa tcagtccagc atgaaaatct 26900 atgatgattt cataagagaa aggacatttt aatattcaaa gagtaagaag 26950 cacttaatct tggaagaaag ggcattccta tactttgatt acctttagtt 27000 taattaaaaa acacctacat ggtctttact tctgtgattt cattcctggg 27050 ctagtgaaac attgtcacaa taaagcatca ggccaacgct tctttcgacc 27100 cactggccaa tcagttgaca aacagtgact agatgtttca gcctattttg 27150 ctgaggctaa aggattgaac tagtgcttca gccagcatga aaaccagtca 27200 ggagtccgtg ctggtgttgg cttagattag cagggccttt gatggagggg 27250 catgtatgtg tttgggtttg ctgtgccagg caggggagca gtggaatttg 27300 tctgaattga gctcacacat tgaagttatt gagcgactta catgcaaggc 27350 catgacctgg actcccagcc gagaggccca cgtggcgggg cttgagctgg 27400 gggagccgag gacagcttac atctgctcat ctgcttacgt aaccctgcct 27450 cccagcttcc agagccaaga aaacacacaa gccagcccag cggggccgag 27500 agcctgtggt agcacacgcc atgcgccgca cagcaagggc gccttggctc 27550 ggcttgaggc ctgtcatgaa gccctcagcc ctctgcctcc 
tcccagagct 27600 tctccccacc accccaggca gtggctctga aacctggtcg caggtctgca 27650 tgattctgaa cagaggtagt cgttgccttc ctggagtctg agctctctgg 27700 agtttctcac tgggacagag ccaggtgtgt agcagagcat ggtccctgca 27750 gtatggcagg aggtgtgcag ggcattcagg aggcctcctg gctggcactc 27800 gacccaatta gtcattcaac gccaggtctg gggctgctgt ctgttgtctc 27850 aaaggtgtga gctgcaagat ccttagagtt gtggagaaaa aattgccaga 27900 ttggcaagaa gggcaggatt gggggtcaag gtgtctcagt gtgttggaag 27950 catgatgggg gttgtgcaag gggcacagcg agttcagaag ggagcaggag 28000 agtgagaaga ggctgttcag tgataaagct ctgcacagag ccattggagg 28050 agcaagctcc ttgaccatcc ttaaaccagg gtaattttca tttaggttct 28100 gccacacgct cagcagggaa ctcctggaag gcaggatttg tcttgtccat 28150 cctccctccc tacctcaacc cactcctcct tgggctggca cacagtaggt 28200 acccagaaag tatcaattga aacaaattga aagtggtctt gatacatatc 28250 acagggcaag tttgcagtta acagacattt cagagtaaag actctctggc 28300 ttggtgctcg atcggcttct gtgggttgtc agcatgctgt ggacagcccc 28350 ggcatgggag cgagtgggcg tgtgtgtgtg tgtatgtgag ggtgagagag 28400 cgttagtgtg tgtgttgggg ttggggagag aggaggggga atagaagatg 28450 gaccacccgg gtatcagctt ctgccctggg gagatggtgg tgtcagttgc 28500 tgagggaatc ctgagaagca ggtctggctg taggtggtga tggtggtggg 28550 gttgcatgag aatccatttg gggcaggttg aatttgaggt gcccatgaca 28600 tatggctagc catgttctgt tggctgtgag gtcaggagag agacatgaga 28650 tggaaacaga ggtttgggaa ctgtcatgtg cttaaaccaa agacctgggt 28700 atagggagag tgagaagaga agggggcaaa gatggacatc caagaaagaa 28750 gctgagaaag cctaggaatt tgaggtaaga ggagacgtag gtaaatgtga 28800 cgcttggtga tcaaggcttc tttccacctc tcctatgctg gacactcacg 28850 tctcctgtct gcttggaaat tcatgctgag ggcagggaag gtgggagcaa 28900 ggatttgtct aaagatcttg ctttggatcc ctgcactcct cctggtttac 28950 caagtgtcac tggacacgtc agggcgttct gagaccttag agagcatcca 29000 gtcctgtccc tgcagtttac aaatgaggaa accagtaccc tgagagtggc 29050 tgtactatcc actctcagga taccaaagat catctggaaa gtcactggtg 29100 gagctggacc ggggcccagg catctcttct cctgtccggg gctcttgact 29150 tcaggaccac ctttctgaaa cccatgatgg ggcaacacca ggacactttc 29200 cagcctgcag gtgtctgtcc 
cgcggaagcg agccaggcca catgtgaatt 29250 cctgttttct gggtgggttt cagaaggtac gagcaagtcg gcagggtgac 29300 agcccaggtg cttcttgggt tccccaaaac gcggttatgt ttagcagcat 29350 cctcagaacc aaaggtgggg tgggggctgc agatgttgtg ggggccctct 29400 gaagtgaaaa gagccctgtg acagatcttt tcttcatgtt tttcacaagt 29450 tcactgtgca gcagggcccc cccagtagcc tttgcccagg gttgggtgtt 29500 gggcagccca ggcctggctg accttgtggg gaagggtgtg aatggtggga 29550 atccccgagg gccctctttg cccgaaagcc ctaagccttg acatcagatg 29600 cccatcagat ggtccatcgg agccctacta cccagcttgc ccagtgagaa 29650 tcatctgggc tccttgttag gtagccattt aggtccttcc caaaatccac 29700 agactctcta agggaagggc ccgagatgct gtacttgtac taacttcctc 29750 aagcaattct tgtgataggt ttgggaaaaa cttgtccagg gtgaccactg 29800 actgagtcct ggtcttctct gaagagcaca gtgcctgctc actttagggc 29850 accctgggag gtgggagctg gctcagcagg cagtcttata agggactgag 29900 cttcaaggcc tctgtccctc caggagggag gtgcatgacc agagagggag 29950 gcctgaggat cttcttccct gccccagagg gtctgctgcc tgagctctgt 30000 gatagcgcag agagtaaaag gatcaagctt gattgaggcc tatctctcaa 30050 tgcgaaagtt tgctagttaa gaggagagtg ggaagggcat ttctggcaaa 30100 gagaaaagtg tggacaggca tggcttaagg gatggggagg gagacagaca 30150 gagctgaggg tgaagggcct tttgctcagc tgtgggcctt ggccttccct 30200 tgtgcaggga cacacagcct tagagccact ggaggtttta gtgggaaagt 30250 aatatggtcg gggctgtatc tcagaagaaa acaaactaat gggaacaggt 30300 cctgtgatgg tggacctggg tcagctacgg agggagggaa gatgtgagat 30350 gtgtactggg gaagggggtg gaagtggcag ctatctggtg agaggaagca 30400 ggcccacagc tttttttctc aagctgttga attcagaagg gcgagtgatt 30450 ccgggagtag ggggtgcttg gagagccacg cgttattgat aaacagggca 30500 ggctgaagcc tgctcactgg ccctgggcgg gttctcacca gcatgtttca 30550 ggttttgatc tgtgcttgtg gttggtgttc ctacctgttc tctaggttcc 30600 ttcctttgtt cttgtggctc atttgcttca caggtgaagc tggttacact 30650 agagtaacag ttcccaaagt gtgttccctg gaaaaatggt tctgtagcca 30700 aataagcttg ggaaatggtg ggttaaatat aacgaagggg gtttttcgac 30750 tgcacaactt ctcagagcct ttggtgtgtg tcgtgacttt gcagaagcag 30800 gatttaatac gcagcattcc cgttcttatt tgaccacgag acatgttttt 30850 
ccattaagca tcttgctggg tctgatgttt tctggaaccc attttgaggc 30900 ggtctggtct gcagagagta tggggagcct gggttcaagc cttggctctt 30950 gactctcagc agagccttga ttccctgtgt tgcctggact gcaccacgtg 31000 taccacatac ccggtatgtg acgttttcct catccctctt cccacctgcc 31050 gttacctcac aatccacaat ctgcacctca tccatttttc ttctgaggca 31100 agcactctct tactaactta cttatctcat ctgcatccat gttcttctag 31150 gccagaaact tgggagtcat ccctccctct ttgttacttc ttcttcctct 31200 ttgttacttt atcccctctg ttactaaaca ttcttctgtg tttccagcta 31250 tttcttttat tttccctcgg tctcctttgg ggtttctttg cctccatctc 31300 tcccagacct tggttcacct tccatcgagt cccttcctgg gacatgggca 31350 ctcatgccac tcctgctacc ttccacttcg aagctaactc cctccacact 31400 gacgtcccca acatgcatgc atacacacac acacacacac acacacatac 31450 acacacacac acacacactt ccccagttag gctagaatca gagagatgat 31500 gtcagccatt tgtccaaggc cacgcagctg ggaggtcaca gagctaagtc 31550 tcaacctcag gggttttgag aaattgcctt ctcatccgtg atcactgatt 31600 tctacaacag cctgtcagga agtctgggta gaaattactt ccattttaca 31650 gtggagtcag agcggggagg gtcctgggca ggcgagtgct tcacagagtg 31700 accaaccatc taggtttgcc ccacactgaa gggggtttct ggggatggtt 31750 ggtcacccta atgctggatg tggtgcctga tgctgggcag gagggccctc 31800 tccgtggcca cgttgcctcc caggaggaga catttcctct gcagctgcag 31850 ctgcagcctg gccatctgat gcagcctgtg gagcggtggc gagtcctgtg 31900 gcctgctaac ttctccctcc ctccacctct ctagtgggcc ccatgctgat 31950 tgagtttaac atgcctgtgg acctggagct cgtggcaaag cagaacccaa 32000 atgtgaagat gggcggccgc tatgccccca gggactgcgt ctctcctcac 32050 aaggtggcca tcatcattcc attccgcaac cggcaggagc acctcaagta 32100 ctggctatat tatttgcacc cagtcctgca gcgccagcag ctggactatg 32150 gcatctatgt tatcaaccag gtgaggcctg ggaaggtgga atgagagagg 32200 gtgtgtgtgc atgcagatgt gtatcagatg tgtgtgtaat gagggcaggg 32250 gaaggggagt gatttcacag acacctggca cttacagcga ggaaccagcc 32300 ccccagccac caccagtgca gatgaggtaa acgccaaaca gtgtgcttgc 32350 ctattgctgt caactctata gccaagggaa atgctggagt gttttcgttg 32400 ttctgttttt gttttctgga agtagccttc cagcaagatt gggaaaaaag 32450 acaaccctaa ttattccaaa gtacacactg attattccct 
ggctttgtgt 32500 agctgtgtat tttcctttta aaaataaaac caccatttag atgtcagact 32550 tttaggtaac ttcaaagttt atccagtcag tcagagcgtg tctcctgggg 32600 cacctggaga cagtgccctt agttcaggtc acatgcctac atgccagccc 32650 ctggtgaaat atctggagaa gtctgattcg tgggccatct gagagttatg 32700 tggactgggc cgagtctgag aaaaagtttc tcactgctcg tctgatccat 32750 atgtgttggg ctttagccct gcttaggaaa gtaatgctaa ggataggtca 32800 actttcatca ccatggcatg gagaatcaga ttgatctaag aggcatcttt 32850 attgaaataa atttttcagt ttatttgagg agcattattt tcccaagagt 32900 ataactttga tatttcaaga ttacccctaa cacttaaatt catgttttta 32950 gactataacc tcctaggtgc aatgacacat ctaacttatc taagcaccca 33000 gtttcattga aattcatttg aagagtctga gtacgcccat ttctacaagg 33050 cccaatgtcc atttcatttc gagataaact ctgctttagg taggaggatt 33100 gttggcagtt tacggcttcc atcaaggtca aggaactctg tgcaccttcc 33150 ctatgacccc aggggaagca ctcgaggact gctgtggcat tgtgctgcat 33200 cacttgctgc agggagattc tgaagaagtg taaggtctca gtcctgccct 33250 gtcccgaagc ctccaaccca cttctggcaa gtgggacctt cccagggaac 33300 aatttgttaa cagacccaaa tatcctgtga ttggatggtg gctgccaaat 33350 gctttggaag ctcagaggaa ggagagagag caatggcttg gaagaaccag 33400 gatataaact aggttctaaa gtctgcaggg agatgggctt ctcagctggg 33450 gccagtgagc agggacctta aggcagaaag gagccttgca tgttcctgga 33500 aattgagatg cccactgggg taggaaagca ccagaagctc tgggaccagg 33550 tgtcagagtt aagcctgtga ggcaggagag agcagaacaa gccctgttac 33600 aaggaaactg aagcaggaga gcaggtggtg ggcaaacccc ttgaggctgt 33650 ttgaattctt cggccaagtg aggtacagac cagggcccta tgaacacctg 33700 caagcaagac agccacgcag ttgtgggtca ccttggaaga atattggaga 33750 atgcaagaga gaacaggtaa atgtcctgca aaatgcgggt cactttaacc 33800 caacacatat tcatttaaga aaagctctgt gattgagaaa catttgtctg 33850 atgccagtta gcacatacca atgacggcaa gattcaggag cctgttatta 33900 aagcagtggc agcgagcacc tggaagaggc ggccaccatc accaggagcc 33950 agcagggatg actaataagc cgtgccagct gcatctcgtt tctctcttga 34000 cagttgctat gccagtagat gagggatgta ctgtggatac aatgctgtca 34050 tatcttattc agcagggcat ctgatagcat cccacaaatc tgcctgagta 34100 gaagacagac agctgtggtc 
tgggtgccat ataggtaggt taaaatatat 34150 atttgggcct aggcgcagtg gctcatgcct gtaatcccag cactttggga 34200 ggccaaggca ggcggatcac ttgaagtcag gagttcaaga ccagcctggc 34250 caacatggcg aaaccccgtc tctactaaaa atacaaaaat tagctggaca 34300 tagtggtggg cggctgtaat cccagctact cgggaggctg aggcaggaga 34350 atctcttgaa cccaggaggc agaggttgca gtgagccgag atcatgccac 34400 tgcactccag cctgggcaac agagtgagac tctgtctcaa aaaaataaaa 34450 taaataaata aataaataaa atatatactt gggtaaagag gataaaagag 34500 ttagcgatga tgctgaattt ttgaactgag gtggctgttt tcaaggaaga 34550 ctggagggtg ggatgctacg tctagatatg ttgcagttta ggtgaatgtg 34600 agacttccct gttttgaagt caaatattgg accagtaaaa tctagccatc 34650 agcttaaatt cctatgatac aatttacata ctccccaggc tcaacacagt 34700 agatttctga atgtcctctg ccagctacat gctcctgccc acctcaatcc 34750 gagtagatgg aacaactaac caagccagct cagaccggtg gcacagctgt 34800 gctggctaac actgggcacc acctaagaga gtgcttctcc aaaagtgtgc 34850 ttccccaaat ggagcgaaat acgcttgagg aatgttgggt tgaaccatgt 34900 aaagcaggtc tcattcccgc agagcctttg gtaccccggt gtacactgta 34950 accccagaag tgtttcctga gcttgcctga cgagacaact tttccaagaa 35000 ccgtctcaag tgatgagtgt tttgtgagtc acactttggg gaaagcgggc 35050 ctaagttagc atctcctccc agctgcctcc ctgctttccc tggaacacta 35100 ggaactgccc gtcctccctc cctccctcct cttcccactt cacaacttag 35150 catcaggaat attttagttt tggtttttca aacatatata cctccttttt 35200 tcttatcttg tcaatatcat cttttttttt tctttgcttt tcctcatact 35250 tttttttctc ttcatccttt ccttctccaa gggttaactt tccaccttag 35300 gagaatcttt tctgcttttt ctcccacttc cccagctact ctcttatcat 35350 ctgctccaat ctcaccctaa ttgatcattt tgggaaaata tggtcagagt 35400 ccagataact aagttgagaa atgcttaaac tctgccatac ctttccagta 35450 aagaatatta cctaataaat aataaaatgg taatgggaaa cctgaaccct 35500 gaaaaaaaag aggtggaagg agaaacattt ggagcacatc ctgtctacaa 35550 attaggaact gcctgtgtta tctgttttat ggttatattc tagaagaaga 35600 aagggatttt gtagcacctg gttttgacct ttctgcactg tttgttgagc 35650 aaataaacct tatgggctgt tagccctctt tatagcctct cagcttatcc 35700 ctggcccaga caccctgctg tcattttgac ttttcattcc cacacacaca 35750 
tacacatgca cacacatgta cacacacaca cataccattt aagattagac 35800 agaagtaatg ctcaaaatgg agtggcttct gagacattta gtccaagggt 35850 tcccaaacag gcttttcagt atcagatttc tttctgcccc attgaaatgc 35900 tacacaacct tccgcttaca gcaggtcaca agggtttcat tctacttgaa 35950 gtaggggcca tgtcccattt ccacttcctt ggcttcccat tcagtcactg 36000 ctaggatttg cctagacccc tgaggccaga caatgtagaa acttctgctc 36050 catgtcacag gtgaggaaac aggctcagag agggacaggc tccgaaagtc 36100 acatagacaa cagtagggct gcggctcaaa ccccagcgtc tgactccagg 36150 tttagtgcct tctcagggca tcagtgacac tcctcatggc cagggtgccc 36200 ccagtgttgc tcacagtctg gtatccaggg ctgagagtgt gctgtgtgct 36250 cagactgcct gggttcagtc ctggcactgc cactttacag tcagtgacct 36300 caggcaggtt acttaagctc tgcaggcctc agtttcctcc ttggtgggga 36350 gggttatgag gcatccttct catggtaaac cttcagtaaa taccagccgt 36400 tactaggagg gtccactcct gcctctccac tctccattca tcctgcctgt 36450 ttcctctgcc tgcttcctct gcctgcttct gtggtggtga attcttcatg 36500 gctcccaccg cctcctgctg cacccccact cagggcccgc atcaggaccc 36550 ttcctcctat tggtttgaac tccttggagt cagagggtaa tggatagtgg 36600 agtgagccag gtggcagaat ctcagaggcc atcccgggcc tataagcctc 36650 ttcaaaatag ggccacgtat caagctttac acacaggagt gaactttcac 36700 aagttgttat gactcatact ctgtctatag taagctgtta accactccca 36750 tttggcttat gcctctgtaa ttattgtact aacttatatc ttaaaataag 36800 gatattgaag gaatgagccg ggagaggctt tcctggttga gatatagaag 36850 aacaagagtt gctctttttc cttaaggtct ctcctcccac ccctgacctt 36900 agctcaccag catgggagaa tactatttga ctccttgtac tctgagacgt 36950 ggatttcaag atatagcatt ccaacttcaa cggcagcaag aaaagaagca 37000 acagaaggag aagacatcat agcaaacagg gatgcatgct gcatttccta 37050 atactcaaac ccggaaacga gacttcactc aaggtgaagg gagggcaggt 37100 caccacctgg tagcactagc cctaaattaa ggaatgcaga atgtttgtgg 37150 gattgcccat cataaaaatt acaaaatgag taaggaatgc aggcacagct 37200 ggccaggtgg gtttgtcaca accatggcag ccctttgccc cacagccagt 37250 acacagaact ggtctctcca attccgattg catatcttct ggcacctctg 37300 ttcctctccc tcagctgccc aggatttttc tggttctgac catgttactt 37350 cctcttttaa acctgttagc atttcacgac tgcctacagg 
caacggtcta 37400 aatggtcgga aggcccaagc ttagcatccg agaccctgac ctacctccag 37450 ccacttcctc ctcctctcca cttcactgga ctccccatct ccacccagac 37500 acctctgttc tcccctctgt gtgcctttgc ttatgctgtc ccctgtgttc 37550 ctagtgtgtc tctggctatc ttttaagctt ccctccccaa cctcattagt 37600 tctgtggagc ccctggaata gagctgactt ctccttccct gctgctccca 37650 ggctgctcag aactttctgg aaagggatga ttatctgagt tccagcctca 37700 ccccagcccc cggactctga gtccctcatg tctgcctccc ttctttctct 37750 ctgaccacac agctggtaca tagtcagtac agacgcagtc agtgagtgga 37800 gcacggggct tctctccagg attcctgccc ctttgtttat ccctagtctc 37850 aggactccct actcctggtc ttctgcctaa atctgtgcct cttggaagtg 37900 aagcctccgt tcccagtggg gccaggtcct gacccttggg aacttgcagg 37950 atccctccct tgggcctctc cccgaagctt ccagctcaat gctgaccaga 38000 gcacaggctg cctgtgacag tccttggggt gacctccctt atcaggaaaa 38050 atgcagaaaa cctattaata ccttagcctt gtgattgtta atggtcacaa 38100 aactccttta gggtcctttg gactcagcac ctttatggtc tcactttgaa 38150 ttttgaacct cccacctccc cccatccccc agagtaaggc aaatggtctt 38200 ctgattgttc ctgcagaggg aaggctccac aggtaagcac acgatggcca 38250 ggaagcagag ctggagcctg cctgaaaggc tgtggagaaa tggagggagg 38300 gctgccctga ggactctgtc tggctttgaa gttttctact gtttcctttt 38350 cttctgtgca ctgttttagg atgatggggt gatagttcca ggctggttga 38400 ggatggattt ggagacagtc ctttgtaccc tcagtgagca agagtatctg 38450 tcaccctacc tcagcagttg tctctgtcac tggtccaagc agctggttcc 38500 tacacaaggt caagatcaac tggggagaag cagactcctg ggtctatccc 38550 attagtgagg acagctgcct gggcttatgg cctcattggt ttggtttcta 38600 tcttgatcat ctctaccatc cccccatccc ggccttccat tttctacctc 38650 agctgtcagt gcacagattg atgtgtgtgg gaacggagct tgggaggagt 38700 ggggtagggc tggtcctgtc ctgtagcctc cccttccttc gggcacttgg 38750 accctttgga gcttgccggg gtggggaatg ggagtgggaa ggccagggag 38800 tgtctctgca ccatcactgt ttgagtgttg cccctttgct gtgtgcccca 38850 cctagtctat gtgtgtctct gttctctggg gactcaattt gctggtgaat 38900 tgcttccatg gacattgttc tgggaaatgc cattttttct gctcacccat 38950 gactctgtga caaggaatga cagcttatta ggaatttgtt tttgcattgg 39000 aacagtggtc atcagaatgg 
gccccttttc ccttgcagct ttgacatttg 39050 cctctctttt cctcacctct ctcccttgca tccacccttt tctctttttc 39100 ttcttttttg ttttccttct agcaggggcc ttttaccttt acttgttaat 39150 cctgtttgta gcaaagcaag tggaaggagg agttcctctc tgatctgctt 39200 cttattctcc acctaccttc tcttctgtac tttccgcctc ctagagagag 39250 agagagagag aggaatgccg acctaactac cgctgccact gctgctgcca 39300 ccaccgctgc caccaccacc ctggtaatgt tcacatgtcc tcaaatcaac 39350 ccagagccag ggccctgctg gtcaggggga ggctatgtaa ataatcccat 39400 gagtgtgcca tcctcaggcc ctggggtctc ctaggcaaga ccagggcctc 39450 tgtgggctct ctcggaaatg ctgaggttgc tggaagccag cccgtcatac 39500 agggtctgag agtttaactt cttttaaatt aaaccacagt tgagctcatg 39550 ctgtgtgtgt ataaactttt gtatcctgct ttttccttaa attctttatc 39600 atcagcatct tcccatgtta tttcatagtc ttcatcatca tcactttcca 39650 taccttcata gtagttgatc gtagaattcc atcataatta acttgtcttt 39700 tctctcttag aagtccctta ggtaatgtcc aattttccgt gagtgtaagt 39750 aataccataa tgaacatctt ggagtctgaa gtttattctg tgttggtttg 39800 ttccacattt aggatcattt tcccaggcta gattttcaga tgtgggatta 39850 tgggttcaga tatggtttac acatttttat agttcttaat acagatggcc 39900 aaattgcttt ctgaaagaga agcttttctt aagtattttt ctccaacttg 39950 tatcttaaac atcctgaaca tgcttagcac cactgtcttg atatatctgc 40000 ggaaagccac gtctccactt ttcagtgtgt cgggccctgg gagaggcagg 40050 catcctgcgc tggctccttg gagctgggtt taaaattgtc tcctctggct 40100 gggcgtggtg gctcacacct gtaatcccag tactttggga ggccgaggtg 40150 ggcggatcac taggtcagga gatcgagacc atcctggcta acatggtgaa 40200 accccgtctc tactaaaaat acaaaaaatt agccgggcgt ggtggcgggc 40250 acttgaaaag tcccagctac tcgggaggct gaggcaggag aatgatatga 40300 acccgggagg cggagcttgc agtgagccga gatcgcgcca ctgcactcca 40350 gcctgggcga cagagtgaga ctccatttta aaaaaacaaa caaacaaaac 40400 aaaaaaacaa acaaacaaaa actgtctctt ctgtgctcac ttcacccaga 40450 atccctgttg ggctcttcaa ggagctcagt tctctctgaa agcaacttta 40500 tagcctcagt ccagtctgtg ttcctgtgtg gcaggggtca agggtatgct 40550 cactcttgag agtggtgtct ttggttgacc aagaaccact cccatagcct 40600 ggtccctaac ccttgaaggc ccatctctct cactcactgg ggtgaagagt 40650 
ttaaatctca gatccaagtt ttgttgagag ctctgagcta ccatattgct 40700 atggttaaca atagttaaca atgttaacaa tggttaacta tggttaacaa 40750 tagttaacaa tgtttaacaa ctagagccca gctgggtgtg gtggcatgtg 40800 ctaacagtcc cagcttctca agaggctgag gtgagaagat tgctggagtc 40850 caggagctca aggccagcct gggcaacatg gcgagaccct gtctcccctg 40900 caaaaaaaca acaacaacaa aagcaaaact agagcccaac tgctgtgaac 40950 tcatggctga gtagatatta ttagccctcc acaaactcag catttgtata 41000 atcccaggct gtttccagta attctctggg gatcatctcc cagcctgtcc 41050 actgttccag gatccacact taggcctata ggaatgcccc gtcagagctt 41100 ctgctgccgc tgatctgtta ctgtttcatg caacccactc ggcctagttc 41150 cttcctctta ctgtctcagt gggcacagaa aagcatacag agggtgtttc 41200 agcaaacatt gccactggct gcagacctgc ccccggatct gtcctgttga 41250 gagcttagtg ctgcgttctt gcatggtggg gaggggtgtg gctctgtgat 41300 gagccagggc atgtgtatag gagcaacagt gtctctctta tcacgtagaa 41350 gttctgactc attgcgagtc ttggctttgg gttaatggtt ccagccatgt 41400 tgctgctgtg tcttttggtg caggagaggc tgggcacagt tggtccctaa 41450 gccattatgg ataagggatg tgtctgctga tatacacaca tggacctgac 41500 atccagggaa ggcagggtga ttggacagaa cagttcttcc agaagctgtt 41550 ggaacttgga caagagtggc ccttggcttt ctgtagttgg tcatctgtcc 41600 cctgttgcaa tcaggggaag gccacacttg ccttccttaa ccacagttag 41650 gattttcttg gggattagac cagattctag cacctgtcct gaacctctcg 41700 ccccgcccct acaaaggctg cttgcaagtg tagtgcacat acacagggag 41750 caggtggggc atggaagtgg aagtggagcc cctgcctttg gcccttgggg 41800 gaggcactgt ctgcttaccc acggttgttg cctcatagga atcatacaac 41850 agcttcctaa ctggtctcct tgccttcagt tggattgggg cacaaatccc 41900 tccttgacat ataaaccatg gtttaaggct ccctgtggcc taaataaaga 41950 taaagcttaa gtatcttaac aagcacctaa cccttctccc cagcctcggt 42000 gatttggctc atcgctgcct tcatgtttca ttctggcttc actcattcgg 42050 aatttcttgt agttccttgg ctgttctctt ttccttaccg cctttacaaa 42100 tgctctcacc atgcatgctt ttctctgctc ctacagatgc cttctctccc 42150 agcaccgcct ccagagtcta tgtctggtcg attctgtctg ctgtctccag 42200 tccccatctt gtggcagtct ctgctcaatc atttggggat tttatatgtt 42250 ttctggcctt tcttttgggg gcctgtcttc tccttctaaa 
agcagccagt 42300 tgacctagaa ggaagggata actgtaactc ttgtctacca acataagatt 42350 aggcccaccc tttaaaagct gcgtctttga aagggacacc tgcacccagc 42400 atgctggctt ctcttcacca agcgtgactt cctacgcatt tcacaggcct 42450 ccagaggtcc ccctgactct cttctgctgt gagaaactct aatcatgtaa 42500 gccacaggct aattcccttg agccttaaat gtttttagta atttcccatt 42550 catcagagaa gcaggatttg ggaggaattt tgaagcaaac actacagaag 42600 gcagagtctc caggtaggat atctaagaga catttggaat ggtctgactg 42650 ttcaagatgg atgggaaagc ctcttcctgt aatgatagta gccaacattt 42700 gttgtcaggc agtggggccc catttttgag atggggtctc tgtcacccag 42750 gttggagtgc ggtggtgctg tcatggctca ctgcaacctc agcctccccg 42800 ggctgggtct tcttaattct gaaaaaccca gcttttaaag ggtggaccta 42850 atcttatgtt ggtagacaat gttgtctcat ttaatacaat gcacatgctc 42900 tccccataac acaaaagagg gaactgaggc ctggaggtgt gatgtacccc 42950 aagtcacata gctaataaat aaagaagcca gcattcctgg gattaaaaat 43000 gcatgtgtct gtcactgtgg tgtatttggt gcttgatcaa tgtttacttg 43050 agcaaatgga ggggcagagg taccgatgag tgtgctcagt gaggagggca 43100 ggagtgaagc tgggcgtctt cccgcctctt gtgagtggtg gggcttggtg 43150 agcttgccag ggcctgtctt tcttatcaaa gaaggtgtgt gccccagtgt 43200 tacagcattt cacccaaagc agcctagaaa atgcttgact tttctgtcat 43250 tccggggagg acactttcct cctccactgt tctgctggcc tggtgtaccc 43300 acggcccctg atagatgata gcacctgcta aagtgcacca tgcccttccg 43350 tctcactgca tcccacagat gaggccaggc tgggatgagg gagaaaggga 43400 gggatatata gttcaggtta ttttggaaaa ctgcctgacc aattttaagt 43450 ctgggccgga cactggggca tctcaccacg ttgaaagggc cgtggcaccc 43500 cgggcggtga aaggggctgg aaccaggtct gcttcttggg cttctcctcc 43550 agggtgccat tgctcatggg ccttggctgc agaggtgctc attcgtggtt 43600 ccaaaattcc aattcctggg agaggaaaaa tgcttagttc agtctcagtt 43650 aggcctctgc ttagatcaaa cagccaaggc cagtaggccc agtcctatgg 43700 tagagacatg gcctcaaaga gccctctgct gcagttgttg gggagtgtac 43750 caagagaagg gagcattgtc ctgggctggg cagccctggg ggtctagtgc 43800 atagatgtag aaaggctctg ttggtatacc tccctttgct tgttggaaag 43850 tgctcaacgg ggctgaattg tgtttgacag tgtaagtctg ggctggggtg 43900 agggttgtta caagattgtc 
aagatgatta aatgaaatgc catttgaaac 43950 acttatccat gccttgtgta tggtatcccc accagtgaat attcacagta 44000 tattataata attccaacaa cttcataatt ttcatatgca atttctaaac 44050 tttgaacttt tttttttttt tttttttttt tgagacagtg tctcgctctg 44100 ttgcccaggc tggagtgcag tggcgcaatc ttggctcact gcaacctcca 44150 cctcccggct tcaagtgatt ctcctgcctc agcctcctga gtagctagga 44200 atccaggcgc ccgccaccac acccagctaa tttttgtatt tttagtagag 44250 acgggctttc gccatgttgg ccaggctggt ctcaaactcc tgacctgagg 44300 tgatccaccg ccttggcctt ccaaagtgct aggattacat acgtgagcca 44350 ctgtgcccgg caattttttg tgtttttagt agagatgggg tttcaccatg 44400 ttggccaggc tggtctcgaa ctcctgacct caagtgatct gcccgcctca 44450 gcctccctaa tgctgggatt acaggtgtga gccaccacgc ccagcctaaa 44500 ctttgaattt ctttgaaccc atgacttaca cagaattagc tgaacgcaga 44550 attccaaatc aactcagcct gtgggacagc caaaaaacac agtgtgcctt 44600 tgggctcctt cactcaccac gcggggttag aaaactttgt cagaggcttt 44650 aaaaaaggag ctcttgtgtg taaaatgttt ccttgattct ctttctggtg 44700 cctctctttc tctaagtggt ttgcttcccc aagttcccca cctgagtctg 44750 ggtggctgtg gcacatctgt gcattctgta cgcacacagg cagccttttg 44800 gagtgccagt ttccaggtct tggttttatt tatttattta tttatttttt 44850 tgagatgggg gtctcactct gccgcccagg ctggagtgca gtggtgccgt 44900 catggctcac tgcaacctca acctccctgg gatcagttga gcctcctacc 44950 tcagcctcca gagtactagg gaccaccatg cctggcaaat ttttgtaatt 45000 ttttgtagag gcagagtctc accatgttgc tcaggctggt ctcgagctcc 45050 tagactcaag tgatctgccc accttggcct cccaagtgtt aggattacaa 45100 gtgtgagcca ccatgcccag cccaggtcat cttttgaggg catggagaga 45150 agactttgag catcccactt ttgagattgt gtaccagtcg caagccccta 45200 tgacacactt tttccccaaa gtagagggct ctgactatgt tgatcccaag 45250 agagatggga aagagcattg aatgaggatt ccaaagtatt gggccttagt 45300 tcgtttcctc atgttggtgt tgtgaagatt ctggttagga taacagcatg 45350 tgtgcaggag gctttgtgaa ctgctgagag tgaggcgtgg caatgtcagt 45400 gctaggtttg tccttactaa cctggggcca tgggaattga taagaccaga 45450 ttcccaactc taccccacaa tgtgatccct gtggtgaccc ctcacagggc 45500 tctttggtcg agcttcccag aagggatcac catctgccat tgtatgttga 45550 
accccattca ttcattcatt cattcagcca accagcaact atttgttgag 45600 ctcttattgt gtgagaagca gtcttcaagg aactgggtga ataaaaaaaa 45650 caaaacatcc taaccttcat tgagcttaca ttcttactga aagaaaacaa 45700 ataaaacata catgtaatcc tagcactttg ggaggccaag gcaggcggat 45750 cacttgaggt caggaatttg aaaccagcct ggccaacgtg aaacccatct 45800 ctactgaaaa ttaaaaaaaa aaaaaaaaaa aagccgggca tggtggcaca 45850 tgcctgtaat cccagctact cgcgaggcta aggcaggaga atcgcttgaa 45900 tcctggaggc agaggttgca gtgagccaag atcataccat tatactccag 45950 cctcagtgat gaagcaagac tccatctcaa aaataaaaaa taaaaataaa 46000 aatatgcatt ccctttgcac cagcacactt ggtgcctggg gacctcgtgg 46050 ttggcaccct gaagcaggtg tccctcttct gtcttgcaca ccttgcttct 46100 gtcctggtgt gtatggcatg gccttctgcc ctccatggtg agcactgtga 46150 gggcagaggt tgagttgggt ttgctgtatt tctcaggtgc ctaggtttgt 46200 gcttgacagg tagatggaag gcacacaatg tggtcatcaa acctcagtca 46250 accatataag gaaggtagaa gtgaaaagtc ccataggtac ccaactaatg 46300 tcaccagttt cctggatacc tttcctggag tttatttata gtgtgtataa 46350 ataaatgatg tatgtgttta aatgcctttt tcacctttcc ttttagagct 46400 gcctcttttt aacagttcca ttccattgta tggatgtact atgatttatt 46450 gaaccagttc cctactgatt attctgtttt ttgcagtctt ttgttatgat 46500 gaacattcca cagtgacaat gttgttcata gtcattcaca cacatgcaag 46550 tccttctgca ggatatattt ctagagggga attgctgact cagaggtttt 46600 ggtactctgt gttgattgta gagtgacggc agaaaagtga ggcccaagag 46650 tttcctagtg accatgtgta gtggacaagt caccagtccc tgtgagtgtt 46700 tggcccaaag gctttaaggc atttgatatc actgtttttg tttctgcacc 46750 aggcgggaga cactatattc aatcgtgcta agctcctcaa tgttggcttt 46800 caagaagcct tgaaggacta tgactacacc tgctttgtgt ttagtgacgt 46850 ggacctcatt ccaatgaatg accataatgc gtacaggtgt ttttcacagc 46900 cacggcacat ttccgttgca atggataagt ttggattcag gtaagagata 46950 ctcagtcaga atctgtggta aacatgtctc tctcatgtgt tgactaggaa 47000 atgcagtcct ggcagctcaa gagtgcctct ttaagctctg gagcagaatg 47050 cctcctctga gaaatgggtg ctttgtatta gttgagatgg aaagaagaga 47100 ccagaaatgc ctgtagtctc tgcacatcca gacaaaaaca aattttcccc 47150 cctttttttt ttttgtttgt tttttgagac agggtctggc 
tctgtcaccc 47200 aggctggagt gcagtgccgt gatcttggct caccgcaacc tctgcctccc 47250 gggttcatgc catcctgtca cctcagcctc ctgagtagct gggactacaa 47300 acacttgcca ccatgcgcag ctaatttttg tatattttgt agagatgggg 47350 ttttgctgta ttgcccagtc tggtctcgaa ctcctgagct caagcaatcc 47400 atctgccttg gcctctcgaa gtgctggatt ataggcatgt ggcaccatgc 47450 ctggcctaag aacagttttt agcatttggg aggggctctc atctttaagc 47500 tccaaatgat actgtatttt cttgcttttt tctttctctt gccccacaag 47550 ttttggaaag taaattggaa tagttttccc ccactgaatt atttagcttg 47600 tatacctcag cagatgttcc ttggcctgtt ttgttttgtt tttgagacag 47650 ggtcttgctc tgtcacccag gctggagtgc agtgacacaa tcatggctca 47700 ctgcagcctt gactgcctgg gctcaatcca tcctgcagcc tcagcctcct 47750 gagtagttgg gactacaggc atgagccagc atgtccagct aattttttat 47800 ttttagtgga gatgaggtct ggctatgttg cccaagctgg gcttgaactc 47850 ttgggctcaa gtgatcctct cacctcagcc ttccaaagca ttgggattac 47900 aggtgtgaac cactgctccc gcccttggcc ctataagaag gaatgtgatt 47950 ctgttttcca gcagggcaca aacttctgct taaatacaaa gcccaaattt 48000 ttccaccaaa atgcccctag tgaagtggcc agcccagatg cccgactagc 48050 gtattatcca aagcatattg tcattggtgg aaaatggcct tatagtccat 48100 tgttttgtct taaaagtaaa tatataaata aacttgtata ttgtttccta 48150 attccgtgtt tatattaaca taaaagtgtt ttaaattacc tgtcagtggc 48200 caggtgcagt ggctcgtgcc tgtaatcgca gcactttggg aggccgaggc 48250 gggcagatca cctgaggtca ggagttcgag accagcctga ccagcatggt 48300 gaaaccctgt ctctactaaa aatacaaaaa ttagccaggt gtggtggcag 48350 gtgcctgtaa tcccagctac tcgggaagct gaggcaggag aattgcttga 48400 acccgggagg cagaggttgc agtgagttga gatcgcgcca ttgaacttca 48450 acttgggcaa cagagcaaga ctctgtctca gagaaagaaa aaaaaaaacc 48500 tatcagttga ataacaaaac cctttccttc cttgctttaa gtgaatctga 48550 agatccagga gctgtgctgc aggtaccctc tatgttgggt acccctggtt 48600 taggctgact agtacagtgt ggttggctca tgtagacagc agacccttta 48650 ttttagatac aacttttttt ctttttcttt tatttttttt gagacagagt 48700 cttgcttgtc acccagcctg gagtgcagtg gcgtgatcat ggctcactat 48750 agccttaaac tccctggctc aagtgatcct ctcacctcgg ctttcctagt 48800 agctgggacc acaggtgtgg 
gccagcaccc ctggctgatt taaaaaaaaa 48850 aaaatttttt tttttagaga tgtctcacta tgttacccag gctggtcttg 48900 aactcctggg ggctcaagca atcctcctgc tttgacctcc caaagtgctg 48950 ggatgacagg catgaactac tgcacctgct gagatgcaac agctttctgt 49000 cagactcatt ttattctcat catttcttcc tgtcctccct tgctgggagc 49050 atgagagctg tgatgggaat ataggaatgt atgaagtcct tctcccagat 49100 caaaaatcct aacttcttgt cttaaaggga ggaaaatttg aatgtaacct 49150 tacttttaga ctcttcagaa atccttctat acccttccgt ccccgctttc 49200 acccttcctc cctctccgtg tgtgtatctt cttctcttga aacacacagg 49250 tttataccct gacccctctt gattcatccc ttgaagcaca gtggtgaaca 49300 aggaaggggc ccgtgatgcc ctaattcttt gccacagcac catgtttgtt 49350 tcacaaggag cctggcaggt ttgggcttgg ggcagatagg ggagagaaag 49400 cagcagagac agcaaaacca aatcatgtca gcttggcatg tacttccctc 49450 tgaaatagct aagaatccat ttctgtaaaa gcactgatta tcagaaaacc 49500 ttattggcct ggccaccttt ggttcaaacc ctcacattaa taatgtggac 49550 agtagtatga ggtgtgccaa aggtggatga ctcagcacct aagtgatgac 49600 acctaattac gaataggttc attaaagcag accccctggg gacctttgct 49650 tgaggatcct tacagtcaga attcctgaat atatttgaaa ataataattg 49700 catctttatt ttcatatgtt ctgtatggtt tggctgactt ccccctcaaa 49750 gtctgagtta gagttttcct taatttatgt gatgggtttg gtctttttgg 49800 attccagaaa gagctgggtg tggtttggag ctgcactcag agtcacacaa 49850 aaccacagcc tttagagaac ccacaggaag gctttggggc acgtcctgat 49900 tcttgacatt tctcatcagt gctgactttg tatcccttag gagttcacaa 49950 ttcataacca ctgaaatatt aaaatacaaa aagttttgga aggatgagag 50000 cccagatgct ctactacttg aaaatatgtt aaaacataag ttcatcatta 50050 tacattttgc taaatcagga taaagtctga agtttcaaag aagttttatt 50100 ttagcaaatt ttcagaaaca ctgcctcaac tgttagggcc agtgttctag 50150 tcagtatgcc tttggaagca tgaaagctgg attggtcgat aggatgggtg 50200 tggaaggggg gctgtgactg ggtgggtaca gagaggctct gaaacaatct 50250 cagattccag gagttcctgg ataaggactt catgtgcggg aacagagcac 50300 aggagaagca gattcctgag ccactcagga agaactgggc ctaggcctgc 50350 tcttgtcact gactggcttt ctacataacc acagaaacag cactgtgttg 50400 tagaaagagg aagatcatac tttttgatat ctgtgtctaa tttaaggtca 50450 
tctgagccct gatagaaaag caaaacagac aaaacccttg taactgctcc 50500 ctcccacccc acccaccatc aaaaaagctt tagagaggct ggacatggtg 50550 gctcttgcct gtgatcccag cactttggga ggctaaggtg ggtggatcac 50600 ctgaggtcag gagttcgaga ccagcctgac caatatggtg aaaccccatc 50650 tgtactaaaa atacaaaaat tagccaggtg tggtggcaca cgcctgtagt 50700 cccagctact tgggaggctg agacaggaga attacttgaa aacctgggag 50750 gcggaggttg cagtgagccg agatcacgcc attgtactcc agcctgggct 50800 acagagcgag actccttcaa aaaaaaaaaa aaaaaaagat ccggtttggt 50850 gtcttacaac tgtaatccca gcactttggg aggccgaggc cggtggatca 50900 cgaggttaag agatcaagac catcctgacc aacatggtga aaccctgtct 50950 ctactaaaaa ttagctgggc gtggtggcag gcgcctgtag tcccagctcc 51000 tcaggaggct gaggcagaag aatcgcttga acccgggagg cggaagttgc 51050 agtgagccta gatcgcgccc ctgcactcca gcctggcaac agagcaagac 51100 tacgtctcaa aaaaaaaata aataaaaact ctagagaagc aaaaagaata 51150 actttaaaag tgtttatgtt ctcagcaagc tttattttgg ggatgtcaga 51200 acttaactaa ccactgctcc ttctgtgtgt atgtttttcc tccagcctac 51250 cttatgttca gtattttgga ggtgtctctg ctctaagtaa acaacagttt 51300 ctaaccatca atggatttcc taataattat tggggctggg gaggagaaga 51350 tgatgacatt tttaacaggt aatggtcata acttagatat ctttctcctc 51400 tgtcaacctt cacttccagt tttttaacca atgcttggtt gttccccaag 51450 gactgaccct cagatgggat gcacccctag tcagcccaca ttcttaggtg 51500 tggcttccta caggtcctgc aggtgctaaa agggatctgt aggaaaatga 51550 gtttctgaga tttttgtatt ggcctggaaa aatgtcaaat gggaaccaag 51600 tgacggggca agtttacttt gacttgctgc atgccgtttt gtactcaagg 51650 agtaaaccaa tgtcctttgt aaaaatccct cctttcatta tggtcccctt 51700 tcactgtgaa acaagtttcc ttgagcagaa tcctaactgt cttcacagaa 51750 gctttgtgtt atatttttat tttggagtat tttcacatat acaaaagaga 51800 tactgtagta taataaacct ttgaggacct atccagcccc agcaaccatt 51850 atggcctggt cagttctgtc ccatccacat cctggggctc tttttaagct 51900 ggtaaatcat tatgatgtgg gttgtcattt acagtggtaa aaaacatcta 51950 tcagtagcat ttgaaagaac attctgctca gtcctctggc tgtagaggct 52000 tcaaccccac cagccaccga tgagcacctt ctccctccag gagccagtct 52050 gagctcatta ctgagtttaa tatcagaata caccctggtg 
cagcctttct 52100 aaattgcagt accagttaac agaaggtgtc tgtcagagca acacccaagt 52150 cattcaagtt accattgtgt gcaaacttaa cagagaccca cgtcttcaat 52200 ataagccttg aaggaaactc cagttttagt atgtagatgg ggtatcaagt 52250 gtgtgcacat tgaacatctg ctgcatacag agcactgtgc caggcaggcc 52300 caggacactg aaaacctgga catagggtcc agacagaagc aagcctgctt 52350 ccacagaggc actcctgggc agacactctg gactgatatg acagtgtgca 52400 gggccgacag gataccacag gtctgaatgg tcagaacagc tggggaggga 52450 gggagcatcc gcaggcatct agtcccatgc taacgcagtg gcactagaag 52500 gatgggtggt gtgtggagca actttcttga aagataaagg acctaacact 52550 ttctatgcac cacttactgt gtgccaggca aggccaggaa tgtttaagtg 52600 gtctgggatc agccagttct gcctcttaac taactttgct gtcctgctct 52650 ccaggctttc attttggtcc tcattccttt tccttggacc aacacagaat 52700 cctccaccct gttctggctg cctctagtct tgttctcagc cctccatttg 52750 tttttttctg ccttttccca catgttctga agccctccat tcgtatacta 52800 ctttccagag acttccccat ggctaaaagc attttggaaa tactgtatat 52850 taggcccctt tcagatactg gcaaccgttt gtgggatgct ctgagaaggc 52900 ctctgtgact tagcctggcc cttttcagcc catcacctgc cacgtcctac 52950 cccagaccct tgtcaccagt ccccaggagc ttacgttgct ccctgagggc 53000 actaggcttg ctctcacttc catgcctttg cctgtgccat cctggctgcc 53050 caaaatgcta tggcagatac ctgttcatcc tcaactgggc tctgcctagg 53100 cttgctccag cagaggttac aaactctatg cttcttcctc tgtgtctcca 53150 acctcatctt cctcttctca cctccatcct ggccctaaag gccctatgtt 53200 tgaagcattc acactgtata ttctgtgggg cacacggccc cagtgtctgg 53250 cacatggtag tcaacaccac aaaccgcaga accagttgta aaaggacatg 53300 gagtcggaat gtgagtttta accagggtca tgctgggctg ggttctggca 53350 tgatgctggg ttgtgggctg agtgagaaca gcaagggtga tggtggatgg 53400 agcaacagtc ttgcagccgg ggctctcagg ccaagtgtat ggcagctctg 53450 tgataatgac tttcccttta ctctttgcag attagttttt agaggcatgt 53500 ctatatctcg cccaaatgct gtggtcggga ggtgtcgcat gatccgccac 53550 tcaagagaca agaaaaatga acccaatcct cagaggtgca ttctttgttt 53600 attcatactc cttccccctt taggatgagg taggctgcag gtccgaggct 53650 ctgggcctag agggaaattg aggtggtcag gttacagtgg agagggagga 53700 ggaagtacgt gtgatgattt 
cttcttaaga tttttgtttt aagacaatct 53750 ccttgtgctc ttttccttgt aggtttgacc gaattgcaca cacaaaggag 53800 acaatgctct ctgatggttt gaactcactc acctaccagg tgctggatgt 53850 acagagatac ccattgtata cccaaatcac agtggacatc gggacaccga 53900 gctagcgttt tggtacacgg ataagagacc tgaaattagc cagggacctc 53950 tgctgtgtgt ctctgccaat ctgctgggct ggtccctctc atttttacca 54000 gtctgagtga caggtcccct tcgctcatca ttcagatggc tttccagatg 54050 accaggacga gtgggatatt ttgcccccaa cttggctcgg catgtgaatt 54100 cttagctctg caaggtgttt atgcctttgc gggtttcttg atgtgttcgc 54150 agtgtcaccc cagagtcaga actgtacaca tcccaaaatt tggtggccgt 54200 ggaacacatt cccggtgata gaattgctaa attgtcgtga aataggttag 54250 aatttttctt taaattatgg ttttcttatt cgtgaaaatt cggagagtgc 54300 tgctaaaatt ggattggtgt gatctttttg gtagttgtaa tttaacagaa 54350 aaacacaaaa tttcaaccat tcttaatgtt acgtcctccc cccaccccct 54400 tctttcagtg gtatgcaacc actgcaatca ctgtgcatat gtcttttctt 54450 agcaaaagga ttttaaaact tgagccctgg accttttgtc ctatgtgtgt 54500 ggattccagg gcaactctag catcagagca aaagccttgg gtttctcgca 54550 ttcagtggcc tatctccaga ttgtctgatt tctgaatgta aagttgttgt 54600 gttttttttt aaatagtagt ttgtagtatt ttaaagaaag aacagatcga 54650 gttctaatta tgatctagct tgattttgtg ttgatccaaa tttgcatagc 54700 tgtttaatgt taagtcatga caatttattt ttcttggcat gctatgtaaa 54750 cttgaatttc ctatgtattt ttattgtggt gttttaaata tggggagggg 54800 tattgagcat tttttaggga gaaaaataaa tatatgctgt agtggccaca 54850 aataggccta tgatttagct ggcaggccag gttttctcaa gagcaaaatc 54900 accctctggc cccttggcag gtaaggcctc ccggtcagca ttatcctgcc 54950 agacctcggg gaggatacct gggagacaga agcctctgca cctactgtgc 55000 agaactctcc acttccccaa ccctccccag gtgggcaggg cggagggagc 55050 ctcagcctcc ttagactgac ccctcaggcc cctaggctgg ggggttgtaa 55100 ataacagcag tcaggttgtt taccagccct ttgcacctcc ccaggcagag 55150 ggagcctctg ttctggtggg ggccacctcc ctcagaggct ctgctagcca 55200 cactccgtgg cccacccttt gttaccagtt cttcctcctt cctcttttcc 55250 cctgcctttc tcattccttc cttcgtctcc ctttttgttc ctttgcctct 55300 tgcctgtccc ctaaaacttg actgtggcac tcagggtcaa acagactatc 55350 
cattccccag catgaatgtg ccttttaatt agtgatctag aaagaagttc 55400 agccgaaccc acaccccaac tccctcccaa gaacttcggt gcctaaagcc 55450 tcctgttcca cctcaggttt tcacaggtgc tcccacccca gttgaggctc 55500 ccacccacag ggctgtctgt cacaaaccca cctctgttgg gagctattga 55550 gccacctggg atgagatgac acaaggcact cctaccactg agcgcctttg 55600 ccaggtccag cctgggctca ggttccaaga ctcagctgcc taatcccagg 55650 gttgagcctt gtgctcgtgg cggaccccaa accactgccc tcctgggtac 55700 cagccctcag tgtggaggct gagctggtgc ctggccccag tcttatctgt 55750 gcctttactg ctttgcgcat ctcagatgct aacttggttc tttttccaga 55800 agcctttgta ttggttaaaa attattttcc attgcagaag cagctggact 55850 atgcaaaaag tatttctctg tcagttcccc actctatacc aaggatatta 55900 ttaaaactag aaatgactgc attgagaggg agttgtggga aataagaaga 55950 atgaaagcct ctctttctgt ccgcagatcc tgacttttcc aaagtgcctt 56000 aaaagaaatc agacaaatgc cctgagtggt aacttctgtg ttattttact 56050 cttaaaacca aactctacct tttcttgttg tttttttttt tttttttttt 56100 ttttttttgg ttaccttctc attcatgtca agtatgtggt tcattcttag 56150 aaccaaggga aatactgctc cccccatttg ctgacgtagt gctctcatgg 56200 gctcacctgg gcccaaggca cagccagggc acagttaggc ctggatgttt 56250 gcctggtccg tgagatgccg cgggtcctgt ttccttactg gggatttcag 56300 ggctgggggt tcagggagca tttccttttc ctgggagtta tgaccgcgaa 56350 gttgtcatgt gccgtgccct tttctgtttc tgtgtatcct attgctggtg 56400 actctgtgtg aactggcctt tgggaaagat cagagagggc agaggtggca 56450 caggacagta aaggagatgc tgtgctggcc ttcagcctgg acagggtctc 56500 tgctgactgc caggggcggg ggctctgcat agccaggatg acggctttca 56550 tgtcccagag acctgttgtg ctgtgtattt tgatttcctg tgtatgcaaa 56600 tgtgtgtatt taccattgtg tagggggctg tgtctgatct tggtgttcaa 56650 aacagaactg tatttttgcc tttaaaatta aataatataa cgtgaataaa 56700 tgaccctatc tttgtaac 56718

<210> 2
<211> 56718
<212> DNA
<213> Homo sapien
<220>
<223> variant B4GALT1 genomic sequence
<400> 2

gcgcctcggg cggcttctcg ccgctcccag gtctggctgg ctggaggagt 50 ctcagctctc agccgctcgc ccgcccccgc tccgggccct cccctagtcg 100 ccgctgtggg gcagcgcctg gcgggcggcc cgcgggcggg tcgcctcccc 150 tcctgtagcc cacacccttc ttaaagcggc ggcgggaaga 
tgaggcttcg 200 ggagccgctc ctgagcggca gcgccgcgat gccaggcgcg tccctacagc 250 gggcctgccg cctgctcgtg gccgtctgcg ctctgcacct tggcgtcacc 300 ctcgtttact acctggctgg ccgcgacctg agccgcctgc cccaactggt 350 cggagtctcc acaccgctgc agggcggctc gaacagtgcc gccgccatcg 400 ggcagtcctc cggggagctc cggaccggag gggcccggcc gccgcctcct 450 ctaggcgcct cctcccagcc gcgcccgggt ggcgactcca gcccagtcgt 500 ggattctggc cctggccccg ctagcaactt gacctcggtc ccagtgcccc 550 acaccaccgc actgtcgctg cccgcctgcc ctgaggagtc cccgctgctt 600 ggtaaggact cgggtcggcg ccagtcggag gattgggacc cccccggatt 650 tccccgacag ggtcccccag acattccctc aggctggctc ttctacgaca 700 gccagcctcc ctcttctgga tcagagtttt aaatcccaga cagaggcttg 750 ggactggatg ggagagaagg tttgcgaggt gggtccctgg ggagtcctgt 800 tggaggcgtg gggccgggac cgcacaggga agtcccgagg cccctctagc 850 cccagaacca gagaaggcct tggagacttc cctgctgtgg cccgaggctc 900 aggaagtttt ggagtttggg tctgcttagg gcttcgagca gccttgcact 950 gagaactctg gtagggacct cgagtaatcc actccctttt ggggactgac 1000 gtgaggctcc cggtggggaa ggagactgac ctctcggttc acgtgtcttg 1050 ccatagagcc actctcctga gtgggttttt ctcctgatcg tttgggccaa 1100 gtgacttctc tctgaacctc atatttctct tctgggataa taaatggtca 1150 ccctttcaag gggttgtttt ggaagatatt gtgaacaatg gtaaataagg 1200 gcttaattaa tgagggtaag ccctcagtaa attgtcactg tgtgttcatt 1250 tcttcctctg tgtggatcgt gaccgagagc ccttccccct agcctcctcc 1300 tggtatgggt acccaaaacc taggtgagca gggatctctc ccaggggcag 1350 agagcttgtg tactctgggt gttagagggc taaaatataa ccagtcaaca 1400 ccacgttgcc catttctggt acttccggta gcagcctgag tctcaattat 1450 cttgcccaga tgatctgaac tctgacctct agcctgtttc agcataggca 1500 gagagcttga gtaggtgagt ttgcattcct catagcagct ggctgagcct 1550 agtctggact tctctttgac ctgtaaccta caggcccaca ggcccaaggc 1600 aaccacaggt tgcttccagg gttaccacac aggtggtttc tcatttctaa 1650 tgctaggttt tagataattg ttgtaagtga ggggccctgg caggcaggat 1700 gacatcctgc caataggagt tttctgtcac tttcccacag agccctggct 1750 actacatact cttgctcaat ttcgccagta attgcgtcaa tgtgttcata 1800 tcaagtttgg gaagaacatc ttggaattgg tcagacgtga actgtggtaa 1850 taatgggggc 
ttgttttttt aagcagataa ttaaattcct ttgcatttga 1900 tgattattct gggaagcaga ctagtcccat aaaatgaaat ggactctgcc 1950 ttgctgctaa gtgtctgact tgagacatgc tatcgagttt ctcaaaatct 2000 cttccttgtg taaaatgtgg ttgtcgatga ttaccttaca ggggtttttt 2050 taagactaaa tgagatcgtg tacattaaat acaggcactc aggctgggca 2100 tggtggctca cgcctgtaat cctagcactt tgggaggctg aggggagtgg 2150 atcacttgag gttaggagtt tgagaccagc ctggccaata tggtgaaaca 2200 ccatcccatc tctacaaaaa tacaaaaaag ttagccaggg gtggtggcat 2250 cgcagctact caggaggccg aggcaggaga attgcttgaa cctgggaggc 2300 agaggttgca gtgagtcaag attgtgccag tacactccag cctgggcgac 2350 gaagcaagac tgtctaaaaa aaaaaaaaaa aaaaaaaata cgggcactca 2400 atacaccgta taataataat atagtaataa tatttgctta ggatctttaa 2450 aaagtttcat tttttcagac tcccacagaa atggctctgc acagcagagt 2500 gaagggggag agagactgag tctccaggcc agaaaaaggc caggtttttt 2550 gcttttgttt ttagttgttg cctggatatt gcacagaaag aaaaaataat 2600 tagcaagtta aacaaaagta ccgcaaagtt gattacattg gtatttgagt 2650 atcacatctt ctctcagaag cgtaagagac aaggtcgtga ccatacctct 2700 gcttagtttt gttttgtaat ggtgttgcta gtgatcggct tgtcaccagt 2750 tactggtgtt tctaaatgga ctataattgg ctacttgaaa ggacttcctg 2800 agaaagaaca ttttggagga cgaggagaga gtgccttctc tattttggct 2850 gctttcatgt gacatgcaag agaccatgac gtttaggctg ctgctgaggc 2900 agccccagaa atgggggccg agaggtcttt tcttcatttt aatagggtct 2950 gtaggtttgg gtggttaggt acagttctca gaatggaggt tcctggctat 3000 gaggccttga gaaagctgaa agtctccttg ggagtgtgtg ggtgggggga 3050 gtcgagccca tctgttcatg ggcaggtgtc agccaaagcc cttgcgggtg 3100 gttttgaggt tggtgggaga aagcatccgt ggggtttaga gttgtggcct 3150 tttcactact tgcagttcct ttccccgact tggctttact ttctggtgtc 3200 caggggtctg ggccagatgc tgagattcct ctcagctgac aggtgtgggt 3250 tatgggcaaa cccttccctg gaggacataa ggcaccggat tggactgctg 3300 atgggttgct gttggagttg tcagggcctt ggaatagtct tcagatagac 3350 ttgggttagt gtgacctggg gcaggctgca ggtttggagc catagtaccc 3400 cccgccccca caccgggcac cctgctctgg gctaatgtga ggcttgcagg 3450 agtgagtgat gcagtgggaa ggggggcctt tcctgaggat tctacagctt 3500 tctccaggga atcctcccag 
gtagtttagg cctgcaggtg ctatgctatc 3550 cttctttcct aaccctgtct caggtcctca gcggggccat gcggcatcca 3600 cttataaccc tgcagcgagg ccctcttttc tggccacctg ggtgtttgcc 3650 tgctgagatg ggaggaacag tggccttggg cttcttcccc cgtcatgttt 3700 atctctgctc agattgggca gcagctcaat gggacttgac cagctgtggc 3750 actgccagtc tgaagatgag tagggtgatg gggggaggtg ggcagtacct 3800 gaagctgaac tggtgagaga ggcaggctgg cctgggggct cagctggggc 3850 ctgggatggt tggtacagtc ccctcagggg ggtaggggag tgagtgttag 3900 actgcttaag cctcagaggc cgctcttgcc cacctatgct ttgaggagat 3950 cctcttcatt tgttcaaagg gaagactctg atctagagat gggcacttgg 4000 accagcaaac agcagctaca ggtagccagg gcacccgagg agcacttgct 4050 catgagccgg tttccctggt ttttatgggg gctgttgctg agcgtctgcc 4100 agggtttgtg tcctagcact tgctggtctt tgctgggctc tcagctctca 4150 ggtgtttctc taccagcacg tttccccctc cctcatatgc acacatgtgg 4200 acacaagcag gctgcccagg acagagtgta ctttgaggct tgggaaagga 4250 ctctctctcg cccttttggg gatgagcctt ggaacctcat caccttccgg 4300 cttggggtgg agcttcatcc tgggggttga agctttaggc tcagataact 4350 agtcttgtaa gccagttttg tcctgttgtt tttttcgtgg aaaataatgt 4400 attgacgtat acacagacat tctttgtcta acagtctgag attgagaaat 4450 accctccatg actatttggt ttgctttcat ggtgaaactt ggtcgctttc 4500 ttagacacag cctatggcaa taagagtgat ccctggctgc tgtaattcat 4550 tccagacttt gagcaaacac aaggcaccgc ctccacctgc agtggagcct 4600 ctgatgaacc aaatggaaac tccttgggga atggggagta agagccaaat 4650 gtgggattgg acttaaactg cagcttctta gaactgtagc attccacgat 4700 gggattgtct agtgctcttc ctggaggtta ctattcaata gttggctagt 4750 gcacaggttc aggggtgacc tgatatgccc tagcgtttca gaagatccct 4800 gcaaggtgtg tcttttggtc catctgaagg gtcttgtatg gtgatcttgt 4850 atggatatcc gtgacggcta aggcatctga taacttcatt ccttcagttc 4900 cagcagtgtt cctgtattat gctgggcact agagctacaa agaagaaaac 4950 aaagtgcctc ctcttcagga actcttaatt taggcagggg aggcataatt 5000 gaacagtgct gaggtcatct aggggaacca aagtgtgtat ttatcccctt 5050 ccctatcact cccctccctc cttcatttct tcctttcttc tttcagaaac 5100 tccaagttca tatcaaaatt ctccagccct ggttttattt ggttgtgtga 5150 aaattttcct ctaatttctg aagctatgca 
ttagttctgc tgagtaatct 5200 ttaacttgct gctttataat gattataatg agatatcact gggtattatg 5250 gtctttgggt agcagcaggg tagggatttc caggctggga ctaagctaat 5300 ttatgggttg ggaattatgg ggcagttaat agcaaggcag tccaagcttt 5350 ccacagattc caccctaggg accatccaga cttaaggaac agggccggca 5400 ggctcatccc ctttgcactc agctgggcta tgggtgtgtg tttgtgaaag 5450 aggtttattc agtagtcata cctgctgatt tccctgctat ctgtttaccc 5500 agtgcctcct gtaccttgtt tcttactctt tgttctctgc tcttactatg 5550 aagaagcaga gactggaatt ctgcttgaac ccacatctac ctggaaattc 5600 cagtttttct tgtccagtgg agcagcaatc cagttgtttt aggacaaatg 5650 gtctgccctt gaagcttaaa tcctttgagg gcctggcatg gtgacagttt 5700 tacatttggc tttggtatag actggtgtgg tccctgggca gtgaggtcac 5750 tgtaaggcca gccagccaga ccctggctcc taggggaatt aacaaggcat 5800 gggattagac tcacagggtc cctcctgtcc ctaaacttgg taggggttcc 5850 tgggagccag actgcgatta agattgtaga gacctgagac ctgagttgta 5900 ggggcctctg tgttgatctg ggccattgcc gggtgagctg aggcggtcac 5950 tagctcaagg agtgatctca ggatattgtt ctgtaagtca gagacctcca 6000 ggttggagag tggggcttgg gggtggggga cagggtttag tggggagctg 6050 gttctgggtg aatgtggcct aaagggattt gtccttagaa gacagagggg 6100 tgagtcacac actcagtgct tcaggttcca ctttgcggct tggcctcagc 6150 ccgccccttc cctgcacaaa tgaaggccag gggctatata attggctgtt 6200 gctgaattct ttggcagtga ttttaaagtc tggtctgggt gtgttatgta 6250 gctgcttctc tatccactcc ccacacccgc tgcttctcca gagcccctca 6300 caaagcccag gcagagagag agagagagag agagagaatg acttgcctca 6350 cagagatgtt ggggataggg ataggggtat gggtctttgc ttttgccttt 6400 tgagggggga taatctcttc cttcatttta aaagtaaaaa gtaatgcagg 6450 ctcattgaaa ataatttgaa aagttgaaag agatataaaa gcacacccaa 6500 attcctatca cccaaaagaa acataccggc atatttccta ctagtctttt 6550 tcatgtttaa gaatatagct gatatatttt tttttctttt tctttttgag 6600 acagggtttt tgctctgtca cccaggctgg agtgcagtga tcacggctca 6650 ctgcagcctc gacctctcgg gctaagcgat tctcccactt cagtctcccg 6700 agttgctggg accacaggtg cacaccgcca tgcctgacta atttttgtat 6750 tttttgtaga gatggggttt tgccatgttg cctaggctgg tctcgaactc 6800 cagagctcaa gtgattcacc tgccttggcc tcccaaagcg 
ctgggattat 6850 aggtgtcagt caccacaccc agtgttatag ctgttgtctt tatagatgaa 6900 cagatagatt gacatagatt catgtagata gcctggtgtt cagcattttt 6950 catttaagat tctgtcacag acttgaccct atacctttaa aaatcacaaa 7000 ggcagtatca tagtctgtca gctgaatatg ccataactta aaaaaatcat 7050 tcaactgttg ctgaacacac acatatacat atatagtttt tgttttttct 7100 tagtgatgta gtgatgcttg tgcagaaagc tttatgtact ttttggatgg 7150 tttctgtagg agagctttct aaaaaaggaa aaaaagtgtt gaatgttttt 7200 tgagaagggc tagattttca agccagtctt acaaaaggat agactcattg 7250 gaaattccag atttgcttag tgctggcaga tgagtatcac ttattgctga 7300 acaatgtgtc tagaattctg attaaaaaag aaactaggtc caggaagtgc 7350 ctgggggcag gggcaaaggg ccaggctgca ggataggctc ttaggatctg 7400 gctgagcaga aatctgctgt gaacagaatc ggtgggggtg atgctttctc 7450 agtaacttct ccatttgttt ctttagcagc taagtccctg tgctggactt 7500 ctgtggacta ctgtggctct ggggctgtgg ttgtgggtga acaacagcta 7550 gctaaaccag tgctgttgac atcattgaga tgtgacgcac aggaaggtgg 7600 gagcaagctt gcaaatcaga ttctgaaaca tatagcacag ctctcccacc 7650 tccaggtggt cctgagatct agggaggagc catagtgaga aactttaggt 7700 ttctaggaat tctcttaggg agaagctctc ttagggagag gcagaacctg 7750 gttctcagtt ggggctgatt caggtgggtt agatcaataa agcctcaggc 7800 cagtgtgcca ggctattccc aaggagtata ctttgaagtt actcccttta 7850 gaatgtcctc agtggagata aattctctct gaggagcagt tttgtctgcc 7900 ggggtcattt ggcacaaagc ctggagtgct agggcgaggt tgcactgagg 7950 gaaggggcag gattatgtca gcagtgtgac ggatacagtg tgaggtcagg 8000 ctccttcctg ccccaccacg ggggcctaga ggtcatgggg agggtccctg 8050 gcaggggatt caatcattgc ttggccccat gacagagtat attctaaaaa 8100 tgccttaagt ttttttcttt caaagtttct tcctgttttg cataatggcc 8150 ttttgccttt gacatcctga aaccgcagag ctgtcattgg tgttgcagga 8200 cactgccagc ttgaaaaaaa tcaacaacaa aaaaagaaac aggaaaggat 8250 gtggagttca gggtgcggcc tagggaagct ggtatttgcg ttatgggatt 8300 gtggggatgt ggtattaagg tgttgggtag cgcctgacat ttagaggagt 8350 actctgggca gagtccctgc ctgcccaaga ataggtagaa ttgagtcttc 8400 acaccaaagt caggagagac cccctccccc caggaagaga atgaacaggg 8450 actcatttcc tcattcagca aacttttatt ggtaactaca ctatatgaag 8500 
tgtgagagat agacatgaac aagagaggcc cccactcttg ggcagtccct 8550 tagtagtagt agatagactc tggcaatatg gtgtggtcag agagaggaag 8600 cctgggtgct ttgagggtac tgaggaggtg cagggagcca aatgggtggt 8650 ctgggccagg gccagagtca gaatgaagga cctctcttcc agacgttgat 8700 tttagcatct ctgtctctca gtatgtttga acagtctccc ttattggaag 8750 ggcaggagtc tactgctaaa agtaacctgc gatttcctct acttgctgtc 8800 atgtggaaag aatactaaag ctgaaattcc aaaagttgca cacctttacc 8850 agcagggcag gagaggaaag gaaatggagg cagagtgagc tgaagatgat 8900 aaaagaaaga gaaggtggtg cagtttggac tgttatggac agaggaagtc 8950 tgagggtagc tggactgagg gatcaaaggg aggcagttga aagggaagag 9000 agctgcagag agggatttct tggtctgcag agggtaggag caagccttga 9050 aggctgctgg agtgaggatt ccgagccctg gtctttattc tttttctaat 9100 tcattacatc attttaggca agtcctaact cctttggtct ctgttgtctt 9150 tctgaaattt gagtgggctg ggcctgctgg tctttagcct ctgtctttct 9200 ctacctccta gattccagtt tggcgagtgg gggggaaaac ctggttgtat 9250 atgcaacgtg aaaggcctct ggaattcctt ttgaagctca ctacccatga 9300 ggcttctgct aaggatttca tcatgtctgt ctaagcagac ataaaaattt 9350 tagcaggtgg atgacccgta gaaatggcac aaggaatgtt tctttctgtc 9400 acactgtggt atttgattta agaaagttgt tatcctctct gtgcctcagt 9450 gttctcactt gtaaaatggc aataacagta tccacctcat agatgttatg 9500 aaatacaggt agtagccacg aaagggctta aaacagtgcc taacacagaa 9550 taagttgtga atatatgtta tttattattg gtagtataat gcttatttgt 9600 gaagattttg gcttttgctt tataggacct tttttttttt tagttgaaaa 9650 tacaatgtta ccatgttaaa tgttaaaaaa aattctactt accattgtaa 9700 cagaacatgc tcccacttct gtaacagagc ttgctattac ttttcaaatg 9750 catacatatt ccaatgcata tattccaatg cagttgtaga gtgaaactgt 9800 ttgcatgcag ccatttttat ccaacattat cttataaaat gttatgttgt 9850 ttatgattat cctaattatc ttttgttgct gtctagtatc cttatagata 9900 ttccattagc atacactatt ccaggtttca ctatcgtcga taatctagat 9950 atgaacattt ttgtagtgtg tagctctttg cttcagttga attactttcc 10000 tgggataaat tcctggggaa gaatttctag gccagaggat atggtcatct 10050 tgacaatact gattcacatt gctgcattgc tttccaagag gtttggaatc 10100 attcacaggt tctaaattgg aaaatcctgg cttttgaagt atgtggattc 10150 taagggcgat 
ttggatctag ctggagcctc acactgacac ttccagccag 10200 tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtagt tccctatgct 10250 ggacaccgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtagttc 10300 cctatgctgg acaccatgtg gcctttctgg acattagggt tttcctgtga 10350 ttgcctcaga gcagttcctg ttgaattcac tctgtgtcca caaaaggagc 10400 cttactgtgg ctctttcaac acccacctac ctttgccaag ttggtttaca 10450 gaaagtaaga acattctttc cttcttcctt gatatgtggc gctaaaccta 10500 tagcatgggg caggctctgg ctttaaaaac ctgacttaaa aataatggtg 10550 ttgatcaaaa agtttgtgga tcagtttttg gaaacactgc atgtagccat 10600 ccatagaaac ttatattctg ttgggctagc ctgggcgcct gatcatttaa 10650 ctcatgtgga tgaacttcta tgtaatagcc ctggtgtatg ggatccagaa 10700 acagggccct aatgaagaaa ggcttttaaa ttatgttgga taaaaataag 10750 ttgttacaat agcccaaagt ctgcaaatat gaattgccag ttctgtcctt 10800 gtagtcatcc accatgtgcc tgcatctttt gtagactctt gtagattcag 10850 aagcccactg aattgcataa atgatggaat gattttagac ttagtgattt 10900 cagtgactaa aagtttacag atcctggccg ggcacagtgg ctcacacccg 10950 tattcccagc actttgggag gccgaggtgg gtggatcacc tgaggtcagg 11000 agtttgagac cagcctggcc aacatggtga aaccttgtct ctactaaaaa 11050 tacaaaaatt agccgggtgt ggtggcatgc acctgttgtc ccagctactt 11100 gggaggctga ggtgggagaa tggcttgaac ctgggaggcg gaggttgcag 11150 tgagcccaca tcaggccact gcactccagc ctgggtgaca gagtgagact 11200 ctgtctccac ctcccccgcc ccccgaaaaa aaaaaaagtt tacagatcca 11250 gcagatgggg catattcaat ttgtgacagc cactcccttc accttatagc 11300 tatgtcatat gtcttcttct cctttgactg cattctgcag cagtcagttg 11350 tgacttaata tggcactctg ggcccactga attaggtcag agctgctagt 11400 agtatattgt tcctagagac ctagggcaag attttcttac tacataaaat 11450 gagggagata atttcttacc tcaagatgtt ggtaagagga gtgaatgagg 11500 ttagttatat ggtaatatca gtactctgaa tgtcttttga tcaatgccta 11550 actcatcttc ttgggcacaa aaggcataca gtcagcaccc ttaggccaca 11600 tataaaattc ctccaaatgc aggttttcat ctgccttggg gcagagtcaa 11650 gagaaagaag aggaagaggc gtgaggctct gaccacaact tagggacaga 11700 atatagccca aagcgagtac cccaggccac aaggagaagg ccgctatctt 11750 gttgaatcca cagcactgga aacttggagt gtgtgttccc ctgtgtcagt 
11800 tacactggaa ttttatggct gctcacattc ttcccttcag gtggacgttg 11850 ttcatcagta tcctgggcaa gaggccatca taaaccacag acagctgagt 11900 gattaggaag aggagctgaa gagggagcat tagatgtttg attgagtctt 11950 aggtgagaaa gtatatcatt aaaacaaaaa gatagatgta ggcgggctca 12000 gtcttgtgtg cctggtgtgt tggtagaaaa actaaagcac aagcctgtag 12050 ataacctgct ttattctacc tcggggctgg tgttggaatc caggatgcca 12100 gaccctaaag tccagctctc tttccaacct actgaataat ccgagagaaa 12150 tcatgttctc tctctgggcc tcagtttgcc catgtataaa atgagatgaa 12200 ggattggctg ggatgctctc cagagtctct tcctgcctgg agttctgacg 12250 tagccatgta ctcctgctca gcatcgctaa atggctttgt ggtaggacca 12300 ttgagtgctg cctccattag ggccagctat gtaatgctgg ggtggctgtc 12350 actgggccct aagagccagg attggtctta ctggagaaat ccacatccac 12400 ctaaacttaa gacccagggg tgtccaatct tttggcttcc ccaggccaca 12450 ctggaagaag aattgtcttg gaccgcatat aaaatacact aattatagcc 12500 gatgaggtta aaaaaaaaaa actcaatatt ttaagagagt tcatgaattt 12550 gtgttgagct gcattcaaag ccatcctggc cgcatgtggc ccatgggcca 12600 tcggttggac atgcttgctt tagacctccc agcaattcta gtctctaaac 12650 aggaaatcaa aagtcaagat gaatagataa gttggtcagt gtgaaaaagt 12700 aattggtggg agccactgta gatgcagggt tctaggctcc atcaacaacc 12750 acctacatca ctgaacgaaa gataatgctt gttcagcact tattacatgc 12800 caaccatggt aaaaatactt cagatgcatt gttttcatga actctcacag 12850 cagctctttt tcttgcctaa atgccccgtt agaacctcca gtacaatgtt 12900 aaatagatat gctaagagac aacatatgtg tcttgttagg gggaaaatat 12950 ccagtctttg actattaaga atggtgttag cagtgggttt ttcctaggtg 13000 ccctttatca ggttgaggaa gttcctttct attcctggtt tgttgagtat 13050 ttttatcatg aaaaggtgat gggttttgtc aaatgctttt ctgtgtctgt 13100 tgagatgatc atgttttttt gtcatttatt ctattgatat ggtatattat 13150 acattgattt ttcagatatt aatcttgcat acctgggata aatcccactt 13200 ggtcatggtg tataattctt tttatttgtt gctggattga gtttgctagt 13250 attttgttga tttgtattca taacagatag tggtctgtag tctttccctc 13300 cctccctccc tccctccctc cctcccttcc ttccttcctc tctctctctc 13350 tctctcccct cccctccctt cttttcccct cctctcccct ccccttccct 13400 ttcttctctt tcatagttgt ttaccactgt 
cagaaaaggt ctgttcgttt 13450 tctttcgtcg tgagatcttt gtttggtttt ggtatcaggg taatactgcc 13500 tcaaaaaatg agtagggaag tgttccttcc tcttctgtat tttgagagag 13550 tttgtggtcg gtttttatta attcttcttt aaatatctgg tagcgttcac 13600 cagtaaagcc atctgggcct gatgttttct ttgtggaaaa ctttttgatt 13650 cctaattcag tttctggtta taggtctatt cagaccttct attttttctt 13700 aagtcagttt tgatagtttg tgtcttccaa ggagtttgct tcatctaagt 13750 catctaattt gttggcatac atttcatagt gattccttat gatccttttt 13800 atttccgtta aagttggtgt agggatagtc cctctttcat tactgattat 13850 aataatttga attttctttt tttcttagtc ttgccaaaag cttgtcattt 13900 ttattgatct tttcagagga ccaactttga gttcattatt tgttctcttt 13950 gttcttattt ttctgcttca ttaacttctc taatctttat tctttcattc 14000 tgcttgcttt tggttaagtt tgctttttct ggtgtcttaa ggtagaaggt 14050 taggttactg atttgagatt taaagatcat gctctttaaa cgttttgata 14100 gatactgtca gtttgccctc tggctttttc tcattaacag tgtataggag 14150 tgcttattcc tcacactcat accagccctg ggtgttacta acctttatat 14200 atttgccagt atcatattca gacatagtat cttgttttaa tatgtttctc 14250 tgattactga tgaagttaag caaattttca cgtgtttatt ggccatctgt 14300 ctttcttttt tcatcctttc tttcaagatg ggagtctttg ccatgttgcc 14350 caggctggac tcgaactcct gggctcaaat gatcttcctg cctcagcctc 14400 ctgagtagct gggactatag gcgtgagcca ccatggctgg cttgcccatt 14450 tgtatttctt atgtgagtat tttttctttt tttttgaagt ggagtctcac 14500 tccatccccc agagtggagt gcagttgtcc gatcttggct cactgcaacc 14550 accgcctccc aggttcaagt gattctcaca ccttagcctc ccaagtatct 14600 gggactatag gtgtgtgcca ccacacctgg ctaatatttg tatttttagc 14650 agagatgggg tttcaccatg ttggccaggc tggtttcaaa ctggcctcaa 14700 gtgattcacc tgcctcggcc tcccaaagtg ctgggattac aggtgtgagc 14750 cactgtgccc agctgacttt ttttttcttt tttttaaccc tttttttttt 14800 ttaccctttt tttggcccat ttttttttac cctttttctt ttaacccatt 14850 tttctattag ttttaaaaat atgtttgcag gagcttttta tattgtggat 14900 ttttcttgtt tattacatat catttgtaaa tatggtctct ccatctgtca 14950 ctcttcttta tctctggttt ctttagctat gtagaagttg ttatgttatg 15000 ttatgttatg ttatgttatg ttatgttatg ttatgttatg ttatgttatt 15050 ttttggagag 
ggagtcttgc tctgtcgccc aggctggagt gcagtggtga 15100 aatctcggct cactgcaacc tctgcctcct gggttcaagc gattctcctg 15150 cctcagcttc ccgagaagct gtgattacag gcacccgcca ccacacccag 15200 ctaatttttg tgttttagta gagacggggt ttcactatgt aggtcaagct 15250 gatctcaaac tcctgatctc aaatgatcct cccaaagtgc tggggttaca 15300 ggcgtgagcc actgcactcg gccagaagtt ttgaattttt atgtgtttaa 15350 atctatgttt tcctttatga cttcaggttg ctttcatact taagcaggtc 15400 ttcaccatcc caaaatgata aaatttttct cctgagtttt cttctaagtt 15450 ggttctttag aagccaccaa cttggcttcg acagcaaaag atgaacagaa 15500 tttctgttca actctcatgc tgcaagaagc tttatgtaat actccaggga 15550 ccctttaagg tcccagagtt ttcctccaaa tctatcagtg attctagtgg 15600 ctaagagtag aaatgtgaaa atttagccat gtgtgctgat agagctgtag 15650 taatttgtaa gctctgaagt tctaaggagt caggggagaa gggaaagtaa 15700 catttattga acatctatta gctcaataag aacatgcgat aagtatgtat 15750 atgtattatt tcacttacat ctgaaaggaa ggcataatta tccccactcc 15800 ttagagaagg aaattggagc tggctacatt taaagtagtc ctgacaccag 15850 agagatattg ccaggagtac ttggctggct gagtgcccag atggcccata 15900 ggagtagtgg gccctccaca gtccaaggtc tggttctagg tggagagaga 15950 aggatgtgct cgtagtcagc accgcagctc cagaaaatct gctggggctc 16000 caaaactgat tagaggggca gctgactcag taataaaact cccaggagac 16050 ttacttacat actggaatgc aaagttgcag ctttactggg aagattagaa 16100 ctgttattga gtagcttaga aatctctggc tgaattcact gcaagggaag 16150 ccgcaggata agctaactgc tggtgagtca gcagtcagag cagggaagtg 16200 aatttaacat tagatgggtc agtctctcgt ggctgatgaa ttcatcccca 16250 caatactgta cacctgcctt agggaccttt gtctggacta ggggttgggg 16300 tccccctcct ttgtacagcc ctggaaggac acatccagct ccatccgcca 16350 tctctccctt acttatttcc ttccttcctt ccttctttcc atccagccat 16400 caagcttcct ttcatggcca ataatcatca ttggggtcta ctcatggact 16450 ctcttgcctc atgtatttgt tttattttgt cctcattccc acttctattt 16500 cccaggtata tcacaggcaa ctattctaac gtatttatag tttgtgtatc 16550 tgtttttgct cttgccaaaa tggaagccac tgctttatac atagatgtat 16600 tcttaacttt aaaaaaaatt tttttagatt aacctacaat aaaattggct 16650 ttttggcata tagtctataa attttaacac atacatattt ttgtgtatct 
16700 accaccacaa tcaggataca gaacagttcc atcaccccaa aaaaatccct 16750 cttgtagtca cattctcctc ccacccttaa tcccaggcaa ccactgatct 16800 attcttcatt actattgttt tgtctttttg aggatgtcac ataaatggag 16850 tcacacagta tatatacatt tttttaaaca tatgtaaatg gcattttata 16900 gctcattttg attatatgtt tttcatccag ttctgttttt tttttttatt 16950 tttaaaaagt ttgacataac ttcagactta cagaaaagtt gttagactaa 17000 tacaaagaat tcctggatat cctttggagt ccctaaatgt taacatttta 17050 ctatatttac tttttccttc tctctctctc tctctctcgc tctgtgtgtg 17100 tgtgtgtgtg tgtgtgtgtg tgtgtatcta cctgtagata gatagatatt 17150 aatataattt tagatagatg tatctagatc tctctctctc atatatatgt 17200 gtgtgtgtat atatctatat ctatatctat atatatctcc ttttaccctt 17250 aaatattcag tgtatatttc ctaacaacaa ggtgatttaa aaatatatat 17300 ataaacatag tataattaac aatcaggaca tcaacattga aacatttctg 17350 ctatgtcatc tacaggcctt aggaagactt tgtcaggtgc cccaataata 17400 gccttgatgg tagaagaaaa ccatgtgttg tattcagttg tcatgtctct 17450 tagtgtcttg taatctgaaa taattcccaa gccctttgga tttcatgaca 17500 gtgacattgt tgaagagtac aggccagtta ttttgtagaa ggtctctcag 17550 tttaggtctg tctgatgttt cctcctgatc agattcaggt tattcacttt 17600 tgacaggaat accactgaaa tgatgctgag ttcttctcag tgtaacgaga 17650 tctagagaca cacactgtca gtttgttcct tattggcagt gtgaaccttg 17700 aggatttcat tgtagtggca tttggcatta ctccattata gttactattt 17750 taccatttta aattaaaact atctggccgg gcgtagtagc tcatgtctgt 17800 aatcccagca ctttaggagg ctgaggcggg caaattgctt gaggtcagaa 17850 gtttgaaacc atcctagcca acataacatg gtgaaacgcc atctctataa 17900 aaaatacaaa aaattagcct ggcgtggtgg cgcatttgta gttccagcta 17950 ctcaggaggc tgaggcacaa ggcttgcttg agcctgggag gcggaggttg 18000 cagtgagctg aaatcacgcc actgcactct agccagggtg acagagtgag 18050 actctgtctc aaaaaaaaaa agtaaataaa taaaaaaatt ttttaagtat 18100 cttatgggca tatacttgtc ctgttactcc tcaaactttc atccactttt 18150 ttttttttaa attttttttc ttacctttca tcgttttctt gatatccact 18200 gggttttagc atctacaaat gattcttgcc tgaatcagtt attatggtag 18250 ttgatggttt tctaattcca ttattccttc tatgtttgtt aattttggca 18300 ttcttctata aggaagagct tacccttttt 
ccctattaat taattcatat 18350 attaatgcag acctatgcat tcttacttca ttaaatcata atcctttact 18400 atcattatgt attctgatgt tcagactatc ccagatttag ccaataagat 18450 ccccttcagg ggaatggtct ttgggattcc tctttagagg ttcctggttc 18500 ctgttttctt ttgacatatc ctattactct ttgagcattt tttttttttt 18550 ttttactttt aggcacagca agaagttcca tggtcctctt gttctttccc 18600 caactcagcc ctagagtcag tcacttctcc aatgagctct agttcctttt 18650 agtagagaat cataattaga aaacaagaat cagtgccaag tgtgcacctt 18700 tgtttttaag gtccatccac gttgccgtgt atatgtccag catgttgatt 18750 ctaactgctg aataatacct catgattgtc atccatccca gtgtttcttt 18800 ttcccttctg taatgaggga ctcctggact gcctccagca ttaccttcac 18850 aaatattgct gtgaggaaaa tccttaaacg tttcctttat gggcaacgtg 18900 tgagcatgtt tatgttgatt caggggtgcc agacacagct ccagaatggc 18950 tgcctcagtt tacatttcca ccagcagagc atgacaggct ctgtgtctcc 19000 gtgaataatc agcattaacc agcttcctat tttttgccaa actaatagat 19050 gtgctaggat aactctttgt tttaacttgt ttttctctga ttaccaatga 19100 gctggagcat ttcttcatat gcctgatggt ctttgggatt cctcttaggt 19150 aaattgctta ttcattataa tcctttgcct gtttttcact ggagttctta 19200 tatttttctt gaagatatgc aggaattcct tatacatcct agatattaat 19250 cccttcctgg tctcagacat tgcagatatc ttctgaatct gttatttact 19300 tatttattta caattttttt tttaagagtt ggggttttgc tctgtcaccc 19350 agactggagt gcagtggtat gatcatgact cattgtggcc tcgcaatcct 19400 gggcttaagc gatcctccca cctcagcctc ctgagtagtt gggactacag 19450 gtatgcacca ccagacttgg ctaattttat tttatttttt agagatggaa 19500 gtcttaatat gttgctcagg ccaatcttga actcctggcc tcaagcaatc 19550 tttccacctc agcctcctgc atctattata tatatgttca ctttgctcat 19600 gctgtatttt gttgcaacat aaaactattt ttcccattgt tttgtgcagt 19650 ctctcaccag cactcttctt tttctgtaac tgtgttaatg ccctttgttc 19700 ttccatatgt taggtatgct ggtatagttg aactctgctg actctcctca 19750 gtaaacagtc tctttttatg acaccttatc ctctactgaa ttctctctat 19800 caagaatgac ttggccgggc atgggggctc atgcctgtaa tcccagcatt 19850 ctgggaggcc gaggtgggca gatcacccga ggtcagaagt tcaagaccag 19900 cccggccaac acggtgaaac cctgtctcta tgaaaataca aaaatcagct 19950 gggcgtggtg 
gcaggtgcct gtaatcccag ctacttggga ggctgaggcg 20000 ggagaatcac ttgaacctga gggggaggtt gcagtaagcc gggatggcac 20050 attgcactcc agactgggtg atggagaaac tccatctcag ggggaaaaaa 20100 aaaaaaaaaa aaagaatgac ttgtcttcct cttagagtgt gaggtctaca 20150 tacaaatatt attcttgtat tcagcaaatg tatgtcatag gcctagtgtg 20200 tgttaggaac tgtgctgtca ccaacaaagt ttagagaggt tataaaactt 20250 gactgtagct ttttagaggt ggaggagtga tttgaaacct aggctgtaat 20300 tccttcctcc tgtgattcct tcctactgtg ttgccttccc ttgaaaattg 20350 catttggggg ccaggtgtgg tggctctcgc ctgtaatccc agcactttgg 20400 gaggctgagg cgggtggatc acctgaggtc aggagttcaa gaccagcctg 20450 gccaacatgg cgaaaccccg tctttactaa aaatacaaaa attagctgga 20500 tgtggtgtgt ggtgacatgc acctatattc ccaggtactc agtaggctga 20550 ggcaagagaa tcacttgaac ccaggaggca gaggctgcag tgagctgaaa 20600 ttgcaccact gcactccagc ctgagtgaca gagtgagact ctgtctcaaa 20650 aaaaaaaaaa agaaaagaaa gaaaattgca tttagttcct gtagactgtg 20700 tgtcaaatgt ctaaatctct tctaacaaat ggcctaagga ggtgcaaagc 20750 gaagcatcct caccagcatc ctgacttggc agtgaggcat gggaccctgg 20800 agggagtagt ggtaagtgtg actctggaat tcttcctggg ctacttgtca 20850 gtgactggct ccagattgag aggagagccc agaggacaca ggtggctgcc 20900 ccagcctgga ggtgaaagtc ttaaaataaa atgccagatg cctagaccat 20950 tctaaacctt tctgagaagc tgaaatcatc ccttctggaa gcgctctagt 21000 tctaaaagga cagatataca gcaagatctt cctggggcta atatggagtt 21050 tataggcaag taggcctcag aacctttccc tggtagtgat atctgtgggc 21100 aggcacagtt tccacacttt ccagaaattc cagcggaagg agtgagaagg 21150 aggaatctgc ccttgagtga ggaccaaaga aagcagaaat tcctcttggg 21200 aatttttcct ccagagacca aacactactt gggagcttgt ttactgggct 21250 ttaaaagctt gtgaccccca gtcactcttt cttgacccca aggctttgca 21300 tttctgtggc ttccccactg gacagaagtg gaactgtcat gctgcctgtt 21350 ctggggtctc ccagaggttt ccccatgtcc tctccttgct tctactgccc 21400 cacagaattg gggatctgtg accacatatg gtatagaatt aatgcttgag 21450 aatggtttag ttcagtgatg tcaaataaga ttcactttta tgccacctcc 21500 atcagttgaa ggcccccctg gcccctaaat tggaaaagat tctgagacag 21550 aatccccgtg ggtacagcgc agggacagta aaggcacgtg tgctgtgatt 
21600 tgctatccac tgtgtggatg catccaggaa tatcagaacc ctggaagatt 21650 atttaagggg aagttaggac agcttttttg ccaatccaag ggtgttcttg 21700 aggaagtctg tcttcctgta tggccttcag tttctttcct gtgtaaccat 21750 ggggccaaca cataattccc acagctctat tggcccttgt ctgccaggat 21800 tctctagggt ctgattcgag gtggatcctg gccctttgag gtggcagaat 21850 ctgatcatgg tgctgtttcc ttagatttag gccttgatac ccttggcgag 21900 agcatcctgg gctgagtgac cacctgaggt ttttctggtg attttgtgac 21950 ccatgtaaaa ctttgagctt tgggattatt ctctcaagga aatagtgaca 22000 tttggtgaag agcctgtttg gtgtggctat gtgaggctta gccaagaaaa 22050 tgcaccattt ttattaggag gttaggccat ccgttgccac aaagtgtcag 22100 atgctaggcc tagagcctgg agaaaactta ttttaaaatt gatggggtgc 22150 tggaggggtt ggggggtggt ggctgtagct catgaatcag gtgctaaacc 22200 tagaaacaaa aggcctcatg tggcagactg tttctgagca cagatgaatg 22250 gatgagcaac tggcgcaact ttgcccagtt ggtccagctt cccacttggc 22300 cacctaggct tgctgtgaag acctcgtctg gcagaaatga gagtgttttt 22350 gccccatctt gatcttaact gtaatttaag actaaaatct tagattctaa 22400 aacatcaaag gcaagatggc tcccagctct gtgagctcag cttctcacct 22450 cttagttgaa caagtgcagt gtgggtcaat acatgattgc tgctcttgct 22500 gccaggaact gtcccagcat agaaaggaat gggacacaat ccctgccgtc 22550 aagattctaa gggaggaagc aggcaggtcg actggtgcct catctctgca 22600 gggctccagc caaggtttgt gaaggatttt gcaggcatat ggagtgggga 22650 ctgattgatc ccgagagggg actggggaaa gctctgaaga ggggatgaca 22700 tttggtttga actccaaaaa atggttgctt tacctgtttc ctgaagtttt 22750 tgaggtggct tataagaaca tataccataa aaaggaccaa tataaattta 22800 aaatcagaaa aagagaaaat gggctgggca tggtggctca tgcctgtaat 22850 cccagcactt tgggaggcca aggtgggtgg atcgtgaggt caggagatcg 22900 agaccatcct gcctggccaa catggtgaaa ccccggctct actaaaaata 22950 caaaaaatta gctgggtgtg gtggcacatg cctgtagtcc cacctacttg 23000 ggaggctgag gcaggagaat cgcttgaaac ctgggaggcg gaggttgcag 23050 tgagctgaga tcgcaccact gcactccagc ctgggcgaca gagtgagact 23100 cctcctcaaa aataaataaa taaagagaaa atggaactta gaaaattaag 23150 aggaagagtg aaaaggtaga tatttagtca ggcacagtgg ctcatgcctg 23200 taatcccaac actttgggag gccaagacag 
gaaaatctct tgagaccagg 23250 agcttgagac ttgcctggca acatctcagg tgagacctta tctctacaaa 23300 aaatttaaaa attagctgag ctgtgtggct cgtgactgtg atcccagcta 23350 ctcaggaggc cgagaccaca gcccaggagg atcgcttggg cccagcagtt 23400 tgaggctgca gtgagctggc accactgcaa ttcagcctgg gctacagagc 23450 aagacccagt ttaaaaaaaa aaaaaaagat attcaaacca tgggtcccaa 23500 cgtagttatt atatttgacc atttgcaaaa gctgaaagca aaacatgtta 23550 cacattttca gagaggaaaa tacacagtag ttcctgagtg taagttgttt 23600 ttcttgacct cattcttaaa ttgcttcatg agggtgggag ggaagtggta 23650 gttaataagt gaacctgtaa accagcgttt ctcaaaatgt agtccaggga 23700 attgcatcaa aattgcagtt acctacagtg cttgttaaaa tgcagattcc 23750 tgggcccctg ccccaggctt atcaaatcaa tctggtgagt aggactcaag 23800 aacctgtaaa ttcacatact tctgcagatg attcttcttg cactgcacag 23850 catgaaagcc tctgcaatag acagaaagct accagcattg cgaaagcaac 23900 ttgagtgctt ggcctttgaa ggttgagtgg gactttaatg agggagagag 23950 taaggcatga gaaatggcag ttccactgag gtcagtcagt ggttcattgc 24000 tgacgaagtc acttttaagt catgttttag aagaactacc aagtgtggca 24050 ggtcaggcat gtggcaggac tgtttctgag cacagatgaa tggatgagca 24100 cctggcccca ctgtgcccag ttggtctagc ttcccacttg gccacctacg 24150 gtctgctgtg tggaccttgt ctggcagtct cctttaattt attttttatt 24200 atttttttct ttttgagatg gagtcttgct ttgttgccca ggctagagtg 24250 cagtggcatg atctcggctc actgcagcct ccacttccca ggttccagcg 24300 attctcctgc ctcagcctcc caggtagctg ggatcacagg caagtgccac 24350 cacgcccagc taatttttgt atttttaata gagacatggt tttaccatgt 24400 tggccaggct ggtctcgaac tcctgacctc aggtgatcca cccatctcag 24450 cctcccaaaa tgctggaatt acaggtgtga gccaccgcac ctggcctatt 24500 ttttttcagc aaattctttg tttttctctc tgttcccaaa tgcagggtac 24550 tgagaccaca gatgtattct gtttcctgtt gaaaaaatgt ttctcactta 24600 gctgggtgtg gtagcatgca ctgcagtccc acgggaggct gaggcgagag 24650 gattgcttga gcccaggagt tcgataatca tgccattgca ctctggtctg 24700 ggtaacagag cgagaaactg tctcttaaaa aaaagaaaaa gaaaaagagg 24750 tcctagggaa agaaacaaat agtggcttgg atggtgagtt ggtggaaaga 24800 acagtgggtg ttgggggtgt tgaacttgtg tttgtgtgtg gtgtacccaa 24850 gacatatcat 
gtcagcatta agaatagact attcctgttt tctggtcact 24900 gagttgtatg ttttgacatc cttattttgg aagatacttc cttactagga 24950 atgggatagg gagggggtca cctttcccat ctgtgggtca tattttaaaa 25000 tatttattgt tcaagtttaa agatataacc aaaggtataa agaaaaatac 25050 cacaaacatc tgatttaaga aacaaaccag ccgagcgcgg tggctcgtgc 25100 ctgtaatccc agcactgtgg gaggccgagg caggcagatc atgaggtcaa 25150 gagatcgaga ccatcctggc caacatggtg aaaccccgtc tctactgaaa 25200 atacaaaaat taactggtca tggtggtgtg tgcctgtagt cccagctact 25250 cgggaggctg tggcaggaga atcgcttgaa cccaggaggc ggaggttgta 25300 gtgagccaag attgtgccac tgcattctag cctggcgaca gagtgagact 25350 ccgtctcaaa aagaaaaaaa aaagaaagaa atcatttcct acaccttcga 25400 agccttcatg agttagattt tgaaacagtg caaaatgctt cacgtgagaa 25450 tcgagagtcc cttctggtgg ctctccatcc cctgctcttc tgtcaggttt 25500 tcttgtaggt ttatggaaac ctttgttact tgtgcaggtg gcagagaagc 25550 agagaggata gctgcgcgcc acccacacag ctaggattta ttggcgtact 25600 cccacgtgca tggcagccaa gtggacacaa ctctgtgatg aatcctccca 25650 agagaactga ggggccctga tggaggagct gcttctttgc aaagctttcc 25700 ttgactctct tcctgtcccc tagttgattc cccttctgtg ctagttttag 25750 cttattgttt gttacctgtc acacttagca gtactgttgg ctttgctggt 25800 ctccttgact actgggggta aagacctttt gttgttgttg ttgagacaga 25850 gtcttgctct gtcgcccagg ctggagtgca atggcgtgat ttcggctcac 25900 tgcaaccttc acctcccagg ttcaagagat tctcctgcct cagcctccta 25950 agtagctggg attacagcta caccacaccc ggttaatttt tgtattttta 26000 atagagatgg ggtttagtag agatggggtt tcaccatgtt ggccaggctg 26050 gtctcaagcc cctgacctca aggtgacctg cctgtctcag cctcccaaag 26100 tgctgggatt acagacatga gccaccatgc ccagcctcaa agacctcttc 26150 tttacttgct caccctgccg cccactcccc taccaacccc tgcatgccct 26200 ataccacctg gcacatgata catactaact gggtacatgt ttgaatatga 26250 atggatgtgg tgctgtgaat gcttagggga agtgggtgaa atgcttaaga 26300 accaaccttg agtggtctgg gaaggcttcc tgggagggtg gtgtttgagc 26350 taaggccagg cagctgttag atttgttaga ctgaagccct tgcagactta 26400 gagagcttgt gctcttccca gaatgacggg tgagccacgt acagtaaatg 26450 gtgcttctca tttctagccc aaggggcctc aaggggcacc gtgatttcac 
26500 gagaatgctg caagcaaatc ttttctcaag ctggggaatt tggtggtaat 26550 gcctggctca gcttgcggtg cgcacctggc ctttggaaga ttggtacaga 26600 gagaagcggc ccatccacat gagcctgtgg aacagcactg gtgggggagc 26650 tgatttgtga agaggggctg tgcagtgtac tgtcaggtct gagacccagg 26700 aagaaattcc agtatcccag ctctcagaat cacagagttc taggcactgc 26750 ctagttccac gtgttcccaa atgtttcctg aatacttgga tttcctgtcc 26800 agagaatttt caaaacaaac ttagaggcct gacccatggc tgccaaggaa 26850 ggattttttt tttaaattaa attttaaaaa tcagtccagc atgaaaatct 26900 atgatgattt cataagagaa aggacatttt aatattcaaa gagtaagaag 26950 cacttaatct tggaagaaag ggcattccta tactttgatt acctttagtt 27000 taattaaaaa acacctacat ggtctttact tctgtgattt cattcctggg 27050 ctagtgaaac attgtcacaa taaagcatca ggccaacgct tctttcgacc 27100 cactggccaa tcagttgaca aacagtgact agatgtttca gcctattttg 27150 ctgaggctaa aggattgaac tagtgcttca gccagcatga aaaccagtca 27200 ggagtccgtg ctggtgttgg cttagattag cagggccttt gatggagggg 27250 catgtatgtg tttgggtttg ctgtgccagg caggggagca gtggaatttg 27300 tctgaattga gctcacacat tgaagttatt gagcgactta catgcaaggc 27350 catgacctgg actcccagcc gagaggccca cgtggcgggg cttgagctgg 27400 gggagccgag gacagcttac atctgctcat ctgcttacgt aaccctgcct 27450 cccagcttcc agagccaaga aaacacacaa gccagcccag cggggccgag 27500 agcctgtggt agcacacgcc atgcgccgca cagcaagggc gccttggctc 27550 ggcttgaggc ctgtcatgaa gccctcagcc ctctgcctcc tcccagagct 27600 tctccccacc accccaggca gtggctctga aacctggtcg caggtctgca 27650 tgattctgaa cagaggtagt cgttgccttc ctggagtctg agctctctgg 27700 agtttctcac tgggacagag ccaggtgtgt agcagagcat ggtccctgca 27750 gtatggcagg aggtgtgcag ggcattcagg aggcctcctg gctggcactc 27800 gacccaatta gtcattcaac gccaggtctg gggctgctgt ctgttgtctc 27850 aaaggtgtga gctgcaagat ccttagagtt gtggagaaaa aattgccaga 27900 ttggcaagaa gggcaggatt gggggtcaag gtgtctcagt gtgttggaag 27950 catgatgggg gttgtgcaag gggcacagcg agttcagaag ggagcaggag 28000 agtgagaaga ggctgttcag tgataaagct ctgcacagag ccattggagg 28050 agcaagctcc ttgaccatcc ttaaaccagg gtaattttca tttaggttct 28100 gccacacgct cagcagggaa ctcctggaag 
gcaggatttg tcttgtccat 28150 cctccctccc tacctcaacc cactcctcct tgggctggca cacagtaggt 28200 acccagaaag tatcaattga aacaaattga aagtggtctt gatacatatc 28250 acagggcaag tttgcagtta acagacattt cagagtaaag actctctggc 28300 ttggtgctcg atcggcttct gtgggttgtc agcatgctgt ggacagcccc 28350 ggcatgggag cgagtgggcg tgtgtgtgtg tgtatgtgag ggtgagagag 28400 cgttagtgtg tgtgttgggg ttggggagag aggaggggga atagaagatg 28450 gaccacccgg gtatcagctt ctgccctggg gagatggtgg tgtcagttgc 28500 tgagggaatc ctgagaagca ggtctggctg taggtggtga tggtggtggg 28550 gttgcatgag aatccatttg gggcaggttg aatttgaggt gcccatgaca 28600 tatggctagc catgttctgt tggctgtgag gtcaggagag agacatgaga 28650 tggaaacaga ggtttgggaa ctgtcatgtg cttaaaccaa agacctgggt 28700 atagggagag tgagaagaga agggggcaaa gatggacatc caagaaagaa 28750 gctgagaaag cctaggaatt tgaggtaaga ggagacgtag gtaaatgtga 28800 cgcttggtga tcaaggcttc tttccacctc tcctatgctg gacactcacg 28850 tctcctgtct gcttggaaat tcatgctgag ggcagggaag gtgggagcaa 28900 ggatttgtct aaagatcttg ctttggatcc ctgcactcct cctggtttac 28950 caagtgtcac tggacacgtc agggcgttct gagaccttag agagcatcca 29000 gtcctgtccc tgcagtttac aaatgaggaa accagtaccc tgagagtggc 29050 tgtactatcc actctcagga taccaaagat catctggaaa gtcactggtg 29100 gagctggacc ggggcccagg catctcttct cctgtccggg gctcttgact 29150 tcaggaccac ctttctgaaa cccatgatgg ggcaacacca ggacactttc 29200 cagcctgcag gtgtctgtcc cgcggaagcg agccaggcca catgtgaatt 29250 cctgttttct gggtgggttt cagaaggtac gagcaagtcg gcagggtgac 29300 agcccaggtg cttcttgggt tccccaaaac gcggttatgt ttagcagcat 29350 cctcagaacc aaaggtgggg tgggggctgc agatgttgtg ggggccctct 29400 gaagtgaaaa gagccctgtg acagatcttt tcttcatgtt tttcacaagt 29450 tcactgtgca gcagggcccc cccagtagcc tttgcccagg gttgggtgtt 29500 gggcagccca ggcctggctg accttgtggg gaagggtgtg aatggtggga 29550 atccccgagg gccctctttg cccgaaagcc ctaagccttg acatcagatg 29600 cccatcagat ggtccatcgg agccctacta cccagcttgc ccagtgagaa 29650 tcatctgggc tccttgttag gtagccattt aggtccttcc caaaatccac 29700 agactctcta agggaagggc ccgagatgct gtacttgtac taacttcctc 29750 aagcaattct 
tgtgataggt ttgggaaaaa cttgtccagg gtgaccactg 29800 actgagtcct ggtcttctct gaagagcaca gtgcctgctc actttagggc 29850 accctgggag gtgggagctg gctcagcagg cagtcttata agggactgag 29900 cttcaaggcc tctgtccctc caggagggag gtgcatgacc agagagggag 29950 gcctgaggat cttcttccct gccccagagg gtctgctgcc tgagctctgt 30000 gatagcgcag agagtaaaag gatcaagctt gattgaggcc tatctctcaa 30050 tgcgaaagtt tgctagttaa gaggagagtg ggaagggcat ttctggcaaa 30100 gagaaaagtg tggacaggca tggcttaagg gatggggagg gagacagaca 30150 gagctgaggg tgaagggcct tttgctcagc tgtgggcctt ggccttccct 30200 tgtgcaggga cacacagcct tagagccact ggaggtttta gtgggaaagt 30250 aatatggtcg gggctgtatc tcagaagaaa acaaactaat gggaacaggt 30300 cctgtgatgg tggacctggg tcagctacgg agggagggaa gatgtgagat 30350 gtgtactggg gaagggggtg gaagtggcag ctatctggtg agaggaagca 30400 ggcccacagc tttttttctc aagctgttga attcagaagg gcgagtgatt 30450 ccgggagtag ggggtgcttg gagagccacg cgttattgat aaacagggca 30500 ggctgaagcc tgctcactgg ccctgggcgg gttctcacca gcatgtttca 30550 ggttttgatc tgtgcttgtg gttggtgttc ctacctgttc tctaggttcc 30600 ttcctttgtt cttgtggctc atttgcttca caggtgaagc tggttacact 30650 agagtaacag ttcccaaagt gtgttccctg gaaaaatggt tctgtagcca 30700 aataagcttg ggaaatggtg ggttaaatat aacgaagggg gtttttcgac 30750 tgcacaactt ctcagagcct ttggtgtgtg tcgtgacttt gcagaagcag 30800 gatttaatac gcagcattcc cgttcttatt tgaccacgag acatgttttt 30850 ccattaagca tcttgctggg tctgatgttt tctggaaccc attttgaggc 30900 ggtctggtct gcagagagta tggggagcct gggttcaagc cttggctctt 30950 gactctcagc agagccttga ttccctgtgt tgcctggact gcaccacgtg 31000 taccacatac ccggtatgtg acgttttcct catccctctt cccacctgcc 31050 gttacctcac aatccacaat ctgcacctca tccatttttc ttctgaggca 31100 agcactctct tactaactta cttatctcat ctgcatccat gttcttctag 31150 gccagaaact tgggagtcat ccctccctct ttgttacttc ttcttcctct 31200 ttgttacttt atcccctctg ttactaaaca ttcttctgtg tttccagcta 31250 tttcttttat tttccctcgg tctcctttgg ggtttctttg cctccatctc 31300 tcccagacct tggttcacct tccatcgagt cccttcctgg gacatgggca 31350 ctcatgccac tcctgctacc ttccacttcg aagctaactc cctccacact 
31400 gacgtcccca acatgcatgc atacacacac acacacacac acacacatac 31450 acacacacac acacacactt ccccagttag gctagaatca gagagatgat 31500 gtcagccatt tgtccaaggc cacgcagctg ggaggtcaca gagctaagtc 31550 tcaacctcag gggttttgag aaattgcctt ctcatccgtg atcactgatt 31600 tctacaacag cctgtcagga agtctgggta gaaattactt ccattttaca 31650 gtggagtcag agcggggagg gtcctgggca ggcgagtgct tcacagagtg 31700 accaaccatc taggtttgcc ccacactgaa gggggtttct ggggatggtt 31750 ggtcacccta atgctggatg tggtgcctga tgctgggcag gagggccctc 31800 tccgtggcca cgttgcctcc caggaggaga catttcctct gcagctgcag 31850 ctgcagcctg gccatctgat gcagcctgtg gagcggtggc gagtcctgtg 31900 gcctgctaac ttctccctcc ctccacctct ctagtgggcc ccatgctgat 31950 tgagtttaac atgcctgtgg acctggagct cgtggcaaag cagaacccaa 32000 atgtgaagat gggcggccgc tatgccccca gggactgcgt ctctcctcac 32050 aaggtggcca tcatcattcc attccgcaac cggcaggagc acctcaagta 32100 ctggctatat tatttgcacc cagtcctgca gcgccagcag ctggactatg 32150 gcatctatgt tatcaaccag gtgaggcctg ggaaggtgga atgagagagg 32200 gtgtgtgtgc atgcagatgt gtatcagatg tgtgtgtaat gagggcaggg 32250 gaaggggagt gatttcacag acacctggca cttacagcga ggaaccagcc 32300 ccccagccac caccagtgca gatgaggtaa acgccaaaca gtgtgcttgc 32350 ctattgctgt caactctata gccaagggaa atgctggagt gttttcgttg 32400 ttctgttttt gttttctgga agtagccttc cagcaagatt gggaaaaaag 32450 acaaccctaa ttattccaaa gtacacactg attattccct ggctttgtgt 32500 agctgtgtat tttcctttta aaaataaaac caccatttag atgtcagact 32550 tttaggtaac ttcaaagttt atccagtcag tcagagcgtg tctcctgggg 32600 cacctggaga cagtgccctt agttcaggtc acatgcctac atgccagccc 32650 ctggtgaaat atctggagaa gtctgattcg tgggccatct gagagttatg 32700 tggactgggc cgagtctgag aaaaagtttc tcactgctcg tctgatccat 32750 atgtgttggg ctttagccct gcttaggaaa gtaatgctaa ggataggtca 32800 actttcatca ccatggcatg gagaatcaga ttgatctaag aggcatcttt 32850 attgaaataa atttttcagt ttatttgagg agcattattt tcccaagagt 32900 ataactttga tatttcaaga ttacccctaa cacttaaatt catgttttta 32950 gactataacc tcctaggtgc aatgacacat ctaacttatc taagcaccca 33000 gtttcattga aattcatttg aagagtctga 
gtacgcccat ttctacaagg 33050 cccaatgtcc atttcatttc gagataaact ctgctttagg taggaggatt 33100 gttggcagtt tacggcttcc atcaaggtca aggaactctg tgcaccttcc 33150 ctatgacccc aggggaagca ctcgaggact gctgtggcat tgtgctgcat 33200 cacttgctgc agggagattc tgaagaagtg taaggtctca gtcctgccct 33250 gtcccgaagc ctccaaccca cttctggcaa gtgggacctt cccagggaac 33300 aatttgttaa cagacccaaa tatcctgtga ttggatggtg gctgccaaat 33350 gctttggaag ctcagaggaa ggagagagag caatggcttg gaagaaccag 33400 gatataaact aggttctaaa gtctgcaggg agatgggctt ctcagctggg 33450 gccagtgagc agggacctta aggcagaaag gagccttgca tgttcctgga 33500 aattgagatg cccactgggg taggaaagca ccagaagctc tgggaccagg 33550 tgtcagagtt aagcctgtga ggcaggagag agcagaacaa gccctgttac 33600 aaggaaactg aagcaggaga gcaggtggtg ggcaaacccc ttgaggctgt 33650 ttgaattctt cggccaagtg aggtacagac cagggcccta tgaacacctg 33700 caagcaagac agccacgcag ttgtgggtca ccttggaaga atattggaga 33750 atgcaagaga gaacaggtaa atgtcctgca aaatgcgggt cactttaacc 33800 caacacatat tcatttaaga aaagctctgt gattgagaaa catttgtctg 33850 atgccagtta gcacatacca atgacggcaa gattcaggag cctgttatta 33900 aagcagtggc agcgagcacc tggaagaggc ggccaccatc accaggagcc 33950 agcagggatg actaataagc cgtgccagct gcatctcgtt tctctcttga 34000 cagttgctat gccagtagat gagggatgta ctgtggatac aatgctgtca 34050 tatcttattc agcagggcat ctgatagcat cccacaaatc tgcctgagta 34100 gaagacagac agctgtggtc tgggtgccat ataggtaggt taaaatatat 34150 atttgggcct aggcgcagtg gctcatgcct gtaatcccag cactttggga 34200 ggccaaggca ggcggatcac ttgaagtcag gagttcaaga ccagcctggc 34250 caacatggcg aaaccccgtc tctactaaaa atacaaaaat tagctggaca 34300 tagtggtggg cggctgtaat cccagctact cgggaggctg aggcaggaga 34350 atctcttgaa cccaggaggc agaggttgca gtgagccgag atcatgccac 34400 tgcactccag cctgggcaac agagtgagac tctgtctcaa aaaaataaaa 34450 taaataaata aataaataaa atatatactt gggtaaagag gataaaagag 34500 ttagcgatga tgctgaattt ttgaactgag gtggctgttt tcaaggaaga 34550 ctggagggtg ggatgctacg tctagatatg ttgcagttta ggtgaatgtg 34600 agacttccct gttttgaagt caaatattgg accagtaaaa tctagccatc 34650 agcttaaatt 
cctatgatac aatttacata ctccccaggc tcaacacagt 34700 agatttctga atgtcctctg ccagctacat gctcctgccc acctcaatcc 34750 gagtagatgg aacaactaac caagccagct cagaccggtg gcacagctgt 34800 gctggctaac actgggcacc acctaagaga gtgcttctcc aaaagtgtgc 34850 ttccccaaat ggagcgaaat acgcttgagg aatgttgggt tgaaccatgt 34900 aaagcaggtc tcattcccgc agagcctttg gtaccccggt gtacactgta 34950 accccagaag tgtttcctga gcttgcctga cgagacaact tttccaagaa 35000 ccgtctcaag tgatgagtgt tttgtgagtc acactttggg gaaagcgggc 35050 ctaagttagc atctcctccc agctgcctcc ctgctttccc tggaacacta 35100 ggaactgccc gtcctccctc cctccctcct cttcccactt cacaacttag 35150 catcaggaat attttagttt tggtttttca aacatatata cctccttttt 35200 tcttatcttg tcaatatcat cttttttttt tctttgcttt tcctcatact 35250 tttttttctc ttcatccttt ccttctccaa gggttaactt tccaccttag 35300 gagaatcttt tctgcttttt ctcccacttc cccagctact ctcttatcat 35350 ctgctccaat ctcaccctaa ttgatcattt tgggaaaata tggtcagagt 35400 ccagataact aagttgagaa atgcttaaac tctgccatac ctttccagta 35450 aagaatatta cctaataaat aataaaatgg taatgggaaa cctgaaccct 35500 gaaaaaaaag aggtggaagg agaaacattt ggagcacatc ctgtctacaa 35550 attaggaact gcctgtgtta tctgttttat ggttatattc tagaagaaga 35600 aagggatttt gtagcacctg gttttgacct ttctgcactg tttgttgagc 35650 aaataaacct tatgggctgt tagccctctt tatagcctct cagcttatcc 35700 ctggcccaga caccctgctg tcattttgac ttttcattcc cacacacaca 35750 tacacatgca cacacatgta cacacacaca cataccattt aagattagac 35800 agaagtaatg ctcaaaatgg agtggcttct gagacattta gtccaagggt 35850 tcccaaacag gcttttcagt atcagatttc tttctgcccc attgaaatgc 35900 tacacaacct tccgcttaca gcaggtcaca agggtttcat tctacttgaa 35950 gtaggggcca tgtcccattt ccacttcctt ggcttcccat tcagtcactg 36000 ctaggatttg cctagacccc tgaggccaga caatgtagaa acttctgctc 36050 catgtcacag gtgaggaaac aggctcagag agggacaggc tccgaaagtc 36100 acatagacaa cagtagggct gcggctcaaa ccccagcgtc tgactccagg 36150 tttagtgcct tctcagggca tcagtgacac tcctcatggc cagggtgccc 36200 ccagtgttgc tcacagtctg gtatccaggg ctgagagtgt gctgtgtgct 36250 cagactgcct gggttcagtc ctggcactgc cactttacag tcagtgacct 
36300 caggcaggtt acttaagctc tgcaggcctc agtttcctcc ttggtgggga 36350 gggttatgag gcatccttct catggtaaac cttcagtaaa taccagccgt 36400 tactaggagg gtccactcct gcctctccac tctccattca tcctgcctgt 36450 ttcctctgcc tgcttcctct gcctgcttct gtggtggtga attcttcatg 36500 gctcccaccg cctcctgctg cacccccact cagggcccgc atcaggaccc 36550 ttcctcctat tggtttgaac tccttggagt cagagggtaa tggatagtgg 36600 agtgagccag gtggcagaat ctcagaggcc atcccgggcc tataagcctc 36650 ttcaaaatag ggccacgtat caagctttac acacaggagt gaactttcac 36700 aagttgttat gactcatact ctgtctatag taagctgtta accactccca 36750 tttggcttat gcctctgtaa ttattgtact aacttatatc ttaaaataag 36800 gatattgaag gaatgagccg ggagaggctt tcctggttga gatatagaag 36850 aacaagagtt gctctttttc cttaaggtct ctcctcccac ccctgacctt 36900 agctcaccag catgggagaa tactatttga ctccttgtac tctgagacgt 36950 ggatttcaag atatagcatt ccaacttcaa cggcagcaag aaaagaagca 37000 acagaaggag aagacatcat agcaaacagg gatgcatgct gcatttccta 37050 atactcaaac ccggaaacga gacttcactc aaggtgaagg gagggcaggt 37100 caccacctgg tagcactagc cctaaattaa ggaatgcaga atgtttgtgg 37150 gattgcccat cataaaaatt acaaaatgag taaggaatgc aggcacagct 37200 ggccaggtgg gtttgtcaca accatggcag ccctttgccc cacagccagt 37250 acacagaact ggtctctcca attccgattg catatcttct ggcacctctg 37300 ttcctctccc tcagctgccc aggatttttc tggttctgac catgttactt 37350 cctcttttaa acctgttagc atttcacgac tgcctacagg caacggtcta 37400 aatggtcgga aggcccaagc ttagcatccg agaccctgac ctacctccag 37450 ccacttcctc ctcctctcca cttcactgga ctccccatct ccacccagac 37500 acctctgttc tcccctctgt gtgcctttgc ttatgctgtc ccctgtgttc 37550 ctagtgtgtc tctggctatc ttttaagctt ccctccccaa cctcattagt 37600 tctgtggagc ccctggaata gagctgactt ctccttccct gctgctccca 37650 ggctgctcag aactttctgg aaagggatga ttatctgagt tccagcctca 37700 ccccagcccc cggactctga gtccctcatg tctgcctccc ttctttctct 37750 ctgaccacac agctggtaca tagtcagtac agacgcagtc agtgagtgga 37800 gcacggggct tctctccagg attcctgccc ctttgtttat ccctagtctc 37850 aggactccct actcctggtc ttctgcctaa atctgtgcct cttggaagtg 37900 aagcctccgt tcccagtggg gccaggtcct 
gacccttggg aacttgcagg 37950 atccctccct tgggcctctc cccgaagctt ccagctcaat gctgaccaga 38000 gcacaggctg cctgtgacag tccttggggt gacctccctt atcaggaaaa 38050 atgcagaaaa cctattaata ccttagcctt gtgattgtta atggtcacaa 38100 aactccttta gggtcctttg gactcagcac ctttatggtc tcactttgaa 38150 ttttgaacct cccacctccc cccatccccc agagtaaggc aaatggtctt 38200 ctgattgttc ctgcagaggg aaggctccac aggtaagcac acgatggcca 38250 ggaagcagag ctggagcctg cctgaaaggc tgtggagaaa tggagggagg 38300 gctgccctga ggactctgtc tggctttgaa gttttctact gtttcctttt 38350 cttctgtgca ctgttttagg atgatggggt gatagttcca ggctggttga 38400 ggatggattt ggagacagtc ctttgtaccc tcagtgagca agagtatctg 38450 tcaccctacc tcagcagttg tctctgtcac tggtccaagc agctggttcc 38500 tacacaaggt caagatcaac tggggagaag cagactcctg ggtctatccc 38550 attagtgagg acagctgcct gggcttatgg cctcattggt ttggtttcta 38600 tcttgatcat ctctaccatc cccccatccc ggccttccat tttctacctc 38650 agctgtcagt gcacagattg atgtgtgtgg gaacggagct tgggaggagt 38700 ggggtagggc tggtcctgtc ctgtagcctc cccttccttc gggcacttgg 38750 accctttgga gcttgccggg gtggggaatg ggagtgggaa ggccagggag 38800 tgtctctgca ccatcactgt ttgagtgttg cccctttgct gtgtgcccca 38850 cctagtctat gtgtgtctct gttctctggg gactcaattt gctggtgaat 38900 tgcttccatg gacattgttc tgggaaatgc cattttttct gctcacccat 38950 gactctgtga caaggaatga cagcttatta ggaatttgtt tttgcattgg 39000 aacagtggtc atcagaatgg gccccttttc ccttgcagct ttgacatttg 39050 cctctctttt cctcacctct ctcccttgca tccacccttt tctctttttc 39100 ttcttttttg ttttccttct agcaggggcc ttttaccttt acttgttaat 39150 cctgtttgta gcaaagcaag tggaaggagg agttcctctc tgatctgctt 39200 cttattctcc acctaccttc tcttctgtac tttccgcctc ctagagagag 39250 agagagagag aggaatgccg acctaactac cgctgccact gctgctgcca 39300 ccaccgctgc caccaccacc ctggtaatgt tcacatgtcc tcaaatcaac 39350 ccagagccag ggccctgctg gtcaggggga ggctatgtaa ataatcccat 39400 gagtgtgcca tcctcaggcc ctggggtctc ctaggcaaga ccagggcctc 39450 tgtgggctct ctcggaaatg ctgaggttgc tggaagccag cccgtcatac 39500 agggtctgag agtttaactt cttttaaatt aaaccacagt tgagctcatg 39550 ctgtgtgtgt 
ataaactttt gtatcctgct ttttccttaa attctttatc 39600 atcagcatct tcccatgtta tttcatagtc ttcatcatca tcactttcca 39650 taccttcata gtagttgatc gtagaattcc atcataatta acttgtcttt 39700 tctctcttag aagtccctta ggtaatgtcc aattttccgt gagtgtaagt 39750 aataccataa tgaacatctt ggagtctgaa gtttattctg tgttggtttg 39800 ttccacattt aggatcattt tcccaggcta gattttcaga tgtgggatta 39850 tgggttcaga tatggtttac acatttttat agttcttaat acagatggcc 39900 aaattgcttt ctgaaagaga agcttttctt aagtattttt ctccaacttg 39950 tatcttaaac atcctgaaca tgcttagcac cactgtcttg atatatctgc 40000 ggaaagccac gtctccactt ttcagtgtgt cgggccctgg gagaggcagg 40050 catcctgcgc tggctccttg gagctgggtt taaaattgtc tcctctggct 40100 gggcgtggtg gctcacacct gtaatcccag tactttggga ggccgaggtg 40150 ggcggatcac taggtcagga gatcgagacc atcctggcta acatggtgaa 40200 accccgtctc tactaaaaat acaaaaaatt agccgggcgt ggtggcgggc 40250 acttgaaaag tcccagctac tcgggaggct gaggcaggag aatgatatga 40300 acccgggagg cggagcttgc agtgagccga gatcgcgcca ctgcactcca 40350 gcctgggcga cagagtgaga ctccatttta aaaaaacaaa caaacaaaac 40400 aaaaaaacaa acaaacaaaa actgtctctt ctgtgctcac ttcacccaga 40450 atccctgttg ggctcttcaa ggagctcagt tctctctgaa agcaacttta 40500 tagcctcagt ccagtctgtg ttcctgtgtg gcaggggtca agggtatgct 40550 cactcttgag agtggtgtct ttggttgacc aagaaccact cccatagcct 40600 ggtccctaac ccttgaaggc ccatctctct cactcactgg ggtgaagagt 40650 ttaaatctca gatccaagtt ttgttgagag ctctgagcta ccatattgct 40700 atggttaaca atagttaaca atgttaacaa tggttaacta tggttaacaa 40750 tagttaacaa tgtttaacaa ctagagccca gctgggtgtg gtggcatgtg 40800 ctaacagtcc cagcttctca agaggctgag gtgagaagat tgctggagtc 40850 caggagctca aggccagcct gggcaacatg gcgagaccct gtctcccctg 40900 caaaaaaaca acaacaacaa aagcaaaact agagcccaac tgctgtgaac 40950 tcatggctga gtagatatta ttagccctcc acaaactcag catttgtata 41000 atcccaggct gtttccagta attctctggg gatcatctcc cagcctgtcc 41050 actgttccag gatccacact taggcctata ggaatgcccc gtcagagctt 41100 ctgctgccgc tgatctgtta ctgtttcatg caacccactc ggcctagttc 41150 cttcctctta ctgtctcagt gggcacagaa aagcatacag agggtgtttc 
41200 agcaaacatt gccactggct gcagacctgc ccccggatct gtcctgttga 41250 gagcttagtg ctgcgttctt gcatggtggg gaggggtgtg gctctgtgat 41300 gagccagggc atgtgtatag gagcaacagt gtctctctta tcacgtagaa 41350 gttctgactc attgcgagtc ttggctttgg gttaatggtt ccagccatgt 41400 tgctgctgtg tcttttggtg caggagaggc tgggcacagt tggtccctaa 41450 gccattatgg ataagggatg tgtctgctga tatacacaca tggacctgac 41500 atccagggaa ggcagggtga ttggacagaa cagttcttcc agaagctgtt 41550 ggaacttgga caagagtggc ccttggcttt ctgtagttgg tcatctgtcc 41600 cctgttgcaa tcaggggaag gccacacttg ccttccttaa ccacagttag 41650 gattttcttg gggattagac cagattctag cacctgtcct gaacctctcg 41700 ccccgcccct acaaaggctg cttgcaagtg tagtgcacat acacagggag 41750 caggtggggc atggaagtgg aagtggagcc cctgcctttg gcccttgggg 41800 gaggcactgt ctgcttaccc acggttgttg cctcatagga atcatacaac 41850 agcttcctaa ctggtctcct tgccttcagt tggattgggg cacaaatccc 41900 tccttgacat ataaaccatg gtttaaggct ccctgtggcc taaataaaga 41950 taaagcttaa gtatcttaac aagcacctaa cccttctccc cagcctcggt 42000 gatttggctc atcgctgcct tcatgtttca ttctggcttc actcattcgg 42050 aatttcttgt agttccttgg ctgttctctt ttccttaccg cctttacaaa 42100 tgctctcacc atgcatgctt ttctctgctc ctacagatgc cttctctccc 42150 agcaccgcct ccagagtcta tgtctggtcg attctgtctg ctgtctccag 42200 tccccatctt gtggcagtct ctgctcaatc atttggggat tttatatgtt 42250 ttctggcctt tcttttgggg gcctgtcttc tccttctaaa agcagccagt 42300 tgacctagaa ggaagggata actgtaactc ttgtctacca acataagatt 42350 aggcccaccc tttaaaagct gcgtctttga aagggacacc tgcacccagc 42400 atgctggctt ctcttcacca agcgtgactt cctacgcatt tcacaggcct 42450 ccagaggtcc ccctgactct cttctgctgt gagaaactct aatcatgtaa 42500 gccacaggct aattcccttg agccttaaat gtttttagta atttcccatt 42550 catcagagaa gcaggatttg ggaggaattt tgaagcaaac actacagaag 42600 gcagagtctc caggtaggat atctaagaga catttggaat ggtctgactg 42650 ttcaagatgg atgggaaagc ctcttcctgt aatgatagta gccaacattt 42700 gttgtcaggc agtggggccc catttttgag atggggtctc tgtcacccag 42750 gttggagtgc ggtggtgctg tcatggctca ctgcaacctc agcctccccg 42800 ggctgggtct tcttaattct gaaaaaccca 
gcttttaaag ggtggaccta 42850 atcttatgtt ggtagacaat gttgtctcat ttaatacaat gcacatgctc 42900 tccccataac acaaaagagg gaactgaggc ctggaggtgt gatgtacccc 42950 aagtcacata gctaataaat aaagaagcca gcattcctgg gattaaaaat 43000 gcatgtgtct gtcactgtgg tgtatttggt gcttgatcaa tgtttacttg 43050 agcaaatgga ggggcagagg taccgatgag tgtgctcagt gaggagggca 43100 ggagtgaagc tgggcgtctt cccgcctctt gtgagtggtg gggcttggtg 43150 agcttgccag ggcctgtctt tcttatcaaa gaaggtgtgt gccccagtgt 43200 tacagcattt cacccaaagc agcctagaaa atgcttgact tttctgtcat 43250 tccggggagg acactttcct cctccactgt tctgctggcc tggtgtaccc 43300 acggcccctg atagatgata gcacctgcta aagtgcacca tgcccttccg 43350 tctcactgca tcccacagat gaggccaggc tgggatgagg gagaaaggga 43400 gggatatata gttcaggtta ttttggaaaa ctgcctgacc aattttaagt 43450 ctgggccgga cactggggca tctcaccacg ttgaaagggc cgtggcaccc 43500 cgggcggtga aaggggctgg aaccaggtct gcttcttggg cttctcctcc 43550 agggtgccat tgctcatggg ccttggctgc agaggtgctc attcgtggtt 43600 ccaaaattcc aattcctggg agaggaaaaa tgcttagttc agtctcagtt 43650 aggcctctgc ttagatcaaa cagccaaggc cagtaggccc agtcctatgg 43700 tagagacatg gcctcaaaga gccctctgct gcagttgttg gggagtgtac 43750 caagagaagg gagcattgtc ctgggctggg cagccctggg ggtctagtgc 43800 atagatgtag aaaggctctg ttggtatacc tccctttgct tgttggaaag 43850 tgctcaacgg ggctgaattg tgtttgacag tgtaagtctg ggctggggtg 43900 agggttgtta caagattgtc aagatgatta aatgaaatgc catttgaaac 43950 acttatccat gccttgtgta tggtatcccc accagtgaat attcacagta 44000 tattataata attccaacaa cttcataatt ttcatatgca atttctaaac 44050 tttgaacttt tttttttttt tttttttttt tgagacagtg tctcgctctg 44100 ttgcccaggc tggagtgcag tggcgcaatc ttggctcact gcaacctcca 44150 cctcccggct tcaagtgatt ctcctgcctc agcctcctga gtagctagga 44200 atccaggcgc ccgccaccac acccagctaa tttttgtatt tttagtagag 44250 acgggctttc gccatgttgg ccaggctggt ctcaaactcc tgacctgagg 44300 tgatccaccg ccttggcctt ccaaagtgct aggattacat acgtgagcca 44350 ctgtgcccgg caattttttg tgtttttagt agagatgggg tttcaccatg 44400 ttggccaggc tggtctcgaa ctcctgacct caagtgatct gcccgcctca 44450 gcctccctaa 
tgctgggatt acaggtgtga gccaccacgc ccagcctaaa 44500 ctttgaattt ctttgaaccc atgacttaca cagaattagc tgaacgcaga 44550 attccaaatc aactcagcct gtgggacagc caaaaaacac agtgtgcctt 44600 tgggctcctt cactcaccac gcggggttag aaaactttgt cagaggcttt 44650 aaaaaaggag ctcttgtgtg taaaatgttt ccttgattct ctttctggtg 44700 cctctctttc tctaagtggt ttgcttcccc aagttcccca cctgagtctg 44750 ggtggctgtg gcacatctgt gcattctgta cgcacacagg cagccttttg 44800 gagtgccagt ttccaggtct tggttttatt tatttattta tttatttttt 44850 tgagatgggg gtctcactct gccgcccagg ctggagtgca gtggtgccgt 44900 catggctcac tgcaacctca acctccctgg gatcagttga gcctcctacc 44950 tcagcctcca gagtactagg gaccaccatg cctggcaaat ttttgtaatt 45000 ttttgtagag gcagagtctc accatgttgc tcaggctggt ctcgagctcc 45050 tagactcaag tgatctgccc accttggcct cccaagtgtt aggattacaa 45100 gtgtgagcca ccatgcccag cccaggtcat cttttgaggg catggagaga 45150 agactttgag catcccactt ttgagattgt gtaccagtcg caagccccta 45200 tgacacactt tttccccaaa gtagagggct ctgactatgt tgatcccaag 45250 agagatggga aagagcattg aatgaggatt ccaaagtatt gggccttagt 45300 tcgtttcctc atgttggtgt tgtgaagatt ctggttagga taacagcatg 45350 tgtgcaggag gctttgtgaa ctgctgagag tgaggcgtgg caatgtcagt 45400 gctaggtttg tccttactaa cctggggcca tgggaattga taagaccaga 45450 ttcccaactc taccccacaa tgtgatccct gtggtgaccc ctcacagggc 45500 tctttggtcg agcttcccag aagggatcac catctgccat tgtatgttga 45550 accccattca ttcattcatt cattcagcca accagcaact atttgttgag 45600 ctcttattgt gtgagaagca gtcttcaagg aactgggtga ataaaaaaaa 45650 caaaacatcc taaccttcat tgagcttaca ttcttactga aagaaaacaa 45700 ataaaacata catgtaatcc tagcactttg ggaggccaag gcaggcggat 45750 cacttgaggt caggaatttg aaaccagcct ggccaacgtg aaacccatct 45800 ctactgaaaa ttaaaaaaaa aaaaaaaaaa aagccgggca tggtggcaca 45850 tgcctgtaat cccagctact cgcgaggcta aggcaggaga atcgcttgaa 45900 tcctggaggc agaggttgca gtgagccaag atcataccat tatactccag 45950 cctcagtgat gaagcaagac tccatctcaa aaataaaaaa taaaaataaa 46000 aatatgcatt ccctttgcac cagcacactt ggtgcctggg gacctcgtgg 46050 ttggcaccct gaagcaggtg tccctcttct gtcttgcaca ccttgcttct 
46100 gtcctggtgt gtatggcatg gccttctgcc ctccatggtg agcactgtga 46150 gggcagaggt tgagttgggt ttgctgtatt tctcaggtgc ctaggtttgt 46200 gcttgacagg tagatggaag gcacacaatg tggtcatcaa acctcagtca 46250 accatataag gaaggtagaa gtgaaaagtc ccataggtac ccaactaatg 46300 tcaccagttt cctggatacc tttcctggag tttatttata gtgtgtataa 46350 ataaatgatg tatgtgttta aatgcctttt tcacctttcc ttttagagct 46400 gcctcttttt aacagttcca ttccattgta tggatgtact atgatttatt 46450 gaaccagttc cctactgatt attctgtttt ttgcagtctt ttgttatgat 46500 gaacattcca cagtgacaat gttgttcata gtcattcaca cacatgcaag 46550 tccttctgca ggatatattt ctagagggga attgctgact cagaggtttt 46600 ggtactctgt gttgattgta gagtgacggc agaaaagtga ggcccaagag 46650 tttcctagtg accatgtgta gtggacaagt caccagtccc tgtgagtgtt 46700 tggcccaaag gctttaaggc atttgatatc actgtttttg tttctgcacc 46750 aggcgggaga cactatattc aatcgtgcta agctcctcaa tgttggcttt 46800 caagaagcct tgaaggacta tgactacacc tgctttgtgt ttagtgacgt 46850 ggacctcatt ccaatgaatg accataatgc gtacaggtgt ttttcacagc 46900 cacggcacat ttccgttgca atggataagt ttggattcag gtaagagata 46950 ctcagtcaga atctgtggta aacatgtctc tctcatgtgt tgactaggaa 47000 atgcagtcct ggcagctcaa gagtgcctct ttaagctctg gagcagaatg 47050 cctcctctga gaaatgggtg ctttgtatta gttgagatgg aaagaagaga 47100 ccagaaatgc ctgtagtctc tgcacatcca gacaaaaaca aattttcccc 47150 cctttttttt ttttgtttgt tttttgagac agggtctggc tctgtcaccc 47200 aggctggagt gcagtgccgt gatcttggct caccgcaacc tctgcctccc 47250 gggttcatgc catcctgtca cctcagcctc ctgagtagct gggactacaa 47300 acacttgcca ccatgcgcag ctaatttttg tatattttgt agagatgggg 47350 ttttgctgta ttgcccagtc tggtctcgaa ctcctgagct caagcaatcc 47400 atctgccttg gcctctcgaa gtgctggatt ataggcatgt ggcaccatgc 47450 ctggcctaag aacagttttt agcatttggg aggggctctc atctttaagc 47500 tccaaatgat actgtatttt cttgcttttt tctttctctt gccccacaag 47550 ttttggaaag taaattggaa tagttttccc ccactgaatt atttagcttg 47600 tatacctcag cagatgttcc ttggcctgtt ttgttttgtt tttgagacag 47650 ggtcttgctc tgtcacccag gctggagtgc agtgacacaa tcatggctca 47700 ctgcagcctt gactgcctgg gctcaatcca 
tcctgcagcc tcagcctcct 47750 gagtagttgg gactacaggc atgagccagc atgtccagct aattttttat 47800 ttttagtgga gatgaggtct ggctatgttg cccaagctgg gcttgaactc 47850 ttgggctcaa gtgatcctct cacctcagcc ttccaaagca ttgggattac 47900 aggtgtgaac cactgctccc gcccttggcc ctataagaag gaatgtgatt 47950 ctgttttcca gcagggcaca aacttctgct taaatacaaa gcccaaattt 48000 ttccaccaaa atgcccctag tgaagtggcc agcccagatg cccgactagc 48050 gtattatcca aagcatattg tcattggtgg aaaatggcct tatagtccat 48100 tgttttgtct taaaagtaaa tatataaata aacttgtata ttgtttccta 48150 attccgtgtt tatattaaca taaaagtgtt ttaaattacc tgtcagtggc 48200 caggtgcagt ggctcgtgcc tgtaatcgca gcactttggg aggccgaggc 48250 gggcagatca cctgaggtca ggagttcgag accagcctga ccagcatggt 48300 gaaaccctgt ctctactaaa aatacaaaaa ttagccaggt gtggtggcag 48350 gtgcctgtaa tcccagctac tcgggaagct gaggcaggag aattgcttga 48400 acccgggagg cagaggttgc agtgagttga gatcgcgcca ttgaacttca 48450 acttgggcaa cagagcaaga ctctgtctca gagaaagaaa aaaaaaaacc 48500 tatcagttga ataacaaaac cctttccttc cttgctttaa gtgaatctga 48550 agatccagga gctgtgctgc aggtaccctc tatgttgggt acccctggtt 48600 taggctgact agtacagtgt ggttggctca tgtagacagc agacccttta 48650 ttttagatac aacttttttt ctttttcttt tatttttttt gagacagagt 48700 cttgcttgtc acccagcctg gagtgcagtg gcgtgatcat ggctcactat 48750 agccttaaac tccctggctc aagtgatcct ctcacctcgg ctttcctagt 48800 agctgggacc acaggtgtgg gccagcaccc ctggctgatt taaaaaaaaa 48850 aaaatttttt tttttagaga tgtctcacta tgttacccag gctggtcttg 48900 aactcctggg ggctcaagca atcctcctgc tttgacctcc caaagtgctg 48950 ggatgacagg catgaactac tgcacctgct gagatgcaac agctttctgt 49000 cagactcatt ttattctcat catttcttcc tgtcctccct tgctgggagc 49050 atgagagctg tgatgggaat ataggaatgt atgaagtcct tctcccagat 49100 caaaaatcct aacttcttgt cttaaaggga ggaaaatttg aatgtaacct 49150 tacttttaga ctcttcagaa atccttctat acccttccgt ccccgctttc 49200 acccttcctc cctctccgtg tgtgtatctt cttctcttga aacacacagg 49250 tttataccct gacccctctt gattcatccc ttgaagcaca gtggtgaaca 49300 aggaaggggc ccgtgatgcc ctaattcttt gccacagcac catgtttgtt 49350 tcacaaggag 
cctggcaggt ttgggcttgg ggcagatagg ggagagaaag 49400 cagcagagac agcaaaacca aatcatgtca gcttggcatg tacttccctc 49450 tgaaatagct aagaatccat ttctgtaaaa gcactgatta tcagaaaacc 49500 ttattggcct ggccaccttt ggttcaaacc ctcacattaa taatgtggac 49550 agtagtatga ggtgtgccaa aggtggatga ctcagcacct aagtgatgac 49600 acctaattac gaataggttc attaaagcag accccctggg gacctttgct 49650 tgaggatcct tacagtcaga attcctgaat atatttgaaa ataataattg 49700 catctttatt ttcatatgtt ctgtatggtt tggctgactt ccccctcaaa 49750 gtctgagtta gagttttcct taatttatgt gatgggtttg gtctttttgg 49800 attccagaaa gagctgggtg tggtttggag ctgcactcag agtcacacaa 49850 aaccacagcc tttagagaac ccacaggaag gctttggggc acgtcctgat 49900 tcttgacatt tctcatcagt gctgactttg tatcccttag gagttcacaa 49950 ttcataacca ctgaaatatt aaaatacaaa aagttttgga aggatgagag 50000 cccagatgct ctactacttg aaaatatgtt aaaacataag ttcatcatta 50050 tacattttgc taaatcagga taaagtctga agtttcaaag aagttttatt 50100 ttagcaaatt ttcagaaaca ctgcctcaac tgttagggcc agtgttctag 50150 tcagtatgcc tttggaagca tgaaagctgg attggtcgat aggatgggtg 50200 tggaaggggg gctgtgactg ggtgggtaca gagaggctct gaaacaatct 50250 cagattccag gagttcctgg ataaggactt catgtgcggg aacagagcac 50300 aggagaagca gattcctgag ccactcagga agaactgggc ctaggcctgc 50350 tcttgtcact gactggcttt ctacataacc acagaaacag cactgtgttg 50400 tagaaagagg aagatcatac tttttgatat ctgtgtctaa tttaaggtca 50450 tctgagccct gatagaaaag caaaacagac aaaacccttg taactgctcc 50500 ctcccacccc acccaccatc aaaaaagctt tagagaggct ggacatggtg 50550 gctcttgcct gtgatcccag cactttggga ggctaaggtg ggtggatcac 50600 ctgaggtcag gagttcgaga ccagcctgac caatatggtg aaaccccatc 50650 tgtactaaaa atacaaaaat tagccaggtg tggtggcaca cgcctgtagt 50700 cccagctact tgggaggctg agacaggaga attacttgaa aacctgggag 50750 gcggaggttg cagtgagccg agatcacgcc attgtactcc agcctgggct 50800 acagagcgag actccttcaa aaaaaaaaaa aaaaaaagat ccggtttggt 50850 gtcttacaac tgtaatccca gcactttggg aggccgaggc cggtggatca 50900 cgaggttaag agatcaagac catcctgacc aacatggtga aaccctgtct 50950 ctactaaaaa ttagctgggc gtggtggcag gcgcctgtag tcccagctcc 
51000 tcaggaggct gaggcagaag aatcgcttga acccgggagg cggaagttgc 51050 agtgagccta gatcgcgccc ctgcactcca gcctggcaac agagcaagac 51100 tacgtctcaa aaaaaaaata aataaaaact ctagagaagc aaaaagaata 51150 actttaaaag tgtttatgtt ctcagcaagc tttattttgg ggatgtcaga 51200 acttaactaa ccactgctcc ttctgtgtgt atgtttttcc tccagcctac 51250 cttatgttca gtattttgga ggtgtctctg ctctaagtaa acaacagttt 51300 ctaaccatca atggatttcc taataattat tggggctggg gaggagaaga 51350 tgatgacatt tttaacaggt aatggtcata acttagatat ctttctcctc 51400 tgtcaacctt cacttccagt tttttaacca atgcttggtt gttccccaag 51450 gactgaccct cagatgggat gcacccctag tcagcccaca ttcttaggtg 51500 tggcttccta caggtcctgc aggtgctaaa agggatctgt aggaaaatga 51550 gtttctgaga tttttgtatt ggcctggaaa aatgtcaaat gggaaccaag 51600 tgacggggca agtttacttt gacttgctgc atgccgtttt gtactcaagg 51650 agtaaaccaa tgtcctttgt aaaaatccct cctttcatta tggtcccctt 51700 tcactgtgaa acaagtttcc ttgagcagaa tcctaactgt cttcacagaa 51750 gctttgtgtt atatttttat tttggagtat tttcacatat acaaaagaga 51800 tactgtagta taataaacct ttgaggacct atccagcccc agcaaccatt 51850 atggcctggt cagttctgtc ccatccacat cctggggctc tttttaagct 51900 ggtaaatcat tatgatgtgg gttgtcattt acagtggtaa aaaacatcta 51950 tcagtagcat ttgaaagaac attctgctca gtcctctggc tgtagaggct 52000 tcaaccccac cagccaccga tgagcacctt ctccctccag gagccagtct 52050 gagctcatta ctgagtttaa tatcagaata caccctggtg cagcctttct 52100 aaattgcagt accagttaac agaaggtgtc tgtcagagca acacccaagt 52150 cattcaagtt accattgtgt gcaaacttaa cagagaccca cgtcttcaat 52200 ataagccttg aaggaaactc cagttttagt atgtagatgg ggtatcaagt 52250 gtgtgcacat tgaacatctg ctgcatacag agcactgtgc caggcaggcc 52300 caggacactg aaaacctgga catagggtcc agacagaagc aagcctgctt 52350 ccacagaggc actcctgggc agacactctg gactgatatg acagtgtgca 52400 gggccgacag gataccacag gtctgaatgg tcagaacagc tggggaggga 52450 gggagcatcc gcaggcatct agtcccatgc taacgcagtg gcactagaag 52500 gatgggtggt gtgtggagca actttcttga aagataaagg acctaacact 52550 ttctatgcac cacttactgt gtgccaggca aggccaggaa tgtttaagtg 52600 gtctgggatc agccagttct gcctcttaac 
taactttgct gtcctgctct 52650 ccaggctttc attttggtcc tcattccttt tccttggacc aacacagaat 52700 cctccaccct gttctggctg cctctagtct tgttctcagc cctccatttg 52750 tttttttctg ccttttccca catgttctga agccctccat tcgtatacta 52800 ctttccagag acttccccat ggctaaaagc attttggaaa tactgtatat 52850 taggcccctt tcagatactg gcaaccgttt gtgggatgct ctgagaaggc 52900 ctctgtgact tagcctggcc cttttcagcc catcacctgc cacgtcctac 52950 cccagaccct tgtcaccagt ccccaggagc ttacgttgct ccctgagggc 53000 actaggcttg ctctcacttc catgcctttg cctgtgccat cctggctgcc 53050 caaaatgcta tggcagatac ctgttcatcc tcaactgggc tctgcctagg 53100 cttgctccag cagaggttac aaactctatg cttcttcctc tgtgtctcca 53150 acctcatctt cctcttctca cctccatcct ggccctaaag gccctatgtt 53200 tgaagcattc acactgtata ttctgtgggg cacacggccc cagtgtctgg 53250 cacatggtag tcaacaccac aaaccgcaga accagttgta aaaggacatg 53300 gagtcggaat gtgagtttta accagggtca tgctgggctg ggttctggca 53350 tgatgctggg ttgtgggctg agtgagaaca gcaagggtga tggtggatgg 53400 agcaacagtc ttgcagccgg ggctctcagg ccaagtgtat ggcagctctg 53450 tgataatgac tttcccttta ctctttgcag attagttttt agaggcatgt 53500 ctatatctcg cccaaatgct gtggtcggga ggtgtcgcat gatccgccac 53550 tcaagagaca agaaaaatga acccagtcct cagaggtgca ttctttgttt 53600 attcatactc cttccccctt taggatgagg taggctgcag gtccgaggct 53650 ctgggcctag agggaaattg aggtggtcag gttacagtgg agagggagga 53700 ggaagtacgt gtgatgattt cttcttaaga tttttgtttt aagacaatct 53750 ccttgtgctc ttttccttgt aggtttgacc gaattgcaca cacaaaggag 53800 acaatgctct ctgatggttt gaactcactc acctaccagg tgctggatgt 53850 acagagatac ccattgtata cccaaatcac agtggacatc gggacaccga 53900 gctagcgttt tggtacacgg ataagagacc tgaaattagc cagggacctc 53950 tgctgtgtgt ctctgccaat ctgctgggct ggtccctctc atttttacca 54000 gtctgagtga caggtcccct tcgctcatca ttcagatggc tttccagatg 54050 accaggacga gtgggatatt ttgcccccaa cttggctcgg catgtgaatt 54100 cttagctctg caaggtgttt atgcctttgc gggtttcttg atgtgttcgc 54150 agtgtcaccc cagagtcaga actgtacaca tcccaaaatt tggtggccgt 54200 ggaacacatt cccggtgata gaattgctaa attgtcgtga aataggttag 54250 aatttttctt 
taaattatgg ttttcttatt cgtgaaaatt cggagagtgc 54300 tgctaaaatt ggattggtgt gatctttttg gtagttgtaa tttaacagaa 54350 aaacacaaaa tttcaaccat tcttaatgtt acgtcctccc cccaccccct 54400 tctttcagtg gtatgcaacc actgcaatca ctgtgcatat gtcttttctt 54450 agcaaaagga ttttaaaact tgagccctgg accttttgtc ctatgtgtgt 54500 ggattccagg gcaactctag catcagagca aaagccttgg gtttctcgca 54550 ttcagtggcc tatctccaga ttgtctgatt tctgaatgta aagttgttgt 54600 gttttttttt aaatagtagt ttgtagtatt ttaaagaaag aacagatcga 54650 gttctaatta tgatctagct tgattttgtg ttgatccaaa tttgcatagc 54700 tgtttaatgt taagtcatga caatttattt ttcttggcat gctatgtaaa 54750 cttgaatttc ctatgtattt ttattgtggt gttttaaata tggggagggg 54800 tattgagcat tttttaggga gaaaaataaa tatatgctgt agtggccaca 54850 aataggccta tgatttagct ggcaggccag gttttctcaa gagcaaaatc 54900 accctctggc cccttggcag gtaaggcctc ccggtcagca ttatcctgcc 54950 agacctcggg gaggatacct gggagacaga agcctctgca cctactgtgc 55000 agaactctcc acttccccaa ccctccccag gtgggcaggg cggagggagc 55050 ctcagcctcc ttagactgac ccctcaggcc cctaggctgg ggggttgtaa 55100 ataacagcag tcaggttgtt taccagccct ttgcacctcc ccaggcagag 55150 ggagcctctg ttctggtggg ggccacctcc ctcagaggct ctgctagcca 55200 cactccgtgg cccacccttt gttaccagtt cttcctcctt cctcttttcc 55250 cctgcctttc tcattccttc cttcgtctcc ctttttgttc ctttgcctct 55300 tgcctgtccc ctaaaacttg actgtggcac tcagggtcaa acagactatc 55350 cattccccag catgaatgtg ccttttaatt agtgatctag aaagaagttc 55400 agccgaaccc acaccccaac tccctcccaa gaacttcggt gcctaaagcc 55450 tcctgttcca cctcaggttt tcacaggtgc tcccacccca gttgaggctc 55500 ccacccacag ggctgtctgt cacaaaccca cctctgttgg gagctattga 55550 gccacctggg atgagatgac acaaggcact cctaccactg agcgcctttg 55600 ccaggtccag cctgggctca ggttccaaga ctcagctgcc taatcccagg 55650 gttgagcctt gtgctcgtgg cggaccccaa accactgccc tcctgggtac 55700 cagccctcag tgtggaggct gagctggtgc ctggccccag tcttatctgt 55750 gcctttactg ctttgcgcat ctcagatgct aacttggttc tttttccaga 55800 agcctttgta ttggttaaaa attattttcc attgcagaag cagctggact 55850 atgcaaaaag tatttctctg tcagttcccc actctatacc aaggatatta 
55900 ttaaaactag aaatgactgc attgagaggg agttgtggga aataagaaga 55950 atgaaagcct ctctttctgt ccgcagatcc tgacttttcc aaagtgcctt 56000 aaaagaaatc agacaaatgc cctgagtggt aacttctgtg ttattttact 56050 cttaaaacca aactctacct tttcttgttg tttttttttt tttttttttt 56100 ttttttttgg ttaccttctc attcatgtca agtatgtggt tcattcttag 56150 aaccaaggga aatactgctc cccccatttg ctgacgtagt gctctcatgg 56200 gctcacctgg gcccaaggca cagccagggc acagttaggc ctggatgttt 56250 gcctggtccg tgagatgccg cgggtcctgt ttccttactg gggatttcag 56300 ggctgggggt tcagggagca tttccttttc ctgggagtta tgaccgcgaa 56350 gttgtcatgt gccgtgccct tttctgtttc tgtgtatcct attgctggtg 56400 actctgtgtg aactggcctt tgggaaagat cagagagggc agaggtggca 56450 caggacagta aaggagatgc tgtgctggcc ttcagcctgg acagggtctc 56500 tgctgactgc caggggcggg ggctctgcat agccaggatg acggctttca 56550 tgtcccagag acctgttgtg ctgtgtattt tgatttcctg tgtatgcaaa 56600 tgtgtgtatt taccattgtg tagggggctg tgtctgatct tggtgttcaa 56650 aacagaactg tatttttgcc tttaaaatta aataatataa cgtgaataaa 56700 tgaccctatc tttgtaac 56718 <210> 3 <211> 4214 <212> DNA <213> Homo sapien <220> <223> wild-type B4GALT1 mRNA sequence <400> 3 gcgccucggg cggcuucucg ccgcucccag gucuggcugg cuggaggagu 50 cucagcucuc agccgcucgc ccgcccccgc uccgggcccu ccccuagucg 100 ccgcuguggg gcagcgccug gcgggcggcc cgcgggcggg ucgccucccc 150 uccuguagcc cacacccuuc uuaaagcggc ggcgggaaga ugaggcuucg 200 ggagccgcuc cugagcggca gcgccgcgau gccaggcgcg ucccuacagc 250 gggccugccg ccugcucgug gccgucugcg cucugcaccu uggcgucacc 300 cucguuuacu accuggcugg ccgcgaccug agccgccugc cccaacuggu 350 cggagucucc acaccgcugc agggcggcuc gaacagugcc gccgccaucg 400 ggcaguccuc cggggagcuc cggaccggag gggcccggcc gccgccuccu 450 cuaggcgccu ccucccagcc gcgcccgggu ggcgacucca gcccagucgu 500 ggauucuggc ccuggccccg cuagcaacuu gaccucgguc ccagugcccc 550 acaccaccgc acugucgcug cccgccugcc cugaggaguc cccgcugcuu 600 gugggcccca ugcugauuga guuuaacaug ccuguggacc uggagcucgu 650 ggcaaagcag aacccaaaug ugaagauggg cggccgcuau gcccccaggg 700 acugcgucuc uccucacaag guggccauca ucauuccauu ccgcaaccgg 750 
caggagcacc ucaaguacug gcuauauuau uugcacccag uccugcagcg 800 ccagcagcug gacuauggca ucuauguuau caaccaggcg ggagacacua 850 uauucaaucg ugcuaagcuc cucaauguug gcuuucaaga agccuugaag 900 gacuaugacu acaccugcuu uguguuuagu gacguggacc ucauuccaau 950 gaaugaccau aaugcguaca gguguuuuuc acagccacgg cacauuuccg 1000 uugcaaugga uaaguuugga uucagccuac cuuauguuca guauuuugga 1050 ggugucucug cucuaaguaa acaacaguuu cuaaccauca auggauuucc 1100 uaauaauuau uggggcuggg gaggagaaga ugaugacauu uuuaacagau 1150 uaguuuuuag aggcaugucu auaucucgcc caaaugcugu ggucgggagg 1200 ugucgcauga uccgccacuc aagagacaag aaaaaugaac ccaauccuca 1250 gagguuugac cgaauugcac acacaaagga gacaaugcuc ucugaugguu 1300 ugaacucacu caccuaccag gugcuggaug uacagagaua cccauuguau 1350 acccaaauca caguggacau cgggacaccg agcuagcguu uugguacacg 1400 gauaagagac cugaaauuag ccagggaccu cugcugugug ucucugccaa 1450 ucugcugggc uggucccucu cauuuuuacc agucugagug acaggucccc 1500 uucgcucauc auucagaugg cuuuccagau gaccaggacg agugggauau 1550 uuugccccca acuuggcucg gcaugugaau ucuuagcucu gcaagguguu 1600 uaugccuuug cggguuucuu gauguguucg cagugucacc ccagagucag 1650 aacuguacac aucccaaaau uugguggccg uggaacacau ucccggugau 1700 agaauugcua aauugucgug aaauagguua gaauuuuucu uuaaauuaug 1750 guuuucuuau ucgugaaaau ucggagagug cugcuaaaau uggauuggug 1800 ugaucuuuuu gguaguugua auuuaacaga aaaacacaaa auuucaacca 1850 uucuuaaugu uacguccucc ccccaccccc uucuuucagu gguaugcaac 1900 cacugcaauc acugugcaua ugucuuuucu uagcaaaagg auuuuaaaac 1950 uugagcccug gaccuuuugu ccuaugugug uggauuccag ggcaacucua 2000 gcaucagagc aaaagccuug gguuucucgc auucaguggc cuaucuccag 2050 auugucugau uucugaaugu aaaguuguug uguuuuuuuu uaaauaguag 2100 uuuguaguau uuuaaagaaa gaacagaucg aguucuaauu augaucuagc 2150 uugauuuugu guugauccaa auuugcauag cuguuuaaug uuaagucaug 2200 acaauuuauu uuucuuggca ugcuauguaa acuugaauuu ccuauguauu 2250 uuuauugugg uguuuuaaau auggggaggg guauugagca uuuuuuaggg 2300 agaaaaauaa auauaugcug uaguggccac aaauaggccu augauuuagc 2350 uggcaggcca gguuuucuca agagcaaaau cacccucugg ccccuuggca 2400 gguaaggccu cccggucagc 
auuauccugc cagaccucgg ggaggauacc 2450 ugggagacag aagccucugc accuacugug cagaacucuc cacuucccca 2500 acccucccca ggugggcagg gcggagggag ccucagccuc cuuagacuga 2550 ccccucaggc cccuaggcug ggggguugua aauaacagca gucagguugu 2600 uuaccagccc uuugcaccuc cccaggcaga gggagccucu guucuggugg 2650 gggccaccuc ccucagaggc ucugcuagcc acacuccgug gcccacccuu 2700 uguuaccagu ucuuccuccu uccucuuuuc cccugccuuu cucauuccuu 2750 ccuucgucuc ccuuuuuguu ccuuugccuc uugccugucc ccuaaaacuu 2800 gacuguggca cucaggguca aacagacuau ccauucccca gcaugaaugu 2850 gccuuuuaau uagugaucua gaaagaaguu cagccgaacc cacaccccaa 2900 cucccuccca agaacuucgg ugccuaaagc cuccuguucc accucagguu 2950 uucacaggug cucccacccc aguugaggcu cccacccaca gggcugucug 3000 ucacaaaccc accucuguug ggagcuauug agccaccugg gaugagauga 3050 cacaaggcac uccuaccacu gagcgccuuu gccaggucca gccugggcuc 3100 agguuccaag acucagcugc cuaaucccag gguugagccu ugugcucgug 3150 gcggacccca aaccacugcc cuccugggua ccagcccuca guguggaggc 3200 ugagcuggug ccuggcccca gucuuaucug ugccuuuacu gcuuugcgca 3250 ucucagaugc uaacuugguu cuuuuuccag aagccuuugu auugguuaaa 3300 aauuauuuuc cauugcagaa gcagcuggac uaugcaaaaa guauuucucu 3350 gucaguuccc cacucuauac caaggauauu auuaaaacua gaaaugacug 3400 cauugagagg gaguuguggg aaauaagaag aaugaaagcc ucucuuucug 3450 uccgcagauc cugacuuuuc caaagugccu uaaaagaaau cagacaaaug 3500 cccugagugg uaacuucugu guuauuuuac ucuuaaaacc aaacucuacc 3550 uuuucuuguu guuuuuuuuu uuuuuuuuuu uuuuuuuuug guuaccuucu 3600 cauucauguc aaguaugugg uucauucuua gaaccaaggg aaauacugcu 3650 ccccccauuu gcugacguag ugcucucaug ggcucaccug ggcccaaggc 3700 acagccaggg cacaguuagg ccuggauguu ugccuggucc gugagaugcc 3750 gcggguccug uuuccuuacu ggggauuuca gggcuggggg uucagggagc 3800 auuuccuuuu ccugggaguu augaccgcga aguugucaug ugccgugccc 3850 uuuucuguuu cuguguaucc uauugcuggu gacucugugu gaacuggccu 3900 uugggaaaga ucagagaggg cagagguggc acaggacagu aaaggagaug 3950 cugugcuggc cuucagccug gacagggucu cugcugacug ccaggggcgg 4000 gggcucugca uagccaggau gacggcuuuc augucccaga gaccuguugu 4050 gcuguguauu uugauuuccu guguaugcaa 
auguguguau uuaccauugu 4100 guagggggcu gugucugauc uugguguuca aaacagaacu guauuuuugc 4150 cuuuaaaauu aaauaauaua acgugaauaa augacccuau cuuuguaaca 4200 aaaaaaaaaa aaaa 4214 <210> 4 <211> 4214 <212> DNA <213> Homo sapien <220> <223> variant B4GALT1 mRNA sequence <400> 4 gcgccucggg cggcuucucg ccgcucccag gucuggcugg cuggaggagu 50 cucagcucuc agccgcucgc ccgcccccgc uccgggcccu ccccuagucg 100 ccgcuguggg gcagcgccug gcgggcggcc cgcgggcggg ucgccucccc 150 uccuguagcc cacacccuuc uuaaagcggc ggcgggaaga ugaggcuucg 200 ggagccgcuc cugagcggca gcgccgcgau gccaggcgcg ucccuacagc 250 gggccugccg ccugcucgug gccgucugcg cucugcaccu uggcgucacc 300 cucguuuacu accuggcugg ccgcgaccug agccgccugc cccaacuggu 350 cggagucucc acaccgcugc agggcggcuc gaacagugcc gccgccaucg 400 ggcaguccuc cggggagcuc cggaccggag gggcccggcc gccgccuccu 450 cuaggcgccu ccucccagcc gcgcccgggu ggcgacucca gcccagucgu 500 ggauucuggc ccuggccccg cuagcaacuu gaccucgguc ccagugcccc 550 acaccaccgc acugucgcug cccgccugcc cugaggaguc cccgcugcuu 600 gugggcccca ugcugauuga guuuaacaug ccuguggacc uggagcucgu 650 ggcaaagcag aacccaaaug ugaagauggg cggccgcuau gcccccaggg 700 acugcgucuc uccucacaag guggccauca ucauuccauu ccgcaaccgg 750 caggagcacc ucaaguacug gcuauauuau uugcacccag uccugcagcg 800 ccagcagcug gacuauggca ucuauguuau caaccaggcg ggagacacua 850 uauucaaucg ugcuaagcuc cucaauguug gcuuucaaga agccuugaag 900 gacuaugacu acaccugcuu uguguuuagu gacguggacc ucauuccaau 950 gaaugaccau aaugcguaca gguguuuuuc acagccacgg cacauuuccg 1000 uugcaaugga uaaguuugga uucagccuac cuuauguuca guauuuugga 1050 ggugucucug cucuaaguaa acaacaguuu cuaaccauca auggauuucc 1100 uaauaauuau uggggcuggg gaggagaaga ugaugacauu uuuaacagau 1150 uaguuuuuag aggcaugucu auaucucgcc caaaugcugu ggucgggagg 1200 ugucgcauga uccgccacuc aagagacaag aaaaaugaac ccaguccuca 1250 gagguuugac cgaauugcac acacaaagga gacaaugcuc ucugaugguu 1300 ugaacucacu caccuaccag gugcuggaug uacagagaua cccauuguau 1350 acccaaauca caguggacau cgggacaccg agcuagcguu uugguacacg 1400 gauaagagac cugaaauuag ccagggaccu cugcugugug ucucugccaa 1450 ucugcugggc 
uggucccucu cauuuuuacc agucugagug acaggucccc 1500 uucgcucauc auucagaugg cuuuccagau gaccaggacg agugggauau 1550 uuugccccca acuuggcucg gcaugugaau ucuuagcucu gcaagguguu 1600 uaugccuuug cggguuucuu gauguguucg cagugucacc ccagagucag 1650 aacuguacac aucccaaaau uugguggccg uggaacacau ucccggugau 1700 agaauugcua aauugucgug aaauagguua gaauuuuucu uuaaauuaug 1750 guuuucuuau ucgugaaaau ucggagagug cugcuaaaau uggauuggug 1800 ugaucuuuuu gguaguugua auuuaacaga aaaacacaaa auuucaacca 1850 uucuuaaugu uacguccucc ccccaccccc uucuuucagu gguaugcaac 1900 cacugcaauc acugugcaua ugucuuuucu uagcaaaagg auuuuaaaac 1950 uugagcccug gaccuuuugu ccuaugugug uggauuccag ggcaacucua 2000 gcaucagagc aaaagccuug gguuucucgc auucaguggc cuaucuccag 2050 auugucugau uucugaaugu aaaguuguug uguuuuuuuu uaaauaguag 2100 uuuguaguau uuuaaagaaa gaacagaucg aguucuaauu augaucuagc 2150 uugauuuugu guugauccaa auuugcauag cuguuuaaug uuaagucaug 2200 acaauuuauu uuucuuggca ugcuauguaa acuugaauuu ccuauguauu 2250 uuuauugugg uguuuuaaau auggggaggg guauugagca uuuuuuaggg 2300 agaaaaauaa auauaugcug uaguggccac aaauaggccu augauuuagc 2350 uggcaggcca gguuuucuca agagcaaaau cacccucugg ccccuuggca 2400 gguaaggccu cccggucagc auuauccugc cagaccucgg ggaggauacc 2450 ugggagacag aagccucugc accuacugug cagaacucuc cacuucccca 2500 acccucccca ggugggcagg gcggagggag ccucagccuc cuuagacuga 2550 ccccucaggc cccuaggcug ggggguugua aauaacagca gucagguugu 2600 uuaccagccc uuugcaccuc cccaggcaga gggagccucu guucuggugg 2650 gggccaccuc ccucagaggc ucugcuagcc acacuccgug gcccacccuu 2700 uguuaccagu ucuuccuccu uccucuuuuc cccugccuuu cucauuccuu 2750 ccuucgucuc ccuuuuuguu ccuuugccuc uugccugucc ccuaaaacuu 2800 gacuguggca cucaggguca aacagacuau ccauucccca gcaugaaugu 2850 gccuuuuaau uagugaucua gaaagaaguu cagccgaacc cacaccccaa 2900 cucccuccca agaacuucgg ugccuaaagc cuccuguucc accucagguu 2950 uucacaggug cucccacccc aguugaggcu cccacccaca gggcugucug 3000 ucacaaaccc accucuguug ggagcuauug agccaccugg gaugagauga 3050 cacaaggcac uccuaccacu gagcgccuuu gccaggucca gccugggcuc 3100 agguuccaag acucagcugc 
cuaaucccag gguugagccu ugugcucgug 3150 gcggacccca aaccacugcc cuccugggua ccagcccuca guguggaggc 3200 ugagcuggug ccuggcccca gucuuaucug ugccuuuacu gcuuugcgca 3250 ucucagaugc uaacuugguu cuuuuuccag aagccuuugu auugguuaaa 3300 aauuauuuuc cauugcagaa gcagcuggac uaugcaaaaa guauuucucu 3350 gucaguuccc cacucuauac caaggauauu auuaaaacua gaaaugacug 3400 cauugagagg gaguuguggg aaauaagaag aaugaaagcc ucucuuucug 3450 uccgcagauc cugacuuuuc caaagugccu uaaaagaaau cagacaaaug 3500 cccugagugg uaacuucugu guuauuuuac ucuuaaaacc aaacucuacc 3550 uuuucuuguu guuuuuuuuu uuuuuuuuuu uuuuuuuuug guuaccuucu 3600 cauucauguc aaguaugugg uucauucuua gaaccaaggg aaauacugcu 3650 ccccccauuu gcugacguag ugcucucaug ggcucaccug ggcccaaggc 3700 acagccaggg cacaguuagg ccuggauguu ugccuggucc gugagaugcc 3750 gcggguccug uuuccuuacu ggggauuuca gggcuggggg uucagggagc 3800 auuuccuuuu ccugggaguu augaccgcga aguugucaug ugccgugccc 3850 uuuucuguuu cuguguaucc uauugcuggu gacucugugu gaacuggccu 3900 uugggaaaga ucagagaggg cagagguggc acaggacagu aaaggagaug 3950 cugugcuggc cuucagccug gacagggucu cugcugacug ccaggggcgg 4000 gggcucugca uagccaggau gacggcuuuc augucccaga gaccuguugu 4050 gcuguguauu uugauuuccu guguaugcaa auguguguau uuaccauugu 4100 guagggggcu gugucugauc uugguguuca aaacagaacu guauuuuugc 4150 cuuuaaaauu aaauaauaua acgugaauaa augacccuau cuuuguaaca 4200 aaaaaaaaaa aaaa 4214 <210> 5 <211> 1197 <212> DNA <213> Homo sapien <220> <223> wild-type B4GALT1 cDNA sequence <400> 5 atgaggcttc gggagccgct cctgagcggc agcgccgcga tgccaggcgc 50 gtccctacag cgggcctgcc gcctgctcgt ggccgtctgc gctctgcacc 100 ttggcgtcac cctcgtttac tacctggctg gccgcgacct gagccgcctg 150 ccccaactgg tcggagtctc cacaccgctg cagggcggct cgaacagtgc 200 cgccgccatc gggcagtcct ccggggagct ccggaccgga ggggcccggc 250 cgccgcctcc tctaggcgcc tcctcccagc cgcgcccggg tggcgactcc 300 agcccagtcg tggattctgg ccctggcccc gctagcaact tgacctcggt 350 cccagtgccc cacaccaccg cactgtcgct gcccgcctgc cctgaggagt 400 ccccgctgct tgtgggcccc atgctgattg agtttaacat gcctgtggac 450 ctggagctcg tggcaaagca gaacccaaat gtgaagatgg 
gcggccgcta 500 tgcccccagg gactgcgtct ctcctcacaa ggtggccatc atcattccat 550 tccgcaaccg gcaggagcac ctcaagtact ggctatatta tttgcaccca 600 gtcctgcagc gccagcagct ggactatggc atctatgtta tcaaccaggc 650 gggagacact atattcaatc gtgctaagct cctcaatgtt ggctttcaag 700 aagccttgaa ggactatgac tacacctgct ttgtgtttag tgacgtggac 750 ctcattccaa tgaatgacca taatgcgtac aggtgttttt cacagccacg 800 gcacatttcc gttgcaatgg ataagtttgg attcagccta ccttatgttc 850 agtattttgg aggtgtctct gctctaagta aacaacagtt tctaaccatc 900 aatggatttc ctaataatta ttggggctgg ggaggagaag atgatgacat 950 ttttaacaga ttagttttta gaggcatgtc tatatctcgc ccaaatgctg 1000 tggtcgggag gtgtcgcatg atccgccact caagagacaa gaaaaatgaa 1050 cccaatcctc agaggtttga ccgaattgca cacacaaagg agacaatgct 1100 ctctgatggt ttgaactcac tcacctacca ggtgctggat gtacagagat 1150 acccattgta tacccaaatc acagtggaca tcgggacacc gagctag 1197 <210> 6 <211> 1197 <212> DNA <213> Homo sapien <220> <223> variant B4GALT1 cDNA sequence <400> 6 atgaggcttc gggagccgct cctgagcggc agcgccgcga tgccaggcgc 50 gtccctacag cgggcctgcc gcctgctcgt ggccgtctgc gctctgcacc 100 ttggcgtcac cctcgtttac tacctggctg gccgcgacct gagccgcctg 150 ccccaactgg tcggagtctc cacaccgctg cagggcggct cgaacagtgc 200 cgccgccatc gggcagtcct ccggggagct ccggaccgga ggggcccggc 250 cgccgcctcc tctaggcgcc tcctcccagc cgcgcccggg tggcgactcc 300 agcccagtcg tggattctgg ccctggcccc gctagcaact tgacctcggt 350 cccagtgccc cacaccaccg cactgtcgct gcccgcctgc cctgaggagt 400 ccccgctgct tgtgggcccc atgctgattg agtttaacat gcctgtggac 450 ctggagctcg tggcaaagca gaacccaaat gtgaagatgg gcggccgcta 500 tgcccccagg gactgcgtct ctcctcacaa ggtggccatc atcattccat 550 tccgcaaccg gcaggagcac ctcaagtact ggctatatta tttgcaccca 600 gtcctgcagc gccagcagct ggactatggc atctatgtta tcaaccaggc 650 gggagacact atattcaatc gtgctaagct cctcaatgtt ggctttcaag 700 aagccttgaa ggactatgac tacacctgct ttgtgtttag tgacgtggac 750 ctcattccaa tgaatgacca taatgcgtac aggtgttttt cacagccacg 800 gcacatttcc gttgcaatgg ataagtttgg attcagccta ccttatgttc 850 agtattttgg aggtgtctct gctctaagta aacaacagtt tctaaccatc 
900 aatggatttc ctaataatta ttggggctgg ggaggagaag atgatgacat 950 ttttaacaga ttagttttta gaggcatgtc tatatctcgc ccaaatgctg 1000 tggtcgggag gtgtcgcatg atccgccact caagagacaa gaaaaatgaa 1050 cccagtcctc agaggtttga ccgaattgca cacacaaagg agacaatgct 1100 ctctgatggt ttgaactcac tcacctacca ggtgctggat gtacagagat 1150 acccattgta tacccaaatc acagtggaca tcgggacacc gagctag 1197 <210> 7 <211> 398 <212> PRT <213> Homo sapien <220> <223> wild-type B4GALT1 sequence <400> 7 Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly 1 5 10 15 Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu 20 25 30 His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser 35 40 45 Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser 50 55 60 Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly 65 70 75 80 Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro 85 90 95 Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser 100 105 110 Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro 115 120 125 Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu 130 135 140 Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn 145 150 155 160 Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His 165 170 175 Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys 180 185 190 Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp 195 200 205 Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg 210 215 220 Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp 225 230 235 240 Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp 245 250 255 His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala 260 265 270 Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly 275 280 285 Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro 290 295 300 Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg 305 310 315 320 Leu Val Phe Arg Gly Met Ser 
Ile Ser Arg Pro Asn Ala Val Val Gly 325 330 335 Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn 340 345 350 Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser 355 360 365 Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr 370 375 380 Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser 385 390 395 <210> 8 <211> 398 <212> PRT <213> Homo sapien <220> <223> variant B4GALT1 sequence <400> 8 Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly 1 5 10 15 Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu 20 25 30 His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser 35 40 45 Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser 50 55 60 Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly 65 70 75 80 Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro 85 90 95 Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser 100 105 110 Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro 115 120 125 Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu 130 135 140 Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn 145 150 155 160 Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His 165 170 175 Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys 180 185 190 Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp 195 200 205 Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg 210 215 220 Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp 225 230 235 240 Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp 245 250 255 His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala 260 265 270 Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly 275 280 285 Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro 290 295 300 Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg 305 310 315 320 Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro 
Asn Ala Val Val Gly 325 330 335 Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Ser 340 345 350 Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser 355 360 365 Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr 370 375 380 Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser 385 390 395 <210> 9 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 9 attagttttt agaggcatgt 20 <210> 10 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 10 ggctctcagg ccaagtgtat 20 <210> 11 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 11 tactccttcc ccctttagga 20 <210> 12 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 12 gtccgaggct ctgggcctag 20 <210> 13 <211> 6 <212> DNA <213> Artificial Sequence <220> <223> PAM for Cas9 from S. aureus <220> <221> n is A, G, C, or T <222> (1) .. (2) <220> <221> r is A or G <222> (4) .. (5) <400> 13 nngrrt 6 <210> 14 <211> 5 <212> DNA <213> Artificial Sequence <220> <223> PAM for Cas9 from S. aureus <220> <221> n is A, G, C, or T <222> (1) .. (2) <220> <221> r is A or G <222> (4) .. (5) <400> 14 nngrr 5 <210> 15 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> target motif preceding NGG recognized by Cas9 protein <220> <221> n is A, G, C, or T <222> (2) .. (21) <400> 15 gnnnnnnnnn nnnnnnnnnn ngg 23 <210> 16 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> target motif preceding NGG recognized by Cas9 protein <220> <221> n is A, G, C, or T <222> (1) .. (21) <400> 16 nnnnnnnnnn nnnnnnnnnn ngg 23 <210> 17 <211> 25 <212> DNA <213> Artificial Sequence <220> <223> RNA recognition sequence <220> <221> n is A, G, C, or T <222> (3) .. (23) <400> 17 ggnnnnnnnn nnnnnnnnnn nnngg 25 SEQUENCE LISTING <110> Regeneron Pharmaceuticals, Inc. 
University of Maryland, Baltimore <120> B4GALT1 Variants And Uses Thereof <130> WO/2018/226560 <140> PCT/US2018/035806 <141> 2018-06-04 <150> US 62/515,140 <151> 2017-06-05 <150> US 62/550,161 <151> 2017-08-25 <150> US 62/659,344 <151> 2018-04-18 <160> 17 <170> PatentIn version 3.5 <210> 1 <211> 56718 <212> DNA <213> Homo sapiens <220> <223> wild-type B4GALT1 genomic sequence <400> 1 gcgcctcggg cggcttctcg ccgctcccag gtctggctgg ctggaggagt 50 ctcagctctc agccgctcgc ccgcccccgc tccgggccct cccctagtcg 100 ccgctgtggg gcagcgcctg gcgggcggcc cgcgggcggg tcgcctcccc 150 tcctgtagcc cacacccttc ttaaagcggc ggcgggaaga tgaggcttcg 200 ggagccgctc ctgagcggca gcgccgcgat gccaggcgcg tccctacagc 250 gggcctgccg cctgctcgtg gccgtctgcg ctctgcacct tggcgtcacc 300 ctcgtttact acctggctgg ccgcgacctg agccgcctgc cccaactggt 350 cggagtctcc acaccgctgc agggcggctc gaacagtgcc gccgccatcg 400 ggcagtcctc cggggagctc cggaccggag gggcccggcc gccgcctcct 450 ctaggcgcct cctcccagcc gcgcccgggt ggcgactcca gcccagtcgt 500 ggattctggc cctggccccg ctagcaactt gacctcggtc ccagtgcccc 550 acaccaccgc actgtcgctg cccgcctgcc ctgaggagtc cccgctgctt 600 ggtaaggact cgggtcggcg ccagtcggag gattgggacc cccccggatt 650 tccccgacag ggtcccccag acattccctc aggctggctc ttctacgaca 700 gccagcctcc ctcttctgga tcagagtttt aaatcccaga cagaggcttg 750 ggactggatg ggagagaagg tttgcgaggt gggtccctgg ggagtcctgt 800 tggaggcgtg gggccgggac cgcacaggga agtcccgagg cccctctagc 850 cccagaacca gagaaggcct tggagacttc cctgctgtgg cccgaggctc 900 aggaagtttt ggagtttggg tctgcttagg gcttcgagca gccttgcact 950 gagaactctg gtagggacct cgagtaatcc actccctttt ggggactgac 1000 gtgaggctcc cggtggggaa ggagactgac ctctcggttc acgtgtcttg 1050 ccatagagcc actctcctga gtgggttttt ctcctgatcg tttgggccaa 1100 gtgacttctc tctgaacctc atatttctct tctggggataa taaatggtca 1150 ccctttcaag gggttgtttt ggaagatatt gtgaacaatg gtaaataagg 1200 gcttaattaa tgagggtaag ccctcagtaa attgtcactg tgtgttcatt 1250 tcttcctctg tgtggatcgt gaccgagagc ccttccccct agcctcctcc 1300 tggtatgggt acccaaaacc taggtgagca gggatctctc ccaggggcag 1350
agagcttgtg tactctgggt gttagagggc taaaatataa ccagtcaaca 1400 ccacgttgcc catttctggt acttccggta gcagcctgag tctcaattat 1450 cttgcccaga tgatctgaac tctgacctct agcctgtttc agcataggca 1500 gagagcttga gtaggtgagt ttgcattcct catagcagct ggctgagcct 1550 agtctggact tctctttgac ctgtaaccta caggcccaca ggcccaaggc 1600 aaccacaggt tgcttccagg gttaccacac aggtggtttc tcatttctaa 1650 tgctaggttt tagataattg ttgtaagtga ggggccctgg caggcaggat 1700 gacatcctgc caataggagt tttctgtcac tttcccacag agccctggct 1750 actacatact cttgctcaat ttcgccagta attgcgtcaa tgtgttcata 1800 tcaagtttgg gaagaacatc ttggaattgg tcagacgtga actgtggtaa 1850 taatgggggc ttgttttttt aagcagataa ttaaattcct ttgcatttga 1900 tgattattct gggaagcaga ctagtcccat aaaatgaaat ggactctgcc 1950 ttgctgctaa gtgtctgact tgagacatgc tatcgagttt ctcaaaatct 2000 cttccttgtg taaaatgtgg ttgtcgatga ttaccttaca ggggtttttt 2050 taagactaaa tgagatcgtg tacattaaat acaggcactc aggctgggca 2100 tggtggctca cgcctgtaat cctagcactt tgggaggctg aggggagtgg 2150 atcacttgag gttaggagtt tgagaccagc ctggccaata tggtgaaaca 2200 ccatcccatc tctacaaaaa tacaaaaaag ttagccaggg gtggtggcat 2250 cgcagctact caggaggccg aggcaggaga attgcttgaa cctgggaggc 2300 agaggttgca gtgagtcaag attgtgccag tacactccag cctgggcgac 2350 gaagcaagac tgtctaaaaa aaaaaaaaaa aaaaaaaata cgggcactca 2400 atacacgta taataataat atagtaataa tatttgctta ggatctttaa 2450 aaagtttcat tttttcagac tcccacagaa atggctctgc acagcagagt 2500 gaagggggag agagactgag tctccaggcc agaaaaaggc caggtttttt 2550 gcttttgttt ttagttgttg cctggatatt gcacagaaag aaaaaataat 2600 tagcaagtta aaaaaaagta ccgcaaagtt gattacattg gtatttgagt 2650 atcacatctt ctctcagaag cgtaagagac aaggtcgtga ccatacctct 2700 gcttagtttt gttttgtaat ggtgttgcta gtgatcggct tgtcaccagt 2750 tactggtgtt tctaaatgga ctataattgg ctacttgaaa ggacttcctg 2800 agaaagaaca ttttggagga cgaggagaga gtgccttctc tattttggct 2850 gctttcatgt gacatgcaag agaccatgac gtttaggctg ctgctgaggc 2900 agccccagaa atgggggccg agaggtcttt tcttcatttt aatagggtct 2950 gtaggtttgg gtggttaggt acagttctca gaatggaggt tcctggctat 3000 gaggccttga
gaaagctgaa agtctccttg ggagtgtgtg ggtgggggga 3050 gtcgagccca tctgttcatg ggcaggtgtc agccaaagcc cttgcgggtg 3100 gttttgaggt tggtggggaga aagcatccgt ggggtttaga gttgtggcct 3150 tttcactact tgcagttcct ttccccgact tggctttact ttctggtgtc 3200 caggggtctg ggccagatgc tgagattcct ctcagctgac aggtgtgggt 3250 tatgggcaaa cccttccctg gaggacataa ggcaccggat tggactgctg 3300 atgggttgct gttggagttg tcagggcctt ggaaatagtct tcagatagac 3350 ttgggttagt gtgacctggg gcaggctgca ggtttggagc catagtaccc 3400 cccgccccca caccgggcac cctgctctgg gctaatgtga ggcttgcagg 3450 agtgagtgat gcagtgggaa ggggggcctt tcctgaggat tctacagctt 3500 tctccaggga atcctcccag gtagtttagg cctgcaggtg ctatgctatc 3550 cttctttcct aaccctgtct caggtcctca gcggggccat gcggcatcca 3600 cttataaccc tgcagcgagg ccctcttttc tggccacctg ggtgtttgcc 3650 tgctgagatg ggaggaacag tggccttggg cttcttcccc cgtcatgttt 3700 atctctgctc agatgggca gcagctcaat gggacttgac cagctgtggc 3750 actgccagtc tgaagatgag tagggtgatg gggggaggtg ggcagtacct 3800 gaagctgaac tggtgagaga ggcaggctgg cctgggggct cagctggggc 3850 ctgggatggt tggtacagtc ccctcagggg ggtaggggag tgagtgttag 3900 actgcttaag cctcagaggc cgctcttgcc cacctatgct ttgaggagat 3950 cctcttcatt tgttcaaagg gaagactctg atctagagat gggcacttgg 4000 accagcaaac agcagctaca ggtagccagg gcacccgagg agcacttgct 4050 catgagccgg tttccctggt ttttatgggg gctgttgctg agcgtctgcc 4100 agggtttgtg tcctagcact tgctggtctt tgctgggctc tcagctctca 4150 ggtgtttctc taccagcacg tttccccctc cctcatatgc acacatgtgg 4200 acacaagcag gctgcccagg acagagtgta ctttgaggct tgggaaagga 4250 ctctctctcg cccttttggg gatgagcctt ggaacctcat caccttccgg 4300 cttggggtgg agcttcatcc tgggggttga agctttaggc tcagataact 4350 agtcttgtaa gccagttttg tcctgttgtt tttttcgtgg aaaataatgt 4400 attgacgtat acacagacat tctttgtcta acagtctgag attgagaaat 4450 accctccatg actatttggt ttgctttcat ggtgaaactt ggtcgctttc 4500 ttagacacag cctatggcaa taagagtgat ccctggctgc tgtaattcat 4550 tccagacttt gagcaaacac aaggcaccgc ctccacctgc agtggagcct 4600 ctgatgaacc aaatggaaac tccttgggga atggggagta agagccaaat 4650 gtgggattgg acttaaactg 
cagcttctta gaactgtagc attccacgat 4700 gggattgtct agtgctcttc ctggaggtta ctattcaata gttggctagt 4750 gcacaggttc aggggtgacc tgatatgccc tagcgtttca gaagatccct 4800 gcaaggtgtg tcttttggtc catctgaagg gtcttgtatg gtgatcttgt 4850 atggatatcc gtgacggcta aggcatctga taacttcatt ccttcagttc 4900 cagcagtgtt cctgtattat gctgggcact agagctacaa agaagaaaac 4950 aaagtgcctc ctcttcagga actcttaatt taggcagggg aggcataatt 5000 gaacagtgct gaggtcatct aggggaacca aagtgtgtat ttatcccctt 5050 ccctatcact cccctccctc cttcatttct tcctttcttc tttcagaaac 5100 tccaagttca tatcaaaatt ctccagccct ggttttatattt ggttgtgtga 5150 aaattttcct ctaatttctg aagctatgca ttagttctgc tgagtaatct 5200 ttaacttgct gctttataat gattataatg agatatcact gggtattatg 5250 gtctttgggt agcagcaggg tagggatttc caggctggga ctaagctaat 5300 ttatgggttg ggaattatgg ggcagttaat agcaaggcag tccaagcttt 5350 ccacagattc caccctaggg accatccaga cttaaggaac agggccggca 5400 ggctcatccc ctttgcactc agctgggcta tgggtgtgtg tttgtgaaag 5450 aggtttattc agtagtcata cctgctgatt tccctgctat ctgtttaccc 5500 agtgcctcct gtaccttgtt tcttactctt tgttctctgc tcttactatg 5550 aagaagcaga gactggaatt ctgcttgaac ccacatctac ctggaaattc 5600 cagtttttct tgtccagtgg agcagcaatc cagttgtttt aggacaaatg 5650 gtctgccctt gaagcttaaa tcctttgagg gcctggcatg gtgacagttt 5700 tacatttggc tttggtatag actggtgtgg tccctgggca gtgaggtcac 5750 tgtaaggcca gccagccaga ccctggctcc taggggaatt aacaaggcat 5800 gggattagac tcacagggtc cctcctgtcc ctaaacttgg taggggttcc 5850 tgggagccag actgcgatta agattgtaga gacctgagac ctgagttgta 5900 ggggcctctg tgttgatctg ggccattgcc gggtgagctg aggcggtcac 5950 tagctcaagg agtgatctca ggatattgtt ctgtaagtca gagacctcca 6000 ggttggagag tggggcttgg gggtggggga cagggtttag tggggagctg 6050 gttctgggtg aatgtggcct aaagggattt gtccttagaa gacagagggg 6100 tgagtcacac actcagtgct tcaggttcca ctttgcggct tggcctcagc 6150 ccgccccttc cctgcacaaa tgaaggccag gggctatata attggctgtt 6200 gctgaattct ttggcagtga ttttaaagtc tggtctgggt gtgttatgta 6250 gctgcttctc tatccactcc ccacacccgc tgcttctcca gagcccctca 6300 caaagcccag gcagagagag agagagagag 
agagagaatg acttgcctca 6350 cagagatgtt ggggatagg ataggggtat gggtctttgc ttttgccttt 6400 tgagggggga taatctcttc cttcatttta aaagtaaaaa gtaatgcagg 6450 ctcattgaaa ataatttgaa aagttgaaag agatataaaa gcacacccaa 6500 attcctatca cccaaaagaa acataccggc atatttccta ctagtctttt 6550 tcatgtttaa gaatatagct gatatatttt tttttctttt tctttttgag 6600 acagggtttt tgctctgtca cccaggctgg agtgcagtga tcacggctca 6650 ctgcagcctc gacctctcgg gctaagcgat tctcccactt cagtctcccg 6700 agttgctggg accacaggtg cacaccgcca tgcctgacta atttttgtat 6750 tttttgtaga gatggggttt tgccatgttg cctaggctgg tctcgaactc 6800 cagagctcaa gtgattcacc tgccttggcc tcccaaagcg ctgggattat 6850 aggtgtcagt caccacaccc agtgttatag ctgttgtctt tatagatgaa 6900 cagatagatt gacatagatt catgtagata gcctggtgtt cagcattttt 6950 catttaagat tctgtcacag acttgaccct atacctttaa aaatcacaaa 7000 ggcagtatca tagtctgtca gctgaatatg ccataactta aaaaaatcat 7050 tcaactgttg ctgaacacac acatatacat atatagtttt tgttttttct 7100 tagtgatgta gtgatgcttg tgcagaaagc tttatgtact ttttggatgg 7150 tttctgtagg agagctttct aaaaaagggaa aaaaagtgtt gaatgttttt 7200 tgagaagggc tagattttca agccagtctt acaaaagggat agactcattg 7250 gaaattccag atttgcttag tgctggcaga tgagtatcac ttattgctga 7300 acaatgtgtc tagaattctg attaaaaaag aaactaggtc caggaagtgc 7350 ctgggggcag gggcaaaggg ccaggctgca ggataggctc ttaggatctg 7400 gctgagcaga aatctgctgt gaacagaatc ggtgggggtg atgctttctc 7450 agtaacttct ccatttgttt ctttagcagc taagtccctg tgctggactt 7500 ctgtggacta ctgtggctct ggggctgtgg ttgtgggtga acaacagcta 7550 gctaaaccag tgctgttgac atcattgaga tgtgacgcac aggaaggtgg 7600 gagcaagctt gcaaatcaga ttctgaaaca tatagcacag ctctcccacc 7650 tccaggtggt cctgagatct agggaggagc catagtgaga aactttaggt 7700 ttctaggaat tctcttaggg agaagctctc ttagggagag gcagaacctg 7750 gttctcagtt ggggctgatt caggtgggtt agatcaataa agcctcaggc 7800 cagtgtgcca ggctattccc aaggagtata ctttgaagtt actcccttta 7850 gaatgtcctc agtggagata aattctctct gaggagcagt tttgtctgcc 7900 ggggtcattt ggcacaaagc ctggagtgct agggcgaggt tgcactgagg 7950 gaagggggcag gattatgtca gcagtgtgac ggatacagtg 
tgaggtcagg 8000 ctccttcctg ccccaccacg ggggcctaga ggtcatgggg agggtccctg 8050 gcaggggatt caatcattgc ttggccccat gacagagtat attctaaaaa 8100 tgccttaagt ttttttcttt caaagtttct tcctgttttg cataatggcc 8150 ttttgccttt gacatcctga aaccgcagag ctgtcattgg tgttgcagga 8200 cactgccagc ttgaaaaaaa tcaacaaacaa aaaaagaaac aggaaaggat 8250 gtggagttca gggtgcggcc tagggaagct ggtatttgcg ttatgggatt 8300 gtggggatgt ggtattaagg tgttgggtag cgcctgacat ttagaggagt 8350 actctgggca gagtccctgc ctgcccaaga ataggtagaa ttgagtcttc 8400 acaccaaagt caggagagac cccctccccc caggaagaga atgaacaggg 8450 actcatttcc tcattcagca aacttttat ggtaactaca ctatatgaag 8500 tgtgagagat agacatgaac aagagaggcc cccactcttg ggcagtccct 8550 tagtagtagt agatagactc tggcaatatg gtgtggtcag agagaggaag 8600 cctgggtgct ttgagggtac tgaggaggtg cagggagcca aatgggtggt 8650 ctgggccagg gccagagtca gaatgaagga cctctcttcc agacgttgat 8700 tttagcatct ctgtctctca gtatgtttga acagtctccc ttattggaag 8750 ggcaggagtc tactgctaaa agtaacctgc gatttcctct acttgctgtc 8800 atgtggaaag aatactaaag ctgaaattcc aaaagttgca cacctttacc 8850 agcagggcag gagaggaaag gaaatggagg cagagtgagc tgaagatgat 8900 aaaagaaaga gaaggtggtg cagtttggac tgttatggac agaggaagtc 8950 tgagggtagc tggactgagg gatcaaaggg aggcagttga aagggaagag 9000 agctgcagag agggatttct tggtctgcag agggtaggag caagccttga 9050 aggctgctgg agtgaggatt ccgagccctg gtctttattc tttttctaat 9100 tcatttacatc attttaggca agtcctaact cctttggtct ctgttgtctt 9150 tctgaaattt gagtgggctg ggcctgctgg tctttagcct ctgtctttct 9200 ctacctccta gattccagtt tggcgagtgg gggggaaaac ctggttgtat 9250 atgcaacgtg aaaggcctct ggaattcctt ttgaagctca ctacccatga 9300 ggcttctgct aaggatttca tcatgtctgt ctaagcagac ataaaaattt 9350 tagcaggtgg atgacccgta gaaatggcac aaggaatgtt tctttctgtc 9400 acactgtggt atttgattta agaaagttgt tatcctctct gtgcctcagt 9450 gttctcactt gtaaaatggc aataacagta tccacctcat agatgttatg 9500 aaatacaggt agtagccacg aaagggctta aaacagtgcc taacacagaa 9550 taagttgtga atatatgtta tttattatg gtagtataat gcttatttgt 9600 gaagattttg gcttttgctt tataggacct tttttttttt tagttgaaaa 9650 
tacaatgtta ccatgttaaa tgttaaaaaaa aattctactt accattgtaa 9700 cagaacatgc tcccacttct gtaacagagc ttgctattac ttttcaaatg 9750 catacatatt ccaatgcata tattccaatg cagttgtaga gtgaaactgt 9800 ttgcatgcag ccatttttat ccaaacattat cttataaaat gttatgttgt 9850 ttatgattat cctaattatc ttttgttgct gtctagtatc cttatagata 9900 ttccattagc atacactatt ccaggtttca ctatcgtcga taatctagat 9950 atgaacattt ttgtagtgtg tagctctttg cttcagttga attactttcc 10000 tgggataaat tcctggggaa gaatttctag gccagaggat atggtcatct 10050 tgacaatact gattcacatt gctgcattgc tttccaagag gtttggaatc 10100 attcacaggt tctaaattgg aaaatcctgg cttttgaagt atgtggattc 10150 taagggcgat ttggatctag ctggagcctc acactgacac ttccagccag 10200 tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtagt tccctatgct 10250 ggacaccgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtagttc 10300 cctatgctgg acaccatgtg gcctttctgg acattagggt tttcctgtga 10350 ttgcctcaga gcagttcctg ttgaattcac tctgtgtcca caaaaggagc 10400 cttactgtgg ctctttcaac acccacctac ctttgccaag ttggtttaca 10450 gaaagtaaga acattctttc cttcttcctt gatatgtggc gctaaaaccta 10500 tagcatgggg caggctctgg ctttaaaaac ctgacttaaa aataatggtg 10550 ttgatcaaaa agtttgtgga tcagtttttg gaaacactgc atgtagccat 10600 ccatagaaac ttatattctg ttgggctagc ctgggcgcct gatcatttaa 10650 ctcatgtgga tgaacttcta tgtaatagcc ctggtgtatg ggatccagaa 10700 acagggccct aatgaagaaa ggcttttaaa ttatgttgga taaaaataag 10750 ttgttacaat agcccaaagt ctgcaaatat gaattgccag ttctgtcctt 10800 gtagtcatcc accatgtgcc tgcatctttt gtagactctt gtagattcag 10850 aagcccactg aattgcataa atgatggaat gattttagac ttagtgattt 10900 cagtgactaa aagtttacag atcctggccg ggcacagtgg ctcacaccccg 10950 tattcccagc actttgggag gccgaggtgg gtggatcacc tgaggtcagg 11000 agtttgagac cagcctggcc aacatggtga aaccttgtct ctactaaaaa 11050 tacaaaaatt agccgggtgt ggtggcatgc acctgttgtc ccagctactt 11100 gggaggctga ggtggggagaa tggcttgaac ctgggaggcg gaggttgcag 11150 tgagcccaca tcaggccact gcactccagc ctgggtgaca gagtgagact 11200 ctgtctccac ctcccccgcc ccccgaaaaa aaaaaaagtt tacagatcca 11250 gcagatgggg catattcaat ttgtgacagc cactcccttc 
accttatagc 11300 tatgtcatat gtcttcttct cctttgactg cattctgcag cagtcagttg 11350 tgacttaata tggcactctg ggcccactga attaggtcag agctgctagt 11400 agtatattgt tcctagagac ctagggcaag attttcttac tacataaaat 11450 gagggagata atttcttacc tcaagatgtt ggtaagagga gtgaatgagg 11500 ttagttatat ggtaatatca gtactctgaa tgtcttttga tcaatgccta 11550 actcatcttc ttgggcacaa aaggcataca gtcagcaccc ttaggccaca 11600 tataaaattc ctccaaatgc aggttttcat ctgccttggg gcagagtcaa 11650 gagaaagaag aggaagaggc gtgaggctct gaccacaact tagggacaga 11700 atatagccca aagcgagtac cccaggccac aaggagaagg ccgctatctt 11750 gttgaatcca cagcactgga aacttggagt gtgtgttccc ctgtgtcagt 11800 tacactggaa ttttatggct gctcacattc ttcccttcag gtggacgttg 11850 ttcatcagta tcctgggcaa gaggccatca taaaccacag acagctgagt 11900 gattaggaag aggagctgaa gagggagcat tagatgtttg attgagtctt 11950 aggtgagaaa gtatatcatt aaaacaaaaa gatagatgta ggcgggctca 12000 gtcttgtgtg cctggtgtgt tggtagaaaa actaaagcac aagcctgtag 12050 ataacctgct ttattctacc tcggggctgg tgttggaatc caggatgcca 12100 gaccctaaag tccagctctc tttccaaacct actgaataat ccgagagaaa 12150 tcatgttctc tctctgggcc tcagtttgcc catgtataaa atgagatgaa 12200 ggattggctg ggatgctctc cagagtctct tcctgcctgg agttctgacg 12250 tagccatgta ctcctgctca gcatcgctaa atggctttgt ggtaggacca 12300 ttgagtgctg cctccattag ggccagctat gtaatgctgg ggtggctgtc 12350 actgggccct aagagccagg attggtctta ctggagaaat ccacatccac 12400 ctaaacttaa gacccagggg tgtccaatct tttggcttcc ccaggccaca 12450 ctggaagaag aattgtcttg gaccgcatat aaaatacact aattatagcc 12500 gatgaggtta aaaaaaaaaaa actcaatatt ttaagagagt tcatgaattt 12550 gtgttgagct gcattcaaag ccatcctggc cgcatgtggc ccatgggcca 12600 tcggttggac atgcttgctt tagacctccc agcaattcta gtctctaaac 12650 aggaaatcaa aagtcaagat gaatagataa gttggtcagt gtgaaaaagt 12700 aattggtggg agccactgta gatgcagggt tctaggctcc atcaacaacc 12750 acctacatca ctgaacgaaa gataatgctt gttcagcact tattacatgc 12800 caaccatggt aaaaatactt cagatgcatt gttttcatga actctcacag 12850 cagctctttt tcttgcctaa atgccccgtt agaacctcca gtacaatgtt 12900 aaatagatat gctaagagac 
aacatatgtg tcttgttagg gggaaaatat 12950 ccagtctttg actattaaga atggtgttag cagtgggttt ttcctaggtg 13000 ccctttatca ggttgaggaa gttcctttct attcctggtt tgttgagtat 13050 ttttatcatg aaaaggtgat gggttttgtc aaatgctttt ctgtgtctgt 13100 tgagatgatc atgttttttt gtcatttatt ctattgatat ggtatattat 13150 acattgattt ttcagatatt aatcttgcat acctgggata aatcccactt 13200 ggtcatggtg tataattctt tttattgtt gctggattga gtttgctagt 13250 attttgttga tttgtattca taacagatag tggtctgtag tctttccctc 13300 cctccctccc tccctccctc cctcccttcc ttccttcctc tctctctctc 13350 tctctcccct cccctccctt cttttcccct cctctcccct ccccttccct 13400 ttcttctctt tcatagttgt ttaccactgt cagaaaaggt ctgttcgttt 13450 tctttcgtcg tgagatcttt gtttggtttt ggtatcaggg taatactgcc 13500 tcaaaaaatg agtaggggaag tgttccttcc tcttctgtat tttgagagag 13550 tttgtggtcg gtttttatta attcttcttt aaatatctgg tagcgttcac 13600 cagtaaagcc atctgggcct gatgttttct ttgtggaaaa ctttttgatt 13650 cctaattcag tttctggtta taggtctatt cagaccttct attttttctt 13700 aagtcagttt tgatagtttg tgtcttccaa ggagtttgct tcatctaagt 13750 catctaattt gttggcatac atttcatagt gattccttat gatccttttt 13800 atttccgtta aagttggtgt agggatagtc cctctttcat tactgattat 13850 aataatttga attttctttt tttcttagtc ttgccaaaag cttgtcattt 13900 ttattgatct tttcagagga ccaactttga gttcattatt tgttctcttt 13950 gttcttattt ttctgcttca ttaacttctc taatctttat tctttcattc 14000 tgcttgcttt tggttaagtt tgctttttct ggtgtcttaa ggtagaaggt 14050 taggttactg atttgagatt taaagatcat gctctttaaa cgttttgata 14100 gatactgtca gtttgccctc tggctttttc tcattaacag tgtataggag 14150 tgcttattcc tcacactcat accagccctg ggtgttacta acctttatat 14200 atttgccagt atcatattca gacatagtat cttgttttaa tatgtttctc 14250 tgattactga tgaagttaag caaattttca cgtgtttat ggccatctgt 14300 ctttcttttt tcatcctttc tttcaagatg ggagtctttg ccatgttgcc 14350 caggctggac tcgaactcct gggctcaaat gatcttcctg cctcagcctc 14400 ctgagtagct gggactatag gcgtgagcca ccatggctgg cttgcccatt 14450 tgtatttctt atgtgagtat tttttctttt tttttgaagt ggagtctcac 14500 tccatccccc agagtggagt gcagttgtcc gatcttggct cactgcaacc 14550 
accgcctccc aggttcaagt gattctcaca ccttagcctc ccaagtatct 14600 gggactatag gtgtgtgcca ccacacctgg ctaatatttg tatttttagc 14650 agagatgggg tttcaccatg ttggccaggc tggtttcaaa ctggcctcaa 14700 gtgattcacc tgcctcggcc tcccaaagtg ctggggattac aggtgtgagc 14750 cactgtgccc agctgacttt ttttttcttt tttttaaccc ttttttttt 14800 ttaccctttt tttggcccat ttttttttac cctttttctt ttaacccatt 14850 tttctattag ttttaaaaat atgtttgcag gagcttttta tattgtggat 14900 ttttcttgtt tattacatat catttgtaaa tatggtctct ccatctgtca 14950 ctcttcttta tctctggttt ctttagctat gtagaagttg ttatgttatg 15000 ttatgttatg ttatgttatg ttatgttatg ttatgttatg ttatgttatt 15050 ttttggagag ggagtcttgc tctgtcgccc aggctggagt gcagtggtga 15100 aatctcggct cactgcaacc tctgcctcct gggttcaagc gattctcctg 15150 cctcagcttc ccgagaagct gtgattacag gcacccgcca ccacacccag 15200 ctaatttttg tgttttagta gagacggggt ttcactatgt aggtcaagct 15250 gatctcaaac tcctgatctc aaatgatcct cccaaagtgc tggggttaca 15300 ggcgtgagcc actgcactcg gccagaagtt ttgaattttt atgtgtttaa 15350 atctatgttt tcctttatga cttcaggttg ctttcatact taagcaggtc 15400 ttcaccatcc caaaatgata aaatttttct cctgagtttt cttctaagtt 15450 ggttctttag aagccaccaa cttggcttcg acagcaaaag atgaacagaa 15500 tttctgttca actctcatgc tgcaagaagc tttatgtaat actccaggga 15550 ccctttaagg tcccagagtt ttcctccaaa tctatcagtg attctagtgg 15600 ctaagagtag aaatgtgaaa atttagccat gtgtgctgat agagctgtag 15650 taatttgtaa gctctgaagt tctaaggagt caggggagaa gggaaagtaa 15700 catttatga acatctatta gctcaataag aacatgcgat aagtatgtat 15750 atgtattatt tcacttacat ctgaaaggaa ggcataatta tccccactcc 15800 ttagagaagg aaattggagc tggctacatt taaagtagtc ctgacaccag 15850 agagatattg ccaggagtac ttggctggct gagtgcccag atggcccata 15900 ggagtagtgg gccctccaca gtccaaggtc tggttctagg tggagagaga 15950 aggatgtgct cgtagtcagc accgcagctc cagaaaatct gctggggctc 16000 caaaactgat tagaggggca gctgactcag taataaaact cccagggagac 16050 ttacttacat actggaatgc aaagttgcag ctttactggg aagattagaa 16100 ctgttatga gtagcttaga aatctctggc tgaattcact gcaaggggaag 16150 ccgcaggata agctaactgc tggtgagtca gcagtcagag 
cagggaagtg 16200 aatttaacat tagatgggtc agtctctcgt ggctgatgaa ttcatcccca 16250 caatactgta cacctgcctt agggaccttt gtctggacta ggggttgggg 16300 tccccctcct ttgtacagcc ctggaaggac acatccagct ccatccgcca 16350 tctctccctt acttatttcc ttccttcctt ccttctttcc atccagccat 16400 caagcttcct ttcatggcca ataatcatca ttggggtcta ctcatggact 16450 ctcttgcctc atgtatttgt tttattttgt cctcattccc acttctattt 16500 cccaggtata tcacaggcaa ctattctaac gtatttatag tttgtgtatc 16550 tgtttttgct cttgccaaaa tggaagccac tgctttatac atagatgtat 16600 tcttaacttt aaaaaaaatt tttttagatt aacctacaat aaaattggct 16650 ttttggcata tagtctataa attttaacac atacatattt ttgtgtatct 16700 accaccacaa tcaggataca gaacagttcc atcaccccaa aaaaatccct 16750 cttgtagtca cattctcctc ccacccttaa tcccaggcaa ccactgatct 16800 attcttcatt actattgttt tgtctttttg aggatgtcac ataaatggag 16850 tcacacagta tatatacatt tttttaaaca tatgtaaaatg gcattttata 16900 gctcattttg attatatgtt tttcatccag ttctgttttt tttttttatt 16950 tttaaaaagt ttgacataac ttcagactta cagaaaagtt gttagactaa 17000 tacaaagaat tcctggatat cctttggagt ccctaaatgt taacatttta 17050 ctatatttac tttttccttc tctctctctc tctctctcgc tctgtgtgtg 17100 tgtgtgtgtg tgtgtgtgtg tgtgtatcta cctgtagata gatagatatt 17150 aatataattt tagatagatg tatctagatc tctctctctc atatatatgt 17200 gtgtgtgtat atatctatat ctatatctat atatatctcc ttttaccctt 17250 aaatattcag tgtatatttc ctaacaacaa ggtgatttaa aaatatatat 17300 ataaacatag tataattaac aatcaggaca tcaacattga aacatttctg 17350 ctatgtcatc tacaggcctt aggaagactt tgtcaggtgc cccaataata 17400 gccttgatgg tagaagaaaa ccatgtgttg tattcagttg tcatgtctct 17450 tagtgtcttg taatctgaaa taattcccaa gccctttgga tttcatgaca 17500 gtgacattgt tgaagagtac aggccagtta ttttgtagaa ggtctctcag 17550 tttaggtctg tctgatgttt cctcctgatc agattcaggt tattcacttt 17600 tgacaggaat accactgaaa tgatgctgag ttcttctcag tgtaacgaga 17650 tctagagaca cacactgtca gtttgttcct tattggcagt gtgaaccttg 17700 aggatttcat tgtagtggca tttggcatta ctccattata gttactattt 17750 taccatttta aattaaaact atctggccgg gcgtagtagc tcatgtctgt 17800 aatcccagca ctttaggagg 
ctgaggcggg caaattgctt gaggtcagaa 17850 gtttgaaacc atcctagcca acataacatg gtgaaacgcc atctctataa 17900 aaaatacaaa aaattagcct ggcgtggtgg cgcatttgta gttccagcta 17950 ctcaggaggc tgaggcacaa ggcttgcttg agcctgggag gcggaggttg 18000 cagtgagctg aaatcacgcc actgcactct agccagggtg acagagtgag 18050 actctgtctc aaaaaaaaaaa agtaaataaa taaaaaaatt ttttaagtat 18100 cttatgggca tatacttgtc ctgttactcc tcaaactttc atccactttt 18150 ttttttttaa attttttttc ttacctttca tcgttttctt gatatccact 18200 gggttttagc atctacaaat gattcttgcc tgaatcagtt attatggtag 18250 ttgatggttt tctaattcca ttattccttc tatgtttgtt aattttggca 18300 ttcttctata aggaagagct tacccttttt ccctattaat taattcatat 18350 attaatgcag acctatgcat tcttacttca ttaaatcata atcctttact 18400 atcattatgt attctgatgt tcagactatc ccagattag ccaataagat 18450 ccccttcagg ggaatggtct ttgggattcc tctttagagg ttcctggttc 18500 ctgttttctt ttgacatatc ctattactct ttgagcattt tttttttttt 18550 ttttactttt aggcacagca agaagttcca tggtcctctt gttctttccc 18600 caactcagcc ctagagtcag tcacttctcc aatgagctct agttcctttt 18650 agtagagaat cataattaga aaacaagaat cagtgccaag tgtgcacctt 18700 tgtttttaag gtccatccac gttgccgtgt atatgtccag catgttgatt 18750 ctaactgctg aataatacct catgattgtc atccatccca gtgtttcttt 18800 ttcccttctg taatgaggga ctcctggact gcctccagca ttaccttcac 18850 aaatattgct gtgaggaaaa tccttaaacg tttcctttat gggcaacgtg 18900 tgagcatgtt tatgttgatt caggggtgcc agacacagct ccagaatggc 18950 tgcctcagtt tacatttcca ccagcagagc atgacaggct ctgtgtctcc 19000 gtgaataatc agcattaacc agcttcctat tttttgccaa actaatagat 19050 gtgctaggat aactctttgt tttaacttgt ttttctctga ttaccaatga 19100 gctggagcat ttcttcatat gcctgatggt ctttgggatt cctcttaggt 19150 aaattgctta ttcattataa tcctttgcct gtttttcact ggagttctta 19200 tatttttctt gaagatatgc aggaattcct tatacatcct agatattaat 19250 cccttcctgg tctcagacat tgcagatatc ttctgaatct gttattatact 19300 tatttattta caattttttt tttaagagtt ggggttttgc tctgtcaccc 19350 agactggagt gcagtggtat gatcatgact cattgtggcc tcgcaatcct 19400 gggcttaagc gatcctccca cctcagcctc ctgagtagtt gggactacag 19450 
gtatgcacca ccagacttgg ctaattttat tttattttt agagatggaa 19500 gtcttaatat gttgctcagg ccaatcttga actcctggcc tcaagcaatc 19550 tttccacctc agcctcctgc atctattata tatatgttca ctttgctcat 19600 gctgtatttt gttgcaacat aaaactattt ttcccattgt tttgtgcagt 19650 ctctcaccag cactcttctt tttctgtaac tgtgttaatg ccctttgttc 19700 ttccatatgt taggtatgct ggtatagttg aactctgctg actctcctca 19750 gtaaacagtc tctttttatg acaccttatc ctctactgaa ttctctctat 19800 caagaatgac ttggccgggc atgggggctc atgcctgtaa tcccagcatt 19850 ctgggaggcc gaggtgggca gatcacccga ggtcagaagt tcaagaccag 19900 cccggccaac acggtgaaac cctgtctcta tgaaaataca aaaatcagct 19950 gggcgtggtg gcaggtgcct gtaatcccag ctacttggga ggctgaggcg 20000 ggagaatcac ttgaacctga gggggaggtt gcagtaagcc gggatggcac 20050 attgcactcc agactgggtg atggagaaac tccatctcag ggggaaaaaa 20100 aaaaaaaaaa aaagaatgac ttgtcttcct cttagagtgt gaggtctaca 20150 tacaaatatt attcttgtat tcagcaaatg tatgtcatag gcctagtgtg 20200 tgttaggaac tgtgctgtca ccaacaaagt ttagagaggt tataaaactt 20250 gactgtagct ttttagaggt ggaggagtga tttgaaacct aggctgtaat 20300 tccttcctcc tgtgattcct tcctactgtg ttgccttccc ttgaaaattg 20350 catttggggg ccaggtgtgg tggctctcgc ctgtaatccc agcactttgg 20400 gaggctgagg cgggtggatc acctgaggtc aggagttcaa gaccagcctg 20450 gccaacatgg cgaaaccccg tctttactaa aaatacaaaa attagctgga 20500 tgtggtgtgt ggtgacatgc acctatattc ccaggtactc agtaggctga 20550 ggcaagagaa tcacttgaac ccaggaggca gaggctgcag tgagctgaaa 20600 ttgcaccact gcactccagc ctgagtgaca gagtgagact ctgtctcaaa 20650 aaaaaaaaaa agaaaagaaa gaaaattgca tttagttcct gtagactgtg 20700 tgtcaaatgt ctaaatctct tctaaacaaat ggcctaagga ggtgcaaagc 20750 gaagcatcct caccagcatc ctgacttggc agtgaggcat gggaccctgg 20800 agggagtagt ggtaagtgtg actctggaat tcttcctggg ctacttgtca 20850 gtgactggct ccagattgag aggagagccc agaggacaca ggtggctgcc 20900 ccagcctgga ggtgaaagtc ttaaaataaa atgccagatg cctagaccat 20950 tctaaacctt tctgagaagc tgaaatcatc ccttctggaa gcgctctagt 21000 tctaaaagga cagatataca gcaagatctt cctggggcta atatggagtt 21050 tataggcaag taggcctcag aacctttccc tggtagtgat 
atctgtgggc 21100 aggcacagtt tccacacttt ccagaaattc cagcggaagg agtgagaagg 21150 aggaatctgc ccttgagtga ggaccaaaga aagcagaaat tcctcttggg 21200 aatttttcct ccagagacca aacactactt gggagcttgt ttactgggct 21250 ttaaaagctt gtgaccccca gtcactcttt cttgaccccca aggctttgca 21300 tttctgtggc ttccccactg gacagaagtg gaactgtcat gctgcctgtt 21350 ctggggtctc ccagaggttt ccccatgtcc tctccttgct tctactgccc 21400 cacagaattg gggatctgtg accacatatg gtatagaatt aatgcttgag 21450 aatggtttag ttcagtgatg tcaaataaga ttcactttta tgccacctcc 21500 atcagttgaa ggcccccctg gcccctaaat tggaaaaagat tctgagacag 21550 aatccccgtg ggtacagcgc agggacagta aaggcacgtg tgctgtgatt 21600 tgctatccac tgtgtggatg catccaggaa tatcagaacc ctggaagatt 21650 atttaagggg aagttaggac agcttttttg ccaatccaag ggtgttcttg 21700 aggaagtctg tcttcctgta tggccttcag tttctttcct gtgtaaccat 21750 ggggccaaca cataattccc acagctctat tggcccttgt ctgccaggat 21800 tctctagggt ctgattcgag gtggatcctg gccctttgag gtggcagaat 21850 ctgatcatgg tgctgtttcc ttagatttag gccttgatac ccttggcgag 21900 agcatcctgg gctgagtgac cacctgaggt ttttctggtg attttgtgac 21950 ccatgtaaaa ctttgagctt tgggattatt ctctcaagga aatagtgaca 22000 tttggtgaag agcctgtttg gtgtggctat gtgaggctta gccaagaaaa 22050 tgcaccattt ttattaggag gttaggccat ccgttgccac aaagtgtcag 22100 atgctaggcc tagagcctgg agaaaactta ttttaaaatt gatggggtgc 22150 tggagggggtt ggggggtggt ggctgtagct catgaatcag gtgctaaaacc 22200 tagaaacaaa aggcctcatg tggcagactg tttctgagca cagatgaatg 22250 gatgagcaac tggcgcaact ttgcccagtt ggtccagctt cccacttggc 22300 cacctaggct tgctgtgaag acctcgtctg gcagaaatga gagtgttttt 22350 gccccatctt gatcttaact gtaatttaag actaaaatct tagattctaa 22400 aacatcaaag gcaagatggc tcccagctct gtgagctcag cttctcacct 22450 cttagttgaa caagtgcagt gtgggtcaat acatgattgc tgctcttgct 22500 gccaggaact gtcccagcat agaaaggaat gggacacaat ccctgccgtc 22550 aagattctaa gggaggaagc aggcaggtcg actggtgcct catctctgca 22600 gggctccagc caaggtttgt gaaggatttt gcaggcatat ggagtgggga 22650 ctgattgatc ccgagagggg actggggaaa gctctgaaga ggggatgaca 22700 tttggtttga actccaaaaa 
atggttgctt tacctgtttc ctgaagtttt 22750 tgaggtggct tataagaaca tataccataa aaaggaccaa tataaattta 22800 aaatcagaaa aagagaaaat gggctgggca tggtggctca tgcctgtaat 22850 cccagcactt tgggaggcca aggtgggtgg atcgtgaggt caggagatcg 22900 agaccatcct gcctggccaa catggtgaaa ccccggctct actaaaaata 22950 caaaaaatta gctgggtgtg gtggcacatg cctgtagtcc cacctacttg 23000 ggaggctgag gcaggagaat cgcttgaaac ctgggaggcg gaggttgcag 23050 tgagctgaga tcgcaccact gcactccagc ctgggcgaca gagtgagact 23100 cctcctcaaa aataaataaa taaagagaaa atggaactta gaaaattaag 23150 aggaagagtg aaaaggtaga tatttagtca ggcacagtgg ctcatgcctg 23200 taatcccaac actttggggag gccaagacag gaaaatctct tgagaccagg 23250 agcttgagac ttgcctggca acatctcagg tgagacctta tctctacaaa 23300 aaaatttaaaa attagctgag ctgtgtggct cgtgactgtg atcccagcta 23350 ctcaggaggc cgagaccaca gcccaggagg atcgcttggg cccagcagtt 23400 tgaggctgca gtgagctggc accactgcaa ttcagcctgg gctacagagc 23450 aagacccagt ttaaaaaaaaa aaaaaaagat attcaaacca tgggtcccaa 23500 cgtagttat atatttgacc atttgcaaaa gctgaaagca aaacatgtta 23550 cacattttca gagaggaaaa tacacagtag ttcctgagtg taagttgttt 23600 ttcttgacct cattcttaaa ttgcttcatg agggtgggag ggaagtggta 23650 gttaataagt gaacctgtaa accagcgttt ctcaaaatgt agtccaggga 23700 attgcatcaa aattgcagtt acctacagtg cttgttaaaa tgcagattcc 23750 tgggcccctg ccccaggctt atcaaatcaa tctggtgagt aggactcaag 23800 aacctgtaaa ttcacatact tctgcagatg attcttcttg cactgcacag 23850 catgaaagcc tctgcaatag acagaaagct accagcattg cgaaagcaac 23900 ttgagtgctt ggcctttgaa ggttgagtgg gactttaatg agggagagag 23950 taaggcatga gaaatggcag ttccactgag gtcagtcagt ggttcattgc 24000 tgacgaagtc acttttaagt catgttttag aagaactacc aagtgtggca 24050 ggtcaggcat gtggcaggac tgtttctgag cacagatgaa tggatgagca 24100 cctggcccca ctgtgcccag ttggtctagc ttcccacttg gccacctacg 24150 gtctgctgtg tggaccttgt ctggcagtct cctttaattt attttttatt 24200 atttttttct ttttgagatg gagtcttgct ttgttgccca ggctagagtg 24250 cagtggcatg atctcggctc actgcagcct ccacttccca ggttccagcg 24300 attctcctgc ctcagcctcc caggtagctg ggatcacagg caagtgccac 24350 
cacgcccagc taatttttgt atttttaata gagacatggt tttaccatgt 24400 tggccaggct ggtctcgaac tcctgacctc aggtgatcca cccatctcag 24450 cctcccaaaa tgctggaatt acaggtgtga gccaccgcac ctggcctatt 24500 ttttttcagc aaattctttg tttttctctc tgttcccaaa tgcagggtac 24550 tgagaccaca gatgtattct gtttcctgtt gaaaaaatgt ttctcactta 24600 gctgggtgtg gtagcatgca ctgcagtccc acgggaggct gaggcgagag 24650 gattgcttga gcccaggagt tcgataatca tgccattgca ctctggtctg 24700 ggtaacagag cgagaaactg tctcttaaaa aaaagaaaaa gaaaaagagg 24750 tcctagggaa agaaacaaaat agtggcttgg atggtgagtt ggtggaaaga 24800 acagtgggtg ttgggggtgt tgaacttgtg tttgtgtgtg gtgtacccaa 24850 gacatatcat gtcagcatta agaatagact attcctgttt tctggtcact 24900 gagttgtatg ttttgacatc cttattttgg aagatacttc cttactagga 24950 atgggatagg gagggggtca cctttcccat ctgtgggtca tattttaaaa 25000 tatttattgt tcaagtttaa agatataacc aaaggtataa agaaaaatac 25050 cacaaacatc tgatttaaga aacaaaccag ccgagcgcgg tggctcgtgc 25100 ctgtaatccc agcactgtgg gaggccgagg caggcagatc atgaggtcaa 25150 gagatcgaga ccatcctggc caacatggtg aaaccccgtc tctactgaaa 25200 atacaaaaat taactggtca tggtggtgtg tgcctgtagt cccagctact 25250 cgggaggctg tggcaggaga atcgcttgaa cccaggaggc ggaggttgta 25300 gtgagccaag attgtgccac tgcattctag cctggcgaca gagtgagact 25350 ccgtctcaaa aagaaaaaaa aaagaaagaa atcatttcct acaccttcga 25400 agccttcatg agttagattt tgaaacagtg caaaatgctt cacgtgagaa 25450 tcgagagtcc cttctggtgg ctctccatcc cctgctcttc tgtcaggttt 25500 tcttgtaggt ttatggaaac ctttgttact tgtgcaggtg gcagagaagc 25550 agagaggata gctgcgcgcc acccacacag ctaggattta ttggcgtact 25600 cccacgtgca tggcagccaa gtggacacaa ctctgtgatg aatcctccca 25650 agagaactga ggggccctga tggaggagct gcttctttgc aaagctttcc 25700 ttgactctct tcctgtcccc tagttgattc cccttctgtg ctagttttag 25750 cttattgttt gttacctgtc acacttagca gtactgttgg ctttgctggt 25800 ctccttgact actgggggta aagacctttt gttgttgttg ttgagacaga 25850 gtcttgctct gtcgcccagg ctggagtgca atggcgtgat ttcggctcac 25900 tgcaaccttc acctcccagg ttcaagagat tctcctgcct cagcctccta 25950 agtagctggg attacagcta caccacaccc ggttaatttt 
tgtattttta 26000 atagagatgg ggtttagtag agatggggtt tcaccatgtt ggccaggctg 26050 gtctcaagcc cctgacctca aggtgacctg cctgtctcag cctcccaaag 26100 tgctgggatt acagacatga gccaccatgc ccagcctcaa agacctcttc 26150 tttacttgct caccctgccg cccactcccc taccaaccct tgcatgccct 26200 ataccacctg gcacatgata catactaact gggtacatgt ttgaatatga 26250 atggatgtgg tgctgtgaat gcttagggga agtgggtgaa atgcttaaga 26300 accaaccttg agtggtctgg gaaggcttcc tgggagggtg gtgtttgagc 26350 taaggccagg cagctgttag atttgttaga ctgaagccct tgcagactta 26400 gagagcttgt gctcttccca gaatgacggg tgagccacgt acagtaaatg 26450 gtgcttctca tttctagccc aagggggcctc aagggggcacc gtgatttcac 26500 gagaatgctg caagcaaatc ttttctcaag ctggggaatt tggtggtaat 26550 gcctggctca gcttgcggtg cgcacctggc ctttggaaga ttggtacaga 26600 gagaagcggc ccatccacat gagcctgtgg aacagcactg gtgggggagc 26650 tgatttgtga agaggggctg tgcagtgtac tgtcaggtct gagacccagg 26700 aagaaattcc agtatcccag ctctcagaat cacagagttc taggcactgc 26750 ctagttccac gtgttcccaa atgtttcctg aatacttgga tttcctgtcc 26800 agagaatttt caaaacaaac ttagaggcct gacccatggc tgccaaggaa 26850 ggattttttt tttaaattaa attttaaaaa tcagtccagc atgaaaatct 26900 atgatgattt catagagaa aggacatttt aatattcaaa gagtaagaag 26950 cacttaatct tggaagaaag ggcattccta tactttgatt acctttagtt 27000 taattaaaaa acacctacat ggtctttact tctgtgattt cattcctggg 27050 ctagtgaaac attgtcacaa taaagcatca ggccaacgct tctttcgacc 27100 cactggccaa tcagttgaca aacagtgact agatgtttca gcctattttg 27150 ctgaggctaa aggattgaac tagtgcttca gccagcatga aaaccagtca 27200 ggagtccgtg ctggtgttgg cttagattag cagggccttt gatgggagggg 27250 catgtatgtg tttgggtttg ctgtgccagg caggggagca gtggaatttg 27300 tctgaattga gctcacacat tgaagttatt gagcgactta catgcaaggc 27350 catgacctgg actcccagcc gagaggccca cgtggcgggg cttgagctgg 27400 gggagccgag gacagcttac atctgctcat ctgcttacgt aaccctgcct 27450 cccagcttcc agagccaaga aaaacacacaa gccagcccag cggggccgag 27500 agcctgtggt agcacacgcc atgcgccgca cagcaagggc gccttggctc 27550 ggcttgaggc ctgtcatgaa gccctcagcc ctctgcctcc tcccagagct 27600 tctccccacc accccaggca 
gtggctctga aacctggtcg caggtctgca 27650 tgattctgaa cagaggtagt cgttgccttc ctggagtctg agctctctgg 27700 agtttctcac tgggacagag ccaggtgtgt agcagagcat ggtccctgca 27750 gtatggcagg aggtgtgcag ggcattcagg aggcctcctg gctggcactc 27800 gacccaatta gtcattcaac gccaggtctg gggctgctgt ctgttgtctc 27850 aaaggtgtga gctgcaagat ccttagagtt gtggagaaaa aattgccaga 27900 ttggcaagaa gggcaggatt gggggtcaag gtgtctcagt gtgttggaag 27950 catgatgggg gttgtgcaag gggcacagcg agttcagaag ggagcaggag 28000 agtgagaaga ggctgttcag tgataaagct ctgcacagag ccattggagg 28050 agcaagctcc ttgaccatcc ttaaaccagg gtaattttca tttaggttct 28100 gccacacgct cagcagggaa ctcctggaag gcaggatttg tcttgtccat 28150 cctccctccc tacctcaacc cactcctcct tgggctggca cacagtaggt 28200 acccaaag tatcaattga aacaaattga aagtggtctt gatacatatc 28250 acagggcaag tttgcagtta acagacattt cagagtaaag actctctggc 28300 ttggtgctcg atcggcttct gtgggttgtc agcatgctgt ggacagcccc 28350 ggcatgggag cgagtgggcg tgtgtgtgtg tgtatgtgag ggtgagagag 28400 cgttagtgtg tgtgttgggg ttggggagag aggagggggga atagaagatg 28450 gaccacccgg gtatcagctt ctgccctggg gagatggtgg tgtcagttgc 28500 tgagggaatc ctgagaagca ggtctggctg taggtggtga tggtggtggg 28550 gttgcatgag aatccatttg gggcaggttg aatttgaggt gcccatgaca 28600 tatggctagc catgttctgt tggctgtgag gtcaggagag agacatgaga 28650 tggaaacaga ggtttgggaa ctgtcatgtg cttaaaccaa agacctgggt 28700 atagggagag tgagaagaga agggggcaaa gatggacatc caagaaagaa 28750 gctgagaaag cctaggaatt tgaggtaaga ggagacgtag gtaaatgtga 28800 cgcttggtga tcaaggcttc tttccacctc tcctatgctg gacactcacg 28850 tctcctgtct gcttggaaat tcatgctgag ggcagggaag gtgggagcaa 28900 ggatttgtct aaagatcttg ctttggatcc ctgcactcct cctggtttac 28950 caagtgtcac tggacacgtc agggcgttct gagaccttag agagcatcca 29000 gtcctgtccc tgcagtttac aaatgaggaa accagtaccc tgagagtggc 29050 tgtactatcc actctcagga taccaaagat catctggaaa gtcactggtg 29100 gagctggacc ggggcccagg catctcttct cctgtccggg gctcttgact 29150 tcaggaccac ctttctgaaa cccatgatgg ggcaacacca ggacactttc 29200 cagcctgcag gtgtctgtcc cgcggaagcg agccaggcca catgtgaatt 29250 
cctgttttct gggtgggttt cagaaggtac gagcaagtcg gcagggtgac 29300 agcccaggtg cttcttgggt tccccaaac gcggttatgt ttagcagcat 29350 cctcagaacc aaaggtgggg tgggggctgc agatgttgtg ggggccctct 29400 gaagtgaaaa gagccctgtg acagatcttt tcttcatgtt tttcacaagt 29450 tcactgtgca gcagggcccc cccagtagcc tttgcccagg gttgggtgtt 29500 gggcagccca ggcctggctg accttgtggg gaagggtgtg aatggtggga 29550 atccccgagg gccctctttg cccgaaagcc ctaagccttg acatcagatg 29600 cccatcagat ggtccatcgg agccctacta cccagcttgc ccagtgagaa 29650 tcatctgggc tccttgttag gtagccattt aggtccttcc caaaatccac 29700 agactctcta agggaagggc ccgagatgct gtacttgtac taacttcctc 29750 aagcaattct tgtgataggt ttgggaaaaa cttgtccagg gtgaccactg 29800 actgagtcct ggtcttctct gaagagcaca gtgcctgctc actttagggc 29850 accctgggag gtgggagctg gctcagcagg cagtcttata agggactgag 29900 cttcaaggcc tctgtccctc caggagggag gtgcatgacc agagagggag 29950 gcctgaggat cttcttccct gccccagagg gtctgctgcc tgagctctgt 30000 gatagcgcag agagtaaaag gatcaagctt gattgaggcc tatctctcaa 30050 tgcgaaagtt tgctagttaa gaggagagtg ggaagggcat ttctggcaaa 30100 gagaaaagtg tggacaggca tggcttaagg gatggggagg gagacagaca 30150 gagctgaggg tgaagggcct tttgctcagc tgtgggcctt ggccttccct 30200 tgtgcaggga cacacagcct tagagccact ggaggtttta gtgggaaagt 30250 aatatggtcg gggctgtatc tcagaagaaa acaaactaat gggaacaggt 30300 cctgtgatgg tggacctggg tcagctacgg agggagggaa gatgtgagat 30350 gtgtactggg gaagggggtg gaagtggcag ctatctggtg agaggaagca 30400 ggcccacagc tttttttctc aagctgttga attcagaagg gcgagtgatt 30450 ccgggagtag ggggtgcttg gagagccacg cgttatgat aaacagggca 30500 ggctgaagcc tgctcactgg ccctgggcgg gttctcacca gcatgtttca 30550 ggttttgatc tgtgcttgtg gttggtgttc ctacctgttc tctaggttcc 30600 ttcctttgtt cttgtggctc atttgcttca caggtgaagc tggttacact 30650 agagtaacag ttcccaaagt gtgttccctg gaaaaaatggt tctgtagcca 30700 aataagcttg ggaaatggtg ggttaaatat aacgaagggg gtttttcgac 30750 tgcacaactt ctcagagcct ttggtgtgtg tcgtgacttt gcagaagcag 30800 gatttaatac gcagcattcc cgttcttatt tgaccacgag acatgttttt 30850 ccattaagca tcttgctggg tctgatgttt tctggaaccc 
attttgaggc 30900 ggtctggtct gcagagagta tggggagcct gggttcaagc cttggctctt 30950 gactctcagc agagccttga ttccctgtgt tgcctggact gcaccacgtg 31000 taccacatac ccggtatgtg acgttttcct catccctctt cccacctgcc 31050 gttacctcac aatccacaat ctgcacctca tccatttttc ttctgaggca 31100 agcactctct tactaactta cttatctcat ctgcatccat gttcttctag 31150 gccagaaact tggggagtcat ccctccctct ttgttacttc ttcttcctct 31200 ttgttacttt atcccctctg ttactaaaaca ttcttctgtg tttccagcta 31250 tttcttttat tttccctcgg tctcctttgg ggtttctttg cctccatctc 31300 tcccagacct tggttcacct tccatcgagt cccttcctgg gacatgggca 31350 ctcatgccac tcctgctacc ttccacttcg aagctaactc cctccacact 31400 gacgtcccca acatgcatgc atacacacac acacacacac acacacatac 31450 acacacacac acacacactt ccccagttag gctagaatca gagagatgat 31500 gtcagccatt tgtccaaggc cacgcagctg ggaggtcaca gagctaagtc 31550 tcaacctcag gggttttgag aaattgcctt ctcatccgtg atcactgatt 31600 tctacaacag cctgtcagga agtctgggta gaaattactt ccattttaca 31650 gtggagtcag agcggggagg gtcctgggca ggcgagtgct tcacagagtg 31700 accaaccatc taggtttgcc ccacactgaa gggggtttct ggggatggtt 31750 ggtcacccta atgctggatg tggtgcctga tgctgggcag gagggccctc 31800 tccgtggcca cgttgcctcc caggaggaga catttcctct gcagctgcag 31850 ctgcagcctg gccatctgat gcagcctgtg gagcggtggc gagtcctgtg 31900 gcctgctaac ttctccctcc ctccacctct ctagtgggcc ccatgctgat 31950 tgagtttaac atgcctgtgg acctggagct cgtggcaaag cagaacccaa 32000 atgtgaagat gggcggccgc tatgccccca gggactgcgt ctctcctcac 32050 aaggtggcca tcatcattcc attccgcaac cggcaggagc acctcaagta 32100 ctggctatat tatttgcacc cagtcctgca gcgccagcag ctggactatg 32150 gcatctatgt tatcaaccag gtgaggcctg ggaaggtgga atgagagagg 32200 gtgtgtgtgc atgcagatgt gtatcagatg tgtgtgtaat gagggcaggg 32250 gaagggggagt gatttcacag acacctggca cttacagcga ggaaccagcc 32300 ccccagccac caccagtgca gatgaggtaa acgccaaaaca gtgtgcttgc 32350 ctattgctgt caactctata gccaagggaa atgctggagt gttttcgttg 32400 ttctgttttt gttttctgga agtagccttc cagcaagatt gggaaaaaag 32450 acaaccctaa ttattccaaa gtacacactg attattccct ggctttgtgt 32500 agctgtgtat tttcctttta 
aaaataaaac caccatttag atgtcagact 32550 tttaggtaac ttcaaagttt atccagtcag tcagagcgtg tctcctgggg 32600 cacctggaga cagtgccctt agttcaggtc acatgcctac atgccagccc 32650 ctggtgaaat atctggagaa gtctgattcg tgggccatct gagagttatg 32700 tggactgggc cgagtctgag aaaaagtttc tcactgctcg tctgatccat 32750 atgtgttggg ctttagccct gcttaggaaa gtaatgctaa ggataggtca 32800 actttcatca ccatggcatg gagaatcaga ttgatctaag aggcatcttt 32850 attgaaataa atttttcagt ttatttgagg agcattattt tcccaagagt 32900 ataactttga tatttcaaga ttacccctaa cacttaaatt catgttttta 32950 gactataacc tcctaggtgc aatgacacat ctaacttatc taagcaccca 33000 gtttcattga aattcatttg aagagtctga gtacgcccat ttctacaagg 33050 cccaatgtcc atttcatttc gagataaact ctgctttagg taggaggatt 33100 gttggcagtt tacggcttcc atcaaggtca aggaactctg tgcaccttcc 33150 ctatgacccc aggggaagca ctcgaggact gctgtggcat tgtgctgcat 33200 cacttgctgc agggagattc tgaagaagtg taaggtctca gtcctgccct 33250 gtcccgaagc ctccaaccca cttctggcaa gtgggacctt cccagggaac 33300 aatttgttaa cagacccaaa tatcctgtga ttggatggtg gctgccaaat 33350 gctttggaag ctcagaggaa ggagagagag caatggcttg gaagaaccag 33400 gatataaact aggttctaaa gtctgcaggg agatgggctt ctcagctggg 33450 gccagtgagc agggacctta aggcagaaag gagccttgca tgttcctgga 33500 aattgagatg cccactgggg taggaaagca ccagaagctc tgggaccagg 33550 tgtcagagtt aagcctgtga ggcaggagag agcagaacaa gccctgttac 33600 aaggaaactg aagcaggaga gcaggtggtg ggcaaacccc ttgaggctgt 33650 ttgaattctt cggccaagtg aggtacagac cagggcccta tgaacacctg 33700 caagcaagac agccacgcag ttgtgggtca ccttggaaga atattggaga 33750 atgcaagaga gaacaggtaa atgtcctgca aaatgcgggt cactttaacc 33800 caacacatat tcatttaaga aaagctctgt gattgagaaa catttgtctg 33850 atgccagtta gcacatacca atgacggcaa gattcaggag cctgttatta 33900 aagcagtggc agcgagcacc tggaagaggc ggccaccatc accaggagcc 33950 agcagggatg actaataagc cgtgccagct gcatctcgtt tctctcttga 34000 cagttgctat gccagtagat gagggatgta ctgtggatac aatgctgtca 34050 tatcttattc agcagggcat ctgatagcat ccccaaaatc tgcctgagta 34100 gaagacagac agctgtggtc tgggtgccat ataggtaggt taaaatatat 34150 
atttgggcct aggcgcagtg gctcatgcct gtaatcccag cactttggga 34200 ggccaaggca ggcggatcac ttgaagtcag gagttcaaga ccagcctggc 34250 caacatggcg aaaccccgtc tctactaaaa atacaaaaat tagctggaca 34300 tagtggtggg cggctgtaat cccagctact cgggaggctg aggcaggaga 34350 atctcttgaa cccagggaggc agaggttgca gtgagccgag atcatgccac 34400 tgcactccag cctgggcaac agagtgagac tctgtctcaa aaaaataaaa 34450 taaataaata aataaataaa atatatactt gggtaaagag gataaaagag 34500 ttagcgatga tgctgaattt ttgaactgag gtggctgttt tcaaggaaga 34550 ctggagggtg ggatgctacg tctagatatg ttgcagttta ggtgaatgtg 34600 agacttccct gttttgaagt caaatattgg accagtaaaa tctagccatc 34650 agcttaaatt cctatgatac aatttacata ctccccaggc tcaacacagt 34700 agatttctga atgtcctctg ccagctacat gctcctgccc acctcaatcc 34750 gagtagatgg aacaactaac caagccagct cagaccggtg gcacagctgt 34800 gctggctaac actgggcacc acctaagaga gtgcttctcc aaaagtgtgc 34850 ttccccaaat ggagcgaaat acgcttgagg aatgttgggt tgaaccatgt 34900 aaagcaggtc tcattcccgc agagcctttg gtaccccggt gtacactgta 34950 accccagaag tgtttcctga gcttgcctga cgagacaact tttccaagaa 35000 ccgtctcaag tgatgagtgt tttgtgagtc acactttggg gaaagcgggc 35050 ctaagttagc atctcctccc agctgcctcc ctgctttccc tggaacacta 35100 ggaactgccc gtcctccctc cctccctcct cttcccactt cacaacttag 35150 catcaggaat attttagttt tggtttttca aacatatata cctccttttt 35200 tcttatcttg tcaatatcat ctttttttt tctttgcttt tcctcatact 35250 tttttttctc ttcatccttt ccttctccaa gggttaactt tccaccttag 35300 gagaatcttt tctgcttttt ctcccacttc cccagctact ctcttatcat 35350 ctgctccaat ctcaccctaa ttgatcattt tgggaaaata tggtcagagt 35400 ccagataact aagttgagaa atgcttaaac tctgccatac ctttccagta 35450 aagaatatta cctaataaat aataaaatgg taatgggaaa cctgaaccct 35500 gaaaaaaaag aggtggaagg agaaacatt ggagcacatc ctgtctacaa 35550 attaggaact gcctgtgtta tctgttttat ggttatattc tagaagaaga 35600 aagggatttt gtagcacctg gttttgacct ttctgcactg tttgttgagc 35650 aaataaacct tatgggctgt tagccctctt tatagcctct cagcttatcc 35700 ctggcccaga caccctgctg tcattttgac ttttcattcc cacacacaca 35750 tacacatgca cacacatgta cacacacaca cataccattt 
aagattagac 35800 agaagtaatg ctcaaaatgg agtggcttct gagacattta gtccaagggt 35850 tcccaaacag gcttttcagt atcagatttc tttctgcccc attgaaatgc 35900 tacacaacct tccgcttaca gcaggtcaca agggtttcat tctacttgaa 35950 gtaggggcca tgtcccattt ccacttcctt ggcttcccat tcagtcactg 36000 ctaggatttg cctagacccc tgaggccaga caatgtagaa acttctgctc 36050 catgtcacag gtgaggaaac aggctcagag agggacaggc tccgaaagtc 36100 acatagacaa cagtagggct gcggctcaaa ccccagcgtc tgactccagg 36150 tttagtgcct tctcagggca tcagtgacac tcctcatggc cagggtgccc 36200 ccagtgttgc tcacagtctg gtatccaggg ctgagagtgt gctgtgtgct 36250 cagactgcct gggttcagtc ctggcactgc cactttacag tcagtgacct 36300 caggcaggtt acttaagctc tgcaggcctc agtttcctcc ttggtgggga 36350 gggttatgag gcatccttct catggtaaac cttcagtaaa taccagccgt 36400 tactaggagg gtccactcct gcctctccac tctccattca tcctgcctgt 36450 ttcctctgcc tgcttcctct gcctgcttct gtggtggtga attcttcatg 36500 gctcccaccg cctcctgctg cacccccact cagggcccgc atcaggaccc 36550 ttcctcctat tggtttgaac tccttggagt cagagggtaa tggatagtgg 36600 agtgagccag gtggcagaat ctcagaggcc atcccgggcc tataagcctc 36650 ttcaaaatag ggccacgtat caagctttac acacaggagt gaactttcac 36700 aagttgttat gactcatact ctgtctatag taagctgtta accactccca 36750 tttggcttat gcctctgtaa ttattgtact aacttatatc ttaaaataag 36800 gatattgaag gaatgagccg ggagaggctt tcctggttga gatatagaag 36850 aacaagagtt gctctttttc cttaaggtct ctcctcccac ccctgacctt 36900 agctcaccag catgggagaa tactatttga ctccttgtac tctgagacgt 36950 ggatttcaag atatagcatt ccaacttcaa cggcagcaag aaaagaagca 37000 acagaaggag aagacatcat agcaaacagg gatgcatgct gcatttccta 37050 atactcaaac ccggaaacga gacttcactc aaggtgaagg gagggcaggt 37100 caccacctgg tagcactagc cctaaattaa ggaatgcaga atgtttgtgg 37150 gattgcccat cataaaaatt acaaaatgag taaggaatgc aggcacagct 37200 ggccaggtgg gtttgtcaca accatggcag ccctttgccc cacagccagt 37250 acacagaact ggtctctcca attccgattg catatcttct ggcacctctg 37300 ttcctctccc tcagctgccc aggatttttc tggttctgac catgttactt 37350 cctcttttaa acctgttagc atttcacgac tgcctacagg caacggtcta 37400 aatggtcgga aggcccaagc 
ttagcatccg agaccctgac ctacctccag 37450 ccacttcctc ctcctctcca cttcactgga ctccccatct ccacccagac 37500 acctctgttc tcccctctgt gtgcctttgc ttatgctgtc ccctgtgttc 37550 ctagtgtgtc tctggctatc ttttaagctt ccctccccaa cctcattagt 37600 tctgtggagc ccctggaata gagctgactt ctccttccct gctgctccca 37650 ggctgctcag aactttctgg aaaggggatga ttatctgagt tccagcctca 37700 ccccagcccc cggactctga gtccctcatg tctgcctccc ttctttctct 37750 ctgaccacac agctggtaca tagtcagtac agacgcagtc agtgagtgga 37800 gcacggggct tctctccagg attcctgccc ctttgtttat ccctagtctc 37850 aggactccct actcctggtc ttctgcctaa atctgtgcct cttggaagtg 37900 aagcctccgt tcccagtggg gccaggtcct gacccttggg aacttgcagg 37950 atccctccct tgggcctctc cccgaagctt ccagctcaat gctgaccaga 38000 gcacaggctg cctgtgacag tccttggggt gacctccctt atcaggaaaa 38050 atgcagaaaa cctattaata ccttagcctt gtgattgtta atggtcacaa 38100 aactccttta gggtcctttg gactcagcac ctttatggtc tcactttgaa 38150 ttttgaacct cccacctccc cccatccccc agagtaaggc aaatggtctt 38200 ctgattgttc ctgcagaggg aaggctccac aggtaagcac acgatggcca 38250 ggaagcagag ctggagcctg cctgaaaggc tgtggagaaa tggaggggagg 38300 gctgccctga ggactctgtc tggctttgaa gttttctact gtttcctttt 38350 cttctgtgca ctgttttagg atgatggggt gatagttcca ggctggttga 38400 ggatggattt ggagacagtc ctttgtaccc tcagtgagca agagtatctg 38450 tcaccctacc tcagcagttg tctctgtcac tggtccaagc agctggttcc 38500 tacacaaggt caagatcaac tggggagaag cagactcctg ggtctatccc 38550 attagtgagg acagctgcct gggcttatgg cctcattggt ttggtttcta 38600 tcttgatcat ctctaccatc cccccatccc ggccttccat tttctacctc 38650 agctgtcagt gcacagattg atgtgtgtgg gaacggagct tgggaggagt 38700 ggggtagggc tggtcctgtc ctgtagcctc cccttccttc gggcacttgg 38750 accctttgga gcttgccggg gtggggaatg ggagtgggaa ggccagggag 38800 tgtctctgca ccatcactgt ttgagtgttg cccctttgct gtgtgcccca 38850 cctagtctat gtgtgtctct gttctctggg gactcaattt gctggtgaat 38900 tgcttccatg gacattgttc tgggaaatgc cattttttct gctcacccat 38950 gactctgtga caaggaatga cagcttatta ggaatttgtt tttgcattgg 39000 aacagtggtc atcagaatgg gccccttttc ccttgcagct ttgacatttg 39050 
cctctctttt cctcacctct ctcccttgca tccacccttt tctctttttc 39100 ttcttttttg ttttccttct agcaggggcc ttttaccttt acttgttaat 39150 cctgtttgta gcaaagcaag tggaagggagg agttcctctc tgatctgctt 39200 cttatctcc acctaccttc tcttctgtac tttccgcctc ctagagagag 39250 agagagagag aggaatgccg acctaactac cgctgccact gctgctgcca 39300 ccaccgctgc caccaaccacc ctggtaatgt tcacatgtcc tcaaatcaac 39350 ccagagccag ggccctgctg gtcaggggga ggctatgtaa ataatcccat 39400 gagtgtgcca tcctcaggcc ctggggtctc ctaggcaaga ccagggcctc 39450 tgtgggctct ctcggaaatg ctgaggttgc tggaagccag cccgtcatac 39500 agggtctgag agtttaactt cttttaaatt aaaccacagt tgagctcatg 39550 ctgtgtgtgt ataaactttt gtatcctgct ttttccttaa attctttatc 39600 atcagcatct tcccatgtta tttcatagtc ttcatcatca tcactttcca 39650 taccttcata gtagttgatc gtagaattcc atcataatta acttgtcttt 39700 tctctcttag aagtccctta ggtaatgtcc aattttccgt gagtgtaagt 39750 aataccataa tgaacatctt ggagtctgaa gtttatctg tgttggtttg 39800 ttccacatt aggatcattt tcccaggcta gattttcaga tgtgggatta 39850 tgggttcaga tatggtttac acatttttat agttcttaat acagatggcc 39900 aaattgcttt ctgaaagaga agcttttctt aagtattttt ctccaacttg 39950 tatcttaaac atcctgaaca tgcttagcac cactgtcttg atatatctgc 40000 ggaaagccac gtctccactt ttcagtgtgt cgggccctgg gagaggcagg 40050 catcctgcgc tggctccttg gagctgggtt taaaattgtc tcctctggct 40100 gggcgtggtg gctcacacct gtaatcccag tactttggga ggccgaggtg 40150 ggcggatcac taggtcagga gatcgagacc atcctggcta acatggtgaa 40200 accccgtctc tactaaaaat acaaaaaatt agccgggcgt ggtggcgggc 40250 acttgaaaag tcccagctac tcgggaggct gaggcaggag aatgatatga 40300 acccgggagg cggagcttgc agtgagccga gatcgcgcca ctgcactcca 40350 gcctgggcga cagagtgaga ctccatttta aaaaaacaaa caaacaaaac 40400 aaaaaaaacaa acaaaacaaaa actgtctctt ctgtgctcac ttcacccaga 40450 atccctgttg ggctcttcaa ggagctcagt tctctctgaa agcaacttta 40500 tagcctcagt ccagtctgtg ttcctgtgtg gcaggggtca agggtatgct 40550 cactcttgag agtggtgtct ttggttgacc aagaaccact cccatagcct 40600 ggtccctaac ccttgaaggc ccatctctct cactcactgg ggtgaagagt 40650 ttaaatctca gatccaagtt ttgttgagag ctctgagcta 
ccatattgct 40700 atggttaaca atagttaaca atgttaacaa tggttaacta tggttaacaa 40750 tagttaacaa tgtttaacaa ctagagccca gctgggtgtg gtggcatgtg 40800 ctaacagtcc cagcttctca agaggctgag gtgagaagat tgctggagtc 40850 caggagctca aggccagcct gggcaacatg gcgagaccct gtctcccctg 40900 caaaaaaaca acaacaaacaa aagcaaaact agagcccaac tgctgtgaac 40950 tcatggctga gtagatatta ttagccctcc acaaactcag catttgtata 41000 atcccaggct gtttccagta attctctggg gatcatctcc cagcctgtcc 41050 actgttccag gatccacact taggcctata ggaatgcccc gtcagagctt 41100 ctgctgccgc tgatctgtta ctgtttcatg caacccactc ggcctagttc 41150 cttcctctta ctgtctcagt gggcacagaa aagcatacag agggtgtttc 41200 agcaaacatt gccactggct gcagacctgc ccccggatct gtcctgttga 41250 gagcttagtg ctgcgttctt gcatggtggg gaggggtgtg gctctgtgat 41300 gagccagggc atgtgtatag gagcaacagt gtctctctta tcacgtagaa 41350 gttctgactc attgcgagtc ttggctttgg gttaatggtt ccagccatgt 41400 tgctgctgtg tcttttggtg caggagaggc tgggcacagt tggtccctaa 41450 gccattatgg ataagggatg tgtctgctga tatacacaca tggacctgac 41500 atccagggaa ggcagggtga ttggacagaa cagttcttcc agaagctgtt 41550 ggaacttgga caagagtggc ccttggcttt ctgtagttgg tcatctgtcc 41600 cctgttgcaa tcaggggaag gccacacttg ccttccttaa ccacagttag 41650 gattttcttg gggattagac cagattctag cacctgtcct gaacctctcg 41700 ccccgcccct acaaaggctg cttgcaagtg tagtgcacat acacagggag 41750 caggtggggc atggaagtgg aagtggagcc cctgcctttg gcccttgggg 41800 gaggcactgt ctgcttaccc acggttgttg cctcatagga atcatacaac 41850 agcttcctaa ctggtctcct tgccttcagt tggattgggg cacaaatccc 41900 tccttgacat ataaaccatg gtttaaggct ccctgtggcc taaataaaga 41950 taaagcttaa gtatcttaac aagcacctaa cccttctccc cagcctcggt 42000 gatttggctc atcgctgcct tcatgtttca ttctggcttc actcattcgg 42050 aatttcttgt agttccttgg ctgttctctt ttccttaccg cctttacaaa 42100 tgctctcacc atgcatgctt ttctctgctc ctacagatgc cttctctccc 42150 agcaccgcct ccagagtcta tgtctggtcg attctgtctg ctgtctccag 42200 tccccatctt gtggcagtct ctgctcaatc atttggggat tttatatgtt 42250 ttctggcctt tcttttgggg gcctgtcttc tccttctaaa agcagccagt 42300 tgacctagaa ggaagggata 
actgtaactc ttgtctacca acataagatt 42350 aggcccacc tttaaaagct gcgtctttga aagggacacc tgcacccagc 42400 atgctggctt ctcttcacca agcgtgactt cctacgcatt tcacaggcct 42450 ccagaggtcc ccctgactct cttctgctgt gagaaactct aatcatgtaa 42500 gccacaggct aattcccttg agccttaaat gtttttagta atttcccatt 42550 catcagagaa gcaggatttg ggaggaattt tgaagcaaac actacagaag 42600 gcagagtctc caggtaggat atctaagaga catttggaat ggtctgactg 42650 ttcaagatgg atgggaaagc ctcttcctgt aatgatagta gccaaacattt 42700 gttgtcaggc agtggggccc catttttgag atggggtctc tgtcacccag 42750 gttggagtgc ggtggtgctg tcatggctca ctgcaacctc agcctccccg 42800 ggctgggtct tcttaattct gaaaaaccca gcttttaaag ggtggaccta 42850 atcttatgtt ggtagacaat gttgtctcat ttaatacaat gcacatgctc 42900 tccccataac acaaaagagg gaactgaggc ctggaggtgt gatgtacccc 42950 aagtcacata gctaataaat aaagaagcca gcattcctgg gattaaaaat 43000 gcatgtgtct gtcactgtgg tgtatttggt gcttgatcaa tgtttacttg 43050 agcaaatgga ggggcagagg taccgatgag tgtgctcagt gaggagggca 43100 ggagtgaagc tgggcgtctt cccgcctctt gtgagtggtg gggcttggtg 43150 agcttgccag ggcctgtctt tcttatcaaa gaaggtgtgt gccccagtgt 43200 tacagcattt cacccaaagc agcctagaaa atgcttgact tttctgtcat 43250 tccgggggagg acactttcct cctccactgt tctgctggcc tggtgtaccc 43300 acggcccctg atagatgata gcacctgcta aagtgcacca tgcccttccg 43350 tctcactgca tcccacagat gaggccaggc tgggatgagg gagaaaggga 43400 gggatatata gttcaggtta ttttggaaaa ctgcctgacc aattttaagt 43450 ctgggccgga cactggggca tctcaccacg ttgaaagggc cgtggcaccc 43500 cgggcggtga aaggggctgg aaccaggtct gcttcttggg cttctcctcc 43550 agggtgccat tgctcatggg ccttggctgc agaggtgctc attcgtggtt 43600 ccaaaattcc aattcctggg agaggaaaaaa tgcttagttc agtctcagtt 43650 aggcctctgc ttagatcaaa cagccaaggc cagtaggccc agtcctatgg 43700 tagagacatg gcctcaaaga gccctctgct gcagttgttg gggagtgtac 43750 caagagaagg gagcattgtc ctgggctggg cagccctggg ggtctagtgc 43800 atagatgtag aaaggctctg ttggtatacc tccctttgct tgttggaaag 43850 tgctcaacgg ggctgaattg tgtttgacag tgtaagtctg ggctggggtg 43900 agggttgtta caagatgtc aagatgatta aatgaaatgc catttgaaac 43950 
acttatccat gccttgtgta tggtatcccc accagtgaat attcacagta 44000 tattataata attccaaacaa cttcataatt ttcatatgca atttctaaac 44050 tttgaacttt tttttttttt tttttttttt tgagacagtg tctcgctctg 44100 ttgcccaggc tggagtgcag tggcgcaatc ttggctcact gcaacctcca 44150 cctcccggct tcaagtgatt ctcctgcctc agcctcctga gtagctagga 44200 atccaggcgc ccgccaccac acccagctaa tttttgtatt tttagtagag 44250 acgggctttc gccatgttgg ccaggctggt ctcaaactcc tgacctgagg 44300 tgatccaccg ccttggcctt ccaaagtgct aggattacat acgtgagcca 44350 ctgtgcccgg caattttttg tgtttttagt agagatgggg tttcaccatg 44400 ttggccaggc tggtctcgaa ctcctgacct caagtgatct gcccgcctca 44450 gcctccctaa tgctgggatt acaggtgtga gccaccacgc ccagcctaaa 44500 ctttgaattt ctttgaaccc atgacttaca cagaattagc tgaacgcaga 44550 attccaaatc aactcagcct gtgggacagc caaaaaacac agtgtgcctt 44600 tgggctcctt cactcaccac gcggggttag aaaactttgt cagaggcttt 44650 aaaaaaggag ctcttgtgtg taaaatgttt ccttgattct ctttctggtg 44700 cctctctttc tctaagtggt ttgcttcccc aagttccccca cctgagtctg 44750 ggtggctgtg gcacatctgt gcattctgta cgcacacagg cagccttttg 44800 gagtgccagt ttccaggtct tggttttat tatttattta tttattttt 44850 tgagatgggg gtctcactct gccgcccagg ctggagtgca gtggtgccgt 44900 catggctcac tgcaacctca acctccctgg gatcagttga gcctcctacc 44950 tcagcctcca gagtactagg gaccaccatg cctggcaaat ttttgtaatt 45000 ttttgtagag gcagagtctc accatgttgc tcaggctggt ctcgagctcc 45050 tagactcaag tgatctgccc accttggcct cccaagtgtt aggattacaa 45100 gtgtgagcca ccatgcccag cccaggtcat cttttgaggg catggagaga 45150 agactttgag catcccactt ttgagattgt gtaccagtcg caagccccta 45200 tgacacactt tttccccaaa gtagagggct ctgactatgt tgatcccaag 45250 agagatggga aagagcattg aatgaggatt ccaaagtatt gggccttagt 45300 tcgtttcctc atgttggtgt tgtgaagatt ctggttagga taacagcatg 45350 tgtgcaggag gctttgtgaa ctgctgagag tgaggcgtgg caatgtcagt 45400 gctaggtttg tccttactaa cctggggcca tgggaattga taagaccaga 45450 ttcccaactc taccccacaa tgtgatccct gtggtgaccc ctcacagggc 45500 tctttggtcg agcttcccag aagggatcac catctgccat tgtatgttga 45550 accccattca ttcattcatt cattcagcca accagcaact 
atttgttgag 45600 ctcttattgt gtgagaagca gtcttcaagg aactgggtga ataaaaaaaa 45650 caaaacatcc taaccttcat tgagcttaca ttcttactga aagaaaacaa 45700 ataaaacata catgtaatcc tagcactttg ggaggccaag gcaggcggat 45750 cacttgaggt caggaatttg aaaccagcct ggccaacgtg aaacccatct 45800 ctactgaaaa ttaaaaaaaa aaaaaaaaaaa aagccgggca tggtggcaca 45850 tgcctgtaat cccagctact cgcgaggcta aggcaggaga atcgcttgaa 45900 tcctggaggc agaggttgca gtgagccaag atcataccat tatactccag 45950 cctcagtgat gaagcaagac tccatctcaa aaataaaaaa taaaaataaa 46000 aatatgcatt ccctttgcac cagcacactt ggtgcctggg gacctcgtgg 46050 ttggcaccct gaagcaggtg tccctcttct gtcttgcaca ccttgcttct 46100 gtcctggtgt gtatggcatg gccttctgcc ctccatggtg agcactgtga 46150 gggcagaggt tgagttgggt ttgctgtatt tctcaggtgc ctaggtttgt 46200 gcttgacagg tagatggaag gcacacaatg tggtcatcaa acctcagtca 46250 accatataag gaaggtagaa gtgaaaagtc ccataggtac ccaactaatg 46300 tcaccagttt cctggatacc tttcctggag tttatttata gtgtgtataa 46350 ataaatgatg tatgtgttta aatgcctttt tcacctttcc ttttagagct 46400 gcctcttttt aacagttcca ttccattgta tggatgtact atgatttatt 46450 gaaccagttc cctactgatt attctgtttt ttgcagtctt ttgttatgat 46500 gaacattcca cagtgacaat gttgttcata gtcattcaca cacatgcaag 46550 tccttctgca ggatatattt ctagagggga attgctgact cagaggtttt 46600 ggtactctgt gttgattgta gagtgacggc agaaaagtga ggcccaagag 46650 tttcctagtg accatgtgta gtggacaagt caccagtccc tgtgagtgtt 46700 tggcccaaag gctttaaggc atttgatatc actgtttttg tttctgcacc 46750 aggcgggaga cactatattc aatcgtgcta agctcctcaa tgttggcttt 46800 caagaagcct tgaaggacta tgactacacc tgctttgtgt ttagtgacgt 46850 ggacctcatt ccaatgaatg accataatgc gtacaggtgt ttttcacagc 46900 cacggcacat ttccgttgca atggataagt ttggattcag gtaagagata 46950 ctcagtcaga atctgtggta aacatgtctc tctcatgtgt tgactaggaa 47000 atgcagtcct ggcagctcaa gagtgcctct ttaagctctg gagcagaatg 47050 cctcctctga gaaatgggtg ctttgtatta gttgagatgg aaagaagaga 47100 ccagaaatgc ctgtagtctc tgcacatcca gacaaaaaca aattttcccc 47150 cctttttttt ttttgtttgt tttttgagac agggtctggc tctgtcaccc 47200 aggctggagt gcagtgccgt 
gatcttggct caccgcaacc tctgcctccc 47250 gggttcatgc catcctgtca cctcagcctc ctgagtagct gggactacaa 47300 acacttgcca ccatgcgcag ctaatttttg tatattttgt agagatgggg 47350 ttttgctgta ttgcccagtc tggtctcgaa ctcctgagct caagcaatcc 47400 atctgccttg gcctctcgaa gtgctggatt ataggcatgt ggcaccatgc 47450 ctggcctaag aacagttttt agcatttggg aggggctctc atctttaagc 47500 tccaaatgat actgtatttt cttgcttttt tctttctctt gccccacaag 47550 ttttggaaag taaattggaa tagttttccc ccactgaatt atttagcttg 47600 tatacctcag cagatgttcc ttggcctgtt ttgttttgtt tttgagacag 47650 ggtcttgctc tgtcacccag gctggagtgc agtgacacaa tcatggctca 47700 ctgcagcctt gactgcctgg gctcaatcca tcctgcagcc tcagcctcct 47750 gagtagttgg gactacaggc atgagccagc atgtccagct aattttttat 47800 ttttagtgga gatgaggtct ggctatgttg cccaagctgg gcttgaactc 47850 ttgggctcaa gtgatcctct cacctcagcc ttccaaagca ttgggattac 47900 aggtgtgaac cactgctccc gcccttggcc ctataagaag gaatgtgatt 47950 ctgttttcca gcagggcaca aacttctgct taaatacaaa gcccaaattt 48000 ttccaccaaa atgcccctag tgaagtggcc agcccagatg cccgactagc 48050 gtattatcca aagcatattg tcattggtgg aaaatggcct tatagtccat 48100 tgttttgtct taaaagtaaa tatataaata aacttgtata ttgtttccta 48150 attccgtgtt tatattaaca taaaagtgtt ttaaattacc tgtcagtggc 48200 caggtgcagt ggctcgtgcc tgtaatcgca gcactttggg aggccgaggc 48250 gggcagatca cctgaggtca ggagttcgag accagcctga ccagcatggt 48300 gaaaccctgt ctctactaaa aatacaaaaa ttagccaggt gtggtggcag 48350 gtgcctgtaa tcccagctac tcgggaagct gaggcaggag aattgcttga 48400 acccgggagg cagaggttgc agtgagttga gatcgcgcca ttgaacttca 48450 acttgggcaa cagagcaaga ctctgtctca gagaaagaaa aaaaaaaacc 48500 tatcagttga ataacaaaac cctttccttc cttgctttaa gtgaatctga 48550 agatccagga gctgtgctgc aggtaccctc tatgttgggt acccctggtt 48600 taggctgact agtacagtgt ggttggctca tgtagacagc agacccttta 48650 ttttagatac aacttttttt ctttttcttt tatttttttt gagacagagt 48700 cttgcttgtc acccagcctg gagtgcagtg gcgtgatcat ggctcactat 48750 agccttaaac tccctggctc aagtgatcct ctcacctcgg ctttcctagt 48800 agctgggacc acaggtgtgg gccagcaccc ctggctgatt taaaaaaaaa 48850 
aaaatttttt tttttagaga tgtctcacta tgttacccag gctggtcttg 48900 aactcctggg ggctcaagca atcctcctgc tttgacctcc caaagtgctg 48950 ggatgacagg catgaactac tgcacctgct gagatgcaac agctttctgt 49000 cagactcatt ttattctcat catttcttcc tgtcctccct tgctgggagc 49050 atgagagctg tgatgggaat ataggaatgt atgaagtcct tctcccagat 49100 caaaaatcct aacttcttgt cttaaaggga ggaaaatttg aatgtaacct 49150 tacttttaga ctcttcagaa atccttctat acccttccgt ccccgctttc 49200 acccttcctc cctctccgtg tgtgtatctt cttctcttga aacacacagg 49250 tttataccct gacccctctt gattcatccc ttgaagcaca gtggtgaaca 49300 aggaaggggc ccgtgatgcc ctaattcttt gccacagcac catgtttgtt 49350 tcacaaggag cctggcaggt ttgggcttgg ggcagatagg ggagagaaag 49400 cagcagagac agcaaaacca aatcatgtca gcttggcatg tacttccctc 49450 tgaaatagct aagaatccat ttctgtaaaa gcactgatta tcagaaaacc 49500 ttatggcct ggccaccttt ggttcaaacc ctcacattaa taatgtggac 49550 agtagtatga ggtgtgccaa aggtggatga ctcagcacct aagtgatgac 49600 acctaattac gaataggttc attaaagcag accccctggg gacctttgct 49650 tgaggatcct tacagtcaga attcctgaat atatttgaaa ataataattg 49700 catctttat ttcatatgtt ctgtatggtt tggctgactt ccccctcaaa 49750 gtctgagtta gagttttcct taatttatgt gatgggtttg gtctttttgg 49800 attccagaaa gagctgggtg tggtttggag ctgcactcag agtcacacaa 49850 aaccacagcc tttagagaac ccacaggaag gctttggggc acgtcctgat 49900 tcttgacatt tctcatcagt gctgactttg tatcccttag gagttcacaa 49950 ttcataacca ctgaaatatt aaaatacaaa aagttttgga aggatgagag 50000 cccagatgct ctactacttg aaaatatgtt aaaacataag ttcatcatta 50050 tacattttgc taaatcagga taaagtctga agtttcaaag aagttttatt 50100 ttagcaaatt ttcagaaaca ctgcctcaac tgttagggcc agtgttctag 50150 tcagtatgcc tttggaagca tgaaagctgg attggtcgat aggatgggtg 50200 tggaaggggg gctgtgactg ggtgggtaca gagaggctct gaaacaatct 50250 cagattccag gagttcctgg ataaggactt catgtgcggg aacagagcac 50300 aggagaagca gattcctgag ccactcagga agaactgggc ctaggcctgc 50350 tcttgtcact gactggcttt ctacataacc acagaaacag cactgtgttg 50400 tagaaagagg aagatcatac tttttgatat ctgtgtctaa tttaaggtca 50450 tctgagccct gatagaaaag caaaacagac aaaacccttg 
taactgctcc 50500 ctcccacccc acccaccatc aaaaaagctt tagagaggct ggacatggtg 50550 gctcttgcct gtgatcccag cactttggga ggctaaggtg ggtggatcac 50600 ctgaggtcag gagttcgaga ccagcctgac caatatggtg aaaccccatc 50650 tgtactaaaa atacaaaaat tagccaggtg tggtggcaca cgcctgtagt 50700 cccagctact tgggaggctg agacaggaga attacttgaa aacctgggag 50750 gcggaggttg cagtgagccg agatcacgcc attgtactcc agcctgggct 50800 acagagcgag actccttcaa aaaaaaaaaaa aaaaaaagat ccggtttggt 50850 gtcttacaac tgtaatccca gcactttggg aggccgaggc cggtggatca 50900 cgaggttaag agatcaagac catcctgacc aacatggtga aaccctgtct 50950 ctactaaaaa ttagctgggc gtggtggcag gcgcctgtag tcccagctcc 51000 tcaggaggct gaggcagaag aatcgcttga acccgggagg cggaagttgc 51050 agtgagccta gatcgcgccc ctgcactcca gcctggcaac agagcaagac 51100 tacgtctcaa aaaaaaaata aataaaaact ctagagaagc aaaaagaata 51150 actttaaaag tgtttatgtt ctcagcaagc tttatttgg ggatgtcaga 51200 acttaactaa ccactgctcc ttctgtgtgt atgtttttcc tccagcctac 51250 cttatgttca gtattttgga ggtgtctctg ctctaagtaa acaacagttt 51300 ctaaccatca atggatttcc taataattat tggggctggg gaggagaaga 51350 tgatgacatt tttaacaggt aatggtcata acttagatat ctttctcctc 51400 tgtcaacctt cacttccagt tttttaacca atgcttggtt gttcccccaag 51450 gactgaccct cagatgggat gcacccctag tcagcccaca ttcttaggtg 51500 tggcttccta caggtcctgc aggtgctaaa agggatctgt aggaaaaatga 51550 gtttctgaga tttttgtatt ggcctggaaaa aatgtcaaat gggaaccaag 51600 tgacggggca agtttacttt gacttgctgc atgccgtttt gtactcaagg 51650 agtaaaccaa tgtcctttgt aaaaatccct cctttcatta tggtcccctt 51700 tcactgtgaa acaagtttcc ttgagcagaa tcctaactgt cttcacagaa 51750 gctttgtgtt atatttttat tttggagtat tttcacatat acaaaagaga 51800 tactgtagta taataaacct ttgaggacct atccagcccc agcaaccatt 51850 atggcctggt cagttctgtc ccatccacat cctggggctc tttttaagct 51900 ggtaaatcat tatgatgtgg gttgtcattt acagtggtaa aaaacatcta 51950 tcagtagcat ttgaaagaac attctgctca gtcctctggc tgtagaggct 52000 tcaaccccac cagccaccga tgagcacctt ctccctccag gagccagtct 52050 gagctcatta ctgagtttaa tatcagaata caccctggtg cagcctttct 52100 aaattgcagt accagttaac 
agaaggtgtc tgtcagagca acacccaagt 52150 cattcaagtt accattgtgt gcaaacttaa cagagaccca cgtcttcaat 52200 ataagccttg aaggaaactc cagttttagt atgtagatgg ggtatcaagt 52250 gtgtgcacat tgaacatctg ctgcatacag agcactgtgc caggcaggcc 52300 caggacactg aaaacctgga catagggtcc agacagaagc aagcctgctt 52350 ccacagaggc actcctgggc agacactctg gactgatatg acagtgtgca 52400 gggccgacag gataccacag gtctgaatgg tcagaacagc tggggagggga 52450 gggagcatcc gcaggcatct agtcccatgc taacgcagtg gcactagaag 52500 gatgggtggt gtgtggagca actttcttga aagataaagg acctaacact 52550 ttctatgcac cacttactgt gtgccaggca aggccaggaa tgtttaagtg 52600 gtctgggatc agccagttct gcctcttaac taactttgct gtcctgctct 52650 ccaggctttc attttggtcc tcattccttt tccttggacc aacacagaat 52700 cctccaccct gttctggctg cctctagtct tgttctcagc cctccatttg 52750 tttttttctg ccttttccca catgttctga agccctccat tcgtatacta 52800 ctttccagag acttccccat ggctaaaagc attttggaaa tactgtatat 52850 taggcccctt tcagatactg gcaaccgttt gtgggatgct ctgagaaggc 52900 ctctgtgact tagcctggcc cttttcagcc catcacctgc cacgtcctac 52950 cccagaccct tgtcaccagt ccccaggagc ttacgttgct ccctgagggc 53000 actaggcttg ctctcacttc catgcctttg cctgtgccat cctggctgcc 53050 caaaatgcta tggcagatac ctgttcatcc tcaactgggc tctgcctagg 53100 cttgctccag cagaggttac aaactctatg cttcttcctc tgtgtctcca 53150 acctcatctt cctcttctca cctccatcct ggccctaaag gccctatgtt 53200 tgaagcattc acactgtata ttctgtgggg cacacggccc cagtgtctgg 53250 cacatggtag tcaacaccac aaaccgcaga accagttgta aaaggacatg 53300 gagtcggaat gtgagtttta accagggtca tgctgggctg ggttctggca 53350 tgatgctggg ttgtgggctg agtgagaaca gcaagggtga tggtggatgg 53400 agcaacagtc ttgcagccgg ggctctcagg ccaagtgtat ggcagctctg 53450 tgataatgac tttcccttta ctctttgcag attagttttt agaggcatgt 53500 ctatatctcg cccaaatgct gtggtcggga ggtgtcgcat gatccgccac 53550 tcaagagaca agaaaaatga acccaatcct cagaggtgca ttctttgttt 53600 attcatactc cttccccctt taggatgagg taggctgcag gtccgaggct 53650 ctgggcctag agggaaattg aggtggtcag gttacagtgg agagggagga 53700 ggaagtacgt gtgatgattt cttcttaaga tttttgtttt aagacaatct 53750 
ccttgtgctc ttttccttgt aggtttgacc gaattgcaca cacaaaggag 53800 acaatgctct ctgatggttt gaactcactc acctaccagg tgctggatgt 53850 acagagatac ccattgtata cccaaatcac agtggacatc gggacaccga 53900 gctagcgttt tggtacacgg ataagagacc tgaaattagc cagggacctc 53950 tgctgtgtgt ctctgccaat ctgctgggct ggtccctctc atttttaacca 54000 gtctgagtga caggtcccct tcgctcatca ttcagatggc tttccagatg 54050 accaggacga gtgggatatt ttgcccccaa cttggctcgg catgtgaatt 54100 cttagctctg caaggtgttt atgcctttgc gggtttcttg atgtgttcgc 54150 agtgtcaccc cagagtcaga actgtacaca tcccaaaatt tggtggccgt 54200 ggaacacatt cccggtgata gaattgctaa attgtcgtga aataggttag 54250 aatttttctt taaattatgg ttttcttatt cgtgaaaatt cggagagtgc 54300 tgctaaaatt ggattggtgt gatctttttg gtagttgtaa tttaacagaa 54350 aaacacaaaaa tttcaaccat tcttaatgtt acgtcctccc cccaccccct 54400 tctttcagtg gtatgcaacc actgcaatca ctgtgcatat gtcttttctt 54450 agcaaaagga ttttaaaact tgagccctgg accttttgtc ctatgtgtgt 54500 ggattccagg gcaactctag catcagagca aaagccttgg gtttctcgca 54550 ttcagtggcc tatctccaga ttgtctgatt tctgaatgta aagttgttgt 54600 gttttttttt aaatagtagt ttgtagtatt ttaaagaaag aacagatcga 54650 gttctaatta tgatctagct tgattttgtg ttgatccaaa tttgcatagc 54700 tgtttaatgt taagtcatga caatttattt ttcttggcat gctatgtaaa 54750 cttgaatttc ctatgtattt ttaattgtggt gttttaaata tggggaggggg 54800 tattgagcat tttttaggga gaaaaataaa tatatgctgt agtggccaca 54850 aataggccta tgatttagct ggcaggccag gttttctcaa gagcaaaatc 54900 accctctggc cccttggcag gtaaggcctc ccggtcagca ttatcctgcc 54950 agacctcggg gaggatacct gggagacaga agcctctgca cctactgtgc 55000 agaactctcc acttccccaa ccctccccag gtgggcaggg cggagggagc 55050 ctcagcctcc ttagactgac ccctcaggcc cctaggctgg ggggttgtaa 55100 ataacagcag tcaggttgtt taccagccct ttgcacctcc ccaggcagag 55150 ggagcctctg ttctggtggg ggccacctcc ctcagaggct ctgctagcca 55200 cactccgtgg cccacccttt gttaccagtt cttcctcctt cctcttttcc 55250 cctgcctttc tcattccttc cttcgtctcc ctttttgttc ctttgcctct 55300 tgcctgtccc ctaaaacttg actgtggcac tcagggtcaa acagactatc 55350 cattccccag catgaatgtg ccttttaatt agtgatctag 
aaagaagttc 55400 agccgaaccc acaccccaac tccctcccaa gaacttcggt gcctaaagcc 55450 tcctgttcca cctcaggttt tcacaggtgc tcccaccccca gttgaggctc 55500 ccacccacag ggctgtctgt cacaaaccca cctctgttgg gagctattga 55550 gccacctggg atgagatgac acaaggcact cctaccactg agcgcctttg 55600 ccaggtccag cctgggctca ggttccaaga ctcagctgcc taatcccagg 55650 gttgagcctt gtgctcgtgg cggaccccaa accactgccc tcctgggtac 55700 cagccctcag tgtggaggct gagctggtgc ctggccccag tcttatctgt 55750 gcctttactg ctttgcgcat ctcagatgct aacttggttc tttttccaga 55800 agcctttgta ttggttaaaa attattttcc attgcagaag cagctggact 55850 atgcaaaaag tatttctctg tcagttcccc actctatacc aaggatatta 55900 ttaaaactag aaatgactgc attgagaggg agttgtggga aataagaaga 55950 atgaaagcct ctctttctgt ccgcagatcc tgacttttcc aaagtgcctt 56000 aaaagaaatc agacaaatgc cctgagtggt aacttctgtg ttattttact 56050 cttaaaacca aactctacct tttcttgttg ttttttttt tttttttttt 56100 ttttttttgg ttaccttctc attcatgtca agtatgtggt tcattcttag 56150 aaccaaggga aatactgctc cccccatttg ctgacgtagt gctctcatgg 56200 gctcacctgg gcccaaggca cagccagggc acagttaggc ctggatgttt 56250 gcctggtccg tgagatgccg cgggtcctgt ttccttactg gggatttcag 56300 ggctgggggt tcagggagca tttccttttc ctgggagtta tgaccgcgaa 56350 gttgtcatgt gccgtgccct tttctgtttc tgtgtatcct attgctggtg 56400 actctgtgtg aactggcctt tgggaaagat cagagagggc agaggtggca 56450 caggacagta aaggagatgc tgtgctggcc ttcagcctgg acagggtctc 56500 tgctgactgc caggggcggg ggctctgcat agccaggatg acggctttca 56550 tgtcccagag acctgttgtg ctgtgtattt tgatttcctg tgtatgcaaa 56600 tgtgtgtatt taccattgtg tagggggctg tgtctgatct tggtgttcaa 56650 aacagaactg tatttttgcc tttaaaatta aataatataa cgtgaataaa 56700 tgaccctatc tttgtaac 56718

<210> 2
<211> 56718
<212> DNA
<213> Homo sapiens

<220>
<223> variant B4GALT1 genomic sequence

<400> 2
gcgcctcggg cggcttctcg ccgctcccag gtctggctgg ctggagggagt 50 ctcagctctc agccgctcgc ccgccccccgc tccgggccct cccctagtcg 100 ccgctgtggg gcagcgcctg gcgggcggcc cgcgggcggg tcgcctcccc 150 tcctgtagcc cacacccttc ttaaagcggc ggcgggaaga tgaggcttcg 200 ggagccgctc ctgagcggca
gcgccgcgat gccaggcgcg tccctacagc 250 gggcctgccg cctgctcgtg gccgtctgcg ctctgcacct tggcgtcacc 300 ctcgtttact acctggctgg ccgcgacctg agccgcctgc cccaactggt 350 cggagtctcc acaccgctgc agggcggctc gaacagtgcc gccgccatcg 400 ggcagtcctc cggggagctc cggaccggag gggcccggcc gccgcctcct 450 ctaggcgcct cctcccagcc gcgcccgggt ggcgactcca gcccagtcgt 500 ggattctggc cctggccccg ctagcaactt gacctcggtc ccagtgcccc 550 acaccaccgc actgtcgctg cccgcctgcc ctgaggagtc cccgctgctt 600 ggtaaggact cgggtcggcg ccagtcggag gattgggacc cccccggatt 650 tccccgacag ggtcccccag acattccctc aggctggctc ttctacgaca 700 gccagcctcc ctcttctgga tcagagtttt aaatcccaga cagaggcttg 750 ggactggatg ggagagaagg tttgcgaggt gggtccctgg ggagtcctgt 800 tggaggcgtg gggccgggac cgcacaggga agtcccgagg cccctctagc 850 cccagaacca gagaaggcct tggagacttc cctgctgtgg cccgaggctc 900 aggaagtttt ggagtttggg tctgcttagg gcttcgagca gccttgcact 950 gagaactctg gtagggacct cgagtaatcc actccctttt ggggactgac 1000 gtgaggctcc cggtggggaa ggagactgac ctctcggttc acgtgtcttg 1050 ccatagagcc actctcctga gtgggttttt ctcctgatcg tttgggccaa 1100 gtgacttctc tctgaacctc atatttctct tctggggataa taaatggtca 1150 ccctttcaag gggttgtttt ggaagatatt gtgaacaatg gtaaataagg 1200 gcttaattaa tgagggtaag ccctcagtaa attgtcactg tgtgttcatt 1250 tcttcctctg tgtggatcgt gaccgagagc ccttccccct agcctcctcc 1300 tggtatgggt acccaaaacc taggtgagca gggatctctc ccaggggcag 1350 agagcttgtg tactctgggt gttagagggc taaaatataa ccagtcaaca 1400 ccacgttgcc catttctggt acttccggta gcagcctgag tctcaattat 1450 cttgcccaga tgatctgaac tctgacctct agcctgtttc agcataggca 1500 gagagcttga gtaggtgagt ttgcattcct catagcagct ggctgagcct 1550 agtctggact tctctttgac ctgtaaccta caggcccaca ggcccaaggc 1600 aaccacaggt tgcttccagg gttaccacac aggtggtttc tcatttctaa 1650 tgctaggttt tagataattg ttgtaagtga ggggccctgg caggcaggat 1700 gacatcctgc caataggagt tttctgtcac tttcccacag agccctggct 1750 actacatact cttgctcaat ttcgccagta attgcgtcaa tgtgttcata 1800 tcaagtttgg gaagaacatc ttggaattgg tcagacgtga actgtggtaa 1850 taatgggggc ttgttttttt aagcagataa ttaaattcct ttgcatttga 
1900 tgattattct gggaagcaga ctagtcccat aaaatgaaat ggactctgcc 1950 ttgctgctaa gtgtctgact tgagacatgc tatcgagttt ctcaaaatct 2000 cttccttgtg taaaatgtgg ttgtcgatga ttaccttaca ggggtttttt 2050 taagactaaa tgagatcgtg tacattaaat acaggcactc aggctgggca 2100 tggtggctca cgcctgtaat cctagcactt tgggaggctg aggggagtgg 2150 atcacttgag gttaggagtt tgagaccagc ctggccaata tggtgaaaca 2200 ccatcccatc tctacaaaaa tacaaaaaag ttagccaggg gtggtggcat 2250 cgcagctact caggaggccg aggcaggaga attgcttgaa cctgggaggc 2300 agaggttgca gtgagtcaag attgtgccag tacactccag cctgggcgac 2350 gaagcaagac tgtctaaaaa aaaaaaaaaaa aaaaaaaata cgggcactca 2400 atacacgta taataataat atagtaataa tatttgctta ggatctttaa 2450 aaagtttcat tttttcagac tcccacagaa atggctctgc acagcagagt 2500 gaagggggag agagactgag tctccaggcc agaaaaaggc caggtttttt 2550 gcttttgttt ttagttgttg cctggatatt gcacagaaag aaaaaataat 2600 tagcaagtta aaaaaaagta ccgcaaagtt gattacattg gtatttgagt 2650 atcacatctt ctctcagaag cgtaagagac aaggtcgtga ccatacctct 2700 gcttagtttt gttttgtaat ggtgttgcta gtgatcggct tgtcaccagt 2750 tactggtgtt tctaaatgga ctataattgg ctacttgaaa ggacttcctg 2800 agaaagaaca ttttggagga cgaggagaga gtgccttctc tattttggct 2850 gctttcatgt gacatgcaag agaccatgac gtttaggctg ctgctgaggc 2900 agccccagaa atgggggccg agaggtcttt tcttcatttt aatagggtct 2950 gtaggtttgg gtggttaggt acagttctca gaatggaggt tcctggctat 3000 gaggccttga gaaagctgaa agtctccttg ggagtgtgtg ggtgggggga 3050 gtcgagccca tctgttcatg ggcaggtgtc agccaaagcc cttgcgggtg 3100 gttttgaggt tggtggggaga aagcatccgt ggggtttaga gttgtggcct 3150 tttcactact tgcagttcct ttccccgact tggctttact ttctggtgtc 3200 caggggtctg ggccagatgc tgagattcct ctcagctgac aggtgtgggt 3250 tatgggcaaa cccttccctg gaggacataa ggcaccggat tggactgctg 3300 atgggttgct gttggagttg tcagggcctt ggaaatagtct tcagatagac 3350 ttgggttagt gtgacctggg gcaggctgca ggtttggagc catagtaccc 3400 cccgccccca caccgggcac cctgctctgg gctaatgtga ggcttgcagg 3450 agtgagtgat gcagtgggaa ggggggcctt tcctgaggat tctacagctt 3500 tctccaggga atcctcccag gtagtttagg cctgcaggtg ctatgctatc 3550 cttctttcct 
aaccctgtct caggtcctca gcggggccat gcggcatcca 3600 cttataaccc tgcagcgagg ccctcttttc tggccacctg ggtgtttgcc 3650 tgctgagatg ggaggaacag tggccttggg cttcttcccc cgtcatgttt 3700 atctctgctc agatgggca gcagctcaat gggacttgac cagctgtggc 3750 actgccagtc tgaagatgag tagggtgatg gggggaggtg ggcagtacct 3800 gaagctgaac tggtgagaga ggcaggctgg cctgggggct cagctggggc 3850 ctgggatggt tggtacagtc ccctcagggg ggtaggggag tgagtgttag 3900 actgcttaag cctcagaggc cgctcttgcc cacctatgct ttgaggagat 3950 cctcttcatt tgttcaaagg gaagactctg atctagagat gggcacttgg 4000 accagcaaac agcagctaca ggtagccagg gcacccgagg agcacttgct 4050 catgagccgg tttccctggt ttttatgggg gctgttgctg agcgtctgcc 4100 agggtttgtg tcctagcact tgctggtctt tgctgggctc tcagctctca 4150 ggtgtttctc taccagcacg tttccccctc cctcatatgc acacatgtgg 4200 acacaagcag gctgcccagg acagagtgta ctttgaggct tgggaaagga 4250 ctctctctcg cccttttggg gatgagcctt ggaacctcat caccttccgg 4300 cttggggtgg agcttcatcc tgggggttga agctttaggc tcagataact 4350 agtcttgtaa gccagttttg tcctgttgtt tttttcgtgg aaaataatgt 4400 attgacgtat acacagacat tctttgtcta acagtctgag attgagaaat 4450 accctccatg actatttggt ttgctttcat ggtgaaactt ggtcgctttc 4500 ttagacacag cctatggcaa taagagtgat ccctggctgc tgtaattcat 4550 tccagacttt gagcaaacac aaggcaccgc ctccacctgc agtggagcct 4600 ctgatgaacc aaatggaaac tccttgggga atggggagta agagccaaat 4650 gtgggattgg acttaaactg cagcttctta gaactgtagc attccacgat 4700 gggattgtct agtgctcttc ctggaggtta ctattcaata gttggctagt 4750 gcacaggttc aggggtgacc tgatatgccc tagcgtttca gaagatccct 4800 gcaaggtgtg tcttttggtc catctgaagg gtcttgtatg gtgatcttgt 4850 atggatatcc gtgacggcta aggcatctga taacttcatt ccttcagttc 4900 cagcagtgtt cctgtattat gctgggcact agagctacaa agaagaaaac 4950 aaagtgcctc ctcttcagga actcttaatt taggcagggg aggcataatt 5000 gaacagtgct gaggtcatct aggggaacca aagtgtgtat ttatcccctt 5050 ccctatcact cccctccctc cttcatttct tcctttcttc tttcagaaac 5100 tccaagttca tatcaaaatt ctccagccct ggttttatattt ggttgtgtga 5150 aaattttcct ctaatttctg aagctatgca ttagttctgc tgagtaatct 5200 ttaacttgct gctttataat 
gattataatg agatatcact gggtattatg 5250 gtctttgggt agcagcaggg tagggatttc caggctggga ctaagctaat 5300 ttatgggttg ggaattatgg ggcagttaat agcaaggcag tccaagcttt 5350 ccacagattc caccctaggg accatccaga cttaaggaac agggccggca 5400 ggctcatccc ctttgcactc agctgggcta tgggtgtgtg tttgtgaaag 5450 aggtttattc agtagtcata cctgctgatt tccctgctat ctgtttaccc 5500 agtgcctcct gtaccttgtt tcttactctt tgttctctgc tcttactatg 5550 aagaagcaga gactggaatt ctgcttgaac ccacatctac ctggaaattc 5600 cagtttttct tgtccagtgg agcagcaatc cagttgtttt aggacaaatg 5650 gtctgccctt gaagcttaaa tcctttgagg gcctggcatg gtgacagttt 5700 tacatttggc tttggtatag actggtgtgg tccctgggca gtgaggtcac 5750 tgtaaggcca gccagccaga ccctggctcc taggggaatt aacaaggcat 5800 gggattagac tcacagggtc cctcctgtcc ctaaacttgg taggggttcc 5850 tgggagccag actgcgatta agattgtaga gacctgagac ctgagttgta 5900 ggggcctctg tgttgatctg ggccattgcc gggtgagctg aggcggtcac 5950 tagctcaagg agtgatctca ggatattgtt ctgtaagtca gagacctcca 6000 ggttggagag tggggcttgg gggtggggga cagggtttag tggggagctg 6050 gttctgggtg aatgtggcct aaagggattt gtccttagaa gacagagggg 6100 tgagtcacac actcagtgct tcaggttcca ctttgcggct tggcctcagc 6150 ccgccccttc cctgcacaaa tgaaggccag gggctatata attggctgtt 6200 gctgaattct ttggcagtga ttttaaagtc tggtctgggt gtgttatgta 6250 gctgcttctc tatccactcc ccacacccgc tgcttctcca gagcccctca 6300 caaagcccag gcagagagag agagagagag agagagaatg acttgcctca 6350 cagagatgtt ggggatagg ataggggtat gggtctttgc ttttgccttt 6400 tgagggggga taatctcttc cttcatttta aaagtaaaaa gtaatgcagg 6450 ctcattgaaa ataatttgaa aagttgaaag agatataaaa gcacacccaa 6500 attcctatca cccaaaagaa acataccggc atatttccta ctagtctttt 6550 tcatgtttaa gaatatagct gatatatttt tttttctttt tctttttgag 6600 acagggtttt tgctctgtca cccaggctgg agtgcagtga tcacggctca 6650 ctgcagcctc gacctctcgg gctaagcgat tctcccactt cagtctcccg 6700 agttgctggg accacaggtg cacaccgcca tgcctgacta atttttgtat 6750 tttttgtaga gatggggttt tgccatgttg cctaggctgg tctcgaactc 6800 cagagctcaa gtgattcacc tgccttggcc tcccaaagcg ctgggattat 6850 aggtgtcagt caccacaccc agtgttatag 
ctgttgtctt tatagatgaa 6900
cagatagatt gacatagatt catgtagata gcctggtgtt cagcattttt 6950
catttaagat tctgtcacag acttgaccct atacctttaa aaatcacaaa 7000
ggcagtatca tagtctgtca gctgaatatg ccataactta aaaaaatcat 7050
tcaactgttg ctgaacacac acatatacat atatagtttt tgttttttct 7100
tagtgatgta gtgatgcttg tgcagaaagc tttatgtact ttttggatgg 7150
tttctgtagg agagctttct aaaaaagggaa aaaaagtgtt gaatgttttt 7200
tgagaagggc tagattttca agccagtctt acaaaagggat agactcattg 7250
gaaattccag atttgcttag tgctggcaga tgagtatcac ttattgctga 7300
acaatgtgtc tagaattctg attaaaaaag aaactaggtc caggaagtgc 7350
ctgggggcag gggcaaaggg ccaggctgca ggataggctc ttaggatctg 7400
gctgagcaga aatctgctgt gaacagaatc ggtgggggtg atgctttctc 7450
agtaacttct ccatttgttt ctttagcagc taagtccctg tgctggactt 7500
ctgtggacta ctgtggctct ggggctgtgg ttgtgggtga acaacagcta 7550
gctaaaccag tgctgttgac atcattgaga tgtgacgcac aggaaggtgg 7600
gagcaagctt gcaaatcaga ttctgaaaca tatagcacag ctctcccacc 7650
tccaggtggt cctgagatct agggaggagc catagtgaga aactttaggt 7700
ttctaggaat tctcttaggg agaagctctc ttagggagag gcagaacctg 7750
gttctcagtt ggggctgatt caggtgggtt agatcaataa agcctcaggc 7800
cagtgtgcca ggctattccc aaggagtata ctttgaagtt actcccttta 7850
gaatgtcctc agtggagata aattctctct gaggagcagt tttgtctgcc 7900
ggggtcattt ggcacaaagc ctggagtgct agggcgaggt tgcactgagg 7950
gaagggggcag gattatgtca gcagtgtgac ggatacagtg tgaggtcagg 8000
ctccttcctg ccccaccacg ggggcctaga ggtcatgggg agggtccctg 8050
gcaggggatt caatcattgc ttggccccat gacagagtat attctaaaaa 8100
tgccttaagt ttttttcttt caaagtttct tcctgttttg cataatggcc 8150
ttttgccttt gacatcctga aaccgcagag ctgtcattgg tgttgcagga 8200
cactgccagc ttgaaaaaaa tcaacaaacaa aaaaagaaac aggaaaggat 8250
gtggagttca gggtgcggcc tagggaagct ggtatttgcg ttatgggatt 8300
gtggggatgt ggtattaagg tgttgggtag cgcctgacat ttagaggagt 8350
actctgggca gagtccctgc ctgcccaaga ataggtagaa ttgagtcttc 8400
acaccaaagt caggagagac cccctccccc caggaagaga atgaacaggg 8450
actcatttcc tcattcagca aacttttat ggtaactaca ctatatgaag 8500
tgtgagagat agacatgaac aagagaggcc cccactcttg ggcagtccct 8550
tagtagtagt agatagactc tggcaatatg gtgtggtcag agagaggaag 8600
cctgggtgct ttgagggtac tgaggaggtg cagggagcca aatgggtggt 8650
ctgggccagg gccagagtca gaatgaagga cctctcttcc agacgttgat 8700
tttagcatct ctgtctctca gtatgtttga acagtctccc ttattggaag 8750
ggcaggagtc tactgctaaa agtaacctgc gatttcctct acttgctgtc 8800
atgtggaaag aatactaaag ctgaaattcc aaaagttgca cacctttacc 8850
agcagggcag gagaggaaag gaaatggagg cagagtgagc tgaagatgat 8900
aaaagaaaga gaaggtggtg cagtttggac tgttatggac agaggaagtc 8950
tgagggtagc tggactgagg gatcaaaggg aggcagttga aagggaagag 9000
agctgcagag agggatttct tggtctgcag agggtaggag caagccttga 9050
aggctgctgg agtgaggatt ccgagccctg gtctttattc tttttctaat 9100
tcatttacatc attttaggca agtcctaact cctttggtct ctgttgtctt 9150
tctgaaattt gagtgggctg ggcctgctgg tctttagcct ctgtctttct 9200
ctacctccta gattccagtt tggcgagtgg gggggaaaac ctggttgtat 9250
atgcaacgtg aaaggcctct ggaattcctt ttgaagctca ctacccatga 9300
ggcttctgct aaggatttca tcatgtctgt ctaagcagac ataaaaattt 9350
tagcaggtgg atgacccgta gaaatggcac aaggaatgtt tctttctgtc 9400
acactgtggt atttgattta agaaagttgt tatcctctct gtgcctcagt 9450
gttctcactt gtaaaatggc aataacagta tccacctcat agatgttatg 9500
aaatacaggt agtagccacg aaagggctta aaacagtgcc taacacagaa 9550
taagttgtga atatatgtta tttattatg gtagtataat gcttatttgt 9600
gaagattttg gcttttgctt tataggacct tttttttttt tagttgaaaa 9650
tacaatgtta ccatgttaaa tgttaaaaaaa aattctactt accattgtaa 9700
cagaacatgc tcccacttct gtaacagagc ttgctattac ttttcaaatg 9750
catacatatt ccaatgcata tattccaatg cagttgtaga gtgaaactgt 9800
ttgcatgcag ccatttttat ccaaacattat cttataaaat gttatgttgt 9850
ttatgattat cctaattatc ttttgttgct gtctagtatc cttatagata 9900
ttccattagc atacactatt ccaggtttca ctatcgtcga taatctagat 9950
atgaacattt ttgtagtgtg tagctctttg cttcagttga attactttcc 10000
tgggataaat tcctggggaa gaatttctag gccagaggat atggtcatct 10050
tgacaatact gattcacatt gctgcattgc tttccaagag gtttggaatc 10100
attcacaggt tctaaattgg aaaatcctgg cttttgaagt atgtggattc 10150
taagggcgat ttggatctag ctggagcctc acactgacac ttccagccag 10200
tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtagt tccctatgct 10250
ggacaccgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtagttc 10300
cctatgctgg acaccatgtg gcctttctgg acattagggt tttcctgtga 10350
ttgcctcaga gcagttcctg ttgaattcac tctgtgtcca caaaaggagc 10400
cttactgtgg ctctttcaac acccacctac ctttgccaag ttggtttaca 10450
gaaagtaaga acattctttc cttcttcctt gatatgtggc gctaaaaccta 10500
tagcatgggg caggctctgg ctttaaaaac ctgacttaaa aataatggtg 10550
ttgatcaaaa agtttgtgga tcagtttttg gaaacactgc atgtagccat 10600
ccatagaaac ttatattctg ttgggctagc ctgggcgcct gatcatttaa 10650
ctcatgtgga tgaacttcta tgtaatagcc ctggtgtatg ggatccagaa 10700
acaggggccct aatgaagaaa ggcttttaaa ttatgttgga taaaaataag 10750
ttgttacaat agcccaaagt ctgcaaatat gaattgccag ttctgtcctt 10800
gtagtcatcc accatgtgcc tgcatctttt gtagactctt gtagattcag 10850
aagcccactg aattgcataa atgatggaat gattttagac ttagtgattt 10900
cagtgactaa aagtttacag atcctggccg ggcacagtgg ctcacaccccg 10950
tattcccagc actttgggag gccgaggtgg gtggatcacc tgaggtcagg 11000
agtttgagac cagcctggcc aacatggtga aaccttgtct ctactaaaaa 11050
tacaaaaatt agccgggtgt ggtggcatgc acctgttgtc ccagctactt 11100
gggaggctga ggtggggagaa tggcttgaac ctgggaggcg gaggttgcag 11150
tgagcccaca tcaggccact gcactccagc ctgggtgaca gagtgagact 11200
ctgtctccac ctcccccgcc ccccgaaaaa aaaaaaagtt tacagatcca 11250
gcagatgggg catattcaat ttgtgacagc cactcccttc accttatagc 11300
tatgtcatat gtcttcttct cctttgactg cattctgcag cagtcagttg 11350
tgacttaata tggcactctg ggcccactga attaggtcag agctgctagt 11400
agtatattgt tcctagagac ctagggcaag attttcttac tacataaaat 11450
gagggagata atttcttacc tcaagatgtt ggtaagagga gtgaatgagg 11500
ttagttatat ggtaatatca gtactctgaa tgtcttttga tcaatgccta 11550
actcatcttc ttgggcacaa aaggcataca gtcagcaccc ttaggccaca 11600
tataaaattc ctccaaatgc aggttttcat ctgccttggg gcagagtcaa 11650
gagaaagaag aggaagaggc gtgaggctct gaccacaact tagggacaga 11700
atatagccca aagcgagtac cccaggccac aaggagaagg ccgctatctt 11750
gttgaatcca cagcactgga aacttggagt gtgtgttccc ctgtgtcagt 11800
tacactggaa ttttatggct gctcacattc ttcccttcag gtggacgttg 11850
ttcatcagta tcctgggcaa gaggccatca taaaccacag acagctgagt 11900
gattaggaag aggagctgaa gagggagcat tagatgtttg attgagtctt 11950
aggtgagaaa gtatatcatt aaaacaaaaa gatagatgta ggcgggctca 12000
gtcttgtgtg cctggtgtgt tggtagaaaa actaaagcac aagcctgtag 12050
ataacctgct ttattctacc tcggggctgg tgttggaatc caggatgcca 12100
gaccctaaag tccagctctc tttccaacct actgaataat ccgagagaaa 12150
tcatgttctc tctctgggcc tcagtttgcc catgtataaa atgagatgaa 12200
ggattggctg ggatgctctc cagagtctct tcctgcctgg agttctgacg 12250
tagccatgta ctcctgctca gcatcgctaa atggctttgt ggtaggacca 12300
ttgagtgctg cctccattag ggccagctat gtaatgctgg ggtggctgtc 12350
actgggccct aagagccagg attggtctta ctggagaaat ccacatccac 12400
ctaaacttaa gacccagggg tgtccaatct tttggcttcc ccaggccaca 12450
ctggaagaag aattgtcttg gaccgcatat aaaatacact aattatagcc 12500
gatgaggtta aaaaaaaaaaa actcaatatt ttaagagagt tcatgaattt 12550
gtgttgagct gcattcaaag ccatcctggc cgcatgtggc ccatgggcca 12600
tcggttggac atgcttgctt tagacctccc agcaattcta gtctctaaac 12650
aggaaatcaa aagtcaagat gaatagataa gttggtcagt gtgaaaaagt 12700
aattggtggg agccactgta gatgcagggt tctaggctcc atcaacaacc 12750
acctacatca ctgaacgaaa gataatgctt gttcagcact tattacatgc 12800
caaccatggt aaaaatactt cagatgcatt gttttcatga actctcacag 12850
cagctctttt tcttgcctaa atgccccgtt agaacctcca gtacaatgtt 12900
aaatagatat gctaagagac aacatatgtg tcttgttagg gggaaaatat 12950
ccagtctttg actattaaga atggtgttag cagtgggttt ttcctaggtg 13000
ccctttatca ggttgaggaa gttcctttct attcctggtt tgttgagtat 13050
ttttatcatg aaaaggtgat gggttttgtc aaatgctttt ctgtgtctgt 13100
tgagatgatc atgttttttt gtcatttatt ctattgatat ggtatattat 13150
acattgattt ttcagatatt aatcttgcat acctgggata aatcccactt 13200
ggtcatggtg tataattctt tttattgtt gctggattga gtttgctagt 13250
attttgttga tttgtattca taacagatag tggtctgtag tctttccctc 13300
cctccctccc tccctccctc cctcccttcc ttccttcctc tctctctctc 13350
tctctcccct cccctccctt cttttcccct cctctcccct ccccttccct 13400
ttcttctctt tcatagttgt ttaccactgt cagaaaaggt ctgttcgttt 13450
tctttcgtcg tgagatcttt gtttggtttt ggtatcaggg taatactgcc 13500
tcaaaaaatg agtaggggaag tgttccttcc tcttctgtat tttgagagag 13550
tttgtggtcg gtttttatta attcttcttt aaatatctgg tagcgttcac 13600
cagtaaagcc atctgggcct gatgttttct ttgtggaaaa ctttttgatt 13650
cctaattcag tttctggtta taggtctatt cagaccttct attttttctt 13700
aagtcagttt tgatagtttg tgtcttccaa ggagtttgct tcatctaagt 13750
catctaattt gttggcatac atttcatagt gattccttat gatccttttt 13800
atttccgtta aagttggtgt agggatagtc cctctttcat tactgattat 13850
aataatttga attttctttt tttcttagtc ttgccaaaag cttgtcattt 13900
ttattgatct tttcagagga ccaactttga gttcattatt tgttctcttt 13950
gttcttattt ttctgcttca ttaacttctc taatctttat tctttcattc 14000
tgcttgcttt tggttaagtt tgctttttct ggtgtcttaa ggtagaaggt 14050
taggttactg atttgagatt taaagatcat gctctttaaa cgttttgata 14100
gatactgtca gtttgccctc tggctttttc tcattaacag tgtataggag 14150
tgcttattcc tcacactcat accagccctg ggtgttacta acctttatat 14200
atttgccagt atcatattca gacatagtat cttgttttaa tatgtttctc 14250
tgattactga tgaagttaag caaattttca cgtgtttat ggccatctgt 14300
ctttcttttt tcatcctttc tttcaagatg ggagtctttg ccatgttgcc 14350
caggctggac tcgaactcct gggctcaaat gatcttcctg cctcagcctc 14400
ctgagtagct gggactatag gcgtgagcca ccatggctgg cttgcccatt 14450
tgtatttctt atgtgagtat tttttctttt tttttgaagt ggagtctcac 14500
tccatccccc agagtggagt gcagttgtcc gatcttggct cactgcaacc 14550
accgcctccc aggttcaagt gattctcaca ccttagcctc ccaagtatct 14600
gggactatag gtgtgtgcca ccacacctgg ctaatatttg tatttttagc 14650
agagatgggg tttcaccatg ttggccaggc tggtttcaaa ctggcctcaa 14700
gtgattcacc tgcctcggcc tcccaaagtg ctggggattac aggtgtgagc 14750
cactgtgccc agctgacttt ttttttcttt tttttaaccc ttttttttt 14800
ttaccctttt tttggcccat ttttttttac cctttttctt ttaacccatt 14850
tttctattag ttttaaaaat atgtttgcag gagcttttta tattgtggat 14900
ttttcttgtt tattacatat catttgtaaa tatggtctct ccatctgtca 14950
ctcttcttta tctctggttt ctttagctat gtagaagttg ttatgttatg 15000
ttatgttatg ttatgttatg ttatgttatg ttatgttatg ttatgttatt 15050
ttttggagag ggagtcttgc tctgtcgccc aggctggagt gcagtggtga 15100
aatctcggct cactgcaacc tctgcctcct gggttcaagc gattctcctg 15150
cctcagcttc ccgagaagct gtgattacag gcacccgcca ccacacccag 15200
ctaatttttg tgttttagta gagacggggt ttcactatgt aggtcaagct 15250
gatctcaaac tcctgatctc aaatgatcct cccaaagtgc tggggttaca 15300
ggcgtgagcc actgcactcg gccagaagtt ttgaattttt atgtgtttaa 15350
atctatgttt tcctttatga cttcaggttg ctttcatact taagcaggtc 15400
ttcaccatcc caaaatgata aaatttttct cctgagtttt cttctaagtt 15450
ggttctttag aagccaccaa cttggcttcg acagcaaaag atgaacagaa 15500
tttctgttca actctcatgc tgcaagaagc tttatgtaat actccaggga 15550
ccctttaagg tcccagagtt ttcctccaaa tctatcagtg attctagtgg 15600
ctaagagtag aaatgtgaaa atttagccat gtgtgctgat agagctgtag 15650
taatttgtaa gctctgaagt tctaaggagt caggggagaa gggaaagtaa 15700
catttatga acatctatta gctcaataag aacatgcgat aagtatgtat 15750
atgtattatt tcacttacat ctgaaaggaa ggcataatta tccccactcc 15800
ttagagaagg aaattggagc tggctacatt taaagtagtc ctgacaccag 15850
agagatattg ccaggagtac ttggctggct gagtgcccag atggcccata 15900
ggagtagtgg gccctccaca gtccaaggtc tggttctagg tggagagaga 15950
aggatgtgct cgtagtcagc accgcagctc cagaaaatct gctggggctc 16000
caaaactgat tagaggggca gctgactcag taataaaact cccagggagac 16050
ttacttacat actggaatgc aaagttgcag ctttactggg aagattagaa 16100
ctgttatga gtagcttaga aatctctggc tgaattcact gcaaggggaag 16150
ccgcaggata agctaactgc tggtgagtca gcagtcagag cagggaagtg 16200
aatttaacat tagatgggtc agtctctcgt ggctgatgaa ttcatcccca 16250
caatactgta cacctgcctt agggaccttt gtctggacta ggggttgggg 16300
tccccctcct ttgtacagcc ctggaaggac acatccagct ccatccgcca 16350
tctctccctt acttatttcc ttccttcctt ccttctttcc atccagccat 16400
caagcttcct ttcatggcca ataatcatca ttggggtcta ctcatggact 16450
ctcttgcctc atgtatttgt tttattttgt cctcattccc acttctattt 16500
cccaggtata tcacaggcaa ctattctaac gtatttatag tttgtgtatc 16550
tgtttttgct cttgccaaaa tggaagccac tgctttatac atagatgtat 16600
tcttaacttt aaaaaaaatt tttttagatt aacctacaat aaaattggct 16650
ttttggcata tagtctataa attttaacac atacatattt ttgtgtatct 16700
accaccacaa tcaggataca gaacagttcc atcaccccaa aaaaatccct 16750
cttgtagtca cattctcctc ccacccttaa tcccaggcaa ccactgatct 16800
attcttcatt actattgttt tgtctttttg aggatgtcac ataaatggag 16850
tcacacagta tatatacatt tttttaaaca tatgtaaaatg gcattttata 16900
gctcattttg attatatgtt tttcatccag ttctgttttt tttttttatt 16950
tttaaaaagt ttgacataac ttcagactta cagaaaagtt gttagactaa 17000
tacaaagaat tcctggatat cctttggagt ccctaaatgt taacatttta 17050
ctatatttac tttttccttc tctctctctc tctctctcgc tctgtgtgtg 17100
tgtgtgtgtg tgtgtgtgtg tgtgtatcta cctgtagata gatagatatt 17150
aatataattt tagatagatg tatctagatc tctctctctc atatatatgt 17200
gtgtgtgtat atatctatat ctatatctat atatatctcc ttttaccctt 17250
aaatattcag tgtatatttc ctaacaacaa ggtgatttaa aaatatatat 17300
ataaacatag tataattaac aatcaggaca tcaacattga aacatttctg 17350
ctatgtcatc tacaggcctt aggaagactt tgtcaggtgc cccaataata 17400
gccttgatgg tagaagaaaa ccatgtgttg tattcagttg tcatgtctct 17450
tagtgtcttg taatctgaaa taattcccaa gccctttgga tttcatgaca 17500
gtgacattgt tgaagagtac aggccagtta ttttgtagaa ggtctctcag 17550
tttaggtctg tctgatgttt cctcctgatc agattcaggt tattcacttt 17600
tgacaggaat accactgaaa tgatgctgag ttcttctcag tgtaacgaga 17650
tctagagaca cacactgtca gtttgttcct tattggcagt gtgaaccttg 17700
aggatttcat tgtagtggca tttggcatta ctccattata gttactattt 17750
taccatttta aattaaaact atctggccgg gcgtagtagc tcatgtctgt 17800
aatcccagca ctttaggagg ctgaggcggg caaattgctt gaggtcagaa 17850
gtttgaaacc atcctagcca acataacatg gtgaaacgcc atctctataa 17900
aaaatacaaa aaattagcct ggcgtggtgg cgcatttgta gttccagcta 17950
ctcaggaggc tgaggcacaa ggcttgcttg agcctgggag gcggaggttg 18000
cagtgagctg aaatcacgcc actgcactct agccagggtg acagagtgag 18050
actctgtctc aaaaaaaaaaa agtaaataaa taaaaaaatt ttttaagtat 18100
cttatgggca tatacttgtc ctgttactcc tcaaactttc atccactttt 18150
ttttttttaa attttttttc ttacctttca tcgttttctt gatatccact 18200
gggttttagc atctacaaat gattcttgcc tgaatcagtt attatggtag 18250
ttgatggttt tctaattcca ttattccttc tatgtttgtt aattttggca 18300
ttcttctata aggaagagct tacccttttt ccctattaat taattcatat 18350
attaatgcag acctatgcat tcttacttca ttaaatcata atcctttact 18400
atcattatgt attctgatgt tcagactatc ccagattag ccaataagat 18450
ccccttcagg ggaatggtct ttgggattcc tctttagagg ttcctggttc 18500
ctgttttctt ttgacatatc ctattactct ttgagcattt tttttttttt 18550
ttttactttt aggcacagca agaagttcca tggtcctctt gttctttccc 18600
caactcagcc ctagagtcag tcacttctcc aatgagctct agttcctttt 18650
agtagagaat cataattaga aaacaagaat cagtgccaag tgtgcacctt 18700
tgtttttaag gtccatccac gttgccgtgt atatgtccag catgttgatt 18750
ctaactgctg aataatacct catgattgtc atccatccca gtgtttcttt 18800
ttcccttctg taatgaggga ctcctggact gcctccagca ttaccttcac 18850
aaatattgct gtgaggaaaa tccttaaacg tttcctttat gggcaacgtg 18900
tgagcatgtt tatgttgatt caggggtgcc agacacagct ccagaatggc 18950
tgcctcagtt tacatttcca ccagcagagc atgacaggct ctgtgtctcc 19000
gtgaataatc agcattaacc agcttcctat tttttgccaa actaatagat 19050
gtgctaggat aactctttgt tttaacttgt ttttctctga ttaccaatga 19100
gctggagcat ttcttcatat gcctgatggt ctttgggatt cctcttaggt 19150
aaattgctta ttcattataa tcctttgcct gtttttcact ggagttctta 19200
tatttttctt gaagatatgc aggaattcct tatacatcct agatattaat 19250
cccttcctgg tctcagacat tgcagatatc ttctgaatct gttattatact 19300
tatttattta caattttttt tttaagagtt ggggttttgc tctgtcaccc 19350
agactggagt gcagtggtat gatcatgact cattgtggcc tcgcaatcct 19400
gggcttaagc gatcctccca cctcagcctc ctgagtagtt gggactacag 19450
gtatgcacca ccagacttgg ctaattttat tttattttt agagatggaa 19500
gtcttaatat gttgctcagg ccaatcttga actcctggcc tcaagcaatc 19550
tttccacctc agcctcctgc atctattata tatatgttca ctttgctcat 19600
gctgtatttt gttgcaacat aaaactattt ttcccattgt tttgtgcagt 19650
ctctcaccag cactcttctt tttctgtaac tgtgttaatg ccctttgttc 19700
ttccatatgt taggtatgct ggtatagttg aactctgctg actctcctca 19750
gtaaacagtc tctttttatg acaccttatc ctctactgaa ttctctctat 19800
caagaatgac ttggccgggc atgggggctc atgcctgtaa tcccagcatt 19850
ctgggaggcc gaggtgggca gatcacccga ggtcagaagt tcaagaccag 19900
cccggccaac acggtgaaac cctgtctcta tgaaaataca aaaatcagct 19950
gggcgtggtg gcaggtgcct gtaatcccag ctacttggga ggctgaggcg 20000
ggagaatcac ttgaacctga gggggaggtt gcagtaagcc gggatggcac 20050
attgcactcc agactgggtg atggagaaac tccatctcag ggggaaaaaa 20100
aaaaaaaaaa aaagaatgac ttgtcttcct cttagagtgt gaggtctaca 20150
tacaaatatt attcttgtat tcagcaaatg tatgtcatag gcctagtgtg 20200
tgttaggaac tgtgctgtca ccaacaaagt ttagagaggt tataaaactt 20250
gactgtagct ttttagaggt ggaggagtga tttgaaacct aggctgtaat 20300
tccttcctcc tgtgattcct tcctactgtg ttgccttccc ttgaaaattg 20350
catttggggg ccaggtgtgg tggctctcgc ctgtaatccc agcactttgg 20400
gaggctgagg cgggtggatc acctgaggtc aggagttcaa gaccagcctg 20450
gccaacatgg cgaaaccccg tctttactaa aaatacaaaa attagctgga 20500
tgtggtgtgt ggtgacatgc acctatattc ccaggtactc agtaggctga 20550
ggcaagagaa tcacttgaac ccaggaggca gaggctgcag tgagctgaaa 20600
ttgcaccact gcactccagc ctgagtgaca gagtgagact ctgtctcaaa 20650
aaaaaaaaaa agaaaagaaa gaaaattgca tttagttcct gtagactgtg 20700
tgtcaaatgt ctaaatctct tctaaacaaat ggcctaagga ggtgcaaagc 20750
gaagcatcct caccagcatc ctgacttggc agtgaggcat gggaccctgg 20800
agggagtagt ggtaagtgtg actctggaat tcttcctggg ctacttgtca 20850
gtgactggct ccagattgag aggagagccc agaggacaca ggtggctgcc 20900
ccagcctgga ggtgaaagtc ttaaaataaa atgccagatg cctagaccat 20950
tctaaacctt tctgagaagc tgaaatcatc ccttctggaa gcgctctagt 21000
tctaaaagga cagatataca gcaagatctt cctggggcta atatggagtt 21050
tataggcaag taggcctcag aacctttccc tggtagtgat atctgtgggc 21100
aggcacagtt tccacacttt ccagaaattc cagcggaagg agtgagaagg 21150
aggaatctgc ccttgagtga ggaccaaaga aagcagaaat tcctcttggg 21200
aatttttcct ccagagacca aacactactt gggagcttgt ttactgggct 21250
ttaaaagctt gtgaccccca gtcactcttt cttgaccccca aggctttgca 21300
tttctgtggc ttccccactg gacagaagtg gaactgtcat gctgcctgtt 21350
ctggggtctc ccagaggttt ccccatgtcc tctccttgct tctactgccc 21400
cacagaattg gggatctgtg accacatatg gtatagaatt aatgcttgag 21450
aatggtttag ttcagtgatg tcaaataaga ttcactttta tgccacctcc 21500
atcagttgaa ggcccccctg gcccctaaat tggaaaaagat tctgagacag 21550
aatccccgtg ggtacagcgc agggacagta aaggcacgtg tgctgtgatt 21600
tgctatccac tgtgtggatg catccaggaa tatcagaacc ctggaagatt 21650
atttaagggg aagttaggac agcttttttg ccaatccaag ggtgttcttg 21700
aggaagtctg tcttcctgta tggccttcag tttctttcct gtgtaaccat 21750
ggggccaaca cataattccc acagctctat tggcccttgt ctgccaggat 21800
tctctagggt ctgattcgag gtggatcctg gccctttgag gtggcagaat 21850
ctgatcatgg tgctgtttcc ttagatttag gccttgatac ccttggcgag 21900
agcatcctgg gctgagtgac cacctgaggt ttttctggtg attttgtgac 21950
ccatgtaaaa ctttgagctt tgggattatt ctctcaagga aatagtgaca 22000
tttggtgaag agcctgtttg gtgtggctat gtgaggctta gccaagaaaa 22050
tgcaccattt ttattaggag gttaggccat ccgttgccac aaagtgtcag 22100
atgctaggcc tagagcctgg agaaaactta ttttaaaatt gatggggtgc 22150
tggagggggtt ggggggtggt ggctgtagct catgaatcag gtgctaaaacc 22200
tagaaacaaa aggcctcatg tggcagactg tttctgagca cagatgaatg 22250
gatgagcaac tggcgcaact ttgcccagtt ggtccagctt cccacttggc 22300
cacctaggct tgctgtgaag acctcgtctg gcagaaatga gagtgttttt 22350
gccccatctt gatcttaact gtaatttaag actaaaatct tagattctaa 22400
aacatcaaag gcaagatggc tcccagctct gtgagctcag cttctcacct 22450
cttagttgaa caagtgcagt gtgggtcaat acatgattgc tgctcttgct 22500
gccaggaact gtcccagcat agaaaggaat gggacacaat ccctgccgtc 22550
aagattctaa gggaggaagc aggcaggtcg actggtgcct catctctgca 22600
gggctccagc caaggtttgt gaaggatttt gcaggcatat ggagtgggga 22650
ctgattgatc ccgagagggg actggggaaa gctctgaaga ggggatgaca 22700
tttggtttga actccaaaaa atggttgctt tacctgtttc ctgaagtttt 22750
tgaggtggct tataagaaca tataccataa aaaggaccaa tataaattta 22800
aaatcagaaa aagagaaaat gggctgggca tggtggctca tgcctgtaat 22850
cccagcactt tgggaggcca aggtgggtgg atcgtgaggt caggagatcg 22900
agaccatcct gcctggccaa catggtgaaa ccccggctct actaaaaata 22950
caaaaaatta gctgggtgtg gtggcacatg cctgtagtcc cacctacttg 23000
ggaggctgag gcaggagaat cgcttgaaac ctgggaggcg gaggttgcag 23050
tgagctgaga tcgcaccact gcactccagc ctgggcgaca gagtgagact 23100
cctcctcaaa aataaataaa taaagagaaa atggaactta gaaaattaag 23150
aggaagagtg aaaaggtaga tatttagtca ggcacagtgg ctcatgcctg 23200
taatcccaac actttggggag gccaagacag gaaaatctct tgagaccagg 23250
agcttgagac ttgcctggca acatctcagg tgagacctta tctctacaaa 23300
aaaatttaaaa attagctgag ctgtgtggct cgtgactgtg atcccagcta 23350
ctcaggaggc cgagaccaca gcccaggagg atcgcttggg cccagcagtt 23400
tgaggctgca gtgagctggc accactgcaa ttcagcctgg gctacagagc 23450
aagacccagt ttaaaaaaaaa aaaaaaagat attcaaacca tgggtcccaa 23500
cgtagttat atatttgacc atttgcaaaa gctgaaagca aaacatgtta 23550
cacattttca gagaggaaaa tacacagtag ttcctgagtg taagttgttt 23600
ttcttgacct cattcttaaa ttgcttcatg agggtgggag ggaagtggta 23650
gttaataagt gaacctgtaa accagcgttt ctcaaaatgt agtccaggga 23700
attgcatcaa aattgcagtt acctacagtg cttgttaaaa tgcagattcc 23750
tgggcccctg ccccaggctt atcaaatcaa tctggtgagt aggactcaag 23800
aacctgtaaa ttcacatact tctgcagatg attcttcttg cactgcacag 23850
catgaaagcc tctgcaatag acagaaagct accagcattg cgaaagcaac 23900
ttgagtgctt ggcctttgaa ggttgagtgg gactttaatg agggagagag 23950
taaggcatga gaaatggcag ttccactgag gtcagtcagt ggttcattgc 24000
tgacgaagtc acttttaagt catgttttag aagaactacc aagtgtggca 24050
ggtcaggcat gtggcaggac tgtttctgag cacagatgaa tggatgagca 24100
cctggcccca ctgtgcccag ttggtctagc ttcccacttg gccacctacg 24150
gtctgctgtg tggaccttgt ctggcagtct cctttaattt attttttatt 24200
atttttttct ttttgagatg gagtcttgct ttgttgccca ggctagagtg 24250
cagtggcatg atctcggctc actgcagcct ccacttccca ggttccagcg 24300
attctcctgc ctcagcctcc caggtagctg ggatcacagg caagtgccac 24350
cacgcccagc taatttttgt atttttaata gagacatggt tttaccatgt 24400
tggccaggct ggtctcgaac tcctgacctc aggtgatcca cccatctcag 24450
cctcccaaaa tgctggaatt acaggtgtga gccaccgcac ctggcctatt 24500
ttttttcagc aaattctttg tttttctctc tgttcccaaa tgcagggtac 24550
tgagaccaca gatgtattct gtttcctgtt gaaaaaatgt ttctcactta 24600
gctgggtgtg gtagcatgca ctgcagtccc acgggaggct gaggcgagag 24650
gattgcttga gcccaggagt tcgataatca tgccattgca ctctggtctg 24700
ggtaacagag cgagaaactg tctcttaaaa aaaagaaaaa gaaaaagagg 24750
tcctagggaa agaaacaaaat agtggcttgg atggtgagtt ggtggaaaga 24800
acagtgggtg ttgggggtgt tgaacttgtg tttgtgtgtg gtgtacccaa 24850
gacatatcat gtcagcatta agaatagact attcctgttt tctggtcact 24900
gagttgtatg ttttgacatc cttattttgg aagatacttc cttactagga 24950
atgggatagg gagggggtca cctttcccat ctgtgggtca tattttaaaa 25000
tatttattgt tcaagtttaa agatataacc aaaggtataa agaaaaatac 25050
cacaaacatc tgatttaaga aacaaaccag ccgagcgcgg tggctcgtgc 25100
ctgtaatccc agcactgtgg gaggccgagg caggcagatc atgaggtcaa 25150
gagatcgaga ccatcctggc caacatggtg aaaccccgtc tctactgaaa 25200
atacaaaaat taactggtca tggtggtgtg tgcctgtagt cccagctact 25250
cgggaggctg tggcaggaga atcgcttgaa cccaggaggc ggaggttgta 25300
gtgagccaag attgtgccac tgcattctag cctggcgaca gagtgagact 25350
ccgtctcaaa aagaaaaaaa aaagaaagaa atcatttcct acaccttcga 25400
agccttcatg agttagattt tgaaacagtg caaaatgctt cacgtgagaa 25450
tcgagagtcc cttctggtgg ctctccatcc cctgctcttc tgtcaggttt 25500
tcttgtaggt ttatggaaac ctttgttact tgtgcaggtg gcagagaagc 25550
agagaggata gctgcgcgcc acccacacag ctaggattta ttggcgtact 25600
cccacgtgca tggcagccaa gtggacacaa ctctgtgatg aatcctccca 25650
agagaactga ggggccctga tggaggagct gcttctttgc aaagctttcc 25700
ttgactctct tcctgtcccc tagttgattc cccttctgtg ctagttttag 25750
cttattgttt gttacctgtc acacttagca gtactgttgg ctttgctggt 25800
ctccttgact actgggggta aagacctttt gttgttgttg ttgagacaga 25850
gtcttgctct gtcgcccagg ctggagtgca atggcgtgat ttcggctcac 25900
tgcaaccttc acctcccagg ttcaagagat tctcctgcct cagcctccta 25950
agtagctggg attacagcta caccacaccc ggttaatttt tgtattttta 26000
atagagatgg ggtttagtag agatggggtt tcaccatgtt ggccaggctg 26050
gtctcaagcc cctgacctca aggtgacctg cctgtctcag cctcccaaag 26100
tgctgggatt acagacatga gccaccatgc ccagcctcaa agacctcttc 26150
tttacttgct caccctgccg cccactcccc taccaaccct tgcatgccct 26200
ataccacctg gcacatgata catactaact gggtacatgt ttgaatatga 26250
atggatgtgg tgctgtgaat gcttagggga agtgggtgaa atgcttaaga 26300
accaaccttg agtggtctgg gaaggcttcc tgggagggtg gtgtttgagc 26350
taaggccagg cagctgttag atttgttaga ctgaagccct tgcagactta 26400
gagagcttgt gctcttccca gaatgacggg tgagccacgt acagtaaatg 26450
gtgcttctca tttctagccc aagggggcctc aagggggcacc gtgatttcac 26500
gagaatgctg caagcaaatc ttttctcaag ctggggaatt tggtggtaat 26550
gcctggctca gcttgcggtg cgcacctggc ctttggaaga ttggtacaga 26600
gagaagcggc ccatccacat gagcctgtgg aacagcactg gtgggggagc 26650
tgatttgtga agaggggctg tgcagtgtac tgtcaggtct gagacccagg 26700
aagaaattcc agtatcccag ctctcagaat cacagagttc taggcactgc 26750
ctagttccac gtgttcccaa atgtttcctg aatacttgga tttcctgtcc 26800
agagaatttt caaaacaaac ttagaggcct gacccatggc tgccaaggaa 26850
ggattttttt tttaaattaa attttaaaaa tcagtccagc atgaaaatct 26900
atgatgattt catagagaa aggacatttt aatattcaaa gagtaagaag 26950
cacttaatct tggaagaaag ggcattccta tactttgatt acctttagtt 27000
taattaaaaa acacctacat ggtctttact tctgtgattt cattcctggg 27050
ctagtgaaac attgtcacaa taaagcatca ggccaacgct tctttcgacc 27100
cactggccaa tcagttgaca aacagtgact agatgtttca gcctattttg 27150
ctgaggctaa aggattgaac tagtgcttca gccagcatga aaaccagtca 27200
ggagtccgtg ctggtgttgg cttagattag cagggccttt gatgggagggg 27250
catgtatgtg tttgggtttg ctgtgccagg caggggagca gtggaatttg 27300
tctgaattga gctcacacat tgaagttatt gagcgactta catgcaaggc 27350
catgacctgg actcccagcc gagaggccca cgtggcgggg cttgagctgg 27400
gggagccgag gacagcttac atctgctcat ctgcttacgt aaccctgcct 27450
cccagcttcc agagccaaga aaaacacacaa gccagcccag cggggccgag 27500
agcctgtggt agcacacgcc atgcgccgca cagcaagggc gccttggctc 27550
ggcttgaggc ctgtcatgaa gccctcagcc ctctgcctcc tcccagagct 27600
tctccccacc accccaggca gtggctctga aacctggtcg caggtctgca 27650
tgattctgaa cagaggtagt cgttgccttc ctggagtctg agctctctgg 27700
agtttctcac tgggacagag ccaggtgtgt agcagagcat ggtccctgca 27750
gtatggcagg aggtgtgcag ggcattcagg aggcctcctg gctggcactc 27800
gacccaatta gtcattcaac gccaggtctg gggctgctgt ctgttgtctc 27850
aaaggtgtga gctgcaagat ccttagagtt gtggagaaaa aattgccaga 27900
ttggcaagaa gggcaggatt gggggtcaag gtgtctcagt gtgttggaag 27950
catgatgggg gttgtgcaag gggcacagcg agttcagaag ggagcaggag 28000
agtgagaaga ggctgttcag tgataaagct ctgcacagag ccattggagg 28050
agcaagctcc ttgaccatcc ttaaaccagg gtaattttca tttaggttct 28100
gccacacgct cagcagggaa ctcctggaag gcaggatttg tcttgtccat 28150
cctccctccc tacctcaacc cactcctcct tgggctggca cacagtaggt 28200
acccaaag tatcaattga aacaaattga aagtggtctt gatacatatc 28250
acagggcaag tttgcagtta acagacattt cagagtaaag actctctggc 28300
ttggtgctcg atcggcttct gtgggttgtc agcatgctgt ggacagcccc 28350
ggcatgggag cgagtgggcg tgtgtgtgtg tgtatgtgag ggtgagagag 28400
cgttagtgtg tgtgttgggg ttggggagag aggagggggga atagaagatg 28450
gaccacccgg gtatcagctt ctgccctggg gagatggtgg tgtcagttgc 28500
tgagggaatc ctgagaagca ggtctggctg taggtggtga tggtggtggg 28550
gttgcatgag aatccatttg gggcaggttg aatttgaggt gcccatgaca 28600
tatggctagc catgttctgt tggctgtgag gtcaggagag agacatgaga 28650
tggaaacaga ggtttgggaa ctgtcatgtg cttaaaccaa agacctgggt 28700
atagggagag tgagaagaga agggggcaaa gatggacatc caagaaagaa 28750
gctgagaaag cctaggaatt tgaggtaaga ggagacgtag gtaaatgtga 28800
cgcttggtga tcaaggcttc tttccacctc tcctatgctg gacactcacg 28850
tctcctgtct gcttggaaat tcatgctgag ggcagggaag gtgggagcaa 28900
ggatttgtct aaagatcttg ctttggatcc ctgcactcct cctggtttac 28950
caagtgtcac tggacacgtc agggcgttct gagaccttag agagcatcca 29000
gtcctgtccc tgcagtttac aaatgaggaa accagtaccc tgagagtggc 29050
tgtactatcc actctcagga taccaaagat catctggaaa gtcactggtg 29100
gagctggacc ggggcccagg catctcttct cctgtccggg gctcttgact 29150
tcaggaccac ctttctgaaa cccatgatgg ggcaacacca ggacactttc 29200
cagcctgcag gtgtctgtcc cgcggaagcg agccaggcca catgtgaatt 29250
cctgttttct gggtgggttt cagaaggtac gagcaagtcg gcagggtgac 29300
agcccaggtg cttcttgggt tccccaaac gcggttatgt ttagcagcat 29350
cctcagaacc aaaggtgggg tgggggctgc agatgttgtg ggggccctct 29400
gaagtgaaaa gagccctgtg acagatcttt tcttcatgtt tttcacaagt 29450
tcactgtgca gcagggcccc cccagtagcc tttgcccagg gttgggtgtt 29500
gggcagccca ggcctggctg accttgtggg gaagggtgtg aatggtggga 29550
atccccgagg gccctctttg cccgaaagcc ctaagccttg acatcagatg 29600
cccatcagat ggtccatcgg agccctacta cccagcttgc ccagtgagaa 29650
tcatctgggc tccttgttag gtagccattt aggtccttcc caaaatccac 29700
agactctcta agggaagggc ccgagatgct gtacttgtac taacttcctc 29750
aagcaattct tgtgataggt ttgggaaaaa cttgtccagg gtgaccactg 29800
actgagtcct ggtcttctct gaagagcaca gtgcctgctc actttagggc 29850
accctgggag gtgggagctg gctcagcagg cagtcttata agggactgag 29900
cttcaaggcc tctgtccctc caggagggag gtgcatgacc agagagggag 29950
gcctgaggat cttcttccct gccccagagg gtctgctgcc tgagctctgt 30000
gatagcgcag agagtaaaag gatcaagctt gattgaggcc tatctctcaa 30050
tgcgaaagtt tgctagttaa gaggagagtg ggaagggcat ttctggcaaa 30100
gagaaaagtg tggacaggca tggcttaagg gatggggagg gagacagaca 30150
gagctgaggg tgaagggcct tttgctcagc tgtgggcctt ggccttccct 30200
tgtgcaggga cacacagcct tagagccact ggaggtttta gtgggaaagt 30250
aatatggtcg gggctgtatc tcagaagaaa acaaactaat gggaacaggt 30300
cctgtgatgg tggacctggg tcagctacgg agggagggaa gatgtgagat 30350
gtgtactggg gaagggggtg gaagtggcag ctatctggtg agaggaagca 30400
ggcccacagc tttttttctc aagctgttga attcagaagg gcgagtgatt 30450
ccgggagtag ggggtgcttg gagagccacg cgttatgat aaacagggca 30500
ggctgaagcc tgctcactgg ccctgggcgg gttctcacca gcatgtttca 30550
ggttttgatc tgtgcttgtg gttggtgttc ctacctgttc tctaggttcc 30600
ttcctttgtt cttgtggctc atttgcttca caggtgaagc tggttacact 30650
agagtaacag ttcccaaagt gtgttccctg gaaaaaatggt tctgtagcca 30700
aataagcttg ggaaatggtg ggttaaatat aacgaagggg gtttttcgac 30750
tgcacaactt ctcagagcct ttggtgtgtg tcgtgacttt gcagaagcag 30800
gatttaatac gcagcattcc cgttcttatt tgaccacgag acatgttttt 30850
ccattaagca tcttgctggg tctgatgttt tctggaaccc attttgaggc 30900
ggtctggtct gcagagagta tggggagcct gggttcaagc cttggctctt 30950
gactctcagc agagccttga ttccctgtgt tgcctggact gcaccacgtg 31000
taccacatac ccggtatgtg acgttttcct catccctctt cccacctgcc 31050
gttacctcac aatccacaat ctgcacctca tccatttttc ttctgaggca 31100
agcactctct tactaactta cttatctcat ctgcatccat gttcttctag 31150
gccagaaact tggggagtcat ccctccctct ttgttacttc ttcttcctct 31200
ttgttacttt atcccctctg ttactaaaaca ttcttctgtg tttccagcta 31250
tttcttttat tttccctcgg tctcctttgg ggtttctttg cctccatctc 31300
tcccagacct tggttcacct tccatcgagt cccttcctgg gacatgggca 31350
ctcatgccac tcctgctacc ttccacttcg aagctaactc cctccacact 31400
gacgtcccca acatgcatgc atacacacac acacacacac acacacatac 31450
acacacacac acacacactt ccccagttag gctagaatca gagagatgat 31500
gtcagccatt tgtccaaggc cacgcagctg ggaggtcaca gagctaagtc 31550
tcaacctcag gggttttgag aaattgcctt ctcatccgtg atcactgatt 31600
tctacaacag cctgtcagga agtctgggta gaaattactt ccattttaca 31650
gtggagtcag agcggggagg gtcctgggca ggcgagtgct tcacagagtg 31700
accaaccatc taggtttgcc ccacactgaa gggggtttct ggggatggtt 31750
ggtcacccta atgctggatg tggtgcctga tgctgggcag gagggccctc 31800
tccgtggcca cgttgcctcc caggaggaga catttcctct gcagctgcag 31850
ctgcagcctg gccatctgat gcagcctgtg gagcggtggc gagtcctgtg 31900
gcctgctaac ttctccctcc ctccacctct ctagtgggcc ccatgctgat 31950
tgagtttaac atgcctgtgg acctggagct cgtggcaaag cagaacccaa 32000
atgtgaagat gggcggccgc tatgccccca gggactgcgt ctctcctcac 32050
aaggtggcca tcatcattcc attccgcaac cggcaggagc acctcaagta 32100
ctggctatat tatttgcacc cagtcctgca gcgccagcag ctggactatg 32150
gcatctatgt tatcaaccag gtgaggcctg ggaaggtgga atgagagagg 32200
gtgtgtgtgc atgcagatgt gtatcagatg tgtgtgtaat gagggcaggg 32250
gaagggggagt gatttcacag acacctggca cttacagcga ggaaccagcc 32300
ccccagccac caccagtgca gatgaggtaa acgccaaaaca gtgtgcttgc 32350
ctattgctgt caactctata gccaagggaa atgctggagt gttttcgttg 32400
ttctgttttt gttttctgga agtagccttc cagcaagatt gggaaaaaag 32450
acaaccctaa ttattccaaa gtacacactg attattccct ggctttgtgt 32500
agctgtgtat tttcctttta aaaataaaac caccatttag atgtcagact 32550
tttaggtaac ttcaaagttt atccagtcag tcagagcgtg tctcctgggg 32600
cacctggaga cagtgccctt agttcaggtc acatgcctac atgccagccc 32650
ctggtgaaat atctggagaa gtctgattcg tgggccatct gagagttatg 32700
tggactgggc cgagtctgag aaaaagtttc tcactgctcg tctgatccat 32750
atgtgttggg ctttagccct gcttaggaaa gtaatgctaa ggataggtca 32800
actttcatca ccatggcatg gagaatcaga ttgatctaag aggcatcttt 32850
attgaaataa atttttcagt ttatttgagg agcattattt tcccaagagt 32900
ataactttga tatttcaaga ttacccctaa cacttaaatt catgttttta 32950
gactataacc tcctaggtgc aatgacacat ctaacttatc taagcaccca 33000
gtttcattga aattcatttg aagagtctga gtacgcccat ttctacaagg 33050
cccaatgtcc atttcatttc gagataaact ctgctttagg taggaggatt 33100
gttggcagtt tacggcttcc atcaaggtca aggaactctg tgcaccttcc 33150
ctatgacccc aggggaagca ctcgaggact gctgtggcat tgtgctgcat 33200
cacttgctgc agggagattc tgaagaagtg taaggtctca gtcctgccct 33250
gtcccgaagc ctccaaccca cttctggcaa gtgggacctt cccagggaac 33300
aatttgttaa cagacccaaa tatcctgtga ttggatggtg gctgccaaat 33350
gctttggaag ctcagaggaa ggagagagag caatggcttg gaagaaccag 33400
gatataaact aggttctaaa gtctgcaggg agatgggctt ctcagctggg 33450
gccagtgagc agggacctta aggcagaaag gagccttgca tgttcctgga 33500
aattgagatg cccactgggg taggaaagca ccagaagctc tgggaccagg 33550
tgtcagagtt aagcctgtga ggcaggagag agcagaacaa gccctgttac 33600
aaggaaactg aagcaggaga gcaggtggtg ggcaaacccc ttgaggctgt 33650
ttgaattctt cggccaagtg aggtacagac cagggcccta tgaacacctg 33700
caagcaagac agccacgcag ttgtgggtca ccttggaaga atattggaga 33750
atgcaagaga gaacaggtaa atgtcctgca aaatgcgggt cactttaacc 33800
caacacatat tcatttaaga aaagctctgt gattgagaaa catttgtctg 33850
atgccagtta gcacatacca atgacggcaa gattcaggag cctgttatta 33900
aagcagtggc agcgagcacc tggaagaggc ggccaccatc accaggagcc 33950
agcagggatg actaataagc cgtgccagct gcatctcgtt tctctcttga 34000
cagttgctat gccagtagat gagggatgta ctgtggatac aatgctgtca 34050
tatcttattc agcagggcat ctgatagcat ccccaaaatc tgcctgagta 34100
gaagacagac agctgtggtc tgggtgccat ataggtaggt taaaatatat 34150
atttgggcct aggcgcagtg gctcatgcct gtaatcccag cactttggga 34200
ggccaaggca ggcggatcac ttgaagtcag gagttcaaga ccagcctggc 34250
caacatggcg aaaccccgtc tctactaaaa atacaaaaat tagctggaca 34300
tagtggtggg cggctgtaat cccagctact cgggaggctg aggcaggaga 34350
atctcttgaa cccagggaggc agaggttgca gtgagccgag atcatgccac 34400
tgcactccag cctgggcaac agagtgagac tctgtctcaa aaaaataaaa 34450
taaataaata aataaataaa atatatactt gggtaaagag gataaaagag 34500
ttagcgatga tgctgaattt ttgaactgag gtggctgttt tcaaggaaga 34550
ctggagggtg ggatgctacg tctagatatg ttgcagttta ggtgaatgtg 34600
agacttccct gttttgaagt caaatattgg accagtaaaa tctagccatc 34650
agcttaaatt cctatgatac aatttacata ctccccaggc tcaacacagt 34700
agatttctga atgtcctctg ccagctacat gctcctgccc acctcaatcc 34750
gagtagatgg aacaactaac caagccagct cagaccggtg gcacagctgt 34800
gctggctaac actgggcacc acctaagaga gtgcttctcc aaaagtgtgc 34850
ttccccaaat ggagcgaaat acgcttgagg aatgttgggt tgaaccatgt 34900
aaagcaggtc tcattcccgc agagcctttg gtaccccggt gtacactgta 34950
accccagaag tgtttcctga gcttgcctga cgagacaact tttccaagaa 35000
ccgtctcaag tgatgagtgt tttgtgagtc acactttggg gaaagcgggc 35050
ctaagttagc atctcctccc agctgcctcc ctgctttccc tggaacacta 35100
ggaactgccc gtcctccctc cctccctcct cttcccactt cacaacttag 35150
catcaggaat attttagttt tggtttttca aacatatata cctccttttt 35200
tcttatcttg tcaatatcat ctttttttt tctttgcttt tcctcatact 35250
tttttttctc ttcatccttt ccttctccaa gggttaactt tccaccttag 35300
gagaatcttt tctgcttttt ctcccacttc cccagctact ctcttatcat 35350
ctgctccaat ctcaccctaa ttgatcattt tgggaaaata tggtcagagt 35400
ccagataact aagttgagaa atgcttaaac tctgccatac ctttccagta 35450
aagaatatta cctaataaat aataaaatgg taatgggaaa cctgaaccct 35500
gaaaaaaaag aggtggaagg agaaacatt ggagcacatc ctgtctacaa 35550
attaggaact gcctgtgtta tctgttttat ggttatattc tagaagaaga 35600
aagggatttt gtagcacctg gttttgacct ttctgcactg tttgttgagc 35650
aaataaacct tatgggctgt tagccctctt tatagcctct cagcttatcc 35700
ctggcccaga caccctgctg tcattttgac ttttcattcc cacacacaca 35750
tacacatgca cacacatgta cacacacaca cataccattt aagattagac 35800
agaagtaatg ctcaaaatgg agtggcttct gagacattta gtccaagggt 35850
tcccaaacag gcttttcagt atcagatttc tttctgcccc attgaaatgc 35900
tacacaacct tccgcttaca gcaggtcaca agggtttcat tctacttgaa 35950
gtaggggcca tgtcccattt ccacttcctt ggcttcccat tcagtcactg 36000
ctaggatttg cctagacccc tgaggccaga caatgtagaa acttctgctc 36050
catgtcacag gtgaggaaac aggctcagag agggacaggc tccgaaagtc 36100
acatagacaa cagtagggct gcggctcaaa ccccagcgtc tgactccagg 36150
tttagtgcct tctcagggca tcagtgacac tcctcatggc cagggtgccc 36200
ccagtgttgc tcacagtctg gtatccaggg ctgagagtgt gctgtgtgct 36250
cagactgcct gggttcagtc ctggcactgc cactttacag tcagtgacct 36300
caggcaggtt acttaagctc tgcaggcctc agtttcctcc ttggtgggga 36350
gggttatgag gcatccttct catggtaaac cttcagtaaa taccagccgt 36400
tactaggagg gtccactcct gcctctccac tctccattca tcctgcctgt 36450
ttcctctgcc tgcttcctct gcctgcttct gtggtggtga attcttcatg 36500
gctcccaccg cctcctgctg cacccccact cagggcccgc atcaggaccc 36550
ttcctcctat tggtttgaac tccttggagt cagagggtaa tggatagtgg 36600
agtgagccag gtggcagaat ctcagaggcc atcccgggcc tataagcctc 36650
ttcaaaatag ggccacgtat caagctttac acacaggagt gaactttcac 36700
aagttgttat gactcatact ctgtctatag taagctgtta accactccca 36750
tttggcttat gcctctgtaa ttattgtact aacttatatc ttaaaataag 36800
gatattgaag gaatgagccg ggagaggctt tcctggttga gatatagaag 36850
aacaagagtt gctctttttc cttaaggtct ctcctcccac ccctgacctt 36900
agctcaccag catgggagaa tactatttga ctccttgtac tctgagacgt 36950
ggatttcaag atatagcatt ccaacttcaa cggcagcaag aaaagaagca 37000
acagaaggag aagacatcat agcaaacagg gatgcatgct gcatttccta 37050
atactcaaac ccggaaacga gacttcactc aaggtgaagg gagggcaggt 37100
caccacctgg tagcactagc cctaaattaa ggaatgcaga atgtttgtgg 37150
gattgcccat cataaaaatt acaaaatgag taaggaatgc aggcacagct 37200
ggccaggtgg gtttgtcaca accatggcag ccctttgccc cacagccagt 37250
acacagaact ggtctctcca attccgattg catatcttct ggcacctctg 37300
ttcctctccc tcagctgccc aggatttttc tggttctgac catgttactt 37350
cctcttttaa acctgttagc atttcacgac tgcctacagg caacggtcta 37400
aatggtcgga aggcccaagc ttagcatccg agaccctgac ctacctccag 37450
ccacttcctc ctcctctcca cttcactgga ctccccatct ccacccagac 37500
acctctgttc tcccctctgt gtgcctttgc ttatgctgtc ccctgtgttc 37550
ctagtgtgtc tctggctatc ttttaagctt ccctccccaa cctcattagt 37600
tctgtggagc ccctggaata gagctgactt ctccttccct gctgctccca 37650
ggctgctcag aactttctgg aaaggggatga ttatctgagt tccagcctca 37700
ccccagcccc cggactctga gtccctcatg tctgcctccc ttctttctct 37750
ctgaccacac agctggtaca tagtcagtac agacgcagtc agtgagtgga 37800
gcacggggct tctctccagg attcctgccc ctttgtttat ccctagtctc 37850
aggactccct actcctggtc ttctgcctaa atctgtgcct cttggaagtg 37900
aagcctccgt tcccagtggg gccaggtcct gacccttggg aacttgcagg 37950
atccctccct tgggcctctc cccgaagctt ccagctcaat gctgaccaga 38000
gcacaggctg cctgtgacag tccttggggt gacctccctt atcaggaaaa 38050
atgcagaaaa cctattaata ccttagcctt gtgattgtta atggtcacaa 38100
aactccttta gggtcctttg gactcagcac ctttatggtc tcactttgaa 38150
ttttgaacct cccacctccc cccatccccc agagtaaggc aaatggtctt 38200
ctgattgttc ctgcagaggg aaggctccac aggtaagcac acgatggcca 38250
ggaagcagag ctggagcctg cctgaaaggc tgtggagaaa tggaggggagg 38300
gctgccctga ggactctgtc tggctttgaa gttttctact gtttcctttt 38350
cttctgtgca ctgttttagg atgatggggt gatagttcca ggctggttga 38400
ggatggattt ggagacagtc ctttgtaccc tcagtgagca agagtatctg 38450
tcaccctacc tcagcagttg tctctgtcac tggtccaagc agctggttcc 38500
tacacaaggt caagatcaac tggggagaag cagactcctg ggtctatccc 38550
attagtgagg acagctgcct gggcttatgg cctcattggt ttggtttcta 38600
tcttgatcat ctctaccatc cccccatccc ggccttccat tttctacctc 38650
agctgtcagt gcacagattg atgtgtgtgg gaacggagct tgggaggagt 38700
ggggtagggc tggtcctgtc ctgtagcctc cccttccttc gggcacttgg 38750
accctttgga gcttgccggg gtggggaatg ggagtgggaa ggccagggag 38800
tgtctctgca ccatcactgt ttgagtgttg cccctttgct gtgtgcccca 38850
cctagtctat gtgtgtctct gttctctggg gactcaattt gctggtgaat 38900
tgcttccatg gacattgttc tgggaaatgc cattttttct gctcacccat 38950
gactctgtga caaggaatga cagcttatta ggaatttgtt tttgcattgg 39000
aacagtggtc atcagaatgg gccccttttc ccttgcagct ttgacatttg 39050
cctctctttt cctcacctct ctcccttgca tccacccttt tctctttttc 39100
ttcttttttg ttttccttct agcaggggcc ttttaccttt acttgttaat 39150
cctgtttgta gcaaagcaag tggaagggagg agttcctctc tgatctgctt 39200
cttatctcc acctaccttc tcttctgtac tttccgcctc ctagagagag 39250
agagagagag aggaatgccg acctaactac cgctgccact gctgctgcca 39300
ccaccgctgc caccaaccacc ctggtaatgt tcacatgtcc tcaaatcaac 39350
ccagagccag ggccctgctg gtcaggggga ggctatgtaa ataatcccat 39400
gagtgtgcca tcctcaggcc ctggggtctc ctaggcaaga ccagggcctc 39450
tgtgggctct ctcggaaatg ctgaggttgc tggaagccag cccgtcatac 39500
agggtctgag agtttaactt cttttaaatt aaaccacagt tgagctcatg 39550
ctgtgtgtgt ataaactttt gtatcctgct ttttccttaa attctttatc 39600
atcagcatct tcccatgtta tttcatagtc ttcatcatca tcactttcca 39650
taccttcata gtagttgatc gtagaattcc atcataatta acttgtcttt 39700
tctctcttag aagtccctta ggtaatgtcc aattttccgt gagtgtaagt 39750
aataccataa tgaacatctt ggagtctgaa gtttatctg tgttggtttg 39800
ttccacatt aggatcattt tcccaggcta gattttcaga tgtgggatta 39850
tgggttcaga tatggtttac acatttttat agttcttaat acagatggcc 39900
aaattgcttt ctgaaagaga agcttttctt aagtattttt ctccaacttg 39950
tatcttaaac atcctgaaca tgcttagcac cactgtcttg atatatctgc 40000
ggaaagccac gtctccactt ttcagtgtgt cgggccctgg gagaggcagg 40050
catcctgcgc tggctccttg gagctgggtt taaaattgtc tcctctggct 40100
gggcgtggtg gctcacacct gtaatcccag tactttggga ggccgaggtg 40150
ggcggatcac taggtcagga gatcgagacc atcctggcta acatggtgaa 40200
accccgtctc tactaaaaat acaaaaaatt agccgggcgt ggtggcgggc 40250
acttgaaaag tcccagctac tcgggaggct gaggcaggag aatgatatga 40300
acccgggagg cggagcttgc agtgagccga gatcgcgcca ctgcactcca 40350
gcctgggcga cagagtgaga ctccatttta aaaaaacaaa caaacaaaac 40400
aaaaaaaacaa acaaaacaaaa actgtctctt ctgtgctcac ttcacccaga 40450
atccctgttg ggctcttcaa ggagctcagt tctctctgaa agcaacttta 40500
tagcctcagt ccagtctgtg ttcctgtgtg gcaggggtca agggtatgct 40550
cactcttgag agtggtgtct ttggttgacc aagaaccact cccatagcct 40600
ggtccctaac ccttgaaggc ccatctctct cactcactgg ggtgaagagt 40650
ttaaatctca gatccaagtt ttgttgagag ctctgagcta ccatattgct 40700
atggttaaca atagttaaca atgttaacaa tggttaacta tggttaacaa 40750
tagttaacaa tgtttaacaa ctagagccca gctgggtgtg gtggcatgtg 40800
ctaacagtcc cagcttctca agaggctgag gtgagaagat tgctggagtc 40850
caggagctca aggccagcct gggcaacatg gcgagaccct gtctcccctg 40900
caaaaaaaca acaacaaacaa aagcaaaact agagcccaac tgctgtgaac 40950
tcatggctga gtagatatta ttagccctcc acaaactcag catttgtata 41000
atcccaggct gtttccagta attctctggg gatcatctcc cagcctgtcc 41050
actgttccag gatccacact taggcctata ggaatgcccc gtcagagctt 41100
ctgctgccgc tgatctgtta ctgtttcatg caacccactc ggcctagttc 41150
cttcctctta ctgtctcagt gggcacagaa aagcatacag agggtgtttc 41200
agcaaacatt gccactggct gcagacctgc ccccggatct
gtcctgttga 41250 gagcttagtg ctgcgttctt gcatggtggg gaggggtgtg gctctgtgat 41300 gagccagggc atgtgtatag gagcaacagt gtctctctta tcacgtagaa 41350 gttctgactc attgcgagtc ttggctttgg gttaatggtt ccagccatgt 41400 tgctgctgtg tcttttggtg caggagaggc tgggcacagt tggtccctaa 41450 gccattatgg ataagggatg tgtctgctga tatacacaca tggacctgac 41500 atccagggaa ggcagggtga ttggacagaa cagttcttcc agaagctgtt 41550 ggaacttgga caagagtggc ccttggcttt ctgtagttgg tcatctgtcc 41600 cctgttgcaa tcaggggaag gccacacttg ccttccttaa ccacagttag 41650 gattttcttg gggattagac cagattctag cacctgtcct gaacctctcg 41700 ccccgcccct acaaaggctg cttgcaagtg tagtgcacat acacagggag 41750 caggtggggc atggaagtgg aagtggagcc cctgcctttg gcccttgggg 41800 gaggcactgt ctgcttaccc acggttgttg cctcatagga atcatacaac 41850 agcttcctaa ctggtctcct tgccttcagt tggattgggg cacaaatccc 41900 tccttgacat ataaaccatg gtttaaggct ccctgtggcc taaataaaga 41950 taaagcttaa gtatcttaac aagcacctaa cccttctccc cagcctcggt 42000 gatttggctc atcgctgcct tcatgtttca ttctggcttc actcattcgg 42050 aatttcttgt agttccttgg ctgttctctt ttccttaccg cctttacaaa 42100 tgctctcacc atgcatgctt ttctctgctc ctacagatgc cttctctccc 42150 agcaccgcct ccagagtcta tgtctggtcg attctgtctg ctgtctccag 42200 tccccatctt gtggcagtct ctgctcaatc atttggggat tttatatgtt 42250 ttctggcctt tcttttgggg gcctgtcttc tccttctaaa agcagccagt 42300 tgacctagaa ggaagggata actgtaactc ttgtctacca acataagatt 42350 aggcccacc tttaaaagct gcgtctttga aagggacacc tgcacccagc 42400 atgctggctt ctcttcacca agcgtgactt cctacgcatt tcacaggcct 42450 ccagaggtcc ccctgactct cttctgctgt gagaaactct aatcatgtaa 42500 gccacaggct aattcccttg agccttaaat gtttttagta atttcccatt 42550 catcagagaa gcaggatttg ggaggaattt tgaagcaaac actacagaag 42600 gcagagtctc caggtaggat atctaagaga catttggaat ggtctgactg 42650 ttcaagatgg atgggaaagc ctcttcctgt aatgatagta gccaaacattt 42700 gttgtcaggc agtggggccc catttttgag atggggtctc tgtcacccag 42750 gttggagtgc ggtggtgctg tcatggctca ctgcaacctc agcctccccg 42800 ggctgggtct tcttaattct gaaaaaccca gcttttaaag ggtggaccta 42850 atcttatgtt ggtagacaat 
gttgtctcat ttaatacaat gcacatgctc 42900 tccccataac acaaaagagg gaactgaggc ctggaggtgt gatgtacccc 42950 aagtcacata gctaataaat aaagaagcca gcattcctgg gattaaaaat 43000 gcatgtgtct gtcactgtgg tgtatttggt gcttgatcaa tgtttacttg 43050 agcaaatgga ggggcagagg taccgatgag tgtgctcagt gaggagggca 43100 ggagtgaagc tgggcgtctt cccgcctctt gtgagtggtg gggcttggtg 43150 agcttgccag ggcctgtctt tcttatcaaa gaaggtgtgt gccccagtgt 43200 tacagcattt cacccaaagc agcctagaaa atgcttgact tttctgtcat 43250 tccgggggagg acactttcct cctccactgt tctgctggcc tggtgtaccc 43300 acggcccctg atagatgata gcacctgcta aagtgcacca tgcccttccg 43350 tctcactgca tcccacagat gaggccaggc tgggatgagg gagaaaggga 43400 gggatatata gttcaggtta ttttggaaaa ctgcctgacc aattttaagt 43450 ctgggccgga cactggggca tctcaccacg ttgaaagggc cgtggcaccc 43500 cgggcggtga aaggggctgg aaccaggtct gcttcttggg cttctcctcc 43550 agggtgccat tgctcatggg ccttggctgc agaggtgctc attcgtggtt 43600 ccaaaattcc aattcctggg agaggaaaaaa tgcttagttc agtctcagtt 43650 aggcctctgc ttagatcaaa cagccaaggc cagtaggccc agtcctatgg 43700 tagagacatg gcctcaaaga gccctctgct gcagttgttg gggagtgtac 43750 caagagaagg gagcattgtc ctgggctggg cagccctggg ggtctagtgc 43800 atagatgtag aaaggctctg ttggtatacc tccctttgct tgttggaaag 43850 tgctcaacgg ggctgaattg tgtttgacag tgtaagtctg ggctggggtg 43900 agggttgtta caagatgtc aagatgatta aatgaaatgc catttgaaac 43950 acttatccat gccttgtgta tggtatcccc accagtgaat attcacagta 44000 tattataata attccaaacaa cttcataatt ttcatatgca atttctaaac 44050 tttgaacttt tttttttttt tttttttttt tgagacagtg tctcgctctg 44100 ttgcccaggc tggagtgcag tggcgcaatc ttggctcact gcaacctcca 44150 cctcccggct tcaagtgatt ctcctgcctc agcctcctga gtagctagga 44200 atccaggcgc ccgccaccac acccagctaa tttttgtatt tttagtagag 44250 acgggctttc gccatgttgg ccaggctggt ctcaaactcc tgacctgagg 44300 tgatccaccg ccttggcctt ccaaagtgct aggattacat acgtgagcca 44350 ctgtgcccgg caattttttg tgtttttagt agagatgggg tttcaccatg 44400 ttggccaggc tggtctcgaa ctcctgacct caagtgatct gcccgcctca 44450 gcctccctaa tgctgggatt acaggtgtga gccaccacgc ccagcctaaa 44500 
ctttgaattt ctttgaaccc atgacttaca cagaattagc tgaacgcaga 44550 attccaaatc aactcagcct gtgggacagc caaaaaacac agtgtgcctt 44600 tgggctcctt cactcaccac gcggggttag aaaactttgt cagaggcttt 44650 aaaaaaggag ctcttgtgtg taaaatgttt ccttgattct ctttctggtg 44700 cctctctttc tctaagtggt ttgcttcccc aagttccccca cctgagtctg 44750 ggtggctgtg gcacatctgt gcattctgta cgcacacagg cagccttttg 44800 gagtgccagt ttccaggtct tggttttat tatttattta tttattttt 44850 tgagatgggg gtctcactct gccgcccagg ctggagtgca gtggtgccgt 44900 catggctcac tgcaacctca acctccctgg gatcagttga gcctcctacc 44950 tcagcctcca gagtactagg gaccaccatg cctggcaaat ttttgtaatt 45000 ttttgtagag gcagagtctc accatgttgc tcaggctggt ctcgagctcc 45050 tagactcaag tgatctgccc accttggcct cccaagtgtt aggattacaa 45100 gtgtgagcca ccatgcccag cccaggtcat cttttgaggg catggagaga 45150 agactttgag catcccactt ttgagattgt gtaccagtcg caagccccta 45200 tgacacactt tttccccaaa gtagagggct ctgactatgt tgatcccaag 45250 agagatggga aagagcattg aatgaggatt ccaaagtatt gggccttagt 45300 tcgtttcctc atgttggtgt tgtgaagatt ctggttagga taacagcatg 45350 tgtgcaggag gctttgtgaa ctgctgagag tgaggcgtgg caatgtcagt 45400 gctaggtttg tccttactaa cctggggcca tgggaattga taagaccaga 45450 ttcccaactc taccccacaa tgtgatccct gtggtgaccc ctcacagggc 45500 tctttggtcg agcttcccag aagggatcac catctgccat tgtatgttga 45550 accccattca ttcattcatt cattcagcca accagcaact atttgttgag 45600 ctcttattgt gtgagaagca gtcttcaagg aactgggtga ataaaaaaaa 45650 caaaacatcc taaccttcat tgagcttaca ttcttactga aagaaaacaa 45700 ataaaacata catgtaatcc tagcactttg ggaggccaag gcaggcggat 45750 cacttgaggt caggaatttg aaaccagcct ggccaacgtg aaacccatct 45800 ctactgaaaa ttaaaaaaaa aaaaaaaaaaa aagccgggca tggtggcaca 45850 tgcctgtaat cccagctact cgcgaggcta aggcaggaga atcgcttgaa 45900 tcctggaggc agaggttgca gtgagccaag atcataccat tatactccag 45950 cctcagtgat gaagcaagac tccatctcaa aaataaaaaa taaaaataaa 46000 aatatgcatt ccctttgcac cagcacactt ggtgcctggg gacctcgtgg 46050 ttggcaccct gaagcaggtg tccctcttct gtcttgcaca ccttgcttct 46100 gtcctggtgt gtatggcatg gccttctgcc ctccatggtg 
agcactgtga 46150 gggcagaggt tgagttgggt ttgctgtatt tctcaggtgc ctaggtttgt 46200 gcttgacagg tagatggaag gcacacaatg tggtcatcaa acctcagtca 46250 accatataag gaaggtagaa gtgaaaagtc ccataggtac ccaactaatg 46300 tcaccagttt cctggatacc tttcctggag tttatttata gtgtgtataa 46350 ataaatgatg tatgtgttta aatgcctttt tcacctttcc ttttagagct 46400 gcctcttttt aacagttcca ttccattgta tggatgtact atgatttatt 46450 gaaccagttc cctactgatt attctgtttt ttgcagtctt ttgttatgat 46500 gaacattcca cagtgacaat gttgttcata gtcattcaca cacatgcaag 46550 tccttctgca ggatatattt ctagagggga attgctgact cagaggtttt 46600 ggtactctgt gttgattgta gagtgacggc agaaaagtga ggcccaagag 46650 tttcctagtg accatgtgta gtggacaagt caccagtccc tgtgagtgtt 46700 tggcccaaag gctttaaggc atttgatatc actgtttttg tttctgcacc 46750 aggcgggaga cactatattc aatcgtgcta agctcctcaa tgttggcttt 46800 caagaagcct tgaaggacta tgactacacc tgctttgtgt ttagtgacgt 46850 ggacctcatt ccaatgaatg accataatgc gtacaggtgt ttttcacagc 46900 cacggcacat ttccgttgca atggataagt ttggattcag gtaagagata 46950 ctcagtcaga atctgtggta aacatgtctc tctcatgtgt tgactaggaa 47000 atgcagtcct ggcagctcaa gagtgcctct ttaagctctg gagcagaatg 47050 cctcctctga gaaatgggtg ctttgtatta gttgagatgg aaagaagaga 47100 ccagaaatgc ctgtagtctc tgcacatcca gacaaaaaca aattttcccc 47150 cctttttttt ttttgtttgt tttttgagac agggtctggc tctgtcaccc 47200 aggctggagt gcagtgccgt gatcttggct caccgcaacc tctgcctccc 47250 gggttcatgc catcctgtca cctcagcctc ctgagtagct gggactacaa 47300 acacttgcca ccatgcgcag ctaatttttg tatattttgt agagatgggg 47350 ttttgctgta ttgcccagtc tggtctcgaa ctcctgagct caagcaatcc 47400 atctgccttg gcctctcgaa gtgctggatt ataggcatgt ggcaccatgc 47450 ctggcctaag aacagttttt agcatttggg aggggctctc atctttaagc 47500 tccaaatgat actgtatttt cttgcttttt tctttctctt gccccacaag 47550 ttttggaaag taaattggaa tagttttccc ccactgaatt atttagcttg 47600 tatacctcag cagatgttcc ttggcctgtt ttgttttgtt tttgagacag 47650 ggtcttgctc tgtcacccag gctggagtgc agtgacacaa tcatggctca 47700 ctgcagcctt gactgcctgg gctcaatcca tcctgcagcc tcagcctcct 47750 gagtagttgg gactacaggc 
atgagccagc atgtccagct aattttttat 47800 ttttagtgga gatgaggtct ggctatgttg cccaagctgg gcttgaactc 47850 ttgggctcaa gtgatcctct cacctcagcc ttccaaagca ttgggattac 47900 aggtgtgaac cactgctccc gcccttggcc ctataagaag gaatgtgatt 47950 ctgttttcca gcagggcaca aacttctgct taaatacaaa gcccaaattt 48000 ttccaccaaa atgcccctag tgaagtggcc agcccagatg cccgactagc 48050 gtattatcca aagcatattg tcattggtgg aaaatggcct tatagtccat 48100 tgttttgtct taaaagtaaa tatataaata aacttgtata ttgtttccta 48150 attccgtgtt tatattaaca taaaagtgtt ttaaattacc tgtcagtggc 48200 caggtgcagt ggctcgtgcc tgtaatcgca gcactttggg aggccgaggc 48250 gggcagatca cctgaggtca ggagttcgag accagcctga ccagcatggt 48300 gaaaccctgt ctctactaaa aatacaaaaa ttagccaggt gtggtggcag 48350 gtgcctgtaa tcccagctac tcgggaagct gaggcaggag aattgcttga 48400 acccgggagg cagaggttgc agtgagttga gatcgcgcca ttgaacttca 48450 acttgggcaa cagagcaaga ctctgtctca gagaaagaaa aaaaaaaacc 48500 tatcagttga ataacaaaac cctttccttc cttgctttaa gtgaatctga 48550 agatccagga gctgtgctgc aggtaccctc tatgttgggt acccctggtt 48600 taggctgact agtacagtgt ggttggctca tgtagacagc agacccttta 48650 ttttagatac aacttttttt ctttttcttt tatttttttt gagacagagt 48700 cttgcttgtc acccagcctg gagtgcagtg gcgtgatcat ggctcactat 48750 agccttaaac tccctggctc aagtgatcct ctcacctcgg ctttcctagt 48800 agctgggacc acaggtgtgg gccagcaccc ctggctgatt taaaaaaaaa 48850 aaaatttttt tttttagaga tgtctcacta tgttacccag gctggtcttg 48900 aactcctggg ggctcaagca atcctcctgc tttgacctcc caaagtgctg 48950 ggatgacagg catgaactac tgcacctgct gagatgcaac agctttctgt 49000 cagactcatt ttattctcat catttcttcc tgtcctccct tgctgggagc 49050 atgagagctg tgatgggaat ataggaatgt atgaagtcct tctcccagat 49100 caaaaatcct aacttcttgt cttaaaggga ggaaaatttg aatgtaacct 49150 tacttttaga ctcttcagaa atccttctat acccttccgt ccccgctttc 49200 acccttcctc cctctccgtg tgtgtatctt cttctcttga aacacacagg 49250 tttataccct gacccctctt gattcatccc ttgaagcaca gtggtgaaca 49300 aggaaggggc ccgtgatgcc ctaattcttt gccacagcac catgtttgtt 49350 tcacaaggag cctggcaggt ttgggcttgg ggcagatagg ggagagaaag 49400 
cagcagagac agcaaaacca aatcatgtca gcttggcatg tacttccctc 49450 tgaaatagct aagaatccat ttctgtaaaa gcactgatta tcagaaaacc 49500 ttatggcct ggccaccttt ggttcaaacc ctcacattaa taatgtggac 49550 agtagtatga ggtgtgccaa aggtggatga ctcagcacct aagtgatgac 49600 acctaattac gaataggttc attaaagcag accccctggg gacctttgct 49650 tgaggatcct tacagtcaga attcctgaat atatttgaaa ataataattg 49700 catctttat ttcatatgtt ctgtatggtt tggctgactt ccccctcaaa 49750 gtctgagtta gagttttcct taatttatgt gatgggtttg gtctttttgg 49800 attccagaaa gagctgggtg tggtttggag ctgcactcag agtcacacaa 49850 aaccacagcc tttagagaac ccacaggaag gctttggggc acgtcctgat 49900 tcttgacatt tctcatcagt gctgactttg tatcccttag gagttcacaa 49950 ttcataacca ctgaaatatt aaaatacaaa aagttttgga aggatgagag 50000 cccagatgct ctactacttg aaaatatgtt aaaacataag ttcatcatta 50050 tacattttgc taaatcagga taaagtctga agtttcaaag aagttttatt 50100 ttagcaaatt ttcagaaaca ctgcctcaac tgttagggcc agtgttctag 50150 tcagtatgcc tttggaagca tgaaagctgg attggtcgat aggatgggtg 50200 tggaaggggg gctgtgactg ggtgggtaca gagaggctct gaaacaatct 50250 cagattccag gagttcctgg ataaggactt catgtgcggg aacagagcac 50300 aggagaagca gattcctgag ccactcagga agaactgggc ctaggcctgc 50350 tcttgtcact gactggcttt ctacataacc acagaaacag cactgtgttg 50400 tagaaagagg aagatcatac tttttgatat ctgtgtctaa tttaaggtca 50450 tctgagccct gatagaaaag caaaacagac aaaacccttg taactgctcc 50500 ctcccacccc acccaccatc aaaaaagctt tagagaggct ggacatggtg 50550 gctcttgcct gtgatcccag cactttggga ggctaaggtg ggtggatcac 50600 ctgaggtcag gagttcgaga ccagcctgac caatatggtg aaaccccatc 50650 tgtactaaaa atacaaaaat tagccaggtg tggtggcaca cgcctgtagt 50700 cccagctact tgggaggctg agacaggaga attacttgaa aacctgggag 50750 gcggaggttg cagtgagccg agatcacgcc attgtactcc agcctgggct 50800 acagagcgag actccttcaa aaaaaaaaaaa aaaaaaagat ccggtttggt 50850 gtcttacaac tgtaatccca gcactttggg aggccgaggc cggtggatca 50900 cgaggttaag agatcaagac catcctgacc aacatggtga aaccctgtct 50950 ctactaaaaa ttagctgggc gtggtggcag gcgcctgtag tcccagctcc 51000 tcaggaggct gaggcagaag aatcgcttga acccgggagg 
cggaagttgc 51050 agtgagccta gatcgcgccc ctgcactcca gcctggcaac agagcaagac 51100 tacgtctcaa aaaaaaaata aataaaaact ctagagaagc aaaaagaata 51150 actttaaaag tgtttatgtt ctcagcaagc tttatttgg ggatgtcaga 51200 acttaactaa ccactgctcc ttctgtgtgt atgtttttcc tccagcctac 51250 cttatgttca gtattttgga ggtgtctctg ctctaagtaa acaacagttt 51300 ctaaccatca atggatttcc taataattat tggggctggg gaggagaaga 51350 tgatgacatt tttaacaggt aatggtcata acttagatat ctttctcctc 51400 tgtcaacctt cacttccagt tttttaacca atgcttggtt gttcccccaag 51450 gactgaccct cagatgggat gcacccctag tcagcccaca ttcttaggtg 51500 tggcttccta caggtcctgc aggtgctaaa agggatctgt aggaaaaatga 51550 gtttctgaga tttttgtatt ggcctggaaaa aatgtcaaat gggaaccaag 51600 tgacggggca agtttacttt gacttgctgc atgccgtttt gtactcaagg 51650 agtaaaccaa tgtcctttgt aaaaatccct cctttcatta tggtcccctt 51700 tcactgtgaa acaagtttcc ttgagcagaa tcctaactgt cttcacagaa 51750 gctttgtgtt atatttttat tttggagtat tttcacatat acaaaagaga 51800 tactgtagta taataaacct ttgaggacct atccagcccc agcaaccatt 51850 atggcctggt cagttctgtc ccatccacat cctggggctc tttttaagct 51900 ggtaaatcat tatgatgtgg gttgtcattt acagtggtaa aaaacatcta 51950 tcagtagcat ttgaaagaac attctgctca gtcctctggc tgtagaggct 52000 tcaaccccac cagccaccga tgagcacctt ctccctccag gagccagtct 52050 gagctcatta ctgagtttaa tatcagaata caccctggtg cagcctttct 52100 aaattgcagt accagttaac agaaggtgtc tgtcagagca acacccaagt 52150 cattcaagtt accattgtgt gcaaacttaa cagagaccca cgtcttcaat 52200 ataagccttg aaggaaactc cagttttagt atgtagatgg ggtatcaagt 52250 gtgtgcacat tgaacatctg ctgcatacag agcactgtgc caggcaggcc 52300 caggacactg aaaacctgga catagggtcc agacagaagc aagcctgctt 52350 ccacagaggc actcctgggc agacactctg gactgatatg acagtgtgca 52400 gggccgacag gataccacag gtctgaatgg tcagaacagc tggggagggga 52450 gggagcatcc gcaggcatct agtcccatgc taacgcagtg gcactagaag 52500 gatgggtggt gtgtggagca actttcttga aagataaagg acctaacact 52550 ttctatgcac cacttactgt gtgccaggca aggccaggaa tgtttaagtg 52600 gtctgggatc agccagttct gcctcttaac taactttgct gtcctgctct 52650 ccaggctttc attttggtcc 
tcattccttt tccttggacc aacacagaat 52700 cctccaccct gttctggctg cctctagtct tgttctcagc cctccatttg 52750 tttttttctg ccttttccca catgttctga agccctccat tcgtatacta 52800 ctttccagag acttccccat ggctaaaagc attttggaaa tactgtatat 52850 taggcccctt tcagatactg gcaaccgttt gtgggatgct ctgagaaggc 52900 ctctgtgact tagcctggcc cttttcagcc catcacctgc cacgtcctac 52950 cccagaccct tgtcaccagt ccccaggagc ttacgttgct ccctgagggc 53000 actaggcttg ctctcacttc catgcctttg cctgtgccat cctggctgcc 53050 caaaatgcta tggcagatac ctgttcatcc tcaactgggc tctgcctagg 53100 cttgctccag cagaggttac aaactctatg cttcttcctc tgtgtctcca 53150 acctcatctt cctcttctca cctccatcct ggccctaaag gccctatgtt 53200 tgaagcattc acactgtata ttctgtgggg cacacggccc cagtgtctgg 53250 cacatggtag tcaacaccac aaaccgcaga accagttgta aaaggacatg 53300 gagtcggaat gtgagtttta accagggtca tgctgggctg ggttctggca 53350 tgatgctggg ttgtgggctg agtgagaaca gcaagggtga tggtggatgg 53400 agcaacagtc ttgcagccgg ggctctcagg ccaagtgtat ggcagctctg 53450 tgataatgac tttcccttta ctctttgcag attagttttt agaggcatgt 53500 ctatatctcg cccaaatgct gtggtcggga ggtgtcgcat gatccgccac 53550 tcaagagaca agaaaaatga acccagtcct cagaggtgca ttctttgttt 53600 attcatactc cttccccctt taggatgagg taggctgcag gtccgaggct 53650 ctgggcctag agggaaattg aggtggtcag gttacagtgg agagggagga 53700 ggaagtacgt gtgatgattt cttcttaaga tttttgtttt aagacaatct 53750 ccttgtgctc ttttccttgt aggtttgacc gaattgcaca cacaaaggag 53800 acaatgctct ctgatggttt gaactcactc acctaccagg tgctggatgt 53850 acagagatac ccattgtata cccaaatcac agtggacatc gggacaccga 53900 gctagcgttt tggtacacgg ataagagacc tgaaattagc cagggacctc 53950 tgctgtgtgt ctctgccaat ctgctgggct ggtccctctc atttttaacca 54000 gtctgagtga caggtcccct tcgctcatca ttcagatggc tttccagatg 54050 accaggacga gtgggatatt ttgcccccaa cttggctcgg catgtgaatt 54100 cttagctctg caaggtgttt atgcctttgc gggtttcttg atgtgttcgc 54150 agtgtcaccc cagagtcaga actgtacaca tcccaaaatt tggtggccgt 54200 ggaacacatt cccggtgata gaattgctaa attgtcgtga aataggttag 54250 aatttttctt taaattatgg ttttcttatt cgtgaaaatt cggagagtgc 54300 
tgctaaaatt ggattggtgt gatctttttg gtagttgtaa tttaacagaa 54350 aaaacacaaaa tttcaaccat tcttaatgtt acgtcctccc cccaccccct 54400 tctttcagtg gtatgcaacc actgcaatca ctgtgcatat gtcttttctt 54450 agcaaaagga ttttaaaact tgagccctgg accttttgtc ctatgtgtgt 54500 ggattccagg gcaactctag catcagagca aaagccttgg gtttctcgca 54550 ttcagtggcc tatctccaga ttgtctgatt tctgaatgta aagttgttgt 54600 gttttttttt aaatagtagt ttgtagtatt ttaaagaaag aacagatcga 54650 gttctaatta tgatctagct tgattttgtg ttgatccaaa tttgcatagc 54700 tgtttaatgt taagtcatga caatttattt ttcttggcat gctatgtaaa 54750 cttgaatttc ctatgtattt ttaattgtggt gttttaaata tggggaggggg 54800 tattgagcat tttttaggga gaaaaataaa tatatgctgt agtggccaca 54850 aataggccta tgatttagct ggcaggccag gttttctcaa gagcaaaatc 54900 accctctggc cccttggcag gtaaggcctc ccggtcagca ttatcctgcc 54950 agacctcggg gaggatacct gggagacaga agcctctgca cctactgtgc 55000 agaactctcc acttccccaa ccctccccag gtgggcaggg cggagggagc 55050 ctcagcctcc ttagactgac ccctcaggcc cctaggctgg ggggttgtaa 55100 ataacagcag tcaggttgtt taccagccct ttgcacctcc ccaggcagag 55150 ggagcctctg ttctggtggg ggccacctcc ctcagaggct ctgctagcca 55200 cactccgtgg cccacccttt gttaccagtt cttcctcctt cctcttttcc 55250 cctgcctttc tcattccttc cttcgtctcc ctttttgttc ctttgcctct 55300 tgcctgtccc ctaaaacttg actgtggcac tcagggtcaa acagactatc 55350 cattccccag catgaatgtg ccttttaatt agtgatctag aaagaagttc 55400 agccgaaccc acaccccaac tccctcccaa gaacttcggt gcctaaagcc 55450 tcctgttcca cctcaggttt tcacaggtgc tcccaccccca gttgaggctc 55500 ccacccacag ggctgtctgt cacaaaccca cctctgttgg gagctattga 55550 gccacctggg atgagatgac acaaggcact cctaccactg agcgcctttg 55600 ccaggtccag cctgggctca ggttccaaga ctcagctgcc taatcccagg 55650 gttgagcctt gtgctcgtgg cggaccccaa accactgccc tcctgggtac 55700 cagccctcag tgtggaggct gagctggtgc ctggccccag tcttatctgt 55750 gcctttactg ctttgcgcat ctcagatgct aacttggttc tttttccaga 55800 agcctttgta ttggttaaaa attattttcc attgcagaag cagctggact 55850 atgcaaaaag tatttctctg tcagttcccc actctatacc aaggatatta 55900 ttaaaactag aaatgactgc attgagaggg agttgtggga 
aataagaaga 55950 atgaaagcct ctctttctgt ccgcagatcc tgacttttcc aaagtgcctt 56000 aaaagaaatc agacaaatgc cctgagtggt aacttctgtg ttattttact 56050 cttaaaacca aactctacct tttcttgttg ttttttttt tttttttttt 56100 ttttttttgg ttaccttctc attcatgtca agtatgtggt tcattcttag 56150 aaccaaggga aatactgctc cccccatttg ctgacgtagt gctctcatgg 56200 gctcacctgg gcccaaggca cagccagggc acagttaggc ctggatgttt 56250 gcctggtccg tgagatgccg cgggtcctgt ttccttactg gggatttcag 56300 ggctgggggt tcagggagca tttccttttc ctgggagtta tgaccgcgaa 56350 gttgtcatgt gccgtgccct tttctgtttc tgtgtatcct attgctggtg 56400 actctgtgtg aactggcctt tgggaaagat cagagagggc agaggtggca 56450 caggacagta aaggagatgc tgtgctggcc ttcagcctgg acagggtctc 56500 tgctgactgc caggggcggg ggctctgcat agccaggatg acggctttca 56550 tgtcccagag acctgttgtg ctgtgtattt tgatttcctg tgtatgcaaa 56600 tgtgtgtatt taccattgtg tagggggctg tgtctgatct tggtgttcaa 56650 aacagaactg tatttttgcc tttaaaatta aataatataa cgtgaataaa 56700 tgaccctatc tttgtaac 56718
<210> 3
<211> 4214
<212> DNA
<213> Homo sapiens
<220>
<223> wild-type B4GALT1 mRNA sequence
<400> 3
gcgccucggg cggcuucucg ccgcucccag gucuggcugg cuggaggagu 50 cucagcucuc agccgcucgc ccgccccccgc uccgggcccu ccccuagucg 100 ccgcugguggg gcagcgccug gcgggcggcc cgcgggcggg ucgccucccc 150 uccuguagcc cacacccuuc uuaaagcggc ggcgggaaga ugaggcuucg 200 ggagccgcuc cugagcggca gcgccgcgau gccaggcgcg ucccuacagc 250 gggccugccg ccugcucgug gccgucugcg cucugcaccu uggcgucacc 300 cucguuuacu accuggcugg ccgcgaccug agccgccugc cccaacuggu 350 cggagucucc acaccgcugc agggcggcuc gaacagugcc gccgccaucg 400 ggcaguccuc cggggagcuc cggaccggag gggcccggcc gccgccuccu 450 cuaggcgccu ccucccagcc gcgcccgggu ggcgacucca gcccagucgu 500 ggauucuggc ccuggccccg cuagcaacuu gaccucgguc ccagugcccc 550 acaccaccgc acugucgcug cccgccugcc cugaggaguc cccgcugcuu 600 gugggcccca ugcugauuga guuuaacaug ccuguggacc uggagcucgu 650 ggcaaagcag aacccaaaug ugaagauggg cggccgcuau gcccccaggg 700 acugcgucuc uccucacaag guggccauca ucauuccauu ccgcaaccgg 750 caggagcacc ucaaguacug gcuauauuau uugcacccag uccugcagcg
800 ccagcagcug gacuauggca ucuauguuau caaccaggcg ggagacacua 850 uauucaaucg ugcuaagcuc cucaauguug gcuuucaaga agccuugaag 900 gacuaugacu acaccugcuu uguguuuagu gacguggacc ucauuccaau 950 gaaugaccau aaugcguaca ggguuuuuc acagccacgg cacauuuccg 1000 uugcaaugga uaaguuugga uucagccuac cuuauguuca guauuuugga 1050 ggugucucug cucuaaguaa acaacaguuu cuaaccauca auggauuucc 1100 uaauaauuau uggggcuggg gaggagaaga ugaugacauu uuuaacagau 1150 uaguuuuuag aggcaugucu auaucucgcc caaaugcugu ggucgggagg 1200 ugucgcauga uccgccacuc aagagacaag aaaaaugaac ccaauccuca 1250 gagguuugac cgaauugcac acacaaagga gacaaugcuc ucugaugguu 1300 ugaacucacu caccuaccag gugcuggaug uacagagaua cccauuguau 1350 acccaaauca caguggacau cgggacaccg agcuagcguu uugguacacg 1400 gauaagagac cugaaauuag ccagggaccu cugcugugug ucucugccaa 1450 ucugcugggc uggucccucu cauuuuuacc agucugagug acaggucccc 1500 uucgcucauc auucagaugg cuuuccagau gaccaggacg agugggauau 1550 uuugccccca acuuggcucg gcaugugaau ucuuagcucu gcaagguguu 1600 uaugccuuug cggguuucuu gauguguucg cagugucacc ccagagucag 1650 aacuguacac aucccaaaau uugguggccg uggaacacau ucccggugau 1700 agaauugcua aauugucgug aaauagguua gaauuuuucu uuaaauuaug 1750 guuuucuuau ucgugaaaau ucggagagug cugcuaaaau uggauuggug 1800 ugaucuuuuu gguaguugua auuuaacaga aaaacacaaa auuucaacca 1850 uucuuaaugu uacguccucc ccccaccccc uucuuucagu gguaugcaac 1900 cacugcaauc acugugcaua ugucuuuucu uagcaaaagg auuuuaaaac 1950 uugagcccug gaccuuuugu ccuaugugug uggauuccag ggcaacucua 2000 gcaucagagc aaaagccuug gguuucucgc auucaguggc cuaucuccag 2050 auugucugau uucugaaugu aaaguuguug uguuuuuuuu uaaauguag 2100 uuuguaguau uuuaaagaaa gaacagaucg aguucuaauu augaucuagc 2150 uugauuuugu guugauccaa auuugcauag cuguuuaaug uuaagucaug 2200 acaauuuauu uuucuuggca ugcuauguaa acuugaauuu ccuauguauu 2250 uuuauuggg uguuuuaaau auggggaggg guauugagca uuuuuuaggg 2300 agaaaaauaa auauaugcug uaguggccac aaauaggccu augauuuagc 2350 uggcaggcca gguuuucuca agagcaaaau cacccucugg ccccuuggca 2400 gguaaggccu cccggucagc auuauccugc cagaccucgg ggaggauacc 2450 ugggagacag aagccucugc 
accuacugug cagaacucuc cacuucccca 2500 acccuccca ggugggcagg gcggagggag ccucagccuc cuuagacuga 2550 ccccucaggc cccuaggcug ggggguugua aauaacagca gucagguugu 2600 uuaccagccc uuugcaccuc cccaggcaga gggagccucu guucuggugg 2650 gggccaccuc ccucagaggc ucugcuagcc acacuccgug gcccacccuu 2700 uguuaccagu ucuuccuccu uccucuuuuc cccugccuuu cucauuccuu 2750 ccuucgucuc ccuuuuuguu ccuuugccuc uugccugucc ccuaaaacuu 2800 gacuguggca cucaggguca aacagacuau ccauucccca gcaugaaugu 2850 gccuuuuaau uagugaucua gaaagaaguu cagccgaacc cacaccccaa 2900 cucccuccca agaacuucgg ugccuaaagc cuccuguucc accucagguu 2950 uucacaggg cucccacccc aguugaggcu cccacccaca gggcugucug 3000 ucacaaaccc accucuguug ggagcuauug agccaccugg gaugagauga 3050 cacaaggcac uccuaccacu gagcgccuuu gccagggucca gccugggcuc 3100 agguuccaag acucagcugc cuaaucccag gguugagccu ugugcucgug 3150 gcggacccca aaccacugcc cuccugggua ccagcccuca guguggaggc 3200 ugagcuggug ccuggcccca gucuuaucug ugccuuuacu gcuuugcgca 3250 ucucagaugc uaacuugguu cuuuuuccag aagccuuugu auugguuaaa 3300 aauuauuuuc cauugcagaa gcagcuggac uaugcaaaaa guauuucucu 3350 gucaguuccc cacucuauac caaggauauu auuaaaacua gaaaugacug 3400 cauugagagg gaguugguggg aaauaagaag aaugaaagcc ucucuuucug 3450 uccgcagauc cugacuuuuc caaagugccu uaaaagaaau cagacaaaug 3500 cccugagugg uaacuucugu guuauuuuac ucuuaaaacc aaacucuuacc 3550 uuuucuuguu guuuuuuuuu uuuuuuuuuu uuuuuuuuug guuaccuucu 3600 cauucauguc aaguauggg uucaucuua gaaccaaggg aaauacugcu 3650 ccccccauuu gcugacguag ugcucucaug ggcucaccug ggcccaaggc 3700 acagccaggg cacaguuagg ccuggauguu ugccuggucc gugagaugcc 3750 gcggguccug uuuccuuacu ggggauuuca gggcuggggg uucagggagc 3800 auuuccuuuu ccugggaguu augaccgcga aguugucaug ugccgugccc 3850 uuuucuguuu cuguguaucc uauugcuggu gacucugugu gaacuggccu 3900 uugggaaaga ucagagaggg cagagguggc acaggacagu aaaggagaug 3950 cugugcuggc cuucagccug gacagggucu cugcugacug ccaggggcgg 4000 gggcucugca uagccaggau gacggcuuuc augucccaga gaccuguugu 4050 gcuguguauu uugauuuccu guguaugcaa auguguguau uuaccauugu 4100 guaggggggcu gugucugauc uugguguuca 
aaacagaacu guauuuuugc 4150 cuuuaaaauu aaauaauaua acgugaauaa augacccuau cuuuguaaca 4200 aaaaaaaaaaaaaa 4214
<210> 4
<211> 4214
<212> DNA
<213> Homo sapiens
<220>
<223> variant B4GALT1 mRNA sequence
<400> 4
gcgccucggg cggcuucucg ccgcucccag gucuggcugg cuggaggagu 50 cucagcucuc agccgcucgc ccgccccccgc uccgggcccu ccccuagucg 100 ccgcugguggg gcagcgccug gcgggcggcc cgcgggcggg ucgccucccc 150 uccuguagcc cacacccuuc uuaaagcggc ggcgggaaga ugaggcuucg 200 ggagccgcuc cugagcggca gcgccgcgau gccaggcgcg ucccuacagc 250 gggccugccg ccugcucgug gccgucugcg cucugcaccu uggcgucacc 300 cucguuuacu accuggcugg ccgcgaccug agccgccugc cccaacuggu 350 cggagucucc acaccgcugc agggcggcuc gaacagugcc gccgccaucg 400 ggcaguccuc cggggagcuc cggaccggag gggcccggcc gccgccuccu 450 cuaggcgccu ccucccagcc gcgcccgggu ggcgacucca gcccagucgu 500 ggauucuggc ccuggccccg cuagcaacuu gaccucgguc ccagugcccc 550 acaccaccgc acugucgcug cccgccugcc cugaggaguc cccgcugcuu 600 gugggcccca ugcugauuga guuuaacaug ccuguggacc uggagcucgu 650 ggcaaagcag aacccaaaug ugaagauggg cggccgcuau gcccccaggg 700 acugcgucuc uccucacaag guggccauca ucauuccauu ccgcaaccgg 750 caggagcacc ucaaguacug gcuauauuau uugcacccag uccugcagcg 800 ccagcagcug gacuauggca ucuauguuau caaccaggcg ggagacacua 850 uauucaaucg ugcuaagcuc cucaauguug gcuuucaaga agccuugaag 900 gacuaugacu acaccugcuu uguguuuagu gacguggacc ucauuccaau 950 gaaugaccau aaugcguaca ggguuuuuc acagccacgg cacauuuccg 1000 uugcaaugga uaaguuugga uucagccuac cuuauguuca guauuuugga 1050 ggugucucug cucuaaguaa acaacaguuu cuaaccauca auggauuucc 1100 uaauaauuau uggggcuggg gaggagaaga ugaugacauu uuuaacagau 1150 uaguuuuuag aggcaugucu auaucucgcc caaaugcugu ggucgggagg 1200 ugucgcauga uccgccacuc aagagacaag aaaaaugaac ccaguccuca 1250 gagguuugac cgaauugcac acacaaagga gacaaugcuc ucugaugguu 1300 ugaacucacu caccuaccag gugcuggaug uacagagaua cccauuguau 1350 acccaaauca caguggacau cgggacaccg agcuagcguu uugguacacg 1400 gauaagagac cugaaauuag ccagggaccu cugcugugug ucucugccaa 1450 ucugcugggc uggucccucu cauuuuuacc agucugagug acaggucccc 1500 uucgcucauc
auucagaugg cuuuccagau gaccaggacg agugggauau 1550 uuugccccca acuuggcucg gcaugugaau ucuuagcucu gcaagguguu 1600 uaugccuuug cggguuucuu gauguguucg cagugucacc ccagagucag 1650 aacuguacac aucccaaaau uugguggccg uggaacacau ucccggugau 1700 agaauugcua aauugucgug aaauagguua gaauuuuucu uuaaauuaug 1750 guuuucuuau ucgugaaaau ucggagagug cugcuaaaau uggauuggug 1800 ugaucuuuuu gguaguugua auuuaacaga aaaacacaaa auuucaacca 1850 uucuuaaugu uacguccucc ccccaccccc uucuuucagu gguaugcaac 1900 cacugcaauc acugugcaua ugucuuuucu uagcaaaagg auuuuaaaac 1950 uugagcccug gaccuuuugu ccuaugugug uggauuccag ggcaacucua 2000 gcaucagagc aaaagccuug gguuucucgc auucaguggc cuaucuccag 2050 auugucugau uucugaaugu aaaguuguug uguuuuuuuu uaaauguag 2100 uuuguaguau uuuaaagaaa gaacagaucg aguucuaauu augaucuagc 2150 uugauuuugu guugauccaa auuugcauag cuguuuaaug uuaagucaug 2200 acaauuuauu uuucuuggca ugcuauguaa acuugaauuu ccuauguauu 2250 uuuauuggg uguuuuaaau auggggaggg guauugagca uuuuuuaggg 2300 agaaaaauaa auauaugcug uaguggccac aaauaggccu augauuuagc 2350 uggcaggcca gguuuucuca agagcaaaau cacccucugg ccccuuggca 2400 gguaaggccu cccggucagc auuauccugc cagaccucgg ggaggauacc 2450 ugggagacag aagccucugc accuacugug cagaacucuc cacuucccca 2500 acccuccca ggugggcagg gcggagggag ccucagccuc cuuagacuga 2550 ccccucaggc cccuaggcug ggggguugua aauaacagca gucagguugu 2600 uuaccagccc uuugcaccuc cccaggcaga gggagccucu guucuggugg 2650 gggccaccuc ccucagaggc ucugcuagcc acacuccgug gcccacccuu 2700 uguuaccagu ucuuccuccu uccucuuuuc cccugccuuu cucauuccuu 2750 ccuucgucuc ccuuuuuguu ccuuugccuc uugccugucc ccuaaaacuu 2800 gacuguggca cucaggguca aacagacuau ccauucccca gcaugaaugu 2850 gccuuuuaau uagugaucua gaaagaaguu cagccgaacc cacaccccaa 2900 cucccuccca agaacuucgg ugccuaaagc cuccuguucc accucagguu 2950 uucacaggg cucccacccc aguugaggcu cccacccaca gggcugucug 3000 ucacaaaccc accucuguug ggagcuauug agccaccugg gaugagauga 3050 cacaaggcac uccuaccacu gagcgccuuu gccagggucca gccugggcuc 3100 agguuccaag acucagcugc cuaaucccag gguugagccu ugugcucgug 3150 gcggacccca aaccacugcc cuccugggua 
ccagcccuca gugguggaggc 3200 ugagcuggug ccuggcccca gucuuaucug ugccuuuacu gcuuugcgca 3250 ucucagaugc uaacuugguu cuuuuuccag aagccuuugu auugguuaaa 3300 aauuauuuuc cauugcagaa gcagcuggac uaugcaaaaa guauuucucu 3350 gucaguuccc cacucuauac caaggauauu auuaaaacua gaaaugacug 3400 cauugagagg gaguugggg aaauaagaag aaugaaagcc ucucuuucug 3450 uccgcagauc cugacuuuuc caaagugccu uaaaagaaau cagacaaaug 3500 cccugagugg uaacuucugu guuauuuuac ucuuaaaacc aaacucuuacc 3550 uuuucuuguu guuuuuuuuu uuuuuuuuuu uuuuuuuuug guuaccuucu 3600 cauucauguc aaguauggg uucaucuua gaaccaaggg aaauacugcu 3650 ccccccauuu gcugacguag ugcucucaug ggcucaccug ggcccaaggc 3700 acagccaggg cacaguuagg ccuggauguu ugccuggucc gugagaugcc 3750 gcggguccug uuuccuuacu ggggauuuca gggcuggggg uucagggagc 3800 auuuccuuuu ccugggaguu augaccgcga aguugucaug ugccgugccc 3850 uuuucuguuu cuguguaucc uauugcuggu gacucugugu gaacuggccu 3900 uugggaaaga ucagagaggg cagagguggc acaggacagu aaaggagaug 3950 cugugcuggc cuucagccug gacagggucu cugcugacug ccaggggcgg 4000 gggcucugca uagccaggau gacggcuuuc augucccaga gaccuguugu 4050 gcuguguauu uugauuuccu guguaugcaa auguguguau uuaccauugu 4100 guaggggggcu gugucugauc uugguguuca aaacagaacu guauuuuugc 4150 cuuuaaaauu aaauaauaua acgugaauaa augacccuau cuuuguaaca 4200 aaaaaaaaaaaaaa 4214
<210> 5
<211> 1197
<212> DNA
<213> Homo sapiens
<220>
<223> wild-type B4GALT1 cDNA sequence
<400> 5
atgaggcttc gggagccgct cctgagcggc agcgccgcga tgccaggcgc 50 gtccctacag cgggcctgcc gcctgctcgt ggccgtctgc gctctgcacc 100 ttggcgtcac cctcgtttac tacctggctg gccgcgacct gagccgcctg 150 ccccaactgg tcggagtctc cacaccgctg cagggcggct cgaacagtgc 200 cgccgccatc gggcagtcct ccggggagct ccggaccgga ggggcccggc 250 cgccgcctcc tctaggcgcc tcctcccagc cgcgcccggg tggcgactcc 300 agcccagtcg tggattctgg ccctggcccc gctagcaact tgacctcggt 350 cccagtgccc cacaccaccg cactgtcgct gcccgcctgc cctgaggagt 400 ccccgctgct tgtgggcccc atgctgattg agtttaacat gcctgtggac 450 ctggagctcg tggcaaagca gaacccaaat gtgaagatgg gcggccgcta 500 tgcccccagg gactgcgtct ctcctcacaa ggtggccatc atcattccat 550
tccgcaaccg gcaggagcac ctcaagtact ggctatatta tttgcaccca 600 gtcctgcagc gccagcagct ggactatggc atctatgtta tcaaccaggc 650 gggagacact atattcaatc gtgctaagct cctcaatgtt ggctttcaag 700 aagccttgaa ggactatgac tacacctgct ttgtgtttag tgacgtggac 750 ctcattccaa tgaatgacca taatgcgtac aggtgttttt cacagccacg 800 gcacatttcc gttgcaatgg ataagtttgg attcagccta ccttatgttc 850 agtattttgg aggtgtctct gctctaagta aacaacagtt tctaaccatc 900 aatggatttc ctaataatta ttggggctgg ggaggagaag atgatgacat 950 ttttaacaga ttagttttta gaggcatgtc tatatctcgc ccaaatgctg 1000 tggtcggggag gtgtcgcatg atccgccact caagagacaa gaaaaaatgaa 1050 cccaatcctc agaggtttga ccgaattgca cacacaaagg agacaatgct 1100 ctctgatggt ttgaactcac tcacctacca ggtgctggat gtacagagat 1150 acccattgta tacccaaatc acagtggaca tcgggacacc gagctag 1197 <210> 6 <211> 1197 <212> DNA <213> Homo sapiens <220> <223> variant B4GALT1 cDNA sequence <400> 6 atgaggcttc gggagccgct cctgagcggc agcgccgcga tgccaggcgc 50 gtccctacag cgggcctgcc gcctgctcgt ggccgtctgc gctctgcacc 100 ttggcgtcac cctcgtttac tacctggctg gccgcgacct gagccgcctg 150 ccccaactgg tcggagtctc cacaccgctg cagggcggct cgaacagtgc 200 cgccgccatc gggcagtcct ccggggagct ccggaccgga ggggcccggc 250 cgccgcctcc tctaggcgcc tcctcccagc cgcgcccggg tggcgactcc 300 agcccagtcg tggattctgg ccctggcccc gctagcaact tgacctcggt 350 cccagtgccc cacaccaccg cactgtcgct gcccgcctgc cctgaggagt 400 ccccgctgct tgtgggcccc atgctgattg agtttaacat gcctgtggac 450 ctggagctcg tggcaaagca gaacccaaat gtgaagatgg gcggccgcta 500 tgcccccagg gactgcgtct ctcctcacaa ggtggccatc atcattccat 550 tccgcaaccg gcaggagcac ctcaagtact ggctatatta tttgcaccca 600 gtcctgcagc gccagcagct ggactatggc atctatgtta tcaaccaggc 650 gggagacact atattcaatc gtgctaagct cctcaatgtt ggctttcaag 700 aagccttgaa ggactatgac tacacctgct ttgtgtttag tgacgtggac 750 ctcattccaa tgaatgacca taatgcgtac aggtgttttt cacagccacg 800 gcacatttcc gttgcaatgg ataagtttgg attcagccta ccttatgttc 850 agtattttgg aggtgtctct gctctaagta aacaacagtt tctaaccatc 900 aatggatttc ctaataatta ttggggctgg ggaggagaag atgatgacat 950 ttttaacaga 
ttagttttta gaggcatgtc tatatctcgc ccaaatgctg 1000 tggtcggggag gtgtcgcatg atccgccact caagagacaa gaaaaaatgaa 1050 cccagtcctc agaggtttga ccgaattgca cacacaaagg agacaatgct 1100 ctctgatggt ttgaactcac tcacctacca ggtgctggat gtacagagat 1150 acccattgta tacccaaatc acagtggaca tcgggacacc gagctag 1197 <210> 7 <211> 398 <212> PRT <213> Homo sapiens <220> <223> wild-type B4GALT1 sequence <400> 7 Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly 1 5 10 15 Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu 20 25 30 His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser 35 40 45 Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser 50 55 60 Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly 65 70 75 80 Gly Ala Arg Pro Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro 85 90 95 Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser 100 105 110 Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro 115 120 125 Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu 130 135 140 Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn 145 150 155 160 Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His 165 170 175 Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys 180 185 190 Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp 195 200 205 Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg 210 215 220 Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp 225 230 235 240 Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp 245 250 255 His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala 260 265 270 Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly 275 280 285 Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro 290 295 300 Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg 305 310 315 320 Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly 325 330 335 Arg Cys Arg Met Ile 
Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn 340 345 350 Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser 355 360 365 Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr 370 375 380 Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser 385 390 395 <210> 8 <211> 398 <212> PRT <213> Homo sapiens <220> <223> variant B4GALT1 sequence <400> 8 Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly 1 5 10 15 Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu 20 25 30 His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser 35 40 45 Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser 50 55 60 Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly 65 70 75 80 Gly Ala Arg Pro Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro 85 90 95 Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser 100 105 110 Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro 115 120 125 Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu 130 135 140 Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn 145 150 155 160 Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His 165 170 175 Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys 180 185 190 Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp 195 200 205 Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg 210 215 220 Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp 225 230 235 240 Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp 245 250 255 His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala 260 265 270 Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly 275 280 285 Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro 290 295 300 Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg 305 310 315 320 Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly 325 330 335 Arg Cys Arg Met Ile Arg His Ser 
Arg Asp Lys Lys Asn Glu Pro Ser 340 345 350 Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser 355 360 365 Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr 370 375 380 Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser 385 390 395 <210> 9 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 9 attagttttt agaggcatgt 20 <210> 10 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 10 ggctctcagg ccaagtgtat 20 <210> 11 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 11 tactccttcc ccctttagga 20 <210> 12 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> guide RNA recognition sequences <400> 12 gtccgaggct ctgggcctag 20 <210> 13 <211> 6 <212> DNA <213> Artificial Sequence <220> <223> PAM for Cas9 from S. aureus <220> <221> n is A, G, C, or T <222> (1) .. (2) <220> <221> r is A or G <222> (4) .. (5) <400> 13 nngrrt 6 <210> 14 <211> 5 <212> DNA <213> Artificial Sequence <220> <223> PAM for Cas9 from S. aureus <220> <221> n is A, G, C, or T <222> (1) .. (2) <220> <221> r is A or G <222> (4) .. (5) <400> 14 nngrr 5 <210> 15 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> target motif preceding NGG recognized by Cas9 protein <220> <221> n is A, G, C, or T <222> (2) .. (21) <400> 15 gnnnnnnnnnnnnnnnnnnnn ngg 23 <210> 16 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> target motif preceding NGG recognized by Cas9 protein <220> <221> n is A, G, C, or T <222> (1) .. (21) <400> 16 nnnnnnnnnn nnnnnnnnnn ngg 23 <210> 17 <211> 25 <212> DNA <213> Artificial Sequence <220> <223> RNA recognition sequence <220> <221> n is A, G, C, or T <222> (3) .. (23) <400> 17 ggnnnnnnnn nnnnnnnnnn nnngg 25

Claims

(Deleted)

An isolated nucleic acid molecule, or a complement thereof, comprising a nucleic acid sequence encoding a polypeptide that is at least 99% identical to SEQ ID NO: 8 and has beta-1,4-galactosyltransferase 1 (B4GALT1) activity, provided that the polypeptide comprises a serine at position 352.

The isolated nucleic acid molecule or complement thereof according to claim 34, wherein the nucleic acid sequence encodes the polypeptide of SEQ ID NO: 8.

A vector comprising an isolated nucleic acid molecule according to claim 34 or its complement.

(Deleted)

A composition comprising the isolated nucleic acid molecule or complement thereof according to claim 34 or 35, or the vector according to claim 36, and a carrier.

(Deleted)

A host cell comprising the isolated nucleic acid molecule or complement thereof according to claim 34 or 35, or the vector according to claim 36.

(Deleted)

A cDNA, or a complement thereof, encoding a human beta-1,4-galactosyltransferase 1 (B4GALT1) protein and comprising a nucleic acid sequence at least 99% identical to SEQ ID NO: 6,
provided that the nucleic acid sequence encodes a serine at a position corresponding to position 352 of the full-length or mature B4GALT1 polypeptide,
wherein the cDNA is derived from a nucleic acid sequence encoding the B4GALT1 variant polypeptide of SEQ ID NO: 8, or a complement thereof.

(Deleted)

A vector comprising the cDNA or complement thereof according to claim 49.

(Deleted)

A composition comprising the cDNA or complement thereof according to claim 49, or the vector according to claim 51, and a carrier.

(Deleted)

A host cell comprising the cDNA or complement thereof according to claim 49, or the vector according to claim 51.

(Deleted)

An isolated polypeptide having beta-1,4-galactosyltransferase 1 (B4GALT1) activity and comprising an amino acid sequence at least 99% identical to the B4GALT1 variant polypeptide of SEQ ID NO: 8,
provided that the polypeptide comprises a serine at the position corresponding to position 352 of SEQ ID NO: 8.

(Deleted)

A composition comprising the polypeptide according to claim 64 and a carrier or excipient.

A host cell expressing the polypeptide according to claim 64.

A method for producing the polypeptide according to claim 64, comprising:
culturing a host cell containing a nucleic acid molecule encoding the polypeptide so that the cell expresses the polypeptide, and recovering the expressed polypeptide.

(Deleted)

A method of providing detection information on B4GALT1 variant nucleic acid molecules for use in determining the susceptibility of a human subject to developing a cardiovascular condition, comprising:
assaying a sample obtained from the subject to determine whether a nucleic acid molecule in the sample comprises a nucleic acid sequence encoding a serine at a position corresponding to position 352 of a full-length or mature B4GALT1 polypeptide having an amino acid sequence of 398 contiguous amino acids,
wherein the B4GALT1 variant nucleic acid molecule encodes the B4GALT1 variant polypeptide of SEQ ID NO: 8.

The method according to claim 76, wherein the assay comprises:
sequencing a portion of the B4GALT1 genomic sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2;
sequencing a portion of the B4GALT1 mRNA sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or
sequencing a portion of the B4GALT1 cDNA sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6.
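Each sequencing option above comes down to reading the three nucleotides of codon 352 at a recited coordinate and translating them. In the cDNA (SEQ ID NO: 6) the coding sequence starts at position 1, so codon 352 begins at 3 × 352 − 2 = 1054. The mRNA coordinate (1243) is shifted by 1243 − 1054 = 189 upstream nucleotides. The Python sketch below is illustrative only. It assumes the sequenced read has already been aligned, in the same orientation, to the coordinate system of the relevant SEQ ID NO. The helper names and the padded test reads are hypothetical.

```python
# Standard genetic code; codons are indexed with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 1-based start of the codon for residue 352, as recited in the claims.
CODON_352_START = {
    "genomic": 53575,  # SEQ ID NO: 2
    "mrna": 1243,      # SEQ ID NO: 4
    "cdna": 1054,      # SEQ ID NO: 6
}


def translate_codon(codon: str) -> str:
    """Translate one DNA or RNA codon (any case) to a one-letter amino acid."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError(f"invalid codon: {codon!r}")
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def residue_352(aligned_seq: str, kind: str) -> str:
    """Amino acid encoded at positions start..start+2 of an aligned sequence."""
    start = CODON_352_START[kind]
    return translate_codon(aligned_seq[start - 1:start + 2])


def carries_asn352ser(aligned_seq: str, kind: str = "cdna") -> bool:
    """True if codon 352 encodes serine (the variant) rather than asparagine."""
    return residue_352(aligned_seq, kind) == "S"


# Hypothetical aligned cDNA reads: "n" padding stands in for positions 1-1047,
# followed by codons 350-354 (gaa ccc xxx cct cag) from SEQ ID NOs: 5 and 6.
wild_type = "n" * 1047 + "gaacccaatcctcag"
variant = "n" * 1047 + "gaacccagtcctcag"
print(residue_352(wild_type, "cdna"), residue_352(variant, "cdna"))  # N S
```

The lookup table keeps the check tied to the recited coordinates. The same function therefore serves the genomic, mRNA, and cDNA options, provided the read covers the codon.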

The method according to claim 76, wherein the assay comprises:
a) contacting the sample with a primer that hybridizes to: i) a portion of the B4GALT1 genomic sequence adjacent to the position corresponding to positions 53575 to 53577 of SEQ ID NO: 2; ii) a portion of the B4GALT1 mRNA sequence adjacent to the position corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or iii) a portion of the B4GALT1 cDNA sequence adjacent to the position corresponding to positions 1054 to 1056 of SEQ ID NO: 6;
b) extending the primer at least through: i) the position of the B4GALT1 genomic sequence corresponding to positions 53575 to 53577; ii) the position of the B4GALT1 mRNA corresponding to positions 1243 to 1245; or iii) the position of the B4GALT1 cDNA corresponding to positions 1054 to 1056; and
c) determining whether the extension product of the primer comprises, at i) the position corresponding to positions 53575 to 53577 of the B4GALT1 genomic sequence, ii) the position corresponding to positions 1243 to 1245 of the B4GALT1 mRNA, or iii) the position corresponding to positions 1054 to 1056 of the B4GALT1 cDNA, nucleotides encoding a serine at position 352 of SEQ ID NO: 8.
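Steps (b) and (c) of the primer-extension assay can be modeled in silico. The sketch below is an idealized model, not a laboratory protocol, and rests on three assumptions:

- The primer is written in the same sense as the aligned cDNA and ends immediately 5' of codon 352, so it would anneal to the complementary strand.
- Extension copies the template without error.
- The template is aligned to cDNA (SEQ ID NO: 6) coordinates.

The function names and the 6-nt primer are hypothetical; a real primer would be longer and designed for specificity.

```python
# 1-based cDNA (SEQ ID NO: 6) coordinates of codon 352.
CODON_START, CODON_END = 1054, 1056
SERINE_CODONS = {"tct", "tcc", "tca", "tcg", "agt", "agc"}


def extend_primer(template: str, primer: str, stop: int = CODON_END) -> str:
    """Idealized primer extension on a sense-strand sequence.

    The primer is written in the same sense as `template`. Extension copies the
    template from the base after the primer's 3' end through 1-based `stop`.
    """
    t, p = template.lower(), primer.lower()
    idx = t.find(p)  # 0-based start of the first matching site
    if idx < 0:
        raise ValueError("primer has no matching site in the template")
    primer_end = idx + len(p)  # 1-based position of the primer's 3' base
    if primer_end >= CODON_START:
        raise ValueError("primer must end 5' of the target codon")
    return p + t[primer_end:stop]


def product_encodes_serine(product: str) -> bool:
    """Step (c): do the last three bases (positions 1054-1056) encode serine?"""
    return product[-3:].lower().replace("u", "t") in SERINE_CODONS


variant = "n" * 1047 + "gaacccagtcctcag"    # hypothetical aligned cDNA read
wild_type = "n" * 1047 + "gaacccaatcctcag"
product = extend_primer(variant, "gaaccc")  # hypothetical primer ending at 1053
print(product, product_encodes_serine(product))  # gaacccagt True
```

Stopping the extension at position 1056 makes the last three bases of the product exactly the queried codon. Step (c) then reduces to a set lookup.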

The method according to claim 76, wherein the assay comprises contacting the sample, under stringent conditions, with a primer or probe that specifically hybridizes to the B4GALT1 variant genomic sequence, mRNA sequence, or cDNA sequence rather than to the corresponding wild-type B4GALT1 sequence, and determining whether hybridization has occurred.

A method of providing detection information on the presence of B4GALT1 Asn352Ser for use in determining the susceptibility of a human subject to developing a cardiovascular condition, comprising:
performing an assay on a sample obtained from the human subject to determine whether the B4GALT1 protein in the sample contains a serine residue at position 352,
wherein the detection information is based on the B4GALT1 variant polypeptide of SEQ ID NO: 8.
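At the protein level, the determination is a residue lookup at position 352 of the 398-residue chain (Asn in SEQ ID NO: 7, Ser in SEQ ID NO: 8). The claim does not specify the assay format. This Python sketch therefore assumes the assay has already produced a one-letter sequence numbered from Met1 of the full-length polypeptide, and it does not handle the mature-form numbering. The function names and the placeholder sequences are hypothetical.

```python
FULL_LENGTH = 398  # residues in full-length B4GALT1 (SEQ ID NOs: 7 and 8)


def residue_at(protein: str, position: int) -> str:
    """1-based lookup in a one-letter-code protein sequence."""
    if not 1 <= position <= len(protein):
        raise IndexError(f"position {position} outside 1..{len(protein)}")
    return protein[position - 1].upper()


def has_ser352(protein: str) -> bool:
    """True if residue 352 is serine (Asn352Ser), using full-length numbering."""
    if len(protein) != FULL_LENGTH:
        raise ValueError("sketch assumes the 398-residue full-length numbering")
    return residue_at(protein, 352) == "S"


# Placeholder chains: only residue 352 is meaningful here.
wild_type = "A" * 351 + "N" + "A" * 46
variant = "A" * 351 + "S" + "A" * 46
print(has_ser352(wild_type), has_ser352(variant))  # False True
```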

An in vitro method for determining the susceptibility of a human subject to developing a cardiovascular condition, comprising:
a) assaying a sample obtained from the subject to determine whether a nucleic acid molecule in the sample comprises a nucleic acid sequence encoding a serine at a position corresponding to position 352 of a full-length or mature B4GALT1 polypeptide having an amino acid sequence of 398 contiguous amino acids; and
b) classifying the human subject as having a reduced risk of developing the cardiovascular condition if the nucleic acid molecule comprises a nucleic acid sequence encoding a serine at the position corresponding to position 352 of the full-length or mature B4GALT1 polypeptide, or as having an increased risk of developing the cardiovascular condition if the nucleic acid molecule does not comprise such a sequence,
wherein the determination is based on the B4GALT1 variant polypeptide of SEQ ID NO: 8.
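Step (b) is a two-way decision on the codon read in step (a). The Python sketch below encodes that rule for a single sequence aligned to cDNA (SEQ ID NO: 6) coordinates. It is an assumption-laden illustration: it classifies one sequence (one allele) at a time and does not model heterozygous genotypes, which the claim language does not address. The names `Risk` and `classify` are hypothetical.

```python
from enum import Enum

SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
CODON_352 = slice(1053, 1056)  # 1-based positions 1054-1056 of the cDNA (SEQ ID NO: 6)


class Risk(Enum):
    REDUCED = "reduced risk"
    INCREASED = "increased risk"


def classify(aligned_cdna: str) -> Risk:
    """Step (b): serine codon at 352 -> reduced risk; any other codon -> increased risk."""
    codon = aligned_cdna[CODON_352].upper().replace("U", "T")
    if len(codon) != 3:
        raise ValueError("sequence does not cover cDNA positions 1054-1056")
    return Risk.REDUCED if codon in SERINE_CODONS else Risk.INCREASED


print(classify("n" * 1047 + "gaacccagtcctcag").value)  # reduced risk
print(classify("n" * 1047 + "gaacccaatcctcag").value)  # increased risk
```

Matching any serine codon, rather than only `agt`, follows the claim wording ("a nucleic acid sequence encoding a serine"), which refers to the encoded residue rather than one specific codon.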

The in vitro method according to claim 81, wherein the assay comprises:
sequencing a portion of the B4GALT1 genomic sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 53575 to 53577 of SEQ ID NO: 2;
sequencing a portion of the B4GALT1 mRNA sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or
sequencing a portion of the B4GALT1 cDNA sequence of a nucleic acid molecule in the sample, wherein the sequenced portion comprises positions corresponding to positions 1054 to 1056 of SEQ ID NO: 6.

The in vitro method according to claim 81, wherein the assay comprises:
a) contacting the sample with a primer that hybridizes to: i) a portion of the B4GALT1 genomic sequence adjacent to the position corresponding to positions 53575 to 53577 of SEQ ID NO: 2; ii) a portion of the B4GALT1 mRNA sequence adjacent to the position corresponding to positions 1243 to 1245 of SEQ ID NO: 4; or iii) a portion of the B4GALT1 cDNA sequence adjacent to the position corresponding to positions 1054 to 1056 of SEQ ID NO: 6;
b) extending the primer at least through: i) the position of the B4GALT1 genomic sequence corresponding to positions 53575 to 53577; ii) the position of the B4GALT1 mRNA corresponding to positions 1243 to 1245; or iii) the position of the B4GALT1 cDNA corresponding to positions 1054 to 1056; and
c) determining whether the extension product of the primer comprises, at i) the position corresponding to positions 53575 to 53577 of the B4GALT1 genomic sequence, ii) the position corresponding to positions 1243 to 1245 of the B4GALT1 mRNA, or iii) the position corresponding to positions 1054 to 1056 of the B4GALT1 cDNA, nucleotides encoding a serine at position 352 of SEQ ID NO: 8.

The in vitro method according to claim 81, wherein the assay comprises contacting the sample, under stringent conditions, with a primer or probe that specifically hybridizes to the B4GALT1 variant genomic sequence, mRNA sequence, or cDNA sequence rather than to the corresponding wild-type B4GALT1 sequence, and determining whether hybridization has occurred.

(Deleted)

An in vitro method for determining the susceptibility of a human subject to developing a cardiovascular condition, comprising:
a) performing an assay on a sample obtained from the human subject to determine whether the B4GALT1 protein in the sample contains a serine residue at position 352; and
b) classifying the human subject as having a reduced risk of developing the cardiovascular condition if the B4GALT1 polypeptide contains a serine at a position corresponding to position 352 of a full-length or mature B4GALT1 polypeptide having an amino acid sequence of 398 contiguous amino acids, or as having an increased risk of developing the cardiovascular condition if the B4GALT1 polypeptide does not contain a serine at that position,
wherein the determination is based on the B4GALT1 variant polypeptide of SEQ ID NO: 8.

The in vitro method according to any one of claims 81 to 84, wherein the cardiovascular condition comprises increased levels of one or more serum lipids.

The in vitro method according to claim 96, wherein the serum lipids comprise one or more of cholesterol, LDL, HDL, triglycerides, HDL-cholesterol, and non-HDL cholesterol.

The in vitro method according to any one of claims 81 to 84, wherein the cardiovascular condition comprises increased levels of coronary artery calcification.

The in vitro method according to any one of claims 81 to 84, wherein the cardiovascular condition comprises increased levels of pericardial fat.

The in vitro method according to any one of claims 81 to 84, wherein the cardiovascular condition comprises an atherothrombotic condition.

The in vitro method according to claim 100, wherein the atherothrombotic condition comprises increased levels of fibrinogen.

The in vitro method according to claim 101, wherein the atherothrombotic condition comprises a blood clot whose formation involves fibrinogen activity.

The in vitro method according to any one of claims 81 to 84, wherein the cardiovascular condition comprises increased levels of fibrinogen.

The in vitro method according to claim 103, wherein the cardiovascular condition comprises a blood clot whose formation involves fibrinogen activity.

(Deleted)

The in vitro method according to claim 95, wherein the cardiovascular condition comprises increased levels of one or more serum lipids.

The in vitro method according to claim 154, wherein the serum lipids comprise one or more of cholesterol, LDL, HDL, triglycerides, HDL-cholesterol, and non-HDL cholesterol.

The in vitro method according to claim 95, wherein the cardiovascular condition comprises increased levels of coronary artery calcification.

The in vitro method according to claim 95, wherein the cardiovascular condition comprises increased levels of pericardial fat.

The in vitro method according to claim 95, wherein the cardiovascular condition comprises an atherothrombotic condition.

The in vitro method according to claim 158, wherein the atherothrombotic condition comprises increased levels of fibrinogen.

The in vitro method according to claim 159, wherein the atherothrombotic condition comprises a blood clot whose formation involves fibrinogen activity.

The in vitro method according to claim 95, wherein the cardiovascular condition comprises increased levels of fibrinogen.

The in vitro method according to claim 161, wherein the cardiovascular condition comprises a blood clot whose formation involves fibrinogen activity.

(Deleted)