KR20200009927A

KR20200009927A - Methods for identifying new variant that cause hereditary spastic paraplegia and Diagnostic chip of hereditary spastic paraplegia

Info

Publication number: KR20200009927A
Application number: KR1020180084998A
Authority: KR
Inventors: 김남순; 정경숙; 김대수; 이다용; 이용재; 이재란; 이정주; 정초록; 조은위; 김완태
Original assignee: 한국생명공학연구원
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2020-01-30
Also published as: KR102257221B1

Abstract

The present invention relates to a method for identifying novel causal variants of hereditary spastic paraplegia (HSP), a system for identifying HSP causal genes, a computer program stored in a medium for providing information for predicting HSP, a method of providing information for diagnosing HSP, a system for diagnosing HSP, a chip for diagnosing the cause of HSP, and an HSP diagnostic kit.

Description

Methods for identifying new variant that cause hereditary spastic paraplegia and Diagnostic chip of hereditary spastic paraplegia

The present invention relates to a method for identifying novel causal variants of hereditary spastic paraplegia (HSP), a system for identifying HSP causal genes, a computer program stored in a medium for providing information for HSP prediction, a method for providing information for HSP diagnosis, a system for diagnosing HSP, a chip for diagnosing the cause of HSP, and a kit for diagnosing HSP.

Hereditary spastic paraplegia (HSP) is a hereditary disorder of the nervous system in which the leg muscles progressively weaken and become paralyzed, muscle tone increases, and stiffness develops. The cause of HSP is not yet clear; however, its symptoms are presumed to result from progressive degeneration of the corticospinal tract of the spinal cord. HSP is known to be inherited in autosomal dominant (AD type) and autosomal recessive (AR type) forms (Novarino G et al., Science 343:506-511, 2014).

Recently, with the advent of whole exome sequencing (WES) based on next-generation sequencing (NGS) technology, genetic research on a wide range of diseases, including their causal genes and mutations, has been expanding rapidly.

To date, genetic diagnosis of HSP, a rare intractable disease, has been performed using NGS and Sanger sequencing of only a few major causal genes. In Korea, only mutations in the main causal genes, HSP4 (about 40% of domestic patients) and HSP3 (13% of domestic patients), are being identified. There is therefore a growing need for a diagnostic system capable of analyzing all causal genes associated with HSP. In particular, genetic approaches such as NGS are considered well suited to identifying causal genes and mutations in heterogeneous diseases that involve more than one or two causal genes.

Against this background, the present inventors made intensive efforts to develop a method for identifying novel causal variants of HSP more effectively. As a result, they established an algorithm for identifying the causal genes of the complex disease HSP and built a diagnostic system on it, thereby completing the present invention.

An object of the present invention is to provide a method for identifying novel causal variants of HSP.

Another object of the present invention is to provide a computer program stored in a medium for providing information for HSP prediction.

Still another object of the present invention is to provide a method for providing information for HSP diagnosis.

Still another object of the present invention is to provide a chip for diagnosing the cause of HSP.

Still another object of the present invention is to provide a kit for diagnosing HSP.

These are described in detail below. Each description and embodiment disclosed herein may also be applied to every other description and embodiment. That is, all combinations of the various elements disclosed herein fall within the scope of the present invention. Furthermore, the scope of the present invention is not limited by the specific descriptions given below.

One aspect of the present invention for achieving the above objects provides a method for identifying novel causal variants of HSP.

Specifically, it provides a method for identifying novel causal variants of HSP, comprising:
(a) quantifying HSP single nucleotide variant (SNV) data using the input SNV data and family information together with a preset formula;
(b) determining a window size n from the number of family members for whom the SNV data quantified in step (a) can be analyzed;
(c) setting a window that extends n bases, the window size determined in step (b), to the left and to the right of an arbitrary base position, and calculating the proportion of the family under analysis from the quantified SNV data within the window;
(d) calculating a significance probability (p-value) with a one-proportion test at each SNV position within the window set in step (c);
(e) calculating a weight that corrects for physical position at both ends of the window set in step (c);
(f) calculating a priority score from the p-value calculated in step (d) and the weight calculated in step (e);
(g) converting SNVs into gene symbols according to the calculated priority score; and
(h) determining a ranking according to the calculated priority score and identifying a list of candidate causal genes according to that ranking.

Because genetic diagnosis of HSP, a rare intractable disease, is currently performed only by next-generation sequencing (NGS) and Sanger sequencing of a few major causal genes, a diagnostic system capable of analyzing all causal genes associated with HSP is needed. The present invention is significant in that it establishes a system for identifying novel causal variants of HSP, identifies new HSP causal genes, and allows HSP to be diagnosed using them.

As used herein, the term "hereditary spastic paraplegia (HSP)" refers to a hereditary disorder of the nervous system in which the leg muscles progressively weaken and become paralyzed, muscle tone increases, and stiffness develops. HSP is divided into uncomplicated HSP and complicated HSP. In uncomplicated HSP, progressive spastic paraplegia is the initial finding, muscle spasticity may develop, and bladder control may become difficult. Complicated HSP shows additional neurological abnormalities beyond those of the uncomplicated form, such as visual or hearing impairment, intellectual disability, and impaired control of voluntary movement (ataxia). Any symptom to which the diagnostic or identification method of the present invention applies is included without limitation.

Each step of the method for identifying novel causal variants of HSP is described in detail below.

First, step (a) may be a step of quantifying HSP single nucleotide variant (SNV) data using the input SNV data and family information together with a preset formula. The SNV data of step (a) may be obtained, without limitation, by next-generation sequencing (NGS) using primers or probes that specifically bind to a nucleotide sequence encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, VCP, PMP22, KIF1A, HINT1, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, ATL1, GDAP1, LYST, AP4S1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZFYVE26, WDR48, C12orf65, AP5Z1, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2.

Specifically, the preset condition of step (a) may distinguish the case in which sequence information from both the father and the mother of the individual is available from the case in which it is not. More specifically, when sequence information from both parents is available in step (a), the preset equation may be

[Equation]

For example, in Equation 1, S is a variable that distinguishes patients from unaffected individuals: S=0 denotes an unaffected individual, S=1 denotes a patient, and SNV_jv(S) denotes the j-th SNV of the v-th family member.

In addition, when sequence information from neither the father nor the mother of the individual is available in step (a), or from only one of them, the preset equation may be

[Equation]

Here, MSNV_i(S=1) is the most common pattern among the patients' SNVs at the i-th position, and v = 1, ..., V. For example, MSNV_j(S=1) in Equation 3 denotes the most frequent pattern among the SNVs carried only by the patients within the family under analysis. Equation 3 can be used to calculate the pattern frequency LFj for the whole family.
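The notions of a most common patient pattern and a family-wide pattern frequency can be illustrated with a minimal sketch. Because the equation itself is not reproduced in this text, everything below is an illustrative assumption rather than the formula of the invention: the function names `most_common_patient_pattern` and `pattern_frequency`, the genotype string encoding, and the definition of the frequency as the share of family members whose genotype matches the patient pattern.

```python
from collections import Counter


def most_common_patient_pattern(genotypes, status):
    """Return the most frequent genotype among affected members (S = 1)."""
    patient_calls = [g for g, s in zip(genotypes, status) if s == 1]
    if not patient_calls:
        raise ValueError("at least one affected family member is required")
    return Counter(patient_calls).most_common(1)[0][0]


def pattern_frequency(genotypes, status):
    """Assumed definition: share of all family members matching the patient pattern."""
    pattern = most_common_patient_pattern(genotypes, status)
    return sum(g == pattern for g in genotypes) / len(genotypes)


# Hypothetical family of V = 4 members at a single SNV position j.
genotypes = ["A/G", "A/G", "A/A", "A/G"]
status = [1, 1, 0, 0]  # S = 1 patient, S = 0 unaffected

print(most_common_patient_pattern(genotypes, status))  # A/G
print(pattern_frequency(genotypes, status))            # 0.75
```

In this toy family both patients carry `A/G`, and three of the four members match that pattern.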

Step (b) may be a step of determining the window size n from the number of family members for whom the SNV data quantified in step (a) can be analyzed. Step (c) may be a step of setting a window that extends n bases, the window size determined in step (b), to the left and to the right of an arbitrary base position, and calculating the proportion of the family under analysis from the quantified SNV data within that window. For example, the value of n that determines the window size may vary with the number of family members used in the analysis.
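The windowing of step (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed calculation: the quantified SNV data are taken to be a list of 0/1 scores ordered by position, the window is truncated at the ends of the list, and n is passed in directly because the text does not give the rule that derives it from the number of family members. The name `window_proportion` is hypothetical.

```python
def window_proportion(scores, center, n):
    """Mean of the numeric SNV scores in the window [center - n, center + n]."""
    if not 0 <= center < len(scores):
        raise IndexError("center must be a valid position index")
    lo = max(0, center - n)                # truncate at the left end
    hi = min(len(scores), center + n + 1)  # truncate at the right end
    window = scores[lo:hi]
    return sum(window) / len(window)


# Hypothetical quantified scores along eight consecutive SNV positions.
scores = [0, 1, 1, 1, 0, 1, 0, 0]
print(window_proportion(scores, center=3, n=2))  # 0.8
```

With n = 2 the window around position 3 covers positions 1 to 5, four of which score 1.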

Step (d) may be a step of calculating a significance probability (p-value) with a one-proportion test at each SNV position within the window set in step (c). For example, the null hypothesis of the one-proportion test may be set to PRj. = 0.5, and the p-value obtained from the test may be denoted f(PRj.).
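A one-proportion test against a null value of 0.5 can be sketched with the standard library alone. The text does not say which form of the test is used, so this example substitutes a two-sided exact binomial test as one common choice; the function name `one_proportion_p_value` and the success and trial counts are hypothetical.

```python
from math import comb


def one_proportion_p_value(successes, trials, p0=0.5):
    """Two-sided exact binomial p-value for H0: proportion == p0."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Probability of every possible outcome under the null hypothesis.
    pmf = [comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Sum the outcomes that are no more likely than the observed one.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical window in which all 5 scored positions match the patient pattern.
print(one_proportion_p_value(5, 5))  # 0.0625
```

The small relative tolerance in the comparison guards against floating-point ties between symmetric outcomes.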

Step (e) is a step of calculating a weight that corrects for physical position at both ends of the window set in step (c), and step (f) is a step of calculating a priority score from the p-value calculated in step (d) and the weight calculated in step (e). Specifically, when the priority score calculated in step (f) is at least -log(0.05) = 2.996 (natural logarithm), the method may further include a step of checking whether the SNV pattern of the individual matches that of unaffected individuals. When the priority score satisfies the same condition, the method may also further include, without limitation, a step of checking whether the SNV position lies in a coding region.
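The threshold arithmetic can be checked with a short sketch. The value 2.996 corresponds to the natural logarithm, since -ln(0.05) is approximately 2.9957. How the p-value and the weight are combined is not specified in the text, so the multiplicative form `weight * -ln(p)` and the names `priority_score` and `passes_threshold` are assumptions made only for illustration.

```python
import math

# Threshold stated in the text: -log(0.05) with the natural logarithm.
THRESHOLD = -math.log(0.05)


def priority_score(p_value, weight=1.0):
    """Assumed form of the score: position weight times -ln(p-value)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must lie in (0, 1]")
    return weight * -math.log(p_value)


def passes_threshold(p_value, weight=1.0):
    """True when the score reaches -ln(0.05)."""
    return priority_score(p_value, weight) >= THRESHOLD


print(round(THRESHOLD, 3))     # 2.996
print(passes_threshold(0.01))  # True
print(passes_threshold(0.20))  # False
```

Under this assumed form and a weight of 1, reaching the threshold is equivalent to a p-value of at most 0.05.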

Step (g) is a step of converting SNVs into gene symbols according to the calculated priority score, and step (h) may include determining a ranking according to the calculated priority score and identifying a list of candidate causal genes according to that ranking.
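Steps (g) and (h) amount to mapping scored SNV positions onto gene symbols and ranking the genes. The sketch below shows one way this could look, assuming each gene is scored by its best SNV; that aggregation rule, the function name `rank_candidate_genes`, and the gene names and coordinates (placeholders, not real genome positions) are all hypothetical.

```python
def rank_candidate_genes(snv_scores, gene_regions):
    """Map scored SNVs to gene symbols and rank genes by their best score."""
    best = {}
    for (chrom, pos), score in snv_scores.items():
        for symbol, (g_chrom, start, end) in gene_regions.items():
            if chrom == g_chrom and start <= pos <= end:
                best[symbol] = max(score, best.get(symbol, float("-inf")))
    # Highest priority score first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Placeholder gene intervals: (chromosome, start, end).
gene_regions = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr2", 50, 80)}
# Placeholder priority scores keyed by (chromosome, position).
snv_scores = {
    ("chr1", 150): 3.4,
    ("chr1", 180): 5.1,
    ("chr2", 60): 4.0,
    ("chr3", 10): 9.9,  # falls in no listed gene and is dropped
}

print(rank_candidate_genes(snv_scores, gene_regions))
# [('GENE_A', 5.1), ('GENE_B', 4.0)]
```

A linear scan over the gene table is adequate for a panel of this size; an interval index would only matter for genome-wide annotation.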

In one embodiment, the method newly identified 13 novel variants that had never been reported in Western populations. Specifically, AP4M1, ATL1, CYP7B1, DDHD1, KIAA0196, KIF1C, KIF5A, PLP1, REEP1, SPAST, SPG11, SPG20, and SPG7 were newly identified through the method.

Another aspect for achieving the above objects provides a computer program stored in a medium for providing information for HSP prediction.

Specifically, it provides a system for identifying HSP causal genes, comprising:
(a) a data acquisition unit into which HSP single nucleotide variant (SNV) data and family information are input;
(b) a data calculation unit that performs (i) quantification of the SNV data from the input SNV data and family information using a preset formula; (ii) determination of a window size n from the number of family members for whom the quantified SNV data can be analyzed; (iii) setting of a window that extends n bases, the window size determined in (ii), to the left and to the right of an arbitrary base position, and calculation of the proportion of the family under analysis from the quantified SNV data within the window; (iv) calculation of a p-value with a one-proportion test at each SNV position within the window set in (iii); (v) calculation of a weight that corrects for physical position at both ends of the window set in (iii); and (vi) calculation of a priority score from the p-value calculated in (iv) and the weight calculated in (v);
(c) a mapping unit that converts SNVs into gene symbols according to the priority score calculated by the calculation unit; and
(d) an identification unit that checks the mapped gene symbols to identify HSP causal genes.

More specifically, the preset formula of (b) may be, without limitation, as follows. (i) When sequence information from both the father and the mother of the individual is available, the preset equation is

[Equation]

and (ii) when sequence information from neither the father nor the mother of the individual is available, or from only one of them, the preset equation is

[Equation]

where MSNV_i(S=1) is the most common pattern among the patients' SNVs at the i-th position, and v = 1, ..., V.

Another aspect for achieving the above objects provides a system for identifying HSP causal genes.

Specifically, it provides a computer program stored in a medium for providing information for HSP prediction, comprising:
(a) quantifying HSP single nucleotide variant (SNV) data using the input SNV data and family information together with a preset formula;
(b) determining a window size n from the number of family members for whom the SNV data quantified in step (a) can be analyzed;
(c) setting a window that extends n bases, the window size determined in step (b), to the left and to the right of an arbitrary base position, and calculating the proportion of the family under analysis from the quantified SNV data within the window;
(d) calculating a significance probability (p-value) with a one-proportion test at each SNV position within the window set in step (c);
(e) calculating a weight that corrects for physical position at both ends of the window set in step (c);
(f) calculating a priority score from the p-value calculated in step (d) and the weight calculated in step (e);
(g) mapping the SNVs selected according to the calculated priority score to the HSP SNV data; and
(h) outputting whether there is a risk of HSP using the SNVs mapped in the preceding step.

Using the computer program of the present invention has the advantage that the likelihood of developing HSP can be predicted from the newly identified HSP causal genes.

Another aspect for achieving the above objects provides a method for identifying novel causal variants of HSP.

Specifically, it provides a method for identifying novel causal variants of HSP, comprising:
(a) obtaining HSP single nucleotide variant (SNV) data by next-generation sequencing (NGS) using primers or probes that specifically bind to a nucleotide sequence encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, VCP, PMP22, KIF1A, HINT1, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, ATL1, GDAP1, LYST, AP4S1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZFYVE26, WDR48, C12orf65, AP5Z1, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2;
(b) mapping the SNV data to data from a sample obtained from an individual; and
(c) identifying mutated positions in the HSP causal genes from the SNV data.

Another aspect for achieving the above objects provides a method for providing information for HSP diagnosis.

Specifically, it provides a method for diagnosing HSP, comprising:
(a) obtaining HSP single nucleotide variant (SNV) data by next-generation sequencing (NGS) using primers or probes that specifically bind to a nucleotide sequence encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, VCP, PMP22, KIF1A, HINT1, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, ATL1, GDAP1, LYST, AP4S1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZFYVE26, WDR48, C12orf65, AP5Z1, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2;
(b) mapping the SNV data to data from a sample obtained from an individual;
(c) identifying novel causal variants of HSP by identifying mutated positions in the HSP causal genes from the SNV data; and
(d) predicting that the risk of HSP is high when the number of novel causal variants of HSP is large.

Still another aspect for achieving the above objects provides a system for diagnosing HSP.

Specifically, it provides a system for diagnosing HSP, comprising:
(a) a data acquisition unit into which HSP single nucleotide variant (SNV) data and family information are input;
(b) a data calculation unit that performs (i) quantification of the SNV data from the input SNV data and family information using a preset formula; (ii) determination of a window size n from the number of family members for whom the quantified SNV data can be analyzed; (iii) setting of a window that extends n bases, the window size determined in (ii), to the left and to the right of an arbitrary base position, and calculation of the proportion of the family under analysis from the quantified SNV data within the window; (iv) calculation of a p-value with a one-proportion test at each SNV position within the window set in (iii); (v) calculation of a weight that corrects for physical position at both ends of the window set in (iii); and (vi) calculation of a priority score from the p-value calculated in (iv) and the weight calculated in (v);
(c) a mapping unit that maps the SNVs selected according to the priority score calculated by the calculation unit to the HSP SNV data; and
(d) an identification unit that outputs whether there is a risk of HSP using the mapped SNVs.

Still another aspect for achieving the above objects provides a chip for diagnosing the cause of HSP, comprising primers or probes that specifically bind to a nucleotide sequence encoding one or more genes selected from the group consisting of AP4M1, ATL1, CYP7B1, DDHD1, KIAA0196, KIF1C, KIF5A, PLP1, REEP1, SPAST, SPG11, SPG20, and SPG7.

The 13 causal genes included in the chip for diagnosing the cause of HSP are HSP causal variants newly identified using the method for identifying novel causal variants of HSP provided by the present invention. Using a chip that includes primers or probes that specifically bind to nucleotide sequences encoding these genes allows the cause of HSP to be diagnosed more accurately.

In addition, the chip for diagnosing the cause of HSP may further include one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, GJB1, L1CAM, GARS, PSEN2, VCP, PMP22, KIF1A, HINT1, ATXN2, ENTPD1, B4GALNT1, TFG, USP8, RTN2, PNPLA6, SLC33A1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, GDAP1, LYST, AP4S1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21, MFN2, GJC2, CPT1C, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZFYVE26, WDR48, C12orf65, AP5Z1, C19orf12, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, ZFYVE27, HSP60, and ARL6IP2, and any gene associated with HSP may be included without limitation.

Another aspect for achieving the above object provides an HSP diagnostic kit comprising the chip for diagnosing the cause of HSP disease. The kit may be an RT-PCR kit, a DNA chip kit, or a protein chip kit, but is not limited thereto.

By providing a method for identifying novel causative variants of HSP disease, the present invention establishes an algorithm for identifying the causative genes of hereditary spastic paraplegia, a task that has conventionally been complex, and builds a diagnostic system. This raises the efficiency of identifying HSP causative genes while reducing cost and time, and it is also significant in that novel causative gene variants of HSP disease have been identified.

FIG. 1 is a schematic diagram of the method for analyzing nucleotide variants in HSP disease samples.
FIG. 2 is a schematic diagram of the method for identifying novel causative variants of HSP disease.

Hereinafter, the present invention is described in more detail by way of examples. These examples are intended only to illustrate the invention more specifically, and the scope of the invention is not limited to them.

Example 1. Selection of genes as targets for detecting hereditary spastic paraplegia variants

1-1. Selection of study subjects for detecting hereditary spastic paraplegia variants

A total of 109 individuals, comprising patients with hereditary spastic paraplegia (HSP) and unaffected family members of those patients, were enrolled as study subjects. Clinical evaluation was performed by two independent neurologists, and all participants provided informed consent in accordance with procedures approved by the Institutional Review Board of Sungkyunkwan University, Samsung Medical Center. The study used 83 HSP patients and 26 unaffected family members, the latter serving as healthy controls.

1-2. Selection of genes as targets for detecting hereditary spastic paraplegia variants

The genes most relevant to hereditary spastic paraplegia were selected.

As a result, the following genes were selected for detecting INDELs (insertions and deletions) and SNVs (single nucleotide variations): LATL1, BSCL2, CYP7B1, DDHD1, HSPD1, KIAA0196, KIF1A, KIF5A, MAST, NIPA1, PLP1, PNPLA6, REEP1, RTN2, SPARTIN, SPAST, SPG11, SPG7, ZFYVE26, ZFYVE27, AP4B1, AP4E1, AP4M1, AP5Z1, ARL6IP1, C19orf12, CAPN1, CPT1C, ENTPD1, FLRT1, GAD1, KIF1C, LYST, MAG, NT5C2, PGAP1, REEP2, SPG20, and USP8.

1-3. Preparation of probes for the hereditary spastic paraplegia variant detection targets

Target genes were selected as described in Example 1-2, and primers were designed using SureSelect (Agilent Technologies, Santa Clara, CA). UTR regions were excluded from the targets, and probes were designed to cover as much of the protein-coding region as possible. Exon probes were built from the sequences of 1,229 exon regions, including the CDS regions of 85 genes in total. The 1,229 exon regions have a combined size of 362,093 bp.
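As an illustration of how a combined target size such as the 362,093 bp above can be derived from a list of exon coordinates, the following Python sketch merges overlapping intervals and sums their lengths. The coordinates, the half-open BED-style convention, and the function names are assumptions made for this example; they are not taken from the probe design itself.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end), half-open and 0-based as in BED files (assumed convention)
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals that lie on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def target_size_bp(intervals: Iterable[Interval]) -> int:
    """Total number of bases covered, counting overlapping bases only once."""
    return sum(end - start for _, start, end in merge_intervals(intervals))


# Hypothetical exon coordinates, for illustration only.
example_exons = [
    ("chr2", 100, 250),
    ("chr2", 200, 300),    # overlaps the previous exon
    ("chr12", 1000, 1120),
]
print(target_size_bp(example_exons))
```

Merging before summing avoids counting a base twice when exons of different transcripts overlap.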

1-4. Target capture and NGS library preparation for detecting hereditary spastic paraplegia variants

For the NGS (next-generation sequencing) experiment with the HSP variant detection gene panel, genomic DNA was isolated from the blood of HSP patients using the QIAamp DNA Mini Kit (Qiagen, Valencia, CA, USA). The concentration, purity, and degradation of the isolated genomic DNA were then checked using a Nanodrop 8000 UV-Vis spectrometer (Thermo Scientific Inc., DE, USA), a Qubit 2.0 Fluorometer (Life Technologies Inc., Grand Island, NY, USA), and a 2200 TapeStation instrument (Agilent Technologies, Santa Clara, CA, USA). Clinical samples that met the QC criteria were used in the subsequent steps.

Genomic DNA (~250 ng) obtained from the blood of clinical samples that passed QC was sheared using a Covaris S220 (Covaris, MA, USA), and a sequencing library was then prepared through end repair, A-tailing, paired-end adaptor ligation, and amplification. Specifically, the library was hybridized at 65°C for 24 hours with a composition containing all of the probes designed to capture the 1,229 exon regions of the 85 genes selected in Example 1-3, and the genomic DNA library fragments captured by hybridization were purified. Purification relied on the binding between biotin and streptavidin: the biotin attached to the captured library fragments was bound to streptavidin-coated magnetic beads, and the captured fragments were then separated from the mixture by magnetic force. The purified genomic DNA library fragments were subsequently amplified together with an index barcode tag.

1-5. Generation of genomic data from target-captured clinical samples for detecting hereditary spastic paraplegia variants

The sequencing library capturing the 1,229 exon regions of the 85 genes from HSP clinical samples was loaded onto an NGS sequencer (MiSeq, Illumina, USA) to obtain the sequence of each DNA fragment. The genomic data were trimmed and aligned to the standard human genome to obtain sequence information for each gene in each sample. Sequencing reactions used the TruSeq Rapid PE Cluster Kit and TruSeq Rapid SBS Kit (Illumina, USA) and were run under paired-end conditions reading 100 bp in each direction.

1-6. Extraction of hereditary spastic paraplegia variant data

Sequencing reads produced by the NGS instrument were trimmed and then aligned to the UCSC hg19 reference genome (http://genome.ucsc.edu) using the BWA (Burrows-Wheeler Aligner) algorithm. When QC-passed sequences are mapped to the UCSC hg19 reference sequence, PCR duplicate reads generated during NGS library preparation may be included. These reads are removed because the amplification required for sequencing can over-amplify some fragments in error. PCR duplicates were removed using picard-tools-1.119 (http://picard.sourceforge.net/), and single nucleotide variations (SNVs) and insertion-deletion variants (INDELs) were identified using GenomeAnalysisTK-3.8. The identified nucleotide variants were annotated with the ANNOVAR program on the basis of UCSC hg19 information.
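The idea behind removing PCR duplicates can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the procedure implemented in picard-tools: it assumes that reads sharing the same chromosome, leftmost position, and strand are copies of one original fragment, and it keeps the copy with the highest summed base quality. The `Read` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Minimal, hypothetical stand-in for an aligned sequencing read."""
    name: str
    chrom: str
    pos: int                # leftmost aligned position
    strand: str             # "+" or "-"
    base_quality_sum: int   # sum of per-base quality scores


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, pos, strand), preferring the highest quality sum."""
    best: Dict[Tuple[str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        kept = best.get(key)
        if kept is None or read.base_quality_sum > kept.base_quality_sum:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))


# Hypothetical reads: r1 and r2 start at the same place on the same strand.
example_reads = [
    Read("r1", "chr2", 100, "+", 300),
    Read("r2", "chr2", 100, "+", 350),
    Read("r3", "chr2", 100, "-", 200),
    Read("r4", "chr12", 50, "+", 100),
]
print([r.name for r in remove_duplicates(example_reads)])
```

Collapsing such copies keeps an over-amplified fragment from being counted as independent evidence for a variant.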

1-7. Selection of causative gene variants for hereditary spastic paraplegia

The SNVs and INDELs extracted for each clinical sample were filtered by considering their function, variation type, amino acid change, and dbSNP record. Only SNVs and INDELs of a kind that can alter the protein were retained; all other variants were removed from causative gene selection.
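A filter of this kind can be sketched as follows. The category labels are modeled on the exonic-function categories commonly reported by annotation tools such as ANNOVAR, but the exact set of labels, the record layout (`func`, `exonic_func`), the inclusion of splice-site variants, and the gene names are assumptions made for this example and are not stated in the text.

```python
from typing import Dict, List

# Assumed labels for variant classes that can change the encoded protein.
PROTEIN_ALTERING_EXONIC = frozenset({
    "nonsynonymous SNV",
    "stopgain",
    "stoploss",
    "frameshift insertion",
    "frameshift deletion",
    "nonframeshift insertion",
    "nonframeshift deletion",
})


def is_protein_altering(variant: Dict[str, str]) -> bool:
    """Return True for splice-site variants and protein-changing exonic variants."""
    region = variant.get("func", "")
    if region == "splicing":  # assumption: splice-site variants are retained
        return True
    return region == "exonic" and variant.get("exonic_func", "") in PROTEIN_ALTERING_EXONIC


def filter_protein_altering(variants: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop every variant that is not expected to alter the protein."""
    return [v for v in variants if is_protein_altering(v)]


# Hypothetical annotated variants, for illustration only.
example_variants = [
    {"gene": "GENE_A", "func": "exonic", "exonic_func": "nonsynonymous SNV"},
    {"gene": "GENE_B", "func": "exonic", "exonic_func": "synonymous SNV"},
    {"gene": "GENE_C", "func": "intronic", "exonic_func": ""},
    {"gene": "GENE_D", "func": "splicing", "exonic_func": ""},
]
print([v["gene"] for v in filter_protein_altering(example_variants)])
```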

In addition, because HSP was studied as a rare disease, any identified SNV or INDEL with an overall frequency of 1% or higher was defined as a common variant and removed. Variant frequencies were checked against genomic data from dbSNP, the 1000 Genomes Project, and the NHLBI GO Exome Sequencing Project, and common variants were removed before candidate causative variants of HSP were selected. Variants remaining after this step were defined as candidate causative variants of HSP. When variants were selected in two or more different genes, the cross-species conservation (conservation rate) of the variant sequence was checked, and variants with higher cross-species conservation were given priority.
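The frequency filter and the conservation-based prioritization described above might be sketched as follows. The record layout, the database keys, the conservation values, and the choice to treat a variant that is absent from a database as rare are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional

COMMON_VARIANT_FREQUENCY = 0.01  # variants at 1% or higher are treated as common


def is_common(frequencies: Dict[str, Optional[float]]) -> bool:
    """True if any database reports a frequency of at least 1%.

    A value of None means the variant is absent from that database,
    which this sketch treats as rare (an assumption).
    """
    return any(
        freq is not None and freq >= COMMON_VARIANT_FREQUENCY
        for freq in frequencies.values()
    )


def candidate_variants(variants: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove common variants, then rank the rest by conservation, highest first."""
    rare = [v for v in variants if not is_common(v["frequencies"])]
    return sorted(rare, key=lambda v: v["conservation"], reverse=True)


# Hypothetical variants with made-up frequencies and conservation values.
example_variants = [
    {"id": "var1", "frequencies": {"dbsnp": 0.15, "1000g": 0.12, "esp": None}, "conservation": 0.9},
    {"id": "var2", "frequencies": {"dbsnp": None, "1000g": 0.001, "esp": None}, "conservation": 0.4},
    {"id": "var3", "frequencies": {"dbsnp": None, "1000g": None, "esp": None}, "conservation": 0.8},
]
print([v["id"] for v in candidate_variants(example_variants)])
```

Here var1 is dropped as common, and var3 is ranked ahead of var2 because its conservation value is higher.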

In addition, genes capable of identifying patients with ALS (amyotrophic lateral sclerosis) and PLS (primary lateral sclerosis), diseases that are easily confused with HSP, were selected as hotspot genes through a literature review so that HSP patients could be distinguished from them.

As a result, SOD1, FUS, TBK1, TARDBP, VCP, PFN1, UBQLN2, ATXN2, FIG4, and ALS2 were selected as genes for detecting INDELs and SNVs.

Likewise, genes capable of identifying patients with CMT (Charcot-Marie-Tooth disease), another disease that is easily confused with HSP, were selected as hotspot genes through a literature review so that HSP patients could be distinguished from them.

As a result, PMP22, GJB1, MPZ, KIF1B, MFN2, NEFL, GARS, GDAP1, and HINT1 were selected as genes for detecting INDELs and SNVs.

1-8. Preparation and calibration of the hereditary spastic paraplegia gene panel

To verify HSP genetic variants, the initially selected HSP gene panel was sequenced on 36 clinical samples from Korean HSP patients, and causative genes were identified in 8 families. Because no causative gene was identified in the remaining 27 families, the gene panel was calibrated. Using the results of WES genomic data analysis of 106 clinical samples from 61 Korean HSP families, the causative genes identified in Korean HSP patients, together with HSP causative genes chosen through a literature review, were additionally selected.

Also, as in Example 1-7, genes capable of identifying patients with ALS (amyotrophic lateral sclerosis), PLS (primary lateral sclerosis), and CMT (Charcot-Marie-Tooth disease), diseases that are easily confused with HSP, were selected as hotspot genes through a literature review so that HSP patients could be distinguished from them.

As a result, SOD1, FUS, TBK1, TARDBP, VCP, PFN1, UBQLN2, ATXN2, FIG4, ALS2, PMP22, GJB1, MPZ, KIF1B, MFN2, NEFL, GARS, GDAP1, and HINT1 were selected as genes for detecting INDELs and SNVs.

No. | Genotype | OMIM | Gene symbol | Gene locus | Inheritance | Age of onset
1 | SPG1 | 303350 | L1CAM | Xq28 | X-linked recessive | Early onset
2 | SPG2 | 312920 | PLP1 | Xq22.2 | X-linked recessive | Variable
3 | SPG22 | 300523 | SLC16A2 | Xq13.2 | X-linked recessive | Early onset
4 | SPG16 | 300266 | ? | Xq11.2 | X-linked recessive | Early onset
5 | SPG34 | 300750 | ? | Xq24-q25 | X-linked recessive | Teenage/Adult
6 | SPG3A | 182600 | ATL1 | 14q22.1 | Autosomal dominant | Early onset
7 | SPG4 | 182601 | SPAST | 2p22.3 | Autosomal dominant | Variable
8 | SPG6 | 600363 | NIPA1 | 15q11.2 | Autosomal dominant | Teenage
9 | SPG7 | 602783 | SPG7 | 16q24.3 | Autosomal dominant | Variable
10 | SPG8 | 603563 | KIAA0196 | 8q24.13 | Autosomal dominant | Adult
11 | SPG9A | 601162 | ALDH18A1 | 10q24.1 | Autosomal dominant | Teenage
12 | SPG10 | 604187 | KIF5A | 12q13.3 | Autosomal dominant | Early onset
13 | SPG12 | 604805 | RTN2 | 19q13.32 | Autosomal dominant | Early
14 | SPG13 | 605280 | HSP60 | 2q33.1 | Autosomal dominant | Variable
15 | SPG17 | 270685 | BSCL2 | 11q12.3 | Autosomal dominant | Teenage
16 | SPG19 | 607152 | ? | 9q | Autosomal dominant | Adult onset
17 | SPG29 | 609727 | ? | 1p31.1-p21.1 | Autosomal dominant | Teenage
18 | SPG31 | 610250 | REEP1 | 2p11.2 | Autosomal dominant | Early onset
19 | SPG33 | 610244 | ZFYVE27 | 10q24.2 | Autosomal dominant | Adult
20 | SPG36 | 613096 | ? | 12q23-q24 | Autosomal dominant | Teenage/Adult
21 | SPG37 | 611945 | ? | 8p21.1-q13.3 | Autosomal dominant | Variable
22 | SPG38 | 612335 | ? | 4p16-p15 | Autosomal dominant | Teenage/Adult
23 | SPG41 | 613364 | ? | 11p14.1-p11.2 | Autosomal dominant | Adolescence
24 | SPG42 | 612539 | SLC33A1 | 3q25.31 | Autosomal dominant | Variable
25 | SPG72 | 615625 | REEP2 | 5q31 | Autosomal recessive; dominant | Infancy
26 | SPG73 | 616282 | CPT1C | 19q13.33 | Autosomal dominant | Adult
27 | SPG5A | 270800 | CYP7B1 | 8q12.3 | Autosomal recessive | Variable
28 | SPG9B | 616586 | ALDH18A1 | 10q24.1 | Autosomal recessive | Early onset
29 | SPG11 | 604360 | SPG11 | 15q21.1 | Autosomal recessive | Variable
30 | SPG14 | 605229 | ? | 3q27-q28 | Autosomal recessive | Adult
31 | SPG15 | 270700 | ZFYVE26 | 14q24.1 | Autosomal recessive | Early onset
32 | SPG18 | 611225 | ERLIN2 | 8p11.23 | Autosomal recessive | Early onset
33 | SPG20 | 275900 | SPG20 | 13q13.3 | Autosomal recessive | Early onset
34 | SPG21 | 248900 | SPG21 | 15q22.31 | Autosomal recessive | Early onset
35 | SPG23 | 270750 | ? | 1q24-q32 | Autosomal recessive | Early onset
36 | SPG24 | 607584 | ? | 13q14 | Autosomal recessive | Early onset
37 | SPG25 | 608220 | ? | 6q23-q24.1 | Autosomal recessive | Adult
38 | SPG26 | 609195 | B4GALNT1 | 12q13.3 | Autosomal recessive | Early onset
39 | SPG27 | 609041 | ? | 10q22.1-q24.1 | Autosomal recessive | Variable
40 | SPG28 | 609340 | DDHD1 | 14q22.1 | Autosomal recessive | Early onset
41 | SPG30 | 610357 | KIF1A | 2q37.3 | Autosomal recessive | Teenage
42 | SPG32 | 611252 | ? | 14q12-q21 | Autosomal recessive | Childhood
43 | SPG35 | 612319 | FA2H | 16q23.1 | Autosomal recessive | Childhood
44 | SPG39 | 612020 | PNPLA6 | 19p13.2 | Autosomal recessive | Childhood
45 | SPG43 | 615043 | C19orf12 | 19q12 | Autosomal recessive | Childhood
46 | SPG44 | 613206 | GJC2 | 1q42.13 | Autosomal recessive | Childhood/teenage
47 | SPG45 | 613162 | NT5C2 | 10q24.32-q24.33 | Autosomal recessive | Infancy
48 | SPG46 | 614409 | GBA2 | 9p13.3 | Autosomal recessive | Variable
49 | SPG47 | 614066 | AP4B1 | 1p13.2 | Autosomal recessive | Childhood
50 | SPG48 | 613647 | AP5Z1 | 7p22.1 | Autosomal recessive | 6th decade
51 | SPG49 | 615041 | TECPR2 | 14q32.31 | Autosomal recessive | Infancy
52 | SPG50 | 612936 | AP4M1 | 7q22.1 | Autosomal recessive | Infancy
53 | SPG51 | 613744 | AP4E1 | 15q21.2 | Autosomal recessive | Infancy
54 | SPG52 | 614067 | AP4S1 | 14q12 | Autosomal recessive | Infancy
55 | SPG53 | 614898 | VPS37A | 8p22 | Autosomal recessive | Childhood
56 | SPG54 | 615033 | DDHD2 | 8p11.23 | Autosomal recessive | Childhood
57 | SPG55 | 615035 | C12orf65 | 12q24.31 | Autosomal recessive | Childhood
58 | SPG56 | 615030 | CYP2U1 | 4q25 | Autosomal recessive | Childhood
59 | SPG57 | 615658 | TFG | 3q12.2 | Autosomal recessive | Early onset
60 | SPG58 | 611302 | KIF1C | 17p13.2 | Autosomal recessive | Within first two decades
61 | SPG59 | - | USP8 | 15q21.2 | ? | Childhood
62 | SPG60 | - | WDR48 | 3p22.2 | ? | Infancy
63 | SPG61 | 615685 | ARL6IP1 | 16p12.3 | Autosomal recessive | Infancy
64 | SPG62 | 615681 | ERLIN1 | 10q24.31 | Autosomal recessive | Childhood
65 | SPG63 | 615686 | AMPD2 | 1p13.3 | Autosomal recessive | Infancy
66 | SPG64 | 615683 | ENTPD1 | 10q24.1 | Autosomal recessive | Childhood
67 | SPG66 | - | ARSI | 5q32 | ? | Infancy
68 | SPG67 | 615802 | PGAP1 | 2q33.1 | Autosomal recessive | Infancy
69 | SPG68 | 609541 | KLC2 | 11q13.1 | Autosomal recessive | Childhood
70 | SPG69 | - | RAB3GAP2 | 1q41 | |
71 | SPG70 | - | MARS | 12q13 | ? | Infancy
72 | SPG71 | - | ZFR | 5p13.3 | | Childhood
73 | SPG74 | 616451 | IBA57 | 1q42.13 | Autosomal recessive | Childhood
74 | SPG75 | 616680 | MAG | 19q13.12 | Autosomal recessive | Childhood
75 | SPG76 | 616907 | CAPN1 | 11q13 | Autosomal recessive | Adult
76 | HSNSP | 256840 | CCT5 | 5p15.2 | Autosomal recessive | Childhood

Related Genes
No. | Disease | Gene symbol
77 | HSP | BICD2
78 | | LYST
79 | ALS | SOD1
80 | | C9ORF72
81 | | TDP43
82 | | FUS
83 | PLS (Juvenile) | ALS2
84 | ALS | OPTN
85 | | VCP
86 | | TARDBP
87 | | ANG
88 | | FIG4
89 | EOFAD | PS1
90 | | PS2
91 | | bAPP
92 | CMT | CMT1
93 | | CMT2
94 | | HSAN
95 | | HMN
96 | | CCDC50

Example 2. Analysis results of the hereditary spastic paraplegia gene panel

Primers for the 82 finally selected HSP genes (Table 1) were designed using SureSelect (Agilent) optimized for the HiSeq2500 Genome Analyzer (Illumina). To calibrate the newly developed gene panel, it was tested on 31 clinical samples carrying previously determined causative variants. As expected, the mutated HSP causative genes were detected in 100% of the samples. This indicates that the validity of the newly developed HSP gene panel was sufficiently confirmed.

In addition, analysis of the Korean HSP gene panel data identified 19 HSP causative genes (SPG11, KIF5A, ZFYVE26, SPAST, AFG3L2, AMPD2, AP5Z1, ATP2B4, CYP7B1, GAD1, KIAA0196, KIF1A, KIF1C, LYST, MARS, NIPA1, SLC2A1, SPG20, and SPG7) across all clinical samples from the 31 families. Because variants were found in HSP causative genes included in the panel, the validity of the HSP gene panel was confirmed.

Family | Chr | Start | Ref | Alt | Gene Name | AAChange.refGene
HSP-019D | chr12 | 57958724 | C | G | KIF5A | NM_004984 exon6 c.C469G p.H157D
HSP-026 | chr14 | 68241828 | G | C | ZFYVE26 | NM_015346 exon27 c.C5225G p.S1742C
HSP-028 | chr14 | 68248231 | T | C | ZFYVE26 | NM_015346 exon22 c.A4388G p.Q1463R
HSP-031F | chr15 | 44941134 | G | C | SPG11 | NM_001160227 exon7 c.C1532G p.A511G;SPG11
HSP-031P | chr15 | 44941134 | G | C | SPG11 | NM_001160227 exon7 c.C1532G p.A511G;SPG11
HSP-032 | chr8 | 126087346 | C | T | KIAA0196 | NM_014846 exon8 c.G872A p.S291N
HSP-033 | chr2 | 32362221 | C | T | SPAST | NM_199436 exon11 c.C1361T p.T454I;SPAST
HSP-034 | chr14 | 68241828 | G | C | ZFYVE26 | NM_015346 exon27 c.C5225G p.S1742C
HSP-035 | chr12 | 57963186 | C | T | KIF5A | NM_004984 exon10 c.C967T p.R323W
HSP-036 | chr12 | 57909709 | C | A | MARS | NM_004990 exon19 c.C2398A p.P800T
HSP-037 | chr15 | 44921001 | T | C | SPG11 | NM_001160227 exon10 c.A1933G p.S645G;SPG11
HSP-038D | chr1 | 203690406 | C | T | ATP2B4 | NM_001001396 exon17 c.C2680T p.P894S;ATP2B4
HSP-038P | chr15 | 23086365 | GCCGCCGCC | - | NIPA1 | NM_144599 exon1 c.39_47del p.13_16del
HSP-039 | chr2 | 241696841 | TCC | - | KIF1A | NM_001244008 exon27 c.2751_2753del p.917_918del
HSP-040P | chr8 | 65517243 | G | A | CYP7B1 | NM_004820 exon5 c.C1229T p.P410L
HSP-041 | chr18 | 12370872 | T | C | AFG3L2 | NM_006796 exon3 c.A268G p.K90E
HSP-042 | chr2 | 171675181 | C | A | GAD1 | NM_000817 exon2 c.C80A p.T27K;GAD1
HSP-043 | chr15 | 44921001 | T | C | SPG11 | NM_001160227 exon10 c.A1933G p.S645G;SPG11
HSP-044 | chr12 | 57963058 | G | A | KIF5A | NM_004984 exon10 c.G839A p.R280H
HSP-045P | chr16 | 89598912 | C | T | SPG7 | NM_003119 exon9 c.C1192T p.R398X;SPG7
 | | 89620943 | T | C | SPG7 | exon16 c.T2153C p.L718P
HSP-046 | chr1 | 43395555 | C | T | SLC2A1 | NM_006516 exon5 c.G668A p.R223Q
HSPS-01953 | chr7 | 4823070 | A | G | AP5Z1 | NM_014855 exon4 c.A490G p.S164G
HSPS-030P | chr1 | 110168830 | G | - | AMPD2 | NM_203404 exon2 c.207delG p.T69fs;AMPD2
HSPS-031P | chr1 | 235897950 | T | G | LYST | NM_000081 exon32 c.A8368C p.K2790Q;LYST
HSPS-032P | chr2 | 32353513 | TTT | - | SPAST | NM_199436 exon8 c.1114_1116del p.372_372del;SPAST
HSPS-033 | chr12 | 57958739 | C | T | KIF5A | NM_004984 exon6 c.C484T p.R162W
HSPS-034 | chr17 | 4910278 | T | G | KIF1C | NM_006612 exon14 c.T1234G p.S412A
HSPS-036P | chr2 | 32353549 | G | A | SPAST | NM_014946 exon9 c.1245+1G>A;NM_199436 exon8
HSPS-038P | chr15 | 44876467 | CA | - | SPG11 | NM_001160227 exon30 c.5410_5411del p.C1804fs;SPG11
 | chr15 | 44903037 | C | A | | exon18 c.3291+1G>T;NM_025137 exon18
HSPS-039 | chr13 | 36905571 | C | A | SPG20 | NM_001142294 exon3 c.G973T p.D325Y;SPG20
HSPS-040 | chr14 | 68274612 | C | G | ZFYVE26 | NM_015346 exon5 c.G389C p.G130A

Example 3. Development of an analysis program for the hereditary spastic paraplegia gene panel

To provide a method for finding candidate causative genes of familial and non-familial HSP, an analysis program for the HSP gene panel was developed. Specifically, the sequence information of a patient and of the patient's parents or siblings was converted to numerical values, and the resulting patterns were analyzed to derive a list of candidate causative genes for the familial disease.

The method for providing a list of candidate causative genes of familial and non-familial HSP comprises the following steps:
(a) classifying the case according to whether sequence data for the patient's parents are available;
(b) quantifying the single nucleotide variant (SNV) data with an indicator function, according to the case determined in step (a);
(c) determining the window size n from the number of family members available for analysis;
(d) performing a one-sample proportion test on the step (b) data for the n positions to the left and right of a given position;
(e) performing the test of step (d) for every position;
(f) obtaining the p-value at each position from the one-sided test result;
(g) calculating a weight that corrects for the physical position of the window;
(h) calculating a score from the p-value of step (f) and the weight of step (g);
(i) checking, at each SNV whose score from step (h) is at least -log(0.05) = 2.996, whether the patterns of the patients and of the unaffected individuals are each consistent;
(j) checking whether each SNV position satisfying the condition of step (i) lies in a coding region;
(k) converting each SNV position satisfying the condition of step (j) into a gene symbol; and
(l) ranking by score and then reviewing the list of candidate causative genes.

In step (a), the present invention analyzes separately the case in which the nucleotide sequence information of both the patient's father and mother is available and the case in which it is not.

In the present invention, when the nucleotide sequence data of both the patient's father and mother are available, each single nucleotide variant is quantified with an indicator function as shown in Equation 1 below.

In Equation 1, S is a variable that distinguishes patients from unaffected individuals: S = 0 denotes an unaffected individual and S = 1 denotes a patient. SNV_jv(S) denotes the j-th SNV of the v-th family member.

When the sequence data of both parents are available, the father's data are assigned v = 1 and the mother's data v = 2. For v = 3, ..., V, the data are the sequence data of the (v-2)-th child.

In Equation 1, REF_j is the genotype at the j-th position of the human genome reference sequence, which is hg19 as provided by UCSC. An indicator function is a function that marks with 0 or 1 whether an element belongs to a particular set. Using Equation 1, the pattern frequency LF_j of the whole family is calculated as shown in Equation 2 below.

In Equation 2, C is the number of children.

When only one of the father's or mother's sequence information is available, or when neither parent's data are available, the SNV information is quantified using Equation 3 below.

In Equation 3, MSNV_j(S=1) denotes the most frequent pattern among the SNVs carried only by the patients within the family under analysis. Using Equation 3, the pattern frequency LF_j of the whole family is calculated as shown in Equation 4 below.
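As a rough illustration of the "most frequent patient pattern" idea, the following Python sketch picks the most common genotype among patients at one position and marks each family member with 0 or 1. Equations 3 and 4 themselves are not reproduced in this text, so the function names, the genotype strings, and the exact form of the indicator are assumptions, and Python is used only for illustration.

```python
from collections import Counter


def most_frequent_patient_pattern(genotypes, is_patient):
    """Return the most common genotype among patients (S = 1) at one position."""
    patient_calls = [g for g, s in zip(genotypes, is_patient) if s == 1]
    if not patient_calls:
        raise ValueError("no patient genotypes at this position")
    # most_common(1) returns [(genotype, count)]; ties go to the first-seen genotype.
    return Counter(patient_calls).most_common(1)[0][0]


def matches_patient_pattern(genotypes, is_patient):
    """Assumed indicator: 1 if a member carries the patients' most common genotype."""
    msnv = most_frequent_patient_pattern(genotypes, is_patient)
    return [1 if g == msnv else 0 for g in genotypes]


# Hypothetical family at one position: two patients and two unaffected members.
genotypes = ["A/G", "A/A", "A/G", "A/A"]
is_patient = [1, 0, 1, 0]
indicators = matches_patient_pattern(genotypes, is_patient)
```

In this toy family the indicator separates patients from unaffected members exactly, which is the kind of position the later filtering steps look for.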

The pattern is analyzed by including the nucleotides surrounding a given position. A window is set that consists of the n SNVs to the left and the n SNVs to the right of a given i-th position. The window is defined by Equation 5 below.
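The window described above can be sketched in Python as a list of indices centered on position i. Equation 5 is not reproduced in this text, so the function names and the clipping of the window at both ends of the SNV list are assumptions made for illustration.

```python
def window_indices(i, n, total):
    """Indices of the window centered on SNV i with n SNVs on each side.

    Clipping at the start and end of the SNV list is an assumption;
    the source does not say how boundary positions are handled.
    """
    if not 0 <= i < total:
        raise IndexError("center position is outside the SNV list")
    start = max(0, i - n)
    stop = min(total, i + n + 1)
    return list(range(start, stop))


def sliding_windows(total, n):
    """Yield (center, window indices) for every SNV position in turn."""
    for i in range(total):
        yield i, window_indices(i, n, total)
```

For example, `window_indices(5, 2, 10)` covers positions 3 through 7, while a center at position 0 yields a shortened window.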

The proportion for the family under analysis at a given j-th position is given by Equation 6 below.

Equation 7 below is used to quantify the proportion data derived from Equation 6.

A single-proportion test is performed at the SNV positions within the window using the values from Equation 7. The null hypothesis of the test is PR_j. = 0.5, and the p-value obtained from the test is denoted f(PR_j.). The window is then moved to the next, (i+1)-th, position and the test is repeated; moving in turn, every SNV is tested. The value n that determines the window size depends on the number of family members used in the analysis and is determined using Equation 8 below.

Equation 8 is a statistical sample-size determination formula.

In Equation 8, z_0.05 is the critical value of the standard normal distribution at a significance level of 0.05, E is the margin of error, which is set to 0.05, and C is the number of children. A score is calculated by weighting the p-value obtained at each SNV by the physical positions of the two end points of the window. The weight is calculated as shown in Equation 9 below.

Here, the position term in Equation 9 denotes the location at which the SNV occurs.
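The windowed single-proportion test described above (null proportion 0.5, one-sided) can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a normal-approximation z-test without continuity correction, tests in the "greater than 0.5" direction, and treats the quantified values from Equation 7 as 0/1 flags. None of these details are confirmed by the source, and the function names are hypothetical.

```python
import math


def one_sided_proportion_pvalue(successes, trials, p0=0.5):
    """Upper-tail p-value of a one-sample z-test for a proportion (assumed form)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    standard_error = math.sqrt(p0 * (1.0 - p0) / trials)
    z = (p_hat - p0) / standard_error
    # P(Z >= z) for a standard normal variable.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def windowed_pvalues(flags, n):
    """Test every position using the flags of n neighbors on each side."""
    pvalues = []
    for i in range(len(flags)):
        window = flags[max(0, i - n): i + n + 1]
        pvalues.append(one_sided_proportion_pvalue(sum(window), len(window)))
    return pvalues
```

A window in which exactly half of the flags are 1 gives a p-value of 0.5, and a window in which all flags are 1 gives a small p-value.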

Equation 10 gives the score: the p-value is multiplied by the weight and the negative logarithm (-log) is taken, which inversely transforms small values into large ones. Using Equation 10, a score is assigned at each SNV to the causative candidate gene of the family-specific disease. If the score is lower than -log(0.05) = 2.996, the inverse transform of the significance level 0.05, the SNV is excluded from the candidate genes.
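The scoring rule stated above (multiply the p-value by the weight, take the negative logarithm, and compare with -log(0.05)) can be sketched in Python as follows. The stated cutoff of 2.996 corresponds to the natural logarithm; the function names are hypothetical, and the weight is taken as a given number because Equation 9 is not reproduced in this text.

```python
import math

# -ln(0.05), approximately 2.996, the cutoff stated in the text.
SCORE_THRESHOLD = -math.log(0.05)


def priority_score(p_value, weight):
    """Score = -log(p-value * weight), using the natural logarithm."""
    product = p_value * weight
    if product <= 0.0:
        raise ValueError("p_value * weight must be positive")
    return -math.log(product)


def passes_threshold(p_value, weight):
    """Keep an SNV only if its score is at least -log(0.05)."""
    return priority_score(p_value, weight) >= SCORE_THRESHOLD
```

With a weight of 1, a p-value of 0.01 passes the cutoff and a p-value of 0.2 does not.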

Among the SNVs that satisfy the above condition, the sequence information of the patient, parents, and siblings is used to select the positions at which the patients' sequences agree with one another and the unaffected individuals' sequences agree with one another. The positions that exactly separate the sequences of patients and unaffected individuals within the family under analysis are those with LF_j. = 0 in Equations 2 and 4.

Each filtered position is checked for whether it lies in a coding region, and only positions in coding regions are retained. The selected SNVs are converted into gene symbols. For the gene corresponding to each selected SNV, additional information is provided, such as a description of the gene, its phenotype, and the PolyPhen-2 and SIFT scores. PolyPhen-2 and SIFT are programs that predict whether a sequence variant is likely to be damaging. All of the above steps are carried out in the R language, and the results are sorted in descending order of score and provided in tabular form.
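The final filtering and ranking can be sketched as follows. The source states that the actual analysis was done in R; this Python version is only an illustration, and the record fields, the placeholder gene names, and the choice to keep the best score per gene are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScoredSNV:
    chrom: str
    position: int
    score: float
    is_coding: bool
    gene_symbol: str


def rank_candidate_genes(snvs):
    """Keep coding-region SNVs, map them to gene symbols, sort by score (high first)."""
    best = {}
    for snv in snvs:
        if not snv.is_coding:
            continue  # only coding-region positions are retained
        # Assumption: a gene is represented by its highest-scoring SNV.
        if snv.gene_symbol not in best or snv.score > best[snv.gene_symbol]:
            best[snv.gene_symbol] = snv.score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records; gene names and coordinates are placeholders.
records = [
    ScoredSNV("chr1", 100, 4.2, True, "GENE_A"),
    ScoredSNV("chr1", 150, 3.1, True, "GENE_A"),
    ScoredSNV("chr2", 200, 5.1, True, "GENE_B"),
    ScoredSNV("chr3", 300, 9.9, False, "GENE_C"),
]
ranking = rank_candidate_genes(records)
```

Here `GENE_C` is dropped because its only SNV is non-coding, and the remaining genes are listed from the highest score down.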

Example 4. Screening of novel gene mutations

Whole-exome sequencing results from 109 clinical samples of 60 Korean HSP families were analyzed for association between patients with known hereditary spastic paraplegia and the normal group. As a result, novel nucleotide variants and insertion-deletion variants in Korean hereditary spastic paraplegia patients were found at 29 loci in 13 previously known genes, yielding novel variant information. These variants have never been reported in Westerners (Table 3).

Table 3. Novel variant information identified in Korean hereditary spastic paraplegia clinical samples

Gene name   Chromosome   Position    Alternative allele   Protein change
AP4M1       chr7         99703162    c.G929A              p.R310Q
ATL1        chr14        51060577    c.C536A              p.S179Y
CYP7B1      chr8         65509474    c.G1246C             p.D416H
CYP7B1      chr8         65536958    c.259+2T>C
DDHD1       chr14        53513580    c.A2525G             p.Y842C
DDHD1       chr14        53529855    c.1571_1572del       p.F524fs
KIAA0196    chr8         126087346   c.G872A              p.S291N
KIAA0196    chr8         126067844   c.G2086A             p.G696S
KIF1C       chr17        4910278     c.T1234G             p.S412A
KIF5A       chr12        57958724    c.C469G              p.H157D
KIF5A       chr12        57963186    c.C967T              p.R323W
KIF5A       chr12        57958739    c.C484T              p.R162W
PLP1        chrX         103045525   c.G833C              p.X278S
REEP1       chr2         86481815    c.324+2T>C
REEP1       chr2         86479151    c.G265A              p.D89N
SPAST       chr2         32353499    c.C1100G             p.S367W
SPAST       chr2         32352053    c.1039_1044del
SPAST       chr2         32341282    c.1098+1G>
SPAST       chr2         32340807    c.812dupC            p.T271fs
SPAST       chr2         32314658    c.570delC            p.D190fs
SPAST       chr2         32362221    c.C1361T             p.T454I
SPAST       chr2         32353513    c.1114_1116del
SPAST       chr2         32353549    c.1245+1G>A
SPG11       chr15        44941134    c.C1532G             p.A511G
SPG11       chr15        44949383    c.C779G              p.T260S
SPG11       chr15        44941086    C1580T               p.S527L
SPG20       chr13        36905571    c.G973T              p.D325Y
SPG7        chr16        89598912    c.C1192T             p.R398X
SPG7        chr16        89620943    c.T2153C             p.L718P

From the above description, those skilled in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical spirit or essential features. In this regard, the embodiments described above are to be understood as illustrative in all respects and not restrictive. The scope of the present invention is to be construed as including, rather than the detailed description above, all changes or modifications derived from the meaning and scope of the claims set out below and from their equivalents.

Claims

A method for identifying a novel causative variant of hereditary spastic paraplegia (HSP) disease, comprising:
(a) quantifying single nucleotide variant (SNV) data using input hereditary spastic paraplegia SNV data and family information;
(b) determining a window size n using the number of family members for whom the SNV data quantified in step (a) can be analyzed;
(c) setting a window comprising n nucleotide sequences, n being the window size determined in step (b), on each of the left and the right of a given position, and calculating the proportion of the family under analysis using the quantified SNV data within the set window;
(d) calculating a p-value using a single-proportion test at the SNV positions within the window set in step (c);
(e) calculating a weight that corrects for the physical positions of the two ends of the window set in step (c);
(f) calculating a priority score using the p-value calculated in step (d) and the weight calculated in step (e);
(g) converting SNVs into gene symbols according to the calculated priority score; and
(h) determining a priority according to the calculated priority score and identifying a list of causative candidate genes according to the determined priority.

The method of claim 1,
wherein the quantification in step (a) uses, (i) when the nucleotide sequence information of both the father and the mother of the individual is available, the equation

and,
(ii) when the nucleotide sequence information of neither the father nor the mother is available, or of only one of them is available, the preset equation

where MSNV_i(S=1) is the most frequent pattern among the patients' SNVs at the i-th position, and v = 1, ..., V.

The method of claim 1,
wherein the SNV data of step (a) are obtained by next-generation sequencing of nucleotide sequences encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, PMP22, KIF1A, HINT1, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B2, BSCL2, ATLS, APCL1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50AP ERL1, WDR48, C12orf65, AP5Z1, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2.

The method of claim 1,
further comprising (i) when the priority score calculated in step (f) is -log(0.05) = 2.996 or more, checking whether the SNV pattern of the individual and the SNV pattern of unaffected persons each match.

The method of claim 4,
further comprising checking whether an SNV position that satisfies the condition of step (i) lies in a coding region.

A system for identifying HSP causative genes, comprising:
(a) a data acquisition unit to which hereditary spastic paraplegia single nucleotide variant (SNV) data and family information are input;
(b) a data operation unit that performs (i) quantification of the SNV data by applying a preset equation to the input SNV data and family information; (ii) determination of a window size n using the number of family members for whom the quantified SNV data can be analyzed; (iii) setting of a window comprising n nucleotide sequences, n being the window size determined in (ii), on each of the left and the right of a given position, and calculation of the proportion of the family under analysis using the quantified SNV data within the set window; (iv) calculation of a p-value using a single-proportion test at the SNV positions within the window set in (iii); (v) calculation of a weight that corrects for the physical positions of the two ends of the window set in (iii); and (vi) calculation of a priority score using the p-value calculated in (iv) and the weight calculated in (v);
(c) a mapping unit that converts SNVs into gene symbols according to the priority score calculated by the data operation unit; and
(d) a unit that identifies HSP causative genes by checking the mapped gene symbols.

The system of claim 6,
wherein the preset equation of (b) is, (i) when the nucleotide sequence information of both the father and the mother of the individual is available,

where MSNV_i(S=1) is the most frequent pattern among the patients' SNVs at the i-th position, and v = 1, ..., V.

(a) quantifying single nucleotide variant (SNV) data using input hereditary spastic paraplegia SNV data and family information;
(b) determining a window size n using the number of family members for whom the SNV data quantified in step (a) can be analyzed;
(c) setting a window comprising n nucleotide sequences, n being the window size determined in step (b), on each of the left and the right of a given position, and calculating the proportion of the family under analysis using the quantified SNV data within the set window;
(d) calculating a p-value using a single-proportion test at the SNV positions within the window set in step (c);
(e) calculating a weight that corrects for the physical positions of the two ends of the window set in step (c);
(f) calculating a priority score using the p-value calculated in step (d) and the weight calculated in step (e);
(g) mapping the SNV data selected according to the calculated priority score to hereditary spastic paraplegia SNV data; and
(h) outputting whether there is a risk of HSP using the SNVs mapped in step (g).

A method for identifying a novel causative variant of HSP disease, comprising:
(a) obtaining hereditary spastic paraplegia SNV data by next-generation sequencing using primers or probes that specifically bind to nucleotide sequences encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, VCP1, PMP22, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, ATL1, APB4 AP1, APB4 AP4, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZF48, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2;
(b) mapping the SNV data to data of a sample obtained from the individual; and
(c) identifying the mutated positions in the SNV data and the HSP causative genes.

(a) obtaining hereditary spastic paraplegia SNV data by next-generation sequencing using primers or probes that specifically bind to nucleotide sequences encoding one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, PLP1, GJB1, L1CAM, GARS, PSEN2, VCP1, PMP22, ATXN2, ENTPD1, B4GALNT1, AP4M1, TFG, SPG7, KIF5A, KIF1C, USP8, RTN2, PNPLA6, SLC33A1, CYP7B1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, ATL1, APB4 AP1, APB4 AP4, NIPA1, SPG21, MFN2, GJC2, CPT1C, REEP1, RAB3GAP2, REEP2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, SPG11, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZF48, C19orf12, DDHD1, TECPR2, DDHD2, IBA57, ZFR, KLC2, NT5C2, SPAST, ZFYVE27, KIAA0196, HSP60, SPG20, and ARL6IP2;
(b) mapping the SNV data to data of a sample obtained from the individual;
(c) identifying novel causative variants of HSP disease by identifying the mutated positions in the SNV data and the HSP causative genes; and
(d) predicting a high risk of hereditary spastic paraplegia (HSP) disease when the number of novel causative variants of HSP disease is high.

A system for diagnosing HSP disease, comprising:
(a) a data acquisition unit to which hereditary spastic paraplegia single nucleotide variant (SNV) data and family information are input;
(b) a data operation unit that performs (i) quantification of the SNV data by applying a preset equation to the input SNV data and family information; (ii) determination of a window size n using the number of family members for whom the quantified SNV data can be analyzed; (iii) setting of a window comprising n nucleotide sequences, n being the window size determined in (ii), on each of the left and the right of a given position, and calculation of the proportion of the family under analysis using the quantified SNV data within the set window; (iv) calculation of a p-value using a single-proportion test at the SNV positions within the window set in (iii); (v) calculation of a weight that corrects for the physical positions of the two ends of the window set in (iii); and (vi) calculation of a priority score using the p-value calculated in (iv) and the weight calculated in (v);
(c) a mapping unit that maps the SNVs selected according to the priority score calculated by the data operation unit to the SNV data of HSP; and
(d) a unit that outputs whether there is a risk of HSP using the SNVs mapped by the mapping unit.

The system of claim 11,
wherein the preset equation of (b) is, (i) when the nucleotide sequence information of both the father and the mother of the individual is available,

where MSNV_i(S=1) is the most frequent pattern among the patients' SNVs at the i-th position, and v = 1, ..., V.

A chip for diagnosing the cause of HSP disease, comprising primers or probes that specifically bind to a nucleotide sequence encoding one or more genes selected from the group consisting of AP4M1, ATL1, CYP7B1, DDHD1, KIAA0196, KIF1C, KIF5A, PLP1, REEP1, SPAST, SPG11, SPG20, and SPG7.

The chip of claim 13,
wherein the chip further comprises one or more genes selected from the group consisting of AMPD2, PSEN1, APP, CAPN1, FUS, ALDH18A1, ALDH18A1, SOD1, MARS, MPZ, MAG, NEFL, PFN1, SLC16A2, UBQLN2, GJB1, L1CAM, GARS, PSEN2, VCP, PMP22, KIF1 ATXN, ENTPD1, B4GALNT1, TFG, USP8, RTN2, PNPLA6, SLC33A1, TBK1, TARDBP, KIF1B, BSCL2, ALS2, GDAP1, LYST, AP4S1, AP4E1, AP4B1, ARL6IP1, NIPA1, SPG21GJCN2, FIG4, GBA2, BICD2, VPS37A, ARSI, CCT5, CYP2U1, FA2H, CCDC50, ERLIN1, ERLIN2, PGAP1, ZFYVE26, WDR48, C12orf65, AP5Z1, C19orf12, TECPR2, DDHD2, IBA57, ZF5 KZF27, and ARL6IP2.

An HSP diagnostic kit comprising the chip of claim 13.

The kit of claim 15,
wherein the kit is an RT-PCR kit, a DNA chip kit, or a protein chip kit.