KR100904220B1

KR100904220B1 - System, method and program for M cell target prediction of peptide sequence by mathematical model

Info

Publication number: KR100904220B1
Application number: KR1020070008483A
Authority: KR
Inventors: 정은경; 김준형; 김민경; 이호경; 정동현; 최승훈; 신재민; 윤철희; 강상기; 김민국; 최윤재
Original assignee: 주식회사 인실리코텍
Priority date: 2007-01-26
Filing date: 2007-01-26
Publication date: 2009-06-25
Also published as: KR20080086563A

Abstract

The present invention relates to a system and a method for predicting whether a peptide sequence targets M cells by using a mathematical model, and to a recording medium storing the program therefor.

The present invention comprises the steps of: obtaining a sample of peptide sequences that target M cells by an experimental technique; obtaining, based on these sequences, a sample of peptide sequences that do not target M cells; storing each of the obtained samples as a set and randomly drawing from them at a fixed ratio to divide them into a training set and a validation set for the mathematical model; assigning descriptor values and activity values to the individual peptide sequences; training the mathematical model with the training peptide set; predicting, with the trained mathematical model, whether the peptides of the validation set target M cells; and validating the trained mathematical model. Because M cell targeting of a peptide can be predicted in advance with the recording medium storing the program, rather than always being confirmed by experiment, the invention reduces the cost and time that experiments require.

Mathematical model, peptide sequence, descriptor, activity value, M cell target.

Description

System, method and program for M cell target prediction of peptide sequence by mathematical model

FIG. 1 is a block diagram showing an embodiment of the system for predicting M cell targeting of a peptide sequence using a mathematical model according to the present invention;

FIG. 2 is a flowchart showing an embodiment of the method for predicting M cell targeting of a peptide sequence using a mathematical model according to the present invention;

FIG. 3 is a flowchart showing an embodiment of the method for predicting M cell targeting of a peptide sequence using a mathematical model according to the present invention;

FIG. 4 is a flowchart of the method for retraining the M cell target prediction model for peptide sequences in the present invention.

<Explanation of reference numerals for the main parts of the drawings>

10: microcomputer    11: program recording medium

12: CPU    13: input/output unit

20: input means    30: output means

The present invention relates to a system and a method for predicting, by means of a mathematical model, whether a peptide sequence targets M cells, and to a recording medium storing the program therefor.

In general, the routes by which pathogenic agents from outside infect the body are divided into the oral cavity, the respiratory and genital tracts, and the skin. Antigenic substances that enter the digestive tract through the oral cavity are taken up by one of two routes: receptor-mediated endocytosis through M cells present in the follicle-associated epithelium (FAE) of Peyer's patches, or passage through the tight junctions between the enterocytes of the small intestine that make up the villi. Unlike the systemic immune response, the mucosal immune response takes up macromolecules and microorganisms carried into the intestinal tract, raises an antigen-specific immune response, and ultimately induces the secretion of secretory IgA across the enterocytes of the small intestine.

Specialized cells called M cells take up the various antigens present in the intestine by specific and nonspecific routes, and play a pivotal role in the secretion, through the epithelial cells of the small intestine, of secretory IgA, the end product of the mucosal immune response.

For vaccines that prevent diseases entering the mucosa of the small intestine through M cells, such as the waterborne infectious diseases cholera, typhoid fever, and bacillary dysentery, the most efficient route of administration is known to be oral drug delivery that targets M cells to induce a mucosal immune response. Therefore, if a ligand that induces specific transcytosis in M cells can be secured, it can be applied to an efficient oral vaccine delivery system capable of activating the mucosal immune response.

Oral administration meets little resistance from patients and requires neither skilled personnel nor separate equipment such as syringes, so it is the most desirable method in terms of ease of dosing and patient compliance. Several problems, however, have been raised for its practical use. The first is a size constraint: protein-based drugs are macromolecules, which limits their permeation through biological membranes. The second is that such drugs lose activity through the action of the various digestive enzymes secreted by the digestive organs, including the stomach, as they pass through the digestive tract. Even if a drug overcomes these factors and reaches the small intestine intact, the intestinal epithelial cell layer, which admits foreign substances only selectively, blocks it, so it is very difficult for the drug to be absorbed into the body and exert its function. Commercializing an oral vaccine delivery system therefore first requires improving the low absorption rate of vaccines, their instability during passage through the digestive tract, and their low permeability across the intestinal mucosa.

In recent drug development, peptides have attracted attention as research materials, and their share of the market is growing, because they are highly potent, have almost no toxicity or side effects, and do not remain in the human body.

If these advantages are exploited by selecting peptides that target M cells and applying them to oral vaccines, this can be a more efficient way of improving the low transport efficiency of existing protein-based drugs and vaccines.

Conventional techniques rely mainly on administering peptides directly to a living body to find those that target M cells, and therefore have the disadvantage of consuming considerable time and money.

To compensate for the shortcomings of such experimental methods, quantitative models of the relationship between structure and activity have been used in the development of new drugs and new materials as a very useful way of predicting activity in advance while reducing experimental costs. To date, however, no program has been developed that applies this methodology to predict in advance whether a peptide targets M cells. For this reason, there is a pressing need for a technique that can predict whether a peptide sequence targets M cells, this being one of the activities that can increase drug efficiency in the development of drug delivery materials and new drugs.

The present invention has been made in view of the above circumstances. Its object is to provide a system and a method for predicting, by means of a mathematical model, whether a peptide sequence targets M cells, together with a recording medium storing the program therefor, and thereby to present a model that predicts and validates M cell targeting of peptide sequences.

To achieve the above object, the system of the present invention for predicting M cell targeting of a peptide sequence using a mathematical model comprises a microcomputer (10) made up of a program recording medium (11), a CPU (12), and an input/output unit (13); input means (20); and output means (30). The program recording medium operates under the control of the microcomputer and includes: a process of obtaining a sample of peptide sequences that target M cells by an experimental technique; a process of obtaining, based on these sequences, a sample of peptide sequences that do not target M cells; a process of storing each of the obtained samples as a set and randomly drawing from them at a fixed ratio to divide them into a training set and a validation set for the mathematical model; a process of assigning descriptor values and activity values to the individual peptide sequences; a process of training the mathematical model with the training peptide set; a process of predicting, with the trained mathematical model, whether the peptides of the validation set target M cells; and a process of validating the trained mathematical model.

The program recording medium (11) includes a program that, when a user enters a peptide sequence whose M cell targeting is to be determined, converts the sequence into amino acid descriptors, and a program that predicts M cell targeting with the trained mathematical model. It further includes a program that, when the user adds a new M cell target peptide sequence whose M cell targeting activity value was obtained experimentally, adds it to the original M cell target peptide set and then classifies the set; a program that assigns descriptor values and activity values to the added peptide; a program that trains the mathematical model with the training peptide set; a program that predicts M cell targeting for the validation peptide set; and a program that validates the trained mathematical model.

Also to achieve the above object, the method of the present invention for predicting M cell targeting of a peptide sequence using a mathematical model comprises the steps of: obtaining a sample of peptide sequences that target M cells by an experimental technique; obtaining, based on these sequences, a sample of peptide sequences that do not target M cells; storing each of the obtained samples as a set and randomly drawing from them at a fixed ratio to divide them into a training set and a validation set for the mathematical model; assigning descriptor values and activity values to the individual peptide sequences; training the mathematical model with the training peptide set; predicting, with the trained mathematical model, whether the peptides of the validation set target M cells; and validating the trained mathematical model.

The mathematical model is a quantitative structure-property relationship method that includes regression analysis, machine learning, multiple regression using a genetic algorithm, partial least squares using a genetic algorithm, partial least squares using principal component analysis, and multiple regression using principal component analysis. The machine learning method is a neural network, data mining, a decision tree, inductive logic, case-based reasoning, pattern recognition, reinforcement learning, a Bayesian network, a hidden Markov model, or a probabilistic grammar method, and in particular a neural network technique.

The descriptor values quantitatively represent molecular structures, amino acids, and peptides, and include at least one of the binary amino acid descriptor, the VHSE amino acid descriptor, the Z3 amino acid descriptor, and the Z5 amino acid descriptor.

The data collected to build the machine learning model are obtained from at least one of in vivo, ex vivo, and in vitro experiments, and in particular from at least one of in vivo, ex vivo, and in vitro experiments that use the phage display technique. The peptide sequence consists of 2 to 12 amino acids, more preferably 3 to 7 amino acids. The species to which the method of the present invention for predicting M cell targeting of a peptide sequence is applied is a mammal, in particular a human.

In addition, the recording medium of the present invention storing the program for predicting M cell targeting of a peptide sequence using a mathematical model includes: a process of obtaining a sample of peptide sequences that target M cells by an experimental technique; a process of obtaining, based on these sequences, a sample of peptide sequences that do not target M cells; a process of storing each of the obtained samples as a set and randomly drawing from them at a fixed ratio to divide them into a training set and a validation set for the mathematical model; a process of assigning descriptor values and activity values to the individual peptide sequences; a process of training the mathematical model with the training peptide set; a process of predicting, with the trained mathematical model, whether the peptides of the validation set target M cells; and a process of validating the trained mathematical model.

The above objects, features, and advantages of the present invention will be more readily understood by reference to the accompanying drawings and the following detailed description.

Hereinafter, the system and method of the present invention for predicting M cell targeting of a peptide sequence using a mathematical model, and the recording medium storing the program therefor, are described in detail as preferred embodiments with reference to the accompanying drawings.

FIG. 1 is a block diagram showing an embodiment of the system for predicting M cell targeting of a peptide sequence using a mathematical model according to the present invention, and FIG. 2 is a flowchart showing an embodiment of the corresponding method. As shown in FIG. 2, a sample of M cell target peptide sequences is first collected with an in vitro M cell model and the phage display technique (step S1). Here, the length of a peptide sequence means the number of amino acids in one peptide; a peptide sequence length of 7 denotes a peptide made up of 7 amino acids. The number of peptide sequences collected is shown in Table 1 below.

Table 1. Number of M cell target peptide sequences

Peptide sequence length   Number of peptides
                          M cell target   M cell non-target   Training   Validation
3                         1,225           1,225               1,930      520
4                         980             980                 1,568      392
5                         735             735                 1,174      296
6                         490             490                 782        198
7                         245             245                 396        94

The phage display peptide library used in step S1 is Ph.D.-C7C™ (New England Biolabs). It consists of recombinant bacteriophages displaying hundreds of millions or more distinct peptides, obtained by artificially inserting a gene sequence so that a peptide with a random sequence of 7 amino acids is expressed at the terminus of the gene encoding pIII, a coat protein, in the genome of the M13 bacteriophage, and then infecting E. coli. The 7 random amino acids introduced into the M13 phage are designed to be flanked by cysteine residues on both sides, so that a disulfide bond forms spontaneously when the peptide is expressed; the resulting loop shape is intended to induce stronger binding to the target protein.

In the phage display procedure, a transcytosis assay was performed on the in vitro M cell model with 1.0 × 10¹¹ pfu of the phage peptide library (~1,000 copies of each individual recombinant phage clone), and peptide sequences with significantly high transcytosis ability were selected.

In addition, a program that selects amino acids at random is used to draw 7 amino acids for M cell target peptide sequence length 7. The resulting peptide is compared with the set of M cell target peptides obtained experimentally and, if no peptide with the same sequence is found there, it is assigned to the set of M cell non-target peptide sequences (step S2). A known program is used to select the random amino acids.
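Step S2 can be sketched roughly as follows. This is a minimal illustration, not the known program the patent refers to: the function name `sample_non_targets`, the fixed seed, and the placeholder target sequences are all assumptions made for the example.

```python
import random

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_non_targets(targets, length, count, seed=0):
    """Draw random peptides of `length` that do not occur in `targets`.

    Hypothetical helper: the patent only says that randomly drawn
    peptides absent from the experimental target set are treated as
    non-targets.
    """
    rng = random.Random(seed)
    known = set(targets)
    non_targets = set()
    while len(non_targets) < count:
        candidate = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        if candidate not in known:  # keep only sequences not seen as targets
            non_targets.add(candidate)
    return sorted(non_targets)

# Made-up placeholder sequences, not experimental data.
targets = ["ACDEFGH", "KLMNPQR"]
non_targets = sample_non_targets(targets, length=7, count=5)
```

Collecting the draws in a set also removes duplicate non-target sequences, which the text does not require but which keeps the two sets comparable in size.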

Next, the sets of peptide sequences are classified for machine learning training (step S3). Because the number of M cell target peptide sequences is smaller than the number of M cell non-target peptide sequences, this step includes making the number of members of the two sets equal. For peptide sequence length 7, 245 M cell non-target peptides were obtained in this step, as shown in Table 1.

Approximately 80% of the peptide sequences are then drawn at random from the M cell target peptide set, and approximately 80% from the M cell non-target peptide set, and the two draws are pooled to form the machine learning training set (step S4).

As in step S4, the remaining approximately 20% of the M cell target peptide set and the remaining approximately 20% of the M cell non-target peptide set are pooled to form the machine learning validation set (step S5).

As a result, as shown in Table 1, for peptide sequence length 7 the training set contains 396 peptides and the validation set contains 94.
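The split in steps S4 and S5 can be sketched as below, under the assumption that each set is shuffled and cut at 80%. The helper name `split_set`, the seeds, and the placeholder identifiers are illustrative only. An exact 80% cut of 245 + 245 peptides gives 392 and 98, close to but not identical to the 396 and 94 reported, which is consistent with the "approximately 80%" wording.

```python
import random

def split_set(peptides, train_fraction=0.8, seed=0):
    """Shuffle one peptide set and cut it into (training, validation) parts."""
    rng = random.Random(seed)
    shuffled = list(peptides)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for 245 target and 245 non-target peptides.
targets = [f"T{i}" for i in range(245)]
non_targets = [f"N{i}" for i in range(245)]

# Split each class separately, then pool (steps S4 and S5).
train_t, valid_t = split_set(targets, seed=1)
train_n, valid_n = split_set(non_targets, seed=2)
training_set = train_t + train_n
validation_set = valid_t + valid_n

print(len(training_set), len(validation_set))  # 392 98
```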

Next, the machine learning method is used to train and obtain an M cell-targeting peptide prediction model from the training set obtained in step S4 (step S10). First, the order in which the peptides are input is rearranged at random: the order of the training set is adjusted so that M cell target and non-target peptide sequences enter the training process alternately, in equal proportion, and the reordered set is supplied as input for training the machine learning model (step S11).
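One way to read step S11 is to shuffle each class and then alternate between them. The sketch below assumes the two classes are the same size (as arranged in step S3); the function name `interleave` and the labels attached to each peptide are hypothetical.

```python
import random

def interleave(targets, non_targets, seed=0):
    """Return (peptide, is_target) pairs alternating target / non-target.

    Each class is shuffled first so the order within a class is random.
    With unequal class sizes, zip() silently drops the surplus.
    """
    rng = random.Random(seed)
    t, n = list(targets), list(non_targets)
    rng.shuffle(t)
    rng.shuffle(n)
    ordered = []
    for target, non_target in zip(t, n):
        ordered.append((target, True))
        ordered.append((non_target, False))
    return ordered

ordered = interleave(["T1", "T2", "T3"], ["N1", "N2", "N3"])
```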

The individual peptide sequences in the training set are then converted into amino acid descriptor values (step S12). The amino acid descriptor used includes at least one of the binary amino acid descriptor, the VHSE amino acid descriptor, the Z3 amino acid descriptor, and the Z5 amino acid descriptor. With the binary amino acid descriptor, each amino acid is represented as a 20-digit number made up of nineteen 0s and a single 1, with the 1 in a different position for each of the 20 amino acids. A peptide of sequence length 7 is thus represented by 140 descriptor values. For the M cell targeting activity value, M cell target peptides are assigned 0.9 and M cell non-target peptides 0.1.
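The binary descriptor described above amounts to one-hot encoding per residue. In the sketch below, the alphabetical ordering of the one-letter codes is an assumption (the text only requires that each amino acid has its 1 in a distinct position), and the function names are illustrative.

```python
# Assumed ordering of the 20 amino acids; any fixed ordering would satisfy the text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def binary_descriptor(peptide):
    """Encode each residue as nineteen 0s and one 1, concatenated over the peptide."""
    values = []
    for residue in peptide:
        block = [0] * 20
        block[AMINO_ACIDS.index(residue)] = 1
        values.extend(block)
    return values

def activity_value(is_target):
    """Activity values stated in the text: 0.9 for targets, 0.1 for non-targets."""
    return 0.9 if is_target else 0.1

# Arbitrary example sequence of length 7, not an experimental peptide.
encoded = binary_descriptor("ACDEFGH")
print(len(encoded))  # 140
```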

The individual peptide sequences in the training set can also be converted into descriptor values with the VHSE amino acid descriptor, whose defined values are listed in Table 2 below. The VHSE descriptor consists of 8 descriptor values per amino acid, which are known to represent the hydrophobic, electronic, and steric properties of the amino acid; a peptide of sequence length 3 is thus represented by 24 input values.

Table 2. VHSE amino acid descriptors

Amino acid   Code   VHSE1   VHSE2   VHSE3   VHSE4   VHSE5   VHSE6   VHSE7   VHSE8
Ala          A       0.15   -1.11   -1.35   -0.92    0.02   -0.91    0.36   -0.48
Arg          R      -1.47    1.45    1.24    1.27    1.55    1.47    1.30    0.83
Asn          N      -0.99    0.00   -0.37    0.69   -0.55    0.85    0.73   -0.80
Asp          D      -1.15    0.67   -0.41   -0.01   -2.68    1.31    0.03    0.56
Cys          C       0.18   -1.67   -0.46   -0.21    0.00    1.20   -1.61   -0.19
Gln          Q      -0.96    0.12    0.18    0.16    0.09    0.42   -0.20   -0.41
Glu          E      -1.18    0.40    0.10    0.36   -2.16   -0.17    0.91    0.02
Gly          G      -0.20   -1.53   -2.63    2.28   -0.53   -1.18    2.01   -1.34
His          H      -0.43   -0.25    0.37    0.19    0.51    1.28    0.93    0.65
Ile          I       1.27   -0.14    0.30   -1.80    0.30   -1.61   -0.16   -0.13
Leu          L       1.36    0.07    0.26   -0.80    0.22   -1.37    0.08   -0.62
Lys          K      -1.17    0.70    0.70    0.80    1.64    0.67    1.63    0.13
Met          M       1.01   -0.53    0.43    0.00    0.23    0.10   -0.86   -0.68
Phe          F       1.52    0.61    0.96   -0.16    0.25    0.28   -1.33   -0.20
Pro          P       0.22   -0.17   -0.50    0.05   -0.01   -1.34   -0.19    3.56
Ser          S      -0.67   -0.86   -1.07   -0.41   -0.32    0.27   -0.64    0.11
Thr          T      -0.34   -0.51   -0.55   -1.06   -0.06   -0.01   -0.79    0.39
Trp          W       1.50    2.06    1.79    0.75    0.75   -0.13   -1.01   -0.85
Tyr          Y       0.61    1.61    1.17    0.73    0.53    0.25   -0.96   -0.52
Val          V       0.76   -0.92   -0.17   -1.91    0.22   -1.40   -0.24   -0.03
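As a small illustration of how Table 2 turns a peptide into network inputs, the sketch below copies three rows of the table (Ala, Arg, Gly) and concatenates them for a length-3 peptide. The dictionary and function names are hypothetical, and the example tripeptide is arbitrary rather than an experimental sequence.

```python
# Three rows copied from Table 2; a full implementation would list all 20.
VHSE = {
    "A": [0.15, -1.11, -1.35, -0.92, 0.02, -0.91, 0.36, -0.48],
    "R": [-1.47, 1.45, 1.24, 1.27, 1.55, 1.47, 1.30, 0.83],
    "G": [-0.20, -1.53, -2.63, 2.28, -0.53, -1.18, 2.01, -1.34],
}

def vhse_descriptor(peptide):
    """Concatenate the 8 VHSE values of each residue, in sequence order."""
    values = []
    for residue in peptide:
        values.extend(VHSE[residue])
    return values

inputs = vhse_descriptor("ARG")
print(len(inputs))  # 24
```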

Machine learning training is then carried out with the experimental values of M cell targeting for the training peptide set and the descriptor values of the peptide sequences as inputs (step S13). Methods that can be used for machine learning include neural networks, data mining, decision trees, case-based reasoning, pattern recognition, and reinforcement learning. When a feed-forward neural network is used, for example, it is trained with the training peptide set. The network consists of an input layer, a hidden layer, and an output layer. The input layer is made up of input nodes, whose number is the peptide sequence length multiplied by the number of components in the descriptor of one amino acid; each input node takes one descriptor component, a real number or an integer. There is a single hidden layer made up of 0 to 3 hidden nodes, and the output layer has a single output node. When the 20-digit binary amino acid descriptor is used for peptide sequence length 7, the network has 140 input nodes, and the input to each node is one of the 140 descriptor values, 0 or 1, produced in step S12. For any peptide sequence length, the network may also be built without a hidden layer, with the inputs connected directly to an output layer of one output node.
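The layer layout described above can be sketched as a plain forward pass. The text fixes only the node counts; the sigmoid activation, the random initial weights, the bias terms, and every function name below are assumptions made for illustration, and no training procedure is shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def input_node_count(peptide_length, components_per_residue):
    """Input nodes = sequence length x descriptor components per amino acid."""
    return peptide_length * components_per_residue

def init_network(n_inputs, n_hidden, seed=0):
    """Build weight layers; n_hidden == 0 connects inputs directly to the output."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        # One weight per input plus a trailing bias weight for each node.
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_out)]

    if n_hidden == 0:
        return [layer(n_inputs, 1)]
    return [layer(n_inputs, n_hidden), layer(n_hidden, 1)]

def forward(network, inputs):
    """Propagate the inputs through each layer and return the single output."""
    activations = list(inputs)
    for layer in network:
        activations = [
            sigmoid(w[-1] + sum(wi * a for wi, a in zip(w, activations)))
            for w in layer
        ]
    return activations[0]

n_inputs = input_node_count(7, 20)          # binary descriptor, length 7
network = init_network(n_inputs, n_hidden=3)
output = forward(network, [0] * n_inputs)   # untrained, so the value is arbitrary
```

A sigmoid output lies strictly between 0 and 1, which sits comfortably with activity targets of 0.9 and 0.1, although the source does not state which activation was used.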

Through appropriate machine learning training in step S13, an M cell-targeting peptide prediction model that can predict whether a peptide sequence targets M cells is obtained (step S14).

Then, using the M cell-targeting peptide prediction model obtained in step S14 and the machine learning validation set obtained in step S5, predicted values of whether each peptide targets M cells are obtained and compared with the experimental values to validate and evaluate the prediction model (step S20). Step S20 consists of steps S21 to S24. First, the input values for validating the machine learning model are prepared (step S21); in step S21 the machine learning validation set obtained in step S5 is used as is.

Next, each peptide sequence in the machine learning validation set is converted into descriptor values (step S22). The descriptor used here must be the same descriptor that was used in the training of step S13 for the M cell-targeting peptide prediction model obtained in step S14.

Then, with the amino acid descriptor values of the peptide sequences in the machine learning validation peptide set as input values, the M cell-targeting peptide prediction model obtained in step S14 is retrieved in order to predict whether the peptides target M cells (step S23).

Thereafter, predicted values of M cell targeting are obtained for the machine learning validation set and used to validate the M cell-targeting peptide prediction model retrieved in step S23; the results are shown in Table 3 below (step S24).

A machine learning model was trained using the 20-digit binary amino acid descriptor as the descriptor in step S22, step S24 was then carried out, and the results are shown in Table 3 below. Validation was performed three times with the order of the input values to the feed-forward neural network changed at random; for peptide length 3, the receiver operating characteristic (ROC) score was 0.8678 ± 0.0062 for the training set and 0.8609 ± 0.0122 for the validation set.

Table 3. Validation results of the M cell target prediction model (20-digit binary amino acid descriptor)

Peptide sequence length    ROC score, training set (80%)    ROC score, validation set (20%)
3                          0.8678 ± 0.0062                  0.8609 ± 0.0122
4                          0.7644 ± 0.0025                  0.7020 ± 0.0155
5                          0.7984 ± 0.0110                  0.7544 ± 0.0172
6                          0.8571 ± 0.0048                  0.7248 ± 0.0132
7                          0.9314 ± 0.0101                  0.6871 ± 0.0064
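As an illustration of how an ROC score and a "mean ± deviation over three runs" figure of the kind reported here might be computed, the following Python sketch uses the pairwise definition of the area under the ROC curve. The function names are hypothetical, the example labels and scores are invented toy values and not data from the patent, and the use of the sample standard deviation is an assumption, since the text does not say how the ± values were derived.

```python
from statistics import mean, stdev

def roc_score(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive example receives the higher score (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def summarize_runs(run_scores):
    """Mean and sample standard deviation over repeated runs (assumed convention)."""
    return mean(run_scores), stdev(run_scores)

# Toy data: 1 = M cell-targeting, 0 = non-targeting.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_score(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly

print(summarize_runs([0.5, 0.6, 0.7]))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the training and validation columns.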

A machine learning model was trained using the VHSE amino acid descriptor as the descriptor in step S22, step S24 was then carried out, and the results are shown in Table 4 below. Validation was performed three times with the order of the input values to the feed-forward neural network changed at random; for peptide length 3, the ROC score was 0.8177 ± 0.0079 for the training set and 0.7974 ± 0.0187 for the validation set.

Table 4. Validation results of the M cell target prediction model (VHSE amino acid descriptor)

Peptide sequence length    ROC score, training set (80%)    ROC score, validation set (20%)
3                          0.8177 ± 0.0079                  0.7974 ± 0.0187
4                          0.7309 ± 0.0154                  0.7064 ± 0.0083
5                          0.8067 ± 0.0027                  0.7449 ± 0.0193
6                          0.8067 ± 0.0027                  0.7433 ± 0.0205
7                          0.8536 ± 0.0057                  0.6710 ± 0.0464

This example shows that a feed-forward artificial neural network model consisting of an input layer, a hidden layer, and an output layer distinguished actual M cell-targeting peptides from non-M cell-targeting peptides on the basis of their sequences.

FIG. 3 is a flowchart showing an embodiment of the method of the present invention for predicting the M cell targeting of a new peptide sequence using machine learning. First, the peptide sequence whose M cell targeting is to be determined is entered through the input means (20) and stored on the program recording medium (11) (step S101).

Next, each entered peptide sequence is converted into the descriptor values required by the prediction model (step S23) trained through the process of FIG. 2 (step S102).

The descriptor values are then applied to the M cell target prediction model, which is made up of the trained prediction model (step S23) (step S103).

Finally, whether the new peptide sequence entered by the user targets M cells is output (step S104).
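Steps S101 to S104 can be sketched in Python as follows. This is only an illustrative outline: the function `predict_m_cell_target`, the dictionary of one model per peptide length, the 0.5 decision threshold, and the toy descriptor and toy model in the usage lines are all assumptions introduced here, not details given in the patent.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def predict_m_cell_target(sequence, models, to_descriptor, threshold=0.5):
    """Run one peptide through the S101-S104 flow.

    models:        dict mapping peptide length -> trained model (a callable); assumed layout
    to_descriptor: the same descriptor function that was used during training
    threshold:     decision cut-off on the model output; 0.5 is an assumption
    """
    sequence = sequence.strip().upper()                    # S101: accept the input sequence
    if not sequence or not set(sequence) <= VALID_RESIDUES:
        raise ValueError("sequence must contain only the 20 standard residues")
    model = models.get(len(sequence))
    if model is None:
        raise ValueError(f"no trained model for length {len(sequence)}")
    features = to_descriptor(sequence)                     # S102: convert to descriptor values
    score = model(features)                                # S103: apply the trained model
    return {"sequence": sequence,                          # S104: report the result
            "score": score,
            "targets_m_cell": score >= threshold}

# Toy stand-ins so the sketch runs; a real system would plug in the trained network.
toy_descriptor = lambda seq: [1 if residue == "K" else 0 for residue in seq]
toy_model = lambda features: sum(features) / len(features)
models = {3: toy_model}

result = predict_m_cell_target("kka", models, toy_descriptor)
print(result["sequence"], result["targets_m_cell"])
```

Passing the descriptor function in explicitly makes it harder to apply a model to descriptor values of a different kind from those it was trained on, which is the requirement stated for step S22.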

FIG. 4 is a flowchart showing a method of retraining the M cell target prediction model according to the present invention. First, new M cell-targeting peptide sequences and non-M cell-targeting peptide sequences whose M cell targeting activity values have been obtained experimentally are stored on the program recording medium (11) through the input means (20) (step S201).

Then steps S3 to S5, S10, and S20 of FIG. 2 are carried out to train and validate a machine learning model, and the comparison with the existing machine learning model is performed automatically (step S210). First, it is determined whether each newly entered peptide sequence is identical to a sequence already stored, and the sequences are then added, according to their activity values, to the M cell-targeting peptide set or the non-M cell-targeting peptide set and stored (step S211).

Next, the newly entered peptide sequences are added to the previously stored peptide sequences, and the following steps of FIG. 2 are carried out: the peptide sequence sets are classified for machine learning training (step S3), the machine learning training peptide set is obtained (step S4), the machine learning validation peptide set is obtained (step S5), the M cell-targeting peptide prediction model is trained by machine learning (step S10), and the model is validated by machine learning (step S20) (step S212).

Thereafter, the ROC score of the previously stored M cell-targeting peptide prediction model is compared with the ROC score of the M cell-targeting peptide prediction model obtained in step S212 (step S213).

The ROC scores calculated in step S213 are then output to the user, and on that basis the user stores the newly trained M cell-targeting peptide prediction model (step S202).
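The data handling in the retraining flow (the duplicate check of step S211, the random split into training and validation sets, and the score comparison of step S213) might look like the following Python sketch. All function names are hypothetical; the 80/20 ratio is taken from the table headers, while the unstratified split, the fixed seed, and the "higher score is better" rule are assumptions. In the patent the user, not the program, decides whether to store the new model.

```python
import random

def merge_new_sequences(existing, new_items):
    """S211: add (sequence, activity) pairs whose sequence is not already stored.
    activity is 1 for M cell-targeting and 0 for non-targeting."""
    merged = dict(existing)
    for sequence, activity in new_items:
        if sequence not in merged:
            merged[sequence] = activity
    return merged

def split_train_validation(dataset, train_fraction=0.8, seed=0):
    """Randomly divide the stored peptides into training and validation sets."""
    items = sorted(dataset.items())          # fixed starting order so the seed is reproducible
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]

def new_model_is_better(old_roc, new_roc):
    """S213: compare ROC scores; shown to the user as a recommendation only."""
    return new_roc > old_roc

# Invented toy sequences, not data from the patent.
stored = {"AAA": 1, "CCC": 0, "DDD": 1, "EEE": 0, "FFF": 1, "GGG": 0, "HHH": 1, "III": 0}
incoming = [("KKK", 1), ("AAA", 0), ("LLL", 0)]   # "AAA" is already stored and is skipped

merged = merge_new_sequences(stored, incoming)
train, validation = split_train_validation(merged)
print(len(merged), len(train), len(validation))   # 10 8 2
print(new_model_is_better(0.72, 0.75))
```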

In this way, the user can retrain and validate the mathematical model-based prediction model using M cell-targeting peptide sequences newly obtained through experiments.

Although the present invention has been described above with reference to preferred embodiments, it is not limited to them and may of course be modified and practiced in various ways without departing from the gist of the invention.

As described above, the present invention predicts in advance, using a recording medium storing the program, whether a peptide sequence targets M cells, without always having to confirm this by experiment. Such peptide sequences serve as M cell-targeting substances for improving the low delivery efficiency of existing protein-based drugs and vaccines intended for use as oral vaccines. The invention therefore has the particular advantage of reducing the cost and time that experiments require.

Claims

A system for predicting the M cell targeting of a peptide sequence using a mathematical model, the system comprising: a microcomputer (10) comprising a program recording medium (11), a CPU (12), and an input/output unit (13); input means (20); and output means (30),

wherein the program recording medium (11) operates under the control of the microcomputer (10) and stores a program comprising: a process of obtaining a sample of peptide sequences that target M cells using an experimental technique; a process of obtaining, on the basis of these sequences, a sample of peptide sequences that do not target M cells; a process of storing each of the obtained samples as a set and then randomly extracting from them at a predetermined ratio to classify them into a mathematical model training set and a mathematical model validation set; a process of assigning descriptor values and activity values to individual peptide sequences; a process of training using the training peptide set and the mathematical model; a process of predicting M cell targeting for the validation peptide set using the trained mathematical model; and a process of validating the trained mathematical model.

The system of claim 1, wherein the program recording medium (11) includes: a program that, when a user inputs a peptide sequence whose M cell targeting is to be determined, converts the sequence into amino acid descriptors and predicts whether it targets M cells using the trained mathematical model; a program that, when the user adds a new M cell-targeting peptide sequence whose M cell targeting activity value was obtained by an experimental technique, adds it to the existing M cell-targeting peptide set and then classifies the set; a program that assigns descriptor values and activity values to peptides; a program that trains the mathematical model using the training peptide set; a program that predicts M cell targeting for the validation peptide set; and a program that validates the trained mathematical model.

A method for predicting the M cell targeting of a peptide sequence using a mathematical model, the method comprising the steps of: obtaining a sample of peptide sequences that target M cells using an experimental technique; obtaining, on the basis of these sequences, a sample of peptide sequences that do not target M cells; storing each of the obtained samples as a set and then randomly extracting from them at a predetermined ratio to classify them into a mathematical model training set and a mathematical model validation set; assigning descriptor values and activity values to individual peptide sequences; training using the training peptide set and the mathematical model; predicting M cell targeting for the validation peptide set using the trained mathematical model; and validating the trained mathematical model.

The method of claim 3, wherein the mathematical model is a quantitative structure-property relationship method including regression analysis, machine learning, multiple regression analysis using a genetic algorithm, the partial least squares method using a genetic algorithm, the partial least squares method using principal component analysis, and multiple regression analysis using principal component analysis.

The method of claim 4, wherein the machine learning method uses a neural network, data mining, a decision tree, inductive logic, case-based reasoning, pattern recognition, reinforcement learning, a Bayesian network, a hidden Markov model, or a stochastic grammar method.

The method of claim 4, wherein the machine learning method is a neural network technique.

The method of claim 3, wherein the descriptor value is a quantitative representation of a molecular structure, an amino acid, or a peptide.

The method of claim 7, wherein the descriptor value comprises at least one of a binary amino acid descriptor, a VHSE amino acid descriptor, a Z3 amino acid descriptor, and a Z5 amino acid descriptor.

The method of claim 3, wherein the data collected to build the mathematical model are data obtained from at least one of in vivo, ex vivo, and in vitro experiments.

The method of claim 3, wherein the experimental data collected to build the mathematical model are data obtained using the phage display technique in at least one of in vivo, ex vivo, and in vitro experiments.

The method of claim 3, wherein the peptide sequence is a sequence consisting of 2 to 12 peptides.

The method of claim 3, wherein the peptide sequence is a sequence consisting of 3 to 7 peptides.

The method of claim 3, wherein the species to which the method for predicting the M cell targeting of a peptide sequence is applied is a mammal.

The method of claim 3, wherein the species to which the method for predicting the M cell targeting of a peptide sequence is applied is a human.

A recording medium storing a program for predicting the M cell targeting of a peptide sequence using a mathematical model, the program comprising: a process of obtaining a sample of peptide sequences that target M cells using an experimental technique; a process of obtaining, on the basis of these sequences, a sample of peptide sequences that do not target M cells; a process of storing each of the obtained samples as a set and then randomly extracting from them at a predetermined ratio to classify them into a mathematical model training set and a mathematical model validation set; a process of assigning descriptor values and activity values to individual peptide sequences; a process of training using the training peptide set and the mathematical model; a process of predicting M cell targeting for the validation peptide set using the trained mathematical model; and a process of validating the trained mathematical model.