KR20030040908A

KR20030040908A - A Method and a Computer Program To Simulate Laboratory Gene Cloning Procedure Under Virtual Conditions for Generating Gene Clone Database.

Info

Publication number: KR20030040908A
Application number: KR1020010071566A
Authority: KR
Inventors: 정대 최; 고흥 김; 은호 양; 상욱 이; 호선 손; 동승 신; 정권 유; 연철 정; 경진 김; 두길 강; 동구 한; 재용 박
Original assignee: (주)뉴로제넥스
Priority date: 2001-11-17
Filing date: 2001-11-17
Publication date: 2003-05-23
Also published as: KR100512915B1

Abstract

PURPOSE: To provide a computer program, and a corresponding method, that simulate the gene cloning process in the same way as the actual experiment and store the results in a database, so that anyone working in molecular biology can easily use, store, and manage this valuable information. CONSTITUTION: A primer designer module designs the primers needed for DNA (deoxyribonucleic acid) PCR (polymerase chain reaction). A PCR module produces the PCR product by checking annealing between the primer and a sequence. A DNA cutter module produces the DNA products generated when a sequence is cut by a restriction enzyme. A ligation module computes the overhangs of two sequences and joins them into one sequence. A cut and ligation module performs the whole cloning process at once, given a vector sequence object, an insert sequence object, and the restriction enzymes required for cloning. A sequence editor module edits a sequence to fit the sequence object representation format used by the cloning editor. A sequence database stores sequence objects, which objectify all data generated in each step and all input products.

Description

Computer program and method for simulating the gene cloning process in the same way as the actual experimental process and storing the results in database form {A Method and a Computer Program To Simulate Laboratory Gene Cloning Procedure Under Virtual Conditions for Generating Gene Clone Database}

The present invention is a gene analysis program. It uses a computer to simulate the gene analysis and cloning work performed by most experimenters in molecular biology, following a process similar to the actual experiment, and stores the results in database form.

In existing programs, the series of steps required for cloning is arranged from the programmer's point of view, so users could perform the work only after carefully reading manuals several hundred pages long. These programs also manipulate genes differently from real genetic manipulation, and the genes they represent do not accurately reflect their actual form.

The present invention was motivated by the following needs, keenly felt in the field when cloning work is carried out in laboratories and research institutes.

First, to let users work easily and quickly on a computer, maximizing integration with their everyday work.

Second, to record and store as digital data the experimenter's own clones and all their states, so that the status of the many gene clones held can be accurately identified and managed.

Third, to enable searches of recombinant gene holdings between experimenters (and, more broadly, between laboratories and institutes) and to improve exchanges such as gene sharing, thereby further promoting research activity.

In university laboratories and corporate research institutes in Korea, researchers typically turn over on a cycle of five years or less on average. The predecessor's research is usually summarized as a report, a handover document, or a thesis and passed to the successor, but in practice the cloning products are not left as a record of the full sequence of the plasmid containing the specific genome. As a result, successors cannot reuse their predecessors' cloning products and either repeat the cloning work or take up entirely different work, which is a loss of resources not only for the laboratory or institute but for the nation.

Accordingly, the need has arisen to turn a series of cloning products into a database and secure them as sequence information, and demand is growing for a program that experimenters unfamiliar with computing can use easily.

To meet these needs from the experimental field, the present invention was devised so that researchers can simulate the gene cloning process exactly as in the actual experiment and store the results in database form, enabling everyone working in molecular biology to use valuable information easily and conveniently and to store, keep, and manage it accurately.

Accordingly, the present invention maps the experimental techniques that experimenters need for actual cloning work onto the programming concept of modules, and develops a gene representation that expresses the characteristics of the manipulated gene as they are. That is, experimenters familiar with cloning can use the program's modules through the same process by which they manipulate genes and gene carriers (vectors). Just as the product of that process becomes a new plasmid, the full sequence of this plasmid is computed and recorded.

FIG. 1 is a diagram showing a cloning process composed of PCR and ligation according to the present invention.

FIG. 2 is a diagram showing a cloning process composed of the DNA Cutter and ligation according to the present invention.

FIG. 3 is a diagram of the Primer Designer in the PCR process of FIG. 1.

FIG. 4 is a diagram of the Sequence Editor of the Cloning Editor.

FIG. 5 is a diagram of the overhang representation of a sequence.

FIG. 6 is a diagram of the sequence file representation of ENDS, the sequence overhang representation.

FIG. 7 is a diagram of the action of the SPM during ligation.

FIG. 8 is a diagram of the sequence file representation of the SPM.

FIG. 9 is a diagram of the method for determining whether ligation is possible.

The Cloning Editor that achieves these objectives decomposes the actual gene cloning process into a series of stepwise tasks and modularizes them. It consists of the following seven modules:

First, the Primer Designer, which designs and creates primers for the DNA polymerase chain reaction (PCR).

Second, PCR, which checks the annealing between primer and sequence and produces the PCR product.

Third, the DNA Cutter, which produces the DNA products generated when a sequence is cut with a restriction enzyme.

Fourth, Ligation, which computes the overhangs of two sequences and joins them into a single sequence.

Fifth, Cut & Ligation, which takes the vector sequence object, insert sequence object, and restriction enzymes required for cloning and performs the whole cloning process in one step.

Sixth, the Sequence Editor, which edits a sequence to fit the sequence object representation format used by the Cloning Editor.

Seventh, the Sequence Database, which stores sequence objects that objectify all data generated in each step and all input products.

The present invention may be better understood from the following detailed description taken in conjunction with the accompanying drawings.

Referring to FIGS. 1 to 4, each figure summarizes the cloning process as a diagram: rounded rectangles denote the steps of cloning, and square-cornered rectangles denote the sequence objects used as data in each step. The process is identical to the one carried out in the laboratory.

FIG. 1 illustrates how PCR and ligation are combined in the Cloning Editor: two sequences, a vector sequence object and a PCR-result sequence object, are ligated to produce a new plasmid. If only one sequence is supplied at ligation, ligation is performed with that single sequence alone. This is called self-ligation.

FIG. 2 shows how the DNA Cutter and ligation are combined in the Cloning Editor. It describes how the vector sequence (hereinafter, vector) and the insert sequence (hereinafter, insert) are prepared when the sequences to be ligated are not yet ready. If the vector and insert are already prepared, this step can be omitted. The prepared vector and insert sequences are cut in the DNA Cutter with a suitable restriction enzyme to create ligatable overhangs on both, and ligation is then performed. The process of FIG. 2 can also be carried out in a single step with Cut & Ligation.

FIG. 3 shows in detail the part of FIG. 2 that produces the insert; it can be omitted if the insert is already prepared. Amplifying a specific part of the insert's sequence by PCR requires primers, and the Primer Designer produces primers usable for PCR. Running PCR with these primers and a template sequence object yields an excised insert that is ready for the DNA Cutter.

FIG. 4 describes the Sequence Editor, which edits the sequence objects needed during cloning. It is a tool that loads a sequence object, modifies and edits it, and saves it again as a sequence object.

Taken together, cloning requires a vector and an insert. PCR is needed to amplify a specific range of sequence in the insert, and the Primer Designer is used to create the primers for this PCR. This step can be omitted if primers already exist or are not needed. PCR yields a blunt (overhang-free) sequence, which passes through the DNA Cutter to produce a cut insert for ligation. The vector likewise leaves the DNA Cutter as a cut vector with overhangs or blunt ends, and ligation then produces a new plasmid.

To implement this, the program for each step stays independent of the program for the previous step: the sequence, which is the data passed between programs, is turned into an object that is represented and stored as a text file. The objectified representation of a molecule that physically exists in the laboratory is called a sequence object.

The format that expresses such a sequence object so that every module of the Cloning Editor can read and write it is called the sequence object representation format; examples appear in FIGS. 6 and 8. This format can express sequence overhangs, which other sequence formats cannot, and can therefore express the exact form of the gene used in the Cloning Editor.

All programs in each step are configured to load and save sequence objects expressed in this format. Because the sequence formats used by earlier software (GenBank, EMBL, FASTA, ...) are inadequate for expressing the exact state of a molecule in the laboratory, a new sequence object representation format was devised that encompasses the existing formats, and ENDS (sequence object overhang or blunt indication) and SPM (sequence object reference position indication) were added to it. FIGS. 6 and 8 show the newly devised format with ENDS and SPM added.

A sequence object handled in each step is either linear or circular. A circular sequence object is written in the same form as a linear one; it is only marked as circular in its header. The difficulty in operating on a circular sequence object is that the end of the written sequence is joined to its beginning, so a few extra steps are needed compared with a linear sequence object. In the present invention, when the input sequence object of a component program is circular, a stretch of sequence as long as the program's maximum operation unit is taken from the start of the sequence object and appended to its end; the sequence object can then be processed in exactly the same way as a linear one. In the result computed this way, any position greater than the length of the sequence object is mapped back onto the circular sequence by taking the remainder of that position divided by the length of the circular sequence object, which converts it into a result for the circular sequence object.

The maximum operation unit of each program is given in Table 1.

Program | Maximum operation unit
Restriction enzyme programs | Length of the longest recognition sequence among the restriction enzymes applied
PCR | Length of the longest input primer
Primer Designer | Maximum sequence length that can be formed for the given Tm value, or the length of the given primer
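The circular-sequence handling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `search_circular` and the plain substring search are assumptions, and the start of the sequence is copied (rather than moved) to the end so that the remainder mapping works.

```python
def search_circular(sequence: str, motif: str, max_unit: int) -> list[int]:
    """Return 1-based start positions of `motif` in a circular sequence.

    Hypothetical helper: `max_unit` plays the role of the program's
    maximum operation unit (for example, the longest recognition site).
    """
    n = len(sequence)
    # Append the first max_unit bases to the end, then treat as linear.
    extended = sequence + sequence[:max_unit]
    hits = set()
    start = extended.find(motif)
    while start != -1:
        position = start + 1  # convert to 1-based
        if position > n:
            # Map back onto the circle; for n < position < 2n this
            # equals the remainder position % n.
            position = (position - 1) % n + 1
        hits.add(position)
        start = extended.find(motif, start + 1)
    return sorted(hits)


# A BamHI site (GGATCC) that spans the origin of a circular sequence.
plasmid = "ATCCAAAAGG"
print(search_circular(plasmid, "GGATCC", max_unit=6))
```

A purely linear search of the same string finds nothing, because the site is split between the end and the beginning of the written sequence.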

To express the exact state of a sequence object in the DNA Cutter and in ligation, its overhang state must be expressed. Referring to FIG. 5, an overhang is expressed as the difference between the 5' and 3' ends of the sequence object. At each end of the sequence object (start and end), the state is + if the 5' end extends beyond the 3' end, and - if the 3' end extends beyond the 5' end. The number is the positional difference between the 5' and 3' ends. When the overhang states at both ends are known, they are joined with a two-dot separator (..) in the form "(overhang at the start of the sequence object)..(overhang at the end of the sequence object)". This is called ENDS.

With this notation, a sequence cut with the restriction enzyme BamHI has the overhang state +4..+4. To express this in the sequence object representation format, a field (ENDS) expressing the overhang is added to the sequence object, as in (1) of FIG. 6. For blunt ends, ENDS is written explicitly as 0..0 so that bluntness is stated unambiguously.
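As a rough sketch of the ENDS notation, the following parses and formats values such as +4..+4 and 0..0. The `Ends` type and the helper names are hypothetical; the actual file layout of the ENDS field is defined by FIG. 6, which is not reproduced here.

```python
from typing import NamedTuple


class Ends(NamedTuple):
    """Signed overhang lengths at the start and end of a sequence object.

    Positive: the 5' end protrudes; negative: the 3' end protrudes;
    zero: blunt. (Hypothetical container, for illustration only.)
    """
    start: int
    end: int


def parse_ends(text: str) -> Ends:
    """Parse an ENDS string such as '+4..+4', '-2..0' or '0..0'."""
    left, right = text.strip().split("..")
    return Ends(int(left), int(right))


def format_ends(ends: Ends) -> str:
    """Format an Ends value back into the 'start..end' notation."""
    def one(value: int) -> str:
        return "0" if value == 0 else f"{value:+d}"
    return f"{one(ends.start)}..{one(ends.end)}"


def is_blunt(ends: Ends) -> bool:
    """True when both ends carry no overhang (ENDS 0..0)."""
    return ends.start == 0 and ends.end == 0


bamhi_fragment = parse_ends("+4..+4")
print(bamhi_fragment, format_ends(bamhi_fragment), is_blunt(bamhi_fragment))
```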

The Primer Designer and the DNA Cutter also take restriction enzyme names as input. Depending on the user, the same enzyme may be written as HindIII, Hind3, hind3, hindiii…, mixing Roman numerals, Arabic numerals, and upper and lower case, so differences in case and numeral style must be reconciled.

To solve this, wherever a restriction enzyme name is entered, the Roman numerals I, II, III, IV, V and i, ii, iii, iv, v used in enzyme names are treated as equivalent to the digits 1, 2, 3, 4, 5, and case is ignored. Uppercase Roman numerals are taken as the canonical form: if an Arabic digit appears in an enzyme name, that part is converted to the corresponding Roman numeral, so the enzyme can be identified regardless of numeral style. For example, the input hind3 contains the digit '3', so both hindIII (the input with the digit converted to a Roman numeral) and the original input hind3 are looked up in the restriction enzyme table to select the enzyme.
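The name lookup described above might look like the sketch below. The tiny `ENZYME_TABLE` and the function names are hypothetical, and, as a simplifying assumption, only a single trailing digit is converted so that digits inside names such as Bsp1286I are left alone.

```python
import re
from typing import Optional

ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}

# Hypothetical, deliberately tiny enzyme table: name -> recognition site.
ENZYME_TABLE = {
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "Bsp1286I": "GDGCHC",
}

# Case-insensitive index over the canonical names.
_LOOKUP = {name.lower(): name for name in ENZYME_TABLE}


def candidate_names(user_input: str) -> list[str]:
    """Return the raw input plus, if it ends in 1-5, a Roman-numeral variant."""
    raw = user_input.strip()
    candidates = [raw]
    if re.search(r"[1-5]$", raw):
        # Assumption: only the trailing digit stands for a Roman numeral.
        candidates.append(re.sub(r"[1-5]$", lambda m: ROMAN[m.group()], raw))
    return candidates


def resolve_enzyme(user_input: str) -> Optional[str]:
    """Look up every candidate spelling, ignoring case."""
    for candidate in candidate_names(user_input):
        name = _LOOKUP.get(candidate.lower())
        if name is not None:
            return name
    return None


for spelling in ("HindIII", "Hind3", "hind3", "hindiii"):
    print(spelling, "->", resolve_enzyme(spelling))
```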

Besides the ordinary sequence letters A, T, G, and C, the recognition sequences of restriction enzymes contain compound letters commonly called ambiguity IUPAC codes, listed in Table 2. To recognize these codes, the present invention borrows the concept of sets. Take the ambiguity code R, which means A or G. Expanding R ==> {A, G, R} and searching the target sequence reveals whether a base is covered by R. Applying the same expansion to the remaining ambiguity codes yields Table 3.

If the recognition sequence of a restriction enzyme is expanded according to Table 3 and the input sequence object is searched for sequences matching the expanded set, recognition sequences that contain ambiguity IUPAC codes can also be found. In the program, this set-based recognition of ambiguity codes is implemented with regular-expression search.

As an example of expansion, the restriction enzyme 'Bsp1286I' has the recognition sequence "GDGCHC". Expanded according to Table 3, this becomes "{G}{A,G,T,D}{G}{C}{A,C,T,H}{C}". Written as a regular expression, the expanded sequence is "[G][AGTD][G][C][ACTH][C]", and a regular-expression search with it is run on the target sequence object.
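The expansion can be sketched with Python's `re` module. The `IUPAC` mapping below uses the standard nucleotide ambiguity codes (it is not copied from Tables 2 and 3, which are not reproduced in this text), and the function names are illustrative. Following the description, each ambiguity code's set includes the code letter itself.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def recognition_regex(site: str) -> str:
    """Expand a recognition site into a character-class regular expression."""
    parts = []
    for code in site.upper():
        bases = IUPAC[code]
        # An ambiguity code also matches itself, e.g. D -> [AGTD].
        members = bases if code in bases else bases + code
        parts.append(f"[{members}]")
    return "".join(parts)


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    pattern = re.compile(f"(?=({recognition_regex(site)}))")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


print(recognition_regex("GDGCHC"))
print(find_sites("TTGAGCACAA", "GDGCHC"))
```

The lookahead group is used so that overlapping sites are reported as well; a plain `finditer` on the bare pattern would skip them.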

PCR requires primers, and the Primer Designer is the program that creates them automatically. Basically, it creates primers for PCR of a user-specified region, given either a Tm value or the length over which the primer anneals to the sequence object. It also provides several methods for determining the region to be amplified automatically.

The methods for determining the region automatically are: searching for a CDS in the Feature Table of the sequence object, searching for a coding sequence in the sequence object, Longest ORF, Longest ORF include without start or stop codon, and Full Length. Coding-sequence search has the program find ORFs in the given sequence and set the range accordingly. Longest ORF sets the range to the longest stretch bounded by a start codon and a stop codon in the sequence object. Longest ORF include without start or stop codon is similar to Longest ORF, but where a start or stop codon is missing it assumes a codon at the start or end of the sequence object and accepts that stretch as an ORF too.
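A minimal sketch of the two ORF-based options follows. Several details are assumptions not stated in the text: only the forward strand is scanned, ATG is the only start codon, and TAA, TAG, and TGA are the stop codons. The function name and the `allow_open_ends` flag are illustrative.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
Region = tuple[int, int]  # 1-based, inclusive (start, end)


def _longer(best: Optional[Region], candidate: Region) -> Region:
    """Keep the longer of two regions; ties keep the earlier find."""
    if best is None:
        return candidate
    return candidate if candidate[1] - candidate[0] > best[1] - best[0] else best


def longest_orf(sequence: str, allow_open_ends: bool = False) -> Optional[Region]:
    """Find the longest forward-strand ORF across the three reading frames.

    With allow_open_ends=True, an ORF may begin at the start of the
    sequence without a start codon and may run to the end of the
    sequence without a stop codon.
    """
    seq = sequence.upper()
    best: Optional[Region] = None
    for frame in range(3):
        start = frame if allow_open_ends else None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i > start:  # skip an empty ORF (stop codon only)
                    best = _longer(best, (start + 1, i + 3))
                start = None
        if allow_open_ends and start is not None:
            last = frame + ((len(seq) - frame) // 3) * 3  # end of last full codon
            if last > start:
                best = _longer(best, (start + 1, last))
    return best


print(longest_orf("AAATGCCCTAAGG"))
print(longest_orf("CCCAAATAA"), longest_orf("CCCAAATAA", allow_open_ends=True))
```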

A linker sequence chosen by the user can also be added at the 5' end of the designed primer. If the linker contains the recognition sequence of a desired restriction enzyme, the PCR product can be cut with that enzyme in the DNA Cutter and then used in ligation.

To check whether a designed primer is suitable for PCR, the Primer Designer examines the internal sequence to which the primers anneal. It checks whether that sequence contains recognition sequences cut by a predefined set of restriction enzymes or by specific enzymes designated by the user, and whether the designed primer also anneals within the internal sequence; either finding indicates that the primer is unsuitable. The internal annealing check uses a Tm value (default 24) lower than the Tm value used for primer design (default 54).

A PCR product can be obtained with the PCR program using primers designed by the Primer Designer or primers prepared in advance. The PCR program checks the binding between primer and template sequence starting from the 3' end of the primer sequence object; if the Tm value of the part of the primer that matches the template exactly is greater than the Tm value specified by the user, the primer is deemed to have bound to the template. The sequence object produced by the PCR program is blunt, with no overhang. For T-vector cloning, an 'A' can be added at the 3' end of the PCR product.
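The 3'-anchored annealing check can be sketched as below. The text does not say how Tm is computed; this sketch assumes the Wallace rule, Tm = 2(A+T) + 4(G+C), and handles only a forward primer whose sequence reads the same as the template strand. All function names are illustrative.

```python
def wallace_tm(sequence: str) -> int:
    """Tm by the Wallace rule: 2 per A/T plus 4 per G/C (an assumption)."""
    s = sequence.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def matched_3prime_length(primer: str, template: str, end: int) -> int:
    """Count primer bases, walking back from its 3' end, that match the
    template bases ending just before 0-based index `end`."""
    n = 0
    while n < len(primer) and n < end and primer[-1 - n] == template[end - 1 - n]:
        n += 1
    return n


def anneal_sites(primer: str, template: str, min_tm: int) -> list[int]:
    """Return template positions (1-based, of the primer's 3' base) where the
    exactly matching 3' portion of the primer has Tm greater than min_tm."""
    primer, template = primer.upper(), template.upper()
    sites = []
    for end in range(1, len(template) + 1):
        n = matched_3prime_length(primer, template, end)
        if n and wallace_tm(primer[len(primer) - n:]) > min_tm:
            sites.append(end)
    return sites


# An 18-mer with a 5' BamHI linker; the linker does not match the template.
template = "TTTT" + "ATGCGTACGTTAGCCGTA" + "CCCC"
primer = "GGATCC" + "ATGCGTACGTTAGCCGTA"
print(anneal_sites(primer, template, min_tm=50))
```

Because matching starts at the 3' end and stops at the first mismatch, an unmatched 5' linker does not prevent annealing; only the matched 3' portion contributes to the Tm that is compared with the threshold.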

The DNA Cutter takes a sequence object, such as an insert sequence from a PCR product or a vector sequence, cuts it with the given restriction enzyme, adds the overhang appropriate to that enzyme, and passes the result on to ligation. If the sequence object entered into the DNA Cutter is circular, an SPM (Standard Position Marker, the sequence object reference position indication) is added at the start position of the sequence object (position 1).

Referring to FIG. 7, once the vector and insert are ready to be ligated into a new plasmid, ligation proceeds as follows. If the vector and insert have ENDS entries, the overhangs are evaluated as in FIG. 9 to confirm that ligation is possible, and ligation is performed. If there is no ENDS entry, the sequence object is treated as blunt and blunt ligation is performed. If the ligated sequence object is circular and contains an SPM, the sequence object is shifted so that the SPM lies at the start position, and the SPM is then removed. This restores the original position numbering of the vector in a sequence object that contained an SPM.
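The SPM correction amounts to rotating the circular sequence. The sketch below assumes, purely for illustration, that the SPM is tracked as a 0-based index into the ligated sequence; the real marker is a field in the sequence file (FIG. 8), whose layout is not given in the text.

```python
def rotate_to_spm(sequence: str, spm_index: int) -> str:
    """Rotate a circular sequence so the base at `spm_index` becomes position 1.

    Hypothetical helper: `spm_index` is the 0-based index of the base that
    carried the Standard Position Marker. The marker itself is dropped
    simply by not returning it.
    """
    if not 0 <= spm_index < len(sequence):
        raise ValueError("SPM index lies outside the sequence")
    return sequence[spm_index:] + sequence[:spm_index]


# After ligation the original vector start ended up at index 6.
print(rotate_to_spm("CCCGGGAAATTT", 6))
```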

If only one input sequence object is supplied to ligation instead of two, ligation is attempted between that sequence object and itself. This is called self-ligation.

In this way, the present invention transfers the laboratory cloning process to the computer as it is, so biologists can use it easily and conveniently. The data passed between steps is objectified (as sequence objects), which separates the data from the procedure, and this data can express the changes a sequence undergoes in the actual experiment exactly as they occur.

As a result, experimenters can operate the modular program just as they would clone a gene into a vector, and obtain the result as a complete sequence on the computer. To use the invention, it is enough to follow once a simple manual that walks through the actual procedure; because every step mirrors the cloning work itself, the program is familiar and comfortable to use.

Moreover, once the process has been carried out, a complete sequence object can be recorded as the result. This reduces the cost of re-creating the genes and their carriers (vectors) accumulated in each laboratory or institute, and opens the way for researchers to share and reuse them on the basis of accurate information. In other words, by minimizing the waste of human and material resources and maximizing research efficiency, the invention opens a way to strengthen national competitiveness.

Claims

A method of applying a laboratory's gene cloning process in the same way as a laboratory in software.

The method according to claim 1, wherein the genetic process of the software is composed of PCL and ligation as shown in FIG.

The method of claim 1, wherein the gene cloning process of the cloning software comprises a DNA cutter and a ligation as shown in FIG. 2.

The method according to claim 3, wherein the clone software comprises a cut and ligation that performs the ligation with the die cutter during the gene cloning process.

The method of claim 1, wherein the process of automatically designing the primers required for the PCR during the gene cloning process of the cloning software comprises a primer designer as shown in FIG. 3.

A method in which, in each of the modules constituting the gene cloning software, a gene sequence handled in the laboratory (DNA, RNA, plasmid, primer, restriction enzyme) that is the input data of a process is recognized as one object, and each sequence object is treated independently of the cloning process.

The method of claim 6, wherein overhang and blunt-end information for the sequence is added to the sequence object.

The method of claim 7, wherein the sequence object including the overhang and blunt-end information is represented according to the sequence object representation format.

The method of claim 6, wherein a Standard Position Marker (SPM) is added to the sequence object to mark the beginning of the sequence object.

The method of claim 9, wherein the sequence object including the Standard Position Marker is represented according to the sequence object representation format.

A primer designer that uses a search of the CDS regions in the feature table of the sequence object, a longest-ORF search, and a whole-sequence search as methods of automatically determining the region for which PCR primers are generated.

The method of claim 11, wherein the primer is designed by adding a linker sequence to the 5' end of the primer produced by the primer designer.

The primer designer of claim 11, having a function that determines whether a restriction enzyme that recognizes the primer made by the primer designer also recognizes the template sequence, and indicates whether the primer sequence is suitable.

The primer designer of claim 11, having a function that, when a primer designed to bind at either end of the template sequence binds at two or more positions in the template sequence, indicates that undesired PCR products may occur when PCR is performed.

A PCR program that checks the annealing state between the template sequence and the primer, and judges the primer to be annealed if the Tm value of the sequence that matches the template starting from the 3' end of the primer is higher than the value specified by the user.
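The annealing check in this claim can be sketched as follows. This is only an illustrative sketch under stated assumptions: the text does not say how Tm is calculated, so the Wallace rule (2 °C per A/T, 4 °C per G/C) is swapped in, and the function names `three_prime_match`, `wallace_tm`, and `anneals` are hypothetical. The primer and the candidate binding site are assumed to be given in the same 5'→3' orientation and aligned at their 3' ends.

```python
def three_prime_match(primer: str, site: str) -> str:
    """Return the contiguous run of bases that match, counted from the 3' end."""
    matched = []
    for p, t in zip(reversed(primer.upper()), reversed(site.upper())):
        if p != t:
            break  # stop at the first mismatch when walking from 3' to 5'
        matched.append(p)
    return "".join(reversed(matched))


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate (assumption: the patent does not name a formula)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def anneals(primer: str, site: str, min_tm: float) -> bool:
    """Judge the primer annealed if the 3'-matched stretch has Tm above min_tm."""
    return wallace_tm(three_prime_match(primer, site)) > min_tm


# A primer with a 6-base 5' linker (GGATCC) that does not match the template.
primer = "GGATCCATGGCTAGCAAAGG"
site = "TTTTTTATGGCTAGCAAAGG"
print(three_prime_match(primer, site), anneals(primer, site, 40))
```

Counting only the stretch that matches from the 3' end lets a 5' linker be ignored by the check, which fits the linker-addition step of the primer designer.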

The method of claim 15, wherein the PCR program makes the ends of the PCR product sequence object blunt.

The method of claim 15, wherein the PCR program adds a 3' "A" overhang at both ends of the PCR product sequence object.

A DNA cutter program that generates ENDS indicating overhangs and blunt ends in the result sequence.

The DNA cutter of claim 18, wherein the result of executing the program is a sequence object.

A ligation program that recognizes the ENDS marking overhangs and blunt ends in the sequence objects and automatically determines whether or not ligation is possible.
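One way to picture the automatic end check in the ligation program is the sketch below. The `End` class and `can_ligate` function are hypothetical names, and the representation (an end type plus the single-stranded overhang read 5'→3') is an assumption made for illustration. It is not the ENDS format defined by the patent.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    kind: str           # "blunt", "5'" or "3'" (hypothetical encoding)
    overhang: str = ""  # single-stranded bases, read 5'->3'


def can_ligate(a: End, b: End) -> bool:
    """Blunt joins blunt; overhangs join when same type and complementary."""
    if a.kind == "blunt" or b.kind == "blunt":
        return a.kind == b.kind
    return a.kind == b.kind and a.overhang.upper() == revcomp(b.overhang)


ecori = End("5'", "AATT")   # sticky end left by EcoRI
bamhi = End("5'", "GATC")   # sticky end left by BamHI
print(can_ligate(ecori, ecori), can_ligate(ecori, bamhi))
```

Under the same encoding, a PCR product carrying a 3' "A" overhang would pair with a vector end carrying a 3' "T" overhang, and with nothing else.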

The method of claim 20, wherein, if the sequence object resulting from the ligation program is circular and carries a Standard Position Marker, the form of the sequence object is maintained by rotating the sequence based on the marker.
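A minimal sketch of this idea follows, assuming the position marker is stored as a 0-based index into the circular sequence. Both that assumption and the function name `rotate_to_marker` are hypothetical, since the claim does not state the encoding.

```python
def rotate_to_marker(seq: str, marker_pos: int) -> str:
    """Rotate a circular sequence so the marked base becomes position 0."""
    if not seq:
        return seq
    k = marker_pos % len(seq)  # tolerate markers beyond the sequence length
    return seq[k:] + seq[:k]


# After ligation the marked origin sits at index 6 of the joined sequence.
print(rotate_to_marker("CCCGGGAAATTT", 6))
```

Rotating a circular sequence changes only where its linear written form starts, so the molecule represented is the same before and after.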

The method of claim 20, wherein, when only one linear sequence object is input to the ligation, a self-ligation is performed by ligating both ends of the input sequence object into a circle.

A method, in gene cloning software, of computing on a circular input sequence in each process, wherein a stretch of sequence at least as long as the maximum operation unit of the operation is copied from the beginning of the sequence object and appended to its end, the sequence object is then processed in the same manner as a linear one, and each resulting position is divided by the length of the circular sequence and the remainder is mapped onto the circular sequence.
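The circular-sequence calculation in this claim can be illustrated with a recognition-site search. The sketch assumes the "maximum operation unit" is the length of the site being searched for, and `find_sites_circular` is a hypothetical name.

```python
def find_sites_circular(seq: str, site: str) -> list[int]:
    """Find every start position of `site` in a circular sequence."""
    n = len(seq)
    if not site or len(site) > n:
        raise ValueError("site must be non-empty and no longer than the sequence")
    # Copy a prefix at least as long as the operation unit onto the end,
    # then search the extended string exactly as a linear sequence.
    extended = seq + seq[:len(site)]
    hits = set()
    start = extended.find(site)
    while start != -1:
        hits.add(start % n)  # remainder maps the hit back onto the circle
        start = extended.find(site, start + 1)
    return sorted(hits)


# GAATTC spans the origin here: the sequence ends in "GA" and starts with "ATTC".
plasmid = "ATTCGGGGCCCCGA"
print(plasmid.find("GAATTC"), find_sites_circular(plasmid, "GAATTC"))
```

A plain linear search misses the site that wraps around the origin, while the extended search finds it at position 12. Taking the remainder also collapses a hit at the very start of the copied prefix onto position 0, so it is not reported twice.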

A program, in gene cloning software, that accepts restriction enzyme names entered in each process and automatically distinguishes the Roman numeral in the enzyme name however it is written (I, II, III, IV, V, ...; i, ii, iii, iv, v, ...; 1, 2, 3, 4, 5, ...), to make restriction enzyme input easier for the user.
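The name handling in this claim might look like the sketch below, which maps variant spellings onto a canonical name. The enzyme list is a small illustrative sample, and the names `KNOWN_ENZYMES` and `canonical_enzyme` are hypothetical. A real program would look names up in its own enzyme database.

```python
import re

KNOWN_ENZYMES = ["EcoRI", "EcoRV", "HindIII", "BamHI", "Sau3AI"]  # sample only
_ROMAN = {1: "i", 2: "ii", 3: "iii", 4: "iv", 5: "v"}


def _key(name: str) -> str:
    """Lower-case the name and turn a trailing Arabic numeral into Roman."""
    name = name.strip().lower()
    m = re.fullmatch(r"(.*?)(\d+)", name)
    if m and int(m.group(2)) in _ROMAN:
        name = m.group(1) + _ROMAN[int(m.group(2))]
    return name


_INDEX = {_key(n): n for n in KNOWN_ENZYMES}


def canonical_enzyme(user_input: str) -> str:
    """Return the canonical enzyme name for a user-typed variant."""
    try:
        return _INDEX[_key(user_input)]
    except KeyError:
        raise ValueError(f"unknown restriction enzyme: {user_input!r}") from None


print([canonical_enzyme(s) for s in ["ecor1", "HIND3", "bamhi", "sau3a1"]])
```

Only a trailing numeral is converted, so the "3" inside Sau3AI is left alone while "sau3a1" still resolves to Sau3AI.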

A method, in gene cloning software, of treating mixed (ambiguous) IUPAC codes in each process in terms of set inclusion.

The method of claim 25, wherein the mixed IUPAC codes are handled, via the set concept, by regular expressions in the software implementation.
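The set-based treatment of mixed IUPAC codes can be sketched with regular-expression character classes, where each ambiguity code becomes the set of bases it includes. The function names `iupac_to_regex` and `find_iupac` are hypothetical. The sketch assumes ambiguity codes appear only in the pattern (for example a recognition site) and that the searched sequence contains only A, C, G, T.

```python
import re

# Standard IUPAC nucleotide codes, each written as a set of concrete bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern into a regular expression string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_iupac(pattern: str, seq: str) -> list[int]:
    """Return all start positions, including overlapping ones, of the pattern."""
    # A lookahead keeps each match zero-width so overlapping sites are found.
    rx = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in rx.finditer(seq.upper())]


# GTYRAC matches both GTTAAC and GTCGAC.
print(iupac_to_regex("GTYRAC"), find_iupac("GTYRAC", "AAGTTAACGTCGACTT"))
```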