JP5062634B2

JP5062634B2 - Base sequence design method

Info

Publication number: JP5062634B2
Application number: JP2008503907A
Authority: JP
Inventors: 憲二宮本; 康文榊原; 博道太田
Original assignee: Keio University
Current assignee: Keio University
Priority date: 2006-03-09
Filing date: 2007-03-08
Publication date: 2012-10-31
Anticipated expiration: 2027-03-08
Also published as: WO2007102578A1; JPWO2007102578A1

Description

The present invention relates to a method for designing the base sequence of a gene to be expressed in a translation system.

When a gene encoding a specific amino acid sequence is expressed in a host cell of a species different from the one from which the gene is derived, translation can become the rate-limiting step and a sufficient expression level may not be obtained.

One solution used in such cases has been to replace the codons of the gene with codons that are used at high frequency in the species of the host cell.

Even so, the expression level sometimes does not become sufficiently high, and further improvements in the design of the gene to be expressed are desired.

An object of the present invention is therefore to provide a method for designing the base sequence of a gene that gives a high expression level when the gene is expressed using a predetermined translation system.

When a gene is expressed using a translation system, the gene is often inserted into an expression vector as-is and expressed, regardless of the species from which the gene and the translation system are derived. However, because codon usage differs between species, the codons of the gene may not be used efficiently by the translation system, and the expression level may fail to rise. As a result of intensive efforts to solve this problem, the present inventors found that the regularity of codon sequences also differs between species and that this is one of the reasons why the expression level fails to rise, and thereby completed the present invention.

The base sequence design method of the present invention is a method for designing the base sequence of a gene to be expressed using a predetermined translation system, characterized in that the codon sequence of the gene is adapted to the regularity of codon sequences in learning data. The learning data may be a gene pool of the species from which the translation system is derived, or a pool of genes whose expression level in the predetermined translation system is already known to be at or above a predetermined level.

The base sequence design method may also include a step of adding the adapted base sequence of the gene, as data, to the learning data used for the adaptation.

Furthermore, the base sequence design method may use a hidden Markov model to adapt the codon sequence of the gene to the regularity of codon sequences in the learning data. In this case, the regularity of the codon sequences may be calculated using the Baum-Welch algorithm.

In the base sequence design method, the translation system may be inside a cell or in an in vitro protein synthesis system.

The program of the present invention is a program for designing the base sequence of a gene to be expressed using a predetermined translation system, and causes a computer to execute the base sequence design method. The recording medium of the present invention is a computer-readable recording medium on which at least one such program is recorded.

The base sequence design system of the present invention is a system for designing the base sequence of a gene to be expressed using a predetermined translation system, comprising (i) an input device for inputting base sequence data, (ii) a computer that executes the program according to claim 9 using the input data, and (iii) an output device for outputting the result obtained by (ii).

== Cross-reference to related documents ==
This application claims the benefit of priority based on Japanese Patent Application No. 2006-64816, filed on March 9, 2006, which is incorporated herein by reference.

FIG. 1 is a schematic diagram showing specific examples of the three conditions that exemplify "pruning" in an embodiment of the present invention.

Unless otherwise stated in the embodiments and examples, methods described in standard protocol collections, or modified or adapted versions of those methods, are used. Examples of such collections are J. Sambrook, E. F. Fritsch & T. Maniatis (Ed.), Molecular cloning, a laboratory manual (3rd edition), Cold Spring Harbor Press, Cold Spring Harbor, New York (2001); and F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, K. Struhl (Ed.), Current Protocols in Molecular Biology, John Wiley & Sons Ltd. When commercially available reagent kits or measuring instruments are used, the protocols supplied with them are followed unless otherwise stated.

The objects, features, advantages and ideas of the present invention will be apparent to those skilled in the art from the description in this specification, and those skilled in the art can readily reproduce the invention from it. The embodiments and specific examples described below represent preferred embodiments of the present invention and are given for illustration or explanation; they do not limit the invention. It will be apparent to those skilled in the art that various changes and modifications can be made on the basis of this description within the spirit and scope of the invention disclosed herein.

== Base sequence design method ==
The present invention is a method for designing the base sequence of a gene to be expressed using a predetermined translation system, characterized in that the codon sequence of the gene is adapted to the regularity of codon sequences in learning data. This is described in detail below. In this specification, "codon sequence" means "an ordered series of triplet codons", and "regularity of codon sequences" means "the codon sequence with the highest expected probability of appearing".

When a gene of interest is expressed using a predetermined translation system, the gene is often inserted as-is into an expression vector that can be expressed in that translation system and introduced into the system, regardless of the species from which the translation system or the gene is derived. According to the present invention, however, adapting the codon sequence of the gene to the regularity of codon sequences in the learning data optimizes the base sequence of the gene for expression in that translation system and raises the expression level.

The translation system is not limited as long as it can translate mRNA; it may be inside a cell or may be an in vitro translation system. When an intracellular translation system is used, for example, the sequence-optimized gene may be inserted into an expression vector and introduced into cells by a conventional method. When an in vitro translation system is used, for example, mRNA synthesized by in vitro transcription may be added to a cell-free protein synthesis system derived from rabbit reticulocytes, wheat germ or E. coli and translated.

The gene to be expressed is not particularly limited and may encode any amino acid sequence; it may be a natural gene or a gene into which mutations have been artificially introduced.

The learning data is also not particularly limited, but is preferably a gene pool of the species from which the translation system is derived or of a closely related species, or a pool of genes whose expression level in the predetermined translation system is already known to be at or above a predetermined level; these may also be combined. In addition, gene data whose codon sequence has been adapted by the base sequence optimization method of the present invention may be added to the learning data used for the adaptation, and the augmented learning data may be used the next time the same translation system is used.

The base sequence is optimized by adapting the codon sequence of the gene to be expressed to the regularity of codon sequences in the learning data.

For example, consider determining the codon sequence of the gene to be expressed from the regularity of sequences of two codons. Suppose the gene contains two consecutive amino acids A-B, where A has three codons a1, a2 and a3, and B has two codons b1 and b2; there are then six ways to choose the codons. If a1-b2 has the highest frequency among them in the learning data, a1-b2 is selected as the base sequence of the gene to be expressed. In this way, the codon sequence with the highest expected probability of appearing in the learning data is selected as the codon sequence of the gene to be expressed.
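The two-codon case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy training genes are hypothetical, isoleucine (three codons) and lysine (two codons) stand in for A and B, and raw pair counts stand in for the frequencies in the learning data.

```python
from collections import Counter
from itertools import product


def count_codon_pairs(genes):
    """Count adjacent codon pairs over all training sequences."""
    counts = Counter()
    for seq in genes:
        usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        counts.update(zip(codons, codons[1:]))
    return counts


def best_pair(codons_a, codons_b, counts):
    """Pick the (a, b) codon pair seen most often in the training data."""
    return max(product(codons_a, codons_b), key=lambda pair: counts[pair])


# Hypothetical toy learning data (not from the patent).
training = [
    "".join(["ATG", "ATT", "AAA", "ATC", "AAG", "ATT", "AAA"]),
    "".join(["ATG", "ATC", "AAA", "ATT", "AAA", "ATA", "AAG"]),
]
ILE = ["ATT", "ATC", "ATA"]   # amino acid A with three codons
LYS = ["AAA", "AAG"]          # amino acid B with two codons

pair_counts = count_codon_pairs(training)
chosen = best_pair(ILE, LYS, pair_counts)
print(chosen, pair_counts[chosen])
```

All six combinations are scored, and the one with the highest count is kept, mirroring the selection of a1-b2 in the text.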

In this example, the codon sequence whose regularity is considered contains two codons, but it may contain three or more; for matching the codon sequence of the gene to be expressed to the learning data, a larger number of codons is considered better. When a hidden Markov model is used, the fit of a codon sequence of any length can in theory be calculated, but the number of codons in the codon sequence whose regularity is considered is determined by the number of hidden states. Because the computation required for the codon sequence frequencies, and hence the cost, grows with the number of codons, this number is preferably 10 to 50.

The method for defining the regularity of codon sequences in the learning data is not particularly limited, but a hidden Markov model is preferably used. In the hidden Markov model, a large amount of learning data {g^1, ..., g, ..., g^T} is collected and, for the codons c_i in all of these genes, the parameter set {π_a, s_ab, d_a(c_i)} that maximizes
[equation omitted in the source text]
is determined. Here, π_a is the initial probability of state a before the start codon (Met) is output, s_ab is the state transition probability from state a to state b, and d_a(c_i) is the output probability that codon c_i is observed in state a.
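The expression to be maximized is not reproduced in this text. As an assumption based on the standard hidden Markov model formulation (not taken from the source), it would be the likelihood of the learning data, with each gene's probability summed over all hidden state paths. The symbol L_g (number of codons in gene g) and the path indices a_1, ..., a_{L_g} are introduced here only for illustration:

```latex
\[
\prod_{g} P(g \mid \pi, s, d)
  \;=\; \prod_{g} \;\sum_{a_1,\dots,a_{L_g}}
        \pi_{a_1}\, d_{a_1}(c_1)\,
        \prod_{i=2}^{L_g} s_{a_{i-1} a_i}\, d_{a_i}(c_i)
\]
```

Under this reading, π, s and d play exactly the roles defined in the text: initial, transition and output probabilities.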

Each parameter is preferably determined using the Baum-Welch algorithm, as follows.
1. Set initial values for π_1, s_ab, d_a(c_i).
2. For gene g in the learning data, compute from the learning data the probability γ_i(a,b,g) of a transition from state a to state b at the i-th codon, for all combinations of i, a and b.
Specifically, the following expression is calculated, where L is the total number of codons in gene g, c_i^L is the string of codons from the i-th to the L-th (that is, the last) codon, S is the set of s_ab over all state transitions, and D is the set of d_a(c_i) over all codons.
[equation omitted in the source text]
3. Recalculate and redefine π_1, s_ab, d_a(c_i) from γ_i(a,b,g).
Specifically, the following expression is calculated.
[equation omitted in the source text]
4. If the parameters have not stabilized, return to step 2.
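The four steps above can be sketched as follows. This is a minimal textbook Baum-Welch update for a discrete hidden Markov model, written under assumptions rather than taken from the patent: codons are encoded as integer indices, `pi` is treated as the state distribution at the first codon, no numerical scaling is applied (so it suits only short toy sequences), and all names and toy values are hypothetical.

```python
def forward(obs, pi, s, d):
    """alpha[i][a]: probability of codons obs[0..i] with state a at position i."""
    n = len(pi)
    alpha = [[pi[a] * d[a][obs[0]] for a in range(n)]]
    for c in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[a] * s[a][b] for a in range(n)) * d[b][c]
                      for b in range(n)])
    return alpha


def backward(obs, s, d):
    """beta[i][a]: probability of codons obs[i+1..] given state a at position i."""
    n = len(s)
    beta = [[1.0] * n]
    for c in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(s[a][b] * d[b][c] * nxt[b] for b in range(n))
                        for a in range(n)])
    return beta


def likelihood(genes, pi, s, d):
    """Product over genes of the probability of each codon string."""
    result = 1.0
    for obs in genes:
        result *= sum(forward(obs, pi, s, d)[-1])
    return result


def normalise(row):
    total = sum(row)
    return [x / total for x in row] if total > 0 else row


def baum_welch_step(genes, pi, s, d):
    """Steps 2 and 3: accumulate expected counts, then re-estimate parameters."""
    n, m = len(pi), len(d[0])
    pi_acc = [0.0] * n
    s_acc = [[0.0] * n for _ in range(n)]
    d_acc = [[0.0] * m for _ in range(n)]
    for obs in genes:
        alpha, beta = forward(obs, pi, s, d), backward(obs, s, d)
        p = sum(alpha[-1])
        for i, c in enumerate(obs):
            for a in range(n):
                occupancy = alpha[i][a] * beta[i][a] / p
                d_acc[a][c] += occupancy
                if i == 0:
                    pi_acc[a] += occupancy
                if i + 1 < len(obs):
                    for b in range(n):
                        # Expected a -> b transition at position i (gamma_i(a, b, g)).
                        s_acc[a][b] += (alpha[i][a] * s[a][b]
                                        * d[b][obs[i + 1]] * beta[i + 1][b] / p)
    return (normalise(pi_acc),
            [normalise(row) for row in s_acc],
            [normalise(row) for row in d_acc])


def fit(genes, pi, s, d, cycles=100):
    """Step 4 replaced by a fixed number of cycles."""
    for _ in range(cycles):
        pi, s, d = baum_welch_step(genes, pi, s, d)
    return pi, s, d


# Hypothetical toy data: 2 states, 3 codon symbols.
toy_genes = [[0, 1, 2, 2, 1], [0, 0, 1, 2], [2, 2, 1, 0, 0, 1]]
pi0 = [0.6, 0.4]
s0 = [[0.7, 0.3], [0.4, 0.6]]
d0 = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]
pi1, s1, d1 = fit(toy_genes, pi0, s0, d0, cycles=20)
```

For real gene lengths the forward and backward values underflow, so a practical version would rescale them or work in log space.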

This algorithm may be run until every parameter has completely stabilized, but in practice that takes an enormous amount of time, so the algorithm may instead be terminated after a predetermined number of cycles; about 100 to 1000 cycles is sufficient. The method of setting the initial values in step 1 is not particularly limited; as in the Example, they may be set by applying probabilistic constraints to pseudo-random numbers.

Using the parameters obtained by the Baum-Welch algorithm, the output probability is computed for every combination of codons, and the combination with the maximum probability gives the desired codon sequence. That is, the output probability of a string under the model M determined by the Baum-Welch algorithm is the sum over all combinations of the state transitions σ_{1,a}, σ_{2,a2}, ..., σ_{L,aL}:
[equation omitted in the source text]
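The sum over all state-transition combinations can be illustrated with a small sketch. It assumes the standard hidden Markov model definition of a string's output probability (the patent's own formula is not reproduced in this text) and compares a literal enumeration of every state path with the forward recursion, which gives the same value at far lower cost. The function names and toy parameters are hypothetical.

```python
from itertools import product


def path_sum_probability(obs, pi, s, d):
    """Add up the probability of the codon string over every state path."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * d[path[0]][obs[0]]
        for i in range(1, len(obs)):
            p *= s[path[i - 1]][path[i]] * d[path[i]][obs[i]]
        total += p
    return total


def forward_probability(obs, pi, s, d):
    """Same quantity via the forward recursion (linear in the string length)."""
    n = len(pi)
    alpha = [pi[a] * d[a][obs[0]] for a in range(n)]
    for c in obs[1:]:
        alpha = [sum(alpha[a] * s[a][b] for a in range(n)) * d[b][c]
                 for b in range(n)]
    return sum(alpha)


# Hypothetical toy model: 2 states, 3 codon symbols encoded as 0, 1, 2.
pi = [0.6, 0.4]
s = [[0.7, 0.3], [0.4, 0.6]]
d = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]
codons = [0, 2, 1, 1, 0]

brute = path_sum_probability(codons, pi, s, d)
fast = forward_probability(codons, pi, s, d)
```

The enumeration visits n^L paths for n states and L codons, which is one reason the exhaustive calculation described in the text becomes so expensive.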

However, because this calculation takes an enormous amount of time, a method that yields an approximate value may be used. The method is not limited; for example, "pruning" with the following three conditions may be used. (FIG. 1 shows specific examples of these three conditions.)
[conditions (i) to (iii) omitted in the source text]
First, (i) is self-evident: because the aim here is to determine the codon sequence of the gene to be expressed, the target amino acid sequence is fixed in advance, and a base sequence that uses, at the i-th position, a codon c_i that does not correspond to amino acid a_i cannot be selected. (ii) can also be regarded as self-evident, but it is proven rigorously below by induction.

Thus, (i) and (ii) introduce no error in finding the optimal solution, whereas (iii) can. Making n sufficiently large tightens condition (iii) and reduces the chance of erroneous pruning, so n is preferably made sufficiently large. However, the larger n is, the more computation is needed to perform the check in (ii).
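Only condition (i) is described fully enough in this text to illustrate. The sketch below, under stated assumptions, restricts the search to codons that encode the required amino acid at each position and then picks the highest-scoring candidate. The partial codon table covers just the amino acids used in the demo, and `toy_score` is a hypothetical stand-in for the model's output probability; conditions (ii) and (iii) are not implemented.

```python
from itertools import product

# Partial standard genetic code, limited to the amino acids in the demo.
SYNONYMOUS = {
    "M": ["ATG"],
    "I": ["ATT", "ATC", "ATA"],
    "K": ["AAA", "AAG"],
}


def candidates(protein):
    """Yield every codon sequence that satisfies condition (i)."""
    for combo in product(*(SYNONYMOUS[aa] for aa in protein)):
        yield list(combo)


def best_sequence(protein, score):
    """Exhaustively pick the candidate with the highest score."""
    return max(candidates(protein), key=score)


def toy_score(codon_seq):
    """Hypothetical score: counts codons from an arbitrary 'preferred' set."""
    preferred = {"ATT", "AAA"}
    return sum(codon in preferred for codon in codon_seq)


all_candidates = list(candidates("MIK"))
best = best_sequence("MIK", toy_score)
print(len(all_candidates), best)
```

The number of candidates is the product of the codon counts at each position, so it grows exponentially with protein length; this is the cost that the additional pruning conditions are meant to contain.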

== Base sequence design system ==
To automate the base sequence design method described above, it is implemented as a program that a computer can execute. A program created in this way also falls within the scope of the present invention.

Furthermore, a base sequence design system can be built that comprises a computer for executing this program together with an input device for inputting base sequence data and an output device for outputting the result obtained by executing the program.

<Example>
(1) Design of the base sequences of the S. cerevisiae adh1 and adh2 genes
In this example, E. coli JM109 was used as the translation system. Because this strain is derived from E. coli K12, the base sequence data of E. coli K12 MG1655 from the genome database DDBJ (http://www.ddbj.nig.ac.jp/) was used as the learning data. The alcohol dehydrogenase genes adh1 (adh1w, SEQ ID NO: 1) and adh2 (adh2w, SEQ ID NO: 2) from Saccharomyces cerevisiae were used as the genes to be expressed in this translation system.

First, the number of codons in the codon sequence whose regularity is considered was set to eight, so that regularity was examined over the codon sequence extending up to eight codons before the amino acid whose codon was being determined. A hidden Markov model was used to define the regularity of codon sequences in the learning data. For the Baum-Welch algorithm, the initial values were random numbers obtained from pseudo-random numbers generated with the clock time as input and set so that the probability values of the state transition probabilities and of the output probabilities sum to 1. The learning data was learned by running 350 cycles of the algorithm.
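One way to produce initial values of the kind described, pseudo-random numbers seeded from the clock and constrained to sum to 1, is sketched below. The function names, the number of states in the demo and the choice to normalise each probability row separately are assumptions for illustration; the patent does not specify these details.

```python
import random
import time


def random_distribution(size, rng):
    """Draw `size` pseudo-random weights and scale them to sum to 1."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def init_parameters(n_states, n_codons, seed=None):
    """Random initial (pi, s, d); seeded from the clock unless a seed is given."""
    rng = random.Random(int(time.time()) if seed is None else seed)
    pi = random_distribution(n_states, rng)
    s = [random_distribution(n_states, rng) for _ in range(n_states)]
    d = [random_distribution(n_codons, rng) for _ in range(n_states)]
    return pi, s, d


# Hypothetical sizes: 4 hidden states, 64 possible codons.
pi, s, d = init_parameters(n_states=4, n_codons=64, seed=1)
```

Passing an explicit `seed` makes a run reproducible, while omitting it follows the clock-seeded behaviour described in the text.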

The codon sequences for Adh1 and Adh2 were optimized using the model obtained in this way, applying the "pruning" with the three conditions described above. In this example the computation was run with n = 50, and pruning under (iii) did not occur for either Adh1 or Adh2 from S. cerevisiae. No potentially erroneous pruning was therefore needed, and the exact optimal solution was obtained.

(2) Synthesis of the optimized adh1 and adh2 genes
The optimized adh1 gene (adh1c, SEQ ID NO: 3) and adh2 gene (adh2c, SEQ ID NO: 4) were synthesized as follows.

adh1c was synthesized by joining a set of 40 bp primers (assembly PCR). First, forward and reverse primers corresponding to the base sequence of adh1c were prepared by dividing the sequence into 40 bp segments. Because the adh1 base sequence is 1047 bp long, this gave 26 forward primers (SEQ ID NOs: 5 to 30) and 26 reverse primers (SEQ ID NOs: 31 to 56). Assembly PCR was performed with these 26 × 2 primers under the following conditions.

First, a primer mix (250 μM) containing all primers at equal concentrations was prepared, and 55 cycles of 94 °C for 30 sec, 52 °C for 30 sec and 72 °C for 30 sec were performed in the following reaction mixture (50 μl).
Reaction mixture: 0.2 mM dNTP 5 μl
primer mix 0.5 μl
Pfu Ultra (Stratagene) 0.25 μl
10× buffer for Pfu Ultra 5 μl
Sterile water 39.25 μl

After the reaction, PCR was performed again in the following reaction mixture, using the first reaction mixture as the template. After denaturation at 94 °C for 2 min, 30 cycles of 94 °C for 30 sec, 53 °C for 30 sec and 72 °C for 60 sec were performed, followed by an extension reaction at 72 °C for 10 min. To allow ligation into the vector at a later stage, the primers used were a primer (SEQ ID NO: 57) made by adding an EcoRI site to the beginning of the first forward primer (SEQ ID NO: 5) and adjusting its length, and the first reverse primer (SEQ ID NO: 31; EcoRI site already added).
Reaction mixture: Template 1 μl
0.2 mM dNTP 5 μl
Primer (SEQ ID NO: 57) (250 μM) 0.5 μl
Primer (SEQ ID NO: 31) (250 μM) 0.5 μl
Pfu Ultra (Stratagene) 1 μl
10× buffer for Pfu Ultra 5 μl
Sterile water 36 μl
The PCR product was purified on an agarose gel, digested with EcoRI, and inserted into the EcoRI site of pUC19. Sequencing confirmed that the insert had the correct sequence. For adh2c, synthesis of a plasmid designed in the same way as for adh1c was outsourced to Operon, but in principle it can be synthesized in the same way as adh1c.

(3) Synthesis of the adh1 and adh2 genes
Meanwhile, adh1w and adh2w clones were obtained by PCR using genomic DNA extracted from S. cerevisiae YPH 499 as the template. PCR used the primer pair for each gene (SEQ ID NOs: 58 and 59 for adh1w; SEQ ID NOs: 60 and 61 for adh2w). After denaturation at 94 °C for 2 min, 30 cycles of 94 °C for 30 sec, 53 °C for 30 sec and 72 °C for 60 sec were performed, followed by an extension reaction at 72 °C for 10 min.
Primers (capital letters indicate the restriction enzyme site added for cloning):
Forward primer for adh1w amplification (SEQ ID NO: 58):
GGAATTCatgtctatcccagaaactcaaaaaggtgtt
Reverse primer for adh1w amplification (SEQ ID NO: 59):
GGAATTCttatttagaagtgtcaacaacgtatctacc
Forward primer for adh2w amplification (SEQ ID NO: 60):
GGAATTCatgtctattccagaaactcaaaaagccatt
Reverse primer for adh2w amplification (SEQ ID NO: 61):
GGAATTCttatttagaagtgtcaacaacgtatctacc
Reaction mixture: Template 1 μl
0.2 mM dNTP 5 μl
Forward primer (250 μM) 0.5 μl
Reverse primer (250 μM) 0.5 μl
Pfu Ultra (Stratagene) 1 μl
10× buffer for Pfu Ultra 5 μl
Sterile water 36 μl
The PCR product was purified on an agarose gel, digested with EcoRI, and inserted into the EcoRI site of pUC19. Sequencing confirmed that the insert had the correct sequence.

(4) Construction of expression vectors for the adh1w, adh2w, adh1c and adh2c genes
PCR was performed using as templates the plasmids obtained above, in which adh1c, adh2c, adh1w or adh2w is inserted at the EcoRI site. PCR was carried out under the same conditions as in (3), except that a primer pair specific to each gene was used (SEQ ID NOs: 62 and 63 for adh1c; SEQ ID NOs: 64 and 65 for adh2c; SEQ ID NOs: 66 and 67 for adh1w; SEQ ID NOs: 68 and 69 for adh2w).
Primers (capital letters indicate the restriction enzyme site added for cloning):
Forward primer for adh1c amplification for pCold (SEQ ID NO: 62)
CCGGAATTCatgagcattccggaaacgcaaaaaggtgtg
Reverse primer for adh1c amplification for pCold (SEQ ID NO: 63)
AACTGCAGttatttgctggtatccaccacatagcggcc
Forward primer for adh2c amplification for pCold (SEQ ID NO: 64)
CCGGAATTCatgagcattccggaaacccagaaagcgattatt
Reverse primer for adh2c amplification for pCold (SEQ ID NO: 65)
AACTGCAGttatttgctggtatccaccacatagcggcc
Forward primer for adh1w amplification for pCold (SEQ ID NO: 66)
CCGGAATTCatgtctatcccagaaactcaaaaaggtgttatc
Reverse primer for adh1w amplification for pCold (SEQ ID NO: 67)
AACTGCAGttatttagaagtgtcaacaacgtatctacc
Forward primer for adh2w amplification for pCold (SEQ ID NO: 68)
CCGGAATTCatgtctattccagaaactcaaaaagccattatc
Reverse primer for adh2w amplification for pCold (SEQ ID NO: 69)
AACTGCAGttatttagaagtgtcaacaacgtatctacc
The PCR product was purified on an agarose gel, digested with EcoRI and PstI, and inserted into the EcoRI-PstI sites of pColdIII and pColdIV (Takara Bio), which are cold-inducible expression vectors. The plasmids prepared in this way (Table 1) were transformed into JM109.
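The capital/lowercase convention in the primer listing can be checked mechanically. The sketch below is an illustrative helper rather than part of the patent: it splits a primer into its added capitalised tail and its gene-specific lowercase part, and reports which recognition site the tail contains. The function names are hypothetical; the EcoRI (GAATTC) and PstI (CTGCAG) recognition sequences are the standard ones.

```python
# Standard recognition sequences of the two enzymes used for cloning.
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG"}


def split_primer(primer):
    """Return (capitalised added tail, lowercase gene-specific part)."""
    cut = next((i for i, ch in enumerate(primer) if ch.islower()), len(primer))
    return primer[:cut], primer[cut:]


def sites_in_tail(primer):
    """Names of the restriction sites present in the capitalised tail."""
    tail, _ = split_primer(primer)
    return [name for name, site in SITES.items() if site in tail]


forward_62 = "CCGGAATTCatgagcattccggaaacgcaaaaaggtgtg"   # SEQ ID NO: 62
reverse_63 = "AACTGCAGttatttgctggtatccaccacatagcggcc"    # SEQ ID NO: 63

for name, primer in [("forward", forward_62), ("reverse", reverse_63)]:
    tail, gene_part = split_primer(primer)
    print(name, tail, sites_in_tail(primer), gene_part[:3])
```

The forward primer's gene-specific part begins with the start codon atg, and the reverse primer's begins with tta, the reverse complement of a TAA stop codon, consistent with amplifying the full coding sequence.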

(5) Induction of expression of the optimized adh1 and adh2 genes
First, E. coli JM109 carrying each plasmid, and JM109 carrying pUC19 without an insert, were pre-cultured with stirring in 3 ml of LB liquid medium at 37 °C for about 12 hours. The subsequent main culture differed by vector, as follows.

For plasmids based on pUC19, after the pre-culture the entire 3 ml of culture was added to 100 ml of LB liquid medium (+ ampicillin 50 μg/ml) in a Sakaguchi flask and cultured with shaking (120 rpm) at 37 °C for 1.5 hours. IPTG was then added to a final concentration of 100 μM, and shaking culture (120 rpm) was continued at 37 °C for 24 hours.

For plasmids based on pCold, after the pre-culture the entire 3 ml of culture was added to 100 ml of LB liquid medium (+ ampicillin 50 μg/ml) in a Sakaguchi flask and cultured with shaking (120 rpm) at 37 °C for 1.75 hours, then left to stand at 15 °C for 30 min. IPTG was then added to a final concentration of 100 μM, and the culture was shaken (120 rpm) at 15 °C for a further 24 hours.

The culture obtained was centrifuged at 10,000 rpm for 10 min, the supernatant was discarded, and the remaining cells were suspended in 100 mM Tris buffer (pH 8). This suspension was centrifuged again at 7,200 rpm for 10 min, the supernatant was discarded, and the remaining cells were suspended in 100 mM Tris buffer (pH 8). The resulting suspension was sonicated at 4 °C for 10 min and centrifuged again at 10,000 rpm for 10 min, and the supernatant was used as the cell-free extract.

The ADH activity in this cell-free extract was measured as follows. The cell-free extract was diluted 1-fold (undiluted), 10-fold, 100-fold, and 1000-fold; 100 μl of each dilution was mixed with 900 μl of the substrate solution below, and the absorbance at 340 nm was measured at 10 s intervals up to 50 s.
Substrate solution: Sodium pyrophosphate 85 mM
Semicarbazide hydrochloride 6.5 mM
Glycine 18 mM
Ethanol 550 mM
NAD⁺ 1.75 mM
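The kinetic reading described above can be turned into an activity value along the following lines. This is a minimal sketch, not the formula used in this document: it assumes a 1 cm cuvette, the standard literature extinction coefficient of NADH at 340 nm (6.22 mM⁻¹ cm⁻¹), and the usual definition of one unit as 1 μmol NADH formed per minute. The function names and the example readings are hypothetical.

```python
# Illustrative sketch only; the absorbance readings are made-up values.
NADH_EXT_COEFF = 6.22    # mM^-1 cm^-1 at 340 nm (standard literature value)
PATH_LENGTH_CM = 1.0     # assumed cuvette path length
ASSAY_VOLUME_ML = 1.0    # 100 ul extract + 900 ul substrate solution
SAMPLE_VOLUME_ML = 0.1   # volume of (diluted) extract in the assay


def slope_per_min(times_s, absorbances):
    """Least-squares slope of A340 versus time, in absorbance units per minute."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx * 60.0


def activity_u_per_ml(delta_a_per_min, dilution):
    """Units (umol NADH per min) per ml of undiluted extract."""
    # Beer-Lambert: concentration change in mM/min, i.e. umol per ml per min.
    rate_mm_per_min = delta_a_per_min / (NADH_EXT_COEFF * PATH_LENGTH_CM)
    umol_per_min = rate_mm_per_min * ASSAY_VOLUME_ML
    return umol_per_min / SAMPLE_VOLUME_ML * dilution


# Readings every 10 s up to 50 s for a hypothetical 10-fold diluted extract.
times = [0, 10, 20, 30, 40, 50]
a340 = [0.100, 0.120, 0.140, 0.160, 0.180, 0.200]
rate = slope_per_min(times, a340)
activity = activity_u_per_ml(rate, dilution=10)
```

Fitting a slope over all six readings, rather than subtracting the first from the last, makes the estimate less sensitive to a single noisy point; in practice one would use the dilution whose trace stays linear over the 50 s window.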

For each cell-free extract, the total protein content was also measured by the Bradford method (CBB method) using BSA as the standard. ADH activity was calculated from the measured absorbance using the following formula. Table 2 shows the results. The specific activity of the negative controls (E. coli JM109 carrying no plasmid, and E. coli JM109 carrying pUC19 without an insert) was 0.
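As an illustration of how a protein measurement and an activity value combine into a specific activity, the sketch below fits a linear BSA standard curve and divides activity by protein concentration. It is an assumption-laden example rather than the calculation used here: the standard concentrations, the absorbance readings (taken at 595 nm, the usual Bradford wavelength), the activity value, and all function names are hypothetical.

```python
# Illustrative sketch only; every number below is a made-up example value.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_mg_per_ml(a595, slope, intercept, dilution=1.0):
    """Read a sample off the BSA standard curve and correct for dilution."""
    return (a595 - intercept) / slope * dilution


def specific_activity(activity_u_per_ml, protein_mg_ml):
    """Specific activity in units per mg of total protein."""
    return activity_u_per_ml / protein_mg_ml


# Hypothetical BSA standards (mg/ml) and their A595 readings.
bsa_conc = [0.0, 0.25, 0.5, 0.75, 1.0]
bsa_a595 = [0.05, 0.20, 0.35, 0.50, 0.65]
slope, intercept = fit_line(bsa_conc, bsa_a595)

protein = protein_mg_per_ml(0.41, slope, intercept)   # sample reading
spec_act = specific_activity(1.8, protein)            # 1.8 U/ml assumed
```

Normalizing by total protein is what allows extracts from different cultures to be compared, since cell density and lysis efficiency vary from flask to flask.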

Thus, optimizing the base sequence increased the protein expression level 2.5- to 10-fold for every combination except one (adh2 with pColdIV). These results show that, when a gene is expressed using a predetermined translation system, the base sequence design method of the present invention can increase the expression level by at least 2.5- to 10-fold.

The present invention provides a method for designing the base sequence of a gene that can achieve a high expression level when the gene is expressed using a predetermined translation system.

Claims

A method for designing a base sequence of a gene to be expressed using a predetermined translation system, the method comprising:
a step in which an input device inputs the base sequence of the gene into a computer;
a step in which the computer adapts, using a hidden Markov model, the codon sequence obtained from the input base sequence of the gene to the regularity of codon sequences calculated from learning data using the Baum-Welch algorithm; and
a step in which an output device outputs the adapted base sequence of the gene.

The method according to claim 1, wherein the learning data is a gene pool of a biological species from which the translation system is derived.

The method according to claim 1, wherein the learning data is a gene pool whose expression level has already been clarified to be equal to or higher than a predetermined level in a predetermined translation system.

The method according to claim 1, further comprising a step in which the computer adds the adapted base sequence of the gene, as data, to the learning data used in the adaptation.

The method according to claim 1, wherein the translation system is in a cell.

The method according to claim 1, wherein the translation system is in an in vitro protein synthesis system.

A program for designing a base sequence of a gene to be expressed using a predetermined translation system,
the program causing a computer to perform the method according to any one of claims 1 to 6.

A computer-readable recording medium on which the program according to claim 7 is recorded.

A base sequence design system for a gene to be expressed using a predetermined translation system, comprising:
(i) an input device for inputting the base sequence of the gene into a computer;
(ii) a computer for adapting, using a hidden Markov model, the codon sequence obtained from the base sequence of the gene input by the input device to the regularity of codon sequences calculated from the learning data using the Baum-Welch algorithm; and
(iii) an output device for outputting the adapted base sequence of the gene.
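
The adapting step recited in the claims can be pictured with the following sketch. It is not the claimed implementation: it assumes the hidden Markov model parameters have already been estimated from the learning data (for example by Baum-Welch), and it uses a tiny made-up two-state model covering only Met, Lys, and stop codons. For a given amino acid sequence, it selects the hidden-state path and the synonymous codon at each position that jointly maximize the model probability, which is a Viterbi-style search. All names and numbers are hypothetical.

```python
import math

# Hypothetical toy model: 2 hidden states; only Met, Lys and stop are covered.
SYNONYMS = {"M": ["ATG"], "K": ["AAA", "AAG"], "*": ["TAA", "TAG", "TGA"]}
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [
    {"ATG": 0.30, "AAA": 0.40, "AAG": 0.10, "TAA": 0.15, "TAG": 0.02, "TGA": 0.03},
    {"ATG": 0.30, "AAA": 0.10, "AAG": 0.40, "TAA": 0.05, "TAG": 0.05, "TGA": 0.10},
]


def best_emission(state, amino_acid):
    """Most probable synonymous codon in a state, with its log probability."""
    codon = max(SYNONYMS[amino_acid], key=lambda c: EMISSION[state][c])
    return math.log(EMISSION[state][codon]), codon


def adapt_codons(protein):
    """Return the codon sequence maximizing the joint path/emission probability."""
    n_states = len(INITIAL)
    # Initialization: start probability times best emission for the first residue.
    score = [math.log(INITIAL[s]) + best_emission(s, protein[0])[0]
             for s in range(n_states)]
    back = []
    # Recursion: standard Viterbi, with emissions restricted to synonymous codons.
    for aa in protein[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: score[p] + math.log(TRANSITION[p][s]))
            new_score.append(score[prev] + math.log(TRANSITION[prev][s])
                             + best_emission(s, aa)[0])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Backtrack the best state path, then read off the best codon per state.
    state = max(range(n_states), key=lambda s: score[s])
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return "".join(best_emission(s, aa)[1] for s, aa in zip(states, protein))


designed = adapt_codons("MKK*")
```

Because emissions are independent given the state path, taking the best synonymous codon per state inside the recursion yields the same maximum as searching over all codon sequences explicitly, while keeping the cost linear in the sequence length.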