JP7376867B2

JP7376867B2 - Method for identifying sequence clusters containing antigen-specific antibodies

Info

Publication number: JP7376867B2
Application number: JP2019080434A
Authority: JP
Inventors: 知成松田; 義久萩原; 陽子赤澤; 誠生宮▲崎▼; 哲夫福田; 祐二伊東
Original assignee: ARK RESOURCE CO., LTD.; Kagoshima University NUC; Kyoto University; National Institute of Advanced Industrial Science and Technology AIST
Current assignee: ARK RESOURCE CO., LTD.; Kagoshima University NUC; Kyoto University; National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2019-04-19
Filing date: 2019-04-19
Publication date: 2023-11-09
Anticipated expiration: 2039-04-19
Also published as: JP2020177529A; WO2020213730A1

Description

The present invention relates to a method for identifying a sequence cluster containing sequence data representing an antigen-specific antibody. The present invention also relates to a determination system for determining whether a sequence cluster contains an antigen-specific antibody, and to a method for producing an antigen-specific antibody.

Antibodies are biomolecules used in many fields, including medicine and molecular biology. For example, in the pharmaceutical field they are used as therapeutic and diagnostic agents, and in molecular biology they are used as carriers for antigen purification and/or as detection reagents.

Antigen-specific monoclonal antibodies can be obtained by various methods. For example, an antigen-specific monoclonal antibody can be obtained by a screening method that includes fusing B cells obtained from an animal administered an antigen with myeloma cells to produce hybridomas, and selecting from among those hybridomas an antibody-producing hybridoma that exhibits a predetermined reactivity to the antigen.

In addition, biopanning methods such as phage display and ribosome display can be used to obtain, from an antibody library containing antigen-specific antibody candidates, an antigen-specific monoclonal antibody that exhibits a predetermined reactivity to a specific antigen.

In recent years, various sequencers (also called "next-generation sequencers") that enable large-scale, rapid determination of gene sequences have been developed and are commercially available. A method has been reported that combines conventional biopanning with genetic information analyzed by a next-generation sequencer to screen for antigen-specific antibodies rapidly and efficiently (Patent Document 1).

A method has also been reported in which, without biopanning, antigen-specific monoclonal antibodies are obtained for the sequences detected at high frequency in the genetic information obtained by next-generation sequencing of immune cells that have increased in a specific lymph node (Non-Patent Document 1).

Patent Document 1: Japanese Patent Application Publication No. 2015-119637

Non-Patent Document 1: Scientific Reports 2015(5)13926:1-10

However, focusing only on the genetic information of antibodies detected at high frequency in a specific organ, as in the method described in Non-Patent Document 1, means overlooking useful antigen-specific antibodies that, for example, occur at low frequency but combine high affinity with antigen specificity.

The present inventors found that the genetic information of antigen-specific antibodies can be selected by choosing, from the genetic information of antibody groups obtained at multiple time points from an animal administered an antigen, the antibody populations in which mutations accumulate substantially over time, and thereby completed the present invention. The present inventors further found a method for efficiently obtaining the genetic information of antigen-specific antibodies from the genetic information of antibody populations obtained at multiple time points from an animal administered an antigen, and completed the present invention.

That is, the present invention relates to the following:
A method for identifying a sequence cluster containing sequence data representing an antigen-specific antibody that is a candidate for antibody production, the method comprising: preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; creating a molecular phylogenetic tree from the sequence data group; and determining whether a sequence cluster in the molecular phylogenetic tree contains sequence data representing an antigen-specific antibody that is a candidate for antibody production, based on a change in the degree of sequence identity between the antibody sequence represented by the sequence data in the sequence cluster and the original sequence of the antibody, or on a change in antibody type, wherein the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.

A determination system for determining whether a sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production, the system comprising a control unit and a storage unit, wherein the storage unit stores a database containing a sequence data group that contains sequence data of antibodies obtained from an animal that may have received immune stimulation; the control unit includes a molecular phylogenetic tree creation unit that creates a molecular phylogenetic tree from the sequence data group, and a sequence data determination unit that determines whether a sequence cluster in the molecular phylogenetic tree contains sequence data representing an antigen-specific antibody that is a candidate for antibody production, based on a change in the degree of sequence identity between the antibody sequence represented by the sequence data in the sequence cluster and the original sequence of the antibody, or on a change in antibody type; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.
A method for producing an antigen-specific antibody, the method comprising: preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; creating a molecular phylogenetic tree from the sequence data group; determining whether a sequence cluster in the molecular phylogenetic tree contains sequence data representing an antigen-specific antibody that is a candidate for antibody production, based on a change in the degree of sequence identity between the antibody sequence represented by the sequence data in the sequence cluster and the original sequence of the antibody, or on a change in antibody type; selecting, from a sequence cluster determined to contain sequence data representing an antigen-specific antibody, sequence data representing an antigen-specific antibody that is a candidate for antibody production; and producing an antibody having the amino acid sequence represented by the selected sequence data.
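The determination step shared by the items above can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes that each member of a sequence cluster already carries its assigned original sequence, that both sequences are aligned to the same length, and that a "change in sequence identity" is measured as the least-squares slope of identity against collection time. The names `ClusterMember`, `identity_trend`, `shows_accumulating_mutations`, and the `max_slope` cutoff are hypothetical, and the alternative criterion based on a change in antibody type is not covered.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClusterMember:
    """One sequence in a cluster (hypothetical record layout)."""
    sequence: str   # antibody sequence from the sequence data
    original: str   # best-matching original sequence, assumed pre-assigned
    week: int       # collection time point, assumed to be in weeks


def sequence_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identity_trend(members: list[ClusterMember]) -> float:
    """Least-squares slope of identity-to-original versus collection week."""
    weeks = [m.week for m in members]
    identities = [sequence_identity(m.sequence, m.original) for m in members]
    week_mean, identity_mean = mean(weeks), mean(identities)
    spread = sum((w - week_mean) ** 2 for w in weeks)
    if spread == 0:  # a single time point carries no trend information
        return 0.0
    covariance = sum(
        (w - week_mean) * (i - identity_mean)
        for w, i in zip(weeks, identities)
    )
    return covariance / spread


def shows_accumulating_mutations(members: list[ClusterMember],
                                 max_slope: float = -0.01) -> bool:
    """Flag a cluster whose identity to the original falls over time.

    The cutoff max_slope is an illustrative assumption, not a value
    taken from the source.
    """
    return identity_trend(members) <= max_slope
```

A cluster whose members drift away from the original sequence at later time points is flagged, while a cluster whose sequences stay unchanged is not.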

A method for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production, the method comprising: preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; and selecting sequence data representing an antigen-specific antibody from the sequence data group, wherein the preparing step includes excluding specific sequence data from the sequence data group when the number of sequence data items whose sequences have a predetermined degree of sequence identity with the sequence represented by that specific sequence data is less than a threshold on the number of sequence data items; the selecting step includes comparing, with a threshold, the degree of sequence identity between the antibody sequence represented by sequence data in the sequence data group and the original sequence of the antibody; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.
A selection system for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production, the system comprising a control unit and a storage unit, wherein the storage unit stores a database containing a sequence data group that contains sequence data of antibodies obtained from an animal that may have received immune stimulation; the control unit includes a sequence data preparation unit that includes a step of excluding specific sequence data from the sequence data group when the number of sequence data items whose sequences have a predetermined degree of sequence identity with the sequence represented by that specific sequence data is less than a threshold on the number of sequence data items, and a sequence data selection unit that includes a step of comparing, with a threshold, the degree of sequence identity between the antibody sequence represented by sequence data in the sequence data group and the original sequence of the antibody; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.
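The exclusion performed in the preparation step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the "predetermined degree of sequence identity" is expressed as a maximum number of base substitutions between sequences of equal length, the sequence being tested is not counted as its own neighbor, and the function and parameter names (`count_similar`, `exclude_isolated`, `max_substitutions`, `min_similar`) are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of substituted positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def count_similar(index: int, sequences: list[str],
                  max_substitutions: int) -> int:
    """Count other sequences within max_substitutions of sequences[index].

    Sequences of a different length are ignored; the sequence itself
    is not counted (an assumption of this sketch).
    """
    target = sequences[index]
    return sum(
        1
        for j, other in enumerate(sequences)
        if j != index
        and len(other) == len(target)
        and hamming_distance(target, other) <= max_substitutions
    )


def exclude_isolated(sequences: list[str], max_substitutions: int,
                     min_similar: int) -> list[str]:
    """Keep only sequences that have at least min_similar close neighbors."""
    return [
        seq
        for i, seq in enumerate(sequences)
        if count_similar(i, sequences, max_substitutions) >= min_similar
    ]
```

With `min_similar=0` nothing is excluded; raising the threshold removes sequences that have few close relatives in the group. The pairwise comparison is quadratic in the number of sequences, so a real data set of millions of reads would need a more efficient neighbor search.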

A method for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production, the method comprising: preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; and selecting sequence data representing an antigen-specific antibody from the sequence data group, wherein the preparing step includes excluding specific sequence data from the sequence data group when the number of sequence data items whose sequences have a predetermined degree of sequence identity with the sequence represented by that specific sequence data is less than a threshold on the number of sequence data items; the selecting step includes comparing, with a threshold, the degree of sequence identity between the antibody sequence represented by sequence data in the sequence data group and the original sequence of the antibody; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.

A selection system for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production, the system comprising a control unit and a storage unit, wherein the storage unit stores a database containing a sequence data group that contains sequence data of antibodies obtained from an animal that may have received immune stimulation; the control unit includes a sequence data preparation unit that executes a process of excluding specific sequence data from the sequence data group when the number of sequence data items whose sequences have a predetermined degree of sequence identity with the sequence represented by that specific sequence data is less than a threshold on the number of sequence data items, and a sequence data selection unit that executes a process of comparing, with a threshold, the degree of sequence identity between the antibody sequence represented by sequence data in the sequence data group and the original sequence of the antibody; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.

A method for producing an antigen-specific antibody, the method comprising: preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; selecting sequence data representing an antigen-specific antibody from the sequence data group; and producing an antibody having the amino acid sequence represented by the selected sequence data, wherein the preparing step includes excluding specific sequence data from the sequence data group when the number of sequence data items whose sequences have a predetermined degree of sequence identity with the sequence represented by that specific sequence data is less than a threshold on the number of sequence data items; the selecting step includes comparing, with a threshold, the degree of sequence identity between the antibody sequence represented by sequence data in the sequence data group and the original sequence of the antibody, and selecting the sequence data as sequence data representing an antigen-specific antibody when the degree of sequence identity is less than the threshold; and the original sequence is the sequence for which the degree of sequence identity between the antibody sequence represented by the sequence data and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest.
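The selection step, in which sequence data are selected when their identity to the original sequence falls below a threshold, might be sketched as follows. The sketch assumes that candidate original sequences are supplied as a list of germline sequences already aligned to the same length as each antibody sequence, and that identity is the fraction of matching positions; `original_sequence`, `select_candidates`, and `identity_threshold` are hypothetical names, and a real pipeline would use an alignment tool rather than position-wise comparison.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def original_sequence(antibody: str, germlines: list[str]) -> str:
    """Return the germline sequence with the highest identity to the antibody."""
    return max(germlines, key=lambda g: sequence_identity(antibody, g))


def select_candidates(antibodies: list[str], germlines: list[str],
                      identity_threshold: float) -> list[str]:
    """Select antibody sequences whose identity to their original sequence
    is below identity_threshold, i.e. those carrying more mutations."""
    selected = []
    for antibody in antibodies:
        original = original_sequence(antibody, germlines)
        if sequence_identity(antibody, original) < identity_threshold:
            selected.append(antibody)
    return selected
```

For example, with a threshold of 0.85 a sequence that is 80% identical to its closest germline sequence is selected, while one that is 90% identical is not.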

According to one aspect of the present invention, genetic information of antigen-specific antibodies that are candidates for antibody production can be obtained rapidly and efficiently without using experimental techniques such as biopanning. According to another aspect of the present invention, antigen-specific antibodies can be produced rapidly and efficiently on the basis of the obtained genetic information.

A series of graphs showing the integration of sequencing errors that can arise during sequencing. FIG. 1a is a histogram for reference sequence α, containing specific reference sequence data α (n = 0) and the sequence data (hereinafter "query sequence data") of sequences that have the same sequence length as the sequence of reference sequence data α (hereinafter "reference sequence α") but have n base substitutions (n = 1 to 5) relative to it. FIG. 1b is a line graph showing that the distribution of the class values of the histogram for reference sequence α (solid line) follows the probability density function of a Poisson distribution (dashed line). FIG. 1c is a bar graph after the query sequence data (n = 1 to 5) have been integrated into reference sequence data α. FIG. 1d is a histogram for reference sequence β, containing reference sequence data β, which differs from reference sequence data α of FIG. 1a, and the query sequence data (n = 1 to 5) of sequences that have the same sequence length as reference sequence β but have n base substitutions (n = 1 to 5) relative to it. FIG. 1e is a histogram for reference sequence β after excluding reference sequence data γ (n = 2), which differs from reference sequence data β, and the query sequence data whose sequences are presumed to derive from reference sequence γ. FIG. 1f is a histogram for reference sequence γ, containing reference sequence data γ and the query sequence data that were excluded from the histogram for reference sequence β in FIG. 1d.

FIG. 2a is a line graph showing classification based on the sequence length of antibody variable regions. FIG. 2b is a table in which the sequence data of sequence length 354 bp in FIG. 2a are classified by V gene fragment and J gene fragment.

FIG. 3a is the molecular phylogenetic tree for the sequence data group classified in FIG. 2b (V gene fragment: vhh3-S1; J gene fragment: ighJ-4) when non-cluster-forming sequence data were not excluded (U40 ≥ 0). FIG. 3b is the molecular phylogenetic tree when sequence data for which the number of sequence data having up to 40 base substitutions is less than 10 were excluded (U40 ≥ 10). FIG. 3c is the molecular phylogenetic tree when sequence data for which the number of sequence data having up to 40 base substitutions is less than 50 were excluded (U40 ≥ 50). In FIGS. 3a to 3c, red circles (dark) indicate IgG2 sequence data, green circles (light) indicate IgG3 sequence data, and the size of each circle represents the frequency of occurrence (after integration).

FIG. 4a is a molecular phylogenetic tree used for extracting sequence clusters. FIG. 4b shows the nine sequence clusters (b1 to b9) extracted from the molecular phylogenetic tree of FIG. 4a.

FIG. 5a is a molecular phylogenetic tree created for the sequence data group classified on the basis of sequence length 372 bp (V gene fragment: VHH3-S8; J gene fragment: ighJ-6). FIG. 5b is a molecular phylogenetic tree created for the sequence data group classified on the basis of sequence length 354 bp (V gene fragment: vhh3-S1; J gene fragment: ighJ-4). FIGS. 5c and 5d are scatter plots showing the relationship between the bit score of each sequence data item in the sequence cluster marked with a red circle (enclosed by a curve) in FIGS. 5a and 5b, respectively, and the timing of blood collection. FIGS. 5e and 5f are Biacore sensorgrams showing the binding strength of antibodies reconstructed from sequence data in the sequence clusters marked with a red circle (enclosed by a curve) in the phylogenetic trees of FIGS. 5a and 5b, respectively.

FIG. 6a is a molecular phylogenetic tree created for the sequence data group classified on the basis of sequence length 381 bp (V gene fragment: VHH3-S8; J gene fragment: ighJ-6). FIG. 6b is a scatter plot showing the relationship between the bit score of the sequence data in the sequence cluster marked with a red circle (enclosed by a curve) in FIG. 6a and the timing of blood collection. FIG. 6c shows a molecular phylogenetic tree created for the sequence data in the sequence cluster marked with a red circle (enclosed by a curve) in the phylogenetic tree of FIG. 6a, together with Biacore sensorgrams showing the binding strength of antibodies reconstructed from the sequence data indicated by dashed arrows in that tree (c1, c2, c3).

A block diagram showing the schematic configuration of determination system 1 according to Embodiment 2.

A block diagram showing the schematic configuration of database 107 according to Embodiment 2.

A flowchart showing the step of preparing a sequence data group according to Embodiment 2.

A flowchart showing the sequence library preparation step according to Embodiment 2.

A flowchart showing the step of integrating sequencing errors according to Embodiment 2.

A flowchart showing the step of selecting sequence data for integration according to Embodiment 2.

[Definitions]
As used herein, "sequence data" means computer-readable electronic data that contains at least a base sequence encoding part or all of an antibody, or information on its amino acid sequence (also called "sequence information" or simply "sequence"), and that contains sequence information of antibodies obtained from an animal that may have received immune stimulation. Antibody sequence data can be obtained, for example, from lymphoid cells present in a blood sample collected from an animal that may have received immune stimulation, using a known sequencing method or a commercially available sequencing instrument. The blood sample may be, for example, whole blood, plasma, or serum collected from the animal, or a sample that has been processed to isolate antibodies. The known sequencing method may be, for example, an analysis method based on the Sanger method. The commercially available sequencing instrument may be, for example, a next-generation sequencer such as Illumina's MiSeq or HiSeq 4000.

Antibody sequence information includes, without limitation, information on a base sequence encoding one or more complementarity-determining regions (CDRs) or on its amino acid sequence. For example, antibody sequence information includes information on a base sequence encoding a part of an antibody variable region (V region) that contains at least the CDRs, or on its amino acid sequence. In addition to the information on the base sequence or amino acid sequence, sequence data may include, for example, information on the quality of sequencing (also called a "quality score"), information on when the antibody was collected, and information on the animal species from which the antibody was collected.
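One way to picture a sequence data item as described above is a small record type. The following Python sketch is only an illustration: the class name `SequenceData`, its field names, and the choice of weeks as the unit for collection time are assumptions, not a data format taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceData:
    """Hypothetical record for one antibody sequence data item."""
    sequence: str                                     # base or amino acid sequence
    quality_scores: Optional[tuple[int, ...]] = None  # per-position sequencing quality
    collection_week: Optional[int] = None             # when the antibody was collected
    species: Optional[str] = None                     # animal species of origin

    def mean_quality(self) -> Optional[float]:
        """Average quality score, or None when no scores are attached."""
        if not self.quality_scores:
            return None
        return sum(self.quality_scores) / len(self.quality_scores)
```

Only the sequence is mandatory here, mirroring the text: quality, collection time, and species are optional additions.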

As used herein, "sequence data group" means a collection of multiple sequence data items having antibody sequence information. Without limitation, a sequence data group may be a collection of an enormous number of sequence data items, each having the sequence information of an antibody obtained by sequencing with a next-generation sequencer (also called a "post-sequencing sequence data group"). A post-sequencing sequence data group may be, for example, a combination of multiple collections of sequence data obtained from the same animal at different times. A sequence data group may also be, for example, a collection of sequence data generated by subjecting a post-sequencing sequence data group to any one of, a combination of two or more of, or all of the following steps described later: cleanup, aggregation into unique sequence data, integration of sequencing errors, classification, and removal of non-cluster-forming sequence data. The enormous number of sequence data items may be, for example, 10,000 to 10 million, 100,000 to 8 million, 1 million to 7 million, or 2 million to 6 million sequence data items.

As used herein, "unique sequence" means the sole sequence data item, within a given sequence data group, that has sequence information with a particular sequence order and sequence length. Without limitation, when a post-sequencing sequence data group contains two or more sequence data items whose sequence information has the same sequence order and sequence length, the unique sequence data may be the single sequence data item, in the sequence data group obtained by performing the aggregation into unique sequence data described later, that carries information on the number of aggregated sequence data items in addition to that sequence order and sequence length. In another example, when the post-sequencing sequence data group contains only one sequence data item with that sequence order and sequence length, the unique sequence data may be the single sequence data item in the sequence data group that carries, in addition to that sequence order and sequence length, the information that the number of aggregated sequence data items is one.
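The aggregation into unique sequence data described above amounts to counting identical reads. A minimal Python sketch, assuming reads are plain strings so that string equality captures both sequence order and sequence length (the function name `aggregate_unique` is hypothetical):

```python
from collections import Counter


def aggregate_unique(reads: list[str]) -> dict[str, int]:
    """Collapse identical reads into unique sequences with their counts.

    Two reads are merged only when they are identical strings, which
    implies the same sequence order and the same sequence length.
    """
    return dict(Counter(reads))
```

A read that occurs only once is kept as a unique sequence with a count of one, matching the second case in the definition.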

As used herein, "next-generation sequencer" means a sequencing instrument that can determine the sequence information of an enormous number of gene fragments simultaneously. Without limitation, the enormous number of gene fragments may be 100,000 fragments or more, for example 300,000 or more, 1 million or more, 3 million or more, 10 million or more, 30 million or more, or 100 million or more, and, for example, 200 million or fewer, 100 million or fewer, or 50 million or fewer. Without limitation, the length of the gene fragments may be between 30 bp and 1000 bp, for example 900 bp or less, 500 bp or less, or 200 bp or less. Without limitation, the sequence data of antibody fragments whose sequence information has been determined by a next-generation sequencer may have quality scores.

As used herein, an "antibody" means all or part of an immunoglobulin (Ig) molecule that has the ability to bind to a predetermined antigen. An antibody is, without limitation, a protein having amino acid sequences corresponding to the complementarity determining regions (CDRs) of the variable regions (V regions) of the heavy chain (H chain) and light chain (L chain) that constitute an immunoglobulin molecule. Antibodies include, for example, complete antibodies, antibody fragments, and chimeric antibodies. The antibody type may be, without limitation, IgG, IgM, IgA, IgD, or IgE. The IgG may be, for example, of the subclass IgG1, IgG2, IgG3, or IgG4.

As used herein, an "antibody fragment" means a part of an antibody that has the ability to bind to a predetermined antigen. Antibody fragments include, without limitation, Fab, F(ab')₂, and single-chain antibodies (scFv). Antibody fragments can be produced by known methods. Fab can be produced, for example, from a complete antibody by a known enzymatic treatment.

As used herein, a "chimeric antibody" means a fusion protein in which all or part of two or more proteins are joined and which has the ability to bind to a predetermined antigen. Chimeric antibodies include, without limitation, an antibody in which the variable region (V region) of an antibody derived from a non-human species is linked to the constant region (C region) of a human-derived antibody, and an antibody in which the CDRs of an antibody derived from a non-human species are linked to the non-CDR portions of a human antibody (also referred to as a "humanized antibody"). Chimeric antibodies can be produced by known methods, for example by linking genes using genetic engineering techniques.

As used herein, an "antigen-specific antibody" means an antibody that binds to a target antigen with at least a predetermined binding strength. The binding strength is expressed, without limitation, by a dissociation constant (K_D), an affinity constant (K_A), an association rate constant (k_on), or a dissociation rate constant (k_off). The binding strength can be measured by a known method or with a commercially available measuring device. The commercially available measuring device may be, for example, a Biacore from GE Healthcare.

The predetermined binding strength may be, without limitation, a dissociation constant (K_D) of 10⁻⁵ M or less, preferably 10⁻⁶ M or less, more preferably 10⁻⁷ M or less, and even more preferably 10⁻⁸ M or less. The predetermined binding strength may also be, without limitation, such that the antibody exhibits the above dissociation constant for the target antigen while exhibiting a dissociation constant of 10⁻⁵ M or more, preferably 10⁻⁴ M or more, or more preferably 10⁻³ M or more for substances other than that antigen.

An antigen-specific antibody exhibits, for example, a dissociation constant of 10⁻⁷ M or less for a predetermined antigen and a dissociation constant of 10⁻⁴ M or more for substances other than that antigen. An antigen-specific antibody exhibits, for example, a dissociation constant of 10⁻⁸ M or less for a predetermined antigen and a dissociation constant of 10⁻³ M or more for substances other than that antigen. An antigen-specific antibody has, for example, an association rate constant for the antigen of 10⁵ M⁻¹·s⁻¹ or more, preferably 10⁶ M⁻¹·s⁻¹ or more. An antigen-specific antibody has, for example, a dissociation rate constant from the antigen of 10⁻³ s⁻¹ or less, preferably 10⁻⁴ s⁻¹ or less.
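The numerical criteria above can be applied mechanically once binding constants have been measured. The following Python code is a minimal sketch under stated assumptions: the function names and default thresholds are hypothetical choices for illustration (the defaults mirror the 10⁻⁷ M / 10⁻⁴ M example), and the relation K_D = k_off / k_on holds for a simple 1:1 binding model, which this specification does not itself state.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D in M from k_on (M^-1 s^-1) and k_off (s^-1).

    Assumes a simple 1:1 binding model (an assumption, not from the source).
    """
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def is_antigen_specific(
    kd_target: float,
    kd_off_target: float,
    max_target_kd: float = 1e-7,      # hypothetical default: 10^-7 M or less
    min_off_target_kd: float = 1e-4,  # hypothetical default: 10^-4 M or more
) -> bool:
    """True if binding to the target is tight and binding to other substances is weak."""
    return kd_target <= max_target_kd and kd_off_target >= min_off_target_kd


if __name__ == "__main__":
    kd = dissociation_constant(k_on=1e5, k_off=1e-3)
    print(kd, is_antigen_specific(kd, kd_off_target=1e-3))
```

A smaller K_D means tighter binding, which is why the target threshold is an upper bound and the off-target threshold is a lower bound.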

As used herein, the "variable region (V region)" means the portion of an immunoglobulin molecule consisting of approximately 110 amino acid residues on the amino (N) terminal side. The variable region of the H chain is denoted V_H, and the variable region of the L chain is denoted V_L. As used herein, a "complementarity determining region (CDR)" means a region that forms a three-dimensional structure complementary to an antigen molecule and determines the specificity of the antibody. Without limitation, CDRs are present at three separate locations in each of the H-chain and L-chain polypeptide chains of an antibody; they are denoted H1, H2, and H3 in the H chain and L1, L2, and L3 in the L chain.

As used herein, the "constant region (C region)" means the portion of an immunoglobulin molecule other than the variable region (V region). As used herein, "V gene" and "C gene" mean a gene encoding the variable region (V region) and a gene encoding the constant region (C region), respectively.

As used herein, "V-J recombination" means the process in which, during B cell differentiation, a V gene fragment and a J gene fragment are joined to form the variable region of an antibody. As used herein, "VDJ recombination" means the process in which a D (diversity) gene fragment is additionally joined between the V gene fragment and the J gene fragment to form the H chain. As used herein, "V(D)J recombination" means V-J recombination and/or VDJ recombination. In V(D)J recombination, when one fragment each is selected and combined from the multiple types of V gene fragments, the multiple types of J gene fragments, and, where applicable, the multiple types of D gene fragments present in the genome sequence, deletions or additions of bases can occur randomly at the junction sites.

As used herein, "V gene fragment" and "J gene fragment" each mean a gene encoding part of the variable region (V region) of an immunoglobulin polypeptide chain. The V region of the immunoglobulin H chain consists of 110 to 120 amino acids, and its gene is composed of three gene segments: a part encoding the 90 to 100 amino acids from the N-terminus (V_H), a part encoding the next several amino acids (D_H), and a part encoding the remaining ten or so amino acids (J_H). The V region of the immunoglobulin L chain contains no D gene fragment and is composed of a V gene fragment (V_L) and a J gene fragment (J_L).

As used herein, "affinity maturation" means the phenomenon in which the affinity of an antibody for the immunizing antigen increases with time after immunization. Without limitation, affinity maturation can result from point mutations arising in the V regions of the H and L chains while B cells activated by antigen stimulation proliferate rapidly. Without limitation, mutation during affinity maturation can introduce approximately one mutation per cell generation into the CDRs of the V region.

As used herein, the "original sequence" (also referred to as the "primordial sequence" or "ancestral sequence") means the sequence for which the degree of sequence identity between the sequence of the antibody and the genome sequence of the animal species corresponding to the animal from which the antibody was obtained is highest. Where multiple similar sequences exist, the sequence with the highest degree of sequence identity means the one with the highest degree of sequence identity among those similar sequences. In the context of the original sequence, the sequence of the antibody is, without limitation, either or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody. The original sequence is, for example, the gene fragment, alone or in combination, that has the highest degree of sequence identity among the multiple similar sequences returned by a homology search comparing either or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody against the genome sequence of the animal species corresponding to the animal from which the antibody was obtained.

As used herein, a "homology search" means an operation of searching a database for sequences similar to a query sequence. Without limitation, a homology search is performed with a known program. Such known programs include, for example, the FASTA program, the BLAST program, and the PSI-BLAST program. The degree of sequence identity can be calculated for either base sequences or amino acid sequences.

As used herein, the "degree of sequence identity" means an index indicating the degree to which two sequences match. The degree of sequence identity is, without limitation, the number of sequence matches, the sequence identity, or a score value based on the sequence identity. In another example, the degree of sequence identity is the number of sequence mismatches, the sequence non-identity, or a score value based on the sequence non-identity. Without limitation, the degree of sequence identity is calculated using a gap penalty.

The number of sequence matches or the number of sequence mismatches means the number of bases or amino acids that match or do not match, respectively, between the query sequence and a similar sequence. The sequence identity or sequence non-identity means the ratio of the number of matching or non-matching bases or amino acids to the length of the query sequence (= [number of matching or non-matching bases or amino acids] / [length of the query sequence]). The score value means a numerical value obtained by scoring the sequence identity or sequence non-identity.
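The ratio defined above can be illustrated with a short calculation. The following Python code is a minimal sketch that assumes the two sequences have already been aligned without gaps and have equal length; the function names are hypothetical, and gap penalties and score matrices are deliberately left out.

```python
def match_counts(query: str, subject: str) -> tuple[int, int]:
    """Return (number of matches, number of mismatches) for two equal-length sequences."""
    if not query or len(query) != len(subject):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query) - matches


def sequence_identity(query: str, subject: str) -> float:
    """[number of matching positions] / [length of the query sequence]."""
    matches, _ = match_counts(query, subject)
    return matches / len(query)


def sequence_non_identity(query: str, subject: str) -> float:
    """[number of non-matching positions] / [length of the query sequence]."""
    _, mismatches = match_counts(query, subject)
    return mismatches / len(query)


if __name__ == "__main__":
    print(match_counts("ACGTACGT", "ACGTTCGT"))
    print(sequence_identity("ACGTACGT", "ACGTTCGT"))
```

The same functions work for amino acid sequences, since they only compare characters position by position.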

Without limitation, the sequence identity is scored using a score matrix. Score matrices for scoring matches and mismatches in base sequences include, for example, the Jukes-Cantor 69 model, which assumes a constant probability of substitution for each base (A, T, G, C), the Kimura 80 model, the Tamura 92 model, the Tamura-Nei 93 model, and the general time-reversible (GTR) model. Score models for scoring matches and mismatches in amino acid sequences include, for example, PAM matrices (for example, PAM250) and BLOSUM matrices (for example, BLOSUM50, BLOSUM62, and BLOSUM80).

As used herein, "identical sequences" means base sequences or amino acid sequences that have the same sequence order and the same sequence length.

As used herein, a "change in the degree of sequence identity" means that the degree of sequence identity between the sequence indicated by the sequence data of an antibody in a sequence data group and its original sequence changes with time after antigen administration. The change in the degree of sequence identity is, for example, the slope of the degree of sequence identity. The slope of the degree of sequence identity may be the change in the degree of sequence identity relative to the elapsed time from the start to the end of the blood collection protocol, or the change in the degree of sequence identity per blood collection from the first blood collection to the end of blood collection.
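One way to obtain such a slope is an ordinary least-squares fit of the degree of sequence identity against elapsed time or against the blood collection number. The following Python code is a minimal sketch; the choice of least squares (rather than, say, a simple difference between the first and last collection) and the function name are assumptions made for illustration.

```python
def identity_slope(x: list[float], identity: list[float]) -> float:
    """Least-squares slope of the degree of sequence identity against x.

    x may hold elapsed days since the start of the protocol or the
    blood collection number (1, 2, 3, ...).
    """
    n = len(x)
    if n != len(identity) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(identity) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, identity))
    return sxy / sxx


if __name__ == "__main__":
    days = [0, 14, 28, 42]
    identity = [0.99, 0.97, 0.95, 0.93]
    print(identity_slope(days, identity))          # change per day
    print(identity_slope([1, 2, 3, 4], identity))  # change per blood collection
```

A negative slope here corresponds to sequences drifting away from the original sequence over the course of immunization.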

As used herein, a "change in antibody type" means that the class of the antibody changes with time after antigen administration. The change in antibody type is, for example, antibody class switching. In one example, the index indicating antibody class switching is the difference between the proportion of an antibody class in the antibody population observed at one blood collection (for example, [IgG2 types] / [IgG-class types in the antibody population]) and the proportion of that antibody class in the antibody population observed at a later blood collection.
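The class-switch index described above can be sketched as a difference of two proportions. In the following Python code, each blood collection is represented as a list of subclass labels, one per sequence; this representation, the function names, and the sign convention (later minus earlier) are assumptions made for illustration.

```python
from collections import Counter


def class_fraction(labels: list[str], target: str) -> float:
    """Proportion of entries in one antibody population that carry the target class."""
    if not labels:
        raise ValueError("the antibody population must not be empty")
    return Counter(labels)[target] / len(labels)


def class_switch_index(earlier: list[str], later: list[str], target: str) -> float:
    """Proportion at the later blood collection minus proportion at the earlier one."""
    return class_fraction(later, target) - class_fraction(earlier, target)


if __name__ == "__main__":
    first_collection = ["IgG1", "IgG1", "IgG1", "IgG2"]
    next_collection = ["IgG1", "IgG2", "IgG2", "IgG2"]
    print(class_switch_index(first_collection, next_collection, "IgG2"))
```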

As used herein, a "molecular phylogenetic tree" means a diagram showing the branching relationships among antibody sequences. The molecular phylogenetic tree is, without limitation, a star tree or an infinite tree. Without limitation, the molecular phylogenetic tree is created based on information on the amino acid sequences of the antibodies or the base sequences encoding those amino acid sequences. A molecular phylogenetic tree can be created by a known method, for example the methods described in Bioinformatics: Sequence and Genome Analysis, 2nd Edition (David Mount, Cold Spring Harbor Laboratory Press; 2nd edition (August 16, 2004)).

In one example, the molecular phylogenetic tree can be created based on a distance matrix. When created based on a distance matrix, the tree can be created using the neighbor-joining method, the average distance method, or the minimum evolution method. In another example, the molecular phylogenetic tree can be created without using a distance matrix, in which case it can be created using the maximum parsimony method, the maximum likelihood method, or Bayesian inference. Without limitation, the molecular phylogenetic tree can be created using known software, for example the ape package of the statistical analysis software R, or MEGA X (https://www.megasoftware.net/). Creation of a molecular phylogenetic tree can be performed on a sequence data group. It is performed, for example, on a cluster-forming sequence data group. In another example, without limitation, it may be performed on the sequence data group within a sequence cluster described below.
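To make the distance-matrix route concrete, the following Python code is a minimal sketch of the average distance method (UPGMA) applied to a handful of equal-length sequences. It is an illustration only and not the implementation used in ape or MEGA X: the p-distance measure, the nested-tuple tree representation, and the function names are all assumptions, and branch lengths are not computed.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]):
    """Return the tree topology as nested tuples, built by average-distance clustering."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    size = {name: 1 for name in seqs}  # cluster -> number of leaves it contains
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(size) > 1:
        # Merge the pair of current clusters with the smallest average distance.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Size-weighted average keeps the distance a mean over leaf pairs.
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


if __name__ == "__main__":
    example = {"s1": "AAAA", "s2": "AAAT", "s3": "TTTT", "s4": "TTTA"}
    print(upgma(example))
```

For real repertoires with millions of sequences, a dedicated tool is the practical choice; this quadratic sketch only shows how pairwise distances drive the grouping.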

As used herein, a "sequence cluster" means a group (subgroup) within a molecular phylogenetic tree whose members are interconnected in a way that distinguishes the group from the rest of the tree. Without limitation, sequence clusters can be found using known methods for searching for subgroups that are used in the field of network analysis. Methods for finding sequence clusters include, for example, a method that searches based on the number of components connecting the individual separated sequences, and a method that extracts the parts of the molecular phylogenetic tree in which sequence information is dense. These methods can be executed by known programs, for example the sna package and the igraph package. In one example, when the length of a branch (edge) connecting two branch points (also called connection points) in the molecular phylogenetic tree is greater than or equal to a predetermined value, the region extending outward, away from the center of the tree, from a predetermined position between those branch points may be extracted as a sequence cluster.
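The branch-length criterion in the last example can be sketched as follows. The Python code below assumes, for illustration, that the tree is given as a list of (parent, child, branch length) edges rooted at the center of the tree, and that the "predetermined position" is simplified to the outer end of the long branch, so that the whole subtree beyond it is taken as the cluster. The function names and the minimum cluster size are hypothetical.

```python
def leaves_below(node: str, children: dict[str, list[str]]) -> list[str]:
    """Collect the leaf names in the subtree rooted at node."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(current)
    return sorted(leaves)


def extract_clusters(
    edges: list[tuple[str, str, float]],
    min_branch_length: float,
    min_size: int = 2,  # hypothetical: ignore long branches that end in a single leaf
) -> list[list[str]]:
    """Return the leaf sets lying beyond every branch of at least min_branch_length."""
    children: dict[str, list[str]] = {}
    for parent, child, _ in edges:
        children.setdefault(parent, []).append(child)
    clusters = []
    for _, child, length in edges:
        if length >= min_branch_length:
            members = leaves_below(child, children)
            if len(members) >= min_size:
                clusters.append(members)
    return clusters


if __name__ == "__main__":
    tree = [
        ("root", "n1", 0.30), ("root", "n2", 0.25), ("root", "e", 0.05),
        ("n1", "a", 0.01), ("n1", "b", 0.02),
        ("n2", "c", 0.01), ("n2", "d", 0.01),
    ]
    print(extract_clusters(tree, min_branch_length=0.2))
```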

As used herein, a "candidate for antibody production" means sequence data present in a sequence data group or in a sequence cluster determined to contain an antigen-specific antibody. In one example, the candidate for antibody production may be sequence data present in a sequence cluster determined to contain an antigen-specific antibody. In one example, the candidate for antibody production may be sequence data present in a sequence group generated after classification and exclusion of sequence data that do not form sequence clusters.

As used herein, "antibody production" or "producing an antibody" means manufacturing an antibody. Antibodies can be manufactured by known methods; see, for example, International Patent Publication WO99/18212. An antibody can be manufactured or produced, for example, by inserting a DNA fragment encoding the CDR of interest into a known protein expression vector to construct a plasmid for antibody expression, introducing the plasmid into a predetermined host, and expressing the antibody of interest in the host. The antibody expressed in the host can be recovered and purified by known methods. The DNA fragment encoding the CDR of interest can be optimized according to the codon usage (codon frequency) of the host in which it is expressed.

As used herein, an "antigen" or "immunogen" means a substance that causes antibodies or sensitized lymphocytes to be produced and thereby induces humoral or cell-mediated immunity. Without limitation, the antigen may be a substance introduced into the animal's body from outside or a substance produced within the animal's body. An antigen introduced from outside the body may be, for example, a proteinaceous substance of interest, a bacterium, or a virus. An antigen produced within the body may be, for example, a proteinaceous substance produced as a result of a mutation or a foreign gene, or a protein denatured by an external factor such as irradiation.

As used herein, "animals that may have received immune stimulation" include, without limitation, animals to which a predetermined antigen has been administered, animals that have been irradiated, and animals that may have been infected with a virus. An animal to which a predetermined antigen has been administered is, for example, an animal to which a protein antigen of interest has been administered according to a known administration protocol (number of administrations, administration interval, dose per administration) commonly used for that animal species. If no known administration protocol exists for a particular animal species, a person skilled in the art can set the protocol as appropriate with reference to known protocols for closely related species. The antigen is administered to the animal at least once. The antigen is administered, for example, the number of times at which affinity maturation of antibodies is expected in that animal. When the antigen is administered two or more times, the administration intervals may be regular or irregular.

Animals that may have been infected with a virus include, for example, animals present in an area where a virus such as influenza virus is circulating. Animals that have been irradiated include, for example, animals that have undergone radiation therapy. Without limitation, in an animal that has undergone radiation therapy for a tumor, antibodies that have undergone affinity maturation may be obtained as a result of immune stimulation. A person skilled in the art can set the radiation therapy protocol as appropriate according to the condition of the site to be irradiated and the condition of the torso receiving the irradiation.

As used herein, an "animal" is, without limitation, a cartilaginous fish, a cetartiodactyl, a rodent, or a primate. Cartilaginous fishes include, for example, sharks, rays, and chimaeras. Cetartiodactyls include, for example, camelids including alpacas and camels, suids including pigs, and bovids including cattle, goats, and sheep. The cetartiodactyl is preferably an alpaca. Rodents include, for example, lagomorphs including rabbits, and members of the order Rodentia including mice and rats. The rodent is preferably a rabbit. Primates are, for example, prosimians including lemurs, Cercopithecidae including Japanese macaques, and hominids including humans, chimpanzees, and gorillas. The primate is, for example, a non-human animal or a human. The primate is preferably a human.

As used herein, an "animal species" means the biological species to which an animal belongs. The genome sequence of an animal species may be, without limitation, a genome sequence stored in a known database. Known databases that can be used include, for example, GenBank (NCBI), DDBJ (DNA Data Bank of Japan), Entrez (NCBI), and Genomic Biology (NCBI).

Embodiments according to aspects of the present invention are described below. These embodiments are illustrative of the present invention and do not limit the invention set forth in the appended claims in any way.

[Embodiment 1]
A first aspect of the present invention provides a method for identifying a sequence cluster containing sequence data indicating an antigen-specific antibody that is a candidate for antibody production. One embodiment of the first aspect of the present invention (hereinafter, "Embodiment 1") is described below.

<Preparation of sequence data group>
The identification method includes a step of preparing a sequence data group containing sequence data of antibodies obtained from an animal that has received immune stimulation. The step of preparing the sequence data group includes acquiring a specific sequence data group and, in some cases, further includes performing steps such as the cleanup described below on the acquired sequence data group to generate a new sequence data group.

(Sequence data group after sequencing)
The step of preparing a sequence data group includes, without limitation, acquiring a sequence data group after sequencing. The sequence data group after sequencing is, for example, a collection (X_i) of sequence data having sequence information obtained by sequencing each antibody of the antibody populations obtained i times (i = 1, 2, 3, ...) at different times from an animal that has received immune stimulation through administration of an antigen of interest. Acquiring the sequence data group after sequencing includes, for example, loading the sequence data group into a computer. Without limitation, i is an integer between 1 and 15. For example, i is an integer between 3 and 8, that is, 3, 4, 5, 6, 7, or 8. The intervals between the times at which the multiple antibody populations are obtained may be regular or irregular.

The sequence data group after sequencing may be, for example, a collection of the sequence data of each antibody obtained from lymphoid cells in blood samples collected a total of six times (once before antigen administration, once in each interval between administrations, and once after antigen administration) from an alpaca to which an antigen of interest was administered together with an adjuvant five times at two-week intervals. The sequence data group after sequencing contains, for example, about 3 million sequence data items.

(Cleanup of sequence data that do not represent an amino acid sequence, etc.)
Antibody sequence data obtained by sequencing lymphoid cells in an animal's body fluids (for example, blood or lymph) can include sequence data whose sequence information belongs to genes that do not represent an amino acid sequence because the reading frame has shifted, for example during V(D)J recombination.

The step of preparing a sequence data group includes, without limitation, a step of excluding (cleaning up) sequence data that do not represent an amino acid sequence from the sequence data group. This cleanup step may be performed, for example, by excluding from the sequence data group any base sequence data in which the length of the base sequence encoding the variable region of the antibody, or a region containing it, is not a multiple of 3. Performing the cleanup step on a sequence data group generates a collection of sequence data that represent amino acid sequences (hereinafter also referred to as a "cleaned-up sequence data group"). The cleanup step also reduces the number of sequence data items in the sequence data group.
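The multiple-of-3 criterion can be sketched in a few lines. The following Python code is a minimal illustration that assumes each item is already the base sequence of the variable region (or of a region containing it) as a plain string; the function name is hypothetical.

```python
def cleanup_frame(sequences: list[str]) -> list[str]:
    """Keep only base sequences whose length is a non-zero multiple of 3."""
    return [seq for seq in sequences if seq and len(seq) % 3 == 0]


if __name__ == "__main__":
    reads = ["ATGGCC", "ATGGC", "ATGGCCAAA", ""]
    print(cleanup_frame(reads))
```

A length check alone does not detect in-frame stop codons; whether to add such a check is a separate design decision that the passage above does not specify.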

The cleanup may, in some cases, further include a step of excluding, based on quality scores, sequence data with less than a predetermined quality score from the sequence data group. Cleanup based on quality scores can be performed on sequence data having base sequence information or amino acid sequence information.
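A quality-based cleanup might look like the following Python sketch. It assumes, for illustration, that each record pairs a base sequence with a FASTQ-style quality string in Phred+33 encoding and that the criterion is the mean Phred score of the read; the specification does not fix the encoding, the statistic, or the threshold, and the function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("quality string must not be empty")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_quality(
    records: list[tuple[str, str]],
    min_mean_quality: float = 20.0,  # hypothetical threshold
) -> list[tuple[str, str]]:
    """Drop (sequence, quality) records whose mean Phred score is below the threshold."""
    return [
        (seq, qual)
        for seq, qual in records
        if qual and mean_phred(qual) >= min_mean_quality
    ]


if __name__ == "__main__":
    records = [("ACGT", "IIII"), ("ACGT", "####"), ("ACGT", "II##")]
    print(filter_by_quality(records, min_mean_quality=30.0))
```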

(Aggregation into unique sequence data)
An antibody population obtained from an animal can contain multiple antibodies having identical sequences. Accordingly, the collection of sequence data obtained by sequencing the antibody population (the sequence data group after sequencing) can contain multiple sequence data items having identical sequence information.

The step of preparing a sequence data group includes, without limitation, a step of aggregating multiple sequence data items containing identical sequence information into a single sequence data item containing that sequence information (hereinafter, "unique sequence data"). The unique sequence data also contains information on the number of sequence data items aggregated (hereinafter, "frequency of appearance (aggregated)"). Performing the aggregation step on a sequence data group generates a collection of unique sequence data (hereinafter, "unique sequence data group"). The aggregation step reduces the number of sequence data items in the sequence data group.
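The aggregation step amounts to counting identical sequences. The following Python code is a minimal sketch; the dictionary keys and the function name are hypothetical, and "count" stands for the frequency of appearance (aggregated).

```python
from collections import Counter


def aggregate_unique(sequences: list[str]) -> list[dict]:
    """Collapse identical sequences into unique sequence data with their counts."""
    counts = Counter(sequences)
    return [
        {"sequence": seq, "length": len(seq), "count": n}
        for seq, n in counts.most_common()  # most frequent first
    ]


if __name__ == "__main__":
    reads = ["ACG", "ACG", "TTT", "ACG", "GG"]
    for item in aggregate_unique(reads):
        print(item)
```

Because two strings compare equal only when both their order and their length agree, string equality matches the definition of identical sequences used here.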

(Integration of sequencing errors)
It is known that sequence information obtained by sequencing may contain random errors arising in processes associated with sequencing. Such random errors can arise in various processes including, for example, DNA synthesis by DNA polymerase (including DNA amplification by PCR and DNA replication during sequencing) and/or signal analysis from fluorescence images. Sequence data whose sequence information contains random errors arising during sequencing is integrated into the sequence data representing the sequence of the gene being sequenced (also referred to as the "true sequence"); the resulting sequence data is hereinafter referred to as "integrated unique sequence data". In addition to the information contained in the unique sequence data, the integrated unique sequence data further includes information on the sum of the appearance frequencies (aggregated) of the sequence data that were integrated (hereinafter referred to as the "appearance frequency (integrated)"). Performing this integration step on the sequence data group generates a collection of integrated unique sequence data (hereinafter referred to as an "integrated unique sequence data group"). The integration step makes it possible to reduce the number of sequence data in the sequence data group.

The step of preparing a sequence data group includes, but is not limited to, a step of integrating sequencing errors. The integration of sequencing errors includes, for example, in a collection of sequence data having appearance frequencies (aggregated): taking the sequence data with the highest appearance frequency (aggregated) as "reference sequence data"; taking, as "query sequence data", sequence data whose sequence has the same length as the sequence of the reference sequence data (hereinafter referred to as the "reference sequence") but has n base substitutions relative to the reference sequence; and integrating into the reference sequence data any query sequence data for which the ratio of its appearance frequency (aggregated) to the appearance frequency (aggregated) of the reference sequence data is less than a threshold.

The integration step may further include a step of generating a sequence data group for integration by selecting in advance, from the sequence data group to be subjected to integration of sequencing errors, the sequence data whose sequences have the same length as the reference sequence.
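The integration step described above can be sketched as follows. This is a minimal illustration, not the implementation of the specification: the names `merge_errors`, `max_subs`, and `ratio_threshold`, the default values, and the choice to process successive reference sequences in descending order of frequency are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_errors(unique_counts, max_subs=1, ratio_threshold=0.05):
    """Integrate low-frequency substitution variants into their reference.

    unique_counts maps a sequence to its appearance frequency (aggregated).
    The result maps each surviving sequence to its appearance frequency
    (integrated).
    """
    remaining = dict(unique_counts)
    merged = {}
    while remaining:
        # The most frequent remaining sequence acts as the reference.
        reference = max(remaining, key=remaining.get)
        f_ref = remaining.pop(reference)
        total = f_ref
        for query in list(remaining):
            if len(query) != len(reference):
                continue  # only same-length sequences are candidates
            subs = hamming(reference, query)
            if 1 <= subs <= max_subs and remaining[query] / f_ref < ratio_threshold:
                total += remaining.pop(query)
        merged[reference] = total
    return merged


counts = {"ATGGCC": 1000, "ATGGCA": 10, "ATGGCT": 400, "ATGG": 5}
integrated = merge_errors(counts)
```

In this example "ATGGCA" (ratio 0.01) is absorbed into "ATGGCC", whereas "ATGGCT" (ratio 0.4) is frequent enough to be kept as a separate sequence, and "ATGG" is left alone because its length differs.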

The case in which integration of sequencing errors is performed on unique sequence data is described below. Within the unique sequence data group, the base sequence of the unique sequence data representing the true sequence (that is, free of sequencing errors) is taken as the "reference sequence". A base sequence of unique sequence data in the unique sequence data group that has the same sequence length as the reference sequence but differs from it in sequence is taken as a "query sequence".

When the base sequence of a query sequence arose from a sequencing error in the base sequence of the reference sequence, the unique sequence data representing the query sequence is integrated into the unique sequence data representing the reference sequence. Whether a query sequence arose from a sequencing error in the reference sequence is determined by calculating the ratio of the probability that the query sequence appears to the probability that the reference sequence appears.

For example, let p_n be the probability that a sequencing error occurs at the n-th base of a base sequence of length k. The probability P_0 that no sequencing error occurs can be expressed by the following formula ([Math. 1]).

The probability P_i that exactly one error occurs, at the i-th base of a base sequence of length k, can be expressed by the following formula ([Math. 2]).

Further, the probability P_ij that two errors occur, at the i-th and j-th bases of a base sequence of length k, can be expressed by the following formula ([Math. 3]).

Similarly, the probability that errors occur at three or more bases between the "reference sequence" and the "query sequence" can also be calculated.
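The formula images for [Math. 1] to [Math. 3] are not reproduced in this text. As a hedged reconstruction from the definitions above, and assuming that errors at different positions occur independently (an assumption not stated explicitly in the text), the three probabilities may be written as:

```latex
\begin{align}
P_0 &= \prod_{n=1}^{k} (1 - p_n) \tag{Math. 1} \\
P_i &= p_i \prod_{\substack{n=1 \\ n \neq i}}^{k} (1 - p_n) \tag{Math. 2} \\
P_{ij} &= p_i \, p_j \prod_{\substack{n=1 \\ n \neq i,\, j}}^{k} (1 - p_n) \tag{Math. 3}
\end{align}
```

Written this way, P_i equals P_0 multiplied by p_i / (1 - p_i), and P_ij equals P_0 multiplied by p_i p_j / ((1 - p_i)(1 - p_j)).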

When the "reference sequence" and the "query sequence" of a base sequence of length k differ at one position, for example at the i-th base, the number of ways a given base can change into another base is 3, because a base sequence is composed of the four bases A, G, C, and T. With the probability that the reference sequence appears denoted by P_0 as above, the probability that the query sequence arises can be expressed by the following formula ([Math. 4]). Although the probability may vary with the type of base substitution, all base substitutions are assumed here to occur with equal probability.

When the "reference sequence" and the "query sequence" differ at two positions, for example at the i-th base and the j-th base, the probability that the query sequence arises can be expressed by the following formula ([Math. 5]).

Similarly, the probability that the query sequence arises can also be calculated when the "reference sequence" and the "query sequence" differ at three or more bases.
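The formula images for [Math. 4] and [Math. 5] are likewise missing. A hedged reconstruction, assuming independent errors and three equally likely substitutions per position as stated above, is given below. The symbol P_query for the probability that a specific query sequence arises is introduced here for illustration and does not appear in the text.

```latex
\begin{align}
P_{\mathrm{query}} &= \frac{1}{3} \cdot \frac{p_i}{1 - p_i} \, P_0 \tag{Math. 4} \\
P_{\mathrm{query}} &= \frac{1}{9} \cdot \frac{p_i}{1 - p_i} \cdot \frac{p_j}{1 - p_j} \, P_0 \tag{Math. 5}
\end{align}
```

The factor 1/3 reflects that a single erroneous position yields one specific substituted base out of three possibilities; for two positions the factor is 1/9.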

Here, letting F_0 be the appearance frequency (aggregated) of the reference sequence and F_total be the sum of the appearance frequencies (aggregated) of the sequences derived from the reference sequence, the probability P_0 that the "reference sequence" appears can be expressed as follows ([Math. 6]).

Further, letting F_query be the appearance frequency (aggregated) of the query sequence, the probability that the "query sequence" arises can be expressed as follows ([Math. 7]).

The following formula ([Math. 8]) is derived from [Math. 6] and [Math. 7].

[Math. 6] to [Math. 8] can be calculated from the appearance frequency (aggregated) of each sequence data in the unique sequence data group.
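The formula images for [Math. 6] to [Math. 8] are not reproduced in this text. A hedged reconstruction follows, assuming that F_total counts the reference sequence together with all sequences derived from it, and writing P_query (a symbol introduced here for illustration) for the probability that the query sequence arises:

```latex
\begin{align}
P_0 &= \frac{F_0}{F_{\mathrm{total}}} \tag{Math. 6} \\
P_{\mathrm{query}} &= \frac{F_{\mathrm{query}}}{F_{\mathrm{total}}} \tag{Math. 7} \\
\frac{P_{\mathrm{query}}}{P_0} &= \frac{F_{\mathrm{query}}}{F_0} \tag{Math. 8}
\end{align}
```

F_total cancels in the ratio, so [Math. 8] depends only on the two observed frequencies.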

Once the value of the probability p_i that a sequencing error occurs at the i-th base is known, the formulas shown in [Math. 4] and [Math. 5] can be rearranged to calculate the ratio "probability that the query sequence arises / probability that the reference sequence arises (P_0)". For example, when the base sequences differ only at the i-th base, the following formula ([Math. 9]) is derived from [Math. 4] and [Math. 6].

When the base sequences differ at two positions, the i-th base and the j-th base, the following formula ([Math. 10]) is derived from [Math. 5] and [Math. 6].

Similarly, when the base sequences of the "reference sequence" and the "query sequence" differ at three or more positions, the ratio "probability that the query sequence arises / probability that the reference sequence arises (P_0)" can also be calculated.
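The formula images for [Math. 9] and [Math. 10] are missing here. Assuming independent errors and three equally likely substitutions per position, dividing the probability of a specific one-position or two-position variant by P_0 gives the following hedged reconstruction (P_query is a symbol introduced for illustration):

```latex
\begin{align}
\frac{P_{\mathrm{query}}}{P_0} &= \frac{p_i}{3\,(1 - p_i)} \tag{Math. 9} \\
\frac{P_{\mathrm{query}}}{P_0} &= \frac{p_i \, p_j}{9\,(1 - p_i)(1 - p_j)} \tag{Math. 10}
\end{align}
```

For instance, with p_i = 0.01 the right-hand side of [Math. 9] is 0.01 / 2.97, or about 0.0034.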

In determining whether the base sequence of a query sequence arose from a sequencing error in the base sequence of the reference sequence, a threshold for the ratio "probability that the query sequence arises / probability that the reference sequence arises (P_0)" (hereinafter also referred to as the "threshold (probability)") can be set from the value calculated for the right-hand side of [Math. 9] or [Math. 10]. The threshold (probability) can be set as appropriate to a value larger than the value calculated for the right-hand side of [Math. 9] or [Math. 10]. For example, if the value of [Math. 8] calculated from the unique sequence data group is equal to or less than the threshold (probability), the query sequence is integrated into the reference sequence. Conversely, if the value of [Math. 8] is larger than the threshold (probability), the query sequence is not integrated into the reference sequence.
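The decision rule above can be sketched as follows. The expected ratio is computed as the product of p / (3(1 - p)) over the differing positions, which assumes independent errors and equally likely substitutions; the function names and the safety margin of 10 used to place the threshold above the expected ratio are assumptions made for this example.

```python
def expected_ratio(error_probs):
    """Expected (query / reference) probability ratio for a variant that
    differs at the positions whose error probabilities are given."""
    ratio = 1.0
    for p in error_probs:
        ratio *= p / (3.0 * (1.0 - p))
    return ratio


def should_merge(f_query, f_reference, threshold):
    """Merge when the observed frequency ratio does not exceed the threshold."""
    return f_query / f_reference <= threshold


# Hypothetical per-base error probability of 1% at the differing position.
expected = expected_ratio([0.01])
threshold = 10 * expected          # threshold set larger than the expected ratio
merge_rare = should_merge(20, 1000, threshold)     # observed ratio 0.02
merge_common = should_merge(100, 1000, threshold)  # observed ratio 0.10
```

With these numbers the rare variant is integrated into the reference sequence and the more frequent variant is kept as a separate sequence.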

A method for deriving the probability p_i that a sequencing error occurs at the i-th base of a base sequence is described below. [Math. 1] to [Math. 10] were explained in terms of the probability p_i that a sequencing error occurs at the i-th base of a base sequence of length k; p_i is a value determined for each experiment.
When the base sequences of the reference sequence and the query sequence differ at only one position, the i-th base, the probability p_i that a sequencing error occurs at the i-th base is expressed by the following formula ([Math. 11]), obtained by rearranging [Math. 2].

Here, letting F_i be the sum of the appearance frequencies (aggregated) of the base sequences that differ only at the i-th base, P_i can be expressed by the following formula ([Math. 12]).

Further, from [Math. 6], [Math. 11], and [Math. 12], p_i can be expressed by the following formula ([Math. 13]).
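The formula images for [Math. 11] to [Math. 13] are not reproduced in this text. A hedged reconstruction follows. It assumes independent errors, so that P_i = P_0 p_i / (1 - p_i), and assumes that F_total counts the reference sequence together with all sequences derived from it, so that P_0 = F_0 / F_total.

```latex
\begin{align}
p_i &= \frac{P_i}{P_0 + P_i} \tag{Math. 11} \\
P_i &= \frac{F_i}{F_{\mathrm{total}}} \tag{Math. 12} \\
p_i &= \frac{F_i}{F_0 + F_i} \tag{Math. 13}
\end{align}
```

As a worked example, F_0 = 990 and F_i = 10 give p_i = 10 / 1000 = 0.01.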

A method for calculating the probability p_i, derived above, that a sequencing error occurs at the i-th base is described below. First, the unique sequence data in the unique sequence data group are sorted in descending order of appearance frequency (aggregated), and the base sequence of the unique sequence data ranked first is taken as the reference sequence. The appearance frequency (aggregated) of the reference sequence is F_0 in [Math. 13]. Next, unique sequence data whose base sequence has the same length as the reference sequence but differs from it only at the i-th base are selected from the unique sequence data group (because there are four types of bases, at most three unique sequence data differing only at the i-th base can be selected), and the appearance frequencies (aggregated) of those unique sequence data are summed. This sum is F_i in [Math. 13]. With the values of F_0 and F_i thus obtained, p_i can be calculated by substituting them into [Math. 13].
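The calculation procedure above can be sketched as follows. The sketch assumes that [Math. 13] has the form p_i = F_i / (F_0 + F_i), which follows from the surrounding definitions but is a reconstruction; the function name `per_position_error` and the example counts are hypothetical.

```python
def per_position_error(unique_counts, reference):
    """Estimate p_i for every position i of the reference sequence.

    F_0 is the frequency of the reference; F_i sums the frequencies of the
    same-length sequences that differ from the reference only at position i.
    """
    f0 = unique_counts[reference]
    k = len(reference)
    f_i = [0] * k
    for sequence, count in unique_counts.items():
        if sequence == reference or len(sequence) != k:
            continue
        diffs = [i for i in range(k) if sequence[i] != reference[i]]
        if len(diffs) == 1:  # single-position variants only
            f_i[diffs[0]] += count
    return [f / (f0 + f) for f in f_i]


counts = {"ACGT": 990, "ACGA": 6, "ACGC": 4, "TCGT": 10, "TCGA": 50, "ACG": 3}
# The most frequent unique sequence acts as the reference.
reference = max(counts, key=counts.get)
p = per_position_error(counts, reference)
```

Here "ACGA" and "ACGC" both contribute to the last position (F_i = 10), "TCGT" contributes to the first position, and "TCGA" is ignored because it differs at two positions.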

By repeating the step of integrating sequencing errors at the i-th base of a base sequence of length k for i = 1 to k, the values of p_i (i = 1 to k) are obtained for the unique sequence data ranked first by appearance frequency (aggregated) in the unique sequence data group. Next, the same calculation is performed with the unique sequence data ranked second by appearance frequency (aggregated) as the reference sequence, giving the values of p_i (i = 1 to k) for the second-ranked unique sequence data. The calculation is likewise performed for the unique sequence data ranked third, fourth, and so on, giving values of p_i for each unique sequence data used as a reference sequence.

The threshold (probability) may be set based on statistics of the many p_i values obtained. For example, a mean or median may be obtained from the p_i values remaining after outliers are removed from the many p_i values obtained, and the threshold (probability) may be calculated from that value or set as appropriate based on it. As used herein, an "outlier" means a value that deviates greatly from the other values. Outliers can be detected based on, without limitation, the standard deviation, the Smirnov-Grubbs test, or the Thompson test. In one example, outliers are detected by the Smirnov-Grubbs test.
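A minimal sketch of deriving a threshold from the collected p_i values is shown below. It uses a simple standard-deviation rule for outlier removal rather than the Smirnov-Grubbs test named in the example; the cutoff of 2 standard deviations, the margin of 10, and the conversion p / (3(1 - p)) from a typical error probability to a frequency-ratio threshold are all assumptions made for illustration.

```python
import statistics


def remove_outliers(values, z=2.0):
    """Drop values further than z sample standard deviations from the mean."""
    if len(values) < 3:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= z * sd]


# Hypothetical p_i estimates; the last one is an obvious outlier.
p_values = [0.009, 0.010, 0.011, 0.010, 0.012, 0.008, 0.010, 0.011, 0.009, 0.2]
kept = remove_outliers(p_values)
typical_p = statistics.median(kept)
# Threshold placed above the expected single-substitution ratio.
threshold = 10 * typical_p / (3 * (1 - typical_p))
```

The outlying value 0.2 is removed, and the median of the remaining nine values is used as the typical error probability.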

(Classification)
It is known that antibodies mature through V(D)J recombination followed by affinity maturation. V(D)J recombination can generate antibodies of various sequence lengths. Affinity maturation can generate various antibodies that have the same sequence length but different amino acid sequences. In tracking the maturation of antibodies in response to additional immune stimulation, classifying the sequence data of the antibody population by sequence length, and also by V gene region and J gene region, may make it possible to efficiently find sequence data representing antigen-specific antibodies.

The step of preparing a sequence data group includes, but is not limited to, a step of classifying the sequence data in the sequence data group based on one or both of the sequence length and the original sequence on the genome sequence of the animal species corresponding to the animal. The classification step generates a classified sequence data group composed of classes, each containing the corresponding sequence data.

The sequence data in the sequence data group are classified, for example, based on a combination of the sequence length and the original sequence on the genome sequence of the animal species. Classification based on sequence length includes, for example, assigning each sequence data in the sequence data group to the class for the corresponding sequence length, based on the sequence length indicated by the sequence data. Classification based on the original sequence includes, for example, assigning each sequence data in the sequence data group to the class for the gene fragment with the highest degree of sequence match (that is, the original sequence), based on the degree of sequence match obtained by a homology search between one or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody represented by the sequence data and the genome sequence of the animal species corresponding to the animal. The degree of sequence match is, for example, sequence identity or a score value relating to sequence identity.
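The classification can be sketched as grouping by a composite key. The sketch assumes that the best-matching V and J gene fragments have already been assigned upstream (for example by a homology search); the record keys `seq`, `v_gene`, and `j_gene` and the gene labels are hypothetical.

```python
from collections import defaultdict


def classify(records):
    """Group sequences by (sequence length, V gene, J gene)."""
    classes = defaultdict(list)
    for record in records:
        key = (len(record["seq"]), record["v_gene"], record["j_gene"])
        classes[key].append(record["seq"])
    return dict(classes)


records = [
    {"seq": "ATGGCC", "v_gene": "V1", "j_gene": "J2"},
    {"seq": "ATGGCA", "v_gene": "V1", "j_gene": "J2"},
    {"seq": "ATGGCCTAA", "v_gene": "V1", "j_gene": "J2"},
    {"seq": "TTGGCC", "v_gene": "V3", "j_gene": "J1"},
]
classes = classify(records)
```

The first two records share a class; the third differs in length and the fourth in gene assignment, so each forms its own class.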

(Exclusion of non-cluster-forming sequence data)
Considering the process of affinity maturation, an antigen-specific antibody that has undergone affinity maturation and exhibits higher affinity is expected, when a molecular phylogenetic analysis is performed, to form a sequence cluster in the molecular phylogenetic tree together with a series of antibodies that have sequences similar to that of the antibody and relatively high affinity. Conversely, for an antibody that has not undergone affinity maturation and exhibits relatively low affinity, few or almost no antibodies have sequences similar to it, and the antibody is expected not to form a sequence cluster in the molecular phylogenetic tree. Therefore, excluding in advance from the sequence data group the sequence data of antibodies presumed not to form sequence clusters (hereinafter referred to as "non-cluster-forming sequence data") may make it possible to find sequence data having the sequence information of antigen-specific antibodies with high efficiency.

The step of preparing a sequence data group includes, but is not limited to, a step of excluding non-cluster-forming sequence data from the sequence data group to generate a cluster-forming sequence data group. The exclusion of non-cluster-forming sequence data includes a step of excluding a specific sequence data (non-cluster-forming sequence data) from the sequence data group when the number of sequence data having a predetermined degree of sequence match to the sequence represented by that specific sequence data is less than a threshold on the number of sequence data.

In one example, the predetermined degree of sequence match is sequence identity. With a sequence identity of 90% and a threshold of 20 on the number of sequence data, a specific sequence data is excluded from the sequence data group when the number of sequence data in the group having 90% sequence identity to the sequence represented by that specific sequence data is less than the threshold (20). After this exclusion step has been repeated, the sequence data group is a collection of sequence data in which, for each sequence, there are 20 or more sequence data having sequences showing 90% sequence identity to it. Such a sequence data group is expected to contain, very efficiently, sequence data representing antigen-specific antibodies that have undergone affinity maturation.

In another example, the predetermined degree of sequence match is the number of sequence mismatches. The number of sequence mismatches may be a value corresponding to 10% of the sequence length indicated by the sequence data. For example, when the sequence length is 400 bp, 40, which is 10% of that length, may be used as the number of sequence mismatches. With the number of sequence mismatches set to 40 and the threshold on the number of sequence data set to 10 (U40<10), a specific sequence data is excluded from the sequence data group when the number of sequence data having up to 40 mismatching bases or amino acids relative to the sequence represented by that specific sequence data is less than the threshold (10). After this exclusion step has been repeated, the sequence data group is expected to contain, very efficiently, sequence data representing antigen-specific antibodies that have undergone affinity maturation.
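The mismatch-based exclusion can be sketched as below. This is an illustration under several assumptions: only sequences of equal length are compared, a sequence is not counted as its own neighbor, and the exclusion is repeated until no further sequence is removed. The function names and the toy parameters (1 mismatch, 1 neighbor) are hypothetical and far smaller than the values in the example above.

```python
def mismatches(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def drop_non_clustering(sequences, max_mismatch, min_neighbors):
    """Repeatedly exclude sequences with too few similar neighbors."""
    kept = list(sequences)
    while True:
        survivors = []
        for i, seq in enumerate(kept):
            neighbors = sum(
                1
                for j, other in enumerate(kept)
                if j != i
                and len(other) == len(seq)
                and mismatches(seq, other) <= max_mismatch
            )
            if neighbors >= min_neighbors:
                survivors.append(seq)
        if len(survivors) == len(kept):
            return survivors  # nothing was removed in this pass
        kept = survivors


sequences = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC", "AAAAAAATTT"]
clustered = drop_non_clustering(sequences, max_mismatch=1, min_neighbors=1)
```

The four A/T-rich sequences form a chain of one-mismatch neighbors and are kept; "CCCCCCCCCC" has no neighbor within one mismatch and is excluded.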

The threshold on the number of sequence data may be, without limitation, a constant value, or may vary according to the appearance frequency (aggregated) or appearance frequency (integrated) of the specific sequence data. In one example, the threshold on the number of sequence data may be 10. In another example, the threshold may be 10% of the appearance frequency (aggregated) or appearance frequency (integrated). In this latter example, when the appearance frequency (aggregated) is 100, the threshold on the number of sequence data is 10.

In Embodiment 1, the step of preparing a sequence data group includes, in this order, the cleanup step, the step of aggregating into unique sequence data, the step of integrating sequencing errors, the classification step, and the step of excluding non-cluster-forming sequence data; however, the number and order of the steps included in the preparation step according to the first aspect are not limited to this. The step of preparing a sequence data group may include, as appropriate, at least one, at least two, or at least three steps selected from the group consisting of the steps described above. The steps described above can be included in any order without particular limitation, provided that the step of aggregating into unique sequence data is performed before the step of integrating sequencing errors. The step of aggregating into unique sequence data is preferably performed before the step of excluding non-cluster-forming sequence data. When a sequence data group on which the steps described above have already been performed is obtained, the step of preparing a sequence data group may consist only of a step of obtaining that sequence data group.

<Creation of a molecular phylogenetic tree>
The identification method according to Embodiment 1 includes a step of creating a molecular phylogenetic tree from the sequence data group. The molecular phylogenetic tree can be created, without limitation, by using known software (the ape package of the statistical analysis software "R") to calculate a distance matrix from the sequences represented by the sequence data in the sequence data group and applying the neighbor-joining method to the calculated distance matrix.
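As an illustration of the first half of this step, the sketch below computes a pairwise distance matrix in Python. It uses the proportion of differing positions (p-distance) between equal-length sequences, which is an assumed choice of distance; the specification does not state which distance is used. The neighbor-joining step itself is not reproduced here and would be delegated to existing phylogenetics software such as the R package named above.

```python
def p_distance(a, b):
    """Proportion of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise p-distances with a zero diagonal."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(sequences[i], sequences[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


sequences = ["ACGT", "ACGA", "TCGA"]
dist = distance_matrix(sequences)
```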

<Determination of whether a sequence cluster contains sequence data representing an antigen-specific antibody>
The identification method according to Embodiment 1 includes a step of determining, based on a change in the degree of sequence match between the antibody sequence represented by sequence data in a sequence cluster in the molecular phylogenetic tree and the original sequence of the antibody, or on a change in the type of antibody, whether the sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production. This determination step identifies one or more sequence clusters containing sequence data representing antigen-specific antibodies that are candidates for antibody production.

The original sequence is the sequence in the genome sequence of the animal species corresponding to the animal from which the antibody was obtained that has the highest degree of sequence match to the antibody sequence represented by the sequence data. The original sequence is, without limitation, the gene fragment for which the sequence identity (%) obtained by a BLAST search, as the degree of sequence match between each of the V gene fragment and the J gene fragment constituting the variable region of the antibody and the genome sequence of that animal species, is highest.

The change in the degree of sequence match in the determination step is, for example, the change in the degree of sequence match per unit time (the "slope of the degree of sequence match"); when this slope is less than a threshold for the slope, the sequence cluster is determined to contain sequence data representing an antigen-specific antibody. The change in the type of antibody in the determination step is, for example, antibody class switching; when the number of antibody class switches within a given period exceeds a predetermined number, the sequence cluster is determined to contain sequence data representing an antigen-specific antibody.
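The slope criterion can be sketched as follows. The sketch assumes that one representative degree of sequence match (for example, percent identity to the original sequence) is available for the cluster at each sampling time, and estimates the slope by ordinary least squares; the function names, the sampling days, and the threshold of -0.05 percentage points per day are hypothetical.

```python
def match_slope(times, matches):
    """Least-squares slope of the degree of sequence match against time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_m = sum(matches) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, matches))
    return sxy / sxx


def cluster_is_candidate(times, matches, slope_threshold):
    """A cluster is a candidate when its slope is below the threshold."""
    return match_slope(times, matches) < slope_threshold


days = [0, 14, 28, 42]
identity = [98.0, 96.5, 95.0, 93.5]  # percent identity to the original sequence
slope = match_slope(days, identity)
is_candidate = cluster_is_candidate(days, identity, slope_threshold=-0.05)
```

A steadily falling identity to the original sequence, as in this example, is consistent with accumulating mutations during affinity maturation, so the cluster passes the criterion.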

The determination step may include, for example, determining that the sequence cluster does not contain an antigen-specific antibody when the degree of sequence match, before antigen administration, between the antibody sequence represented by sequence data in the sequence cluster in the molecular phylogenetic tree and the original sequence of the antibody is less than a threshold for the degree of sequence match before antigen administration.

The determination step may further include a step of extracting sequence clusters from the molecular phylogenetic tree. In one example, when the length of a branch connecting two branch points in the molecular phylogenetic tree is equal to or greater than a predetermined value, the region extending outward, away from the center of the molecular phylogenetic tree, from a predetermined position between those branch points may be extracted as a sequence cluster.

In Embodiment 1, the molecular phylogenetic tree is created for the sequence data group prepared in the step of preparing a sequence data group, but the invention according to the first aspect is not limited to this. A molecular phylogenetic tree may be created again for the collection of sequence data contained in a sequence cluster determined to contain sequence data representing an antigen-specific antibody that is a candidate for antibody production. A further step may then be performed of determining whether a sequence cluster in the re-created molecular phylogenetic tree contains sequence data representing an antigen-specific antibody that is a candidate for antibody production.

[Embodiment 2]
A second aspect of the present invention provides a determination system for determining whether a sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production. One embodiment of the second aspect of the present invention (hereinafter referred to as "Embodiment 2") is described below.

The determination system includes a control unit and a storage unit. The storage unit stores a database containing a sequence data group that includes sequence data of antibodies obtained from an animal to which an antigen has been administered. The control unit includes a molecular phylogenetic tree creation unit that creates a molecular phylogenetic tree from the sequence data group, and a sequence data determination unit that determines, based on a change in the degree of sequence match between the antibody sequence represented by sequence data in a sequence cluster in the molecular phylogenetic tree and the original sequence of the antibody, or on a change in the type of antibody, whether the sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production.

<Control unit and storage unit>
The determination system according to Embodiment 2 includes a control unit and a storage unit.
FIG. 7 is a block diagram showing a schematic configuration of the determination system 1 according to Embodiment 2. The determination system 1 includes a control unit 11 that controls its overall operation and a storage unit 13 that stores a database 107. An input device 15 through which a user performs input, a display device 17 that performs screen display, and a measuring device 20 are connected to the determination system 1 via interfaces.

The control unit 11 consists of a processing circuit corresponding to a processor such as a CPU (central processing unit), and a memory (main storage device). The processor of the control unit 11 executes computer programs loaded into the memory. By executing predetermined computer programs, the control unit 11 implements functions such as the sequence data group preparation unit 101, the molecular phylogenetic tree creation unit 103, and the sequence data determination unit 105, which are described later. The control unit 11 may realize a given function in cooperation with software, or by hardware alone.

The storage unit 13 is an auxiliary storage device such as a hard disk drive (HDD) or a solid state drive (SSD). The storage unit 13 stores, for example, computer programs, which are loaded into the memory and executed by the processor. The computer programs include an operating system and application programs.

The application programs include the following, each described later: a sequence data group preparation program that makes the sequence data group preparation unit function; a calculation program that calculates various parameters including sequence identity; a molecular phylogenetic tree creation program that makes the molecular phylogenetic tree creation unit function; a sequence cluster extraction program that extracts sequence clusters from a molecular phylogenetic tree; a sequence data determination program that makes the sequence data determination unit function; a threshold acquisition program that acquires thresholds; and a display program that displays determination results and the like.

In Embodiment 2, the database 107 is stored in the storage unit 13.
FIG. 8 is a block diagram showing a schematic configuration of the database 107 according to Embodiment 2. The database 107 stores a sequence data group 110, threshold data 120, function data 130, and animal genome sequence data 140.

The sequence data group 110 includes the sequence data group obtained after sequencing. Each item of sequence data in this group includes information on the base sequence encoding the variable region of an antibody, information on the sequence length, and a quality score (Q score). The sequence data may further include information on the antibody type (IgG2 or IgG3), the time of blood collection, and the animal from which the blood was collected. The sequence data group 110 may further include the cleanup sequence data group, the unique sequence data group, the integrated unique sequence data group, the classified sequence data group, and the sequence cluster-forming sequence data group, which are described later.
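As a rough illustration, one item of sequence data as described above might be modeled as follows. This is only a sketch: the class name `SequenceRecord` and every field name are hypothetical, and storing the quality score as one value per base is an assumption rather than something the description states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequenceRecord:
    """One item of sequence data (all names here are illustrative)."""
    bases: str                           # base sequence encoding the variable region
    q_scores: List[int]                  # quality scores, assumed to be per base
    antibody_type: Optional[str] = None  # e.g. "IgG2" or "IgG3" (optional)
    weeks: Optional[int] = None          # time of blood collection (optional)
    animal_id: Optional[str] = None      # animal the blood came from (optional)

    @property
    def length(self) -> int:
        """Sequence length, derived from the base sequence."""
        return len(self.bases)


record = SequenceRecord(bases="ATGGCC", q_scores=[30] * 6,
                        antibody_type="IgG2", weeks=3)
print(record.length)
```

Deriving the length from the base sequence keeps the two from drifting apart; a stored length field would work equally well.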

The threshold data 120 includes a threshold (hereinafter "threshold (identity)") for the change in sequence identity of the antibody sequences shown in the sequence data of a sequence cluster in the molecular phylogenetic tree described later. The threshold (identity) includes, for example, a predetermined value for the slope of sequence identity per change in the time at which the sequence data of each antibody was acquired. The threshold data may further include a threshold for the sequencing quality score, a threshold (hereinafter "threshold (probability)") for the ratio "probability that the query sequence occurs / probability that the reference sequence occurs (P_0)" used when integrating sequencing errors, and a threshold for the number of sequence data items used when excluding sequence data that does not form a sequence cluster.
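The slope of sequence identity over acquisition time mentioned above could be estimated, for instance, by ordinary least squares. The sketch below assumes that choice of regression; the function name `identity_slope` and the sample values are hypothetical. How the slope is compared against the threshold (identity) is not specified in this passage, so that step is not shown.

```python
from typing import Sequence


def identity_slope(weeks: Sequence[float], identity: Sequence[float]) -> float:
    """Least-squares slope of sequence identity against collection time."""
    n = len(weeks)
    if n < 2 or n != len(identity):
        raise ValueError("need two or more paired observations")
    mean_t = sum(weeks) / n
    mean_y = sum(identity) / n
    sxx = sum((t - mean_t) ** 2 for t in weeks)
    if sxx == 0:
        raise ValueError("all time points are identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(weeks, identity))
    return sxy / sxx


# Hypothetical cluster: identity to the original sequence (%) at weeks 0, 2, 4.
slope = identity_slope([0, 2, 4], [100.0, 98.0, 96.0])
print(slope)
```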

The function data 130 includes score function data for scoring sequence identity in a homology search of base sequences or amino acid sequences. A score function that scores matches and mismatches between base sequences is expressed, for example, by the following formula:

\[ \mathrm{score}(A, B) = \begin{cases} 1 & (\alpha:\ A = B) \\ -3 & (\beta:\ A \neq B) \end{cases} \]

[In the formula, A represents information on the genome base sequence of the animal, and B represents information on the base sequence of the sequence data.] In this example, 1 point is given when the base of the genome sequence (A) matches the corresponding base of the sequence data (B) (α: A = B), and -3 is given when they do not match (β: A ≠ B). This score function is shown in the table below:

Case | Condition | Score
α | A = B | 1
β | A ≠ B | -3
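The match/mismatch scoring in this example (1 for a match, -3 for a mismatch) can be sketched as below. The function names are hypothetical, and summing position by position over two equal-length, gap-free sequences is a simplifying assumption; an actual homology search would also handle alignment and gaps.

```python
MATCH_SCORE = 1      # case α: genome base equals sequence-data base
MISMATCH_SCORE = -3  # case β: genome base differs from sequence-data base


def base_score(a: str, b: str) -> int:
    """Score a single pair of bases."""
    return MATCH_SCORE if a == b else MISMATCH_SCORE


def alignment_score(genome: str, read: str) -> int:
    """Sum of per-position scores for two equal-length, gap-free sequences."""
    if len(genome) != len(read):
        raise ValueError("sequences must have the same length")
    return sum(base_score(a, b) for a, b in zip(genome, read))


# Five matches and one mismatch: 5 * 1 + 1 * (-3) = 2
print(alignment_score("ACGTAC", "ACGTTC"))
```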

A score function that scores matches and mismatches between amino acid sequences is, for example, a scoring matrix such as a PAM matrix (e.g., PAM250) or a BLOSUM matrix (e.g., BLOSUM50). The function data 130 may further include a probability density function for the Poisson distribution and functions used for regression analysis.

The animal genome sequence data 140 includes genome sequence data for animal species including alpaca. The alpaca genome sequence data includes sequences encoding the antibody V genes and J genes. The animal genome sequence data 140 may further include human genome sequence data.

When the database 107 is stored on a storage medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a flash memory, the storage unit 13 may consist of that storage medium and a drive device that reads information from and writes information to it.

The input device 15 is a device through which the user inputs necessary information (designation information specifying the address at which a sequence data group is stored) and instruction information to the control unit 11. The input device 15 outputs the input information to the control unit 11 via an interface. The input device 15 may be, for example, a keyboard, a mouse, or a voice recognition device. The types of input information are not limited to those above; the input information may include, for example, information on the animal from which blood was collected.

The instruction information includes information that specifies processing steps. For the processing in the sequence data group preparation unit described later, for example, it specifies which of the following are to be executed on the sequence data group after sequencing, and in what order: cleanup, aggregation into unique sequences, integration of sequencing errors, classification, and exclusion of sequence data that does not form a sequence cluster.

In accordance with instructions input from the input device 15, the control unit 11 has: a function of preparing a sequence library, which includes acquiring the sequence data group after sequencing from the sequence data group 110 stored in the database 107 and generating a collection of sequence data that may form sequence clusters in the molecular phylogenetic tree described later (hereinafter "sequence cluster-forming sequence data group"); a function of creating a molecular phylogenetic tree from that sequence data group; a function of determining whether a sequence cluster in the molecular phylogenetic tree includes an antigen-specific antibody; and a display function of displaying determination results and the like.

These functions of the control unit 11 are realized when the corresponding application programs, loaded into the memory of the control unit 11, are executed by the processing circuit that includes the processor of the control unit 11.

The display device 17 is a device that allows the user to perceive the determination results and other output of the control unit 11, and may be, for example, a display or a printer.

The measuring device 20 is a device for measuring gene sequences and may be, for example, a next-generation DNA sequencer. The measuring device 20 outputs measurement data to the control unit 11 via an interface.

In FIG. 7, the solid line connecting the measuring device 20 and the control unit 11 represents a transceiver, including an interface, that enables wired or wireless transmission and reception of data and signals between these elements. The same applies to the solid lines connecting the other elements in FIG. 7.

In Embodiment 2, the database 107 is stored in the storage unit 13 inside the determination system 1, but the location of the database is not limited to this. The database 107 may be stored, for example, in a storage device outside the determination system 1. Such a storage device may consist of all or part of a storage medium such as an optical disk, or may be provided in a server connected via a network to the determination system according to the second aspect.

In Embodiment 2, the determination system 1 is connected to an external measuring device 20, but the determination system according to the present invention is not limited to this. For example, the measuring device 20 may be located inside the determination system 1. Measurement data obtained with the measuring device 20 may be read by the control unit 11 and stored, for example, in the database 107. The determination system 1 may also be configured without a connection to the measuring device 20, with the measurement data being read into the control unit 11.

The operation of the determination system according to Embodiment 2 is described with reference to FIGS. 9 to 11.
Using the input device 15 (FIG. 7), the user inputs instructions for the control unit 11 and necessary information (designation information specifying the addresses at which the various data are stored, and information on the animal from which blood was collected).

FIG. 9 is a flowchart showing how the determination system according to the present invention determines whether a sequence cluster in the molecular phylogenetic tree includes sequence data indicating an antigen-specific antibody.

<Sequence data group preparation unit>
The determination system according to Embodiment 2 includes a sequence data group preparation unit that generates a predetermined sequence data group.
On receiving instructions and necessary information from the user via the input device 15, the processor of the control unit 11 prepares the sequence data group (S11). In sequence data group preparation S11, the processor loads the sequence data group preparation program stored in the storage unit 13 into the memory and executes it. The sequence data group preparation unit 101 is thereby realized in the control unit 11 (FIG. 7).

FIG. 10 is a flowchart showing the preparation of the sequence data group in sequence library preparation S11. The sequence data group preparation unit 101 refers to the designation information input by the user and executes processing such as acquisition of the sequence data group.

(Acquisition of the sequence data group)
The sequence data group preparation unit 101 refers to the designation information input by the user and loads into the memory the sequence data group after sequencing that is stored in the sequence data group 110 in the database 107 (S31). The sequence data making up this group includes information on the base sequence encoding the variable region of the antibody, information on the sequence length, information on the time of blood collection (weeks), and a quality score (Q score).

The sequence data group preparation unit 101 refers to the instruction information input by the user, executes the instructed processing, and generates the corresponding sequence data group. In Embodiment 2, the instruction information includes an instruction to execute the following on the sequence data group after sequencing, in this order: generation of the cleanup sequence data group by cleanup; generation of the unique sequence data group by aggregation into unique sequences; generation of the integrated unique sequence data group by integration of sequencing errors; generation of the classified sequence data group by classification; and generation of the sequence cluster-forming sequence data group by exclusion of sequence data that does not form a sequence cluster. It also includes an instruction to store the sequence data group generated by each process in the database 107.

(Cleanup)
Once the sequence data group after sequencing has been acquired, the sequence data group preparation unit 101 excludes sequence data that does not represent an amino acid sequence (cleanup) (S32). This generates the cleanup sequence data group, which is stored in the sequence data group 110 in the database 107.

In cleanup S32, the sequence data group preparation unit 101 acquires the sequence length of the base sequence encoding the antibody variable region shown in each item of sequence data in the sequence data group after sequencing. It assigns a selection tag when the sequence length is a multiple of 3 (3k) and an exclusion tag when it is not (3k+1 or 3k+2). The sequence data group preparation unit 101 performs this selection/exclusion tagging on all sequence data in the sequence data group after sequencing, and ends the tagging once every item has been tagged.

After tagging is complete, the sequence data group preparation unit 101 selects the sequence data bearing the selection tag from the sequence data group after sequencing and generates a cleanup sequence data group made up of the selected sequence data. The generated cleanup sequence data group is stored in the sequence data group 110 in the database 107.

When the length of the base sequence encoding the variable region is a multiple of 3, the sequence represents an amino acid sequence. The sequence data in the cleanup sequence data group may further include information on the amino acid sequence.
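The length-based tagging and selection described above can be sketched in a few lines. The function names and the tag labels `"select"` and `"exclude"` are hypothetical stand-ins for the selection and exclusion tags.

```python
from typing import Dict, List


def tag_by_length(sequences: List[str]) -> Dict[str, List[str]]:
    """Tag each base sequence by whether its length is a multiple of 3."""
    tagged: Dict[str, List[str]] = {"select": [], "exclude": []}
    for seq in sequences:
        key = "select" if len(seq) % 3 == 0 else "exclude"
        tagged[key].append(seq)
    return tagged


def cleanup(sequences: List[str]) -> List[str]:
    """Keep only the sequences that carry the selection tag."""
    return tag_by_length(sequences)["select"]


# Lengths 6, 4, 5, 3: only the first and last are multiples of 3.
print(cleanup(["ATGGCC", "ATGG", "ATGGC", "ATG"]))
```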

Example 3 described cleanup as the removal of sequence data that does not represent an amino acid sequence, but cleanup is not limited to this. Cleanup may be based, for example, on the quality score included in the sequence data, or may combine removal of sequence data that does not represent an amino acid sequence with removal based on the quality score.

In Embodiment 2, cleanup is executed by selecting the sequence data bearing the selection tag, but cleanup is not limited to this. It may be executed, for example, by excluding the sequence data bearing the exclusion tag from the sequence data group after sequencing. In Example 3, cleanup was performed on base sequences, but the type of sequence is not limited to this; cleanup can also be performed on amino acid sequences.

(Aggregation into unique sequence data)
Once the cleanup sequence data group has been generated, the sequence data group preparation unit 101 aggregates multiple items of sequence data showing the same sequence into a single item of sequence data ("unique sequence data") (S33). This generates the unique sequence data group, which is stored in the sequence data group 110 in the database 107.

In the aggregation into unique sequence data, the sequence data group preparation unit 101 assigns an aggregation tag (α) to one item of sequence data in the cleanup sequence data group and acquires information on the sequence it shows. It then assigns the same aggregation tag (α) to every other item of sequence data showing the same sequence. Tagging with the aggregation tag (α) ends once all sequence data in the cleanup sequence data group showing that sequence has been tagged.

The sequence data group preparation unit 101 then assigns a different aggregation tag (β) to one item of sequence data showing a sequence different from that of the data tagged (α). As above, tagging with (β) ends once all sequence data in the cleanup sequence data group showing that sequence has been tagged. The unit goes on to assign further aggregation tags (γ, ...) in turn to sequence data showing sequences not yet tagged. Assignment of aggregation tags ends when the number of tagged items equals the number of sequence data items in the cleanup sequence data group.

Next, the sequence data group preparation unit 101 generates unique sequence data by aggregating the items of sequence data bearing the same aggregation tag into a single item (for example, any one of the tagged items, or a newly generated item). In addition to the information contained in the sequence data, unique sequence data includes information on the frequency of occurrence (aggregated).
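The net effect of the aggregation tagging above is that identical sequences collapse into one entry that carries its frequency of occurrence. A minimal sketch using a counter instead of explicit tags follows; the function name `collapse_to_unique` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def collapse_to_unique(sequences: List[str]) -> List[Tuple[str, int]]:
    """Collapse identical sequences into (sequence, frequency) pairs,
    ordered from the most frequent to the least frequent."""
    return Counter(sequences).most_common()


reads = ["AAA", "CCC", "AAA", "GGG", "AAA", "CCC"]
print(collapse_to_unique(reads))
```

Ordering by frequency is a convenience for the next step, which starts from the unique sequence with the highest frequency of occurrence.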

In Example 3, aggregation into unique sequence data was performed on base sequences, but the type of sequence is not limited to this; the aggregation can also be performed on amino acid sequences.

(Integration of sequencing errors)
Once the unique sequence data group has been generated, the sequence data group preparation unit 101 executes processing to integrate sequencing errors (S34). This generates the integrated unique sequence data group, which is stored in the sequence data group 110 in the database 107.

FIG. 11 is a flowchart showing the steps for integrating sequencing errors.
(a) Generation of the sequence data group for integration
The sequence data group preparation unit 101 acquires the frequency of occurrence (aggregated) from all unique sequence data in the unique sequence data group. When the highest frequency of occurrence (aggregated) among them is 2 or more (S41; YES), the unit generates a sequence data group for integration (S42). The sequence data used to generate the sequence data group for integration is excluded from the unique sequence data group.

In generation S42 of the sequence data group for integration, the sequence data group preparation unit 101 selects the unique sequence data with the highest frequency of occurrence (aggregated) (hereinafter "reference sequence data α").
The sequence data group preparation unit 101 loads the calculation program stored in the storage unit 13 into the memory, thereby realizing a function of generating and calculating various parameters including sequence identity (hereinafter "generation & calculation function"). With this function, the unit compares the base sequence of the reference sequence data α (hereinafter also "reference sequence α") with the base sequence of each item of unique sequence data of the same sequence length (hereinafter also "query sequence"). It generates information on the positions of the mismatched bases, the types of base substitution, and the number of base substitutions, and attaches this information to the sequence data compared with the reference sequence data α (also "query sequence data"). The unit performs this generation and attachment for all sequence data in the unique sequence data group.
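The comparison of a query sequence against the reference sequence α can be sketched as follows. The function name is hypothetical, and returning `None` for sequences of a different length is an assumption made here because the comparison is described only for sequences of the same length.

```python
from typing import List, Optional, Tuple

Mismatch = Tuple[int, str, str]  # (1-based position, reference base, query base)


def compare_to_reference(reference: str, query: str) -> Optional[List[Mismatch]]:
    """List every substituted base, or return None if the lengths differ."""
    if len(reference) != len(query):
        return None
    return [
        (pos, ref_base, qry_base)
        for pos, (ref_base, qry_base) in enumerate(zip(reference, query), start=1)
        if ref_base != qry_base
    ]


mismatches = compare_to_reference("ACGT", "AGGA")
print(mismatches)       # positions and substitution types
print(len(mismatches))  # number of base substitutions
```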

Next, the sequence data group preparation unit 101 selects sequence data whose number of base substitutions relative to the reference sequence α could have arisen from sequencing errors. It acquires the frequency of occurrence (aggregated) of the reference sequence data α and the probability density function for the Poisson distribution stored in the function data 130 in the database 107, and from these calculates the smallest number of base substitutions (m) at which the frequency of occurrence (aggregated) becomes 1 or less. For example, when the expected number of base substitutions is 0.3, the probabilities of 0, 1, 2, 3, and 4 base substitutions are 0.74, 0.22, 0.03, 0.003, and 0.0003, respectively. If the frequency of occurrence (aggregated) of the reference sequence data α is 100, the frequencies of occurrence of sequences with 0, 1, 2, 3, and 4 base substitutions are then 100, 30, 4.5, 0.5, and 0.03, respectively, and m is set to 3. Likewise, if the frequency of occurrence (aggregated) of the reference sequence data α is 1000, m is set to 4. The expected number of base substitutions can be calculated from the formula, described above, for the probability p_i that a sequencing error occurs at the i-th base.
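The worked example above can be reproduced with a short calculation. The sketch below rests on one reading of the example, namely that the expected frequency for k substitutions is the reference frequency scaled by the Poisson probability of k relative to that of 0 (this is what makes the 0-substitution frequency equal 100). The function names and the `max_k` guard are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with expected value lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def min_substitutions(reference_count: int, lam: float, max_k: int = 50) -> int:
    """Smallest k whose expected frequency, scaled to the reference
    frequency at k = 0, is 1 or less."""
    p0 = poisson_pmf(0, lam)
    for k in range(max_k + 1):
        expected = reference_count * poisson_pmf(k, lam) / p0
        if expected <= 1:
            return k
    raise ValueError("no suitable k found up to max_k")


print(min_substitutions(100, 0.3))   # 3
print(min_substitutions(1000, 0.3))  # 4
```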

Once the number of base substitutions (m) has been set, the sequence data group preparation unit 101, using the generation & calculation function, refers to the number of base substitutions attached to each item of unique sequence data and selects the sequence data having up to m base substitutions (number of base substitutions n = 0, 1, ..., m) to generate the sequence data group for integration. The unique sequence data selected for this group is excluded from the unique sequence data group.
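Selecting the sequences with at most m substitutions and removing them from the unique sequence data group amounts to a partition of that group. A minimal sketch follows, assuming the unique sequence data is held as a mapping from sequence to frequency; the function names and sample data are hypothetical.

```python
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of substituted positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def split_for_integration(
    unique: Dict[str, int], reference: str, m: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (group for integration, remaining unique sequence data)."""
    group: Dict[str, int] = {}
    remaining: Dict[str, int] = {}
    for seq, count in unique.items():
        same_length = len(seq) == len(reference)
        if same_length and hamming(seq, reference) <= m:
            group[seq] = count      # n = 0, 1, ..., m substitutions
        else:
            remaining[seq] = count  # stays in the unique sequence data group
    return group, remaining


unique = {"ACGT": 100, "ACGA": 3, "TTTT": 5, "ACG": 2}
group, remaining = split_for_integration(unique, reference="ACGT", m=1)
print(group)
print(remaining)
```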

(b) Generation of integrated unique sequence data
Once the sequence data group for integration has been generated, the sequence data group preparation unit 101 selects from it the sequence data to be integrated (hereinafter "sequence data for integration") (S43).
FIG. 12 is a flowchart showing selection S43 of the sequence data for integration. Using the generation & calculation function, the sequence data group preparation unit 101 refers to the information attached to the unique sequence data (the positions of the mismatched bases, the types of base substitution, and the number of base substitutions) and calculates the ratio "probability that the query sequence occurs / probability that the reference sequence occurs (P_0)" detailed in Embodiment 1 (S51). The calculated ratio may be attached to the unique sequence data.

Once the ratio is calculated, the sequence data group preparation unit 101 loads the threshold acquisition program stored in the storage unit 13 into memory and realizes a threshold acquisition function. The sequence data group preparation unit 101, having realized the threshold acquisition function, acquires the threshold for the ratio "probability that the query sequence occurs / probability that the reference sequence occurs (P_0)" stored in the threshold data 120 (hereinafter referred to as "threshold (probability)") (S52).

Once the threshold (probability) is acquired, the sequence data group preparation unit 101 loads the sequence data determination program stored in the storage unit into memory and realizes a sequence data determination function. When the ratio attached to unique sequence data is less than the acquired threshold (probability) (S53; YES), the sequence data group preparation unit 101, having realized the sequence data determination function, attaches to that unique sequence data an integration tag indicating that it is unique sequence data to be integrated (S54).

When the ratio attached to unique sequence data is equal to or greater than the acquired threshold (probability) (S53; NO), the sequence data group preparation unit 101, having realized the sequence data determination function, attaches to that unique sequence data a non-integration tag indicating that it is not unique sequence data to be integrated (S55). When the sequence data group preparation unit 101 has finished tagging all the unique sequence data in the sequence data group for integration, it ends the assignment of integration/non-integration tags.
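The tagging in S53 to S55 amounts to one comparison per unique sequence data item. The sketch below is only an illustration under stated assumptions: each record is a dictionary whose `ratio` field already holds the value calculated in S51, and the field names and tag strings are hypothetical.

```python
def tag_for_integration(unique_records: list, threshold: float) -> list:
    """Attach an integration or non-integration tag to each record (S53 to S55).

    Assumption: record["ratio"] is the S51 ratio
    "probability of query sequence / probability of reference sequence (P_0)".
    """
    tagged = []
    for record in unique_records:
        # S53: a ratio below the threshold (probability) means "integrate".
        tag = "integrate" if record["ratio"] < threshold else "do_not_integrate"
        tagged.append({**record, "tag": tag})
    return tagged
```

A ratio exactly equal to the threshold falls on the non-integration side, matching the "equal to or greater than" wording of S53; NO.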

When the assignment of integration/non-integration tags is finished, the sequence data group preparation unit 101 returns to the flowchart of FIG. 11 showing the integration of sequencing errors, refers to the integration tags attached to the unique sequence data in the sequence data group for integration, and integrates the unique sequence data bearing an integration tag into the reference sequence data α (S44). Integrated unique sequence data for the reference sequence α is thereby generated, and the generated sequence data is stored in the database 107. In addition to the information contained in the reference sequence data α, the integrated unique sequence data for the reference sequence α further contains information on the sum of the appearance frequencies (aggregated) contained in the integrated unique sequence data (hereinafter referred to as "appearance frequency (integrated)").

When the generation of the integrated unique sequence data is finished, the sequence data group preparation unit 101 refers to the non-integration tags attached to the unique sequence data in the sequence data group for integration and returns the unique sequence data bearing a non-integration tag (that is, the unique sequence data not used to generate the integrated unique sequence data in S44) to the unique sequence data group (S45).

The sequence data group preparation unit 101 acquires information on the appearance frequency (aggregated) from all the unique sequence data in the unique sequence data group from which the unique sequence data used to generate the integrated unique sequence data for the reference sequence α have been excluded. If the largest appearance frequency (aggregated) among them is 2 or more (S41; YES), it repeats S42 to S45.

In the selection of sequence data for integration in Embodiment 2, S52 of acquiring the threshold (probability) was executed after S51 of calculating the ratio, but the order of the steps is not limited to this. S51 and S52 may be executed in either order, or simultaneously, as long as both are executed before S53, which compares the calculated ratio with the acquired threshold (probability).

(c) Generation of integrated unique sequence data group
The sequence data group preparation unit 101 acquires information on the appearance frequency (aggregated) from all the unique sequence data in the remaining unique sequence data group (that is, the unique sequence data not used to generate integrated unique sequence data). If the largest appearance frequency (aggregated) among them is less than 2 (S41; NO), it loads into memory the integrated unique sequence data stored in the database and all the unique sequence data remaining in the unique sequence data group, and generates the integrated unique sequence data group (S46). The integrated unique sequence data group is thereby generated, and the generated integrated unique sequence data group is stored in the sequence data group 110 in the database 107.
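The loop formed by S41 to S46 can be summarized in code. This is a hedged sketch rather than the system's actual program. It assumes that the reference sequence of each round is the remaining unique sequence with the largest appearance frequency (aggregated), that base substitutions are counted as the Hamming distance between equal-length sequences, and that the appearance frequency (integrated) includes the reference's own count. The callables `max_subs_for` (returning m for a reference count) and `should_integrate` (standing in for S51 to S53) are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def integrate_sequencing_errors(
    counts: Dict[str, int],
    max_subs_for: Callable[[int], int],
    should_integrate: Callable[[str, str], bool],
) -> List[Tuple[str, int]]:
    remaining = dict(counts)  # unique sequence data group
    integrated: List[Tuple[str, int]] = []
    # S41: continue while the largest appearance frequency is 2 or more.
    while remaining and max(remaining.values()) >= 2:
        ref = max(remaining, key=remaining.get)  # assumed choice of reference
        ref_count = remaining.pop(ref)
        m = max_subs_for(ref_count)
        # S42: build the sequence data group for integration (up to m substitutions).
        group = [s for s in remaining
                 if len(s) == len(ref) and hamming(ref, s) <= m]
        total = ref_count
        for seq in group:
            count = remaining.pop(seq)
            if should_integrate(ref, seq):  # S43: integration tag
                total += count              # S44: merge into the reference
            else:
                remaining[seq] = count      # S45: return to the group
        integrated.append((ref, total))
    # S46: integrated data plus whatever unique data remain.
    return integrated + list(remaining.items())
```

Because each round removes its reference sequence from the remaining group, the loop always terminates, and the total appearance frequency is preserved across the integration.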

In the integration of sequencing errors in Embodiment 2, the generation S46 of the integrated unique sequence data group was executed after the largest appearance count (aggregated) in the unique sequence data group was determined to be less than 2 (S41; NO), but the order of the steps is not limited to this. For example, S46 may be executed simultaneously with S44 of generating integrated unique sequence data, after the largest appearance count (aggregated) in the unique sequence data group has been determined to be 2 or more (S41; YES).

In Embodiment 2, S45, which returns the unique sequence data not used to generate integrated unique sequence data to the unique sequence data group, was executed after the generation S44 of integrated unique sequence data, but the order of the steps is not limited to this. S45 need only be executed after the selection S43 of sequence data for integration and before S41, which determines whether to repeat S42 to S45. For example, S45 may be executed before the generation S44 of integrated unique sequence data, or simultaneously with S44.

In the integration of sequencing errors in Embodiment 2, the generation S44 of integrated unique sequence data was executed after the selection S43 of sequence data for integration, but the order of the steps is not limited to this. S43 may be executed, for example, after the largest appearance count (aggregated) in the unique sequence data group has been determined to be less than 2 (S41; NO). In this case, the integration tag attached to unique sequence data carries information on the type of reference sequence with which it was compared (for example, reference sequence data α), and in the integration of unique sequence data, the unique sequence data whose attached reference sequence type information is the same are integrated together.

In Embodiment 2, S41, which determines whether the largest appearance count (aggregated) in the unique sequence data group is 2 or more, was executed before S42 to S45, but the order of the steps is not limited to this. For example, S41 need only be executed after S45 in order to determine whether to repeat S42 to S45. That is, S41 need not be executed before the first generation S42 of the sequence data group for integration.

In the integration of sequencing errors in Embodiment 2, the generation S42 of the sequence data group for integration was executed, but the present invention is not limited to this. In one example, the generation S42 of the sequence data group for integration need not be executed in the integration of sequencing errors. In this example, the sequence data referred to for comparison in the integration of sequencing errors is sequence data whose sequence has the same length as the reference sequence of the reference sequence data.

(Classification)
Returning to the flowchart of FIG. 10 showing the preparation of the sequence data group, when the integrated unique sequence data group has been generated, the sequence data group preparation unit 101 executes classification based on information on the sequence length, the original V gene fragment, and the original J gene fragment (S35).

(a) Classification based on sequence length
In the classification based on sequence length, the sequence data group preparation unit 101 acquires information on the sequence length of each sequence data item in the integrated unique sequence data group and attaches a sequence length classification tag, which relates to the sequence length, to each sequence data item representing a sequence of the same length.
When the number of sequence data items in the sequence data group matches the number of times a sequence length classification tag has been attached, the sequence data group preparation unit 101 ends the assignment of sequence length classification tags.

(b) Classification based on the original V gene fragment
When the assignment of sequence length classification tags is finished, the sequence data group preparation unit 101 estimates, for each sequence data item in the integrated unique sequence data group, the V gene fragment on the genome sequence (the "original V gene fragment") that corresponds to the V gene fragment constituting the variable region of the antibody represented by that sequence data, and attaches a tag for the original V gene fragment to the sequence data.

In estimating the original V gene fragment, the sequence data group preparation unit 101 loads the calculation program stored in the storage unit 13 into memory and realizes a function of calculating the sequence identity. The sequence data group preparation unit 101, having realized the generation and calculation function, refers to the information on the animal from which blood was collected in the input information entered by the user, and acquires the genome information of the corresponding animal species stored in the animal genome sequence data 140 in the database 107.

The genome sequence data of an animal contains regions encoding multiple types of V gene fragments, depending on the animal species. In estimating the original V gene fragment, the sequence data group preparation unit 101, having realized the generation and calculation function, calculates the sequence identity between the sequence information containing the V gene region of the antibody represented by the sequence data and each of the multiple regions on the genome sequence that encode V gene fragments. The sequence identity is calculated, for example, by subtracting from 1 the number of mismatches divided by the sequence length ([sequence identity] = 1 - [number of mismatches] / [sequence length]).

Of the calculated sequence identities, the sequence data group preparation unit 101 takes the V gene fragment represented by the region on the genome sequence showing the highest sequence identity as the original V gene fragment. The sequence data group preparation unit 101 attaches a V gene fragment classification tag for the original V gene fragment to the sequence data. When the number of sequence data items in the sequence data group matches the number of times a V gene fragment classification tag has been attached, the sequence data group preparation unit 101 ends the assignment of V gene fragment classification tags.
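The sequence identity formula and the choice of the best-matching fragment can be illustrated as follows. This is a minimal sketch under simplifying assumptions: the query and each germline fragment are taken to be already aligned strings of equal length, and the fragment names used as dictionary keys are hypothetical placeholders rather than real gene names.

```python
def sequence_identity(query: str, germline: str) -> float:
    """[sequence identity] = 1 - [number of mismatches] / [sequence length]."""
    if not query or len(query) != len(germline):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(query, germline))
    return 1 - mismatches / len(query)


def assign_original_fragment(query: str, germline_fragments: dict) -> str:
    """Return the name of the germline fragment with the highest sequence identity."""
    return max(
        germline_fragments,
        key=lambda name: sequence_identity(query, germline_fragments[name]),
    )
```

For example, a 10-base query with one mismatch against a fragment has a sequence identity of 0.9 under this formula. The same helper could be reused for the J gene fragment.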

(c) Classification based on the original J gene fragment
When the assignment of V gene fragment classification tags is finished, the sequence data group preparation unit 101 identifies, in the same manner as above, the J gene fragment on the genome sequence (the "original J gene fragment") that corresponds to the variable region of the antibody represented by the sequence data, and attaches a J gene fragment classification tag to the sequence data. When the number of sequence data items in the sequence data group matches the number of times a J gene fragment classification tag has been attached, the sequence data group preparation unit 101 ends the assignment of J gene fragment classification tags.

(d) Generation of classified sequence data group
When the assignment of classification tags is finished, the sequence data group preparation unit 101 generates classes corresponding to the combinations of sequence length, type of original V gene fragment, and type of original J gene fragment. The sequence data group preparation unit 101 assigns the sequence data bearing the various classification tags to the corresponding classes and generates the classified sequence data group. The generated classified sequence data group is stored in the sequence data group 110 in the database 107. Each class contains sequence data representing sequences that have the sequence length corresponding to that class and are derived from the corresponding original V gene fragment and original J gene fragment.
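Assigning tagged sequence data to classes is essentially a grouping by the triple (sequence length, original V gene fragment, original J gene fragment). The sketch below shows one way this could look; the record fields `sequence`, `v_gene`, and `j_gene` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def classify(records: list) -> dict:
    """Group records into classes keyed by (sequence length, V fragment, J fragment)."""
    classes = defaultdict(list)
    for record in records:
        key = (len(record["sequence"]), record["v_gene"], record["j_gene"])
        classes[key].append(record)
    return dict(classes)
```

Each key of the returned dictionary corresponds to one class, and its value lists the sequence data items that share that sequence length and that pair of original gene fragments.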

In the classification of Embodiment 2, classification tags were attached in the order of classification based on sequence length, classification based on the original V gene fragment, and classification based on the J gene fragment, but the order in which classification tags are attached is not limited to this. For example, these classification tags may be attached simultaneously. As the assignment of classification tags, at least one selected from the group consisting of assignment of a sequence length classification tag, assignment of a V gene fragment classification tag, and assignment of a J gene fragment classification tag may be executed.

(Generation of sequence cluster-forming sequence data group)
When the classified sequence data group has been generated, the sequence data group preparation unit 101 excludes sequence data that does not form a sequence cluster from the sequence data group in each class (S36). A sequence cluster-forming sequence data group is thereby generated, and the generated sequence data group is stored in the sequence data group 110 in the database 107.

In the exclusion S36 of sequence data that does not form a sequence cluster, the sequence data group preparation unit 101 refers to the designation information entered by the user and acquires the threshold for the number of base substitutions (for example, 30) and the threshold for the number of sequence data items (for example, 50) stored in the threshold data 120 in the database 107.

When the thresholds have been acquired, the sequence data group preparation unit 101 acquires the sequence information of each sequence data item in an arbitrary class of the classified sequence data group (for example, a class containing a large number of sequence data items). The sequence data group preparation unit 101 realizes the generation and calculation function described above, selects one sequence data item in the class (for example, one with a large appearance frequency (integrated)), and calculates the number of base substitutions between the sequence of the selected sequence data and that of each of the other sequence data items in the class.

The sequence data group preparation unit 101 counts the number of sequence data items whose number of base substitutions is less than the threshold for the number of base substitutions, and determines whether or not the counted number is less than the threshold for the number of sequence data items. If the counted number is less than the threshold for the number of sequence data items, the sequence data group preparation unit 101 excludes the selected sequence data from the class as sequence data that does not form a sequence cluster.

The sequence data group preparation unit 101 executes this exclusion of sequence data that does not form a sequence cluster for each sequence data item in the class. When, for one class, the number of times the exclusion process has been executed matches the number of sequence data items in the class, the sequence data group preparation unit 101 ends the exclusion process for that class.

The sequence data constituting a class on which the exclusion process has been executed are a collection of sequence data that may form a sequence cluster in the molecular phylogenetic tree described later ("sequence cluster-forming sequence data"), and the group containing these sequence data is the sequence cluster-forming sequence data group. When the exclusion of sequence data that does not form a sequence cluster is finished for one class of the classified sequence data group, the sequence data group preparation unit 101 executes the same exclusion for the other classes of the classified sequence data group.

In Embodiment 2, when the number of sequence data items contained in a class of the classified sequence data group is less than the threshold for the number of sequence data items, the sequence data group preparation unit 101 regards that class as containing no sequence cluster-forming sequence data and excludes it from the classified sequence data group.
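The exclusion process S36 for a single class can be sketched as a neighbour count. The code below is an illustration under stated assumptions, not the patented program: all sequences in a class have the same length (they share a sequence length class), the number of base substitutions is taken as the Hamming distance, and neighbours are counted against the original class membership rather than against the progressively filtered class. The function name is hypothetical; the default thresholds are the example values from the text.

```python
def hamming(a: str, b: str) -> int:
    """Number of base substitutions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_forming(class_seqs: list, sub_threshold: int = 30,
                    count_threshold: int = 50) -> list:
    """Keep only sequences with enough close neighbours in the class."""
    # A class smaller than the sequence-count threshold is dropped entirely.
    if len(class_seqs) < count_threshold:
        return []
    kept = []
    for i, seq in enumerate(class_seqs):
        # Count the other sequences with fewer substitutions than the threshold.
        neighbours = sum(
            1 for j, other in enumerate(class_seqs)
            if j != i and hamming(seq, other) < sub_threshold
        )
        if neighbours >= count_threshold:
            kept.append(seq)  # otherwise: does not form a sequence cluster
    return kept
```

With the toy thresholds `sub_threshold=2` and `count_threshold=1`, the class `["AAAA", "AAAT", "AATT", "GGGG"]` keeps the first three sequences, because only `"GGGG"` has no neighbour within one substitution.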

In Embodiment 2, the cleanup process S32 was executed on the sequence data group after sequencing, but the present invention is not limited to this. The cleanup process S32 may be executed on the unique sequence data group, the integrated unique sequence data group, the classified sequence data group, or the sequence cluster-forming sequence data group, or need not be executed at all.

In Embodiment 2, the aggregation process S33 into unique sequence data was executed on the sequence data group after cleanup, but the present invention is not limited to this. The aggregation process S33 into unique sequence data may be executed on the sequence data group after sequencing, the integrated unique sequence data group, the classified sequence data group, or the sequence cluster-forming sequence data group, or need not be executed at all.

In Embodiment 2, the integration S34 of sequencing errors was executed immediately after the step of generating the unique sequence data group, but the present invention is not limited to this. The integration S34 of sequencing errors may be executed at any point after the unique sequence data group has been generated, or need not be executed at all.

In Embodiment 2, the classification S35 was executed on the integrated unique sequence data group, but the present invention is not limited to this. The classification step S35 may be executed on the sequence data group after sequencing, the cleaned-up sequence data group, the unique sequence data group, the integrated unique sequence data group, or the sequence cluster-forming sequence data group, or need not be executed at all.

In Embodiment 2, the generation S36 of the sequence cluster-forming sequence data group was executed on the classified sequence data group, but the present invention is not limited to this. The generation S36 of the sequence cluster-forming sequence data group may be executed on the sequence data group after sequencing, the cleaned-up sequence data group, the unique sequence data group, or the integrated unique sequence data group, or need not be executed at all.

<Molecular phylogenetic tree creation unit>
The determination system according to Embodiment 2 includes a molecular phylogenetic tree creation unit that creates a molecular phylogenetic tree from the sequence data group.
When the sequence data group has been prepared, the processor of the control unit 11 executes the creation of a molecular phylogenetic tree shown in FIG. 9 (S12). In the creation S12 of the molecular phylogenetic tree, the processor of the control unit 11 loads the molecular phylogenetic tree creation program stored in the storage unit 13 into memory and executes it. The molecular phylogenetic tree creation unit 103 is thereby realized in the control unit 11.

The molecular phylogenetic tree creation unit 103 performs multiple alignment on the sequence data in the prepared sequence cluster-forming sequence data group and creates a molecular phylogenetic tree by calculating a distance matrix from the aligned sequence data.

In Embodiment 2, the molecular phylogenetic tree was created by calculating a distance matrix, but the method of creating the molecular phylogenetic tree is not limited to this. The molecular phylogenetic tree may be created using, for example, the maximum parsimony method, which is based on minimizing the total length of the branches of the phylogenetic tree.
In Embodiment 2, the molecular phylogenetic tree was created for the sequence cluster-forming sequence data group, but the present invention is not limited to this. The molecular phylogenetic tree may be created for the integrated unique sequence data group or the classified sequence data group.
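As an illustration of the distance matrix step, the sketch below computes pairwise distances from sequences that are assumed to be already multiply aligned (equal length, with `-` marking gaps). The distance used here, the proportion of differing positions among columns where neither sequence has a gap, is an assumption chosen for simplicity; the text does not specify the distance measure or the tree-building algorithm applied to the matrix.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of mismatches over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(aligned: list) -> list:
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(aligned[i], aligned[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix of this kind could then be passed to a distance-based tree construction method of the implementer's choice.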

<Sequence data determination unit>
The determination system according to Embodiment 2 includes a sequence data determination unit that determines, based on a change in the sequence identity between the antibody sequence represented by the sequence data in a sequence cluster of the molecular phylogenetic tree and the original sequence of that antibody, or on a change in the type of antibody, whether the sequence cluster is one that contains sequence data representing an antigen-specific antibody that is a candidate for antibody production.

(Extraction of sequence clusters from the molecular phylogenetic tree)
When the molecular phylogenetic tree has been created, the processor of the control unit 11 loads into memory and executes the sequence data determination program, stored in the storage unit 13, that causes the sequence data determination unit to function. The sequence data determination unit 105 is thereby realized in the control unit 11.

The sequence data determination unit 105 loads the sequence cluster extraction program stored in the storage unit 13 into memory and executes it. A sequence cluster extraction function is thereby realized. The sequence data determination unit 105, having realized the sequence cluster extraction function, extracts sequence clusters from the created molecular phylogenetic tree (S13). The sequence data in the sequence clusters extracted from the molecular phylogenetic tree are thereby obtained. The sequence data determination unit 105, having realized the sequence cluster extraction function, extracts as sequence clusters the portions of the created molecular phylogenetic tree where the density of sequence information is high.

(Calculation of the slope of the sequence identity of the sequence data in a sequence cluster)
When a sequence cluster has been extracted, the sequence data determination unit 105 calculates the change in the sequence information of the sequence data in the sequence cluster (for example, the slope of the sequence identity per unit time) (S14). In the calculation S14 of the slope of the sequence identity, the sequence data determination unit 105 loads the calculation program stored in the storage unit 13 into memory and executes it. A function of calculating the slope of the sequence identity is thereby realized in the sequence data determination unit 105. The sequence data determination unit 105, having realized the generation and calculation function, calculates the sequence identity between the sequence information of each sequence data item in the sequence cluster and the sequence information of the original V and J gene fragments contained in that sequence data.

The sequence data determination unit 105 obtains information on the blood sampling time (in weeks) from each piece of sequence data in the sequence cluster and, using the degrees of sequence identity calculated above, calculates the slope of the degree of sequence identity with respect to elapsed time.
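
As a minimal sketch of S14, the following Python code computes a percent identity for each sequence and then the slope of identity against sampling time. Two choices here are assumptions not stated in the specification: identity is computed position by position on sequences that are already aligned and of equal length, and the slope is obtained by ordinary least squares. The function names and the sample values are hypothetical.

```python
from statistics import mean


def percent_identity(seq: str, original: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq or len(seq) != len(original):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq, original))
    return 100.0 * matches / len(seq)


def identity_slope(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of identity (%) against sampling time (weeks)."""
    xs = [week for week, _ in points]
    ys = [identity for _, identity in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sampling times")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# (week, identity %) pairs for the sequence data of one hypothetical cluster
slope = identity_slope([(0, 98.0), (2, 96.0), (4, 94.0), (6, 92.0)])
```

A negative slope means that the sequences in the cluster drift away from their original V and J gene fragments as sampling proceeds, which is the signal compared with the threshold in the next step.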

(Determination of whether a sequence cluster contains an antigen-specific antibody)
Once the slope of the degree of sequence identity has been calculated, the sequence data determination unit 105 loads the threshold acquisition program stored in the storage unit 13 into memory, thereby realizing a threshold acquisition function. With this function realized, the unit acquires the threshold for the slope of the degree of sequence identity (hereinafter, "threshold (slope)") stored in the threshold data 120 (S15).

Once the threshold (slope) has been acquired, the sequence data determination unit 105 compares the calculated slope of the degree of sequence identity with the acquired threshold (slope). If the slope is smaller than the threshold (slope) (S16; YES), the unit attaches, to the sequence cluster that exhibited that slope, a display tag indicating that the cluster contains sequence data representing an antigen-specific antibody (S17).

If the slope of the degree of sequence identity is equal to or greater than the threshold (slope) (S16; NO), the sequence data determination unit 105 attaches, to the sequence cluster that exhibited that slope, a display tag indicating that the cluster does not contain sequence data representing an antigen-specific antibody (S18).
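
The branch at S16 reduces to a single comparison, sketched below in Python. The tag strings and the threshold value used in the example call are hypothetical; the specification only states that one of two display tags is attached depending on whether the slope is below the threshold (slope).

```python
def tag_cluster(slope: float, slope_threshold: float) -> str:
    """Return the display tag for a sequence cluster (S16 to S18)."""
    if slope < slope_threshold:      # S16: YES
        return "ANTIGEN_SPECIFIC"    # S17
    return "NOT_ANTIGEN_SPECIFIC"    # S16: NO, so S18


# Illustrative threshold only; the real value is read from the threshold data 120.
tag = tag_cluster(slope=-1.0, slope_threshold=-0.5)
```

Note that a slope exactly equal to the threshold falls into the "not included" branch, matching the "equal to or greater than" wording of S18.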

<Display of determination result>
Once a display tag has been attached, the processor of the control unit 11 displays the determination result (S19). In the determination result display S19, the processor executes the display program loaded from the storage unit 13 into memory and outputs a video signal corresponding to the display tag to the display device 17. The display device 17 may be, for example, a display, in which case the display shows the determination result based on the input video signal.

In Embodiment 2, the creation of the molecular phylogenetic tree S12 and the extraction of sequence clusters S13 are each executed once, but the number of executions is not limited to this. For example, the creation of a molecular phylogenetic tree may be executed again (S12') on the set of sequence data in a sequence cluster extracted in S13, and the extraction of sequence clusters (S13') may then be executed on the resulting molecular phylogenetic tree.

In Embodiment 2, the calculation S14 of the slope of the degree of sequence identity of the sequence data in a sequence cluster and the acquisition S15 of the threshold (slope) are executed in this order, but the order of the steps is not limited to this. S14 and S15 need only be completed before the comparison S16 between the slope and the threshold (slope); they may be executed in either order or simultaneously.

[Embodiment 3]
A third aspect of the present invention provides a method for producing an antigen-specific antibody. One embodiment of the third aspect (hereinafter, "Embodiment 3") is described below.

The production method according to Embodiment 3 includes: a step of preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; a step of creating a molecular phylogenetic tree from the sequence data group; a step of determining, based on a change in the degree of sequence identity between the antibody sequences shown in the sequence data in a sequence cluster of the molecular phylogenetic tree and the original sequences of those antibodies, or on a change in antibody type, whether the sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production; a step of selecting, from a sequence cluster determined to contain sequence data representing an antigen-specific antibody, sequence data representing an antigen-specific antibody that is a candidate for antibody production; and a step of producing an antibody having the amino acid sequence shown in the selected sequence data.

In the production method according to Embodiment 3, the features described for the identification method according to Embodiment 1 and the features described in this specification apply, as appropriate, to the step of preparing the sequence data group, the step of creating the molecular phylogenetic tree, and the step of determining whether a sequence cluster contains sequence data representing an antigen-specific antibody.

<Selection of sequence data representing an antigen-specific antibody>
The production method according to Embodiment 3 includes a step of selecting, from a sequence cluster determined to contain sequence data representing an antigen-specific antibody, sequence data representing an antigen-specific antibody that is a candidate for antibody production. For example, among the sequence data in such a sequence cluster, a piece of sequence data may be selected when the sequence identity between the sequence it shows and its original sequence is equal to or less than the threshold for the degree of sequence identity. In another example, the selected sequence data may be the piece whose sequence identity to its original sequence is the lowest, or the piece obtained at the latest antibody collection whose sequence identity to its original sequence is equal to or less than the threshold for the degree of sequence identity.
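
The three selection rules described above can be sketched in Python as follows. The record type `AntibodyRecord`, its field names, and the sample values are hypothetical. For the third rule the sketch assumes one possible reading: among the records at or below the identity threshold, the one collected latest is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AntibodyRecord:
    seq_id: str
    identity: float  # % identity to the original sequence
    week: int        # sampling time at which the antibody was collected


def below_threshold(records: list[AntibodyRecord], threshold: float) -> list[AntibodyRecord]:
    """Rule 1: every record whose identity is at or below the threshold."""
    return [r for r in records if r.identity <= threshold]


def lowest_identity(records: list[AntibodyRecord]) -> AntibodyRecord:
    """Rule 2: the record that has diverged most from its original sequence."""
    return min(records, key=lambda r: r.identity)


def latest_below_threshold(records: list[AntibodyRecord], threshold: float) -> Optional[AntibodyRecord]:
    """Rule 3: the latest-collected record among those passing rule 1."""
    eligible = below_threshold(records, threshold)
    return max(eligible, key=lambda r: r.week) if eligible else None


cluster = [
    AntibodyRecord("s1", 97.0, 2),
    AntibodyRecord("s2", 89.0, 4),
    AntibodyRecord("s3", 86.5, 6),
    AntibodyRecord("s4", 93.0, 8),
]
candidates = below_threshold(cluster, threshold=90.0)
```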

<Production of an antibody having the amino acid sequence shown in the sequence data>
The production method according to Embodiment 3 includes a step of producing an antibody having the amino acid sequence shown in sequence data in a sequence cluster determined to contain sequence data representing an antigen-specific antibody. As a non-limiting example, the antibody is produced by inserting a DNA fragment, whose sequence has the CDRs of the selected sequence data optimized according to the codon usage frequency of E. coli, into a protein expression vector for E. coli to construct a plasmid, introducing the plasmid into E. coli, and expressing it.
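
To illustrate what optimizing a sequence for E. coli codon usage can mean in its simplest form, the Python sketch below back-translates an amino acid sequence using one fixed codon per residue. The table lists codons that are commonly reported as frequent in E. coli, but it is an illustrative assumption and should be checked against an actual codon usage table; the specification does not state which table or tool is used. Practical optimization also weighs factors such as repeats, secondary structure, and restriction sites, none of which are modeled here. The function name `back_translate` is hypothetical.

```python
# One codon per amino acid; illustrative choices, not an authoritative table.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str) -> str:
    """Encode a protein sequence with the single codon chosen for each residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None


dna = back_translate("MKV")
```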

[Embodiment 4]
A fourth aspect of the present invention provides a method for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production. One embodiment of the fourth aspect (hereinafter, "Embodiment 4") is described below. The selection method according to Embodiment 4 differs from the identification method according to Embodiment 1 described above, but the two share some specific steps, so the description here focuses on the differences from the identification method according to Embodiment 1.

Unlike the identification method according to Embodiment 1, the selection method according to Embodiment 4 does not include, as essential steps, the step of creating a molecular phylogenetic tree or the step of determining whether sequence data representing an antigen-specific antibody that is a candidate for antibody production is included. It does include, as an essential step, the "exclusion of sequence data that does not form a sequence cluster" described in detail for the identification method according to Embodiment 1. It further includes, as an essential step, a step of selecting sequence data representing an antigen-specific antibody from the sequence data group.

The step of selecting sequence data includes a step of comparing, against a threshold, the degree of sequence identity between the antibody sequence shown in the sequence data in the sequence data group and the original sequence of that antibody. As a non-limiting example, the comparing step includes selecting a piece of sequence data as sequence data representing an antigen-specific antibody that is a candidate for antibody production when that degree of sequence identity (for example, 88%) is lower than the threshold (for example, 90%). The comparing step may further include, in addition to comparing the degree of sequence identity with its threshold, comparing the time at which the antibody was obtained (for example, the 5th collection) with a threshold (for example, the 4th collection). In this example, the selection step includes selecting the sequence data as sequence data representing an antigen-specific antibody that is a candidate for antibody production when the time at which the antibody was obtained is equal to or greater than its threshold and the degree of sequence identity is less than its threshold.
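
The comparison described above, using the example values from the text (88% identity against a 90% threshold, 5th collection against a 4th-collection threshold), can be written as a short Python predicate. The function name and parameter names are hypothetical; the optional round check is applied only when a round threshold is supplied.

```python
from typing import Optional


def is_candidate(
    identity: float,
    identity_threshold: float = 90.0,
    collection_round: Optional[int] = None,
    round_threshold: Optional[int] = None,
) -> bool:
    """True if the sequence data qualifies as an antibody production candidate."""
    if identity >= identity_threshold:
        return False  # not diverged enough from the original sequence
    if round_threshold is not None:
        # Optional second condition: collected at or after the threshold round.
        return collection_round is not None and collection_round >= round_threshold
    return True


selected = is_candidate(88.0, 90.0, collection_round=5, round_threshold=4)
```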

The selection method according to Embodiment 4 includes the "clean-up of sequence data that does not show an amino acid sequence, etc.", "aggregation into unique sequences", "integration of sequencing errors", and "classification" described in detail for the identification method according to Embodiment 1. The selection method according to the fourth aspect of the invention may further include the step of creating a molecular phylogenetic tree and the step of determining whether sequence data representing an antigen-specific antibody that is a candidate for antibody production is included.

[Embodiment 5]
A fifth aspect of the present invention provides a selection system for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production. One embodiment of the fifth aspect (hereinafter, "Embodiment 5") is described below. The selection system according to Embodiment 5 differs from the determination system according to Embodiment 2 described above, but the two share some specific data processing, so the description here focuses on the differences from the determination system according to Embodiment 2.

Unlike the determination system according to Embodiment 2, the selection system according to Embodiment 5 does not include the molecular phylogenetic tree creation unit or the sequence data determination unit as essential units. It does include, as essential processing, the "generation of a sequence data group capable of forming a sequence cluster" performed by the "sequence data preparation unit" described in detail for the determination system according to Embodiment 2. It further includes a sequence data selection unit as an essential unit.

The sequence data selection unit executes processing that compares, against a threshold, the degree of sequence identity between the antibody sequence shown in the sequence data in the sequence data group and the original sequence of that antibody. As a non-limiting example, in this comparison processing the sequence data selection unit attaches to a piece of sequence data a selection tag, indicating that it is sequence data representing an antigen-specific antibody that is a candidate for antibody production, when that degree of sequence identity (for example, 88%) is lower than the threshold (for example, 90%). The comparison processing may further include, in addition to comparing the degree of sequence identity with its threshold, comparing the time at which the antibody was obtained (for example, the 5th collection) with a threshold (for example, the 4th collection). In this example, the sequence data selection unit attaches the selection tag to the sequence data when the time at which the antibody was obtained is equal to or greater than its threshold and the degree of sequence identity is less than its threshold.

The selection system according to Embodiment 5 includes the "clean-up processing", "aggregation into unique sequences", "integration of sequencing errors", and "classification processing" described in detail for the determination system according to Embodiment 2. The selection system according to the fifth aspect of the invention may further include a molecular phylogenetic tree creation unit and a sequence data determination unit.

[Embodiment 6]
A sixth aspect of the present invention provides a method for producing an antigen-specific antibody. One embodiment of the sixth aspect (hereinafter, "Embodiment 6") is described below. The production method according to Embodiment 6 differs from the production method according to Embodiment 3 described above, but the two share some specific steps, so the description here focuses on the differences from the production method according to Embodiment 3.

Unlike the production method according to Embodiment 3, the production method according to Embodiment 6 does not include, as essential steps, the step of creating a molecular phylogenetic tree or the step of determining whether sequence data representing an antigen-specific antibody that is a candidate for antibody production is included. It does include, as an essential step, the "exclusion of sequence data that does not form a sequence cluster" described in detail for the production method according to Embodiment 3. It further includes, as an essential step, a step of selecting sequence data representing an antigen-specific antibody from the sequence data group. The features described for the step of selecting sequence data in Embodiment 4 and the features described in this specification apply, as appropriate, to this selection step.

The features described for the step of preparing a sequence data group in Embodiment 1 and the features described in this specification apply, as appropriate, to the step of preparing a sequence data group in the production method according to Embodiment 6. Likewise, the features described for the step of producing an antibody in Embodiment 3 and the features described in this specification apply, as appropriate, to the step of producing an antibody in Embodiment 6.

Embodiments according to the first or fourth aspect of the present invention may be, for example, as described below, but are not limited to these:
[Item 1] A method for identifying a sequence cluster containing sequence data representing an antigen-specific antibody that is a candidate for antibody production, the method comprising: a step of preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; a step of creating a molecular phylogenetic tree from the sequence data group; and a step of determining, based on a change in the degree of sequence identity between the antibody sequences shown in the sequence data in a sequence cluster of the molecular phylogenetic tree and the original sequences of those antibodies, or on a change in antibody type, whether the sequence cluster contains sequence data representing an antigen-specific antibody that is a candidate for antibody production; wherein the original sequence is the sequence, in the genome sequence of the animal species corresponding to the animal from which the antibody was obtained, that has the highest degree of sequence identity with the antibody sequence shown in the sequence data.
[Item 2-1] The identification method according to Item 1, wherein the preparation step includes a step of acquiring a sequence data group, the sequence data group acquired in the preparation step is a sequence data group after sequencing, and the preparation step includes a step of cleaning up sequence data that does not show an amino acid sequence.
[Item 2-2] The identification method according to Item 2-1, wherein the preparation step includes a step of aggregating a plurality of pieces of sequence data showing the same sequence into a single piece of sequence data (hereinafter, "unique sequence data"), and the unique sequence data includes information on the sequence shown in the sequence data and information on the number of pieces of sequence data aggregated (hereinafter, "frequency of occurrence (aggregated)").
[Item 2-3] The identification method according to Item 1, wherein the preparation step includes a step of acquiring a sequence data group, the sequence data group acquired in the preparation step is a sequence data group after sequencing, the preparation step includes a step of aggregating a plurality of pieces of sequence data showing the same sequence into unique sequence data, and the unique sequence data includes information on the sequence shown in the sequence data and information on the frequency of occurrence (aggregated).
[Item 2-4] The identification method according to Item 2-3, wherein the preparation step includes a step of cleaning up sequence data that does not show an amino acid sequence.
[Item 2-5] The identification method according to any one of Items 1 to 2-4, wherein the preparation step includes a step of acquiring a sequence data group, and the preparation step includes a step of comparing information on sequencing quality contained in the sequence data with a threshold and cleaning up sequence data that falls below the threshold.
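
The preparation steps of Items 2-1 to 2-5 (quality filtering, clean-up, and aggregation into unique sequence data with counts) can be sketched in Python as below. Several points are assumptions made for illustration: each read is a pair of a nucleotide sequence and a single quality score, and "does not show an amino acid sequence" is modeled as a sequence that contains non-ACGT characters, has a length that is not a multiple of three, or contains an in-frame stop codon. The function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def encodes_amino_acids(seq: str) -> bool:
    """True if the read is an in-frame ACGT sequence without stop codons."""
    if not seq or len(seq) % 3 != 0 or set(seq) - set("ACGT"):
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def prepare_unique(reads: list[tuple[str, float]], min_quality: float) -> Counter:
    """Clean up the reads, then aggregate identical sequences with their counts."""
    kept = [
        seq for seq, quality in reads
        if quality >= min_quality and encodes_amino_acids(seq)
    ]
    return Counter(kept)  # sequence -> frequency of occurrence (aggregated)


reads = [
    ("ATGGCC", 35.0),  # kept
    ("ATGGCC", 32.0),  # kept, aggregated with the read above
    ("ATGTAA", 38.0),  # removed: in-frame stop codon
    ("ATGGC", 40.0),   # removed: length is not a multiple of three
    ("ATGGCA", 10.0),  # removed: below the quality threshold
]
unique = prepare_unique(reads, min_quality=20.0)
```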

[Item 3] The identification method according to any one of Items 2-2 to 2-5, wherein the sequence data group is a unique sequence data group composed of the unique sequence data; the unique sequence data is nucleotide sequence data; within the unique sequence data group, the unique sequence data with the highest frequency of occurrence (aggregated) is taken as reference sequence data (n = 0), and unique sequence data whose sequence has the same length as the sequence of the reference sequence data (hereinafter, "reference sequence") but has n (n = 1, ..., m) base substitutions relative to the reference sequence is taken as query sequence data; and the preparation step includes a step of integrating into the reference sequence data any query sequence for which the ratio of its frequency of occurrence (aggregated) to that of the reference sequence data is less than a threshold.
[Item 4-1] The identification method according to any one of Items 1 to 3, wherein the preparation step includes a step of classifying the sequence data in the sequence data group based on one or both of the sequence length and the original sequence in the genome sequence of the animal species corresponding to the animal.
[Item 4-2] The identification method according to any one of Items 1 to 3, wherein the preparation step includes a step of excluding a particular piece of sequence data from the sequence data group when the number of pieces of sequence data whose sequences have a predetermined degree of sequence identity with the sequence shown in that particular piece of sequence data is less than a threshold for the number of pieces of sequence data.
[Item 4-3] The identification method according to any one of Items 1 to 3, wherein the preparation step includes a step of classifying the sequence data in the sequence data group based on one or both of the sequence length and the original sequence in the genome sequence of the animal species corresponding to the animal, and a step of excluding a particular piece of sequence data from the sequence data group when the number of pieces of sequence data whose sequences have a predetermined degree of sequence identity with the sequence shown in that particular piece of sequence data is less than a threshold for the number of pieces of sequence data.
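
The integration of sequencing errors in Item 3 and the exclusion step in Item 4-2 can be sketched in Python as follows. The sketch makes several assumptions: it performs a single pass around the most frequent sequence only, it models the "predetermined degree of sequence identity" of Item 4-2 as a maximum Hamming distance between equal-length sequences, and it counts neighbors as unique sequences rather than as reads. All function and parameter names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def integrate_errors(counts: dict[str, int], max_substitutions: int, ratio_threshold: float) -> dict[str, int]:
    """Merge rare near-copies of the most frequent sequence into it (Item 3)."""
    reference = max(counts, key=counts.get)  # reference sequence data (n = 0)
    ref_count = counts[reference]
    merged = {reference: ref_count}
    for seq, n in counts.items():
        if seq == reference:
            continue
        is_query = len(seq) == len(reference) and 1 <= hamming(seq, reference) <= max_substitutions
        if is_query and n / ref_count < ratio_threshold:
            merged[reference] += n  # treated as a sequencing error of the reference
        else:
            merged[seq] = n
    return merged


def exclude_sparse(counts: dict[str, int], max_distance: int, min_neighbors: int) -> dict[str, int]:
    """Drop sequences that have too few similar sequences around them (Item 4-2)."""
    kept = {}
    for seq, n in counts.items():
        neighbors = sum(
            1 for other in counts
            if other != seq and len(other) == len(seq) and hamming(seq, other) <= max_distance
        )
        if neighbors >= min_neighbors:
            kept[seq] = n
    return kept


counts = {"ACGTAC": 100, "ACGTAA": 2, "ACGTTT": 60, "ACG": 1}
merged = integrate_errors(counts, max_substitutions=2, ratio_threshold=0.05)
dense = exclude_sparse(merged, max_distance=2, min_neighbors=1)
```

In this toy input the rare one-substitution variant is folded into the reference, the frequent two-substitution variant is retained as a genuine sequence, and the short isolated sequence is excluded.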

[Item 5-1] The identification method according to Item 4-3, wherein the preparation step includes performing the integration step and then the classification step, in that order.
[Item 5-2] The identification method according to Item 4-3, wherein the preparation step includes performing the integration step and then the exclusion step, in that order.
[Item 5-3] The identification method according to Item 4-3, wherein the preparation step includes performing the integration step, the classification step, and the exclusion step, in that order.
[Item 6] The identification method according to Item 4-1, 4-3, 5-1, or 5-3, wherein the classification based on the original sequence includes a step of calculating the degree of sequence identity obtained by a homology search between one or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody shown in the sequence data in the sequence data group and the genome sequence of the animal species corresponding to the animal, and a step of assigning the sequence data to the class of the gene fragment showing the highest degree of sequence identity.
[Item 7-1] The identification method according to any one of Items 1 to 6, wherein the change in the degree of sequence identity is a change in the degree of sequence identity per unit time, and the change in antibody type is an antibody class switch.
[Item 7-2] The identification method according to any one of Items 1 to 6 and 7-1, wherein the sequence data group is a collection of sequence data having sequence information obtained by sequencing each antibody in antibody populations obtained i times, at different times, from an animal that may have received immune stimulation.
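
The class assignment of Item 6 can be pictured with the Python sketch below, which assigns a sequence to the germline fragment with the highest identity. A real homology search would use an alignment tool; here, as a stated simplification, identity is computed position by position over the overlapping prefix of the two sequences. The fragment names `V1` and `V2`, their toy sequences, and the function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Percent identity over the overlapping prefix of two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def assign_class(seq: str, germline: dict[str, str]) -> tuple[str, float]:
    """Return the best-matching germline fragment name and its identity."""
    best = max(germline, key=lambda name: identity(seq, germline[name]))
    return best, identity(seq, germline[best])


germline = {"V1": "ACGTACGT", "V2": "ACGAACGA"}  # toy V gene fragments
assigned = assign_class("ACGTTCGT", germline)
```

The identity returned with the class is the same quantity whose change over sampling time is evaluated in the determination step.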

[Item 10] A method for selecting sequence data representing an antigen-specific antibody that is a candidate for antibody production, the method comprising: a step of preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation; and a step of selecting sequence data representing an antigen-specific antibody from the sequence data group; wherein the preparation step includes a step of excluding a particular piece of sequence data from the sequence data group when the number of pieces of sequence data whose sequences have a predetermined degree of sequence identity with the sequence shown in that particular piece of sequence data is less than a threshold for the number of pieces of sequence data; the selection step includes a step of comparing, against a threshold, the degree of sequence identity between the antibody sequence shown in the sequence data in the sequence data group and the original sequence of that antibody; and the original sequence is the sequence, in the genome sequence of the animal species corresponding to the animal from which the antibody was obtained, that has the highest degree of sequence identity with the antibody sequence shown in the sequence data.
[Item 13-1] The selection method according to Item 10, wherein the preparation step includes a step of acquiring a sequence data group, the sequence data group acquired in the preparation step is a sequence data group after sequencing, and the preparation step includes a step of cleaning up sequence data that does not show an amino acid sequence.
[Item 13-2] The selection method according to Item 13-1, wherein the preparation step includes a step of aggregating a plurality of pieces of sequence data showing the same sequence into unique sequence data, and the unique sequence data includes information on the sequence shown in the sequence data and information on the frequency of occurrence (aggregated).
[Item 13-3] The selection method according to Item 10, wherein the preparation step includes a step of acquiring a sequence data group, the sequence data group acquired in the preparation step is a sequence data group after sequencing, the preparation step includes a step of aggregating a plurality of pieces of sequence data showing the same sequence into unique sequence data, and the unique sequence data includes information on the sequence shown in the sequence data and information on the frequency of occurrence (aggregated).
[Item 13-4] The selection method according to Item 13-3, wherein the preparation step includes a step of cleaning up sequence data that does not show an amino acid sequence.
[Item 13-5] The selection method according to Item 13-1 or 13-2, wherein the preparation step includes performing the clean-up step and then the exclusion step, in that order.
[Item 13-6] The selection method according to Item 13-3 or 13-4, wherein the preparation step includes performing the step of aggregating into unique sequence data and then the exclusion step, in that order.
[Item 13-7] The selection method according to Item 13-2 or 13-4, wherein the preparation step includes performing the clean-up step, the step of aggregating into unique sequence data, and the exclusion step, in that order.
[Item 13-8] The selection method according to any one of Items 10 and 13-1 to 13-7, wherein the preparation step includes a step of acquiring a sequence data group, and the preparation step includes a step of comparing information on sequencing quality contained in the sequence data with a threshold and cleaning up sequence data that falls below the threshold.
[Section 13-8] The preparation step includes a step of acquiring a sequence data group, and the preparation step includes comparing information regarding the quality of sequencing included in the sequence data with a threshold value, 10. The selection method according to any one of Items 10 and 13-1 to 13-7, which comprises the step of cleaning up sequence data of less than or equal to.

[Item 14] The selection method according to any one of Items 13-2 to 13-7, wherein the sequence data group is a unique sequence data group composed of the unique sequence data; the unique sequence data is base sequence data; in the unique sequence data group, the unique sequence data with the highest frequency of occurrence (aggregation) is taken as reference sequence data (n = 0), and unique sequence data having a sequence of the same length as the sequence of the reference sequence data (hereinafter the "reference sequence") but with n (n = 1, ... m) base substitutions relative to the reference sequence is taken as query sequence data; and the preparation step includes an integration step of integrating into the reference sequence data any query sequence for which the ratio of its frequency of occurrence (aggregation) to the frequency of occurrence (aggregation) of the reference sequence data is less than a threshold.
[Item 15] The selection method according to any one of Item 10 and Items 13 to 14, wherein the preparation step includes a classification step of classifying the sequence data in the sequence data group based on either or both of the sequence length and the original sequence on the genome sequence of the animal species corresponding to the animal.
[Item 16-1] The selection method according to Item 14, wherein the preparation step includes performing the integration step and the exclusion step in this order.
[Item 16-2] The selection method according to Item 15, wherein the preparation step includes performing the integration step, the classification step, and the exclusion step in this order.

[Item 17] The selection method according to Item 15 or Item 16-2, wherein the classification based on the original sequence includes: a step of calculating the degree of sequence identity obtained by a homology search between either or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody indicated by the sequence data in the sequence data group and the genome sequence of the animal species corresponding to the animal; and a step of assigning the sequence data to the class for the gene fragment showing the highest degree of sequence identity.
[Item 18] The selection method according to any one of Item 10 and Items 13 to 17, wherein the change in the degree of sequence identity is a change in the degree of sequence identity per unit time, and the change in antibody type is an antibody class switch.
[Item 19] The selection method according to any one of Item 10 and Items 13 to 18, wherein the sequence data group is a collection of sequence data having sequence information obtained by sequencing each antibody of antibody populations obtained i times, at different time points, from an animal that may have received immune stimulation.

[Item 20] The selection method according to any one of Item 10 and Items 13 to 19, further comprising: a step of creating a molecular phylogenetic tree from the sequence data group; and a determination step of determining whether a sequence cluster in the molecular phylogenetic tree contains sequence data indicating an antigen-specific antibody that is a candidate for antibody production, based on a change in the degree of sequence identity between the antibody sequence indicated by the sequence data in the sequence cluster and the original sequence of the antibody, or on a change in antibody type, wherein the selection step is read as a step of selecting sequence data indicating an antigen-specific antibody from a sequence cluster determined in the determination step to contain sequence data indicating an antigen-specific antibody that is a candidate for antibody production.
[Item 21] The selection method according to Item 20, wherein the preparation step, the step of creating the molecular phylogenetic tree, the determination step, and the selection step are performed in this order.

The features described above for the embodiments according to the first and fourth aspects can be applied, as appropriate, to the corresponding steps and processes of the embodiments according to the second, third, fifth and sixth aspects of the present invention.

Specific examples are described below. These examples illustrate preferred embodiments of the present invention and do not limit in any way the invention described in the appended claims.

[Example 1]
The purpose of this example is to efficiently select sequence data indicating antigen-specific antibodies from a sequence data group obtained by sequencing an antibody population derived from an animal to which an antigen has been administered.

(1) Antigen administration and blood collection protocol
One alpaca was given the Fab of a human IgG antibody as the antigen, 2 mg per dose, together with an adjuvant. The antigen was administered six times at two-week intervals. Blood (50 mL per collection) was collected from the alpaca 15 times in total: once before antigen administration and 14 times at one-week intervals from the start of antigen administration. An outline of the antigen administration and blood collection protocol is shown below.

(2) Sequencing of antibody genes
Human lymphocyte separation solution (1.077 g/mL) and EDTA (final concentration 0.1%) were added to the blood sample collected at each time point, and peripheral blood mononuclear cells (PBMCs) were separated by centrifugation using Leucosep centrifuge tubes. The PBMCs from each blood sample were washed with PBS, and mRNA was then extracted from each blood sample using RNAiso Plus (Takara Bio Inc.).

cDNA was synthesized from the extracted mRNA using the SuperScript (trademark) III First-Strand Synthesis System (Invitrogen). Using the synthesized cDNA and the primer sets shown in the table below, DNA fragments encoding the variable regions of alpaca IgG2 and IgG3 antibodies (sequence lengths of about 300 bp to about 500 bp) were obtained by PCR.

Using the Nextera XT Index Kit (Illumina), a DNA fragment library for sequencing was prepared by adding MiSeq adapter sequences and barcode sequences to the DNA fragments. The base sequences of the library were determined by the paired-end method on an Illumina next-generation sequencer (MiSeq).

(3) Sequence data group
(A) Sequence data group after sequencing
The sequence data of each antibody in the antibody population of the blood sample collected at each time point were gathered to form the sequence data group after sequencing. Each antibody sequence data item contained information on the base sequence encoding the variable region of the antibody, the antibody type (IgG2 or IgG3), the time of blood collection, the animal from which the blood was collected, and the quality score (Q score).
Table a below shows the number (A) of sequence data items for the antibodies (IgG2 and IgG3) in the blood sample collected at each time point in the sequence data group after sequencing.

In this example, the sequence data group to be analyzed contained about 4.95 million sequence data items (Table a).
The purpose of this example is to efficiently select, from this sequence data group, sequence data indicating antigen-specific antibodies. To this end, a molecular phylogenetic tree is created for the sequence data in the sequence data group. In general, the amount of computation needed to create a molecular phylogenetic tree increases exponentially with the number of data items analyzed. Creating a tree for an excessively large data set would therefore take an enormous amount of time and is impractical; depending on the performance of the computer, an appropriate tree may not be obtainable at all. To achieve the purpose of this example, the sequence data group was subjected to processes (B) to (F) described below to reduce the number of data items, a molecular phylogenetic tree was then created, and antigen-specific antibodies were selected from the tree.

(B) Aggregation into unique sequence data
Within the sequence data group, sequence data items showing the same base sequence were aggregated into a single sequence data item ("unique sequence data"). Unique sequence data were generated for the entire sequence data group to produce a unique sequence data group. For example, for the IgG2 antibodies of the first blood sample, this aggregation reduced the number of sequence data items in the sequence library from about 210,000 (A) to about 90,000 (B) (Table a). Aggregation into unique sequence data can therefore reduce the amount of computation needed to create the molecular phylogenetic tree.
In addition to the information contained in the sequence data, unique sequence data further contains information on the number of base sequence data items that were aggregated ("frequency of occurrence (aggregation)").
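The aggregation in (B) can be illustrated with a minimal Python sketch. It assumes the reads are available as plain strings; the `UniqueSequence` record and its field names are hypothetical, and the other metadata mentioned above (antibody type, collection time, animal, Q score) is omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UniqueSequence:
    """One unique sequence record (hypothetical structure for illustration)."""
    sequence: str  # base sequence of the variable region
    count: int     # frequency of occurrence (aggregation)


def aggregate_unique(reads):
    """Collapse identical base sequences into unique records with their counts."""
    counts = Counter(reads)
    # most_common() orders records by descending count, which suits later steps
    # that start from the most frequent sequence.
    return [UniqueSequence(seq, n) for seq, n in counts.most_common()]


# Toy reads, not data from the example.
reads = ["ATGGCC", "ATGGCC", "ATGGCA", "ATGGCC", "ATGGCA", "TTGGCC"]
unique = aggregate_unique(reads)
```

In this toy input, six reads collapse into three unique records while the total count is preserved.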

(C) Exclusion of sequence data that do not indicate an amino acid sequence (cleanup)
If the length of the unique sequence in a unique sequence data item is not a multiple of 3, that item does not carry information on the amino acids that make up an antibody. Unique sequence data that did not indicate an amino acid sequence were excluded (cleaned up) from the unique sequence data group. This cleanup was applied to all sequence data in the unique sequence data group, producing a cleanup sequence data group composed of unique sequence data whose sequence length is a multiple of 3.
For example, for the IgG2 antibodies of the first blood sample, this cleanup reduced the number of sequence data items in the sequence library of unique sequence data from about 90,000 (B) to about 69,000 (C) (Table a). The cleanup can therefore reduce the amount of computation needed to create the molecular phylogenetic tree.
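The length criterion in (C) amounts to a simple filter. The sketch below assumes the unique sequence data are held as a dictionary mapping each base sequence to its frequency of occurrence (aggregation); that representation and the function name are illustrative assumptions.

```python
def clean_up(unique_counts):
    """Keep only sequences whose length is a multiple of 3 (whole codons)."""
    return {seq: n for seq, n in unique_counts.items() if len(seq) % 3 == 0}


# Toy data: lengths 6, 5, 9 and 4; only lengths 6 and 9 survive.
unique_counts = {"ATGGCC": 3, "ATGGC": 2, "ATGGCCTTA": 1, "ATGG": 1}
cleaned = clean_up(unique_counts)
```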

(D) Integration of sequencing errors
Sequence information obtained by sequencing is known to contain random errors introduced during the processes that accompany sequencing. Sequence data with random errors can carry sequence information in which one, two, ... or n base substitutions have been introduced relative to the true sequence. Because such substitutions arise at random, the occurrence rate of sequence data carrying them follows a Poisson distribution. When the occurrence rate of a given unique sequence data item and the occurrence rates of sequence data showing base sequences that differ from that unique sequence by one, two, ... or n bases follow a Poisson distribution, the group is regarded as a collection of sequence data in which random errors have entered the unique sequence, and the group is therefore integrated into the unique sequence data that shows the true sequence.

In this example, sequencing errors were integrated as follows.
(a) Generation of an integration sequence data group for reference sequence α
In the cleanup sequence data group, the unique sequence data with the highest frequency of occurrence (aggregation) was taken as reference sequence data α. Unique sequence data showing sequences of the same length as the base sequence indicated by reference sequence data α (hereinafter also "reference sequence α") but differing from reference sequence α by one base, by two bases, ... or by m bases (the last hereinafter also "query sequence data (n = m)") were collected to generate an integration sequence data group for reference sequence α.

In the integration sequence data group for reference sequence α, the query sequence data (n = 1), which show base sequences differing from reference sequence α by one base, all share a single base substitution but comprise unique sequence data with various base sequences that differ in the position of the substitution. In the integration sequence data group for reference sequence α, the value of m in query sequence data (n = m) was set to the value at which the frequency of occurrence (aggregation) expected from the frequency of occurrence (aggregation) of reference sequence data α and the expected value of base substitutions first becomes 1 or less.

(b) Calculation of probability density
In the integration sequence data group for reference sequence α, the probability density of reference sequence data α was calculated by dividing its frequency of occurrence (aggregation) by the sum of the frequencies of occurrence (aggregation) of reference sequence data α and of each query sequence data (n = 1, 2, ... m). Likewise, the probability density of each query sequence data was calculated by dividing its frequency of occurrence (aggregation) by the same sum. Here, query sequence data (n = 1) contains various unique sequence data that differ from reference sequence α by one base. The probability density of query sequence data (n = 1) is therefore the sum of the probability densities of the individual unique sequence data that differ from reference sequence α by one base.
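The probability density calculation in (b) can be sketched as follows. The sketch assumes that the number of base substitutions is measured as the Hamming distance between equal-length sequences and that the counts are held in a dictionary; the function names and the toy counts are illustrative, not taken from the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def probability_densities(reference, unique_counts, m):
    """Return {n: density} for n = 0..m base substitutions relative to `reference`."""
    by_n = defaultdict(int)
    for seq, count in unique_counts.items():
        if len(seq) != len(reference):
            continue  # query sequences must have the same length as the reference
        n = hamming(reference, seq)
        if n <= m:
            by_n[n] += count  # all unique sequences with the same n are pooled
    total = sum(by_n.values())  # reference count plus all query counts
    return {n: by_n[n] / total for n in range(m + 1)}


# Toy counts: one reference, two single-substitution variants, one double variant,
# one sequence that is too distant (n = 4) and one of a different length.
counts = {"AAAAAA": 80, "AAAAAT": 8, "AAAATA": 7, "AAAATT": 4, "AATTTT": 1, "AAAA": 50}
density = probability_densities("AAAAAA", counts, m=2)
```

Here the two single-substitution variants are pooled into one density for n = 1, matching the statement that the density for n = 1 is the sum over the individual unique sequences.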

(c-1) When the histogram for the reference sequence follows a Poisson distribution
For the integration sequence data group for reference sequence α, a histogram for reference sequence α was created, ordered by the number of base substitutions relative to reference sequence α (FIG. 1a). When, in this histogram, the distribution of the probability density of reference sequence data α and the probability densities of the query sequence data (n = 1 to 5) followed a Poisson distribution (FIG. 1b), the query sequence data (n = 1 to 5) in the integration sequence data group for reference sequence α were integrated into reference sequence data α (n = 0) to generate integrated unique sequence data for reference sequence α (FIG. 1c).
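The text does not state how agreement with a Poisson distribution is judged. The sketch below shows one crude possibility under stated assumptions: the Poisson parameter is estimated as the mean number of substitutions, and every observed density must lie within an arbitrary absolute tolerance of the Poisson probability. Both the criterion and the tolerance value are assumptions for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def follows_poisson(density, tolerance=0.05):
    """Crude check: each observed density is within `tolerance` of the Poisson pmf.

    `density` maps the number of substitutions n to its probability density.
    The tolerance-based criterion is an assumption, not the method of the example.
    """
    lam = sum(n * p for n, p in density.items())  # mean number of substitutions
    return all(abs(p - poisson_pmf(n, lam)) <= tolerance for n, p in density.items())


# Toy histograms: the first decays like random errors, the second has a second peak.
error_like = {0: 0.80, 1: 0.18, 2: 0.02}
mixed = {0: 0.60, 1: 0.05, 2: 0.35}
is_error_like = follows_poisson(error_like)
is_mixed_poisson = follows_poisson(mixed)
```

A histogram with an unexpected second peak, as in `mixed`, fails the check, which corresponds to the situation handled in (c-2).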

(c-2) When the histogram for the reference sequence does not follow a Poisson distribution
For the integration sequence data group for reference sequence β, a histogram for reference sequence β was created, ordered by the number of base substitutions relative to reference sequence β (FIG. 1d). In this histogram, the distribution of the probability density of reference sequence data β and the probability densities of the query sequence data (n = 1 to 5) did not follow a Poisson distribution, so the following processing was performed.

When the occurrence rate of a specific unique sequence data item contained in specific query sequence data (for example, n = 2), that is, [the frequency of occurrence (aggregation) of that unique sequence] / [the sum of the frequencies of occurrence (aggregation) of reference sequence data β and of each query sequence data (n = 1, 2, ... m)], is judged to be an outlier in terms of probability, that unique sequence data is presumed to show a unique sequence distinct from reference sequence β. Based on this presumption, the unique sequence data showing the distinct unique sequence (hereinafter "reference sequence data γ") was excluded from the integration sequence data group for reference sequence β and returned to the original cleanup sequence data group.

In addition, in the integration sequence data group for reference sequence β from which reference sequence data γ had been excluded, sequence data that carry the same base substitutions as the base sequence of reference sequence data γ (the same type of substitution at the same base position) and show base sequences differing from reference sequence γ by one, two, ... or l bases were excluded from the integration sequence data group for reference sequence β and returned to the original cleanup sequence data group. This is because sequence data carrying the same base substitutions as the base sequence of reference sequence data γ are regarded as showing sequences in which random errors have arisen in the base sequence of reference sequence data γ.

When the integration sequence data group for reference sequence β, after exclusion of the sequence data related to the sequence of reference sequence data γ, still contained unique sequence data whose probability density was judged to be an outlier in terms of probability, the above exclusion was repeated.
When, in the integration sequence data group for reference sequence β after exclusion of the sequence data showing unique sequences distinct from reference sequence β, the distribution of the probability density of reference sequence data β and the probability densities of its query sequence data (n = 1, 2, ... m) followed a Poisson distribution (FIG. 1e), the query sequence data (n = 1, 2, ... m) were integrated into reference sequence data β.

The sequence data used for integrating sequencing errors are removed from the cleanup sequence data group, and the sequencing error integration described above is repeated on the remaining cleanup sequence data group. In the course of this repetition, sequencing errors were also integrated for reference sequence data δ (FIG. 1f).

(d) Generation of the sequence data group after integration of sequencing errors
On the sequence data group from which the sequence data used for integrating sequencing errors had been removed, the integration of sequencing errors was repeated until the highest frequency of occurrence (aggregation) became 1. Once the highest frequency of occurrence (aggregation) reached 1, the integrated unique sequence data generated above were collected to produce an integrated unique sequence data group.
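The overall loop of (a) to (d) can be outlined with a deliberately simplified sketch. It assumes a fixed maximum number of substitutions `m` (whereas the example derives m from an expected value) and omits the Poisson check and the outlier handling of (c-2), so it merges every same-length sequence within `m` substitutions of the current reference. It also returns only the integrated records and leaves the remaining singletons out, which is one reading of the text. All names are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def integrate_errors(unique_counts, m=2):
    """Greedily merge presumed sequencing-error variants into their reference.

    Simplified sketch: no Poisson test and no outlier exclusion are performed.
    """
    remaining = dict(unique_counts)
    integrated = []  # (reference sequence, integrated frequency of occurrence)
    while remaining:
        reference = max(remaining, key=remaining.get)  # most frequent sequence
        if remaining[reference] <= 1:
            break  # stop once the highest frequency of occurrence is 1
        group = [s for s in remaining
                 if len(s) == len(reference) and hamming(reference, s) <= m]
        integrated.append((reference, sum(remaining[s] for s in group)))
        for s in group:
            del remaining[s]  # merged data are removed before the next round
    return integrated


# Toy counts with two abundant sequences, their variants and one singleton.
counts = {"AAAAAA": 50, "AAAAAT": 3, "AAATTA": 1,
          "CCCCCC": 20, "CCCCCG": 2, "GGGGGG": 1}
result = integrate_errors(counts, m=2)
```

With these toy counts, the variants of `AAAAAA` and `CCCCCC` are absorbed into their references and the unrelated singleton `GGGGGG` ends the loop.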

For example, for the IgG2 antibodies of the first blood sample, this integration of sequencing errors greatly reduced the number of sequence data items in the sequence data group, from about 69,000 (C) to about 2,000 (D) (Table a). Creating a molecular phylogenetic tree for a sequence data group whose size has been reduced by the integration of sequencing errors can therefore reduce the amount of computation.

In addition to the information contained in the unique sequence data, the integrated unique sequence data further contains information on the number of sequence data items integrated into it (frequency of occurrence (integration)).

(E) Generation of classified sequence data groups by classification
Antibodies are known to mature through V(D)J recombination followed by affinity maturation.
In V(D)J recombination, an antibody is produced by combining one each of the multiple types of V gene fragments, D gene fragments and J gene fragments present on the genome sequence. Bases can be randomly deleted or added at the junctions between these fragments. V(D)J recombination can therefore generate antibodies of various sequence lengths carrying various combinations of V(D)J gene fragments.
In affinity maturation, antibodies undergo point mutations at high frequency in their variable regions. These point mutations are mainly base substitutions; deletions and insertions of bases are rare. Affinity maturation can therefore generate various antibodies that have the same sequence length but different amino acid sequences.

To track over time how a specific antibody matures in response to repeated antigen administration, we considered it reasonable to classify the antibody population in the blood samples according to the antibody produced by V(D)J recombination, and then to track the mutations (base substitutions) arising in the variable region of the V gene region and J gene region of the antibodies in each such group.

(E-1) Classification based on sequence length
Taking the V(D)J recombination process described above into account, the sequence data in the sequence data group were classified by sequence length (FIG. 2a), producing sequence length class sequence data groups, each composed of sequence data with the sequence length of its class.
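Classification by sequence length is a plain grouping operation. The following sketch assumes the sequences are available as strings; the function name and the toy sequences are illustrative.

```python
from collections import defaultdict


def classify_by_length(sequences):
    """Group sequences into classes keyed by their sequence length."""
    classes = defaultdict(list)
    for seq in sequences:
        classes[len(seq)].append(seq)
    return dict(classes)


# Toy sequences of length 6 and 9.
seqs = ["ATGGCC", "ATGGCA", "ATGGCCTTA", "TTGGCC"]
length_classes = classify_by_length(seqs)
```

Each resulting class can then be analyzed separately, so that a tree is built over far fewer sequences at a time.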

As noted above, affinity maturation of antibodies mainly introduces base substitutions and usually does not change the sequence length. Each class in the sequence length class sequence data groups is therefore expected to contain the various antibodies produced by V(D)J recombination together with the series of antibodies that subsequently underwent affinity maturation. Creating a molecular phylogenetic tree for each sequence length class sequence data group obtained by this classification thus reduces the amount of computation while making it possible to efficiently select the sequence data of affinity-matured antibodies contained in that class.

(E-2) Classification based on original sequence
Base substitutions during affinity maturation are concentrated in the portions of the antibody's V gene fragment and J gene fragment that correspond to the complementarity determining regions (CDRs). Outside the CDR sequences, the base sequences of the V and J gene fragments are almost identical to those of the V and J gene fragments on the genome sequence. We therefore reasoned that the base sequences of an antibody's V and J gene fragments could be used to infer which combination of V gene fragment and J gene fragment on the genome sequence gave rise to that antibody.

Indeed, when the base sequences of the antibody's V gene fragment and J gene fragment were searched against the alpaca genome sequence for sequence identity using BLAST, the genomic V gene fragment and J gene fragment corresponding to each could be inferred.

The sequence data groups were further classified based on the original sequences of the antibody's V gene fragment and J gene fragment. For example, the sequence length class sequence data group whose antibody variable region base sequence is 354 bp long was classified based on the original sequences of the V and J gene fragments (FIG. 2b). This classification based on the original sequence was performed on each sequence length class sequence data group to generate classified sequence data groups.

Each piece of sequence data classified based on the original sequence contains, in addition to the information contained in the integrated unique sequence data, information on its original V gene fragment and J gene fragment.

As described above, in antibody affinity maturation base substitutions occur mainly in the CDR sequences of the antibody, and the other base sequences usually remain unchanged. Each class obtained by classification based on the original sequence is therefore expected to efficiently gather a particular antibody produced by V(D)J recombination together with the series of antibodies derived from it by subsequent affinity maturation. Accordingly, creating a molecular phylogenetic tree for each class obtained by the classification based on the original sequence reduces the amount of computation while making it possible to select antigen-specific antibodies from the tree more efficiently.
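The two-level classification of (E-1) and (E-2) can be sketched as a single grouping step. This is only an illustrative sketch: the `UniqueSequence` record and the `classify` function are hypothetical names, and the V and J labels are assumed to have been assigned beforehand (for example by the BLAST search described above).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UniqueSequence:
    """Hypothetical record: a variable-region sequence plus its inferred origin."""
    sequence: str  # base sequence of the variable region
    v_gene: str    # inferred original V gene fragment (assumed already assigned)
    j_gene: str    # inferred original J gene fragment (assumed already assigned)


def classify(records):
    """Group records by (sequence length, V gene fragment, J gene fragment)."""
    classes = defaultdict(list)
    for rec in records:
        key = (len(rec.sequence), rec.v_gene, rec.j_gene)
        classes[key].append(rec)
    return dict(classes)
```

Each resulting class can then be passed to tree construction on its own, which is what keeps the per-tree computation small.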

(F) Generation of sequence cluster-forming sequence data groups
During affinity maturation, B cells presenting antibodies with relatively high affinity for an antigen are preferentially stimulated by that antigen and proliferate selectively. During this proliferation, mutations are introduced into the antibody variable region at a high rate, and antibodies with still higher affinity can be produced. Given this process, by the time an antigen-specific antibody with higher affinity is produced, there must have existed a series of antibodies whose base sequences are similar to that of the antigen-specific antibody and which made preferential stimulation by the antigen possible.
Therefore, an antigen-specific antibody that has undergone affinity maturation and shows higher affinity is expected, in molecular phylogenetic tree analysis, to form a sequence cluster together with a series of antibodies that have similar base sequences and relatively high affinity.

Conversely, B cells presenting antibodies with relatively low affinity for the antigen are not preferentially stimulated by the antigen and cannot proliferate selectively. As a result, mutations do not accumulate in the antibody genes encoding such relatively low-affinity antibodies.
Thus, for an antibody that has not undergone affinity maturation and shows relatively low affinity, few or almost no antibodies have a similar base sequence, and such an antibody is not expected to form a sequence cluster in molecular phylogenetic tree analysis.

From the above, excluding in advance from the sequence data group the antibody sequence data presumed not to form a sequence cluster in molecular phylogenetic tree analysis (hereinafter referred to as "sequence data that does not form sequence clusters") makes it possible to reduce the amount of computation required to create the molecular phylogenetic tree.

In this example, since the base sequences of the sequenced antibodies were roughly 300 bp to 500 bp long, "40" was set as the number of base substitutions corresponding to a sequence identity of about 90%.
For a specific piece of sequence data in any class of the classified sequence data groups based on sequence length and original sequence described above, when the total number of pieces of sequence data in that class showing sequences with up to 40 base substitutions relative to its sequence was below 10 (hereinafter referred to as "U40<10", etc.), the specific sequence data was evaluated as sequence data of an antibody that had not undergone affinity maturation (i.e., sequence data that does not form sequence clusters) and was excluded from the class.

Applying this exclusion of sequence data that does not form sequence clusters to all sequence data in the class leaves only sequence data for which at least 10 pieces of sequence data showing sequences with up to 40 base substitutions exist in the class (hereinafter referred to as "sequence cluster-forming sequence data"). Because such sequence cluster-forming sequence data is presumed to indicate antigen-specific antibodies that have undergone affinity maturation, a sequence data group composed of it (hereinafter also referred to as a "sequence cluster-forming sequence data group") is highly enriched in sequence data indicating antigen-specific antibodies.

This exclusion reduces the number of pieces of sequence data in the class and therefore the amount of computation needed to create the molecular phylogenetic tree. In addition, antigen-specific antibodies can be selected very efficiently from a molecular phylogenetic tree created for a sequence cluster-forming sequence data group.
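The exclusion rule of (F) can be sketched as a neighbor count within one class. The function names are hypothetical, and two points are assumptions the text leaves open: the sequence itself is not counted among its own neighbors, and the filter is applied in a single pass against the original class rather than repeated until stable.

```python
def hamming(a, b):
    """Number of base substitutions between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences in one class must have the same length")
    return sum(x != y for x, y in zip(a, b))


def filter_cluster_forming(members, max_subs=40, min_neighbors=10):
    """Keep sequences that have at least min_neighbors other class members
    within max_subs substitutions (the "U40>=10" style criterion).

    Assumptions: the sequence itself is not counted, and every sequence is
    evaluated against the unfiltered class (single pass).
    """
    kept = []
    for i, target in enumerate(members):
        neighbors = sum(
            1
            for j, other in enumerate(members)
            if j != i and hamming(target, other) <= max_subs
        )
        if neighbors >= min_neighbors:
            kept.append(target)
    return kept
```

The pairwise comparison is quadratic in the class size, which is one reason the preceding classification into small classes matters.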

(4) Creation of the molecular phylogenetic tree
A distance matrix was calculated from the sequence data group using the Jukes-Cantor 69 model, and a molecular phylogenetic tree was obtained from the distance matrix by the neighbor-joining method. The ape package of the statistical analysis software "R" was used for these calculations.
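For reference, the Jukes-Cantor 69 distance between two aligned sequences with a proportion p of differing sites is d = -(3/4) ln(1 - 4p/3). The sketch below computes only this distance matrix in plain Python; it is not the authors' code. The text names the ape package but not the functions called (in ape, `dist.dna` with `model = "JC69"` and `nj` would be the natural candidates, which is an assumption here). Neighbor joining itself is omitted.

```python
import math


def jc69_distance(a, b):
    """Jukes-Cantor 69 distance between two equal-length base sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)  # proportion of differing sites
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise JC69 distances."""
    return [[jc69_distance(a, b) for b in seqs] for a in seqs]
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.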

In this example, a molecular phylogenetic tree (U40≧0) was first created, without excluding sequence data that does not form sequence clusters, for the classified sequence data group with a sequence length of 354 bp and the original sequence (vhh3-ighJ-4) (FIG. 3a). Sequence data that does not form sequence clusters was then excluded (U40<10, U40<50), and a molecular phylogenetic tree was created for each resulting sequence cluster-forming sequence data group (FIG. 3b: U40≧10; FIG. 3c: U40≧50). Here, "U40≧10" and "U40≧50" mean that the number of pieces of sequence data showing sequences with up to 40 base substitutions relative to the base sequence of a given piece of sequence data (= "U40") is 10 or more and 50 or more, respectively.

Excluding sequence data that does not form sequence clusters from the sequence data group in advance not only reduced the amount of computation needed to create the molecular phylogenetic tree but also reduced the complexity of the resulting tree (for example, the number of branching points (nodes) and the number of branches). This made it easier to extract sequence clusters from the molecular phylogenetic tree, as described below.

(5) Extraction of sequence clusters from the molecular phylogenetic tree
The following processing was performed to extract sequence clusters containing sequence data of a series of antibodies expected to have undergone affinity maturation.
In the created molecular phylogenetic tree, when the length of a branch extending from a node joining branches was equal to or greater than a branch-length threshold, the portion of that branch lying farther from the root (the outer side) than the position corresponding to the threshold was extracted as a sequence cluster (FIG. 4a). In addition, in the molecular phylogenetic tree remaining after this extraction, when the number of nodes within a predetermined range was equal to or greater than a node-count threshold, that range was also extracted as a sequence cluster (b2 in FIG. 4b). This extraction yielded nine sequence clusters (b1 to b9) from the molecular phylogenetic tree shown in FIG. 4a (FIG. 4b).
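The branch-length rule can be illustrated on a toy rooted tree. This is a simplified sketch, not the procedure used in the example: the nested-dictionary tree format and the function names are hypothetical, the whole subtree below a long branch is taken as the cluster, and the second rule (node count within a range) is not implemented.

```python
def leaves(node):
    """Names of all leaves below (and including) a node."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaves(child))
    return names


def extract_clusters(node, threshold):
    """Walk outward from the root; whenever the branch leading to a child is at
    least `threshold` long, report the leaves below that branch as one cluster
    and do not descend further into it."""
    clusters = []
    for child in node["children"]:
        if child["length"] >= threshold:
            clusters.append(leaves(child))
        else:
            clusters.extend(extract_clusters(child, threshold))
    return clusters


# Toy tree: each node has a name, the length of the branch leading to it,
# and a list of children (empty for leaves). All values are made up.
toy_tree = {
    "name": "root", "length": 0.0, "children": [
        {"name": "A", "length": 0.01, "children": []},
        {"name": "X", "length": 0.20, "children": [
            {"name": "B", "length": 0.01, "children": []},
            {"name": "C", "length": 0.02, "children": []},
        ]},
    ],
}
```

With a threshold of 0.1, only the long branch leading to X is cut, so B and C are reported together as one cluster and A is left out.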

(6) Determining whether a sequence cluster contains sequence data indicating an antigen-specific antibody
Whether each sequence cluster extracted as described above contains sequence data indicating an antigen-specific antibody was determined as follows.
In antibody affinity maturation, point mutations occur at high frequency in the antibody variable region, and the antibody thereby matures into an antigen-specific antibody with high affinity for the antigen. The sequence of an affinity-matured antibody accumulates many point mutations as time passes after antigen administration. Compared with its original sequence, its base sequence is therefore expected to show a progressively lower sequence identity, or a progressively lower score in a homology search (where matching positions contribute a positive score).

Molecular phylogenetic trees were created from the sequence data groups as described in (4) and (5) (FIGS. 5a and 5b) (U40≧10). FIG. 5b is a molecular phylogenetic tree created from the sequence data group contained in the sequence cluster b1 (FIG. 4b) of the phylogenetic tree shown in FIG. 4a. For the antibody sequence data in the sequence clusters of these trees (the region enclosed by the red circle (closed curve) in FIG. 5a and the region enclosed by the red circle (closed curve) in FIG. 5b), the degree of sequence identity between each antibody sequence and its original sequence was calculated. Specifically, the identity between the sequence data of the V gene region and J gene region of each antibody and the original sequence of that gene region inferred by BLAST search was evaluated as a bit score (score value).

To show how the bit score changes with blood collection time for the antibody sequence data in each of the sequence clusters enclosed by the red circles (closed curves) in FIGS. 5a and 5b, scatter plots were created with the timing of blood collection after antigen administration (weeks) on the horizontal axis and the bit score on the vertical axis (FIGS. 5c and 5d). In FIGS. 5c and 5d, circles (○) indicate IgG2 sequence data and triangles (△) indicate IgG3 sequence data. The size of each plot point reflects the count of that unique sequence data (frequency of appearance (total)); larger points indicate a higher frequency of appearance (total). The color (shade) of each point indicates the blood collection time at which that antibody sequence first appeared (insets in FIGS. 5c and 5d).

(Slope of sequence identity)
FIG. 5c shows almost no change between the bit score calculated from the original sequence of the antibody before antigen administration (1st blood collection) and the bit scores calculated from the antibody sequence data at the 4th to 15th blood collections. This indicates that, under the protocol of this example, which includes six antigen administrations, no mutations accumulated in the antibodies shown in the sequence data of this sequence cluster (the red circle (region enclosed by a closed curve) in FIG. 5a); that is, the sequence cluster does not contain sequence data indicating antibodies specific to the administered antigen. This sequence cluster is therefore determined not to be a sequence cluster containing antigen-specific antibodies.

FIG. 5d shows that, compared with the bit score calculated from the original sequence of the antibody before antigen administration (1st blood collection), the bit scores calculated from the antibody sequence data at the 4th to 15th blood collections decreased at a constant rate. This suggests that, under the protocol of this example, which includes six antigen administrations, mutations accumulated progressively in the antibodies shown in the sequence data of this sequence cluster (the red circle (region enclosed by a closed curve) in FIG. 5b); that is, the sequence cluster contains sequence data indicating antigen-specific antibodies. This sequence cluster is therefore determined to be a sequence cluster containing antigen-specific antibodies.
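The slope criterion can be sketched as an ordinary least-squares fit of bit score against weeks after antigen administration. The function names and the cut-off `max_slope` are hypothetical; the text does not state how the slope was computed or which threshold was applied.

```python
def bit_score_slope(points):
    """Least-squares slope of bit score versus weeks.

    points: list of (weeks, bit_score) pairs for the sequences in one cluster.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def declining_identity(points, max_slope=-1.0):
    """True when the bit score falls by at least |max_slope| per week.
    max_slope is an illustrative cut-off, not a value from the source."""
    return bit_score_slope(points) <= max_slope
```

A cluster like that of FIG. 5c would give a slope near zero, while one like that of FIG. 5d would give a clearly negative slope.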

(Diversification of antibody types over time)
The color (shade) of the plot points in FIG. 5d changed from yellow (the light "3") through light blue (the relatively dark "9") to red (the dark "15") (FIG. 5d inset) as the number of weeks after antigen administration increased. This change in color (shade) shows that the number of antibody types increased through the repeated antigen administrations. In contrast, the color (shade) of the plot points in FIG. 5c remained green (the light "4-7"). The absence of a change in color (shade) shows that the antibody types did not change through the repeated antigen administrations.
These results suggest that, when the number of types of sequence data in a sequence cluster exceeds a threshold at some time point after antigen administration or increases over time, the sequence cluster contains antigen-specific antibodies. Such a sequence cluster is therefore determined to be a sequence cluster containing antigen-specific antibodies.
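The diversification criterion can be sketched by recording, for each unique sequence in a cluster, the week in which it first appeared. The function names and the example cut-off week are hypothetical and only illustrate the idea.

```python
def cumulative_types(first_seen):
    """first_seen maps a sequence id to the week of its first appearance.
    Returns (week, number of distinct sequences seen up to that week) pairs."""
    weeks = sorted(set(first_seen.values()))
    return [
        (week, sum(1 for w in first_seen.values() if w <= week))
        for week in weeks
    ]


def keeps_diversifying(first_seen, after_week):
    """True if at least one new sequence type first appears after `after_week`."""
    return any(week > after_week for week in first_seen.values())
```

A cluster whose sequences all first appear early (as in FIG. 5c) returns False for a later cut-off week, whereas a cluster that keeps gaining new sequences (as in FIG. 5d) returns True.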

(Class switch)
FIG. 5d contains both circle and triangle plot points, and their relative proportions change markedly over time. This shows that the sequence cluster contains both IgG2 and IgG3 antibodies and that the antibody class is switching between IgG2 and IgG3 (that is, antibody class switching is occurring). In contrast, in FIG. 5c the proportions of circle and triangle plot points were almost constant. This shows that almost no class switching occurred within that sequence cluster.
These results suggest that, when the antibody class shown in the sequence data within a sequence cluster changes over time, the sequence cluster contains antigen-specific antibodies. Such a sequence cluster is therefore determined to be a sequence cluster containing antigen-specific antibodies.
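The class-switch criterion can be sketched by tracking the fraction of IgG3 sequence data at each blood collection. The function names and the `min_change` cut-off are hypothetical; the text gives no numerical criterion for a "marked" change.

```python
from collections import Counter, defaultdict


def igg3_fraction_by_week(records):
    """records: iterable of (week, isotype) pairs, isotype being "IgG2" or "IgG3".
    Returns {week: fraction of IgG3 among that week's records}."""
    counts = defaultdict(Counter)
    for week, isotype in records:
        counts[week][isotype] += 1
    return {
        week: c["IgG3"] / sum(c.values())
        for week, c in sorted(counts.items())
    }


def class_switch_suspected(records, min_change=0.3):
    """True if the IgG3 fraction varies by at least min_change across weeks.
    min_change is an illustrative cut-off, not a value from the source."""
    fractions = list(igg3_fraction_by_week(records).values())
    return max(fractions) - min(fractions) >= min_change
```

Weighting each record by its frequency of appearance, rather than counting unique sequences once, would be a natural variation.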

(Sequence identity before antigen administration)
The bit score before antigen administration was about 430 in FIG. 5d, whereas it was about 350 in FIG. 5c. In this example, the bit score is larger the more closely a sequence matches its original sequence.
A likely reason for a small bit score before antigen administration is that the antibody had already undergone affinity maturation against an antigen other than the one administered in this example. Biologically, an antibody that has undergone such affinity maturation does not normally undergo affinity maturation against the antigen administered in this example. Indeed, as described above, the sequence cluster of FIG. 5c, which contains antibodies whose bit score before antigen administration is a small fraction of its maximum value, was concluded not to contain antibodies specific to the antigen administered in this example. Therefore, a sequence cluster containing antibodies whose bit score before antigen administration is a small fraction of its maximum value is determined not to contain antigen-specific antibodies.

(7) Measurement of the binding strength of antibodies reconstructed from sequence data in sequence clusters
Based on the base sequence of sequence data in the sequence cluster determined in (6) above to contain antigen-specific antibody sequence data, an antibody containing the corresponding variable region was reconstructed.
Specifically, based on the base sequence encoding the variable region of the antigen-specific antibody shown in the sequence data, DNA with codons optimized for expression in E. coli was synthesized and inserted into an E. coli expression vector (pAED4 or pCold3) to construct a VHH antibody expression plasmid. E. coli (BL21(DE3)pLysS) transformed with the plasmid was cultured under predetermined conditions to produce the VHH antibody having the corresponding variable region from the DNA, and the VHH antibody was recovered from the E. coli by a known method.

The binding strength (dissociation constant) of the recovered antibody for the administered antigen (Fab of human IgG antibody) was measured with a Biacore 2000 (GE Healthcare Japan). In the measurement, the recovered antibody (concentrations: 150 nM, 100 nM, 75 nM, 50 nM, 25 nM, and 12.5 nM) was brought into contact with the antigen Fab immobilized on a CM5 sensor chip (FIG. 5f).

As a result, the antibody having the variable region shown in the sequence data from the sequence cluster determined to contain antigen-specific antibody sequence data bound the antigen Fab with a dissociation constant of 0.5 nM. The results of Example 1 show that a sequence cluster determined to contain sequence data indicating an antigen-specific antibody does in fact contain the base sequence data of an antibody that binds the antigen with relatively strong binding strength (an antigen-specific antibody).

Similarly, an antibody containing the corresponding variable region was reconstructed from the base sequence of sequence data in the sequence cluster determined in (6) above not to contain antigen-specific antibody sequence data, and its binding strength for the antigen was measured (FIG. 5e). No binding to the antigen could be measured, showing that the sequence data was indeed not the base sequence data of an antigen-specific antibody.

[Example 2]
The binding strength of antibodies reconstructed from various sequence data in a sequence cluster determined to contain antigen-specific antibody sequence data was measured.
A molecular phylogenetic tree was created in the same manner as in Example 1 for a classified sequence data group different from that of FIG. 5b (sequence length: 381 bp; V gene fragment: VHH3-S8; J gene fragment: ighJ-6) (FIG. 6a). For the antibody sequence data in the region enclosed by the red circle (closed curve) in FIG. 6a, a scatter plot was created as described above, with the timing of blood collection after antigen administration (weeks) on the horizontal axis and the bit score on the vertical axis (FIG. 6b). From the slope of the plot and the change in color (shade), this sequence cluster was determined to be a sequence cluster containing antigen-specific antibodies.

Next, a molecular phylogenetic tree was created using the antibody sequence data contained in the sequence cluster enclosed by the red circle (closed curve) in FIG. 6a (FIG. 6c). Antibodies were reconstructed from each of the sequence data indicated by the dashed arrows in FIG. 6c (c1, c2, and c3) by substantially the same method as described in Example 1, and their binding strength was measured in the same manner as in Example 1.

The antibody reconstructed from the sequence data c1 in FIG. 6c had a binding constant for the antigen of 370 nM and was characterized by fast dissociation from the antigen or a high dissociation rate constant [nM/s]. The two antibodies reconstructed from the sequence data c2 and c3 in FIG. 6c had binding constants for the antigen of 68 nM and 90 nM, respectively, and both bound the administered antigen with higher binding strength than the c1 antibody. Their dissociation from the antigen was also relatively slow.

The sequence data c1 in FIG. 6c was obtained from the blood collected at week 6. The sequence data c2 and c3 in FIG. 6c, which showed stronger binding than c1, were obtained from the blood collected at week 13.

Example 2 showed that the sequence data in a sequence cluster determined to contain sequence data indicating an antigen-specific antibody all contain base sequences encoding variable regions of antibodies with relatively high binding strength. Example 2 also showed that, even within such a sequence cluster, there is sequence data whose base sequences encode variable regions of antibodies with differing characteristics.

Example 2 further indicated that, among the sequence data in a sequence cluster determined to contain sequence data indicating an antigen-specific antibody, sequence data of antibodies obtained after more antigen administrations and more weeks since antigen administration are more likely to represent antigen-specific antibodies with higher binding strength.

Claims

1. A program for causing a control unit of a computer to execute a method for identifying a sequence cluster containing sequence data indicating an antigen-specific antibody that is a candidate for antibody production, the method comprising:
preparing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation;
creating a molecular phylogenetic tree from the sequence data group; and
determining, based on a change in the degree of sequence identity between the antibody sequence shown in the sequence data in a sequence cluster in the molecular phylogenetic tree and the original sequence of the antibody, or on a change in the type of antibody, whether the sequence cluster contains sequence data indicating an antigen-specific antibody that is a candidate for antibody production,
wherein the original sequence is the sequence, in the genome sequence of the animal species corresponding to the animal from which the antibody was obtained, that has the highest degree of sequence identity with the antibody sequence shown in the sequence data.

2. The program according to claim 1, wherein the sequence data group obtained in the preparing step is a sequence data group obtained after sequencing,
the preparing step includes a step of aggregating a plurality of pieces of sequence data showing the same sequence into a single piece of sequence data (hereinafter referred to as "unique sequence data"), and
the unique sequence data includes information on the sequence shown in the sequence data and information on the number of pieces of sequence data aggregated (hereinafter referred to as "frequency of appearance (aggregated)").

3. The program according to claim 2, wherein the sequence data group is a unique sequence data group composed of the unique sequence data,
the unique sequence data is base sequence data,
within the unique sequence data group, the unique sequence data with the highest frequency of appearance (aggregated) is taken as reference sequence data (n=0), and unique sequence data having a sequence of the same length as the sequence of the reference sequence data (hereinafter referred to as the "reference sequence") and having n base substitutions (n=1, ..., m) relative to the reference sequence is taken as query sequence data, and
the preparing step includes a step of integrating into the reference sequence data any query sequence data for which the ratio of its frequency of appearance (aggregated) to the frequency of appearance (aggregated) of the reference sequence data is less than a threshold (hereinafter referred to as the "integration step").

4. The program according to any one of claims 1 to 3, wherein the preparation step includes either one or both of:
a step of classifying the sequence data in the sequence data group based on either or both of the sequence length and the original sequence on the genome sequence of the animal species corresponding to the animal (hereinafter referred to as the "classification step"); and
a step of excluding specific sequence data from the sequence data group when the number of pieces of sequence data regarding sequences having a predetermined degree of sequence identity with the sequence indicated by the specific sequence data in the sequence data group is less than a threshold regarding the number of pieces of sequence data (hereinafter referred to as the "exclusion step").
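
By way of illustration only, classification by sequence length followed by exclusion of sparsely supported sequences might be sketched as follows. The identity measure (fraction of matching positions between equal-length sequences), the cutoff `min_identity`, the count threshold `min_neighbors`, and the decision not to count a sequence as its own neighbor are all assumptions made for this sketch.

```python
from collections import defaultdict

def identity(a, b):
    """Fraction of identical positions; assumes equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def classify_by_length(seqs):
    """Group sequences into classes keyed by sequence length."""
    classes = defaultdict(list)
    for s in seqs:
        classes[len(s)].append(s)
    return dict(classes)

def exclude_sparse(seqs, min_identity=0.75, min_neighbors=2):
    """Drop sequences that have fewer than min_neighbors similar sequences."""
    kept = []
    for i, s in enumerate(seqs):
        neighbors = sum(
            1 for j, t in enumerate(seqs)
            if j != i and identity(s, t) >= min_identity
        )
        if neighbors >= min_neighbors:
            kept.append(s)
    return kept

# Hypothetical toy data, not taken from the source.
seqs = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TTTTTTTT", "ACGT"]
classes = classify_by_length(seqs)
kept = exclude_sparse(classes[8])
```

Within the length-8 class, `TTTTTTTT` has no sufficiently identical neighbor and is excluded, while the three mutually similar sequences remain.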

5. The program according to claim 4 as dependent on claim 3, wherein
the preparation step includes the integration step, the classification step, and the exclusion step, and performs the integration step, the classification step, and the exclusion step in this order.

6. The program according to claim 4, wherein the classification based on the original sequence includes:
a step of calculating a degree of sequence identity by a homology search between either or both of the V gene fragment and the J gene fragment constituting the variable region of the antibody indicated by the sequence data in the sequence data group and the genome sequence of the animal species corresponding to the animal; and
a step of assigning the sequence data to a class associated with the gene fragment showing the highest degree of sequence identity.

7. The program according to claim 1, wherein the change in sequence identity is a change in sequence identity per unit time, and the change in antibody type is an antibody class switch.

8. The program according to any one of claims 1 to 7, further causing the control unit to execute a step of selecting sequence data indicating an antigen-specific antibody that is a candidate for antibody production from sequence clusters determined to include sequence data indicating an antigen-specific antibody (hereinafter referred to as "selection step"), wherein
the selection step includes a step of comparing, with a threshold value, the degree of sequence identity between the sequence of the antibody indicated by the sequence data in the sequence data group and the original sequence of the antibody, and
the original sequence is the sequence, within the genome sequence of the animal species corresponding to the animal from which the antibody was obtained, that has the highest degree of sequence identity with the antibody sequence indicated by the sequence data.
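
By way of illustration only, the comparison against the original sequence might be sketched as follows. The claim does not state on which side of the threshold a candidate falls, so the direction is exposed as the hypothetical parameter `keep_below`. The germline table, the threshold value, and the position-wise identity measure on equal-length sequences are likewise assumptions; an actual homology search would presumably be alignment-based.

```python
def identity(a, b):
    """Fraction of identical positions; assumes equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def original_sequence(antibody_seq, germline):
    """Return (name, identity) of the germline entry closest to antibody_seq."""
    name = max(germline, key=lambda g: identity(antibody_seq, germline[g]))
    return name, identity(antibody_seq, germline[name])

def select_candidates(cluster, germline, threshold=0.9, keep_below=True):
    """Compare each sequence's identity to its original sequence with a threshold."""
    selected = []
    for seq in cluster:
        name, ident = original_sequence(seq, germline)
        passes = ident < threshold if keep_below else ident >= threshold
        if passes:
            selected.append((seq, name, ident))
    return selected

# Hypothetical toy data, not taken from the source.
germline = {"V1": "ACGTACGTAC", "V2": "TTGTACGTTT"}
cluster = ["ACGTACGTAC", "ACGAACCTAC", "TTGTACGTTA"]
selected = select_candidates(cluster, germline)
```

With these toy values, only `ACGAACCTAC` (identity 0.8 to `V1`) falls below the threshold; setting `keep_below=False` would instead retain the two sequences at or above it.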

9. A determination system for determining whether a sequence cluster includes sequence data indicating an antigen-specific antibody that is a candidate for antibody production, the determination system comprising a control unit and a storage unit, wherein
the storage unit stores a database containing a sequence data group containing sequence data of antibodies obtained from an animal that may have received immune stimulation, and
the program according to any one of claims 1 to 7 is loaded into the control unit, and the control unit executes a method for identifying a sequence cluster containing sequence data indicating an antigen-specific antibody that is a candidate for antibody production.

10. The determination system according to claim 9, wherein
the program further causes the control unit to execute a step of selecting sequence data indicating an antigen-specific antibody that is a candidate for antibody production from sequence clusters determined to include sequence data indicating an antigen-specific antibody (hereinafter referred to as "selection step"),
the selection step includes a step of comparing, with a threshold value, the degree of sequence identity between the sequence of the antibody indicated by the sequence data in the sequence data group and the original sequence of the antibody, and
the original sequence is the sequence, within the genome sequence of the animal species corresponding to the animal from which the antibody was obtained, that has the highest degree of sequence identity with the sequence of the antibody indicated by the sequence data.

11. A method for producing an antigen-specific antibody, the method comprising the steps of:
selecting sequence data using the determination system according to claim 10; and
producing an antibody having the amino acid sequence indicated by the sequence data.