JP7196579B2

JP7196579B2 - Data processor, analyzer, data processing method and program

Info

Publication number: JP7196579B2
Application number: JP2018229870A
Authority: JP
Inventors: 華奈江寺本
Original assignee: Shimadzu Corp
Current assignee: Shimadzu Corp
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2022-12-27
Anticipated expiration: 2038-12-07
Also published as: JP2020091783A

Description

The present invention relates to a data processing device, an analysis device, a data processing method, and a program.

Proposals have been made for calculating the degree of similarity among a plurality of types of bacteria by various methods and displaying it in an easy-to-understand manner. Non-Patent Document 1 describes a heat map showing, for a plurality of species, an index of the between-species homology of the base sequences of genes or of the amino acid sequences encoded by those genes. Non-Patent Document 2 describes a graph showing the relationship between the sequence homology of a specific gene between a type strain and other organisms and the proportion of ribosomal proteins observed in common between the type strain and those organisms.

Non-Patent Document 1: Contreras-Moreira B and Vinuesa P, "get_homologues-est manual" [online], May 16, 2018, University of Leeds and Macquarie University, [retrieved September 27, 2018], Internet (URL: http://eead-csic-compbio.github.io/get_homologues/manual-est/)
Non-Patent Document 2: Teramoto K, Sato H, Sun L, Torimura M, Tao H, Yoshikawa H, Hotta Y, Hosoda A, Tamura H. "Phylogenetic classification of Pseudomonas putida strains by MALDI-MS using ribosomal subunit proteins as biomarkers", Anal Chem (USA), American Chemical Society, October 16, 2007, Volume 79, Issue 22, pp. 8712-8719

When the base sequences of genes, or the amino acid sequences of the proteins they encode, are compared as in Non-Patent Document 1, the base sequence of the genes in each organism is required in principle. Gene sequencing, however, still involves considerable labor, time, and cost. In addition, the possibility of sequence read errors and annotation errors cannot be ruled out in principle. The method of Non-Patent Document 2 has the problem that it reveals only the homology between each organism and the type strain, and not the homology among the organisms themselves.

A data processing device according to a preferred embodiment of the present invention comprises: a similarity calculation unit that calculates a similarity between a first organism and a second organism based on whether a target molecule contained in common in a first sample derived from the first organism and a second sample derived from the second organism is detected, in mass spectrometry of each of the first sample and the second sample, at the same m/z, the m/z being calculated from the structure of the target molecule or obtained by past measurement, and that calculates the similarity for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms; an output image generation unit that generates an output image including a similarity image that maps and displays the similarities among the three or more types of organisms; and an output unit that outputs the output image.
In a further preferred embodiment, the similarity calculation unit calculates the similarity between the first organism and the second organism based on which of a plurality of m/z values corresponding to one type of target molecule at least one of the target molecules is detected at.
In a further preferred embodiment, the output image generation unit generates the similarity image such that at least one of the hue, brightness, and saturation of the portion corresponding to a similarity differs according to the value of that similarity.
In a further preferred embodiment, the output image generation unit generates the output image including the similarity image and information about similarities among the three or more types of organisms calculated by a method different from the similarity calculation performed by the similarity calculation unit.
In a further preferred embodiment, the information is a phylogenetic tree of the three or more types of organisms.
In a further preferred embodiment, the target molecule is a protein, and the similarity calculated by the different method is a similarity calculated based on the homology of the base sequences of the nucleic acids corresponding to the protein.
In a further preferred embodiment, the target molecule is a sugar chain, a lipid, a protein, or a nucleic acid.
In a further preferred embodiment, the target molecule is a protein or a nucleic acid, and the similarity calculation unit calculates the similarity between the first organism and the second organism based on whether the target molecule is detected in the mass spectrometry at the same m/z derived from the amino acid sequence or base sequence of the target molecule.
An analysis device according to a preferred embodiment of the present invention comprises the data processing device described above and a mass spectrometer.
A data processing method according to a preferred embodiment of the present invention comprises: calculating a similarity between a first organism and a second organism based on whether a target molecule contained in common in a first sample derived from the first organism and a second sample derived from the second organism is detected, in mass spectrometry of each of the first sample and the second sample, at the same m/z, the m/z being calculated from the structure of the target molecule or obtained by past measurement, and calculating the similarity for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms; generating an output image including a similarity image that maps and displays the similarities among the three or more types of organisms; and outputting the output image.
A program according to a preferred embodiment of the present invention causes a processing device to perform: a similarity calculation process of calculating a similarity between a first organism and a second organism based on whether a target molecule contained in common in a first sample derived from the first organism and a second sample derived from the second organism is detected, in mass spectrometry of each of the first sample and the second sample, at the same m/z, the m/z being calculated from the structure of the target molecule or obtained by past measurement, and of calculating the similarity for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms; and an output image generation process of generating an output image including a similarity image that maps and displays the similarities among the three or more types of organisms.

According to the present invention, the similarities among a plurality of organisms can be calculated quickly and displayed in an easy-to-understand manner.

FIG. 1 is a conceptual diagram showing the configuration of an analysis device according to one embodiment.
FIG. 2 shows tables (Table A, Table B, and Table C) indicating, for three strains of P. acnes, whether a peak corresponding to each ribosomal protein was detected.
FIG. 3 is a diagram showing an image that indicates the similarities among the three strains of P. acnes.
FIG. 4 is a flowchart showing the flow of an analysis method according to one embodiment.
FIG. 5 is a diagram showing a table of the similarities among 23 strains of P. acnes together with a phylogenetic tree of those strains.
FIG. 6 is a conceptual diagram for explaining how the program is provided.
FIG. 7 is a mass spectrum of P. acnes obtained in the Example.

Embodiments for carrying out the present invention are described below with reference to the drawings.

-First Embodiment-
The first embodiment describes an analysis device including a data processing device that treats each of a plurality of samples as derived from a different organism, calculates the similarities among these organisms, and creates an image that maps and shows those similarities.

FIG. 1 is a conceptual diagram showing the configuration of the analysis device 1. The analysis device 1 includes a measurement unit 100 and an information processing unit 40. The information processing unit 40 constitutes a data processing device.
Some or all of the functions of the information processing unit 40 may be located in a computer, server, or the like that is physically separate from the measurement unit 100.

The measurement unit 100 includes an ionization unit 10 that generates an ionized sample (hereinafter, sample ions S) by matrix-assisted laser desorption/ionization (hereinafter, MALDI), an ion acceleration unit 21, a mass separation unit 22, and a detection unit 30. The ion acceleration unit 21 includes an acceleration electrode 210. The mass separation unit 22 includes a time-of-flight mass analyzer 220. In FIG. 1, the movement of the sample ions S is indicated schematically by arrow A1.

The information processing unit 40 includes an input unit 41, a communication unit 42, a storage unit 43, an output unit 44, and a control unit 50. The control unit 50 includes a device control unit 51, a data processing unit 52, an output image generation unit 53, and an output control unit 54. The data processing unit 52 includes a mass spectrum generation unit 521 and a similarity calculation unit 522. Arrow A2 schematically indicates the flow of the detection signal of the sample ions S output from the detection unit 30 of the measurement unit 100. Arrow A3 schematically indicates the control of the measurement unit 100 by the device control unit 51.

(Regarding the sample)
The sample is not particularly limited as long as it contains target molecules derived from an organism. Because the similarities are mapped and displayed (see FIG. 3), three or more samples, each derived from one of three or more organisms, are prepared. A plurality of target molecules are selected from the molecules contained in common in the three or more types of organisms from which these samples are derived, and they are preferably sugar chains, lipids, proteins, or nucleic acids. The organism is preferably a microorganism, more preferably a eubacterium, but is not particularly limited as long as it produces the target molecules.

When the organism is a microorganism, a colony obtained by culturing is collected, a solution containing a matrix (hereinafter, matrix solution) is added to the cells that make up the colony, and the mixture is dropped onto a MALDI sample plate and dried to prepare the sample. Alternatively, the matrix solution may be added after the cells have been placed on the MALDI sample plate. The type of matrix is not particularly limited, but CHCA (α-cyano-4-hydroxycinnamic acid), sinapinic acid, DHB (2,5-dihydroxybenzoic acid), THAP (2,4,6-trihydroxyacetophenone), or DAN (1,5-diaminonaphthalene) is preferable for accurate mass spectrometry. As the solvent of the matrix solution, an aqueous solution containing several tens of volume percent of an organic solvent such as acetonitrile, to which 0 to 3 volume percent of trifluoroacetic acid (TFA) has been added, or the like can be used.
The sample may also be prepared by extracting the target molecules from the cells contained in the obtained colony and then adding the matrix solution to the extract. When samples are prepared from organisms other than microorganisms, known pretreatment methods and the like can be used as appropriate.

The measurement unit 100 includes a mass spectrometer; it ionizes the sample, separates the ions by mass, and detects them.

The ionization unit 10 of the measurement unit 100 includes an ion source having a sample plate holder (not shown) that supports the MALDI sample plate and a laser device (not shown) that irradiates the MALDI sample plate with laser light, and ionizes the sample by irradiating it with the laser.
The ionization method is not particularly limited as long as the target molecules can be ionized; besides MALDI, any ionization method such as electrospray ionization (ESI) can be used.

The ion acceleration unit 21 includes the acceleration electrode 210 and accelerates the introduced sample ions S. The flow of accelerated sample ions S is focused as appropriate by an ion lens (not shown) or the like and introduced into the mass separation unit 22.

The mass separation unit 22 includes the time-of-flight mass analyzer 220 and separates the sample ions S according to the differences in the time each ion takes to fly through the flight tube of the time-of-flight mass analyzer 220.
Although FIG. 1 shows a linear time-of-flight mass analyzer, a reflectron type, a multi-turn type, or the like may be used. The mass spectrometry method is not particularly limited as long as the sample ions S can be separated and detected with the desired accuracy, and any mass analyzer such as an ion trap or a quadrupole mass filter can be used.

The detection unit 30 includes an ion detector such as a microchannel plate, detects the sample ions S separated by the mass separation unit 22, and outputs a detection signal whose intensity corresponds to the amount of sample ions S incident on the detection unit 30. The detection signal output from the detection unit 30 is A/D converted and then stored as measurement data in the storage unit 43 of the information processing unit 40 (arrow A2).

The input unit 41 of the information processing unit 40 includes an input device such as a mouse, a keyboard, various buttons, or a touch panel. The input unit 41 receives, from the user of the analysis device (hereinafter simply "user"), information and the like necessary for measurement by the measurement unit 100 and for processing by the control unit 50. The communication unit 42 of the information processing unit 40 includes a communication device capable of communicating over a wireless or wired connection such as the Internet. The communication unit 42 transmits and receives data as appropriate, such as information necessary for measurement by the measurement unit 100 and for processing by the control unit 50.

The storage unit 43 of the information processing unit 40 includes a nonvolatile storage medium. The storage unit 43 stores the measurement data, the data necessary for processing by the control unit 50, the data obtained by that processing, the programs that the control unit 50 executes, and the like. The output unit 44 of the information processing unit 40 includes a display device such as a liquid crystal monitor, a printer, and the like, and outputs the data processed by the data processing unit 52, the output image generated by the output image generation unit 53, and the like by displaying them on the display device or printing them on paper.

The control unit 50 of the information processing unit 40 includes a processor such as a CPU and serves as the main agent of the operations that control the analysis device 1. The control unit 50 performs various processes by executing programs stored in the storage unit 43 or the like.

The device control unit 51 of the control unit 50 controls the operation of the measurement unit 100 based on analysis conditions set in accordance with input from the input unit 41 or the like.

The data processing unit 52 of the control unit 50 processes the measurement data.

The mass spectrum generation unit 521 of the data processing unit 52 generates data corresponding to a mass spectrum (hereinafter, mass spectrum data) from measurement data that includes the intensity of the ions detected by the detection unit 30 and the flight time of those ions. The mass spectrum generation unit 521 converts flight times into m/z values based on calibration data obtained in advance, and generates mass spectrum data indicating the intensity corresponding to each m/z value.
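The conversion from flight time to m/z can be sketched as follows. This is only an illustration: the source does not state the form of the calibration data, so the sketch assumes the commonly used two-point quadratic model m/z = a (t - t0)^2. The function names and the calibrant numbers are hypothetical.

```python
import math


def fit_calibration(t1, mz1, t2, mz2):
    """Fit m/z = a * (t - t0)**2 from two calibrant peaks (assumed model).

    sqrt(m/z) is linear in t under this model, so two points suffice.
    """
    r1, r2 = math.sqrt(mz1), math.sqrt(mz2)
    slope = (r2 - r1) / (t2 - t1)   # equals sqrt(a)
    t0 = t1 - r1 / slope            # time offset
    return slope ** 2, t0


def tof_to_mz(t, a, t0):
    """Convert one flight time to an m/z value."""
    return a * (t - t0) ** 2


def to_mass_spectrum(times, intensities, a, t0):
    """Pair each converted m/z value with its detected intensity."""
    return [(tof_to_mz(t, a, t0), i) for t, i in zip(times, intensities)]


# Hypothetical calibrants: flight times in microseconds and known m/z values.
a, t0 = fit_calibration(10.0, 1000.0, 20.0, 4000.0)
spectrum = to_mass_spectrum([15.0], [42.0], a, t0)
```

With these made-up calibrants the fit gives a = 10 and t0 = 0, so a flight time of 15 maps to an m/z of 2250.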

The similarity calculation unit 522 of the data processing unit 52 calculates the similarity between a first organism and a second organism, from which two of the three or more samples (hereinafter, the "first sample" and the "second sample") are respectively derived, based on the two sets of mass spectrum data obtained by mass spectrometry of those two samples. The mass spectra obtained by mass spectrometry of the first sample and the second sample are called the first mass spectrum and the second mass spectrum, respectively.

The similarity calculation unit 522 refers to the storage unit 43 and acquires a plurality of m/z values corresponding to a plurality of target molecules. At least some of the target molecules have a plurality of variants among the three or more organisms whose similarity is to be calculated (hereinafter, target organisms). The storage unit 43 stores, as reference data, the m/z values at which the target molecules and their variants are detected by mass spectrometry. These m/z values are either calculated in advance from the structure of the target molecule, such as the amino acid sequence of a protein, or obtained from past measurements by the user or others.

In the following, the target molecules are proteins (target proteins) and the target organisms are a plurality of different strains of a microorganism of the same genus and species. Some of the target proteins are assumed to have a plurality of variants whose amino acid sequences differ depending on the target organism, that is, the strain (hereinafter, target variants).

The reference data in the storage unit 43 is organized as follows. When a target protein has no variants among the target organisms, the target protein is stored in association with its corresponding m/z value. When a target protein has a plurality of target variants among the target organisms, the target protein, each target variant, and the m/z value corresponding to each target variant are stored in association with one another.

For example, suppose the target organisms are P. acnes strains A, B, and C, and the target proteins are the ribosomal proteins S15, S19, S08, L09, S18, L27, S17, and L23. Suppose further that, among strains A, B, and C, S17 has three different target variants (hereinafter S17(1), S17(2), and S17(3)) and L23 has two different target variants (hereinafter L23(1) and L23(2)). These target variants are detected at m/z values that differ from one another. In this case, the reference data associates the IDs of S15, S19, S08, L09, S18, and L27, which have no target variants, with their corresponding m/z values. The ID of S17, which has target variants, is associated with the IDs of the target variants S17(1), S17(2), and S17(3), and each of these variant IDs is associated with its corresponding m/z value. L23 is handled in the same way as S17.
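One possible in-memory layout for this reference data is sketched below. The nested-dictionary structure and the helper names are assumptions made for illustration, and the m/z numbers are round placeholders, not measured or calculated masses; the source gives no numeric values.

```python
# Target protein ID -> {variant ID -> m/z}.
# A protein without variants maps to a single entry under its own ID.
# All m/z values are PLACEHOLDERS for illustration only.
REFERENCE_DATA = {
    "S15": {"S15": 1000.0},
    "S19": {"S19": 1100.0},
    "S08": {"S08": 1200.0},
    "L09": {"L09": 1300.0},
    "S18": {"S18": 1400.0},
    "L27": {"L27": 1500.0},
    "S17": {"S17(1)": 1600.0, "S17(2)": 1610.0, "S17(3)": 1620.0},
    "L23": {"L23(1)": 1700.0, "L23(2)": 1710.0},
}


def has_variants(protein):
    """True if the target protein has more than one target variant."""
    return len(REFERENCE_DATA[protein]) > 1


def flat_reference():
    """Flatten to {variant ID -> m/z} for peak lookup."""
    return {
        variant: mz
        for variants in REFERENCE_DATA.values()
        for variant, mz in variants.items()
    }
```

Keeping the variants grouped under their parent protein makes it straightforward to count a protein once when computing a similarity, regardless of how many variants it has.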

The similarity calculation unit 522 detects peaks corresponding to the m/z values of the target proteins and target variants in the mass spectrum obtained by mass spectrometry of the sample derived from each target organism. Specifically, when a peak is detected within the tolerance for m/z variation (allowable error), determined from the accuracy of the mass spectrometry, around one of the m/z values in the reference data, the target protein or target variant corresponding to that m/z is regarded as present in the target organism corresponding to that mass spectrum. Whether each target protein or target variant was detected in a target organism is stored in the storage unit 43 as, for example, a binary value (hereinafter, detection confirmation value).
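The tolerance check can be sketched as below. The sketch assumes that peak picking has already produced a list of peak m/z values, and it expresses the tolerance in ppm; the source says only that the tolerance is determined from the accuracy of the mass spectrometry, so the ppm form, the default of 500 ppm, and all numbers here are hypothetical.

```python
def detect(peaks_mz, reference_mz, tolerance_ppm=500.0):
    """Return 1 if any peak lies within the tolerance of reference_mz, else 0."""
    tolerance = reference_mz * tolerance_ppm * 1e-6
    return int(any(abs(p - reference_mz) <= tolerance for p in peaks_mz))


def detection_values(peaks_mz, reference, tolerance_ppm=500.0):
    """Detection confirmation value (1 or 0) for every reference entry."""
    return {
        name: detect(peaks_mz, mz, tolerance_ppm)
        for name, mz in reference.items()
    }


# Placeholder reference m/z values and picked peaks.
reference = {"S15": 10000.0, "S17(1)": 9000.0, "S17(2)": 9030.0}
peaks = [10002.0, 9031.0]
values = detection_values(peaks, reference)
```

Here the peak at 10002.0 falls within 5 m/z units of S15 and the peak at 9031.0 within about 4.5 units of S17(2), while no peak is close to S17(1).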

The similarity calculation unit 522 calculates the similarity between the first organism and the second organism based on whether each target protein was detected within the tolerance around the same m/z in both the first mass spectrum and the second mass spectrum.

FIG. 2 tabulates the detection confirmation values for each strain, where the value is 1 when a target protein or target variant was detected in the mass spectrum obtained by mass spectrometry of P. acnes strain A (P_acnes_A), B (P_acnes_B), or C (P_acnes_C), and 0 when it was not. To allow comparison between two samples at a time, FIG. 2 shows the detection confirmation values for strains A and B in Table A, for strains A and C in Table B, and for strains B and C in Table C.

As shown in Table A, S15, S19, S08, L09, S18, L27, S17(2), and L23(1) were detected by mass spectrometry in P. acnes strain A. In strain B, S15, S19, S08, L09, S18, L27, S17(3), and L23(2) were detected. As shown in Table B, S15, S19, S08, L09, S18, L27, S17(1), and L23(2) were detected in strain C.

The similarity calculation unit 522 calculates the similarity between P. acnes strains A and B based on whether each target protein is detected at the same m/z in both strains. That is, when a target protein has variants, the similarity is calculated based on whether the same variant is detected in both strains.

For strains A and B in Table A, the six target proteins S15, S19, S08, L09, S18, and L27 are detected in both strains, whereas for the two target proteins S17 and L23 different variants are detected in strain A and strain B. The similarity calculation unit 522 therefore calculates the similarity between strains A and B as the number of target proteins detected at the same m/z divided by the total number of target proteins, that is, 6/8 = 0.75 (or 75%).

Similarly, for strains A and C in Table B, the six target proteins S15, S19, S08, L09, S18, and L27 are detected in both strains, whereas for the two target proteins S17 and L23 different variants are detected in strain A and strain C. The similarity calculation unit 522 therefore calculates the similarity between strains A and C as 6/8 = 0.75 (or 75%).

Furthermore, for strains B and C in Table C, molecules at the same m/z are detected in both strains for the seven target proteins S15, S19, S08, L09, S18, L27, and L23 (L23(2) is detected in both), whereas for the target protein S17 different variants are detected in strain B and strain C. The similarity calculation unit 522 therefore calculates the similarity between strains B and C as 7/8 = 0.875 (or 87.5%).
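The three calculations above can be reproduced with a short sketch. Each strain is represented as a mapping from target protein to the variant that was detected, following the detections listed for Tables A to C; the data layout and function names are illustrative assumptions, not the patent's implementation.

```python
COMMON = ["S15", "S19", "S08", "L09", "S18", "L27"]  # proteins without variants


def profile(s17_variant, l23_variant):
    """Detected variant per target protein for one strain."""
    detected = {name: name for name in COMMON}
    detected["S17"] = s17_variant
    detected["L23"] = l23_variant
    return detected


# Detections as described for FIG. 2.
STRAINS = {
    "A": profile("S17(2)", "L23(1)"),
    "B": profile("S17(3)", "L23(2)"),
    "C": profile("S17(1)", "L23(2)"),
}


def similarity(first, second):
    """Target proteins detected at the same m/z, divided by all target proteins."""
    proteins = set(first) | set(second)
    same = sum(
        1
        for p in proteins
        if first.get(p) is not None and first.get(p) == second.get(p)
    )
    return same / len(proteins)


sim_ab = similarity(STRAINS["A"], STRAINS["B"])  # 6/8
sim_ac = similarity(STRAINS["A"], STRAINS["C"])  # 6/8
sim_bc = similarity(STRAINS["B"], STRAINS["C"])  # 7/8
```

Comparing variant IDs stands in for comparing m/z values, which is valid here because the text states that different variants are detected at different m/z values.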

The similarity calculation unit 522 calculates the similarity as described above for each of a plurality of combinations (pairs) obtained by selecting two target organisms from the three or more types of target organisms. After calculating the similarities for all combinations of target organisms, the similarity calculation unit 522 stores in the storage unit 43 similarity data in which each pair of target organisms is associated with the similarity between them.
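Enumerating every pair can be sketched with `itertools.combinations`. The profiles below are reduced, for brevity, to the two proteins that have variants, so the resulting values differ from the eight-protein figures in the text; the function names and the dictionary-based similarity data are assumptions for illustration.

```python
from itertools import combinations


def same_variant_fraction(first, second):
    """Fraction of target proteins detected as the same variant in both profiles."""
    proteins = set(first) | set(second)
    same = sum(1 for p in proteins if p in first and first.get(p) == second.get(p))
    return same / len(proteins)


def pairwise_similarities(profiles, similarity=same_variant_fraction):
    """Similarity data: {(organism X, organism Y): similarity} for every pair."""
    return {
        (x, y): similarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Reduced profiles: only the proteins with variants are listed.
profiles = {
    "A": {"S17": "S17(2)", "L23": "L23(1)"},
    "B": {"S17": "S17(3)", "L23": "L23(2)"},
    "C": {"S17": "S17(1)", "L23": "L23(2)"},
}
similarity_data = pairwise_similarities(profiles)
```

For n target organisms this produces n(n-1)/2 pairs: 3 pairs for three strains, and 253 pairs for the 23 strains of FIG. 5.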

Based on the similarity data, the output image generation unit 53 generates a similarity image that maps and displays the similarities between the target organisms. The output image generation unit 53 generates an output image that includes the similarity image and, as appropriate, other information such as analysis conditions, and that is output from the output unit 44.

FIG. 3 is a diagram showing an example of a similarity image. The similarity image 60 shows a table (hereinafter referred to as the similarity table) in which each row and each column corresponds to one type of target organism. In the similarity table, the element E at the intersection of the row corresponding to one target organism (target organism X) and the column corresponding to another target organism (target organism Y) shows the similarity (in %) between target organisms X and Y. The similarity table in FIG. 3 shows the similarities among P. acnes strains A, B and C. For example, the similarity between P. acnes strains A and B is obtained by referring to the element Eab (or Eba) located in the row (or column) corresponding to strain A and in the column (or row) corresponding to strain B. In the similarity image 60, the similarity of an organism to itself is displayed as 100%.

The output image generation unit 53 generates the similarity image such that at least one of the hue, brightness and saturation of the element E corresponding to each similarity in the similarity table differs according to the similarity. In the example of FIG. 3, the difference in hue according to the similarity value of each element E is shown schematically by different hatching in each element E.
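A minimal sketch of such a color assignment is given below. The text does not specify a color scale, so the function name `similarity_to_rgb`, the choice to vary hue only, and the red-to-blue hue range are all assumptions made for illustration.

```python
import colorsys


def similarity_to_rgb(similarity, low_hue=0.0, high_hue=2.0 / 3.0):
    """Map a similarity in [0, 1] to an (R, G, B) tuple of 0-255 integers.

    Only the hue is varied (assumed range: red at 0.0 to blue at 1.0);
    saturation and brightness are held at their maximum.
    """
    s = min(max(similarity, 0.0), 1.0)  # clamp out-of-range input
    hue = low_hue + (high_hue - low_hue) * s
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))


# Colors for the three similarity values that appear in the table of FIG. 3.
colors = {s: similarity_to_rgb(s) for s in (0.75, 0.875, 1.0)}
```

Varying brightness or saturation instead of hue would only require changing the arguments passed to `colorsys.hsv_to_rgb`.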

The output control unit 54 causes the output unit 44 to output the output image generated by the output image generation unit 53. By viewing the similarity image and other information output from the output unit 44, an analyst or the like can grasp visually and easily how similar the target organisms are to one another.

FIG. 4 is a flowchart showing the flow of the data processing method according to the present embodiment. As in the example of FIG. 2, the target organisms are different strains of a microorganism of the same genus and species (hereinafter referred to as target microorganisms) and the target molecules are proteins, but the present invention is not limited to these conditions. In step S1001, the control unit 50 sets, as target proteins, proteins that are commonly expressed in a plurality of strains of the target microorganism either as the same molecule or as variants, and sets those variants as target variants. When step S1001 ends, step S1003 starts.

In step S1003, the storage unit 43 acquires the m/z of the target proteins and their variants. When step S1003 ends, step S1005 starts. In step S1005, n samples (n is 3 or more) containing the target microorganisms are prepared. When step S1005 ends, step S1007 starts.

In step S1007, the measurement unit 100 performs mass spectrometry on the plurality of samples, and the mass spectrum generation unit 521 generates data corresponding to the mass spectrum of each sample. When step S1007 ends, step S1009 starts. In step S1009, the similarity calculation unit 522 calculates the similarity for each pair among the n samples based on whether a peak is present in each mass spectrum within the range based on the m/z acquired in step S1003. When step S1009 ends, step S1011 starts.
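One way step S1009 might decide which variant of each target protein is present in a spectrum is sketched below. The function names, the data layout and the reference m/z values are hypothetical placeholders (they are not real protein masses). The default tolerance of 200 ppm is the value used in Example 1 of this document, and taking the first matching variant is an assumption of the sketch.

```python
def within_ppm(observed_mz, reference_mz, tolerance_ppm):
    """True if the observed m/z lies within +/- tolerance_ppm of the reference."""
    return abs(observed_mz - reference_mz) <= reference_mz * tolerance_ppm * 1e-6


def call_variants(peaks_mz, reference_mz, tolerance_ppm=200.0):
    """For each target protein, return the first variant whose reference m/z
    has a peak within tolerance, or None if no variant is detected."""
    calls = {}
    for protein, variants in reference_mz.items():
        calls[protein] = None
        for variant, mz in variants.items():
            if any(within_ppm(p, mz, tolerance_ppm) for p in peaks_mz):
                calls[protein] = variant
                break
    return calls


# Placeholder reference m/z values, made up for illustration only.
reference = {
    "L27": {"L27": 9100.0},
    "S17": {"S17(1)": 9500.0, "S17(2)": 9530.0},
}
peaks = [9100.9, 9530.5, 12000.0]  # hypothetical peak list of one sample

calls = call_variants(peaks, reference)
```

The resulting per-sample dictionary of detected variants is the kind of input from which a pairwise similarity can then be counted.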

In step S1011, the similarity calculation unit 522 generates data corresponding to an n × n similarity table showing the similarity of each pair. When step S1011 ends, step S1013 starts. In step S1013, the output image generation unit 53 generates image data corresponding to a similarity image that shows each similarity in association with a color corresponding to its value. When step S1013 ends, step S1015 starts.
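The n × n table of step S1011 could be assembled from the pairwise values roughly as follows. This is a sketch under assumptions: the function name `similarity_table` and the use of a dictionary keyed by unordered pairs are illustrative, and the diagonal is filled with 100% as in the similarity image 60.

```python
def similarity_table(names, pair_similarity):
    """Build a symmetric n x n table of similarities in percent.

    pair_similarity maps a pair (x, y) to a similarity in [0, 1]; each pair
    needs to be stored only once, in either order.
    """
    table = []
    for row in names:
        cells = []
        for col in names:
            if row == col:
                cells.append(100.0)  # same organism
            else:
                key = (row, col) if (row, col) in pair_similarity else (col, row)
                cells.append(100.0 * pair_similarity[key])
        table.append(cells)
    return table


# Pairwise values stated in this document for P. acnes strains A, B and C.
pairs = {("A", "B"): 0.75, ("A", "C"): 0.75, ("B", "C"): 0.875}
table = similarity_table(["A", "B", "C"], pairs)
```

Each cell of `table` then corresponds to one element E of the similarity table, with `table[0][1]` and `table[1][0]` playing the roles of Eab and Eba.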

In step S1015, the output unit 44 outputs the similarity display image. When step S1015 ends, the processing ends.

According to the above-described embodiment, the following effects are obtained.
(1) In the data processing device (information processing unit 40) and the data processing method according to the present embodiment, the similarity calculation unit 522 calculates the similarity between a first organism and a second organism based on whether a target molecule commonly contained in a first sample derived from the first organism and a second sample derived from the second organism is detected, in the mass spectrometry of each of the first sample and the second sample, at the same m/z calculated from the structure of the target molecule or obtained by past measurement. The similarity calculation unit 522 calculates this similarity for each of a plurality of combinations obtained by selecting two types of target organisms from three or more types of target organisms, the output image generation unit 53 generates an output image including a similarity image that maps and displays the similarities among the three or more types of target organisms, and the output unit 44 outputs the output image. As a result, the similarities among a plurality of organisms can be calculated quickly using mass spectrometry and displayed in an easy-to-understand manner. In addition, even if the base sequences of the genes of the target organisms are unknown, using as target molecules ribosomal proteins or the like whose peak m/z values have been obtained by past mass spectrometry allows the similarity to be calculated without gene sequencing or the like. Furthermore, because the analysis uses the m/z values corresponding to the target molecules rather than mass spectrum patterns, it is less susceptible to variation caused by sample preparation conditions, instrument models and the like.

(2) In the data processing device according to the present embodiment, the similarity calculation unit 522 calculates the similarity between target organisms based on which of a plurality of m/z values corresponding to one type of target molecule at least one of the target molecules is detected at. This allows the similarity to be calculated accurately using information on variants of the target molecule.

(3) In the data processing device according to the present embodiment, the output image generation unit 53 generates a similarity image 60 in which at least one of the hue, brightness and saturation of the portion corresponding to each similarity (element E or the like) differs according to the value of the similarity. This allows the similarities between target organisms to be displayed in an even more easily understood manner.

(4) In the data processing device according to the present embodiment, the target molecule is a protein, and the similarity calculation unit 522 can calculate the similarity between target organisms based on whether the target molecule is detected in mass spectrometry at the same m/z derived from the amino acid sequence or base sequence of the target molecule. This allows the similarity to be calculated more accurately on the basis of genetic information obtained in the past. Even when the target molecule is a nucleic acid, the similarity can be calculated using such genetic information.

(5) The analyzer according to the present embodiment includes the data processing device (information processing unit 40) according to the present embodiment and a mass spectrometer. As a result, the similarities among a plurality of organisms can be calculated quickly using mass spectrometry and displayed in an easy-to-understand manner.

The following modifications are also within the scope of the present invention and can be combined with the above-described embodiment. In the following modifications, parts having the same structure or function as in the above-described embodiment are referred to by the same reference numerals, and their description is omitted as appropriate.
(Modification 1)
In the above-described embodiment, when the output image generation unit 53 generates an output image including the similarity image 60, the output image may show, in addition to the similarity image 60, information on the similarity between the target organisms calculated by a method different from the similarity calculation performed by the similarity calculation unit 522, such as a phylogenetic tree of the target organisms.

FIG. 5 is a diagram showing an example of an output image 80 of this modification. The output image 80 shows a similarity image 60 and a phylogenetic tree image 70. The similarity image 60 of FIG. 5 displays the similarities among 23 different strains of P. acnes as target organisms. The phylogenetic tree image 70 shows a phylogenetic tree based on the base sequences of genes of the target organisms or on the amino acid sequences of the proteins encoded by those genes. An image portion 61 of the similarity image 60 shows a number corresponding to each strain of P. acnes. An image portion 71 of the phylogenetic tree image 70 shows a scale bar indicating the number of substitutions at a gene locus. Hatching is omitted in the similarity image 60 of FIG. 5.
Note that the output image 80 may be an image showing the similarity image 60 together with information including the result of a cluster analysis based on the similarity of cellular components of the target organisms. Such an output image 80 may further include the phylogenetic tree image 70.

In this modification, the following effects are obtained in addition to the effects of the above-described embodiment.
(1) In the data processing device according to this modification, the output image generation unit 53 generates an output image including the similarity image 60 and information on the similarity between the target organisms calculated by a method different from the similarity calculation performed by the similarity calculation unit 522. Because similarities between target organisms calculated by a plurality of methods are displayed, more detailed information on the relationships between the target organisms can be provided.

(2) In the data processing device according to this modification, the information on the similarity between the target organisms calculated by the different method is a phylogenetic tree of the target organisms. Comparing the phylogenetic tree with the similarities calculated as in the above-described embodiment makes it possible to provide still more detailed information on the relationships between the target organisms.

(3) In the data processing device according to this modification, the target molecule is a protein, and the similarity calculated by the different method can be a similarity calculated based on the homology of the base sequences of the nucleic acids corresponding to that protein. This allows the similarity to be evaluated from two different aspects, namely mass spectrometry of the target proteins and the base sequences of genes, so that more detailed information on the relationships between the target organisms can be provided.

(Modification 2)
The above-described embodiment was explained using an example in which the target molecules are proteins, but the target molecules may instead be sugar chains, lipids, pigments or nucleic acids. When the target molecules are lipids, each target molecule can be set based on differences in the molecular structure of the polar group or in carbon chain length. In this case, the target variants can be, for example, a plurality of molecules whose chain hydrocarbons have the same number of carbon atoms but differ in the number or positions of double bonds. When the target molecules are sugar chains, the target molecules and target isomers can be set using differences in the type, combination or number of the monosaccharides constituting the sugar chain, the chain length of the sugar chain, the type of complex sugar chain, the linkage pattern and the like. The method of setting the target molecules and target variants is not particularly limited as long as the target molecules are commonly present in the target organisms and a plurality of target variants exist in the target organisms for at least some of the target molecules. It is also possible to generate similarity data by combining the analysis results of at least two kinds of cellular components, such as proteins, lipids, pigments, sugar chains and nucleic acids, and to generate a similarity image based on that similarity data.
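The text does not state how analysis results for different kinds of cellular components are to be combined. As one possible reading, offered only as an assumption, the matched and total counts of each component class can be pooled into a single ratio, as in the sketch below. The function name `combined_similarity` and the lipid counts are hypothetical; the protein counts 6 and 8 are those of strains A and B in the embodiment.

```python
def combined_similarity(per_class_counts):
    """Pool (shared, total) counts over several component classes.

    per_class_counts maps a class name to a tuple
    (number of target molecules detected at the same m/z, number of targets).
    Pooling the counts is an assumed combination rule, not one given in the text.
    """
    shared = sum(s for s, _ in per_class_counts.values())
    total = sum(t for _, t in per_class_counts.values())
    return shared / total


# Protein counts from strains A and B; lipid counts are made up for illustration.
example = combined_similarity({"protein": (6, 8), "lipid": (3, 4)})
```

Other rules, such as averaging the per-class similarities or weighting the classes, would be equally compatible with the description.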

(Modification 3)
A program for realizing the information processing functions of the analyzer 1 may be recorded on a computer-readable recording medium, and the program recorded on that medium, which relates to the control of measurement, data analysis and display processing, including the processing by the data processing unit 52 and the output image generation unit 53 described above, and of related processing, may be read into a computer system and executed. The term "computer system" as used here includes an OS (Operating System) and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable recording media such as flexible disks, magneto-optical disks, optical disks and memory cards, and to storage devices such as hard disks built into computer systems. Furthermore, the "computer-readable recording medium" may include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as volatile memory inside a computer system that acts as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

When applied to a personal computer (hereinafter referred to as a PC) or the like, the control program described above can be provided through a recording medium such as a CD-ROM or DVD-ROM, or through a data signal over the Internet or the like. FIG. 6 is a diagram showing this arrangement. The PC 950 receives the program via a CD-ROM 953. The PC 950 also has a function for connecting to a communication line 951. A computer 952 is a server computer that provides the program and stores it on a recording medium such as a hard disk. The communication line 951 is a communication line such as the Internet or a PC communication service, or a dedicated communication line. The computer 952 reads the program from the hard disk and transmits it to the PC 950 via the communication line 951. That is, the program is carried on a carrier wave as a data signal and transmitted via the communication line 951. In this way, the program can be supplied as a computer-readable computer program product in various forms, such as a recording medium or a carrier wave.

The programs for realizing the information processing functions described above include a program for causing a processing device to perform: a similarity calculation process of calculating the similarity between a first organism and a second organism based on whether a target molecule commonly contained in a first sample derived from the first organism and a second sample derived from the second organism is detected, in the mass spectrometry of each of the first sample and the second sample, at the same m/z calculated from the structure of the target molecule or obtained by past measurement, the similarity being calculated for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms; and an output image generation process of generating an output image 80 including a similarity image 60 that maps and displays the similarities among the three or more types of organisms. As a result, the similarities among a plurality of organisms can be calculated quickly using mass spectrometry and displayed in an easy-to-understand manner.

The present invention is not limited to the contents of the above embodiment. Other aspects conceivable within the scope of the technical idea of the present invention are also included within the scope of the present invention.

Examples according to the present embodiment are shown below, but the present invention is not limited to these examples.

(Example 1)
In Example 1, three strains of P. acnes were analyzed by mass spectrometry, and a similarity image was created using eight ribosomal proteins as target molecules.

Cells of P. acnes, a resident skin bacterium, were disrupted using zirconia beads (0.5 mm in diameter), and the solution containing cytoplasmic components and the like was centrifuged at 15,000 × g for 5 minutes. The supernatant after centrifugation was filtered through a filter with a nominal molecular weight cut-off of 100 kDa, and the fraction retained on the filter was used as the ribosome fraction containing ribosomes. Using sinapinic acid as the matrix, the ribosome fraction was ionized by MALDI and then measured by time-of-flight mass spectrometry.

FIG. 7 is a diagram showing a mass spectrum obtained by mass spectrometry of the ribosome fraction derived from one strain of P. acnes (corresponding to P. acnes strain A described above). In FIG. 7, the names of the ribosomal proteins assigned to the peaks are shown. These assignments were made by comparing the observed masses with theoretical masses calculated from the amino acid sequence information of the ribosomal proteins. In the range indicated by A10, the intensity values are shown magnified five times relative to the other ranges for clarity. Mass spectra were obtained in the same way for the other two strains (P. acnes strains B and C described above), and the peaks derived from ribosomal proteins were assigned.

The target molecules were the eight ribosomal proteins S15, S19, S08, L09, S18, L27, S17 and L23. For each target molecule, a value of 1 was assigned when a peak was observed within the range based on the m/z value equal to the theoretical value obtained from the amino acid sequence (tolerance 200 ppm) and a value of 0 when no peak was observed, and these values are shown in Tables A, B and C of FIG. 2. As described in the above embodiment, the similarity between P. acnes strains A and B was 0.75, the similarity between strains A and C was 0.75, and the similarity between strains B and C was 0.875. Mapping these values gave the similarity image shown in FIG. 3.

(Example 2)
Twenty-three different strains of P. acnes were analyzed by mass spectrometry in the same manner as in Example 1, and a similarity image was created. A phylogenetic tree was also created based on genetic analysis. FIG. 5 shows the similarity image and the phylogenetic tree that were obtained.

DESCRIPTION OF SYMBOLS
1... analyzer, 10... ionization unit, 22... mass separation unit, 30... detection unit, 40... information processing unit, 43... storage unit, 44... output unit, 52... data processing unit, 53... output image generation unit, 60... similarity image, 70... phylogenetic tree image, 80... output image, 100... measurement unit, 521... mass spectrum generation unit, 522... similarity calculation unit, E, Eab, Eba... elements of the similarity table, S... sample ions.

Claims

1. A data processing device comprising:
a storage unit that stores m/z values respectively corresponding to a plurality of molecules, the m/z values being calculated from the structures of the molecules or obtained by past mass spectrometric measurements;
a similarity calculation unit that, by referring to the storage unit, performs an operation of calculating a similarity between a first organism and a second organism based on whether a target molecule commonly contained in a first sample derived from the first organism and a second sample derived from the second organism is detected at the same m/z value in mass spectrometry of each of the first sample and the second sample, the operation being performed for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms;
an output image generation unit that generates an output image including a similarity image that maps and displays the similarities between the three or more types of organisms; and
an output unit that outputs the output image.

2. The data processing device according to claim 1, wherein
the storage unit stores, for at least one of the molecules, m/z values respectively corresponding to a plurality of different variants, and
the similarity calculation unit, by referring to the storage unit, calculates the similarity between the first organism and the second organism based on which of the plurality of m/z values respectively corresponding to the plurality of different variants at least one of the target molecules is detected at.

3. The data processing device according to claim 1 or 2, wherein
the output image generation unit generates the similarity image such that at least one of hue, brightness and saturation of a portion corresponding to a similarity differs according to the value of that similarity.

4. The data processing device according to any one of claims 1 to 3, wherein
the output image generation unit generates the output image so as to include the similarity image and information on the similarity between the three or more types of organisms calculated by a method different from the calculation of the similarity by the similarity calculation unit.

5. The data processing device according to claim 4, wherein
the information is a phylogenetic tree of the three or more types of organisms.

6. The data processing device according to claim 4 or 5, wherein
the target molecule is a protein, and
the similarity calculated by the different method is a similarity calculated based on the homology of the base sequences of the nucleic acids corresponding to the protein.

7. The data processing device according to any one of claims 1 to 5, wherein
the target molecule is a sugar chain, a lipid, a protein or a nucleic acid.

8. The data processing device according to claim 7, wherein
the target molecule is a protein or a nucleic acid, and
the similarity calculation unit calculates the similarity between the first organism and the second organism based on whether the target molecule is detected in the mass spectrometry at the same m/z derived from the amino acid sequence or base sequence of the target molecule.

9. The data processing device according to any one of claims 1 to 8, wherein
the similarity is a value obtained by dividing the number of the target molecules detected at the same m/z in the first sample and the second sample by the total number of the target molecules.

10. An analyzer comprising:
the data processing device according to any one of claims 1 to 9; and
a mass spectrometer.

11. A data processing method performed by a data processing device that comprises a storage unit storing m/z values respectively corresponding to a plurality of molecules, the m/z values being calculated from the structures of the molecules or obtained by past mass spectrometric measurements, a similarity calculation unit, an output image generation unit, and an output unit, the method comprising:
a similarity calculation step in which the similarity calculation unit, by referring to the storage unit, performs an operation of calculating a similarity between a first organism and a second organism based on whether a target molecule commonly contained in a first sample derived from the first organism and a second sample derived from the second organism is detected at the same m/z value in mass spectrometry of each of the first sample and the second sample, the operation being performed for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms;
an output image generation step in which the output image generation unit generates an output image including a similarity image that maps and displays the similarities between the three or more types of organisms; and
an output step in which the output unit outputs the output image.

12. A program for causing a data processing device to perform:
a similarity calculation process of performing, by referring to a storage unit that stores m/z values respectively corresponding to a plurality of molecules, the m/z values being calculated from the structures of the molecules or obtained by past mass spectrometric measurements, an operation of calculating a similarity between a first organism and a second organism based on whether a target molecule commonly contained in a first sample derived from the first organism and a second sample derived from the second organism is detected at the same m/z value in mass spectrometry of each of the first sample and the second sample, the operation being performed for each of a plurality of combinations obtained by selecting two types of organisms from three or more types of organisms; and
an output image generation process of generating an output image including a similarity image that maps and displays the similarities between the three or more types of organisms.