WO2006134703A1 - Method of data correction in liquid chromatography

Info

Publication number
WO2006134703A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
peak
solution
retention time
mass
Prior art date
Application number
PCT/JP2006/306907
Other languages
French (fr)
Japanese (ja)
Inventor
Masaya Ono
Tesshi Yamada
Setsuo Hirohashi
Original Assignee
Japan Health Sciences Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Japan Health Sciences Foundation
Priority to JP2007521162A (patent JP5119405B2)
Publication of WO2006134703A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02: Column chromatography
    • G01N30/86: Signal analysis
    • G01N30/8624: Detection of slopes or peaks; baseline correction
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02: Column chromatography
    • G01N30/62: Detectors specially adapted therefor
    • G01N2030/621: Detectors specially adapted therefor; signal-to-noise ratio
    • G01N2030/623: Detectors specially adapted therefor; signal-to-noise ratio by modulation of sample feed or detector response

Definitions

  • The present invention relates to a data processing method for liquid chromatography in proteome analysis, and more particularly to a data processing method for ultra-low flow liquid chromatography.
  • Liquid chromatography uses the specific affinity between a resin packed in a column and a substance in solution: by creating a stepwise concentration gradient of the solution, the substance is released from the resin at a specific solution concentration.
  • The concentration gradient is created as a function of time (so that it varies with elapsed time). A substance can therefore be identified by determining the retention time at which it is released, and the reproducibility of retention time is the most important factor in identifying substances by liquid chromatography.
  • When the flow rate of the solution to the column is high, relatively good reproducibility is obtained.
  • When the flow rate of the solution to the column is low, reproducibility is said to be poor.
  • According to the proteomics technique using a mass spectrometer, the proteins of an organism can be quantitatively identified. For this reason, proteomics has begun to be widely applied in the fields of medicine and biology.
  • A nanoLC/MS system that combines ultra-low flow liquid chromatography with an accurate mass spectrometer has recently attracted attention. With this apparatus, a very large number of proteins can be identified from a small amount of sample. More specifically, substances finely separated by ultra-low flow liquid chromatography are identified by accurately measuring the mass of each substance with the accurate mass spectrometer.
  • Non-Patent Document 1: Proceeding of the National Academy of Sciences. 99: 11049, 2002 (Lipton MS, et al)
  • Non-Patent Document 2: Molecular & Cellular Proteomics. 4: 144, 2005 (Lipton MS, et al)
  • The present invention has been made in view of the above points. Its object is to provide a data correction method for liquid chromatography, in particular for ultra-low flow liquid chromatography, that makes substantially high reproducibility obtainable by correcting retention time data that may differ from one inspection (measurement) to the next.
  • A further object of the present invention is to provide a data analysis method for liquid chromatography, in particular for ultra-low flow liquid chromatography, capable of high-precision comparison of data from a plurality of solutions to be tested.
  • The present invention is a data correction method comprising: a concentration gradient generation step of flowing a solution containing a substance to be inspected through liquid chromatography to generate a concentration gradient of the solution over a predetermined time;
  • a measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component of the substance to be inspected that is separated and eluted during the concentration gradient generation step;
  • and a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance.
  • the target for correlating the two-dimensional data obtained by the measurement is not limited to the standard two-dimensional data obtained in advance. For example, it is of course possible to correlate two two-dimensional data obtained by measurement.
  • The present invention is also a data correction method comprising: a first concentration gradient generation step of flowing a first solution containing a first substance to be inspected through liquid chromatography to generate a concentration gradient of the first solution over a predetermined time;
  • a first measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component of the first substance to be inspected that is separated and eluted during the first concentration gradient generation step;
  • a second concentration gradient generation step of flowing a second solution containing a second substance to be inspected through the liquid chromatography to generate a concentration gradient of the second solution over a predetermined time;
  • a second measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component of the second substance to be inspected that is separated and eluted during the second concentration gradient generation step;
  • and a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the first measurement step.
  • the two correlated two-dimensional data may have been obtained using different liquid chromatography.
  • The present invention is also a data correction method comprising: a first concentration gradient generation step of flowing a first solution containing a first substance to be inspected through a first liquid chromatography to generate a concentration gradient of the first solution over a predetermined time;
  • a first measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component of the first substance to be inspected that is separated and eluted during the first concentration gradient generation step;
  • a second concentration gradient generation step of flowing a second solution containing a second substance to be inspected through a second liquid chromatography to generate a concentration gradient of the second solution over a predetermined time;
  • a second measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component of the second substance to be inspected that is separated and eluted during the second concentration gradient generation step;
  • and a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the first measurement step.
  • The solution containing the substance to be inspected can be flowed through the liquid chromatography at a flow rate of 500 nL/min or less, particularly preferably at about 200 nL/min.
  • In the correction step, a dynamic algorithm is used that searches for the optimum corresponding positions on two-dimensional grid coordinates whose axes are the cycle numbers (ascending numbers with respect to retention time) of the two sets of two-dimensional data.
  • The dynamic algorithm is, for example, as follows. Let R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass spectrum (mass-to-charge ratio data) A(n) in the n-th cycle of one set of two-dimensional data and the mass spectrum B(m) in the m-th cycle of the other set, let g be the gap penalty, let N be the total number of cycles of one set, and let M be the total number of cycles of the other set. The algorithm then operates on the two-dimensional lattice coordinates L(i, j) whose axes are the cycle numbers of the two sets of two-dimensional data.
  • The present invention is also a data analysis method comprising: a data correction step of carrying out a data correction method having any one of the above features for each of a plurality of solutions (or second solutions) each containing a substance to be inspected (or second substance to be inspected);
  • a data development step of developing the two-dimensional data of retention time data and mass-to-charge ratio data corrected by the data correction method into two-dimensional image data in which the retention time data of each solution (or each second solution) for a given mass-to-charge ratio are arranged in parallel;
  • and a same peak extraction step of extracting the same peak of retention time data based on the two-dimensional image data.
  • According to this method, the same peak of retention time data is extracted from a plurality of solutions (or second solutions) each containing a substance to be inspected (or second substance to be inspected), so the data characteristics of those solutions can be analyzed effectively. This can be expected to significantly promote the development of disease markers such as tumor markers.
  • For example, the data analysis method described above is carried out (the data analysis step), and the same peaks of retention time data obtained from a first group of solutions (or second solutions) are compared with the same peaks of retention time data obtained from a second group of solutions (or second solutions). If there is a significant difference between them, that difference can be used as a “marker”.
  • The same peak extraction step includes a peak detection step of detecting peaks in the retention time data of each solution (or each second solution), and a same peak specifying step of specifying the correspondence between the peaks of each solution (or each second solution) detected in the peak detection step.
  • The same peak specifying step includes: a candidate peak extraction step of extracting, for each solution (or second solution), the candidate peaks included within a predetermined retention time width;
  • a selection in which, for each solution (or second solution), one of the candidate peaks extracted in the candidate peak extraction step is selected, or, if no candidate peak was extracted for that solution, "no candidate peak" is selected for it;
  • a score calculation step of calculating the score (total intensity) of the selected candidate peaks for each combination of candidate peak selections;
  • and a peak identification step of identifying the combination of candidate peak selections that gives the maximum score among the scores obtained in the score calculation step as peaks corresponding to each other (the same peak).
  • The same peak specifying step further includes, after the peak identification step, a data dividing step of dividing the retention time data into sections at the same peak specified in the peak identification step.
  • The candidate peak extraction step, the score calculation step, the peak identification step, and the data dividing step are recursively repeated for the retention time data divided in the data dividing step.
  • The candidate peak extraction step is performed with each peak as a reference and with an allowable deviation range from that peak as the predetermined retention time width.
  • the allowable deviation range is 0.7 min on the + side.
  • the ratio of the solution without the candidate peak (or the second solution) extracted in the candidate peak extraction step falls below a predetermined minimum detection rate. It is preferable that the execution of the same peak specifying step based on the peak is terminated.
  • the minimum detection rate is set to 0.1 to 0.4 (if it exceeds 0.5, it becomes difficult to specify a significant difference between the two groups).
  • the retention time data of each solution is developed into two-dimensional image data arranged in parallel for each unit mass charge.
  • the same peak extracting step extracts the same peak of the retention time data for each unit mass charge based on the two-dimensional image data.
  • The present invention is also a data correction apparatus for a method comprising a concentration gradient generation step of flowing a solution containing a substance to be inspected through liquid chromatography to generate a concentration gradient of the solution over a predetermined time, and a measurement step of obtaining retention time data and mass-to-charge ratio data in association with each other for each component separated and eluted during the concentration gradient generation step.
  • The apparatus corrects the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance.
  • For this correction, the apparatus uses dynamic algorithm software that searches for the optimum corresponding positions on two-dimensional grid coordinates whose axes are the cycle numbers (ascending numbers with respect to retention time) of the two-dimensional data.
  • the correction device or each element means of the correction device can be realized by a computer system.
  • the recording medium includes a network that propagates various signals in addition to a medium that can be recognized as a single unit such as a flexible disk.
  • The present invention is also a data analysis apparatus for a method comprising a concentration gradient generation step in which a plurality of solutions each containing a substance to be inspected are flowed through the same or different liquid chromatography and a concentration gradient of each solution is generated over a predetermined time, and a measurement step of obtaining, for each solution, retention time data and mass-to-charge ratio data in association with each other for each component of the substance to be inspected that is separated and eluted during the concentration gradient generation step.
  • The data analysis apparatus comprises a data development device that develops the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step into two-dimensional image data in which the retention time data of each solution for a given mass-to-charge ratio are arranged in parallel,
  • and a same peak extraction device that extracts the same peak of the retention time data based on the two-dimensional image data.
  • the data analysis device or each element means of the data analysis device can be realized by a computer system.
  • a program for realizing them in a computer system and a computer-readable recording medium recording the program are also subject to protection in this case.
  • the recording medium includes a network that propagates various signals in addition to a medium that can be recognized as a single unit such as a flexible disk.
  • FIG. 1 is a flowchart showing an outline of one embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing the concept of a dynamic algorithm according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing the concept of a dynamic algorithm according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram illustrating the operation of a dynamic algorithm according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram illustrating the operation of a dynamic algorithm according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram for explaining the operation of a dynamic algorithm according to an embodiment of the present invention.
  • FIG. 7 is a graph showing an example of correction of measurement data (an example of difference).
  • FIG. 9 is a flowchart showing an outline of the second embodiment of the present invention.
  • FIG. 10 is an example of two-dimensional image data in which the retention time data of each sample is arranged in the vertical axis direction.
  • FIG. 11a is a graph showing the concept of a baseline correction process.
  • FIG. 11b is a graph showing the concept of the smoothing process.
  • FIG. 11c is a graph showing the concept of the peak detection process.
  • FIG. 12a is an example of two-dimensional image data.
  • FIG. 12b is image data showing the peaks detected from the two-dimensional image data in FIG. 12a.
  • FIG. 13 is a schematic flowchart showing the same peak extraction process (same peak extraction algorithm) of the present embodiment.
  • FIG. 14a
  • FIG. 14c is a peak intensity distribution diagram for FIG. 14a.
  • FIG. 15b is a peak intensity distribution diagram of FIG. 15a.
  • The mass-to-charge ratio data were converted to the maximum value for each 1 m/z range and output in wiff format.
  • The analysis target range was limited to a mass-to-charge ratio of …00 to 1000 m/z and a retention time (RT) of 1 to 1800 sec, and intensity 200 was replaced with a 1 to 255 gray scale and displayed.
  • Data correction was performed by correlating the data of two runs each (four runs in total) collected from two samples (FIG. 1: STEP 5).
  • With the reference (standard) two-dimensional data as A and the two-dimensional data to be corrected as B, a correction function is derived such that the sum of the mass-spectrum correlation coefficients at each retention time is maximized.
  • The dynamic algorithm of the present embodiment is defined as follows. Let R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass spectrum (mass-to-charge ratio data) A(n) in the n-th cycle of one set of two-dimensional data and the mass spectrum B(m) in the m-th cycle of the other set, let g be the gap penalty (typically 0.5), let N be the total number of cycles of one set, and let M be the total number of cycles of the other set. The algorithm operates on the two-dimensional lattice coordinates L(i, j) whose axes are the cycle numbers of the two sets of two-dimensional data.
  • The coordinate system is then converted from cycle number to RT, and the curve obtained by spline interpolation or polynomial regression is used as the correction function.
  • the target data “A” is a figure (data string) distorted in the Y-axis direction with respect to the reference data “A”.
  • This algorithm corresponds to the algorithm for obtaining f^{-1}(y').
  • FIG. 3 shows the plane formed with y and y' as axes, corresponding to FIG. 2.
  • The line connecting the square marks indicates the y-y' corresponding positions in this case.
  • In this way, the y-y' corresponding positions (route) are obtained.
  • FIG. 4 corresponds to the example of FIGS. 2 and 3: the Pearson product-moment correlation coefficient R(A(n), B(m)) is determined, the gap penalty is set to -0.5, and L(i, j) is actually calculated using this algorithm.
  • the cycle number is converted into the retention time, and the curve is obtained by spline interpolation or polynomial regression. This curve is a correction function to be obtained.
  • The measurement data themselves are corrected by correlating multiple sets of measurement data (Off1 and Off2, or On1 and On2) from the same type of sample, for which reproducibility is otherwise difficult to obtain. The corrected data can therefore be regarded as a group of data of the same type having a certain degree of reproducibility, and by using their average as a representative value, identification or diagnosis can be performed with higher accuracy.
  • by correlating two or more two-dimensional data it becomes possible to superimpose their characteristics for evaluation, and high reproducibility can be obtained when evaluating measurement data regardless of the difference in measurement data itself. This makes it possible to compare with many samples, for example, patient serum. This significantly increases the possibility of developing markers for different diseases.
  • the data correction processing described above can be normally performed by a data correction apparatus that can be configured by various computer systems.
  • a program for realizing the data correction apparatus on a computer system and a computer-readable recording medium recording the program are also subject to protection in this case.
  • When the data correction apparatus is realized by a program such as an OS (a second program) operating on the computer system, a program including various instructions for controlling the OS or other such program, and a recording medium on which that program is recorded, are also subject to protection in this case.
  • the recording medium includes a network that propagates various types of signals in addition to a medium that can be recognized as a single unit such as a flexible disk.
  • The mass-to-charge ratio data were converted to the maximum value for each 1 m/z range and output in wiff format.
  • The analysis target range was limited to a mass-to-charge ratio of …00 to 1000 m/z and a retention time (RT) of 1 to 1800 sec, and intensity 200 was replaced with a 1 to 255 gray scale and displayed.
  • Data correction was performed by correlating the data of three runs each (111 runs in total) collected from 37 samples (FIG. 9: STEP 25).
  • the reference (standard) two-dimensional data is A
  • the two-dimensional data to be corrected is B
  • a correction function was derived that maximized the sum of the mass spectral correlation coefficients at the retention time, and the resulting correction function was multiplied by the retention time data for each sample.
  • The data based on the plasma of a non-cancer-bearing subject was used as the reference (standard) two-dimensional data A.
  • Next, for each 1 m/z, a data analysis device configured by a computer system developed the retention time data of each sample into two-dimensional image data arranged in parallel (FIG. 9: STEP 26).
  • An example of the two-dimensional image data is shown in FIG. 10. In FIG. 10, the horizontal axis is the retention time (RT).
  • FIG. 10 shows the two-dimensional image data for 863 m/z.
  • The same peak extraction algorithm includes a baseline correction step (FIG. 11(a)), a smoothing step (FIG. 11(b)), and a peak detection step (FIG. 11(c)) for the retention time data of each sample.
  • Baseline correction is a process for correcting the inclination and undulation of the baseline that occurs in the spectrum waveform due to the effect of light scattering on the sample.
  • Smoothing is a process that removes noise by taking a weighted average using a Gaussian function (see Equation 1). These processes are often used as a processing method for data analysis.
  • The peak detection accuracy is improved by calculating the signal-to-noise ratio for each data point.
  • An example of an image showing the detected peaks is shown in FIG. 12(b).
  • FIG. 12(a) is an example of two-dimensional image data.
  • FIG. 12(b) is image data showing the peaks detected from the data of FIG. 12(a).
  • the same peak extraction algorithm of the present embodiment includes the same peak specifying step of specifying the correspondence between peaks of each sample data detected in the peak detecting step.
  • With each peak as a reference, an allowable deviation range from that peak is set as the predetermined retention time width, and a candidate peak extraction step extracts the candidate peaks included within that retention time width (FIG. 13: STEP 31).
  • the allowable deviation range width is, for example, 0.7 min on the + side.
  • In that case, the peak identification process is terminated, and no same peak is specified within that retention time width.
  • The minimum detection rate is usually set between 0.1 and 0.4 (if it exceeds 0.5, it may be difficult to identify a significant difference between the two groups).
  • The retention time data is divided into sections at the specified same peak (data dividing step) (FIG. 13: STEP 35).
  • The candidate peak extraction step, the score calculation step, the peak identification step, and the data dividing step are recursively repeated for both sections of retention time data produced by the data dividing step (FIG. 13: STEP 36).
  • There were 109 peaks for which the average peak intensity was 10 or more and the U test between the spleen cancer patient group and the non-cancer-bearing group showed a significant difference at a level of 0.0001 or less (80 peaks for the spleen cancer patient group, 29 peaks for the non-cancer-bearing group). In addition, 32 peaks having an area under the ROC curve of 0.9 or more were observed.
  • FIG. 14(a) to FIG. 14(c) show a two-dimensional image (FIG. 14(a)), an ROC curve (FIG. 14(b)), and a peak intensity distribution diagram (FIG. 14(c)).
  • FIG. 15(a) shows an image of peaks that in combination give a discrimination rate of 100% (sensitivity 100%, specificity 100%).
  • FIG. 15(b) shows the peak intensity distribution between spleen cancer patients and non-cancer-bearing subjects separated using those peaks.
  • The same peak of retention time data is extracted from the solutions containing the plasma of spleen cancer patients and from the solutions containing the plasma of non-cancer-bearing subjects.
  • the data analysis process described above can be normally performed by a data analysis apparatus that can be configured by various computer systems.
  • a program for realizing the data analysis apparatus on a computer system and a computer-readable recording medium recording the program are also subject to protection in this case.
  • When the data analysis device is realized by a program such as an OS (a second program) that runs on a computer system, a program including various instructions for controlling the OS or other such program, and a recording medium on which that program is recorded, are also subject to protection in this case.
  • the recording medium includes not only a flexible disk or the like that can be recognized as a single unit, but also a network that propagates various signals.
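
The peak detection stage listed above (baseline correction, smoothing by a Gaussian-weighted average, and peak detection using a signal-to-noise ratio at each data point) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the procedure of the specification: Equation 1 is not reproduced in this text, so the kernel width `sigma`, the median-based baseline and noise estimates, the `min_snr` threshold, and the function names `gaussian_smooth` and `detect_peaks` are all hypothetical choices.

```python
from math import exp
from statistics import median

def gaussian_smooth(y, sigma=2.0):
    """Weighted moving average with Gaussian weights (assumed kernel: +/- 3 sigma)."""
    r = int(3 * sigma)
    w = [exp(-0.5 * (k / sigma) ** 2) for k in range(-r, r + 1)]
    total = sum(w)
    w = [v / total for v in w]          # normalise so the weights sum to 1
    n = len(y)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-r, r + 1):
            j = min(max(i + k, 0), n - 1)   # clamp index: repeat the edge values
            acc += w[k + r] * y[j]
        out.append(acc)
    return out

def detect_peaks(y, sigma=2.0, min_snr=3.0):
    """Return indices of local maxima whose signal-to-noise ratio is >= min_snr.

    Assumptions: the baseline is taken as the median of the smoothed trace and
    the noise as the scaled median absolute deviation around that baseline.
    """
    s = gaussian_smooth(y, sigma)
    baseline = median(s)
    corrected = [v - baseline for v in s]
    noise = 1.4826 * median(abs(v) for v in corrected)
    if noise == 0:
        noise = 1e-12                    # avoid division by zero on flat traces
    peaks = []
    for i in range(1, len(s) - 1):
        is_local_max = s[i] > s[i - 1] and s[i] >= s[i + 1]
        if is_local_max and corrected[i] / noise >= min_snr:
            peaks.append(i)
    return peaks

# Synthetic retention-time trace with two peaks, at indices 30 and 70.
trace = [100 * exp(-0.5 * ((i - 30) / 2) ** 2) + 50 * exp(-0.5 * ((i - 70) / 2) ** 2)
         for i in range(100)]
print(detect_peaks(trace))
```

A constant (median) baseline is the simplest possible choice; a trace with a sloping or undulating baseline, as the text describes, would need a local baseline estimate instead.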

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

A method of data correction characterized by including: a concentration gradient inducing step of applying a solution containing a test substance to liquid chromatography to induce a concentration gradient of the solution over a given period of time; a measuring step of obtaining retention time data and mass/charge ratio data correlated to each other for each component of the test substance separated and eluted during the concentration gradient inducing step; and a correction step of carrying out correction by correlating the two-dimensional data consisting of the retention time data and mass/charge ratio data obtained in the measuring step with standard two-dimensional data determined in advance.
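
The correction step can be illustrated with the kind of dynamic search the definitions describe: a grid whose axes are the cycle numbers of the two data sets, scored by the Pearson product-moment correlation of the mass spectra, with a gap penalty g (0.5 in the embodiment). The exact recurrence for L(i, j) is not reproduced in this text, so the sketch below assumes a standard global-alignment recurrence; the function names `pearson` and `align_cycles` and the toy spectra are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def align_cycles(A, B, gap=0.5):
    """Align the cycles of A (reference) and B (data to be corrected).

    A and B are lists of spectra, one per cycle. Assumed recurrence:
    L[i][j] = max(L[i-1][j-1] + R, L[i-1][j] - gap, L[i][j-1] - gap).
    Returns the matched (cycle in A, cycle in B) pairs and the total score.
    """
    N, M = len(A), len(B)
    R = [[pearson(a, b) for b in B] for a in A]
    L = [[0.0] * (M + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        L[i][0] = L[i - 1][0] - gap
    for j in range(1, M + 1):
        L[0][j] = L[0][j - 1] - gap
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            L[i][j] = max(L[i - 1][j - 1] + R[i - 1][j - 1],
                          L[i - 1][j] - gap,
                          L[i][j - 1] - gap)
    # Trace back from the far corner to recover the matched cycle pairs.
    pairs = []
    i, j = N, M
    while i > 0 and j > 0:
        if L[i][j] == L[i - 1][j - 1] + R[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif L[i][j] == L[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, L[N][M]

# Toy data: B is A with one unrelated spectrum inserted after the first cycle.
A = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
B = [A[0], [0, 0, 0, 0, 0, 1], A[1], A[2], A[3]]
pairs, score = align_cycles(A, B)
print(pairs)   # [(0, 0), (1, 2), (2, 3), (3, 4)]
```

As the text describes, the matched cycle numbers would then be converted to retention times and fitted by spline interpolation or polynomial regression to give the correction function; that fitting step is omitted here.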

Description

Specification
Data correction method for liquid chromatography
Technical field
[0001] The present invention relates to a data processing method for liquid chromatography in proteome analysis, and more particularly to a data processing method for ultra-low flow liquid chromatography.
Background art
[0002] 液体クロマトグラフィーは、カラムに装填された榭脂と溶液中の物質との特異的な親 和性を利用して、溶液の段階的な濃度勾配を作り出すことにより、ある特定の溶液濃 度で樹脂から物質を遊離させる方法である。  [0002] Liquid chromatography utilizes a specific affinity between a resin loaded in a column and a substance in the solution to create a stepwise concentration gradient of the solution, thereby concentrating a specific solution concentration. This is a method of releasing a substance from a resin at a degree.
[0003] 濃度勾配は、時間の関数として表されるように(時間の長さに対応して変化するよう に)作り出される。このため、物質が遊離されてくる保持時間(retention time )を把握 することによって、当該物質の同定 (特定)が可能である。従って、液体クロマトグラフ ィ一における物質の同定は、保持時間の再現性が最も重要である。ここで、カラムへ の溶液の流量が多い場合には、比較的よい再現性が得られる力 カラムへの溶液の 流量が少な!/、場合には、再現性はよくな 、と 、われて!/、る。  [0003] The concentration gradient is created as expressed as a function of time (as it varies with time). Therefore, it is possible to identify (specify) a substance by grasping the retention time during which the substance is released. Therefore, the reproducibility of retention time is the most important for identifying substances in liquid chromatography. Here, when the flow rate of the solution to the column is high, relatively good reproducibility is obtained. The flow rate of the solution to the column is low! / In this case, the reproducibility is good! /
[0004] また、質量分析器を用いたプロテオミタスの手法によれば、生物が有するたんぱく 質を、定量的に同定することができる。このため、プロテオミタスの手法は、医学及び 生物学の分野において広く応用され始めている。プロテオミタスの手法の中で、超低 流量液体クロマトグラフィーと精密質量分析装置とを組み合わせた nanoLC/MSシス テムという装置が、最近注目を集めている。当該装置によれば、微量のサンプルから 、莫大な量のたんぱく質を同定することが可能である。より具体的には、超低流量液 体クロマトグラフィーにより細力べ分離された物質について、精密質量分析装置によつ てそれぞれの質量を正確に測定することによって、当該物質の同定が行われるもの である。  [0004] In addition, according to the proteomics technique using a mass spectrometer, the protein of an organism can be quantitatively identified. For this reason, the proteomics method has begun to be widely applied in the fields of medicine and biology. Among the proteomics methods, a nanoLC / MS system that combines ultra-low flow liquid chromatography and an accurate mass spectrometer has recently attracted attention. According to this apparatus, it is possible to identify a huge amount of protein from a small amount of sample. More specifically, a substance that has been subjected to intensive separation by ultra-low flow liquid chromatography is identified by accurately measuring the mass of each substance with an accurate mass spectrometer. It is.
[0005] However, methods that identify proteins on the basis of retention time data obtained from ultra-low-flow liquid chromatography have not yet reached the practical stage. For example, an attempt to expand the retention time data and mass data obtained from a nanoLC/MS system in two dimensions and to build a protein map from the resulting coordinates was already published in 2002 (Lipton MS, et al. Proceedings of the National Academy of Sciences. 99:11049, 2002), but no paper has yet appeared that develops such an attempt to the practical stage.
[0006] In addition, Zhang H, et al., Molecular & Cellular Proteomics. 4:144, 2005, states that a "correction" was applied to two-dimensionally expanded retention time data and mass data, but does not describe what specific correction was made.
[0007] Moreover, no paper has been published on how two-dimensionally expanded retention time data and mass data should preferably be compared among a plurality of sample solutions.
Non-Patent Document 1: Proceedings of the National Academy of Sciences. 99:11049, 2002 (Lipton MS, et al.)
Non-Patent Document 2: Molecular & Cellular Proteomics. 4:144, 2005 (Lipton MS, et al.)
Disclosure of the Invention
Problems to be Solved by the Invention
[0008] The present invention has been made in view of the above. Its object is to provide a data correction method for liquid chromatography, in particular for ultra-low-flow liquid chromatography, that corrects retention time data, which may differ from one test (measurement) to another, so that substantially high reproducibility can be confirmed.
[0009] A further object of the present invention is to provide a data analysis method for liquid chromatography, in particular for ultra-low-flow liquid chromatography, that enables data for a plurality of test solutions to be compared with high accuracy.
Means for Solving the Problems
[0010] The present invention is a data correction method comprising: a concentration gradient generation step of flowing a solution containing a test substance through a liquid chromatograph and generating a concentration gradient of the solution over a predetermined time; a measurement step of obtaining, for each component of the test substance that is separated and eluted during the concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; and a correction step of correcting the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance.
[0011] According to the inventors, the retention time data measured by liquid chromatography can deviate, as described later, by for example 79 seconds on average and by as much as 192 seconds. According to the present invention, however, substantially high reproducibility can be confirmed by correlating two sets of two-dimensional data. That is, correlating the two sets allows their features to be superimposed and evaluated, so that high reproducibility can be confirmed when evaluating the measurement data regardless of differences in the absolute values of the data themselves. This makes it possible, for example, to discriminate even differences in expression between different samples. The present invention is therefore expected to make an extremely important contribution to the field of proteomics, whose further development is strongly anticipated.
[0012] The data with which the measured two-dimensional data is correlated is not limited to standard two-dimensional data obtained in advance. It is of course also possible, for example, to correlate two sets of two-dimensional data that were both obtained by measurement.
[0013] That is, the present invention is a data correction method comprising: a first concentration gradient generation step of flowing a first solution containing a first test substance through a liquid chromatograph and generating a concentration gradient of the first solution over a predetermined time; a first measurement step of obtaining, for each component of the first test substance separated and eluted during the first concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; a second concentration gradient generation step of flowing a second solution containing a second test substance through the same liquid chromatograph and generating a concentration gradient of the second solution over a predetermined time; a second measurement step of obtaining, for each component of the second test substance separated and eluted during the second concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; and a correction step of correcting the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the first measurement step.
[0014] In this aspect as well, substantially high reproducibility can be confirmed by correlating two sets of two-dimensional data. That is, correlating the two sets allows their features to be superimposed and evaluated, so that high reproducibility can be confirmed when evaluating the measurement data regardless of differences in the absolute values of the data themselves.
[0015] Furthermore, the two sets of two-dimensional data to be correlated may have been obtained using different liquid chromatographs.
[0016] That is, the present invention is a data correction method comprising: a first concentration gradient generation step of flowing a first solution containing a first test substance through a first liquid chromatograph and generating a concentration gradient of the first solution over a predetermined time; a first measurement step of obtaining, for each component of the first test substance separated and eluted during the first concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; a second concentration gradient generation step of flowing a second solution containing a second test substance through a second liquid chromatograph and generating a concentration gradient of the second solution over a predetermined time; a second measurement step of obtaining, for each component of the second test substance separated and eluted during the second concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; and a correction step of correcting the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the first measurement step.
[0017] In this aspect as well, substantially high reproducibility can be confirmed by correlating two sets of two-dimensional data. That is, correlating the two sets allows their features to be superimposed and evaluated, so that high reproducibility can be confirmed when evaluating the measurement data regardless of differences in the absolute values of the data themselves.
[0018] According to the present invention, deviations in the measured retention time can be corrected by the correction step, which makes it practical to use data from ultra-low-flow liquid chromatography, which has not been put to practical use so far. Specifically, in the concentration gradient generation step, the solution containing the test substance can be passed through the liquid chromatograph at a flow rate of 500 nl/min or less, particularly preferably at about 200 nl/min.
[0019] Preferably, the correction step uses a dynamic algorithm that searches for the optimum correspondence positions using a two-dimensional lattice whose axes are the cycle numbers (numbers assigned in ascending order of retention time) of the two sets of two-dimensional data.
[0020] More specifically, the dynamic algorithm operates, for example, as follows. Let R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass-to-charge ratio data (mass spectrum) A(n) at the n-th cycle of one set of two-dimensional data and the mass-to-charge ratio data B(m) at the m-th cycle of the other set, let g be the gap penalty, let N be the total number of cycles of the one set, and let M be the total number of cycles of the other set. The two-dimensional lattice L(i, j), whose axes are the cycle numbers of the two sets, is obtained for i = 1, ..., N and j = 1, ..., M (with n = i and m = j) by

$$L(i, j) = \max\bigl( L(i-1, j) + g,\; L(i, j-1) + g,\; L(i-1, j-1) + R(A(i), B(j)) \bigr)$$

Then, so as to follow the optimum correspondence positions, the coordinate

$$V_0 = (k, l) = \operatorname{argmax}\, L(k, l), \quad (k = N,\ l = 1, \dots, M) \ \text{and} \ (k = 1, \dots, N,\ l = M)$$

is taken as the starting point, and the coordinate sequence given by

$$V_{i+1} = \operatorname{argmax}_{V}\, L(V), \quad V \in \{\, V_i - (1, 1),\ V_i - (0, 1),\ V_i - (1, 0) \,\}$$

is determined.
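The lattice recursion and traceback of paragraph [0020] can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation: the function names `pearson` and `align`, the zero-initialised row 0 and column 0, the default gap penalty of -0.5, and the representation of each mass spectrum as an equal-length intensity vector are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length intensity vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    # A constant vector has no defined correlation; treat it as 0 (assumption).
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def align(spectra_a, spectra_b, g=-0.5):
    """Fill L(i, j) and trace back; returns a list of 1-based (i, j) cycle pairs."""
    N, M = len(spectra_a), len(spectra_b)
    # Row 0 and column 0 are set to zero (assumption: leading gaps are free).
    L = [[0.0] * (M + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            r = pearson(spectra_a[i - 1], spectra_b[j - 1])
            L[i][j] = max(L[i - 1][j] + g, L[i][j - 1] + g, L[i - 1][j - 1] + r)
    # Start point V_0: best score on the last row (k = N) or last column (l = M).
    border = [(N, l) for l in range(1, M + 1)] + [(k, M) for k in range(1, N + 1)]
    v = max(border, key=lambda p: L[p[0]][p[1]])
    path = [v]
    # Traceback: move to the best-scoring of the three neighbouring predecessors.
    while v[0] > 1 and v[1] > 1:
        i, j = v
        v = max([(i - 1, j - 1), (i, j - 1), (i - 1, j)], key=lambda p: L[p[0]][p[1]])
        path.append(v)
    return path[::-1]
```

For two identical runs the returned path is the main diagonal; if one run has an extra leading cycle, the path is shifted by one cycle accordingly.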
[0021] More preferably, the dynamic algorithm extracts, from the coordinate sequence $V_i$ determined by the argmax search, only those coordinates satisfying $V_i = V_{i+1} + (1, 1)$, converts the cycle numbers to retention times, and determines a curve obtained by spline interpolation or polynomial regression as the correction function.
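As a hedged sketch of paragraph [0021], the code below keeps only the path points reached by a diagonal step, converts cycle numbers to retention times, and fits a polynomial by least squares. The function names, the default degree of 2, the list-based retention time lookup (`rt_a`, `rt_b`) and the choice of polynomial regression over spline interpolation are assumptions for illustration only.

```python
def diagonal_points(path):
    """Keep path points reached by a diagonal (1, 1) step from their predecessor."""
    return [q for p, q in zip(path, path[1:]) if (q[0] - p[0], q[1] - p[1]) == (1, 1)]

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via normal equations; returns c[0..degree]."""
    n = degree + 1
    # Normal equations A c = b with A[r][c] = sum(x^(r+c)) and b[r] = sum(y * x^r).
    A = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - s) / A[r][r]
    return coef

def correction_function(path, rt_a, rt_b, degree=2):
    """Return f mapping a retention time of run B onto the time scale of run A.

    path holds 1-based (i, j) cycle pairs; rt_a[i-1] and rt_b[j-1] are the
    retention times of those cycles (hypothetical inputs for this sketch).
    """
    pts = diagonal_points(path)
    xs = [rt_b[j - 1] for _, j in pts]
    ys = [rt_a[i - 1] for i, _ in pts]
    coef = polyfit(xs, ys, degree)
    return lambda t: sum(c * t ** k for k, c in enumerate(coef))
```

Applying the returned function to every retention time of run B places both runs on a common time axis before they are compared.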
[0022] The present invention is also a data analysis method comprising: a data correction step of carrying out a data correction method having any of the above features for each of a plurality of solutions (or second solutions) each containing one of a plurality of test substances (or second test substances); a data development step of developing the two-dimensional data consisting of the retention time data and mass-to-charge ratio data corrected by the data correction method into two-dimensional image data in which, for a given mass-to-charge ratio, the retention time data of the respective solutions (or second solutions) are arranged in parallel; and an identical peak extraction step of extracting identical peaks of the retention time data on the basis of the two-dimensional image data.
[0023] According to the present invention, extracting identical peaks of retention time data from a plurality of solutions (or second solutions) each containing one of a plurality of test substances (or second test substances) allows the data characteristics of those solutions to be analyzed effectively. This can be expected to greatly accelerate the development of disease markers such as tumor markers.
[0024] For example, the above data analysis method is carried out (data analysis step) both on a plurality of solutions (or second solutions) each containing one of a plurality of test substances (or second test substances) belonging to a first group and on a plurality of solutions (or second solutions) each containing one of a plurality of test substances (or second test substances) belonging to a second group. The identical peaks of the retention time data obtained from the solutions of the first group are then compared with those obtained from the solutions of the second group to verify whether there is a significant difference between them (testing step). If a significant difference is found, that difference can be used as a "marker".
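The paragraph above does not name a specific statistical test. As one illustrative possibility only, the separation between the two groups for a given identical peak can be summarised by the area under the ROC curve, computed here from the peak intensities of each group. The function name `roc_auc` and the choice of this measure are assumptions, not the method prescribed by the text.

```python
def roc_auc(group1, group2):
    """Area under the ROC curve for separating group2 from group1 by intensity.

    Equals the probability that a randomly chosen group-2 intensity exceeds a
    randomly chosen group-1 intensity, with ties counted as one half.
    """
    wins = 0.0
    for x in group1:
        for y in group2:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(group1) * len(group2))
```

A value near 0.5 indicates no separation between the groups, while values near 0 or 1 indicate that the peak intensity distinguishes them well.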
[0025] Here, the identical peak extraction step normally includes a peak detection step of detecting peaks in the retention time data of each solution (or each second solution), and an identical peak identification step of identifying the correspondence between the peaks of the respective solutions (or second solutions) detected in the peak detection step.
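A peak detection step of the kind mentioned above can be sketched, under simplifying assumptions, as a search for local maxima above a threshold in one solution's intensity trace. The function name `detect_peaks`, the threshold parameter and the local-maximum criterion are hypothetical; any baseline correction or smoothing is assumed to have been applied beforehand.

```python
def detect_peaks(intensity, threshold=0.0):
    """Return indices of local maxima whose intensity exceeds threshold.

    A point counts as a peak if it is strictly higher than its left neighbour
    and at least as high as its right neighbour.
    """
    peaks = []
    for k in range(1, len(intensity) - 1):
        if intensity[k] > threshold and intensity[k - 1] < intensity[k] >= intensity[k + 1]:
            peaks.append(k)
    return peaks
```

The returned indices are positions along the retention time axis of the trace and can be converted to retention times with the run's time scale.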
[0026] Preferably, the identical peak identification step comprises: a candidate peak extraction step of extracting candidate peaks contained within a predetermined retention time width; a score calculation step in which, for a solution (or second solution) in which one or more candidate peaks were extracted, one candidate peak is selected for that solution, while a solution (or second solution) in which no candidate peak was extracted is treated as having no candidate peak, and the score (total intensity) of the selected candidate peaks is calculated for each of all combinations of candidate peak selections; and a peak identification step of identifying, as mutually corresponding identical peaks, the combination of selected candidate peaks that gives the highest of the scores obtained in the score calculation step.
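The score calculation and peak identification steps can be sketched as an exhaustive search over all selections. In this illustration each candidate peak is a hypothetical `(retention_time, intensity)` tuple, the score is the plain sum of the selected intensities, and the function name `best_combination` is an assumption.

```python
from itertools import product

def best_combination(candidates):
    """Pick one candidate peak per solution so that the total intensity is maximal.

    candidates: one list per solution of (retention_time, intensity) tuples.
    Returns (best_score, selection); selection holds one peak or None per solution.
    """
    # A solution with no candidate peak contributes the single option None.
    options = [c if c else [None] for c in candidates]
    best_score, best_sel = float("-inf"), None
    for sel in product(*options):
        score = sum(p[1] for p in sel if p is not None)  # score = total intensity
        if score > best_score:
            best_score, best_sel = score, sel
    return best_score, best_sel
```

With a plain sum the best selection is simply the strongest candidate in each solution; the exhaustive enumeration is kept here because it mirrors the wording of the paragraph and would still apply if the score coupled the choices across solutions.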
[0027] In this case, more preferably, the identical peak identification step further includes, after the peak identification step, a data division step of dividing the retention time data into sections at the identical peaks identified in the peak identification step, and the candidate peak extraction step, the score calculation step, the peak identification step and the data division step are repeated recursively on the retention time data of each section.
[0028] Also, for example, the candidate peak extraction step is performed for each peak in turn, taking that peak as a reference and using the allowable deviation range from that peak as the predetermined retention time width. The allowable deviation range is, for example, 0.7 min on the positive side.
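A minimal sketch of the candidate peak extraction follows. It assumes that the window runs from the reference peak's retention time toward later times by the allowable deviation (one reading of "0.7 min on the positive side"), and that peaks are hypothetical `(retention_time_min, intensity)` tuples; the function name is likewise an assumption.

```python
def candidate_peaks(peaks, reference_rt, tolerance=0.7):
    """Return the peaks whose retention time lies within the allowed window.

    The window is [reference_rt, reference_rt + tolerance] in minutes
    (assumed interpretation of the allowable deviation range).
    """
    return [p for p in peaks if reference_rt <= p[0] <= reference_rt + tolerance]
```

Running this once per solution yields the per-solution candidate lists from which one peak per solution is later selected.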
[0029] In this case, to reduce the number of calculations (computational load), it is preferable that, when the proportion of solutions (or second solutions) in which no candidate peak was extracted in the candidate peak extraction step falls below a predetermined minimum detection rate, the identical peak identification step based on that reference peak is terminated. The minimum detection rate is preferably set to 0.1 to 0.4 (above 0.5, it becomes difficult to identify a significant difference between two groups).
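The early-termination check can be sketched as below. The sketch adopts one possible reading of the paragraph, namely that the detection rate is the fraction of solutions in which at least one candidate peak was found and that processing stops when this fraction is below the minimum; this reading, the function name and the default of 0.25 (a value inside the stated 0.1 to 0.4 range) are assumptions.

```python
def should_stop(candidates, min_detection_rate=0.25):
    """Return True if too few solutions have a candidate peak for this reference.

    candidates: one list of candidate peaks per solution (empty if none found).
    """
    detected = sum(1 for c in candidates if c)
    return detected / len(candidates) < min_detection_rate
```

When the check returns True, the score calculation for that reference peak can be skipped, which is where the saving in computation comes from.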
[0030] Preferably, the data development step develops, for each unit mass-to-charge ratio, two-dimensional image data in which the retention time data of the respective solutions (or second solutions) are arranged in parallel, and the identical peak extraction step extracts identical peaks of the retention time data for each unit mass-to-charge ratio on the basis of that two-dimensional image data.
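The data development step can be pictured as building, for one mass-to-charge value, an image with one row per solution and one column per cycle. The sketch below assumes each run is a list of hypothetical `(cycle, mz, intensity)` points and that a unit mass-to-charge bin is a window of plus or minus 0.5 around the chosen value; both are illustrative choices.

```python
def develop_image(runs, mz, n_cycles, mz_tol=0.5):
    """Build rows[solution][cycle - 1] = summed intensity within mz +/- mz_tol.

    runs: one list per solution of (cycle, mz, intensity) points, cycle 1-based.
    """
    image = []
    for points in runs:
        row = [0.0] * n_cycles
        for cycle, point_mz, intensity in points:
            if abs(point_mz - mz) <= mz_tol and 1 <= cycle <= n_cycles:
                row[cycle - 1] += intensity
        image.append(row)
    return image
```

Repeating the call for each unit mass-to-charge value gives one such image per value, and identical peaks are then searched for row against row within each image.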
[0031] The present invention is also a data correction apparatus for a liquid chromatography method comprising a concentration gradient generation step of flowing a solution containing a test substance through a liquid chromatograph and generating a concentration gradient of the solution over a predetermined time, and a measurement step of obtaining, for each component of the test substance separated and eluted during the concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other. The apparatus corrects the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance, and uses a dynamic algorithm (software) that searches for the optimum correspondence positions using a two-dimensional lattice whose axes are the cycle numbers (numbers assigned in ascending order of retention time) of the two sets of two-dimensional data.
[0032] The correction apparatus, or each component means of the correction apparatus, can be realized by a computer system.
[0033] A program for causing a computer system to realize them, and a computer-readable recording medium on which the program is recorded, are also subject to protection in this application.
[0034] Here, the recording medium includes not only media that can be recognized as a single item, such as a flexible disk, but also networks that propagate various signals.
[0035] The present invention is also a data analysis method comprising: a concentration gradient generation step of flowing a plurality of solutions, each containing one of a plurality of test substances, through the same or different liquid chromatographs and generating a concentration gradient of each solution over a predetermined time; a measurement step of obtaining, for each solution and for each component of the test substance separated and eluted during the concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other; a data development step of developing the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the measurement step into two-dimensional image data in which, for a given mass-to-charge ratio, the retention time data of the respective solutions are arranged in parallel; and an identical peak extraction step of extracting identical peaks of the retention time data on the basis of the two-dimensional image data.
[0036] At the time of filing of the present application, the identical peak extraction step, which extracts identical peaks of retention time data on the basis of the two-dimensional image data, cannot in practice be carried out without the data correction method proposed in Japanese Patent Application No. 2005-177547, on which the priority claim of this application is based (as shown in FIG. 16, the correspondence between peaks cannot be identified). However, if the accuracy of data measurement methods improves in the future, the data analysis method proposed in this application may be used on its own, without the data correction method proposed in Japanese Patent Application No. 2005-177547. That is, according to the present invention, extracting identical peaks of retention time data from a plurality of solutions (or second solutions) each containing one of a plurality of test substances (or second test substances) allows the data characteristics of those solutions to be analyzed effectively, and can be expected to greatly accelerate the development of disease markers such as tumor markers.
[0037] The present invention is also a data analysis apparatus for a liquid chromatography method comprising a concentration gradient generation step of flowing a plurality of solutions, each containing one of a plurality of test substances, through the same or different liquid chromatographs and generating a concentration gradient of each solution over a predetermined time, and a measurement step of obtaining, for each solution and for each component of the test substance separated and eluted during the concentration gradient generation step, retention time data and mass-to-charge ratio data in association with each other. The apparatus comprises: a data development device that develops the two-dimensional data consisting of the retention time data and mass-to-charge ratio data obtained in the measurement step into two-dimensional image data in which, for a given mass-to-charge ratio, the retention time data of the respective solutions are arranged in parallel; and an identical peak extraction device that extracts identical peaks of the retention time data on the basis of the two-dimensional image data.
[0038] The data analysis apparatus, or each component means of the data analysis apparatus, can be realized by a computer system.
[0039] A program for causing a computer system to realize them, and a computer-readable recording medium on which the program is recorded, are also subject to protection in this application.
[0040] Here, the recording medium includes not only media that can be recognized as a single item, such as a flexible disk, but also networks that propagate various signals.
Brief Description of the Drawings
[0041] [FIG. 1] Flow diagram outlining one embodiment of the present invention.
[FIG. 2] Schematic diagram showing the concept of the dynamic algorithm of one embodiment of the present invention.
[FIG. 3] Schematic diagram showing the concept of the dynamic algorithm of one embodiment of the present invention.
[FIG. 4] Schematic diagram explaining the operation of the dynamic algorithm of one embodiment of the present invention.
[FIG. 5] Schematic diagram explaining the operation of the dynamic algorithm of one embodiment of the present invention.
[FIG. 6] Schematic diagram explaining the operation of the dynamic algorithm of one embodiment of the present invention.
[FIG. 7] Graph showing an example of correction of measurement data (an example of differences).
[FIG. 8] Graph showing the reproducibility of corrected data.
[FIG. 9] Flow diagram outlining a second embodiment of the present invention.
[FIG. 10] Example of two-dimensional image data in which the retention time data of the samples are arranged along the vertical axis.
[FIG. 11a] Graph showing the concept of the baseline correction step.
[FIG. 11b] Graph showing the concept of the smoothing step.
[FIG. 11c] Graph showing the concept of the peak detection step.
[FIG. 12a] Example of two-dimensional image data.
[FIG. 12b] Image data showing the peaks detected from the two-dimensional image data of FIG. 12a.
[FIG. 13] Schematic flow diagram showing the identical peak extraction step (identical peak extraction algorithm) of this embodiment.
[FIG. 14a] Example of two two-dimensional images containing an identical peak for which a significant difference was found.
[FIG. 14b] ROC curve for FIG. 14a.
[FIG. 14c] Peak intensity distribution for FIG. 14a.
[FIG. 15a] Example of two pairs of two-dimensional images containing two identical peaks for which significant differences were found.
[FIG. 15b] Peak intensity distribution for FIG. 15a.
[FIG. 16] Example of a two-dimensional image when no data correction is performed.
Best Mode for Carrying Out the Invention
[0042] Embodiments of the present invention are described below with reference to the drawings.
[0043] First, the test substance and solutions used in this embodiment are described.
[0044] The test substance was the DLD1 human colorectal cancer cell line (Honda et al. Gastroenterology 2005; 128: 51-62) transfected with the gene for actinin-4 (ACTN4), whose expression can be controlled with tetracycline. ACTN4 is expressed under normal culture conditions (DLD1 Tet-off ACTN4), whereas its expression is suppressed by 0.01 to 0.1 μg/ml doxycycline (Dox) (DLD1 Tet-on ACTN4). Cell solutions of DLD1 Tet-off ACTN4 and DLD1 Tet-on ACTN4 were each adjusted to a concentration of 3 mg/ml.
[0045] Next, 100 μl of each of the DLD1 Tet-off ACTN4 and DLD1 Tet-on ACTN4 cell solutions was taken, and the protein was concentrated by acetone precipitation. Then 10 μl of 5 M urea, 2.5 μl of 1 M NH4HCO3 and 3.3 μg of trypsin were added, and the volume was made up to 50 μl with purified water.
[0046] After digestion at 37°C for 20 hours, 50 μl of acetonitrile was added, the mixture was centrifuged at 17,400 G for 10 minutes, and the supernatant was transferred to another tube and dried in a SpeedVac. The residue was dissolved in 50 μl of 0.1% formic acid to give the sample (solution) for measurement (Fig. 1: STEP 1).
[0047] A Splitless Nano HPLC System (KYA, Tokyo) was used for ultra-low-flow liquid chromatography. The column packing was high-purity silica gel (particle size 3 μm, pore size 120 Å) into which octadecyl groups had been introduced and whose residual silanol groups had then been end-capped as completely as possible. A reversed-phase column with an inner diameter of 0.15 mm and a length of 50 mm was used as the separation column, and one with an inner diameter of 0.5 mm and a length of 1 mm as the trap column (HiQ sil, KYA, Tokyo).
[0048] Then 10 μl of the sample was taken, and a continuous concentration gradient from 0.1% formic acid to 0.1% formic acid / 80% acetonitrile was generated over 60 minutes at an ultra-low flow rate of 200 nl/min (Fig. 1: STEP 2). During this time, the components were separated and eluted (Fig. 1: STEP 3).
[0049] Mass analysis of each component was performed on a QTOF Ultima (Waters, MA, USA), measuring for 60 minutes in centroid format over the range of 250 to 1600 m/z with a scan time of 1 second. Data were acquired in duplicate (twice) for each of DLD1 Tet-off ACTN4 and DLD1 Tet-on ACTN4 (Fig. 1: STEP 4).
[0050] For the two-dimensional display of the data, the data were converted to the maximum value within each 1 m/z range of the mass-to-charge ratio and output in wiff format. The analysis range was limited to a mass-to-charge ratio of […]00 to 1000 m/z and a retention time (RT) of 1 to 1800 sec, and intensity values below 200 were mapped to a gray scale of 1 to 255 for display.
[0051] In this embodiment, data correction was then performed by correlating the data acquired twice from each of the two samples (four data sets in total) (Fig. 1: STEP 5).
[0052] First, the method (algorithm) used in this embodiment to obtain the correction function for data correction is described.
[0053] In this embodiment, the reference (standard) two-dimensional data is denoted A and the two-dimensional data to be corrected is denoted B, and a correction function is derived that maximizes the sum of the mass spectrum correlation coefficients over the retention times.
[0054] First, to improve execution speed and to provide tolerance to mass measurement error, the ion intensities of the mass spectrum at each RT (retention time) are converted into one representative value per 1 m/z interval.
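As a non-limiting illustration, the conversion of [0054] might be sketched in Python as follows. The function name `bin_spectrum`, its parameters, and the choice of the interval maximum as the representative value (borrowed from the display step of [0050]) are assumptions made for this sketch and are not specified by the embodiment.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, bin_width=1.0):
    """Reduce one centroid mass spectrum to one value per m/z interval.

    peaks: iterable of (mz, intensity) pairs measured at a single RT.
    Returns a list with one representative intensity per interval; the
    maximum is used as the representative value (an assumption).
    """
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((mz - mz_min) // bin_width), n_bins - 1)
            binned[idx] = max(binned[idx], intensity)
    return binned


# Hypothetical spectrum: two centroids share the 400-401 interval,
# and the centroid at 999.0 lies outside the requested range.
spectrum = [(400.2, 5.0), (400.9, 8.0), (401.1, 3.0), (999.0, 1.0)]
print(bin_spectrum(spectrum, 400.0, 403.0))
```

Because every spectrum is reduced to the same fixed-length vector, spectra from different cycles can be compared directly by a correlation coefficient.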
[0055] Next, on a two-dimensional grid whose axes are the cycle numbers (ascending index numbers of the RTs) of the two two-dimensional data sets A and B, a search is made for the path giving the optimal correspondence positions, using the dynamic algorithm described below.
[0056] In the dynamic algorithm of this embodiment, let R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass-to-charge ratio data (mass spectrum) A(n) at the n-th cycle of one two-dimensional data set and the mass-to-charge ratio data B(m) at the m-th cycle of the other, let g be the gap penalty (typically -0.5), and let N and M be the total numbers of cycles of the one and the other two-dimensional data set, respectively. The values L(i, j) on the two-dimensional grid whose axes are the cycle numbers of the two data sets are then obtained by
\[
L(i, j) = \max\bigl\{\, L(i-1,\, j) + g,\;\; L(i,\, j-1) + g,\;\; L(i-1,\, j-1) + R(A(i), B(j)) \,\bigr\}
\]
for i = 1, ..., N and j = 1, ..., M.
[0057] Then, to obtain the optimal correspondence positions, the coordinate (k, l) = V_0 that maximizes L(k, l) over the grid boundary ((k = N, l = 1, ..., M) and (k = 1, ..., N, l = M)) is taken as the starting point, and the coordinate sequence defined by
\[
V_{i+1} = \operatorname*{arg\,max}_{V \in \{\, V_i - (1,1),\; V_i - (0,1),\; V_i - (1,0) \,\}} L(V)
\]
is determined.
[0058] Furthermore, the dynamic algorithm of this embodiment extracts from this coordinate sequence only the coordinates connected by a diagonal step (consecutive coordinates that differ by (1, 1)), converts the coordinate system from cycle numbers to RT, and takes the curve obtained by spline interpolation or polynomial regression as the correction function.
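The path search of [0055] to [0058] might be sketched in Python as follows, under stated assumptions. The names `pearson` and `align_cycles` are hypothetical, the grid border (row 0 and column 0) is initialized to zero, and the traceback follows whichever term produced the maximum, a common dynamic-programming convention that may differ in detail from the rule of [0057]. Only the diagonal steps are returned, as 1-based cycle pairs.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumption: a flat spectrum carries no correlation
    return sxy / math.sqrt(sxx * syy)


def align_cycles(A, B, gap=-0.5):
    """Return the matched (diagonal) cycle pairs (i, j), 1-based.

    A, B: non-empty lists of binned spectra, one per cycle (RT scan).
    """
    N, M = len(A), len(B)
    R = [[pearson(a, b) for b in B] for a in A]
    # Row 0 and column 0 are zero-initialized borders (an assumption).
    L = [[0.0] * (M + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            L[i][j] = max(L[i - 1][j] + gap,
                          L[i][j - 1] + gap,
                          L[i - 1][j - 1] + R[i - 1][j - 1])
    # Starting point V_0: best cell on the last row or the last column.
    border = [(N, j) for j in range(1, M + 1)]
    border += [(i, M) for i in range(1, N + 1)]
    i, j = max(border, key=lambda c: L[c[0]][c[1]])
    matches = []
    # Trace back toward the origin, recording only the diagonal steps.
    while i > 0 and j > 0:
        if L[i][j] == L[i - 1][j - 1] + R[i - 1][j - 1]:
            matches.append((i, j))
            i, j = i - 1, j - 1
        elif L[i][j] == L[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return matches


# Hypothetical data: B contains one extra leading cycle relative to A.
e1, e2, e3, e4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
print(align_cycles([e1, e2, e3], [e4, e1, e2, e3]))
```

In this toy case the extra leading cycle of B is skipped and each cycle of A is paired with the cycle of B that carries the same spectrum.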
[0059] The above algorithm is explained below with reference to the drawings.
[0060] As shown in Fig. 2, the target data "A" is a figure (data sequence) that is distorted in the Y-axis direction relative to the reference data "A". In this case, the algorithm amounts to finding f^{-1}(y').
[0061] Fig. 3 shows the plane with axes y and y' that corresponds to Fig. 2. In Fig. 3, the line connecting the square marks indicates the y-y' correspondence positions in this case. The algorithm determines these y-y' correspondence positions (the path).
[0062] Fig. 4 shows, for the example of Figs. 2 and 3, the process of computing the Pearson product-moment correlation coefficient R(A(n), B(m)) at each grid point and then actually computing L(i, j) with the algorithm, using a gap penalty of -0.5.
[0063] Once L(i, j) has been obtained for all (i, j), the coordinate (k, l) = V_0 that maximizes L(k, l) over the boundary ((k = N, l = 1, ..., M) and (k = 1, ..., N, l = M)) is identified as the starting point, as shown in Fig. 5, and the coordinate sequence is determined by repeatedly moving from V_i to whichever of V_i - (1, 1), V_i - (0, 1), and V_i - (1, 0) has the largest L.
[0064] Finally, as shown in Fig. 6, only the coordinates connected by a diagonal step (consecutive coordinates that differ by (1, 1)) are extracted from this sequence, the cycle numbers are converted to retention times, and a curve is obtained by spline interpolation or polynomial regression. This curve is the correction function sought.
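The final step of [0064] might be sketched in Python as follows. For brevity the sketch fits a straight line, that is, a polynomial regression of degree 1; the embodiment leaves the degree open and also allows spline interpolation. The function name `fit_rt_correction` and its arguments are hypothetical.

```python
def fit_rt_correction(matches, rt_a, rt_b):
    """Fit rt_on_A ~ slope * rt_on_B + intercept by least squares.

    matches: (i, j) 1-based cycle pairs (cycle i of A matches cycle j of B).
    rt_a, rt_b: retention time in seconds of each cycle of A and of B.
    Returns a function mapping a retention time of B onto the time axis of A.
    """
    xs = [rt_b[j - 1] for _, j in matches]
    ys = [rt_a[i - 1] for i, _ in matches]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two matched cycles")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("matched retention times must not all be equal")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda rt: slope * rt + intercept


# Hypothetical example: B runs one second behind A.
correct = fit_rt_correction([(1, 2), (2, 3), (3, 4)],
                            rt_a=[1.0, 2.0, 3.0],
                            rt_b=[1.0, 2.0, 3.0, 4.0])
print(correct(4.0))
```

A straight line can only absorb a constant offset and a uniform stretch; a higher-degree polynomial or a spline would be needed where the retention time drift varies along the run.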
[0065] Using the above algorithm, three of the four two-dimensional data sets obtained from the two samples were corrected in this embodiment.
[0066] For the data correction, the background was first calculated using the RollingBall Algorithm (from "NIH Image J") (Radius = 50). Then, with the threshold at each point set to (background value + data value)/2, adjacent spots with values at or above the threshold were merged, and peaks were detected with peak intensity = sum of intensities, peak mz = m/z centroid, and peak rt = RT centroid. The detected peaks were fitted to the corrected liquid chromatography data (the retention time data after application of the correction function obtained by the above algorithm), and the two resulting (corrected) two-dimensional data sets of the same sample were superimposed and "integrated" (averaged). In this integration, the tolerances for variation in mass-to-charge ratio and elution time (retention time) were ±2 and ±20, respectively. When several peaks fell within these tolerances, the one with the higher intensity was adopted and the MZ and RT values were averaged. Pairs were then detected between the data of the two samples obtained in this way, and when a pair was detected, the ratio of the peak intensities was calculated (Fig. 1: STEP 6).
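One possible reading of the "integration" of two corrected runs of the same sample is sketched below in Python. The function name `integrate_peaks`, the tuple layout, the treatment of unmatched peaks (kept unchanged), and the discarding of weaker peaks that fall inside the tolerance window are all assumptions of this sketch. The tolerance values are taken from [0066], whose text does not state their units.

```python
def integrate_peaks(run1, run2, mz_tol=2.0, rt_tol=20.0):
    """Merge two peak lists of the same sample after RT correction.

    run1, run2: lists of (mz, rt, intensity) tuples.
    For each peak of run1, the peaks of run2 inside the tolerance window
    are collected; the most intense one is paired with it, the higher
    intensity is kept, and mz and rt are averaged.
    """
    merged = []
    used = set()
    for mz1, rt1, int1 in run1:
        candidates = [k for k, (mz2, rt2, _) in enumerate(run2)
                      if k not in used
                      and abs(mz2 - mz1) <= mz_tol
                      and abs(rt2 - rt1) <= rt_tol]
        if not candidates:
            merged.append((mz1, rt1, int1))  # assumption: keep unmatched peaks
            continue
        best = max(candidates, key=lambda k: run2[k][2])
        used.update(candidates)  # weaker peaks in the window are absorbed
        mz2, rt2, int2 = run2[best]
        merged.append(((mz1 + mz2) / 2, (rt1 + rt2) / 2, max(int1, int2)))
    # Peaks of run2 that matched nothing are also kept (assumption).
    merged.extend(p for k, p in enumerate(run2) if k not in used)
    return merged


# Hypothetical peak lists for two runs of one sample.
run_a = [(500.0, 100.0, 50.0), (700.0, 300.0, 10.0)]
run_b = [(501.0, 110.0, 80.0), (500.5, 105.0, 20.0), (900.0, 40.0, 5.0)]
for peak in integrate_peaks(run_a, run_b):
    print(peak)
```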
[0067] The results are summarized below.
[0068] As described above, data were acquired in duplicate for each of DLD1 Tet-off ACTN4 and DLD1 Tet-on ACTN4 (hereinafter Off1, Off2, On1, and On2). Of these four data sets, one of the DLD1 Tet-off ACTN4 data sets (Off1) was taken as the master data (standard data), and the other three were corrected with the correction functions obtained by the above algorithm.
[0069] As shown in Fig. 7, relative to the Off1 data, Off2 before correction differed by up to 190 sec (71.9 sec on average), On1 before correction by up to 192 sec (130.2 sec on average), and On2 before correction by up to 80 sec (36.3 sec on average); in Fig. 7, the difference appears as a shift in the X-axis direction. After correction with the correction functions obtained by the above algorithm, however, a high correlation coefficient of 0.92 was obtained for the same-sample pair DLD1 Tet-off ACTN4 (Off1, Off2), and a similarly high correlation coefficient of 0.94 for the same-sample pair DLD1 Tet-on ACTN4 (On1, On2), as shown in Fig. 8.
[0070] The values obtained by averaging Off1 and the corrected Off2 were taken as representative of DLD1 Tet-off ACTN4, and the values obtained by averaging the corrected On1 and the corrected On2 as representative of DLD1 Tet-on ACTN4. Between these two groups, 203 peaks were found for which either the peak intensity on the more strongly expressed side was 10 or more with no expression on the other side, or the ratio of the peak intensities was 3 or more (107 peaks predominant in DLD1 Tet-off ACTN4 and 96 peaks predominant in DLD1 Tet-on ACTN4). These peaks were judged to vary with the expression of actinin-4 (ACTN4); that is, identification of actinin-4 (protein) was possible.
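The selection criterion of [0070] can be written as a small Python predicate. The function name `classify_peak` and the return labels are hypothetical; the thresholds (intensity of 10 or more with no expression on the other side, or an intensity ratio of 3 or more) are those stated above.

```python
def classify_peak(off_intensity, on_intensity,
                  min_intensity=10.0, min_ratio=3.0):
    """Return 'off', 'on', or None for one pair of representative values.

    'off' / 'on' names the group in which the peak predominates; None means
    the pair does not satisfy the criterion.
    """
    high = max(off_intensity, on_intensity)
    low = min(off_intensity, on_intensity)
    if low == 0.0:
        # No expression on one side: require a minimum intensity on the other.
        differs = high >= min_intensity
    else:
        differs = high / low >= min_ratio
    if not differs:
        return None
    return "off" if off_intensity > on_intensity else "on"


# Hypothetical representative intensities (Tet-off, Tet-on).
for pair in [(12.0, 0.0), (0.0, 8.0), (10.0, 30.0), (20.0, 10.0)]:
    print(pair, classify_peak(*pair))
```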
[0071] In this way, repeated measurements of the same kind of sample (Off1 and Off2, or On1 and On2), for which it is difficult to find reproducibility in the raw measurement data, can be correlated and corrected so that they may be regarded as a group of same-kind data with a certain degree of reproducibility, and using the values obtained by averaging them as representative values allows more accurate identification or diagnosis. In other words, correlating two or more two-dimensional data sets makes it possible to superimpose and evaluate their features, so that high reproducibility can be recognized in evaluating the measurement data regardless of differences in the raw measurements, and comparative studies become possible on many specimens, for example patient sera. This markedly increases the possibility of developing disease markers that differ from those available so far.
[0072] The data correction processing described above can normally be carried out by a data correction apparatus, which may be implemented on various computer systems. A program for implementing the data correction apparatus on a computer system, and a computer-readable recording medium on which the program is recorded, are also covered by the protection sought here.
[0073] Furthermore, where the data correction apparatus is implemented by means of a program such as an OS running on the computer system (a second program), a program containing the various instructions that control that OS or other program, and a recording medium on which that program is recorded, are also covered by the protection sought here.
[0074] Here, the recording medium includes not only media that can be recognized as discrete items, such as a flexible disk, but also networks that propagate various signals.
[0075] Next, a second embodiment of the present invention is described with reference to the drawings.
[0076] First, the test substances and solutions used in this embodiment are described.
[0077] In this embodiment, 10 μl each of plasma from 18 spleen cancer patients (group 1) and plasma from 19 non-cancer-bearing individuals (group 2) was used as the test substance. From each of these, the glycoprotein fraction that adsorbs to 100 μl of concanavalin A was extracted (this step is not essential, but is preferable for adjusting sensitivity).
[0078] To each glycoprotein fraction, 10 μl of 5 M urea, 2.5 μl of 1 M NH4HCO3, and 3.3 μg of trypsin were added, and the volume was brought to 50 μl with purified water.
[0079] After digestion at 37°C for 20 hours, 50 μl of acetonitrile was added, the mixture was centrifuged at 17,400 G for 10 minutes, and the supernatant was transferred to another tube and dried in a SpeedVac. The residue was dissolved in 50 μl of 0.1% formic acid to give the sample (solution) for measurement (Fig. 9: STEP 21).
[0080] A Splitless Nano HPLC System (KYA, Tokyo) was used for ultra-low-flow liquid chromatography. The column packing was high-purity silica gel (particle size 3 μm, pore size 120 Å) into which octadecyl groups had been introduced and whose residual silanol groups had then been end-capped as completely as possible. A reversed-phase column with an inner diameter of 0.15 mm and a length of 50 mm was used as the separation column, and one with an inner diameter of 0.5 mm and a length of 1 mm as the trap column (HiQ sil, KYA, Tokyo).
[0081] Then 10 μl of the sample was taken, and a continuous concentration gradient from 0.1% formic acid to 0.1% formic acid / 80% acetonitrile was generated over 60 minutes at an ultra-low flow rate of 200 nl/min (Fig. 9: STEP 22). During this time, the components were separated and eluted (Fig. 9: STEP 23).
[0082] Mass analysis of each component was performed on a QTOF Ultima (Waters, MA, USA), measuring for 60 minutes in centroid format over the range of 250 to 1600 m/z with a scan time of 1 second. Data were acquired in triplicate (three times) for each sample (solution) (Fig. 9: STEP 24).
[0083] For the two-dimensional display of the data, the data were converted to the maximum value within each 1 m/z range of the mass-to-charge ratio and output in wiff format. The analysis range was limited to a mass-to-charge ratio of […]00 to 1000 m/z and a retention time (RT) of 1 to 1800 sec, and intensity values below 200 were mapped to a gray scale of 1 to 255 for display.
[0084] In this embodiment, data correction was then performed by correlating the data acquired three times from each of the 37 samples (111 data sets in total) (Fig. 9: STEP 25).
[0085] In this embodiment too, following the algorithm described for the preceding embodiment, the reference (standard) two-dimensional data was denoted A and the two-dimensional data to be corrected B, a correction function maximizing the sum of the mass spectrum correlation coefficients over the retention times was derived, and the resulting correction function was applied to the retention time data of each sample. Here, data from the plasma of one non-cancer-bearing individual was used as the reference (standard) two-dimensional data A.
[0086] In this embodiment, the corrected two-dimensional data obtained in this way (retention time data versus mass-to-charge ratio data) were expanded, by a data analysis apparatus implemented on a computer system, into two-dimensional image data in which, for each 1 m/z, the retention time data of the individual sample data sets are arranged in parallel (Fig. 9: STEP 26).
[0087] Fig. 10 shows an example of the two-dimensional image data. In Fig. 10, the horizontal axis is the retention time (RT: 20 to 30 min), and the sample data sets are arranged along the vertical axis. Fig. 10 is the two-dimensional image data for 863 m/z.
[0088] The data analysis apparatus then extracted identical peaks of the retention time data in the two-dimensional image data according to a newly developed identical-peak extraction algorithm (Fig. 9: STEP 27).
[0089] The identical-peak extraction algorithm of this embodiment includes, for the retention time data of each sample data set, a baseline correction step (Fig. 11(a)), a smoothing step (Fig. 11(b)), and a peak detection step (Fig. 11(c)).
[0090] Baseline correction is processing that corrects the slope and undulation of the baseline that arise in a spectral waveform owing to, for example, light scattering by the sample. Smoothing is processing that removes noise by taking a weighted average with a Gaussian function (see Equation 1). Both have long been widely used as processing techniques for data analysis.
[Equation 1]
(equation image: Figure imgf000019_0001)
a_0 = area, a_1 = center, a_2 = width (std dev.)
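Since Equation 1 survives here only as an image reference, the following Python sketch assumes the standard area-normalized Gaussian, y = a0 / (a2 * sqrt(2 * pi)) * exp(-(x - a1)^2 / (2 * a2^2)), which is consistent with the parameter legend (a0 = area, a1 = center, a2 = standard deviation) but is not confirmed by the text. The function names, the kernel half-width of three standard deviations, and the renormalization of weights at the ends of the signal are likewise assumptions.

```python
import math


def gaussian(x, a0, a1, a2):
    """Area-normalized Gaussian: a0 = area, a1 = center, a2 = std dev."""
    return a0 / (a2 * math.sqrt(2.0 * math.pi)) * math.exp(
        -0.5 * ((x - a1) / a2) ** 2)


def gaussian_smooth(signal, sigma=2.0, half_width=None):
    """Smooth a 1-D signal by a Gaussian-weighted moving average.

    sigma is given in data points. Near the ends of the signal the weights
    are renormalized over the points that actually exist.
    """
    if half_width is None:
        half_width = int(math.ceil(3 * sigma))
    offsets = list(range(-half_width, half_width + 1))
    weights = [gaussian(k, 1.0, 0.0, sigma) for k in offsets]
    n = len(signal)
    smoothed = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, weights):
            idx = i + k
            if 0 <= idx < n:
                num += w * signal[idx]
                den += w
        smoothed.append(num / den)
    return smoothed


# Hypothetical trace: a single-point spike is spread over its neighbours.
print(gaussian_smooth([0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0], sigma=1.0))
```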
[0091] In the peak detection step of this embodiment, peak detection accuracy is improved by calculating the signal-to-noise ratio at each data point.
[0092] Fig. 12(b) shows an example of an image of the detected peaks. Fig. 12(a) is an example of two-dimensional image data, and Fig. 12(b) is image data showing the peaks detected from the data of Fig. 12(a).
[0093] The identical-peak extraction algorithm of this embodiment further includes an identical-peak identification step that identifies the correspondence between the peaks of the individual sample data sets detected in the peak detection step.
[0094] As shown in Fig. 13, this identical-peak identification step has a candidate peak extraction step in which, taking each peak as a reference, the allowable deviation range from that peak is set as a predetermined retention time width and the candidate peaks contained within that width are extracted (Fig. 13: STEP 31). The allowable deviation range is, for example, 0.7 min on the plus side.
[0095] The identical-peak identification step also has a score calculation step (Fig. 13: STEP 32). For a sample data set with one or more candidate peaks extracted within the retention time width, one candidate peak is selected for that data set; for a sample data set with no candidate peak extracted within the width, that data set is treated as having no candidate peak. For every combination of candidate peak selections (across all sample data sets), the score (total intensity) of the selected candidate peaks is calculated.
[0096] Here, it is preferable that, when so many sample data sets have no candidate peak within the retention time width that the detection rate falls below a predetermined minimum detection rate, the identical-peak identification step for that retention time width is terminated at that point, because identical peaks should not be identified within that width in such a case. The minimum detection rate is normally set to 0.1 to 0.4 (above 0.5, it is thought to become difficult to identify significant differences between two groups).
[0097] Among the scores obtained in the score calculation step, the combination of candidate peak selections giving the highest score is identified as the mutually corresponding identical peaks (peak identification step) (Fig. 13: STEP 33). For sample data sets in which no corresponding identical peak was found (extracted), processing to fill in the peak is performed (Fig. 13: STEP 34).
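STEP 31 to STEP 33 for a single retention time window might be sketched in Python as follows. Because the score is a plain sum of the selected intensities, the highest-scoring combination is obtained by taking the most intense candidate in each sample, so the sketch does that instead of enumerating every combination. The function name `select_identical_peak`, the symmetric window around the reference RT, and the default minimum detection rate of 0.25 are assumptions; peak filling (STEP 34) and the recursive division are not shown.

```python
def select_identical_peak(samples, center_rt, tol=0.7, min_detection_rate=0.25):
    """Pick one candidate peak per sample inside one RT window.

    samples: one list per sample of (rt_min, intensity) peaks.
    Returns a list with the chosen peak (or None) for each sample, or None
    when too few samples have any candidate inside the window.
    """
    chosen = []
    for peaks in samples:
        candidates = [p for p in peaks if abs(p[0] - center_rt) <= tol]
        # The per-sample maximum gives the maximum total intensity overall.
        chosen.append(max(candidates, key=lambda p: p[1]) if candidates else None)
    detected = sum(1 for c in chosen if c is not None)
    if detected / len(samples) < min_detection_rate:
        return None  # window abandoned: no identical peak is identified
    return chosen


# Hypothetical peaks for three samples around RT = 20.0 min.
samples = [
    [(20.1, 30.0), (20.5, 50.0)],  # two candidates: the stronger one wins
    [(21.5, 99.0)],                # outside the window: no candidate
    [(19.8, 10.0)],
]
print(select_identical_peak(samples, center_rt=20.0))
```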
[0098] The retention time data are then divided into sections at the identified and filled-in identical peak (data division step) (Fig. 13: STEP 35). For (both of) the retention time data sections produced in the data division step, the candidate peak extraction step, the score calculation step, the peak identification step, and the data division step are repeated recursively (Fig. 13: STEP 36).
[0099] With the identical-peak extraction algorithm of this embodiment described above, 105,457 identical peaks were identified. For these identical peaks, those based on the plasma of the 18 spleen cancer patients (group 1) are compared with those based on the plasma of the 19 non-cancer-bearing individuals (group 2) to test whether the two differ significantly; where a significant difference is found, the difference can be used as a "marker" (Fig. 9: STEP 28).
[0100] Specifically, in this embodiment, 109 peaks were found that had a mean peak intensity of 10 or more and showed a significant difference of 0.0001 or less by the U test between the spleen cancer patient group and the non-cancer-bearing group (80 peaks predominant in the spleen cancer patient group and 29 predominant in the non-cancer-bearing group). In addition, 32 peaks had an area under the ROC curve of 0.9 or more.
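The area under the ROC curve used in [0100] is directly related to the statistic of the U test: for groups of sizes n1 and n2, the area equals U / (n1 * n2), where U counts the pairs in which the group 1 value exceeds the group 2 value (ties counted as one half). A minimal Python sketch follows; the function name `roc_auc` and the example intensities are hypothetical, and the p-value of the U test is not computed here.

```python
def roc_auc(group1, group2):
    """Area under the ROC curve for separating group1 from group2.

    Computed as U / (n1 * n2), where U is the Mann-Whitney statistic of
    group1: the number of (x, y) pairs with x > y, ties counting 0.5.
    """
    u = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u / (len(group1) * len(group2))


# Hypothetical peak intensities for one identical peak in two groups.
patients = [42.0, 37.5, 51.0, 29.0]
controls = [12.0, 30.0, 8.5, 15.0]
print(roc_auc(patients, controls))
```

A value near 1.0 (or near 0.0) indicates that the peak intensity separates the two groups almost perfectly, whereas 0.5 indicates no separation.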
[0101] For one of these 32 peaks, Figs. 14(a) to 14(c) show the two-dimensional image (Fig. 14(a)), the ROC curve (Fig. 14(b)), and the peak intensity distribution (Fig. 14(c)).
[0102] The 32 peaks were also analyzed two factors at a time using an SVM. With cross-validation, three combinations of peaks gave a discrimination rate of 100% (sensitivity 100%, specificity 100%), and 28 combinations gave a discrimination rate of 97% (sensitivity 100% and specificity 95%, or sensitivity 94% and specificity 100%). Fig. 15(a) shows images of the peaks whose combination gives a discrimination rate of 100% (sensitivity 100%, specificity 100%), and Fig. 15(b) shows the distribution of peak intensities for the spleen cancer patients and non-cancer-bearing individuals separated using those peaks.
[0103] Thus, according to this embodiment, the development of spleen cancer markers can be promoted by extracting identical peaks of the retention time data from solutions containing the plasma of spleen cancer patients, extracting identical peaks of the retention time data from solutions containing the plasma of non-cancer-bearing individuals, and comparing the two.
[0104] The data analysis processing described above can normally be carried out by a data analysis apparatus, which may be implemented on various computer systems. A program for implementing the data analysis apparatus on a computer system, and a computer-readable recording medium on which the program is recorded, are also covered by the protection sought here.
[0105] Furthermore, where the data analysis apparatus is implemented by means of a program such as an OS running on the computer system (a second program), a program containing the various instructions that control that OS or other program, and a recording medium on which that program is recorded, are also covered by the protection sought here.
[0106] Here, the recording medium includes not only media that can be recognized as discrete items, such as a flexible disk, but also networks that propagate various signals.
[0107] Although the above embodiment is directed to the development of spleen cancer markers, the present invention is not limited to this. Extracting identical peaks of the retention time data from a plurality of solutions, each containing one of a plurality of test substances, makes it possible to identify (analyze) the data characteristics of those solutions effectively, so the invention can be expected to promote the development of markers for various diseases.
[0108] At the time of filing of this application, unless the data correction method proposed in Japanese Patent Application No. 2005-177547, on which this application bases its priority claim, is used, the two-dimensional image data in which the retention time data of the sample data sets are arranged are in the state shown in Fig. 16, so it is practically impossible to carry out the identical-peak extraction step of extracting identical peaks of the retention time data from such two-dimensional image data. If the accuracy of data measurement methods improves in the future, however, the data analysis method alone might be used without the data correction method.

Claims

[1] A data correction method comprising:
a concentration gradient generation step of passing a solution containing a test substance through a liquid chromatograph and generating a concentration gradient of the solution over a predetermined time;
a measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the test substance separated and eluted during the concentration gradient generation step; and
a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance.
[2] A data correction method comprising:
a first concentration gradient generation step of passing a first solution containing a first test substance through a liquid chromatograph and generating a concentration gradient of the first solution over a predetermined time;
a first measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the first test substance separated and eluted during the first concentration gradient generation step;
a second concentration gradient generation step of passing a second solution containing a second test substance through the liquid chromatograph and generating a concentration gradient of the second solution over a predetermined time;
a second measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the second test substance separated and eluted during the second concentration gradient generation step; and
a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the first measurement step.
[3] A data correction method comprising:
a first concentration gradient generation step of passing a first solution containing a first test substance through a first liquid chromatograph and generating a concentration gradient of the first solution over a predetermined time;
a first measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the first test substance separated and eluted during the first concentration gradient generation step;
a second concentration gradient generation step of passing a second solution containing a second test substance through a second liquid chromatograph and generating a concentration gradient of the second solution over a predetermined time;
a second measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the second test substance separated and eluted during the second concentration gradient generation step; and
a correction step of correcting the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the second measurement step by correlating it with the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the first measurement step.
[4] The data correction method according to any one of claims 1 to 3, wherein, in the concentration gradient generation step, the solution containing the test substance is passed through the liquid chromatograph at a flow rate of 500 nL/min or less.
[5] The data correction method according to any one of claims 1 to 4, wherein the correction step uses a dynamic algorithm that searches for optimal corresponding positions using two-dimensional grid coordinates whose axes are the cycle numbers (ascending numbers with respect to retention time) of the two sets of two-dimensional data.
[6] The data correction method according to claim 5, wherein the dynamic algorithm,
letting R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass-to-charge ratio data (mass spectrum) A(n) at the n-th cycle of one set of two-dimensional data and the mass-to-charge ratio data B(m) at the m-th cycle of the other set of two-dimensional data,
letting g be the gap penalty,
letting N be the total number of cycles of the one set of two-dimensional data, and
letting M be the total number of cycles of the other set of two-dimensional data,
obtains the two-dimensional grid coordinates L(i, j), whose axes are the cycle numbers of the two sets of two-dimensional data, by
$$L(i, j) = \max\bigl(L(i-1, j) + g,\ L(i, j-1) + g,\ L(i-1, j-1) + R(A(n), B(m))\bigr)$$
for i = 1, ..., N and j = 1, ..., M, and,
so as to correspond to the optimal corresponding positions, takes as the starting point the coordinates (k, l) = V_0 given by
$$L_0 = \arg\max_{(k, l)} L(k, l), \quad (k = N,\ l = 1, \dots, M)\ \text{and}\ (k = 1, \dots, N,\ l = M)$$
and determines the coordinate sequence expressed by
$$L_i = \arg\max(V_i), \quad \bigl(V_i = V_{i-1} - (1, 1),\ V_{i-1} - (0, 1),\ V_{i-1} - (1, 0)\bigr).$$
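The recurrence and traceback of claim 6 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicants' implementation: the names `pearson` and `align_cycles`, the zero-initialised boundary row and column, the default gap penalty, and the reading n = i, m = j in the recurrence are choices made for the sketch and are not fixed by the claim.

```python
import math


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length intensity vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat spectrum is treated as uncorrelated
    return cov / math.sqrt(va * vb)


def align_cycles(A, B, g=-0.1):
    """Fill L(i, j) and trace back the path of best-matching cycles.

    A, B: lists of mass spectra, one intensity vector per cycle.
    g: gap penalty (the default value is an arbitrary assumption).
    Returns the path as 1-based (i, j) grid coordinates, from start to end.
    """
    N, M = len(A), len(B)
    # Row 0 and column 0 are boundary cells initialised to 0 (an assumption).
    L = [[0.0] * (M + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            L[i][j] = max(
                L[i - 1][j] + g,
                L[i][j - 1] + g,
                L[i - 1][j - 1] + pearson(A[i - 1], B[j - 1]),
            )
    # Starting point V_0: best cell on the last row (k = N) or last column (l = M).
    border = [(N, l) for l in range(1, M + 1)] + [(k, M) for k in range(1, N + 1)]
    i, j = max(border, key=lambda v: L[v[0]][v[1]])
    path = [(i, j)]
    # Move to whichever of the three neighbouring cells has the largest L.
    while i > 1 and j > 1:
        i, j = max(
            [(i - 1, j - 1), (i - 1, j), (i, j - 1)],
            key=lambda v: L[v[0]][v[1]],
        )
        path.append((i, j))
    return path[::-1]
```

If the second data set is the first one delayed by one cycle, the returned path should run along a shifted diagonal, which is the behaviour the grid search is meant to capture.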
[7] The data correction method according to claim 6, wherein the dynamic algorithm extracts, from among $L_i = \arg\max(V_i)$, only the coordinates satisfying $V_i = V_{i+1} + (1, 1)$, then converts the cycle numbers into retention times and determines, as the correction function, the curve obtained by spline interpolation or polynomial regression.
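One way to read claim 7 is sketched below in Python: keep only the path points reached by a diagonal step, convert cycle numbers to retention times, and fit a polynomial by least squares. The function names, the choice of polynomial regression over spline interpolation, the default degree, and the direction of the mapping (second data set onto the time axis of the first) are assumptions made for illustration.

```python
def diagonal_matches(path):
    """Keep points reached by a diagonal step (V_i = V_{i+1} + (1, 1) in
    traceback order). `path` holds 1-based (i, j) pairs from start to end."""
    return [cur for prev, cur in zip(path, path[1:])
            if cur == (prev[0] + 1, prev[1] + 1)]


def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    # Augmented matrix of (X^T X) c = X^T y.
    rows = [[sum(x ** (r + c) for x in xs) for c in range(n)]
            + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[r][n] / rows[r][r] for r in range(n)]


def correction_function(path, rt_a, rt_b, degree=1):
    """Return f mapping a retention time of data set B onto the time axis of A.

    rt_a[i - 1] and rt_b[j - 1] are the retention times of cycles i and j.
    """
    pts = diagonal_matches(path)
    xs = [rt_b[j - 1] for _, j in pts]
    ys = [rt_a[i - 1] for i, _ in pts]
    coef = polyfit(xs, ys, degree)
    return lambda t: sum(c * t ** k for k, c in enumerate(coef))
```

A low polynomial degree keeps the correction smooth; a spline would follow local drift more closely at the cost of sensitivity to mismatched cycles.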
[8] A data analysis method comprising:
a data correction step of carrying out the data correction method according to any one of claims 1 to 7 for each of a plurality of solutions or second solutions, each containing one of a plurality of test substances or second test substances;
a data expansion step of expanding the two-dimensional data of retention time data and mass-to-charge ratio data corrected by the data correction method into two-dimensional image data in which, for a given mass-to-charge value, the retention time data of each solution or each second solution are arranged side by side; and
a same-peak extraction step of extracting the same peaks of the retention time data based on the two-dimensional image data.
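The data expansion step of claim 8 can be pictured as building one image row per sample for a chosen mass-to-charge value. The Python sketch below assumes a hypothetical sparse layout in which each sample is a dict keyed by (retention time, m/z), and assumes that all samples share one retention-time grid after correction; neither is stated in the claim.

```python
def expand_to_image(samples, mz, rt_grid):
    """Build a 2D image for one m/z value: one row per sample, one column per
    retention time on the shared grid. Missing points are filled with 0.0.

    samples: list of dicts mapping (retention_time, mz) -> intensity.
    """
    return [[sample.get((rt, mz), 0.0) for rt in rt_grid] for sample in samples]
```

Rows of such an image can then be compared column by column, which is what makes the same peak in different samples line up visually once retention times have been corrected.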
[9] The data analysis method according to claim 8, wherein the same-peak extraction step includes:
a peak detection step of detecting peaks in the retention time data of each solution or each second solution; and
a same-peak identification step of identifying the correspondence between the peaks of the solutions or second solutions detected in the peak detection step.
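Claim 9 does not say how peaks are detected. As a minimal sketch, assuming a simple local-maximum rule with an intensity threshold (both the rule and the name `detect_peaks` are assumptions, not the applicants' method), the peak detection step might look like this in Python:

```python
def detect_peaks(trace, rt_grid, min_intensity=0.0):
    """Return (retention_time, intensity) for each local maximum of one sample.

    trace: intensities of one sample at one m/z, sampled on rt_grid.
    A point counts as a peak if it exceeds min_intensity, is higher than its
    left neighbour, and is not lower than its right neighbour.
    """
    peaks = []
    for k in range(1, len(trace) - 1):
        if trace[k] > min_intensity and trace[k - 1] < trace[k] >= trace[k + 1]:
            peaks.append((rt_grid[k], trace[k]))
    return peaks
```

Real chromatograms are noisy, so a practical detector would normally smooth the trace or require a minimum peak width before applying a rule of this kind.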
[10] The data analysis method according to claim 9, wherein the same-peak identification step comprises:
a candidate peak extraction step of extracting candidate peaks that fall within a predetermined retention time width;
a score calculation step of calculating the score (total intensity) of the selected candidate peaks for each of all combinations of candidate peak selections, where, if one or more candidate peaks were extracted for a given solution or second solution in the candidate peak extraction step, one candidate peak is selected for that solution or second solution, and, if no candidate peak was extracted for a given solution or second solution, that solution or second solution is treated as having no candidate peak; and
a peak identification step of identifying, as mutually corresponding same peaks, the combination of candidate peak selections that gives the highest of the scores obtained in the score calculation step.
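The three steps of claim 10 can be sketched in Python as below. The function name, the symmetric tolerance window around a reference retention time, and the tuple layout of a peak are assumptions made for illustration.

```python
from itertools import product


def best_combination(peaks_per_sample, center_rt, tolerance):
    """Pick at most one candidate peak per sample so that total intensity is maximal.

    peaks_per_sample: one list of (retention_time, intensity) per sample.
    Returns (score, selection); selection[k] is the chosen peak or None.
    """
    options = []
    for peaks in peaks_per_sample:
        # Candidate peak extraction: peaks inside the retention time window.
        cands = [p for p in peaks if abs(p[0] - center_rt) <= tolerance]
        options.append(cands or [None])  # no candidate: sample contributes nothing
    best_score, best_sel = float("-inf"), None
    # Score calculation over every combination of one choice per sample.
    for sel in product(*options):
        score = sum(p[1] for p in sel if p is not None)
        if score > best_score:
            best_score, best_sel = score, sel
    return best_score, list(best_sel)
```

With total intensity as the score, the exhaustive search gives the same answer as taking the strongest candidate in each sample independently; the enumeration is kept here only because it mirrors the wording of the claim.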
[11] The data analysis method according to claim 10, wherein the same-peak identification step further includes, after the peak identification step,
a data division step of dividing the retention time data into intervals at the same peaks identified in the peak identification step,
and the candidate peak extraction step, the score calculation step, the peak identification step, and the data division step are recursively repeated on the retention time data divided into intervals in the data division step.
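The recursive scheme of claim 11 might be sketched as follows. This is a hedged illustration only: choosing the strongest remaining peak as the reference, using open intervals, and discarding unmatched peaks that lie between the matched ones are all assumptions, and `match_recursively` is a hypothetical name.

```python
def match_recursively(peaks_per_sample, lo, hi, tolerance, matches=None):
    """Recursively match peaks inside the open retention-time interval (lo, hi).

    peaks_per_sample: one list of (retention_time, intensity) per sample.
    Each match is a list with one entry per sample (a peak or None).
    """
    if matches is None:
        matches = []
    inside = [[p for p in peaks if lo < p[0] < hi] for peaks in peaks_per_sample]
    flat = [p for peaks in inside for p in peaks]
    if not flat:
        return matches  # nothing left in this interval
    # Assumption: the strongest remaining peak serves as the reference.
    ref_rt = max(flat, key=lambda p: p[1])[0]
    selection = []
    for peaks in inside:
        cands = [p for p in peaks if abs(p[0] - ref_rt) <= tolerance]
        # With total intensity as the score, the best combination takes the
        # strongest candidate of each sample independently.
        selection.append(max(cands, key=lambda p: p[1]) if cands else None)
    matches.append(selection)
    chosen = [p[0] for p in selection if p is not None]
    # Data division: split at the matched peak and recurse on both sides.
    match_recursively(peaks_per_sample, lo, min(chosen), tolerance, matches)
    match_recursively(peaks_per_sample, max(chosen), hi, tolerance, matches)
    return matches
```

Because every call removes at least the reference peak from further consideration, the recursion terminates once each interval is empty.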
[12] The data analysis method according to claim 10 or 11, wherein the candidate peak extraction step is performed with reference to each peak, using an allowable deviation range from that peak as the predetermined retention time width.
[13] The data analysis method according to claim 12, wherein the allowable deviation range is 0.7 min on the + side.
[14] The data analysis method according to claim 12 or 13, wherein, when the proportion of solutions or second solutions for which no candidate peak was extracted in the candidate peak extraction step falls below a predetermined minimum detection rate, the same-peak identification step based on that peak is terminated.
[15] The data analysis method according to claim 14, wherein the minimum detection rate is set to 0.1 to 0.4.
[16] The data analysis method according to any one of claims 8 to 15, wherein the data expansion step expands, for each unit mass-to-charge value, the retention time data of each solution or each second solution into two-dimensional image data arranged side by side, and
the same-peak extraction step extracts, for each unit mass-to-charge value, the same peaks of the retention time data based on the two-dimensional image data.
[17] A data comparison method comprising:
a data analysis step of carrying out the data analysis method according to any one of claims 8 to 16 for a plurality of solutions or second solutions, each containing one of a plurality of test substances or second test substances belonging to a first group, and for a plurality of solutions or second solutions, each containing one of a plurality of test substances or second test substances belonging to a second group; and
a testing step of comparing the same peaks of the retention time data obtained from the solutions or second solutions of the first group with the same peaks of the retention time data obtained from the solutions or second solutions of the second group, and verifying whether there is a significant difference between the two.
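Claim 17 does not name a statistical test. Purely as an illustration, the Python sketch below applies an exact two-sided permutation test to the intensities of one matched peak in the two groups; the choice of test, the use of the difference of group means as the statistic, and the function name are assumptions.

```python
from itertools import combinations


def permutation_p_value(group1, group2):
    """Exact two-sided permutation test on the difference of group means.

    group1, group2: intensities of the same matched peak in each group.
    Enumerates every reassignment of the pooled values, so it is only
    practical for small groups.
    """
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    observed = abs(sum(group1) / n1 - sum(group2) / n2)
    total = sum(pooled)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[k] for k in idx)
        diff = abs(s1 / n1 - (total - s1) / n2)
        hits += diff >= observed - 1e-12  # tolerance for floating-point ties
        count += 1
    return hits / count
```

A peak would then be flagged as a candidate marker when its p-value falls below a chosen significance level, with a correction for the number of peaks tested.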
[18] A data correction apparatus for a liquid chromatography method, the liquid chromatography method comprising:
a concentration gradient generation step of passing a solution containing a test substance through a liquid chromatograph and generating a concentration gradient of the solution over a predetermined time; and
a measurement step of obtaining retention time data and mass-to-charge ratio data, in association with each other, for each component of the test substance separated and eluted during the concentration gradient generation step,
wherein the apparatus corrects the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step by correlating it with standard two-dimensional data obtained in advance, and
uses a dynamic algorithm that searches for optimal corresponding positions using two-dimensional grid coordinates whose axes are the cycle numbers (ascending numbers with respect to retention time) of the two sets of two-dimensional data.
[19] The data correction apparatus according to claim 18, wherein the dynamic algorithm,
letting R(A(n), B(m)) be the Pearson product-moment correlation coefficient between the mass-to-charge ratio data (mass spectrum) A(n) at the n-th cycle of one set of two-dimensional data and the mass-to-charge ratio data B(m) at the m-th cycle of the other set of two-dimensional data,
letting g be the gap penalty,
letting N be the total number of cycles of the one set of two-dimensional data, and
letting M be the total number of cycles of the other set of two-dimensional data,
obtains the two-dimensional grid coordinates L(i, j), whose axes are the cycle numbers of the two sets of two-dimensional data, by
$$L(i, j) = \max\bigl(L(i-1, j) + g,\ L(i, j-1) + g,\ L(i-1, j-1) + R(A(n), B(m))\bigr)$$
for i = 1, ..., N and j = 1, ..., M, and,
so as to correspond to the optimal corresponding positions, takes as the starting point the coordinates (k, l) = V_0 given by
$$L_0 = \arg\max_{(k, l)} L(k, l), \quad (k = N,\ l = 1, \dots, M)\ \text{and}\ (k = 1, \dots, N,\ l = M)$$
and determines the coordinate sequence expressed by
$$L_i = \arg\max(V_i), \quad \bigl(V_i = V_{i-1} - (1, 1),\ V_{i-1} - (0, 1),\ V_{i-1} - (1, 0)\bigr).$$
[20] The data correction apparatus according to claim 19, wherein the dynamic algorithm extracts, from among $L_i = \arg\max(V_i)$, only the coordinates satisfying $V_i = V_{i+1} + (1, 1)$, then converts the cycle numbers into retention times and determines, as the correction function, the curve obtained by spline interpolation or polynomial regression.
[21] A program that is executed by a computer system including at least one computer and causes the computer system to implement the data correction apparatus according to any one of claims 18 to 20.
[22] A program containing instructions for controlling a second program that runs on a computer system including at least one computer, the program being executed by the computer system to control the second program and cause the computer system to implement the data correction apparatus according to any one of claims 18 to 20.
[23] A data analysis method comprising:
a concentration gradient generation step of passing a plurality of solutions, each containing one of a plurality of test substances, through the same or different liquid chromatographs and generating a concentration gradient of each solution over a predetermined time;
a measurement step of obtaining, for each solution, retention time data and mass-to-charge ratio data, in association with each other, for each component of the test substance separated and eluted during the concentration gradient generation step;
a data expansion step of expanding the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step into two-dimensional image data in which, for a given mass-to-charge value, the retention time data of each solution are arranged side by side; and
a same-peak extraction step of extracting the same peaks of the retention time data based on the two-dimensional image data.
[24] A data analysis apparatus for a liquid chromatography method, the liquid chromatography method comprising:
a concentration gradient generation step of passing a plurality of solutions, each containing one of a plurality of test substances, through the same or different liquid chromatographs and generating a concentration gradient of each solution over a predetermined time; and
a measurement step of obtaining, for each solution, retention time data and mass-to-charge ratio data, in association with each other, for each component of the test substance separated and eluted during the concentration gradient generation step,
the data analysis apparatus comprising:
a data expansion device that expands the two-dimensional data of retention time data and mass-to-charge ratio data obtained in the measurement step into two-dimensional image data in which, for a given mass-to-charge value, the retention time data of each solution are arranged side by side; and
a same-peak extraction device that extracts the same peaks of the retention time data based on the two-dimensional image data.
[25] The data analysis apparatus according to claim 24, wherein the same-peak extraction device executes:
a peak detection step of detecting peaks in the retention time data of each solution or each second solution; and
a same-peak identification step of identifying the correspondence between the peaks of the solutions or second solutions detected in the peak detection step.
[26] The data analysis apparatus according to claim 25, wherein the same-peak identification step comprises:
a candidate peak extraction step of extracting candidate peaks that fall within a predetermined retention time width;
a score calculation step of calculating the score (total intensity) of the selected candidate peaks for each of all combinations of candidate peak selections, where, if one or more candidate peaks were extracted for a given solution or second solution in the candidate peak extraction step, one candidate peak is selected for that solution or second solution, and, if no candidate peak was extracted for a given solution or second solution, that solution or second solution is treated as having no candidate peak; and
a peak identification step of identifying, as mutually corresponding same peaks, the combination of candidate peak selections that gives the highest of the scores obtained in the score calculation step.
[27] The data analysis apparatus according to claim 26, wherein the same-peak identification step further includes, after the peak identification step,
a data division step of dividing the retention time data into intervals at the same peaks identified in the peak identification step,
and the candidate peak extraction step, the score calculation step, the peak identification step, and the data division step are recursively repeated on the retention time data divided into intervals in the data division step.
[28] A program that is executed by a computer system including at least one computer and causes the computer system to implement the data analysis apparatus according to any one of claims 24 to 27.
[29] A program containing instructions for controlling a second program that runs on a computer system including at least one computer, the program being executed by the computer system to control the second program and cause the computer system to implement the data analysis apparatus according to any one of claims 24 to 27.
PCT/JP2006/306907 2005-06-17 2006-03-31 Method of data correction in liquid chromatography WO2006134703A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007521162A JP5119405B2 (en) 2005-06-17 2006-03-31 Data correction method for liquid chromatography

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005-177547 2005-06-17
JP2005177547 2005-06-17

Publications (1)

Publication Number Publication Date
WO2006134703A1 true WO2006134703A1 (en) 2006-12-21

Family

ID=37532072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/306907 WO2006134703A1 (en) 2005-06-17 2006-03-31 Method of data correction in liquid chromatography

Country Status (2)

Country Link
JP (2) JP5119405B2 (en)
WO (1) WO2006134703A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008232650A (en) * 2007-03-16 2008-10-02 Japan Health Science Foundation Method of analyzing sugar peptide tandem mass data
JP2013506142A (en) * 2009-10-01 2013-02-21 フェノメノーム ディスカバリーズ インク Serum-based biomarkers of pancreatic cancer and their use for disease detection and diagnosis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06324029A (en) * 1993-03-15 1994-11-25 Hitachi Ltd Method and apparatus of analyzing and displaying chromatogram
JPH11344482A (en) * 1998-06-02 1999-12-14 Jeol Ltd Mass spectrometer system
JP2000304735A (en) * 1999-04-20 2000-11-02 Shimadzu Corp Chromatographic mass spectroscope
JP2001108665A (en) * 1999-10-05 2001-04-20 Shimadzu Corp Data processor for chromatograph

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0835960A (en) * 1994-07-20 1996-02-06 Shimadzu Corp Data processor for chromatograph mass analyzer
GB2404193A (en) * 2003-07-21 2005-01-26 Amersham Biosciences Ab Automated chromatography/mass spectrometry analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STRICKLER M.P. ET AL.: "peptide mapping of variant glycoproteins from Trypanosoma rhodesiense by reverse phase liquid chromatography", JOURNAL OF LIQUID CHROMATOGRAPHY, vol. 5, no. 10, December 1982 (1982-12-01), pages 1933 - 1940, XP003002816 *
ZHANG H. ET AL.: "High Throughput Quantitative Analysis of Serum Proteins Using Glycopeptide Capture and Liquid Chromatography Mass Spectrometry", MOLECULAR & CELLULAR PROTEOMICS, vol. 4, no. 2, February 2005 (2005-02-01), pages 144 - 155, XP003002815 *

Also Published As

Publication number Publication date
JP5119405B2 (en) 2013-01-16
JP2012198245A (en) 2012-10-18
JPWO2006134703A1 (en) 2009-01-08
JP5760253B2 (en) 2015-08-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (Ref document number: 2007521162; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 06730855; Country of ref document: EP; Kind code of ref document: A1)