WO2024090062A1 - Antigen discovery method and antigen discovery system - Google Patents

Antigen discovery method and antigen discovery system

Publication number
WO2024090062A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
antigen
antibody
target proteins
disease
Application number
PCT/JP2023/033322
Other languages
French (fr)
Japanese (ja)
Inventor
裕一 引地
悠矢 宮本
慎弥 渡辺
基樹 高木
順一 今井
Original Assignee
チューニングフォーク・バイオ・インク
公立大学法人福島県立医科大学
Priority claimed from JP2022172488A (JP2024064128A)
Application filed by チューニングフォーク・バイオ・インク and 公立大学法人福島県立医科大学
Publication of WO2024090062A1

Definitions

  • the present invention relates to an antigen discovery method and an antigen discovery system that uses a protein microarray to discover target proteins that are correlated with disease-related antigens associated with each of a number of diseases.
  • a protein microarray is a substrate such as a glass slide on which thousands to tens of thousands of target proteins are immobilized in an array of spots.
  • Autoantibodies are produced against disease-related antigens that are related to a disease. Therefore, by analyzing measurement data regarding the antigen-antibody reaction between each target protein on the protein microarray substrate and the autoantibody, it is possible to search for specific target proteins that have a correlation with disease-related antigens from among the target proteins.
  • the object of the present invention is to provide an antigen discovery method and an antigen discovery system that can efficiently discover target proteins that are correlated with disease-related antigens that are related to each of a number of diseases, using a protein microarray in which a number of target proteins are fixed on a substrate.
  • An antigen discovery method is a method for discovering target proteins that are correlated with disease-related antigens associated with each of a plurality of diseases, using a protein microarray in which a first number of target proteins are fixed on a substrate.
  • This antigen discovery method includes a data acquisition step of acquiring measurement data for each of the first number of target proteins, which is related to a predetermined feature amount when each of a plurality of specimen samples derived from a plurality of subjects is contacted with the first number of target proteins; a data calculation step of calculating, based on the measurement data, antibody data for each of the first number of target proteins, which is related to binding associated with an antigen-antibody reaction between an autoantibody in the plurality of specimen samples and the first number of target proteins, in association with diagnostic data showing diagnostic results for a plurality of diseases for each of the plurality of subjects; and an antigen selection step of generating a machine learning model using the antibody data for each of the first number of target proteins associated with the diagnostic data as input data, selecting a specific target protein that is correlated with the disease-related antigen from among the first number of target proteins based on the antibody data, corresponding to each of a plurality of diseases shown in the diagnostic data, according to the machine learning model, and outputting the selected specific target protein as output data.
  • An antigen search system comprises a measurement device that uses a protein microarray having a first number of target proteins fixed on a substrate to measure a predetermined feature amount when each of a plurality of specimen samples derived from a plurality of subjects is contacted with the first number of target proteins, and outputs measurement data for each of the first number of target proteins indicating the measurement results, and an antigen search device that searches for target proteins that have a correlation with disease-associated antigens associated with each of a plurality of diseases.
  • the antigen search device includes a data acquisition unit that acquires the measurement data, a data calculation unit that calculates antibody data for each of the first number of target proteins related to binding associated with an antigen-antibody reaction between the autoantibody in the multiple specimen samples and the first number of target proteins based on the measurement data, in association with diagnostic data showing diagnostic results for multiple diseases for each of the multiple subjects, and an antigen selection unit that generates a machine learning model using the antibody data for each of the first number of target proteins associated with the diagnostic data as input data, selects a specific target protein that has a correlation with the disease-related antigen from among the first number of target proteins based on the antibody data in accordance with the machine learning model, corresponding to each of the multiple diseases shown in the diagnostic data, and outputs the selected specific target protein as output data.
  • FIG. 1 is a block diagram of an antigen discovery system according to an embodiment of the present invention.
  • FIG. 2 is a diagram explaining a measurement process performed by a measurement device provided in the antigen search system.
  • FIG. 3 is a diagram showing an example of measurement data showing a measurement result by a measurement device.
  • FIG. 4 is a flowchart of an antigen searching method performed by an antigen searching device provided in an antigen searching system.
  • FIG. 5 is a diagram showing an example of antibody data calculated by a data calculation unit of the antigen searching device.
  • the antigen search system 1 is a system that includes a measurement device 2 and an antigen search device 3.
  • the measurement device 2 acquires measurement data D1 related to diseases of multiple subjects using a protein microarray 4 shown in Fig. 2, and the antigen search device 3 searches for proteins that have a correlation with disease-related antigens related to each of the multiple diseases.
  • the subject primarily refers to humans and other mammals.
  • Other mammals include, for example, primates such as monkeys and chimpanzees, livestock animals such as cows, horses, pigs and sheep, pet animals such as dogs and cats, and laboratory animals such as mice, rats and rabbits. Of these, it is preferable that the subject is a human.
  • Examples of the multiple diseases include neuromuscular diseases such as attention deficit hyperactivity disorder, dementia with Lewy bodies, mild cognitive impairment, major depressive disorder, and Parkinson's disease; malignant tumors such as brain tumors, gastric cancer, colon cancer, liver cancer, and lung cancer; digestive system diseases such as ulcerative colitis; respiratory system diseases such as idiopathic interstitial pneumonia; circulatory system diseases such as idiopathic dilated cardiomyopathy; blood system diseases such as idiopathic thrombocytopenic purpura; and sequelae of infectious diseases such as COVID-19. Of these, the multiple diseases targeted are preferably neuromuscular diseases.
  • the measurement device 2 is a device that acquires measurement data D1 regarding a reference sample SPR and a plurality of specimen samples SPS derived from a plurality of subjects, using a protein microarray 4 shown in FIG.
  • the protein microarray 4 is a matrix of a plurality of spots 42 arranged on a substrate 41 such as a slide glass.
  • A first number of target proteins TP, ranging from several thousand to several tens of thousands, are fixed in a matrix on the spots 42 on the substrate 41.
  • the target proteins TP are antigen proteins such as recombinant proteins expressed in a cell-free expression system such as wheat germ extract.
  • The target proteins TP include standard proteins such as A1BG (alpha-1-B glycoprotein), A1CF (APOBEC1 complementation factor), and A2M (alpha-2-macroglobulin), mutant proteins derived from the p53 gene, and fusion proteins derived from the ALK fusion gene.
  • the specimen sample SPS derived from a subject may include, for example, whole blood, serum, plasma, urine, prostatic fluid, tears, mucus ascites, oral fluid, saliva, semen, seminal plasma, mucus, stool, sputum, cerebrospinal fluid, bone marrow, lymph, etc., and these may be prepared with a diluent or the like.
  • the specimen sample SPS includes samples derived from subjects without a disease and samples derived from subjects with a disease.
  • the specimen sample SPS derived from a subject with a disease contains the autoantibody ABA produced against the disease-associated antigen AG (see Figure 4 below) associated with the disease.
  • the specimen sample SPS derived from a subject without a disease does not contain the autoantibody ABA.
  • Disease-associated antigen AG is a protein against which autoantibody ABA is produced in the body of a subject due to a disease.
  • Autoantibody ABA refers to an antibody against disease-associated antigen AG.
  • Examples of autoantibody ABA include antibodies with globulin types (classes) of IgG, IgM, IgA, IgD, and IgE.
  • Among the first number of target proteins TP fixed to the spots 42 on the substrate 41 of the protein microarray 4, autoantibody ABA binds, through an antigen-antibody reaction, to the target proteins TP that are correlated with the disease-associated antigen AG associated with a disease.
  • Conversely, autoantibody ABA does not bind to target proteins TP that are not correlated with the disease-associated antigen AG, or binds to them only weakly. That is, the autoantibody ABA exhibits high binding affinity to the target protein TP that correlates with the disease-associated antigen AG associated with the disease corresponding to the autoantibody ABA, but exhibits low binding affinity to the target protein TP that does not correlate with the disease-associated antigen AG.
  • the reference sample SPR is a sample containing a reference antibody ABR that can bind, through an antigen-antibody reaction, to any of the first number of target proteins TP fixed to the spots 42 on the substrate 41 of the protein microarray 4.
  • An example of the reference antibody ABR is a Goat reference antibody.
  • the measuring device 2 measures a predetermined feature amount when each of a plurality of specimen samples SPS derived from a plurality of subjects is brought into contact with a first number of target proteins TP fixed to spots 42 on a substrate 41 of a protein microarray 4.
  • the measuring device 2 is not particularly limited in its measurement method as long as it is capable of measuring a feature amount that is an index of binding associated with an antigen-antibody reaction between the first number of target proteins TP and the autoantibody ABA in the specimen sample SPS.
  • the measurement method of the measuring device 2 is distinguished according to the labeling substance that is bound to the autoantibody ABA to measure the feature amount. Examples of the measurement method of the measuring device 2 include a method using a fluorescent substance as a labeling substance, a method using an enzyme as a labeling substance, and a method using RI (radioisotope) as a labeling substance.
  • the measuring device 2 uses a fluorescent substance as a labeling substance and measures the fluorescence intensity as a feature quantity. Specifically, the measuring device 2 measures the fluorescence intensity of the fluorescence emitted from the first fluorescent substance in response to irradiation of a predetermined excitation light in the first secondary antibody AB1 that can bind to the reference antibody ABR in the reference sample SPR and is labeled with a first fluorescent substance. The measuring device 2 also measures the fluorescence intensity of the fluorescence emitted from the second fluorescent substance in response to irradiation of a predetermined excitation light in the second secondary antibody AB2 that can bind to the autoantibody ABA in the specimen sample SPS and is labeled with a second fluorescent substance.
  • the first secondary antibody AB1 is, for example, an anti-Goat IgG antibody labeled with a first fluorescent substance that emits green fluorescence.
  • the second secondary antibody AB2 is, for example, an anti-human IgG antibody, anti-human IgM antibody, anti-human IgA antibody, anti-human IgD antibody, anti-human IgE antibody, etc., labeled with a second fluorescent substance that emits red fluorescence.
  • the reference antibody ABR in the reference sample SPR binds to all of the first number of target proteins TP through an antigen-antibody reaction (primary reaction).
  • When the first secondary antibody AB1 and the second secondary antibody AB2 are then brought into contact with the first number of target proteins TP, all of which have the reference antibody ABR bound, the first secondary antibody AB1 binds to the reference antibody ABR in the secondary reaction, while the second secondary antibody AB2 has no binding target, remains free, and is removed.
  • As a result, the first secondary antibody AB1 labeled with the first fluorescent substance binds to each of the first number of target proteins TP via the reference antibody ABR, while the second secondary antibody AB2 is not bound.
  • When only the reference sample SPR is brought into contact with the first number of target proteins TP, the measuring device 2 outputs measurement data D1 of a negative control (NC) for each of the first number of target proteins TP as data showing the measurement result.
  • the measurement data D1 of the NC includes a first fluorescence intensity value D11 indicating the intensity of the fluorescence emitted from the first fluorescent substance of the first secondary antibody AB1 capable of binding to the reference antibody ABR, and a second fluorescence intensity value D12 indicating the intensity of the fluorescence emitted from the second fluorescent substance of the second secondary antibody AB2 capable of binding to the autoantibody ABA in the specimen sample SPS.
  • the first secondary antibody AB1 labeled with the first fluorescent substance is bound to each of the first number of target proteins TP via the reference antibody ABR, while the second secondary antibody AB2 is not bound. Therefore, in the measurement data D1 of the NC, ideally, for each of the first number of target proteins TP, the first fluorescence intensity value D11 indicates a value equal to or greater than a predetermined first reference intensity value, and the second fluorescence intensity value D12 indicates a value close to zero that is less than the predetermined second reference intensity value.
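The ideal negative-control pattern described above can be written as a per-spot check. The following is a minimal sketch, not the measuring device's actual logic: the names `SpotReading`, `nc_spot_is_valid`, `first_ref`, and `second_ref` are hypothetical, and the intensity values are illustrative because the source does not state the reference intensity values.

```python
from typing import NamedTuple


class SpotReading(NamedTuple):
    """Fluorescence readings for one target protein spot (hypothetical layout)."""
    d11: float  # first fluorescence intensity value (reference antibody channel)
    d12: float  # second fluorescence intensity value (autoantibody channel)


def nc_spot_is_valid(reading: SpotReading, first_ref: float, second_ref: float) -> bool:
    """True when a negative-control spot shows the ideal pattern:
    D11 at or above the first reference value and D12 below the second."""
    return reading.d11 >= first_ref and reading.d12 < second_ref


# Illustrative values only; real reference intensities are not given in the source.
nc_readings = {
    "A1BG": SpotReading(d11=5200.0, d12=12.0),
    "A1CF": SpotReading(d11=4800.0, d12=30.0),
    "A2M": SpotReading(d11=150.0, d12=9.0),  # weak reference signal, so this spot fails
}
failed = [name for name, r in nc_readings.items()
          if not nc_spot_is_valid(r, first_ref=1000.0, second_ref=100.0)]
print(failed)
```

A check of this kind would flag spots whose negative control deviates from the ideal before any specimen data is interpreted.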
  • the reference sample SPR and the specimen sample SPS are brought into contact with the first number of target proteins TP fixed to the spots 42 on the substrate 41 of the protein microarray 4.
  • the reference antibody ABR in the reference sample SPR binds to all of the first number of target proteins TP by an antigen-antibody reaction (primary reaction).
  • the autoantibody ABA in the specimen sample SPS binds to the target proteins TP that are correlated with the disease-associated antigen AG associated with the disease corresponding to the autoantibody ABA by an antigen-antibody reaction (primary reaction) among the first number of target proteins TP, but does not bind to the target proteins TP that are not correlated with the disease-associated antigen AG.
  • When the first secondary antibody AB1 and the second secondary antibody AB2 are then brought into contact with the first number of target proteins TP, with the reference antibody ABR bound to all of the first number of target proteins TP and the autoantibody ABA bound to the target proteins TP that are correlated with the disease-associated antigen AG, the first secondary antibody AB1 binds to the reference antibody ABR and the second secondary antibody AB2 binds to the autoantibody ABA in the secondary reaction.
  • As a result, the first secondary antibody AB1 labeled with the first fluorescent substance is bound to each of the first number of target proteins TP via the reference antibody ABR, and the second secondary antibody AB2 labeled with the second fluorescent substance is bound to the target proteins TP that are correlated with the disease-associated antigen AG via the autoantibody ABA.
  • When the reference sample SPR and the specimen sample SPS are brought into contact with the first number of target proteins TP, the measuring device 2 outputs, as data indicating the measurement results, measurement data D1 regarding the specimen sample SPS for each of the first number of target proteins TP in association with diagnostic data D2 (FIG. 1) indicating the diagnostic results regarding the disease of the subject.
  • the measurement data D1 regarding the specimen sample SPS includes a first fluorescence intensity value D11 indicating the intensity of fluorescence emitted from a first fluorescent substance of the first secondary antibody AB1 capable of binding to the reference antibody ABR, and a second fluorescence intensity value D12 indicating the intensity of fluorescence emitted from a second fluorescent substance of the second secondary antibody AB2 capable of binding to the autoantibody ABA in the specimen sample SPS.
  • For a specimen sample SPS derived from a subject with a disease, the first secondary antibody AB1 labeled with the first fluorescent substance is bound to each of the first number of target proteins TP via the reference antibody ABR, and the second secondary antibody AB2 labeled with the second fluorescent substance is bound to the target protein TP correlated with the disease-related antigen AG via the autoantibody ABA.
  • Therefore, ideally, for the target protein TP correlated with the disease-related antigen AG, the first fluorescence intensity value D11 indicates a value equal to or greater than the first reference intensity value and the second fluorescence intensity value D12 indicates a value equal to or greater than the second reference intensity value. For a target protein TP not correlated with the disease-related antigen AG, the first fluorescence intensity value D11 indicates a value equal to or greater than the first reference intensity value, and the second fluorescence intensity value D12 indicates a value close to zero that is less than the second reference intensity value.
  • the specimen sample SPS derived from a subject without a disease does not contain the autoantibody ABA produced against the disease-related antigen AG. Therefore, in the measurement data D1 relating to the specimen sample SPS derived from a subject without a disease, ideally, for each of the first number of target proteins TP, the first fluorescence intensity value D11 indicates a value equal to or greater than the first reference intensity value, and the second fluorescence intensity value D12 indicates a value close to zero that is less than the second reference intensity value.
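The ideal signal patterns for specimen samples can be summarised as a three-way classification of each spot. This is only a sketch under stated assumptions: the function name `classify_spot`, the category labels, and the reference values are hypothetical, and the "invalid" category (reference signal too weak) is an added convenience that the source does not describe.

```python
def classify_spot(d11: float, d12: float, first_ref: float, second_ref: float) -> str:
    """Classify one target protein spot from its two fluorescence intensities.

    "reactive":     D11 >= first_ref and D12 >= second_ref (autoantibody bound)
    "non_reactive": D11 >= first_ref and D12 <  second_ref (no autoantibody bound)
    "invalid":      D11 <  first_ref (assumption: reference signal too weak to trust)
    """
    if d11 < first_ref:
        return "invalid"
    return "reactive" if d12 >= second_ref else "non_reactive"


# Illustrative (D11, D12) readings for one specimen sample; all values are made up.
sample = {"TP_a": (5000.0, 2300.0), "TP_b": (5100.0, 15.0), "TP_c": (200.0, 900.0)}
result = {tp: classify_spot(d11, d12, first_ref=1000.0, second_ref=100.0)
          for tp, (d11, d12) in sample.items()}
print(result)
```

Under these assumptions, a sample from a subject without a disease would ideally yield "non_reactive" for every spot.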
  • the measurement data D1 includes, for each of the first number of target proteins TP, a first fluorescence intensity value D11 and a second fluorescence intensity value D12 for NC, and a first fluorescence intensity value D11 and a second fluorescence intensity value D12 associated with diagnostic data D2 for a plurality of specimen samples SPS derived from a plurality of subjects.
  • the measurement data D1 for the plurality of specimen samples SPS includes, as a data group of the first fluorescence intensity value D11 and the second fluorescence intensity value D12, a data group associated with the first diagnostic data D21 indicating a diagnosis result of no disease, and a data group associated with the second to sixth diagnostic data D22 to D26 indicating a diagnosis result of, for example, a disease related to a neuromuscular disease.
  • the second diagnostic data D22 is diagnostic data indicating a diagnosis result of having attention deficit hyperactivity disorder.
  • the third diagnostic data D23 is diagnostic data indicating a diagnosis result of having Lewy body dementia.
  • the fourth diagnostic data D24 is diagnostic data indicating a diagnosis result of mild cognitive impairment.
  • the fifth diagnostic data D25 is diagnostic data indicating a diagnosis result of major depressive disorder.
  • the sixth diagnostic data D26 is diagnostic data indicating a diagnosis result of Parkinson's disease.
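One way to picture how measurement data D1 is tied to diagnostic data D21 to D26 is a small record type. The sketch below is an assumption about layout, not the format used by the measuring device: the class names `Diagnosis` and `SampleMeasurement`, the field names, and all values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Diagnosis(Enum):
    """Diagnostic data labels D21 to D26 as described in the text."""
    NO_DISEASE = "D21"
    ADHD = "D22"
    LEWY_BODY_DEMENTIA = "D23"
    MILD_COGNITIVE_IMPAIRMENT = "D24"
    MAJOR_DEPRESSIVE_DISORDER = "D25"
    PARKINSONS_DISEASE = "D26"


@dataclass(frozen=True)
class SampleMeasurement:
    """Measurement data D1 for one specimen sample (hypothetical layout)."""
    sample_id: str
    diagnosis: Diagnosis
    d11: Dict[str, float]  # target protein name -> first fluorescence intensity
    d12: Dict[str, float]  # target protein name -> second fluorescence intensity

    @property
    def has_disease(self) -> bool:
        return self.diagnosis is not Diagnosis.NO_DISEASE


# Illustrative record; the identifiers and intensities are made up.
record = SampleMeasurement(
    sample_id="subject-001",
    diagnosis=Diagnosis.PARKINSONS_DISEASE,
    d11={"A1BG": 5100.0, "A2M": 4900.0},
    d12={"A1BG": 20.0, "A2M": 2400.0},
)
print(record.diagnosis.value, record.has_disease)
```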
  • the measurement data D1 output from the measurement device 2 is input to the antigen search device 3.
  • the antigen search device 3 will be described with reference to FIG. 1 as well as FIG. 4 and FIG. 5.
  • the antigen search device 3 is a computer equipped with a CPU (Central Processing Unit), a storage area such as a HDD (Hard Disk Drive) or a flash memory, and a RAM (Random Access Memory) used as a working area for the CPU.
  • the antigen search device 3 searches for a target protein TP that has a correlation with each disease-related antigen AG associated with each of a plurality of diseases (e.g., diseases related to neuromuscular diseases) from among a first number of target proteins TP fixed on the substrate 41 of the protein microarray 4.
  • the antigen search device 3 executes an antigen search method for searching for a target protein TP that has a correlation with each disease-related antigen AG associated with each of a plurality of diseases by machine learning using a neural network.
  • a neural network is designed to mimic the structure of the human brain, and is made up of multiple layers of logic circuits that mimic the functions of neurons (nerve cells) in the human brain.
  • the antigen search device 3 has, as its functional configuration, a data acquisition unit 31 that performs the data acquisition step S1 in the antigen search method, a data calculation unit 32 that performs the data calculation step S2 in the antigen search method, and an antigen selection unit 33 that performs the antigen selection step S3 in the antigen search method.
  • the data acquisition unit 31 performs the data acquisition step S1 by acquiring the measurement data D1 output from the measurement device 2.
  • the measurement data D1 includes, for each of the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4, a first fluorescence intensity value D11 and a second fluorescence intensity value D12 for NC, and a first fluorescence intensity value D11 and a second fluorescence intensity value D12 associated with diagnostic data D2 for multiple specimen samples SPS derived from multiple subjects.
  • the data acquisition unit 31 acquires the measurement data D1 (step S11) and outputs the acquired measurement data D1 to the data calculation unit 32 (step S12).
  • the data calculation unit 32 performs the data calculation step S2 by calculating antibody data D3 for each of the first number of target proteins TP, which relates to the binding associated with the antigen-antibody reaction between the autoantibody ABA in the multiple specimen samples SPS and the first number of target proteins TP, in association with the diagnostic data D2 for each of the multiple subjects, based on the measurement data D1.
  • the data calculation unit 32 calculates the antibody data D3 in association with the diagnostic data D2 based on the measurement data D1 (step S21), and outputs the calculated antibody data D3 to the antigen selection unit 33 (step S22).
  • the antibody data D3 for each of the first number of target proteins TP shows different values according to the difference in binding affinity of the autoantibody ABA to each target protein TP, and shows a larger value as the binding affinity of the autoantibody ABA increases.
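The source does not give the formula used to derive antibody data D3 from D11 and D12. Purely as an illustration, the sketch below assumes one plausible normalisation: the autoantibody signal is divided by the reference signal of the same spot, and the same ratio from the negative control is subtracted. The function name `antibody_value` and this formula are assumptions; the only property taken from the text is that the value grows with binding affinity.

```python
def antibody_value(d11: float, d12: float, nc_d11: float, nc_d12: float) -> float:
    """Assumed antibody value for one spot: D12/D11 of the specimen sample
    minus D12/D11 of the negative control, floored at zero.

    This formula is an illustrative assumption, not the patent's calculation.
    """
    if d11 <= 0 or nc_d11 <= 0:
        raise ValueError("reference-channel intensity must be positive")
    return max(0.0, d12 / d11 - nc_d12 / nc_d11)


# Illustrative intensities: one spot with strong autoantibody binding, one without.
strong = antibody_value(d11=5000.0, d12=2500.0, nc_d11=5000.0, nc_d12=50.0)
weak = antibody_value(d11=5000.0, d12=40.0, nc_d11=5000.0, nc_d12=50.0)
print(strong, weak)
```

Dividing by D11 is one way to compensate for spot-to-spot differences in the amount of immobilised protein, which may be why a reference channel is measured at all; the source does not confirm this.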
  • the antibody data D3 for each of the first number of target proteins TP includes a data group of first antibody data D31 associated with first diagnostic data D21 indicating a diagnosis result of no disease, and a data group of second antibody data D32 associated with second to sixth diagnostic data D22 to D26 indicating a diagnosis result of disease.
  • the first antibody data D31 associated with the first diagnostic data D21 is calculated based on the first fluorescence intensity value D11 and the second fluorescence intensity value D12 associated with the first diagnostic data D21 in the measurement data D1.
  • the first antibody data D31 indicates a value smaller than the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 for each of the first number of target proteins TP.
  • the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 is calculated based on the first fluorescence intensity value D11 and the second fluorescence intensity value D12 associated with the second to sixth diagnostic data D22 to D26 in the measurement data D1.
  • the value for the target protein TP that is correlated with each disease-related antigen AG associated with each disease indicated in the second to sixth diagnostic data D22 to D26 is greater than the values for the other target proteins TP.
  • the number of data groups in the first antibody data D31 associated with the first diagnostic data D21 matches the number of specimen samples SPS derived from the subjects of the diagnostic results shown in the first diagnostic data D21.
  • the number of data groups in the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 matches the number of specimen samples SPS derived from each subject of the diagnostic results shown in the second to sixth diagnostic data D22 to D26. For example, assume that the number of specimen samples SPS derived from each subject of the diagnostic results shown in the first to sixth diagnostic data D21 to D26 is 10.
  • In that case, the number of data groups of the first antibody data D31 associated with the first diagnostic data D21 is 10, and the number of data groups of the second antibody data D32 associated with each of the second to sixth diagnostic data D22 to D26 is also 10.
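The correspondence between specimen samples and data groups in this example can be checked by grouping samples by diagnostic label. This is a small sketch using hypothetical sample identifiers; only the labels D21 to D26 and the count of 10 per label come from the text.

```python
from collections import defaultdict

labels = ["D21", "D22", "D23", "D24", "D25", "D26"]

# Hypothetical sample identifiers: 10 specimen samples per diagnostic label.
samples = [(f"{label}-{i:02d}", label) for label in labels for i in range(10)]

groups = defaultdict(list)
for sample_id, label in samples:
    groups[label].append(sample_id)

# One data group of antibody data per specimen sample, so counts match sample counts.
counts = {label: len(ids) for label, ids in groups.items()}
print(counts)
```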
  • the antigen selection unit 33 performs the antigen selection step S3 by selecting a specific target protein TPS that has a correlation with the disease-related antigen AG from among the first number of target proteins TP based on the antibody data D3 corresponding to each of the multiple diseases indicated in the diagnostic data D2. Specifically, the antigen selection unit 33 uses the antibody data D3 for each of the first number of target proteins TP associated with the diagnostic data D2 as input data, selects a specific target protein TPS that has a correlation with the disease-related antigen AG from among the first number of target proteins TP based on the antibody data D3 corresponding to each of the multiple diseases indicated in the diagnostic data D2, and generates a machine learning model LM that outputs the selected specific target protein TPS as output data (step S31).
  • the antigen selection unit 33 analyzes the relationship between the diagnostic data D2 and the first number of target proteins TP based on the antibody data D3 according to the machine learning model LM, and selects a specific target protein TPS from among the first number of target proteins TP corresponding to each of the multiple diseases indicated in the diagnostic data D2 (step S33). The antigen selection unit 33 then outputs the specific target protein TPS selected for each of the multiple diseases indicated in the diagnostic data D2 as a target protein TP that has a correlation with each disease-related antigen AG related to each disease (step S34).
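The source states that a neural network is used but does not describe its architecture or how proteins are ranked. As a stand-in only, the sketch below trains a single-layer softmax classifier (a much simpler model than the one in the text) on antibody data and, for each disease class, picks the protein with the largest learned weight. The function names, the weight-based ranking, and the idealised data are all assumptions.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Batch gradient descent on cross-entropy loss; returns weights W[class][feature]."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x):
                    grad[c][j] += err * v
        for c in range(n_classes):
            for j in range(n_features):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def select_proteins(W, protein_names, disease_classes, top_k=1):
    """For each disease class, return the top_k proteins by learned weight."""
    selected = {}
    for c in disease_classes:
        ranked = sorted(range(len(protein_names)), key=lambda j: W[c][j], reverse=True)
        selected[c] = [protein_names[j] for j in ranked[:top_k]]
    return selected


# Idealised antibody data: class 0 = no disease, 1 and 2 = two diseases.
proteins = ["TP1", "TP2", "TP3", "TP4"]
X = [[0.0, 0.0, 0.0, 0.0]] * 3 \
    + [[0.0, 0.9, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 1.1, 0.0, 0.0]] \
    + [[0.0, 0.0, 0.0, 0.9], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.1]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = train_softmax(X, y, n_classes=3)
selected = select_proteins(W, proteins, disease_classes=[1, 2])
print(selected)
```

Training one model over all disease labels at once reflects the idea in the text of handling multiple diseases together rather than searching for each disease separately.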
  • antibody data D3 relating to the binding associated with the antigen-antibody reaction between a first number of target proteins TP fixed on the substrate 41 of the protein microarray 4 and autoantibodies ABA in multiple specimen samples SPS is input as input data to the machine learning model LM, and a specific target protein TPS that is correlated with a disease-associated antigen AG selected from the first number of target proteins TP based on the antibody data D3 corresponding to each of the multiple diseases indicated in the diagnostic data D2 can be output as output data.
  • This makes it possible to efficiently search for a specific target protein TPS that is correlated with each disease-associated antigen AG associated with each of the multiple diseases, compared to searching for target proteins TP individually corresponding to each of the multiple diseases.
  • the antibody data D3 for each of the first number of target proteins TP calculated by the data calculation unit 32 includes a data group of the first antibody data D31 associated with the first diagnostic data D21 indicating a diagnosis result of not having a disease, and a data group of the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 indicating a diagnosis result of having a disease.
  • the antigen selection unit 33 may perform a process of step S32 of extracting a second number of multiple target proteins TP, which is smaller than the first number, from the first number of target proteins TP before the process of step S33 of selecting a specific target protein TPS.
  • the antigen selection unit 33 extracts a second number of target proteins TP from the first number of target proteins TP that satisfy the condition that the first antibody data D31 is equal to or smaller than a first threshold value and the second antibody data D32 is equal to or larger than a second threshold value that is larger than the first threshold value.
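The threshold condition of step S32 can be sketched as a filter over per-protein antibody data. The text does not say how the condition applies across several samples, so this sketch assumes that every D31 value must be at or below the first threshold and that at least one D32 value must reach the second threshold. The function name, thresholds, and values are hypothetical.

```python
def extract_candidates(d31, d32, first_threshold, second_threshold):
    """Return target proteins whose D31 values all stay <= first_threshold
    and whose D32 values reach >= second_threshold at least once.

    d31, d32: dict mapping protein name -> list of antibody values
    (no-disease samples and disease samples, respectively).
    """
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be larger than the first")
    return [tp for tp in d31
            if max(d31[tp]) <= first_threshold and max(d32[tp]) >= second_threshold]


# Illustrative antibody data for three target proteins.
d31 = {"TP1": [0.05, 0.02, 0.08], "TP2": [0.40, 0.05, 0.03], "TP3": [0.01, 0.02, 0.03]}
d32 = {"TP1": [0.90, 0.10, 0.70], "TP2": [0.90, 0.90, 0.90], "TP3": [0.10, 0.20, 0.15]}

candidates = extract_candidates(d31, d32, first_threshold=0.1, second_threshold=0.5)
print(candidates)
```

Here TP2 is dropped because it also reacts in a no-disease sample, and TP3 because it never reacts strongly in disease samples.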
  • the antigen selection unit 33 may extract a second number of target proteins TP from the first number of target proteins TP that satisfy the condition that the number of the first antibody data D31 equal to or greater than the first threshold is equal to or less than the first determination number, and the number of the second antibody data D32 equal to or greater than the second threshold is equal to or greater than the second determination number.
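The count-based variant tolerates a limited number of outliers instead of requiring every value to satisfy the thresholds. The sketch below follows the condition as worded in the text; the function name, the parameter names `first_count` and `second_count` (standing for the first and second determination numbers), and the values are hypothetical.

```python
def extract_by_counts(d31, d32, first_threshold, second_threshold,
                      first_count, second_count):
    """Keep a protein when at most first_count D31 values are >= first_threshold
    and at least second_count D32 values are >= second_threshold."""
    selected = []
    for tp in d31:
        n_healthy_high = sum(v >= first_threshold for v in d31[tp])
        n_disease_high = sum(v >= second_threshold for v in d32[tp])
        if n_healthy_high <= first_count and n_disease_high >= second_count:
            selected.append(tp)
    return selected


# Illustrative antibody data for three target proteins.
d31 = {"TP1": [0.05, 0.02, 0.30], "TP2": [0.40, 0.50, 0.60], "TP3": [0.01, 0.02, 0.03]}
d32 = {"TP1": [0.90, 0.80, 0.10], "TP2": [0.90, 0.90, 0.90], "TP3": [0.10, 0.60, 0.20]}

kept = extract_by_counts(d31, d32, first_threshold=0.2, second_threshold=0.5,
                         first_count=1, second_count=2)
print(kept)
```

In this example TP1 survives despite one elevated no-disease value, which the stricter all-values reading of the threshold condition would have rejected.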
  • the antigen selection unit 33 analyzes the relationship between the diagnostic data D2 and the second number of target proteins TP based on the antibody data D3 according to the machine learning model LM, and selects a specific target protein TPS from the second number of target proteins TP corresponding to each of the multiple diseases indicated in the diagnostic data D2 (step S33).
  • the specific target protein TPS is selected from the second number of target proteins TP that is less than the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4. This makes it possible to shorten the time required to select a specific target protein TPS using the machine learning model LM, and to more accurately search for a specific target protein TPS corresponding to each of the multiple diseases.
  • the antigen selection unit 33 may obtain the similarity between the antibody data D3 for each of the second number of target proteins TP, and classify the second number of target proteins TP into a plurality of groups corresponding to each of the plurality of diseases indicated in the diagnostic data D2 by hierarchical clustering HC based on the similarity. Then, the antigen selection unit 33 selects the target proteins TP belonging to each of the plurality of groups as the specific target protein TPS.
  • the antigen selection unit 33 classifies the second number of target proteins TP into a plurality of groups by hierarchical clustering HC based on the similarity between the antibody data D3 for each of the second number of target proteins TP. Then, the antigen selection unit 33 can select the target proteins TP belonging to each of the plurality of groups as the specific target protein TPS having a correlation with each disease-related antigen AG related to each of the plurality of diseases.
  • the antigen selection unit 33 starts by assigning the antibody data D3 for each of the second number of target proteins TP to one cluster in the hierarchical clustering HC, and then recursively combines similar clusters of antibody data D3.
  • the hierarchical clustering HC includes methods such as the shortest distance method, the longest distance method, and the group average method.
  • the similarity between clusters is defined as the similarity between the antibody data D3 across the clusters.
  • the antigen selection unit 33 combines the clusters into one cluster in order starting from the pair with the greatest similarity between them, and stops the combination when the similarity between all clusters falls below a predetermined similarity reference value.
  • the second number of target proteins TP are classified into a plurality of groups corresponding to each of a plurality of diseases.
  • the antigen selection unit 33 sets a maximum limit number for the number of target proteins TP belonging to each group, and by considering the similarity to be zero when the total number of antibody data D3 contained in the two combined clusters is greater than the maximum limit number, it is possible to prevent the number of target proteins TP belonging to each group from exceeding the maximum limit number.
  • the antigen selection unit 33 may also use a genetic algorithm GA when generating the machine learning model LM in step S31 (step S35).
  • the antigen selection unit 33 generates the machine learning model LM using the genetic algorithm GA with the goal that the diseases indicated in the diagnostic data D2 corresponding to the antibody data D3 of each target protein TP belonging to each of the multiple groups are the same.
  • the antigen selection unit 33 uses the antibody data D3 for each of the second number of target proteins TP associated with the diagnostic data D2 as input data, and generates the machine learning model LM using the genetic algorithm GA with a specific target protein TPS belonging to each group classified by the hierarchical clustering HC as output data.
  • the antigen selection unit 33 can generate the machine learning model LM with the goal that the diseases indicated in the diagnostic data D2 corresponding to the antibody data D3 of each target protein TP belonging to each group classified by the hierarchical clustering HC are the same.
  • the antigen selection unit 33 evaluates, for example, the identity of each disease corresponding to each target protein TP belonging to each group classified by the hierarchical clustering HC, and gives a first reward if the diseases are the same, and gives a second reward lower than the first reward if the diseases are not the same.
  • the antigen selection unit 33 can generate a machine learning model LM in which the diseases corresponding to each target protein TP belonging to each group classified by the hierarchical clustering HC are the same.
  • the second number of target proteins TP can be classified by the hierarchical clustering HC into multiple groups corresponding to each of the multiple diseases indicated in the diagnostic data D2.
  • the measurement data D1 obtained when using a protein microarray 4 may have low reproducibility. For this reason, when using multiple protein microarrays 4 corresponding to multiple diseases in the antigen searching device 3 to search for a specific target protein TPS that correlates with a disease-associated antigen AG associated with each disease, it is necessary to take into account the measurement error between the measurement data D1 corresponding to each of the multiple diseases.
  • the data calculation unit 32 performs a predetermined standardization process on the antibody data D3 input as input data to the machine learning model LM to absorb the measurement error between the measurement data D1 corresponding to each of the multiple diseases indicated in the diagnostic data D2, thereby calculating the standardized antibody data D3 for each of the first number of target proteins TP.
  • the data calculation unit 32 performs a standardization process on the antibody data D3 for each of the multiple diseases indicated in the diagnostic data D2 using the value at one or more percentiles, thereby calculating the standardized antibody data D3 for each of the first number of target proteins TP.
  • the data calculation unit 32 can calculate the standardized antibody data D3 that appropriately absorbs the measurement error between the measurement data D1 corresponding to each of the multiple diseases by this percentile-based standardization process.
  • the antigen selection unit 33 can input the standardized antibody data D3 as input data to the machine learning model LM, thereby outputting the accurate specific target protein TPS as output data corresponding to each of the multiple diseases indicated in the diagnostic data D2.
  • In the protein microarray 4, there may be variation in the amount of the first number of target proteins TP fixed on the substrate 41. For this reason, when the antigen searching device 3 uses the protein microarray 4 for each of a plurality of diseases to search for specific target proteins TPS that are correlated with the disease-associated antigens AG related to each of those diseases, it is necessary to take into account measurement errors in the measurement data D1 for each disease that are caused by differences in the amount of target proteins TP fixed on the substrate 41.
  • the data acquiring unit 31 acquires, as the measurement data D1 for each of the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4, a first fluorescence intensity value D11 relating to the intensity of fluorescence emitted from the first fluorescent substance of the first secondary antibody AB1 bound to a predetermined reference antibody ABR bound to all of the first number of target proteins TP, and a second fluorescence intensity value D12 relating to the intensity of fluorescence emitted from the second fluorescent substance of the second secondary antibody AB2 bound to the autoantibody ABA bound to some of the target proteins TP among the first number of target proteins TP.
  • the data calculating unit 32 performs a standardization process using the first fluorescence intensity value D11 and the second fluorescence intensity value D12 for the antibody data D3 input as input data to the machine learning model LM, thereby calculating the standardized antibody data D3 for each of the first number of target proteins TP.
  • This allows the data calculation unit 32 to calculate standardized antibody data D3 for the measurement data D1 corresponding to each of the multiple diseases, which appropriately absorbs measurement errors caused by differences in the amount of the target protein TP fixed on the substrate 41.
  • the antigen selection unit 33 can input the standardized antibody data D3 as input data to the machine learning model LM, thereby outputting an accurate specific target protein TPS as output data corresponding to each of the multiple diseases indicated in the diagnosis data D2.
  • the data calculation unit 32 calculates the standardized antibody data D3 based on the measurement data D1, according to the first to fifth standardization process steps shown below.
  • the data calculation unit 32 divides, for each of the first number of target proteins TP, the first fluorescence intensity value D11 of the NC and the first fluorescence intensity values D11 associated with each of the first to sixth diagnostic data D21 to D26 by the average value of each of the first fluorescence intensity values D11. As a result, the data calculation unit 32 obtains a first standardized index value for each of the first number of target proteins TP in association with the NC and each of the first to sixth diagnostic data D21 to D26.
  • the data calculation unit 32 divides the second fluorescence intensity value D12 of the NC and the second fluorescence intensity value D12 associated with each of the first to sixth diagnostic data D21 to D26 by each of the first standardized index values for each of the first number of target proteins TP. As a result, the data calculation unit 32 obtains the second standardized index value for each of the first number of target proteins TP in association with the NC and each of the first to sixth diagnostic data D21 to D26.
  • the data calculation unit 32 subtracts the second standardized index value associated with NC from each of the second standardized index values associated with the first to sixth diagnostic data D21 to D26 for each of the first number of target proteins TP. In this way, the data calculation unit 32 obtains a third standardized index value for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
  • the data calculation unit 32 subtracts the value at the 25th percentile from each of the third standardized index values for the first number of target proteins TP for each of the first to sixth diagnostic data D21 to D26, and divides the subtracted value by the standard deviation. In this way, the data calculation unit 32 obtains a fourth standardized index value for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
  • the data calculation unit 32 divides each of the fourth standardized index values for the first number of target proteins TP by the value at the 75th percentile for each of the first to sixth diagnostic data D21 to D26, and multiplies the divided value by "10". In this way, the data calculation unit 32 obtains standardized antibody data D3 for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
  • the data calculation unit 32 can calculate standardized antibody data D3 that appropriately absorbs the measurement error of the measurement data D1 caused by differences in the amount of the target protein TP fixed on the substrate 41 of the protein microarray 4 and the measurement error between the measurement data D1 corresponding to each of the first to sixth diagnostic data D21 to D26.
  • the antigen selection unit 33 can input the standardized antibody data D3 as input data to the machine learning model LM, and output an accurate specific target protein TPS as output data corresponding to each of the multiple diseases indicated in the diagnostic data D2.
  • the antigen selection unit 33 may be configured to extract various proteins associated with each of a plurality of diseases based on information registered in existing databases such as OMIM, GeneCards, and AAgAtlas. In this case, the antigen selection unit 33 can narrow down the specific target proteins TPS output from the machine learning model LM to those that also appear among the proteins extracted from the databases, and treat these as the proteins that are correlated with the disease-related antigen AG associated with each of the plurality of diseases.
  • An antigen discovery method is a method for discovering target proteins that are correlated with disease-related antigens associated with each of a plurality of diseases, using a protein microarray in which a first number of target proteins are fixed on a substrate.
  • This antigen discovery method includes a data acquisition step of acquiring measurement data for each of the first number of target proteins, which is related to a predetermined feature amount when each of a plurality of specimen samples derived from a plurality of subjects is contacted with the first number of target proteins; a data calculation step of calculating, based on the measurement data, antibody data for each of the first number of target proteins, which is related to binding associated with an antigen-antibody reaction between an autoantibody in the plurality of specimen samples and the first number of target proteins, in association with diagnostic data showing diagnostic results for a plurality of diseases for each of the plurality of subjects; and an antigen selection step of generating a machine learning model using the antibody data for each of the first number of target proteins associated with the diagnostic data as input data, selecting a specific target protein that is correlated with the disease-related antigen from among the first number of target proteins based on the antibody data, corresponding to each of a plurality of diseases shown in the diagnostic data, according to the machine learning model, and outputting the selected specific target protein as output data.
  • antibody data relating to the binding associated with the antigen-antibody reaction between a first number of target proteins fixed on a protein microarray substrate and autoantibodies in multiple test samples is input as input data into a machine learning model, and specific target proteins that are correlated with disease-associated antigens selected from the first number of target proteins based on the antibody data for each of the multiple diseases indicated in the diagnostic data can be output as output data.
  • This makes it possible to efficiently search for specific target proteins that are correlated with each disease-associated antigen associated with each of the multiple diseases, compared to searching for target proteins individually for each of the multiple diseases.
  • the antibody data may include a data group of first antibody data associated with the diagnostic data indicating a diagnosis result of not having a disease, and a data group of second antibody data associated with the diagnostic data indicating a diagnosis result of having a disease.
  • in the antigen selection step, a second number of target proteins, smaller than the first number, may be extracted from the first number of target proteins, the extracted target proteins being those that satisfy the condition that the first antibody data is equal to or less than a first threshold value and the second antibody data is equal to or greater than a second threshold value that is greater than the first threshold value; the specific target protein is then selected from the second number of target proteins.
  • when selecting a specific target protein that is correlated with a disease-related antigen for each of a plurality of diseases indicated in the diagnostic data according to the machine learning model, the specific target protein is selected from a second number of target proteins that is less than the first number of target proteins immobilized on the substrate of the protein microarray. This makes it possible to shorten the time required to select a specific target protein using the machine learning model, and to more accurately search for a specific target protein corresponding to each of a plurality of diseases.
  • the antigen selection step may involve determining a similarity between the antibody data for each of the second number of target proteins, and classifying the second number of target proteins into a plurality of groups corresponding to each of a plurality of diseases indicated in the diagnostic data by hierarchical clustering based on the similarity, thereby selecting target proteins belonging to each of the plurality of groups as the specific target proteins.
  • the second number of target proteins are classified into a plurality of groups by hierarchical clustering based on the similarity between the antibody data for each of the second number of target proteins. Then, the target proteins belonging to each of the plurality of groups can be selected as specific target proteins that are correlated with each disease-associated antigen associated with each of the plurality of diseases.
  • In hierarchical clustering, the antibody data for each of the second number of target proteins is first assigned to one cluster, and clusters with similar antibody data are then recursively combined.
  • Hierarchical clustering includes methods such as the shortest distance method, the longest distance method, and the group average method, depending on the criteria for selecting clusters to be combined.
  • the similarity between clusters is defined as the similarity between antibody data across clusters.
  • Clusters are combined in order from the pair with the greatest similarity to form one cluster, and the combination is stopped when the similarity between all clusters falls below a predetermined similarity reference value.
  • the second number of target proteins are classified into multiple groups corresponding to each of multiple diseases.
  • a maximum limit is set for the number of target proteins belonging to each group, and if the total number of antibody data included in the two clusters to be combined is greater than the maximum limit, the similarity is considered to be zero, making it possible to prevent the number of target proteins belonging to each group from exceeding the maximum limit.
  • the machine learning model may be generated using a genetic algorithm with the goal that the diseases indicated in the diagnostic data corresponding to the antibody data for each of the target proteins belonging to each of the multiple groups are identical.
  • a machine learning model is generated using a genetic algorithm, in which antibody data for each of a second number of target proteins associated with diagnostic data is used as input data, and specific target proteins belonging to each group classified by hierarchical clustering are used as output data.
  • By using a genetic algorithm, a machine learning model can be generated with the goal that the diseases indicated by the diagnostic data corresponding to the antibody data for each target protein belonging to each group classified by hierarchical clustering are the same.
  • the identity of each disease corresponding to each target protein belonging to each group classified by hierarchical clustering is evaluated, and if the diseases are the same, a first reward is given, and if the diseases are not the same, a second reward lower than the first reward is given.
  • By repeatedly running such a genetic algorithm, it is possible to generate a machine learning model in which the diseases corresponding to each target protein belonging to each group classified by hierarchical clustering are the same. In this way, when antibody data for each of the second number of target proteins is input as input data to the machine learning model generated using the genetic algorithm, the second number of target proteins can be classified by hierarchical clustering into multiple groups corresponding to each of the multiple diseases indicated in the diagnostic data.
  • the data calculation step may involve performing a predetermined standardization process to absorb measurement errors between the measurement data corresponding to each of the multiple diseases indicated in the diagnostic data, thereby calculating standardized antibody data for each of the first number of target proteins.
  • the measurement data may have low reproducibility. For this reason, when searching for specific target proteins that are correlated with disease-related antigens related to each of a plurality of diseases using multiple protein microarrays corresponding to each of a plurality of diseases, it is necessary to take into account the measurement error between the measurement data corresponding to each of the plurality of diseases. Therefore, for the antibody data input as input data to the machine learning model, a predetermined standardization process is performed to absorb the measurement error between the measurement data corresponding to each of the plurality of diseases, and standardized antibody data is calculated for each of the first number of target proteins. In this case, by inputting the standardized antibody data as input data to the machine learning model, it is possible to output accurate specific target proteins as output data corresponding to each of the plurality of diseases indicated in the diagnostic data.
  • the data calculation step may perform the standardization process on the antibody data for each of the multiple diseases indicated in the diagnostic data using the value at one or more percentiles, thereby calculating standardized antibody data for each of the first number of target proteins.
  • standardization processing is performed on the antibody data for each of the multiple diseases indicated in the diagnostic data, using the value at one or more percentiles.
  • This percentile-based standardization processing makes it possible to calculate standardized antibody data that appropriately absorbs the measurement error between the measurement data corresponding to each of the multiple diseases.
  • the measurement data may include a first fluorescence intensity value, the feature value being the intensity of fluorescence emitted from a first fluorescent substance in a first secondary antibody that is capable of binding to a predetermined reference antibody that binds to any of the first number of target proteins and is labeled with a first fluorescent substance, and a second fluorescence intensity value, the feature value being the intensity of fluorescence emitted from a second fluorescent substance in a second secondary antibody that is capable of binding to autoantibodies in the multiple test samples and is labeled with a second fluorescent substance.
  • the standardization process is performed using the first fluorescence intensity value and the second fluorescence intensity value, thereby calculating the standardized antibody data for each of the first number of target proteins.
  • a first fluorescence intensity value related to the intensity of fluorescence emitted from a first fluorescent substance of a first secondary antibody bound to a predetermined reference antibody bound to all of the first number of target proteins, and a second fluorescence intensity value related to the intensity of fluorescence emitted from a second fluorescent substance of a second secondary antibody bound to an autoantibody in the test sample are obtained.
  • standardization processing is performed using the first fluorescence intensity value and the second fluorescence intensity value for the antibody data input as input data to the machine learning model, thereby calculating standardized antibody data for each of the first number of target proteins.
  • By inputting the standardized antibody data as input data to the machine learning model, it is possible to output accurate specific target proteins as output data corresponding to each of the multiple diseases indicated in the diagnostic data.
  • An antigen search system comprises a measurement device that uses a protein microarray having a first number of target proteins fixed on a substrate to measure a predetermined feature amount when each of a plurality of specimen samples derived from a plurality of subjects is contacted with the first number of target proteins, and outputs measurement data for each of the first number of target proteins indicating the measurement results, and an antigen search device that searches for target proteins that have a correlation with disease-associated antigens associated with each of a plurality of diseases.
  • the antigen search device includes a data acquisition unit that acquires the measurement data, a data calculation unit that calculates, based on the measurement data, antibody data for each of the first number of target proteins related to binding associated with an antigen-antibody reaction between the autoantibody in the plurality of specimen samples and the first number of target proteins, in association with diagnostic data showing diagnostic results for multiple diseases for each of the plurality of subjects, and an antigen selection unit that generates a machine learning model using the antibody data for each of the first number of target proteins associated with the diagnostic data as input data, selects a specific target protein that has a correlation with the disease-related antigen from among the first number of target proteins based on the antibody data in accordance with the machine learning model, corresponding to each of the multiple diseases shown in the diagnostic data, and outputs the selected specific target protein as output data.
  • the antigen selection unit of the antigen search device inputs antibody data relating to the binding associated with the antigen-antibody reaction between a first number of target proteins fixed on a protein microarray substrate and autoantibodies in multiple test samples as input data into a machine learning model, and is then able to output, as output data, specific target proteins that are correlated with disease-associated antigens selected from the first number of target proteins based on the antibody data, corresponding to each of the multiple diseases indicated in the diagnostic data.
  • This makes it possible to efficiently search for specific target proteins that are correlated with each disease-associated antigen associated with each of the multiple diseases, compared to searching for target proteins individually corresponding to each of the multiple diseases.
  • According to the present invention, it is possible to provide an antigen discovery method and an antigen discovery system that can efficiently discover target proteins that are correlated with disease-related antigens that are associated with each of a plurality of diseases, using a protein microarray in which a plurality of target proteins are fixed onto a substrate.
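The first to fifth standardization steps listed above can be sketched in a few lines of Python. This is only an illustrative reading under stated assumptions, not the applicants' implementation: the function names `percentile` and `standardize`, the dictionary layout (column name to per-protein values), the column labels, the use of the population standard deviation, and the interpretation of "the average value" in the first step as the mean over all target proteins within one column are all assumptions made for the example.

```python
from statistics import mean, pstdev


def percentile(values, q):
    """Linearly interpolated percentile (q in 0..100) of a list of numbers."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def standardize(d11, d12, nc="NC"):
    """Return standardized antibody data per diagnostic column.

    d11, d12: dicts mapping a column name (the NC column or one diagnostic
    data column) to a list with one fluorescence intensity per target protein.
    All names and the data layout are hypothetical.
    """
    idx2 = {}
    for col in d12:
        col_mean = mean(d11[col])
        # Step 1: first index = D11 divided by the column average (assumed).
        idx1 = [v / col_mean for v in d11[col]]
        # Step 2: second index = D12 divided by the first index.
        idx2[col] = [s / a for s, a in zip(d12[col], idx1)]

    result = {}
    for col, values in idx2.items():
        if col == nc:
            continue
        # Step 3: subtract the NC value of the same target protein.
        idx3 = [v - b for v, b in zip(values, idx2[nc])]
        # Step 4: subtract the 25th percentile, divide by the standard deviation.
        p25 = percentile(idx3, 25)
        sd = pstdev(idx3)  # population SD is an assumption
        idx4 = [(v - p25) / sd for v in idx3]
        # Step 5: divide by the 75th percentile and multiply by 10.
        p75 = percentile(idx4, 75)
        result[col] = [v / p75 * 10.0 for v in idx4]
    return result
```

With this reading, every diagnostic column ends up with its 25th percentile at 0 and its 75th percentile at 10, which is what makes columns measured on different arrays comparable. Degenerate columns (zero spread) would need separate handling.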

Abstract

This antigen discovery method comprises a data acquisition step for acquiring measurement data, a data calculation step, and an antigen selection step. In the data calculation step, antibody data is calculated on the basis of the measurement data, the antibody data relating to an antigen-antibody reaction between a first number of target proteins immobilized on a protein microarray and autoantibodies in a plurality of specimen samples. In the antigen selection step, the antibody data is input to a machine learning model as input data, and a specific target protein having a correlation with a disease-associated antigen, which is selected from the first number of target proteins, is output for each of a plurality of diseases.

Description

Antigen discovery method and antigen discovery system
The present invention relates to an antigen discovery method and an antigen discovery system that use a protein microarray to discover target proteins that are correlated with disease-related antigens associated with each of a number of diseases.
It is known that specimen samples derived from subjects with a disease can be analyzed using protein microarrays (see, for example, Patent Document 1). A protein microarray is a substrate such as a glass slide on which thousands to tens of thousands of target proteins are immobilized in an array of spots. By contacting the specimen sample with each target protein on the substrate, it is possible to obtain, as measurement data, feature quantities that serve as an index of the binding associated with the antigen-antibody reaction between the autoantibodies in the specimen sample and each target protein.
Autoantibodies are produced against disease-related antigens that are related to a disease. Therefore, by analyzing measurement data regarding the antigen-antibody reaction between each target protein on the protein microarray substrate and the autoantibody, it is possible to search for specific target proteins that have a correlation with disease-related antigens from among the target proteins.
However, there are many diseases, and there is room for improvement in efficiently searching, for each of those diseases, for the specific target proteins that correlate with the disease-associated antigens associated with that disease.
Patent Document 1: Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2021-521536
The object of the present invention is to provide an antigen discovery method and an antigen discovery system that can efficiently discover target proteins that are correlated with disease-related antigens that are related to each of a number of diseases, using a protein microarray in which a number of target proteins are fixed on a substrate.
An antigen discovery method according to one aspect of the present invention is a method for discovering target proteins that are correlated with disease-related antigens associated with each of a plurality of diseases, using a protein microarray in which a first number of target proteins are fixed on a substrate. This antigen discovery method includes a data acquisition step of acquiring measurement data for each of the first number of target proteins, which is related to a predetermined feature amount when each of a plurality of specimen samples derived from a plurality of subjects is contacted with the first number of target proteins; a data calculation step of calculating, based on the measurement data, antibody data for each of the first number of target proteins, which is related to binding associated with an antigen-antibody reaction between an autoantibody in the plurality of specimen samples and the first number of target proteins, in association with diagnostic data showing diagnostic results for a plurality of diseases for each of the plurality of subjects; and an antigen selection step of generating a machine learning model using the antibody data for each of the first number of target proteins associated with the diagnostic data as input data, selecting a specific target protein that is correlated with the disease-related antigen from among the first number of target proteins based on the antibody data, corresponding to each of a plurality of diseases shown in the diagnostic data, according to the machine learning model, and outputting the selected specific target protein as output data.
An antigen search system according to another aspect of the present invention comprises: a measurement device that, using a protein microarray in which a first number of target proteins are fixed on a substrate, measures a predetermined feature quantity obtained when each of a plurality of specimen samples derived from a plurality of subjects is brought into contact with the first number of target proteins, and outputs measurement data for each of the first number of target proteins indicating the measurement results; and an antigen search device that searches for target proteins that are correlated with disease-related antigens associated with each of a plurality of diseases. The antigen search device includes: a data acquisition unit that acquires the measurement data; a data calculation unit that calculates, based on the measurement data and in association with diagnostic data indicating diagnostic results for a plurality of diseases for each of the plurality of subjects, antibody data for each of the first number of target proteins, the antibody data relating to the binding associated with an antigen-antibody reaction between autoantibodies in the plurality of specimen samples and the first number of target proteins; and an antigen selection unit that generates a machine learning model that takes as input data the antibody data for each of the first number of target proteins associated with the diagnostic data, selects, in accordance with the machine learning model and for each of the plurality of diseases indicated by the diagnostic data, a specific target protein that is correlated with the disease-related antigen from among the first number of target proteins based on the antibody data, and outputs the selected specific target protein as output data.
The objects, features, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram of an antigen search system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the measurement process performed by the measurement device provided in the antigen search system.
FIG. 3 is a diagram showing an example of measurement data indicating the measurement results of the measurement device.
FIG. 4 is a flowchart of the antigen search method executed by the antigen search device provided in the antigen search system.
FIG. 5 is a diagram showing an example of antibody data calculated by the data calculation unit of the antigen search device.
An antigen search system according to an embodiment of the present invention, and an antigen search method executed by the antigen search device of the antigen search system, are described below with reference to the drawings.
[Overall configuration of the antigen search system]
As shown in FIG. 1, the antigen search system 1 includes a measurement device 2 and an antigen search device 3. In the antigen search system 1, the measurement device 2 acquires measurement data D1 relating to the diseases of a plurality of subjects using the protein microarray 4 shown in FIG. 2, and the antigen search device 3 searches for proteins that are correlated with the disease-related antigens associated with each of the plurality of diseases.
The subjects are primarily humans and other mammals. Other mammals include, for example, primates such as monkeys and chimpanzees; livestock such as cows, horses, pigs, and sheep; pets such as dogs and cats; and laboratory animals such as mice, rats, and rabbits. Of these, the subject is preferably a human.
Examples of diseases include neuromuscular diseases such as attention deficit hyperactivity disorder, dementia with Lewy bodies, mild cognitive impairment, major depressive disorder, and Parkinson's disease; malignant tumors such as brain tumors, gastric cancer, colon cancer, liver cancer, and lung cancer; digestive system diseases such as ulcerative colitis; respiratory system diseases such as idiopathic interstitial pneumonia; circulatory system diseases such as idiopathic dilated cardiomyopathy; blood system diseases such as idiopathic thrombocytopenic purpura; and sequelae of infectious diseases such as COVID-19. Of these, it is preferable to target a plurality of diseases classified as neuromuscular diseases.
[Measurement device]
The measurement device 2 acquires measurement data D1 relating to a reference sample SPR and a plurality of specimen samples SPS derived from a plurality of subjects, using the protein microarray 4 shown in FIG. 2.
The protein microarray 4 has a plurality of spots 42 arranged in a matrix on a substrate 41 such as a glass slide. A first number of target proteins TP, ranging from several thousand to several tens of thousands, are fixed at the spots 42 on the substrate 41. The target proteins TP are antigen proteins such as recombinant proteins expressed in a cell-free expression system such as wheat germ extract. Examples of the target proteins TP include standard proteins such as A1BG (alpha-1-B glycoprotein), A1CF (APOBEC1 complementation factor), and A2M (alpha-2-macroglobulin); mutant proteins derived from the p53 gene; and fusion proteins derived from ALK fusion genes.
Examples of the specimen sample SPS derived from a subject include whole blood, serum, plasma, urine, prostatic fluid, tears, mucus ascites, oral fluid, saliva, semen, seminal plasma, mucus, stool, sputum, cerebrospinal fluid, bone marrow, and lymph, any of which may be prepared with a diluent or the like. The specimen samples SPS include samples derived from subjects without a disease and samples derived from subjects with a disease. A specimen sample SPS derived from a subject with a disease contains autoantibodies ABA produced against the disease-related antigen AG (see FIG. 4, described later) associated with that disease. A specimen sample SPS derived from a subject without a disease, by contrast, does not contain the autoantibodies ABA.
The disease-related antigen AG is a protein against which autoantibodies ABA are produced in the body of a subject as a result of a disease. The autoantibody ABA is an antibody against the disease-related antigen AG. Examples of the autoantibody ABA include antibodies of the IgG, IgM, IgA, IgD, and IgE immunoglobulin classes. Among the first number of target proteins TP fixed at the spots 42 on the substrate 41 of the protein microarray 4, the autoantibody ABA binds, through an antigen-antibody reaction, to a target protein TP that is correlated with the disease-related antigen AG associated with the disease. To a target protein TP that is not correlated with the disease-related antigen AG, the autoantibody ABA either does not bind or binds only weakly. That is, the autoantibody ABA exhibits high binding to target proteins TP that are correlated with the disease-related antigen AG associated with the disease corresponding to that autoantibody, and low binding to target proteins TP that are not.
The reference sample SPR is a sample containing a reference antibody ABR that can bind, through an antigen-antibody reaction, to any of the first number of target proteins TP fixed at the spots 42 on the substrate 41 of the protein microarray 4. An example of the reference antibody ABR is a goat reference antibody.
The measurement device 2 measures a predetermined feature quantity obtained when each of the plurality of specimen samples SPS derived from the plurality of subjects is brought into contact with the first number of target proteins TP fixed at the spots 42 on the substrate 41 of the protein microarray 4. The measurement method of the measurement device 2 is not particularly limited, as long as it can measure a feature quantity that serves as an index of the binding associated with the antigen-antibody reaction between the first number of target proteins TP and the autoantibodies ABA in the specimen sample SPS. Measurement methods are distinguished by the labeling substance that is bound to the autoantibody ABA for measuring the feature quantity. Examples include methods that use a fluorescent substance, an enzyme, or a radioisotope (RI) as the labeling substance.
In this embodiment, the measurement device 2 uses a fluorescent substance as the labeling substance and measures fluorescence intensity as the feature quantity. Specifically, for a first secondary antibody AB1 that can bind to the reference antibody ABR in the reference sample SPR and is labeled with a first fluorescent substance, the measurement device 2 measures the intensity of the fluorescence emitted from the first fluorescent substance in response to irradiation with predetermined excitation light. Likewise, for a second secondary antibody AB2 that can bind to the autoantibody ABA in the specimen sample SPS and is labeled with a second fluorescent substance, the measurement device 2 measures the intensity of the fluorescence emitted from the second fluorescent substance in response to irradiation with predetermined excitation light. The first secondary antibody AB1 is, for example, an anti-goat IgG antibody labeled with a first fluorescent substance that emits green fluorescence. The second secondary antibody AB2 is, for example, an anti-human IgG, anti-human IgM, anti-human IgA, anti-human IgD, or anti-human IgE antibody labeled with a second fluorescent substance that emits red fluorescence.
As shown in FIG. 2, consider first the case where only the reference sample SPR is brought into contact with the first number of target proteins TP fixed at the spots 42 on the substrate 41 of the protein microarray 4. In this case, the reference antibody ABR in the reference sample SPR binds to all of the first number of target proteins TP through an antigen-antibody reaction (primary reaction). When the first secondary antibody AB1 and the second secondary antibody AB2 are then brought into contact with the target proteins TP, to all of which the reference antibody ABR is bound, the first secondary antibody AB1 binds to the reference antibody ABR in the secondary reaction, whereas the second secondary antibody AB2 has no binding target, remains free, and is removed. As a result, the first secondary antibody AB1 labeled with the first fluorescent substance is bound to each of the first number of target proteins TP via the reference antibody ABR, while the second secondary antibody AB2 is not bound.
When only the reference sample SPR is brought into contact with the first number of target proteins TP, the measurement device 2 outputs, as data indicating the measurement results, negative control (NC) measurement data D1 for each of the first number of target proteins TP. The NC measurement data D1 includes a first fluorescence intensity value D11, which indicates the intensity of the fluorescence emitted from the first fluorescent substance of the first secondary antibody AB1 capable of binding to the reference antibody ABR, and a second fluorescence intensity value D12, which indicates the intensity of the fluorescence emitted from the second fluorescent substance of the second secondary antibody AB2 capable of binding to the autoantibody ABA in the specimen sample SPS. As described above, in this case the first secondary antibody AB1 is bound to each of the first number of target proteins TP via the reference antibody ABR, while the second secondary antibody AB2 is not bound. Therefore, in the NC measurement data D1, ideally, for each of the first number of target proteins TP, the first fluorescence intensity value D11 is equal to or greater than a predetermined first reference intensity value, and the second fluorescence intensity value D12 is close to zero and less than a predetermined second reference intensity value.
Next, consider the case where the reference sample SPR and a specimen sample SPS are both brought into contact with the first number of target proteins TP fixed at the spots 42 on the substrate 41 of the protein microarray 4. In this case, the reference antibody ABR in the reference sample SPR binds to all of the first number of target proteins TP through an antigen-antibody reaction (primary reaction). The autoantibody ABA in the specimen sample SPS, on the other hand, binds through an antigen-antibody reaction (primary reaction) only to those target proteins TP that are correlated with the disease-related antigen AG associated with the disease corresponding to the autoantibody ABA, and does not bind to target proteins TP that are not correlated with the disease-related antigen AG. When the first secondary antibody AB1 and the second secondary antibody AB2 are then brought into contact with the target proteins TP in this state, the first secondary antibody AB1 binds to the reference antibody ABR and the second secondary antibody AB2 binds to the autoantibody ABA in the secondary reaction. As a result, the first secondary antibody AB1 labeled with the first fluorescent substance is bound to each of the first number of target proteins TP via the reference antibody ABR, and the second secondary antibody AB2 labeled with the second fluorescent substance is bound, via the autoantibody ABA, to the target proteins TP that are correlated with the disease-related antigen AG.
When the reference sample SPR and a specimen sample SPS are brought into contact with the first number of target proteins TP, the measurement device 2 outputs, as data indicating the measurement results, measurement data D1 on the specimen sample SPS for each of the first number of target proteins TP, in association with diagnostic data D2 (FIG. 1) indicating the diagnostic result for the disease of the subject. The measurement data D1 on the specimen sample SPS includes a first fluorescence intensity value D11, which indicates the intensity of the fluorescence emitted from the first fluorescent substance of the first secondary antibody AB1 capable of binding to the reference antibody ABR, and a second fluorescence intensity value D12, which indicates the intensity of the fluorescence emitted from the second fluorescent substance of the second secondary antibody AB2 capable of binding to the autoantibody ABA in the specimen sample SPS. As described above, in this case the first secondary antibody AB1 is bound to each of the first number of target proteins TP via the reference antibody ABR, and the second secondary antibody AB2 is bound, via the autoantibody ABA, to the target proteins TP that are correlated with the disease-related antigen AG. Therefore, in the measurement data D1 on the specimen sample SPS, ideally, a target protein TP that is correlated with the disease-related antigen AG shows a first fluorescence intensity value D11 equal to or greater than the first reference intensity value and a second fluorescence intensity value D12 equal to or greater than the second reference intensity value, whereas a target protein TP that is not correlated with the disease-related antigen AG shows a first fluorescence intensity value D11 equal to or greater than the first reference intensity value and a second fluorescence intensity value D12 that is close to zero and less than the second reference intensity value.
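The ideal two-channel pattern described above can be sketched as a simple threshold check. This is only an illustrative sketch: the names `SpotReading` and `classify_spot`, the "unreliable" label for a weak reference signal, and all numeric values are assumptions for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotReading:
    protein: str  # target protein TP fixed at this spot
    d11: float    # first fluorescence intensity value D11 (reference antibody channel)
    d12: float    # second fluorescence intensity value D12 (autoantibody channel)


def classify_spot(reading: SpotReading, first_ref: float, second_ref: float) -> str:
    """Compare one spot against the first and second reference intensity values."""
    if reading.d11 < first_ref:
        # Assumption: a weak reference signal means the spot cannot be interpreted.
        return "unreliable"
    return "bound" if reading.d12 >= second_ref else "not bound"


# Placeholder reference values and readings for illustration only, not measured data.
FIRST_REF, SECOND_REF = 1000.0, 500.0
spots = [
    SpotReading("A1BG", 1500.0, 900.0),
    SpotReading("A1CF", 1400.0, 30.0),
    SpotReading("A2M", 200.0, 700.0),
]
labels = {s.protein: classify_spot(s, FIRST_REF, SECOND_REF) for s in spots}
```

In real measurements the two channels are unlikely to separate this cleanly, which is presumably why the text qualifies the pattern with "ideally".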
Note that a specimen sample SPS derived from a subject without a disease does not contain autoantibodies ABA produced against the disease-related antigen AG. Therefore, in the measurement data D1 on a specimen sample SPS derived from a subject without a disease, ideally, for each of the first number of target proteins TP, the first fluorescence intensity value D11 is equal to or greater than the first reference intensity value, and the second fluorescence intensity value D12 is close to zero and less than the second reference intensity value.
FIG. 3 shows an example of the measurement data D1 output from the measurement device 2. In the example of FIG. 3, the measurement data D1 includes, for each of the first number of target proteins TP, the first fluorescence intensity value D11 and the second fluorescence intensity value D12 for the NC, and the first fluorescence intensity value D11 and the second fluorescence intensity value D12, associated with the diagnostic data D2, for the plurality of specimen samples SPS derived from the plurality of subjects. Here, the measurement data D1 on the plurality of specimen samples SPS includes, as data groups of the first fluorescence intensity value D11 and the second fluorescence intensity value D12, a data group associated with first diagnostic data D21 indicating a diagnostic result of no disease, and data groups associated with second to sixth diagnostic data D22 to D26 indicating diagnostic results of having, for example, a neuromuscular disease. For example, the second diagnostic data D22 indicates a diagnosis of attention deficit hyperactivity disorder, the third diagnostic data D23 a diagnosis of dementia with Lewy bodies, the fourth diagnostic data D24 a diagnosis of mild cognitive impairment, the fifth diagnostic data D25 a diagnosis of major depressive disorder, and the sixth diagnostic data D26 a diagnosis of Parkinson's disease.
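One possible in-memory layout for the measurement data D1 of FIG. 3 is sketched below. The source does not specify a data format, so the class names `SampleRecord` and `MeasurementData`, the string codes used as keys, the sample identifiers, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Reading = Tuple[float, float]  # (D11, D12) for one target protein

# Diagnostic data D21 to D26 as described in the text.
DIAGNOSES: Dict[str, str] = {
    "D21": "no disease",
    "D22": "attention deficit hyperactivity disorder",
    "D23": "dementia with Lewy bodies",
    "D24": "mild cognitive impairment",
    "D25": "major depressive disorder",
    "D26": "Parkinson's disease",
}


@dataclass
class SampleRecord:
    sample_id: str                 # hypothetical identifier of a specimen sample SPS
    diagnosis: str                 # key into DIAGNOSES
    readings: Dict[str, Reading]   # target protein name -> (D11, D12)


@dataclass
class MeasurementData:
    nc: Dict[str, Reading]                              # negative control readings
    samples: List[SampleRecord] = field(default_factory=list)

    def readings_for(self, protein: str) -> List[Tuple[str, float, float]]:
        """Return (diagnosis, D11, D12) for one target protein across all samples."""
        return [(s.diagnosis, *s.readings[protein]) for s in self.samples]


# Placeholder values for illustration only, not measured data.
d1 = MeasurementData(
    nc={"A1BG": (1500.0, 20.0)},
    samples=[
        SampleRecord("S001", "D21", {"A1BG": (1480.0, 25.0)}),
        SampleRecord("S002", "D26", {"A1BG": (1510.0, 870.0)}),
    ],
)
a1bg_rows = d1.readings_for("A1BG")
```

Keeping the diagnosis next to each sample's readings mirrors the way the text describes D11 and D12 as being "associated with" the diagnostic data D2.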
[Antigen search device]
The measurement data D1 output from the measurement device 2 is input to the antigen search device 3. The antigen search device 3 is described with reference to FIG. 4 and FIG. 5 in addition to FIG. 1. The antigen search device 3 is a computer equipped with a CPU (Central Processing Unit), a storage area such as an HDD (Hard Disk Drive) or flash memory, a RAM (Random Access Memory) used as a working area for the CPU, and the like. Based on the measurement data D1, the antigen search device 3 searches, from among the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4, for target proteins TP that are correlated with the disease-related antigens AG associated with each of a plurality of diseases (for example, neuromuscular diseases). The antigen search device 3 executes this antigen search method by machine learning using a neural network. A neural network is modeled on the structure of the human brain and is formed by stacking, in multiple layers, logic circuits that imitate the function of neurons (nerve cells) in the human brain.
As its functional configuration, the antigen search device 3 includes a data acquisition unit 31 that performs the data acquisition step S1 of the antigen search method, a data calculation unit 32 that performs the data calculation step S2, and an antigen selection unit 33 that performs the antigen selection step S3.
The data acquisition unit 31 performs the data acquisition step S1 by acquiring the measurement data D1 output from the measurement device 2. In this embodiment, as described above, the measurement data D1 includes, for each of the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4, the first fluorescence intensity value D11 and the second fluorescence intensity value D12 for the NC, and the first fluorescence intensity value D11 and the second fluorescence intensity value D12, associated with the diagnostic data D2, for the plurality of specimen samples SPS derived from the plurality of subjects. The data acquisition unit 31 acquires the measurement data D1 (step S11) and outputs the acquired measurement data D1 to the data calculation unit 32 (step S12).
The data calculation unit 32 performs the data calculation step S2 by calculating, based on the measurement data D1 and in association with the diagnostic data D2 for each of the plurality of subjects, antibody data D3 for each of the first number of target proteins TP, the antibody data D3 relating to the binding associated with the antigen-antibody reaction between the autoantibodies ABA in the plurality of specimen samples SPS and the first number of target proteins TP. The data calculation unit 32 calculates the antibody data D3 in association with the diagnostic data D2 based on the measurement data D1 (step S21) and outputs the calculated antibody data D3 to the antigen selection unit 33 (step S22).
The antibody data D3 for each of the first number of target proteins TP takes different values depending on how strongly the autoantibody ABA binds to each target protein TP, and takes a larger value as the binding of the autoantibody ABA becomes higher. As shown in FIG. 5, the antibody data D3 for each of the first number of target proteins TP includes data groups of first antibody data D31 associated with the first diagnostic data D21 indicating a diagnostic result of no disease, and data groups of second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 indicating diagnostic results of having a disease.
The first antibody data D31 associated with the first diagnostic data D21 is calculated based on the first fluorescence intensity value D11 and the second fluorescence intensity value D12 that are associated with the first diagnostic data D21 in the measurement data D1. For each of the first number of target proteins TP, the first antibody data D31 takes a smaller value than the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26.
The second antibody data D32 associated with the second to sixth diagnostic data D22 to D26 is calculated based on the first fluorescence intensity value D11 and the second fluorescence intensity value D12 that are associated with the second to sixth diagnostic data D22 to D26 in the measurement data D1. In the second antibody data D32, among the first number of target proteins TP, the values for the target proteins TP that are correlated with the disease-related antigens AG associated with the diseases indicated by the second to sixth diagnostic data D22 to D26 are larger than the values for the other target proteins TP.
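This passage states only that the antibody data D3 is calculated from D11 and D12 and grows with autoantibody binding; it does not give the formula. The following is therefore a purely hypothetical sketch of one way such a value could be derived, assuming the autoantibody signal is normalized by the reference signal and corrected by the negative control (NC). The function name `antibody_value` and the formula itself are assumptions, not the method of the source.

```python
def antibody_value(d11: float, d12: float, nc_d11: float, nc_d12: float) -> float:
    """Hypothetical antibody datum for one target protein and one specimen sample.

    Assumed formula (not from the source): the sample's D12/D11 ratio minus the
    NC's D12/D11 ratio, floored at zero.
    """
    if d11 <= 0 or nc_d11 <= 0:
        raise ValueError("reference-channel intensities must be positive")
    signal = d12 / d11            # autoantibody signal relative to reference signal
    background = nc_d12 / nc_d11  # the same ratio observed in the negative control
    return max(signal - background, 0.0)


# Placeholder intensities for illustration only, not measured data.
strong = antibody_value(d11=1000.0, d12=500.0, nc_d11=1000.0, nc_d12=50.0)
weak = antibody_value(d11=1000.0, d12=20.0, nc_d11=1000.0, nc_d12=50.0)
```

Under this assumption a spot with higher autoantibody binding yields a larger value, which matches the qualitative behavior the text attributes to D3; any other monotonic normalization would satisfy that description equally well.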
The number of data groups of the first antibody data D31 associated with the first diagnostic data D21 matches the number of specimen samples SPS derived from the subjects whose diagnosis result is indicated by the first diagnostic data D21. Likewise, the number of data groups of the second antibody data D32 associated with each of the second to sixth diagnostic data D22 to D26 matches the number of specimen samples SPS derived from the subjects whose diagnosis result is indicated by that diagnostic data. For example, assume that 10 specimen samples SPS are derived from the subjects for each of the diagnosis results indicated by the first to sixth diagnostic data D21 to D26. In this case, the number of data groups of the first antibody data D31 associated with the first diagnostic data D21 is "10", and the number of data groups of the second antibody data D32 associated with each of the second diagnostic data D22, the third diagnostic data D23, the fourth diagnostic data D24, the fifth diagnostic data D25, and the sixth diagnostic data D26 is also "10".
The antigen selection unit 33 performs the antigen selection step S3 by selecting, for each of the multiple diseases indicated in the diagnostic data D2, a specific target protein TPS that is correlated with the disease-related antigen AG from among the first number of target proteins TP based on the antibody data D3. Specifically, the antigen selection unit 33 generates a machine learning model LM (step S31). The machine learning model LM takes as input data the antibody data D3 for each of the first number of target proteins TP associated with the diagnostic data D2, selects, for each of the multiple diseases indicated in the diagnostic data D2, a specific target protein TPS that is correlated with the disease-related antigen AG from among the first number of target proteins TP based on the antibody data D3, and outputs the selected specific target protein TPS as output data. When the input data is input, the antigen selection unit 33 analyzes, according to the machine learning model LM, the relationship between the diagnostic data D2 and the first number of target proteins TP based on the antibody data D3, and selects a specific target protein TPS from among the first number of target proteins TP for each of the multiple diseases indicated in the diagnostic data D2 (step S33). The antigen selection unit 33 then outputs the specific target protein TPS selected for each of the multiple diseases indicated in the diagnostic data D2 as a target protein TP that is correlated with the disease-related antigen AG of that disease (step S34).
In the antigen search device 3, the antibody data D3, which relates to the binding associated with the antigen-antibody reaction between the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4 and the autoantibodies ABA in the multiple specimen samples SPS, is input to the machine learning model LM as input data. As a result, for each of the multiple diseases indicated in the diagnostic data D2, a specific target protein TPS that is selected from among the first number of target proteins TP based on the antibody data D3 and is correlated with the disease-related antigen AG can be output as output data. Compared with searching for target proteins TP separately for each of the multiple diseases, this makes it possible to efficiently search for the specific target proteins TPS that are correlated with the disease-related antigens AG of the respective diseases.
As described above, the antibody data D3 calculated by the data calculation unit 32 for each of the first number of target proteins TP includes a data group of the first antibody data D31 associated with the first diagnostic data D21, which indicates a diagnosis result of no disease, and a data group of the second antibody data D32 associated with the second to sixth diagnostic data D22 to D26, which indicate a diagnosis result of disease. In this case, as shown in FIG. 4, before the process of step S33 of selecting a specific target protein TPS, the antigen selection unit 33 may perform the process of step S32 of extracting, from the first number of target proteins TP, a second number of target proteins TP, the second number being smaller than the first number. The antigen selection unit 33 extracts from the first number of target proteins TP the second number of target proteins TP that satisfy the condition that the first antibody data D31 is equal to or less than a first threshold and the second antibody data D32 is equal to or greater than a second threshold, the second threshold being larger than the first threshold. When there are multiple data groups of the first antibody data D31 associated with the first diagnostic data D21 and multiple data groups of the second antibody data D32 associated with each of the second to sixth diagnostic data D22 to D26, the antigen selection unit 33 may instead extract from the first number of target proteins TP the second number of target proteins TP that satisfy the condition that the number of first antibody data D31 values equal to or greater than the first threshold is equal to or less than a first determination number and the number of second antibody data D32 values equal to or greater than the second threshold is equal to or greater than a second determination number.
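The count-based extraction condition of step S32 can be sketched as follows. This is only an illustration under stated assumptions: the function name `prefilter_targets`, the array layout (one row per specimen sample, one column per target protein TP), and the choice to apply the condition to the samples of a single disease at a time are not taken from the text, which does not say how the second antibody data D32 of the different diseases are combined.

```python
import numpy as np

def prefilter_targets(d31, d32, first_threshold, second_threshold,
                      first_count, second_count):
    """Return column indices of target proteins that pass the step S32 filter.

    d31: array (n_healthy_samples, n_targets), first antibody data D31.
    d32: array (n_disease_samples, n_targets), second antibody data D32
         for one disease (assumed layout, not specified in the source).
    """
    d31 = np.asarray(d31, dtype=float)
    d32 = np.asarray(d32, dtype=float)
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must exceed first_threshold")
    # Number of healthy samples at or above the first threshold, per target.
    healthy_hits = (d31 >= first_threshold).sum(axis=0)
    # Number of disease samples at or above the second threshold, per target.
    disease_hits = (d32 >= second_threshold).sum(axis=0)
    keep = (healthy_hits <= first_count) & (disease_hits >= second_count)
    return np.flatnonzero(keep)
```

Under these assumptions, the returned indices identify the second number of target proteins TP that would be passed on to step S33.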
When the second number of target proteins TP has been extracted from the first number of target proteins TP in step S32, the antigen selection unit 33 analyzes, according to the machine learning model LM, the relationship between the diagnostic data D2 and the second number of target proteins TP based on the antibody data D3, and selects a specific target protein TPS from among the second number of target proteins TP for each of the multiple diseases indicated in the diagnostic data D2 (step S33). In this case, when a specific target protein TPS that is correlated with the disease-related antigen AG is selected according to the machine learning model LM for each of the multiple diseases indicated in the diagnostic data D2, it is selected from the second number of target proteins TP, which is smaller than the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4. This shortens the time required to select the specific target protein TPS using the machine learning model LM and makes it possible to search more accurately for the specific target protein TPS for each of the multiple diseases.
Furthermore, when selecting a specific target protein TPS from among the second number of target proteins TP in step S33, the antigen selection unit 33 may obtain the similarity between the antibody data D3 of the second number of target proteins TP and, by hierarchical clustering HC based on that similarity, classify the second number of target proteins TP into multiple groups corresponding to the respective diseases indicated in the diagnostic data D2. The antigen selection unit 33 then selects the target proteins TP belonging to each of the multiple groups as the specific target proteins TPS. In this way, the target proteins TP belonging to each group obtained by the hierarchical clustering HC can be selected as the specific target proteins TPS that are correlated with the disease-related antigens AG of the respective diseases.
In the hierarchical clustering HC, the antigen selection unit 33 starts by assigning the antibody data D3 of each of the second number of target proteins TP to its own cluster, and then recursively merges clusters whose antibody data D3 are similar. Depending on the criterion used to choose which clusters to merge, hierarchical clustering HC includes methods such as the shortest distance method, the longest distance method, and the group average method. The similarity between clusters is defined by the similarity between the antibody data D3 that belong to the different clusters.
The antigen selection unit 33 merges clusters into a single cluster one pair at a time, starting from the pair with the greatest similarity, and stops merging when the similarity between every pair of clusters falls below a predetermined similarity reference value. In the hierarchical clustering HC by the antigen selection unit 33, when merging stops, the second number of target proteins TP has been classified into multiple groups corresponding to the respective diseases. Here, the antigen selection unit 33 may set a maximum limit on the number of target proteins TP belonging to each group and treat the similarity as zero whenever the total number of antibody data D3 contained in the two clusters to be merged exceeds that maximum limit. This prevents the number of target proteins TP belonging to any group from exceeding the maximum limit.
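The agglomerative procedure described above, including the similarity reference value and the maximum group size, might look like the following sketch. The function name `cluster_targets`, the use of the Pearson correlation as the similarity measure, and the choice of the group average method are assumptions made for illustration; the text leaves the similarity measure open and lists several linkage methods.

```python
import numpy as np

def cluster_targets(profiles, similarity_reference, max_group_size):
    """Group target proteins by agglomerative clustering (group average).

    profiles: array (n_targets, n_samples); row i is the antibody data D3
              of target protein i across specimen samples (assumed layout).
    Returns a list of clusters, each a list of row indices.
    """
    if similarity_reference <= 0:
        # The size cap works by forcing similarity to zero, so the
        # stopping value must be positive for the cap to take effect.
        raise ValueError("similarity_reference must be positive")
    profiles = np.asarray(profiles, dtype=float)
    sim = np.corrcoef(profiles)  # assumed similarity: Pearson correlation
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if len(clusters[a]) + len(clusters[b]) > max_group_size:
                    s = 0.0  # merged group would exceed the maximum limit
                else:
                    s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, best_pair = s, (a, b)
        if best < similarity_reference:
            break  # every remaining pair is below the reference value
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

The nested loop is quadratic in the number of clusters per merge, which is acceptable for a small second number of target proteins TP; a library routine would be preferable for larger inputs.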
The antigen selection unit 33 may also use a genetic algorithm GA when generating the machine learning model LM in step S31 (step S35). In this case, the antigen selection unit 33 uses the genetic algorithm GA to generate a machine learning model LM whose input data is the antibody data D3 for each of the second number of target proteins TP associated with the diagnostic data D2 and whose output data is the specific target proteins TPS belonging to each group classified by the hierarchical clustering HC. The goal of the genetic algorithm GA is that, within each group classified by the hierarchical clustering HC, the diseases indicated by the diagnostic data D2 corresponding to the antibody data D3 of the target proteins TP belonging to that group are the same.
In the genetic algorithm GA, the antigen selection unit 33 evaluates, for example, whether the diseases corresponding to the target proteins TP belonging to each group classified by the hierarchical clustering HC are the same, gives a first reward if they are the same, and gives a second reward, lower than the first reward, if they are not. By repeating the genetic algorithm GA in this way, the antigen selection unit 33 can generate a machine learning model LM in which the diseases corresponding to the target proteins TP belonging to each group classified by the hierarchical clustering HC are the same. As a result, when the antibody data D3 for each of the second number of target proteins TP is input as input data to the machine learning model LM generated using the genetic algorithm GA, the hierarchical clustering HC can classify the second number of target proteins TP into multiple groups corresponding to the respective diseases indicated in the diagnostic data D2.
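The reward evaluation described for the genetic algorithm GA can be illustrated with a small scoring function. This sketch covers only the reward, not the genetic algorithm itself, and everything in it is hypothetical: the function name `group_reward`, the default reward values, and in particular the `disease_of_target` mapping, since the text does not state how a single disease is assigned to each target protein TP.

```python
def group_reward(groups, disease_of_target,
                 first_reward=1.0, second_reward=0.0):
    """Sum the rewards over all groups produced by the clustering.

    groups: list of lists of target identifiers.
    disease_of_target: dict mapping each target identifier to a disease
                       label (hypothetical assignment).
    """
    if second_reward >= first_reward:
        raise ValueError("second_reward must be lower than first_reward")
    total = 0.0
    for group in groups:
        diseases = {disease_of_target[t] for t in group}
        # A group earns the first reward only when all of its members
        # correspond to one and the same disease.
        total += first_reward if len(diseases) == 1 else second_reward
    return total
```

A genetic algorithm could use such a score as its fitness when comparing candidate clustering settings, for example different similarity reference values or maximum group sizes.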
The measurement data D1 obtained with a protein microarray 4 may have low reproducibility. For this reason, when the antigen search device 3 uses multiple protein microarrays 4, one for each of the multiple diseases, to search for a specific target protein TPS that is correlated with the disease-related antigen AG of each disease, the measurement error between the measurement data D1 corresponding to the respective diseases must be taken into account.
Therefore, in the antigen search device 3 according to this embodiment, the data calculation unit 32 applies to the antibody data D3 that is input to the machine learning model LM a predetermined standardization process for absorbing the measurement error between the measurement data D1 corresponding to the respective diseases indicated in the diagnostic data D2, and thereby calculates standardized antibody data D3 for each of the first number of target proteins TP. Specifically, for the antibody data D3 of each of the multiple diseases indicated in the diagnostic data D2, the data calculation unit 32 performs the standardization process using the value at one or more percentiles, and thereby calculates the standardized antibody data D3 for each of the first number of target proteins TP. Through this percentile-based standardization process, the data calculation unit 32 can calculate standardized antibody data D3 in which the measurement error between the measurement data D1 corresponding to the respective diseases is appropriately absorbed. In this case, by inputting the standardized antibody data D3 to the machine learning model LM as input data, the antigen selection unit 33 can output an accurate specific target protein TPS as output data for each of the multiple diseases indicated in the diagnostic data D2.
In addition, in the protein microarray 4, the amounts of the first number of target proteins TP fixed on the substrate 41 may vary. For this reason, when the antigen search device 3 uses the protein microarray 4 for each of the multiple diseases to search for a specific target protein TPS that is correlated with the disease-related antigen AG of each disease, the measurement error in the measurement data D1 of each disease that is caused by differences in the amount of target protein TP fixed on the substrate 41 must be taken into account.
Therefore, in the antigen search device 3 according to this embodiment, the data acquisition unit 31 acquires two values as the measurement data D1 for each of the first number of target proteins TP fixed on the substrate 41 of the protein microarray 4. The first is a first fluorescence intensity value D11, which relates to the intensity of the fluorescence emitted from the first fluorescent substance of the first secondary antibody AB1 bound to a predetermined reference antibody ABR, the reference antibody ABR being bound to all of the first number of target proteins TP. The second is a second fluorescence intensity value D12, which relates to the intensity of the fluorescence emitted from the second fluorescent substance of the second secondary antibody AB2 bound to the autoantibody ABA, the autoantibody ABA being bound to some of the first number of target proteins TP. The data calculation unit 32 then performs a standardization process using the first fluorescence intensity value D11 and the second fluorescence intensity value D12 on the antibody data D3 that is input to the machine learning model LM, and thereby calculates standardized antibody data D3 for each of the first number of target proteins TP. This allows the data calculation unit 32 to calculate, for the measurement data D1 corresponding to each of the multiple diseases, standardized antibody data D3 in which the measurement error caused by differences in the amount of target protein TP fixed on the substrate 41 is appropriately absorbed. In this case, by inputting the standardized antibody data D3 to the machine learning model LM as input data, the antigen selection unit 33 can output an accurate specific target protein TPS as output data for each of the multiple diseases indicated in the diagnostic data D2.
Next, an example of the standardization process that the data calculation unit 32 performs when calculating the antibody data D3 from the measurement data D1 is described in more detail, comparing the measurement data D1 shown in FIG. 3 with the antibody data D3 shown in FIG. 5. The data calculation unit 32 calculates the standardized antibody data D3 from the measurement data D1 according to the first to fifth standardization process steps below.
In the first standardization process step, for each of the first number of target proteins TP, the data calculation unit 32 takes the first fluorescence intensity value D11 of the NC and the first fluorescence intensity values D11 associated with each of the first to sixth diagnostic data D21 to D26, and divides each of these values by their average. In this way, the data calculation unit 32 obtains a first standardized index value for each of the first number of target proteins TP in association with the NC and with each of the first to sixth diagnostic data D21 to D26.
In the second standardization process step, for each of the first number of target proteins TP, the data calculation unit 32 takes the second fluorescence intensity value D12 of the NC and the second fluorescence intensity values D12 associated with each of the first to sixth diagnostic data D21 to D26, and divides each of these values by the corresponding first standardized index value. In this way, the data calculation unit 32 obtains a second standardized index value for each of the first number of target proteins TP in association with the NC and with each of the first to sixth diagnostic data D21 to D26.
In the third standardization process step, for each of the first number of target proteins TP, the data calculation unit 32 subtracts the second standardized index value associated with the NC from each of the second standardized index values associated with the first to sixth diagnostic data D21 to D26. In this way, the data calculation unit 32 obtains a third standardized index value for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
In the fourth standardization process step, for each of the first to sixth diagnostic data D21 to D26, the data calculation unit 32 subtracts the value at the 25th percentile from the third standardized index value of each of the first number of target proteins TP, and divides the result by the standard deviation. In this way, the data calculation unit 32 obtains a fourth standardized index value for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
In the fifth standardization process step, for each of the first to sixth diagnostic data D21 to D26, the data calculation unit 32 divides the fourth standardized index value of each of the first number of target proteins TP by the value at the 75th percentile, and multiplies the quotient by "10". In this way, the data calculation unit 32 obtains the standardized antibody data D3 for each of the first number of target proteins TP in association with each of the first to sixth diagnostic data D21 to D26.
By performing the standardization process according to the first to fifth standardization process steps above, the data calculation unit 32 can calculate standardized antibody data D3 in which both the measurement error in the measurement data D1 caused by differences in the amount of target protein TP fixed on the substrate 41 of the protein microarray 4 and the measurement error between the measurement data D1 corresponding to the respective first to sixth diagnostic data D21 to D26 are appropriately absorbed. In this case, by inputting the standardized antibody data D3 to the machine learning model LM as input data, the antigen selection unit 33 can output an accurate specific target protein TPS as output data for each of the multiple diseases indicated in the diagnostic data D2.
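The first to fifth standardization process steps can be sketched in NumPy as follows. This is a minimal illustration rather than the embodiment's implementation. The function name `standardize_antibody_data`, the array layout (one row per target protein TP, column 0 for the NC, columns 1 to 6 for D21 to D26, with a single value per diagnostic data), the use of the population standard deviation of the third index values, and NumPy's default linear-interpolation percentiles are all assumptions, because the text specifies none of them.

```python
import numpy as np

def standardize_antibody_data(d11, d12):
    """Apply the five standardization steps to fluorescence intensities.

    d11, d12: arrays (n_targets, 1 + n_diagnoses). Column 0 holds the NC
              values; the remaining columns hold the values associated
              with each diagnostic data (assumed layout).
    Returns an array (n_targets, n_diagnoses) of standardized data D3.
    """
    d11 = np.asarray(d11, dtype=float)
    d12 = np.asarray(d12, dtype=float)
    # Step 1: divide each D11 by the per-target mean over NC and diagnoses.
    idx1 = d11 / d11.mean(axis=1, keepdims=True)
    # Step 2: divide each D12 by the matching first index value.
    idx2 = d12 / idx1
    # Step 3: subtract the NC value from each diagnosis column.
    idx3 = idx2[:, 1:] - idx2[:, [0]]
    # Step 4: per diagnosis, subtract the 25th percentile over targets and
    # divide by the standard deviation (assumed: std of the step 3 values).
    idx4 = (idx3 - np.percentile(idx3, 25, axis=0)) / idx3.std(axis=0)
    # Step 5: per diagnosis, divide by the 75th percentile and scale by 10.
    # Degenerate columns (zero spread) would divide by zero here.
    return idx4 / np.percentile(idx4, 75, axis=0) * 10.0
```

With this layout, each diagnosis column of the result has its 25th percentile at 0 and its 75th percentile at 10, which is what makes columns measured on different microarrays comparable.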
In the antigen search device 3 according to this embodiment, the antigen selection unit 33 may be configured to extract various proteins related to each of the multiple diseases based on information registered in existing databases such as OMIM, GeneCards, and AAgAtlas. In this case, the antigen selection unit 33 can narrow the specific target proteins TPS output from the machine learning model LM down to those that also appear among the extracted proteins, and treat the remaining proteins as the proteins that are correlated with the disease-related antigens AG of the respective diseases.
The specific embodiments described above mainly include inventions having the following configurations.
An antigen discovery method according to one aspect of the present invention is a method for discovering target proteins that are correlated with disease-related antigens associated with each of a plurality of diseases, using a protein microarray in which a first number of target proteins are fixed on a substrate. This antigen discovery method includes: a data acquisition step of acquiring measurement data for each of the first number of target proteins, the measurement data relating to a predetermined feature amount obtained when each of a plurality of specimen samples derived from a plurality of subjects is brought into contact with the first number of target proteins; a data calculation step of calculating, based on the measurement data, antibody data for each of the first number of target proteins in association with diagnostic data indicating diagnosis results for a plurality of diseases for each of the plurality of subjects, the antibody data relating to the binding associated with the antigen-antibody reaction between autoantibodies in the plurality of specimen samples and the first number of target proteins; and an antigen selection step of generating a machine learning model that takes as input data the antibody data for each of the first number of target proteins associated with the diagnostic data, selecting, according to the machine learning model and for each of the plurality of diseases indicated in the diagnostic data, a specific target protein that is correlated with the disease-related antigen from among the first number of target proteins based on the antibody data, and outputting the selected specific target protein as output data.
According to this antigen discovery method, antibody data relating to the binding associated with the antigen-antibody reaction between the first number of target proteins fixed on the substrate of the protein microarray and the autoantibodies in the plurality of specimen samples is input to the machine learning model as input data. As a result, for each of the plurality of diseases indicated in the diagnostic data, a specific target protein that is selected from among the first number of target proteins based on the antibody data and is correlated with the disease-related antigen can be output as output data. Compared with searching for target proteins separately for each of the plurality of diseases, this makes it possible to efficiently search for the specific target proteins that are correlated with the disease-related antigens of the respective diseases.
In the above antigen discovery method, the antibody data may include a data group of first antibody data associated with the diagnostic data indicating a diagnosis result of no disease, and a data group of second antibody data associated with the diagnostic data indicating a diagnosis result of disease. In this case, in the antigen selection step, a second number of target proteins, the second number being smaller than the first number, that satisfy the condition that the first antibody data is equal to or less than a first threshold and the second antibody data is equal to or greater than a second threshold larger than the first threshold are extracted from the first number of target proteins, and the specific target protein is selected from among the second number of target proteins.
In this embodiment, when a specific target protein that correlates with a disease-associated antigen is selected according to the machine learning model for each of the plurality of diseases indicated by the diagnostic data, the selection is made from the second number of target proteins, which is smaller than the first number of target proteins immobilized on the protein microarray substrate. This shortens the time needed to select the specific target protein with the machine learning model and allows the specific target protein for each disease to be found more accurately.
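The two-threshold extraction described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `prefilter_proteins`, the dictionary layout, the threshold values, and in particular the aggregation rule (every healthy value at or below the first threshold, at least one diseased value at or above the second) are assumptions, since the text does not state how the condition is applied across a data group.

```python
def prefilter_proteins(healthy, diseased, low, high):
    """Return protein IDs that pass the two-threshold condition.

    healthy / diseased: dict mapping protein ID -> list of antibody values
    for subjects diagnosed without / with a disease.
    Aggregation rule (an assumption, not taken from the source): every
    healthy value must be <= low and at least one diseased value >= high.
    """
    if not high > low:
        raise ValueError("the second threshold must exceed the first")
    selected = []
    for protein, h in healthy.items():
        d = diseased.get(protein, [])
        if h and d and max(h) <= low and max(d) >= high:
            selected.append(protein)
    return selected


# Hypothetical values for three proteins.
healthy = {"P1": [0.1, 0.2], "P2": [0.1, 0.9], "P3": [0.3, 0.2]}
diseased = {"P1": [0.2, 1.8], "P2": [2.0, 2.1], "P3": [0.4, 0.6]}
candidates = prefilter_proteins(healthy, diseased, low=0.5, high=1.5)
```

Here only P1 survives: P2 is rejected because one healthy value exceeds the first threshold, and P3 because no diseased value reaches the second.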
In the above antigen discovery method, the antigen selection step may determine the similarity between the antibody data for each of the second number of target proteins and, by hierarchical clustering based on that similarity, classify the second number of target proteins into a plurality of groups corresponding to the respective diseases indicated by the diagnostic data. The target proteins belonging to each of the groups may then be selected as the specific target proteins.
In this embodiment, the second number of target proteins is classified into a plurality of groups by hierarchical clustering based on the similarity between the antibody data for each of those proteins. The target proteins belonging to each group can then be selected as the specific target proteins that correlate with the disease-associated antigens of the respective diseases.
Hierarchical clustering starts with the antibody data for each of the second number of target proteins assigned to its own cluster, and clusters with similar antibody data are then merged recursively. Depending on the criterion used to choose which clusters to merge, hierarchical clustering includes methods such as the shortest distance (single linkage) method, the longest distance (complete linkage) method, and the group average method. The similarity between two clusters is defined by the similarity between the antibody data that span the two clusters. Clusters are merged in descending order of similarity, and merging stops when the similarity between every pair of clusters falls below a predetermined similarity reference value. When merging stops, the second number of target proteins has been classified into a plurality of groups corresponding to the respective diseases. A maximum limit can be set on the number of target proteins per group: if the total number of antibody data items in the two clusters to be merged exceeds the maximum limit, their similarity is treated as zero, so that no group exceeds the limit.
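The merging procedure described above, including the stop criterion and the size cap, might be sketched as below. The group average method is used here as one of the linkage options the text lists. The similarity measure (1 divided by 1 plus the Euclidean distance), the names `cluster_proteins`, `min_similarity`, and `max_size`, and the example profiles are all illustrative assumptions.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Similarity in (0, 1]; 1 / (1 + Euclidean distance) is an assumed choice."""
    return 1.0 / (1.0 + math.dist(a, b))


def cluster_proteins(profiles, min_similarity, max_size):
    """Group-average agglomerative clustering with a per-group size cap.

    profiles: dict mapping protein ID -> list of antibody values.
    Merging stops once no pair of clusters reaches min_similarity.
    A merge that would exceed max_size is given similarity zero.
    """
    if min_similarity <= 0:
        raise ValueError("min_similarity must be positive")
    clusters = [[p] for p in profiles]  # one protein per cluster at the start

    def linkage(c1, c2):
        if len(c1) + len(c2) > max_size:
            return 0.0  # size cap: treat the pair as completely dissimilar
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(similarity(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < min_similarity:
            break  # every remaining pair is below the reference value
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical two-dimensional antibody profiles.
profiles = {
    "A": [1.0, 0.0], "B": [1.1, 0.0],
    "C": [0.0, 1.0], "D": [0.0, 1.2],
    "E": [1.3, 0.0],
}
groups = cluster_proteins(profiles, min_similarity=0.8, max_size=2)
```

With `max_size=2`, protein E stays on its own because joining the A/B cluster would exceed the cap; raising the cap lets E join that cluster instead.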
In the above antigen discovery method, the antigen selection step may generate the machine learning model using a genetic algorithm, with the goal that, within each of the plurality of groups, the diseases indicated by the diagnostic data corresponding to the antibody data of the target proteins in that group are identical.
In this embodiment, a genetic algorithm is used to generate a machine learning model whose input data is the antibody data for each of the second number of target proteins associated with the diagnostic data, and whose output data is the specific target proteins belonging to each group obtained by hierarchical clustering. Using a genetic algorithm allows the model to be generated with the goal that, within each group obtained by hierarchical clustering, the diseases indicated by the diagnostic data corresponding to the antibody data of the target proteins in that group are identical.
In the genetic algorithm, for example, the diseases corresponding to the target proteins in each group obtained by hierarchical clustering are checked for identity: a first reward is given if the diseases are identical, and a second reward lower than the first is given if they are not. Repeating this genetic algorithm yields a machine learning model in which the diseases corresponding to the target proteins in each group are identical. When the antibody data for each of the second number of target proteins is then input to the machine learning model generated in this way, hierarchical clustering can classify the second number of target proteins into a plurality of groups corresponding to the respective diseases indicated by the diagnostic data.
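The reward rule described above can be illustrated with a fitness function of the kind a genetic algorithm would evaluate for each candidate. This sketch only scores candidate groupings; what the genetic algorithm actually evolves (for example, clustering parameters) is not stated in this passage and is left out. The function name, the reward values, the averaging over groups, and the disease labels are assumptions.

```python
def grouping_reward(groups, disease_of, same_reward=1.0, mixed_reward=0.0):
    """Average per-group reward for a candidate grouping.

    groups: list of lists of protein IDs.
    disease_of: dict mapping protein ID -> disease label.
    A group whose proteins all map to one disease earns same_reward (the
    first reward); any other group earns the lower mixed_reward.
    """
    if not mixed_reward < same_reward:
        raise ValueError("the second reward must be lower than the first")
    if not groups:
        raise ValueError("at least one group is required")
    total = 0.0
    for group in groups:
        diseases = {disease_of[p] for p in group}
        total += same_reward if len(diseases) == 1 else mixed_reward
    return total / len(groups)


# Hypothetical disease labels for four proteins.
disease_of = {"A": "disease_1", "B": "disease_1", "C": "disease_2", "D": "disease_2"}
candidate_1 = [["A", "B"], ["C", "D"]]  # each group is disease-pure
candidate_2 = [["A", "C"], ["B", "D"]]  # each group mixes diseases
best = max([candidate_1, candidate_2], key=lambda g: grouping_reward(g, disease_of))
```

Note that a grouping made only of single-protein groups trivially earns the maximum under this function, so a practical fitness would probably need an additional term; the passage does not say how this is handled.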
In the above antigen discovery method, the data calculation step may calculate standardized antibody data for each of the first number of target proteins by performing a predetermined standardization process that absorbs the measurement error between the measurement data sets corresponding to the respective diseases indicated by the diagnostic data.
Measurement data obtained with protein microarrays can have low reproducibility. Consequently, when multiple protein microarrays are used, one or more for each of a plurality of diseases, to find specific target proteins that correlate with the disease-associated antigens of each disease, the measurement error between the measurement data sets for the respective diseases must be taken into account. The antibody data that is input to the machine learning model is therefore subjected to a predetermined standardization process that absorbs this measurement error, and standardized antibody data is calculated for each of the first number of target proteins. Inputting the standardized antibody data to the machine learning model allows an accurate specific target protein to be output as output data for each of the diseases indicated by the diagnostic data.
In the above antigen discovery method, the data calculation step may calculate standardized antibody data for each of the first number of target proteins by performing the standardization process on the antibody data for each of the diseases indicated by the diagnostic data, using the value at at least one percentile.
In this embodiment, the standardization process is performed on the antibody data for each of the diseases indicated by the diagnostic data, using the value at at least one percentile. This percentile-based standardization yields standardized antibody data in which the measurement error between the measurement data sets for the respective diseases has been appropriately absorbed.
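One possible form of percentile-based standardization is sketched below: each disease's data set is divided by the value at a chosen percentile of that same data set, so that data sets measured on different scales become comparable. Which percentile or percentiles are used, and whether the operation is a division, is not specified in this passage; the scaling rule, the function names, and the example values are assumptions.

```python
def percentile(values, p):
    """p-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def standardize_by_percentile(batches, p=75.0):
    """Scale each disease's values by that data set's own p-th percentile.

    batches: dict mapping disease label -> list of raw antibody values.
    Dividing by a percentile is an assumed form of the standardization.
    """
    scaled = {}
    for disease, values in batches.items():
        ref = percentile(values, p)
        if ref <= 0:
            raise ValueError(f"non-positive reference value for {disease}")
        scaled[disease] = [v / ref for v in values]
    return scaled


# Hypothetical data: the second data set reads ten times lower overall.
batches = {
    "disease_1": [100, 200, 300, 400, 500],
    "disease_2": [10, 20, 30, 40, 50],
}
scaled = standardize_by_percentile(batches, p=50.0)
```

After scaling by the median, the two data sets coincide, which is the sense in which a systematic difference between data sets is absorbed.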
In the above antigen discovery method, the measurement data may include a first fluorescence intensity value and a second fluorescence intensity value. The first fluorescence intensity value takes as the feature amount the intensity of fluorescence emitted from a first fluorescent substance that labels a first secondary antibody, the first secondary antibody being capable of binding to a predetermined reference antibody that binds to every one of the first number of target proteins. The second fluorescence intensity value takes as the feature amount the intensity of fluorescence emitted from a second fluorescent substance that labels a second secondary antibody, the second secondary antibody being capable of binding to the autoantibodies in the plurality of test samples. In this case, the data calculation step calculates standardized antibody data for each of the first number of target proteins by performing the standardization process using the first and second fluorescence intensity values.
In a protein microarray, the amounts of the first number of target proteins immobilized on the substrate can vary. Consequently, when protein microarrays are used for each of a plurality of diseases to find specific target proteins that correlate with the disease-associated antigens of each disease, the measurement error caused by differences in the amount of target protein immobilized on the substrate must be taken into account in the measurement data for the respective diseases.
Therefore, two values are acquired as measurement data for each of the first number of target proteins immobilized on the protein microarray substrate: a first fluorescence intensity value for the fluorescence emitted from the first fluorescent substance of the first secondary antibody bound to the predetermined reference antibody, which in turn is bound to all of the first number of target proteins, and a second fluorescence intensity value for the fluorescence emitted from the second fluorescent substance of the second secondary antibody bound to the autoantibodies in the test sample. The antibody data that is input to the machine learning model is then standardized using the first and second fluorescence intensity values, and standardized antibody data is calculated for each of the first number of target proteins. This yields standardized antibody data in which the measurement error caused by differences in the amount of immobilized target protein has been appropriately absorbed. Inputting the standardized antibody data to the machine learning model allows an accurate specific target protein to be output as output data for each of the diseases indicated by the diagnostic data.
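A simple way to combine the two fluorescence channels is a per-spot ratio, sketched below. Because the reference antibody binds every target protein, the first fluorescence intensity value should track the amount of protein on a spot, and dividing the autoantibody signal by it compensates for spotting differences. The ratio form, the function name, the `min_reference` cutoff, and the example intensities are assumptions; the passage says only that both values are used in the standardization.

```python
def normalize_by_reference(sample_signal, reference_signal, min_reference=1.0):
    """Per-spot ratio of autoantibody signal to reference-antibody signal.

    sample_signal: second fluorescence intensity values (autoantibody channel).
    reference_signal: first fluorescence intensity values (reference channel).
    Spots whose reference signal is below min_reference are returned as None,
    because the ratio is unreliable when almost no protein is detected.
    """
    if len(sample_signal) != len(reference_signal):
        raise ValueError("both channels must cover the same spots")
    ratios = []
    for s, r in zip(sample_signal, reference_signal):
        ratios.append(s / r if r >= min_reference else None)
    return ratios


# Hypothetical intensities: spot 2 carries half as much protein as spot 1,
# and spot 3 carries essentially none.
reference = [1000.0, 500.0, 0.0]
sample = [200.0, 100.0, 50.0]
ratios = normalize_by_reference(sample, reference)
```

The first two spots give the same ratio even though their raw autoantibody signals differ by a factor of two, which reflects the difference in immobilized amount rather than in binding.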
An antigen discovery system according to another aspect of the present invention comprises a measurement device and an antigen discovery device. The measurement device uses a protein microarray in which a first number of target proteins are immobilized on a substrate to measure a predetermined feature amount when each of a plurality of test samples derived from a plurality of subjects is brought into contact with the first number of target proteins, and outputs measurement data for each of the first number of target proteins indicating the measurement results. The antigen discovery device searches for target proteins that correlate with the disease-associated antigens associated with each of a plurality of diseases.
The antigen discovery device includes a data acquisition unit, a data calculation unit, and an antigen selection unit. The data acquisition unit acquires the measurement data. The data calculation unit calculates, based on the measurement data, antibody data for each of the first number of target proteins in association with diagnostic data indicating diagnostic results regarding a plurality of diseases for each of the plurality of subjects, the antibody data relating to the binding that accompanies the antigen-antibody reaction between the autoantibodies in the plurality of test samples and the first number of target proteins. The antigen selection unit generates a machine learning model that takes as input data the antibody data for each of the first number of target proteins associated with the diagnostic data, selects, in accordance with the machine learning model and for each of the diseases indicated by the diagnostic data, a specific target protein that correlates with the disease-associated antigen from among the first number of target proteins based on the antibody data, and outputs the selected specific target protein as output data.
According to this antigen discovery system, the antigen selection unit of the antigen discovery device inputs to the machine learning model, as input data, antibody data on the binding that accompanies the antigen-antibody reaction between the first number of target proteins immobilized on the protein microarray substrate and the autoantibodies in the plurality of test samples. For each of the diseases indicated by the diagnostic data, it can then output, as output data, a specific target protein that is selected from the first number of target proteins on the basis of the antibody data and that correlates with a disease-associated antigen. Compared with searching for target proteins separately for each disease, this makes it possible to efficiently find the specific target proteins that correlate with the disease-associated antigens of the respective diseases.
As described above, the present invention provides an antigen discovery method and an antigen discovery system that can efficiently find target proteins that correlate with the disease-associated antigens associated with each of a plurality of diseases, using a protein microarray in which a plurality of target proteins are immobilized on a substrate.

Claims (8)

  1.  An antigen discovery method for finding, using a protein microarray in which a first number of target proteins are immobilized on a substrate, target proteins that correlate with disease-associated antigens associated with each of a plurality of diseases, the method comprising:
     a data acquisition step of acquiring measurement data for each of the first number of target proteins, the measurement data relating to a predetermined feature amount obtained when each of a plurality of test samples derived from a plurality of subjects is brought into contact with the first number of target proteins;
     a data calculation step of calculating, based on the measurement data, antibody data for each of the first number of target proteins in association with diagnostic data indicating diagnostic results regarding a plurality of diseases for each of the plurality of subjects, the antibody data relating to the binding that accompanies an antigen-antibody reaction between autoantibodies in the plurality of test samples and the first number of target proteins; and
     an antigen selection step of generating a machine learning model that takes as input data the antibody data for each of the first number of target proteins associated with the diagnostic data, selecting, in accordance with the machine learning model and for each of the plurality of diseases indicated by the diagnostic data, a specific target protein that correlates with the disease-associated antigen from among the first number of target proteins based on the antibody data, and outputting the selected specific target protein as output data.
  2.  The antigen discovery method according to claim 1, wherein the antibody data includes a data group of first antibody data associated with the diagnostic data indicating a diagnosis of not having a disease, and a data group of second antibody data associated with the diagnostic data indicating a diagnosis of having a disease, and
     wherein, in the antigen selection step, a second number of target proteins, smaller than the first number and satisfying the condition that the first antibody data is equal to or less than a first threshold and the second antibody data is equal to or greater than a second threshold larger than the first threshold, is extracted from the first number of target proteins, and the specific target protein is selected from among the second number of target proteins.
  3.  The antigen discovery method according to claim 2, wherein, in the antigen selection step, a similarity between the antibody data for each of the second number of target proteins is determined, the second number of target proteins is classified, by hierarchical clustering based on the similarity, into a plurality of groups corresponding to the respective diseases indicated by the diagnostic data, and the target proteins belonging to each of the plurality of groups are selected as the specific target proteins.
  4.  The antigen discovery method according to claim 3, wherein, in the antigen selection step, the machine learning model is generated using a genetic algorithm with the goal that, within each of the plurality of groups, the diseases indicated by the diagnostic data corresponding to the antibody data of the target proteins belonging to that group are identical.
  5.  The antigen discovery method according to claim 1, wherein, in the data calculation step, standardized antibody data is calculated for each of the first number of target proteins by performing a predetermined standardization process that absorbs measurement error between the measurement data corresponding to each of the plurality of diseases indicated by the diagnostic data.
  6.  The antigen discovery method according to claim 5, wherein, in the data calculation step, the standardized antibody data is calculated for each of the first number of target proteins by performing the standardization process on the antibody data for each of the plurality of diseases indicated by the diagnostic data, using a value at at least one percentile.
  7.  The antigen discovery method according to claim 5, wherein the measurement data includes:
      a first fluorescence intensity value that takes as the feature amount the intensity of fluorescence emitted from a first fluorescent substance labeling a first secondary antibody, the first secondary antibody being capable of binding to a predetermined reference antibody that binds to every one of the first number of target proteins; and
      a second fluorescence intensity value that takes as the feature amount the intensity of fluorescence emitted from a second fluorescent substance labeling a second secondary antibody, the second secondary antibody being capable of binding to autoantibodies in the plurality of test samples, and
     wherein, in the data calculation step, the standardized antibody data is calculated for each of the first number of target proteins by performing the standardization process using the first fluorescence intensity value and the second fluorescence intensity value.
  8.  An antigen discovery system comprising:
     a measurement device that uses a protein microarray in which a first number of target proteins are immobilized on a substrate to measure a predetermined feature amount when each of a plurality of test samples derived from a plurality of subjects is brought into contact with the first number of target proteins, and that outputs measurement data for each of the first number of target proteins indicating the measurement results; and
     an antigen discovery device that searches for target proteins that correlate with disease-associated antigens associated with each of a plurality of diseases,
     wherein the antigen discovery device includes:
      a data acquisition unit that acquires the measurement data;
      a data calculation unit that calculates, based on the measurement data, antibody data for each of the first number of target proteins in association with diagnostic data indicating diagnostic results regarding a plurality of diseases for each of the plurality of subjects, the antibody data relating to the binding that accompanies an antigen-antibody reaction between autoantibodies in the plurality of test samples and the first number of target proteins; and
      an antigen selection unit that generates a machine learning model that takes as input data the antibody data for each of the first number of target proteins associated with the diagnostic data, selects, in accordance with the machine learning model and for each of the plurality of diseases indicated by the diagnostic data, a specific target protein that correlates with the disease-associated antigen from among the first number of target proteins based on the antibody data, and outputs the selected specific target protein as output data.
PCT/JP2023/033322 2022-10-27 2023-09-13 Antigen discovery method and antigen discovery system WO2024090062A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022172488A JP2024064128A (en) 2022-10-27 Antigen discovery method and antigen discovery system
JP2022-172488 2022-10-27

Publications (1)

Publication Number Publication Date
WO2024090062A1 (en) 2024-05-02

Family

ID=90830487

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/033322 WO2024090062A1 (en) 2022-10-27 2023-09-13 Antigen discovery method and antigen discovery system

Country Status (1)

Country Link
WO (1) WO2024090062A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004534213A (en) * 2001-04-10 2004-11-11 ザ ボード オブ トラスティーズ オブ ザ リーランド スタンフォード ジュニア ユニバーシティ Methods for therapeutic and diagnostic use of antibody-specific profiles
JP2021521536A (en) * 2018-04-13 2021-08-26 フリーノーム・ホールディングス・インコーポレイテッドFreenome Holdings, Inc. Machine learning implementation for multi-sample assay of biological samples
WO2022076237A1 (en) * 2020-10-05 2022-04-14 Freenome Holdings, Inc. Markers for the early detection of colon cell proliferative disorders
