WO2006004182A1 - Sequence prediction system - Google Patents
- Publication number
- WO2006004182A1 (application PCT/JP2005/012542)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- data
- biopolymer
- unit
- database
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- the present invention relates to a sequence prediction system, and more particularly to a sequence prediction system and a sequence prediction database for predicting a sequence of a peptide having specific physical properties.
- the present invention also relates to a sequence prediction support system that supports this sequence prediction.
- the present invention relates to a sequence prediction program and method for operating a sequence prediction system.
- the present invention also relates to a sequence prediction support program and method for operating a sequence prediction support system.
- HCV: hepatitis C virus
- CTL: cytotoxic T cells
- CTL epitopes: To identify such CTL epitopes, epitopes have been predicted using databases such as BIMAS and SYFPEITHI, and experiments have then been conducted, according to the prediction results, to determine whether the predicted peptides actually bind to HLA molecules. Peptides that bound were identified as CTL epitopes.
- Non-Patent Document 1 describes a method for identifying peptides that bind to HLA molecules more accurately, so that such peptides can be identified with fewer experiments.
- Non-Patent Document 1: Udaka, K., et al., 'Empirical Evaluation of a Dynamic Experiment Design Method for Prediction of MHC Class I-Binding Peptides', The Journal of Immunology, 169, p5744-5753, 2002
- In Non-Patent Document 1, a computer is used to judge whether an arbitrarily selected peptide sequence has a predetermined physical property, for example the ability to bind to an HLA molecule as described above, and an experiment is then conducted to confirm whether the selected peptide sequence actually has that property. Non-Patent Document 1 reports that the selected peptide sequences were confirmed by experiment, with high probability, to have the predetermined physical property (2nd column, page 5749, right column).
- However, the technique described in Non-Patent Document 1 was not sufficient for focusing on a specific target, for example a virus antigen, and, without experimentation, selecting and quantitatively distinguishing those predicted peptide sequences that have the specific physical properties needed to function as a virus antigen.
- The present invention has been made in view of the above circumstances. Its object is to provide a sequence prediction system, a sequence prediction database, a sequence prediction support system, a sequence prediction program, a sequence prediction support program, a sequence prediction method, and a sequence prediction support method capable of selecting only biopolymer sequences having a certain predetermined physical property without performing an experiment.
- The sequence prediction system includes: a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a question point extraction unit that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as a question point, a biopolymer sequence whose variance is larger than a certain reference;
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence of the question point, and accumulates it in the database; a sequence input reception unit that receives the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit that extracts, from the entire sequence received by the sequence input reception unit, biopolymer sequence candidates to be subjected to prediction; and
- an attribute value estimation unit that, after the sequence input is accepted, generates a rule from all the data sets of the database and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, and updates the contents of the database.
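The loop described above (one hypothesis per data subset, attribute values derived for the same sequences, variance used to pick question points, answers written back to the database) can be sketched roughly as follows. This is only an illustration under stated assumptions: the position-specific averaging in `learn_hypothesis` is a toy stand-in, since the text does not specify the learning algorithm, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def learn_hypothesis(subset):
    """Toy learning unit (assumption): average the attribute values seen for
    each (position, monomer) pair in one data subset.

    subset is a list of (sequence, attribute_value) pairs. The returned
    hypothesis maps a sequence to a derived attribute value.
    """
    observed = {}
    for seq, value in subset:
        for pos, monomer in enumerate(seq):
            observed.setdefault((pos, monomer), []).append(value)
    table = {key: mean(values) for key, values in observed.items()}
    fallback = mean(value for _, value in subset)  # used for unseen pairs

    def hypothesis(seq):
        return mean(table.get((pos, m), fallback) for pos, m in enumerate(seq))

    return hypothesis


def extract_question_points(hypotheses, second_data_set, reference):
    """Question point extraction: keep sequences on which the hypotheses
    disagree, i.e. whose derived attribute values have variance > reference."""
    points = []
    for seq in second_data_set:
        derived = [h(seq) for h in hypotheses]  # one value per data subset
        variance = pvariance(derived)
        if variance > reference:
            points.append((seq, variance))
    return points


def update_database(database, answers):
    """Data control: store received attribute values keyed by their sequence."""
    database.update(answers)
    return database
```

In use, each data subset would be passed to `learn_hypothesis`, the resulting hypotheses to `extract_question_points`, and the attribute values obtained for the returned sequences to `update_database`.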
- the sequence input accepting unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extracting unit extracts a biopolymer sequence candidate for attribute value prediction from the entire sequence.
- The attribute value estimation unit generates a rule from the data set of the updated database and applies this rule to each biopolymer sequence candidate to estimate an attribute value for each biopolymer sequence.
- The learning unit may function as the attribute value estimation unit after the sequence input is received.
- In that case, the learning unit applies the hypothesis generated for each of the plurality of data subsets from the generating unit to derive attribute values for each biopolymer sequence in the arbitrarily created data set, whereas at the time of attribute value prediction it applies a rule generated from the data set contained in the updated database to each biopolymer sequence candidate, so that the attribute value can be calculated as an estimated value.
- The sequence candidate extraction unit may extract biopolymer sequences in extraction units of p monomers, starting from the beginning of the entire sequence received by the sequence input reception unit, and may extract each subsequent biopolymer sequence candidate of p monomers while shifting downstream by q monomers at a time.
- The sequence candidate extraction unit may also exclude, from the extracted biopolymer sequence candidates, biopolymer sequences that satisfy a predetermined condition and therefore do not require prediction, before sending the candidates to the attribute value estimation unit.
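The window extraction described above can be illustrated with a short sketch. The parameters `p` and `q` follow the text; the function name and the `exclude` predicate (standing in for the unspecified "predetermined condition") are assumptions made for illustration.

```python
def extract_candidates(full_sequence, p, q, exclude=None):
    """Slide a window of p monomers over the full sequence in steps of q.

    Returns (start_index, candidate) pairs. Candidates for which the
    hypothetical exclude(candidate) predicate is true are dropped before
    they would be passed on for attribute value estimation.
    """
    candidates = []
    for start in range(0, len(full_sequence) - p + 1, q):
        window = full_sequence[start:start + p]
        if exclude is not None and exclude(window):
            continue  # prediction not required for this candidate
        candidates.append((start, window))
    return candidates
```

For example, `extract_candidates("ABCDEFG", 3, 2)` yields the windows starting at positions 0, 2, and 4. A trailing stretch shorter than p monomers is simply not emitted, which is one possible design choice the text leaves open.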
- The question point extraction unit may extract, as question points, biopolymer sequences within a certain range counted from the largest variance, or may extract, as question points, biopolymer sequences whose variance is larger than a predetermined value.
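One reading of these two selection modes is "take the top few sequences by variance" versus "take every sequence above a threshold". The sketch below supports both; the function and parameter names (`top_k`, `threshold`) are hypothetical, and allowing the two modes to be combined is an assumption not stated in the text.

```python
def select_question_points(variances, top_k=None, threshold=None):
    """Pick question points from a {sequence: variance} mapping.

    top_k: keep only the top_k sequences with the largest variance.
    threshold: keep only sequences whose variance exceeds this value.
    """
    ranked = sorted(variances.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(seq, var) for seq, var in ranked if var > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [seq for seq, _ in ranked]
```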
- A sequence extraction unit may further be provided that extracts, from the biopolymer sequence candidates, those whose attribute value estimated by the attribute value estimation unit satisfies a predetermined condition.
- In this way, biopolymer sequences whose estimated attribute value satisfies the predetermined condition can be extracted as predicted sequences.
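As a minimal sketch of this final filtering step, assuming the estimated attribute values are held in a mapping from sequence to value: the `condition` callable stands in for the unspecified predetermined condition, and sorting the hits by value is an added convenience rather than something the text requires.

```python
def extract_predicted_sequences(estimates, condition):
    """Return (sequence, estimated_value) pairs whose value satisfies the
    hypothetical condition, ordered from the largest estimate downward."""
    hits = [(seq, value) for seq, value in estimates.items() if condition(value)]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```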
- In another configuration, the sequence prediction system includes: a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit that extracts, from the entire sequence received by the sequence input receiving unit, biopolymer sequence candidates to be subjected to prediction; and
- an attribute value estimation unit that, after the sequence input is accepted, generates a rule from all the data sets of the database and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- The sequence input accepting unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extracting unit extracts biopolymer sequence candidates for attribute value prediction from the entire sequence.
- the attribute value estimation unit generates a rule from the data set of the database, applies this rule to each biopolymer sequence candidate, and estimates an attribute value for each biopolymer sequence.
- the sequence prediction database according to the present invention includes attribute values obtained by the sequence prediction system described above and a biopolymer sequence.
- The sequence prediction support system includes: a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a generating unit that generates a plurality of different data subsets from the data set;
- a learning unit that generates a hypothesis for each data subset and applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data set, thereby deriving attribute values of the biopolymer sequences of the second data set;
- a question point extraction unit that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as a question point, a biopolymer sequence whose variance is larger than a certain reference; and
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence of the question point, and stores it in the database.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, updates the contents of the database, and constructs a database that supports sequence prediction.
- A sequence prediction program according to the present invention causes a computer device to function as a sequence prediction system including:
- a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a generating unit that generates a plurality of different data subsets from the data set; a learning unit that generates a hypothesis for each data subset and applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data set, thereby deriving attribute values of the biopolymer sequences of the second data set;
- a question point extraction unit that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as a question point, a biopolymer sequence whose variance is larger than a certain reference;
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence of the question point, and accumulates it in the database; a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer; and
- an attribute value estimation unit that, after the sequence input is accepted, generates a rule from all the data sets of the database and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, and updates the contents of the database.
- The sequence input accepting unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extracting unit extracts biopolymer sequence candidates for attribute value prediction from the entire sequence.
- The attribute value estimation unit generates a rule from the updated database data set, applies the rule to each biopolymer sequence candidate, and estimates the attribute value for each biopolymer sequence.
- In this way, a general-purpose computer device functions as a sequence prediction system.
- Another sequence prediction program according to the present invention causes a computer device to function as a sequence prediction system including:
- a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit that extracts, from the entire sequence received by the sequence input receiving unit, biopolymer sequence candidates to be subjected to prediction; and
- an attribute value estimation unit that, after the sequence input is accepted, generates a rule from all the data sets of the database and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- The sequence input receiving unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extraction unit extracts biopolymer sequence candidates for attribute value prediction from the entire sequence.
- The attribute value estimation unit generates a rule from the data set of the database, applies this rule to each biopolymer sequence candidate, and estimates an attribute value for each biopolymer sequence.
- In this way, a general-purpose computer device functions as a sequence prediction system.
- A sequence prediction support program causes a computer device to function as a sequence prediction support system including:
- a database holding biopolymer attributes, each comprising a biopolymer sequence and an attribute value possessed by the biopolymer of that sequence;
- a generating unit that generates a plurality of different data subsets from the data set; a learning unit that generates a hypothesis for each data subset and applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data set, thereby deriving attribute values of the biopolymer sequences of the second data set;
- a question point extraction unit that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as a question point, a biopolymer sequence whose variance is larger than a certain reference; and
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence of the question point, and accumulates it in the database.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- The learning unit analyzes each data subset independently to generate a certain hypothesis, applies the hypothesis to the biopolymer sequences of the second data set, and derives their attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, updates the contents of the database, and constructs a database that supports sequence prediction.
- In this way, a general-purpose computer device functions as a sequence prediction support system.
- The sequence prediction method includes: a data supply stage in which N data sets are selected from a database having biopolymer sequences and the attribute values possessed by the biopolymers of those sequences, and a plurality of different data subsets are generated from the data sets;
- a stage in which a hypothesis is generated for each data subset and each hypothesis is applied to a second data set consisting of biopolymer sequences independent of the data set, to derive attribute values for the second data set;
- a question point extraction stage in which biopolymer sequences having a variance larger than a certain standard among the calculated variances are extracted as question points;
- a data update stage in which an attribute value for the question point is received, the received attribute value is associated with the biopolymer sequence of the question point, and the result is stored in the database; a sequence candidate extraction stage in which the entire sequence of a predetermined biopolymer is received and biopolymer sequence candidates to be subjected to prediction are extracted from the received entire sequence;
- An attribute value estimation stage for estimating the attribute value of
- The sequence prediction support method includes: a data supply stage in which N data sets are selected from a database having biopolymer sequences and the attribute values possessed by the biopolymers of those sequences, and a plurality of different data subsets are generated from the data sets;
- a stage in which a hypothesis is generated for each data subset and each hypothesis is applied to a second data set consisting of biopolymer sequences independent of the data set, to derive attribute values for the second data set;
- a question point extraction stage in which biopolymer sequences having a variance larger than a certain standard among the calculated variances are extracted as question points;
- The sequence prediction system, sequence prediction support system, sequence prediction program, sequence prediction support program, and sequence prediction method include the following modes.
- One aspect of the sequence prediction system includes: a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequences; a plurality of learning units that, based on a second predetermined number of data, derive from the peptide sequences and physical properties a hypothesis for a third predetermined number of peptide sequences;
- a random resampling unit that takes data out of the database and randomly supplies the second predetermined number of data to each learning unit; a focused sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units; a focused physical property extraction unit that extracts, from the hypothesis of each learning unit, the physical property specified by the set peptide sequence; a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units; a question point extraction unit that, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical property is to be requested; a data updating unit that receives the requested true data and associates the physical property based on the true data with the extracted peptide sequence; a data control unit that accumulates in the database new data including the peptide sequence obtained by the data updating unit and the physical property based on the true data; and a sequence input accepting unit that accepts the entire amino acid sequence of a predetermined protein, peptide sequence candidates to be predicted being extracted from the entire amino acid sequence accepted by the sequence input accepting unit
- A fourth predetermined number of data taken from the database is randomly resampled by the random resampling unit into the second predetermined number of data, which is smaller than the fourth predetermined number, and sent to each learning unit.
- different data is supplied for each learning unit.
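The resampling step might look like the sketch below. Several details are assumptions rather than statements from the text: sampling is done without replacement, "different data for each learning unit" is enforced by rejecting a draw that repeats an earlier one, and the names `pool`, `sample_size`, and `max_tries` are hypothetical.

```python
import random


def resample_for_learners(pool, n_learners, sample_size, seed=None, max_tries=1000):
    """Draw one random subset of sample_size records per learning unit.

    pool holds the records taken from the database (the "fourth predetermined
    number"); sample_size is the smaller "second predetermined number".
    Records are assumed to be hashable and distinct.
    """
    if not 0 < sample_size < len(pool):
        raise ValueError("sample_size must be positive and smaller than the pool")
    rng = random.Random(seed)
    subsets, seen = [], set()
    for _ in range(max_tries):
        if len(subsets) == n_learners:
            break
        draw = rng.sample(pool, sample_size)
        key = frozenset(draw)
        if key not in seen:  # keep every learning unit's data different
            seen.add(key)
            subsets.append(draw)
    if len(subsets) < n_learners:
        raise RuntimeError("could not draw enough distinct subsets")
    return subsets
```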
- Each learning unit analyzes the supplied data, that is, peptide sequences consisting of a predetermined number of amino acids and their predetermined physical properties, and derives from them a certain hypothesis, namely a data set giving the predetermined physical property for each of the third predetermined number of peptide sequences.
- the focused sequence setting unit sets a predetermined peptide sequence for comparing hypotheses derived by each learning unit, and the focused physical property extraction unit sets the physical property specified by the set predetermined peptide sequence.
- The variance evaluation unit evaluates the variance of the physical properties extracted from the learning units, and, based on this evaluated variance, the question point extraction unit extracts the peptide sequences for which true data on the hypothesized physical properties is to be requested, so that the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit.
- the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data.
- the sequence input accepting unit accepts the entire amino acid sequence of a given protein, extracts peptide sequence candidates to be predicted from the entire amino acid sequence, and sends the peptide sequence candidates to the learning unit.
- the physical property estimation unit estimates the physical properties of the extracted peptide sequence candidates from the results obtained in each learning unit.
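The text does not say how the results of the individual learning units are combined into one estimate. A simple possibility, shown here purely as an assumption, is to average the values that each learning unit's hypothesis assigns to a candidate; the function name is hypothetical.

```python
from statistics import mean


def estimate_properties(hypotheses, candidates):
    """Estimate a physical property for each peptide sequence candidate as the
    mean of the values predicted by every learning unit's hypothesis.

    Averaging is an illustrative choice; other combination rules are possible.
    """
    return {seq: mean(h(seq) for h in hypotheses) for seq in candidates}
```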
- The sequence candidate extraction unit may extract peptide sequences in extraction units of a fifth predetermined number of amino acids, starting from the beginning of the entire amino acid sequence received by the sequence input reception unit, and may extract each subsequent peptide sequence candidate while shifting downstream by a sixth predetermined number of amino acids. Furthermore, among the extracted sequence candidates, peptide sequences that satisfy a predetermined condition and do not require prediction can be excluded before the candidates are sent to the learning unit.
- In this way, peptide sequence candidates are extracted from the entire amino acid sequence of the received protein, and unnecessary peptide sequences are removed from the extracted candidates before their physical properties are predicted, which avoids unnecessary estimation calculations.
- The question point extraction unit may extract, as question points, a seventh predetermined number of peptide sequences counted from the largest variance, or may extract, as question points, peptide sequences whose variance is larger than a predetermined value.
- The hypothesis correction unit may include a data request unit that requests true data on the physical property of the peptide sequence extracted by the question point extraction unit,
- a data receiving unit that receives the requested true data, and a data adding unit that sends the received true data to the data control unit in association with the extracted peptide sequence.
- The data request unit, for example, requests an external experiment or obtains information from an external database for the peptide sequence that is the question point.
- The data receiving unit accepts the data corresponding to the true data, and the data adding unit sends the received true data to the data control unit, in association with the peptide sequence for which the data was requested, so that it is added to the database.
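The request, receive, and add steps can be sketched as a single pass over the question points. Here `request_true_data` is a hypothetical callback standing in for an external experiment or an external database lookup, and the database is modeled as a plain dictionary; neither detail comes from the text.

```python
def correct_hypotheses(database, question_points, request_true_data):
    """For each question point, request the true physical property and, if a
    value is returned, store it in the database keyed by the peptide sequence.

    Returns the sequences for which true data was actually added.
    """
    added = []
    for seq in question_points:
        true_value = request_true_data(seq)  # data request + data receipt
        if true_value is None:
            continue  # no true data available yet for this sequence
        database[seq] = true_value  # data addition via the data control step
        added.append(seq)
    return added
```

After such a pass the learning units would be retrained on the enlarged database, which is how the newly measured values feed back into the hypotheses.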
- A sequence extraction unit may further be provided that extracts, from the peptide sequence candidates, those whose physical property estimated by the physical property estimation unit satisfies a predetermined condition.
- In this way, peptide sequence candidates estimated to have the predetermined physical property can be extracted as peptide sequences of the predetermined protein that have that property.
- A further feature is that the base sequence of a nucleic acid encoding this peptide sequence is predicted.
- One aspect of the sequence prediction support system includes: a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequences; a plurality of learning units that, based on a second predetermined number of data, derive from the peptide sequences and physical properties a hypothesis for a third predetermined number of peptide sequences; a random resampling unit that takes data out of the database and randomly supplies the second predetermined number of data to each learning unit; a focused sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
- a focused physical property extraction unit that extracts, from the hypothesis of each learning unit, the physical property specified by the set peptide sequence; a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units; a question point extraction unit that, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical property is to be requested;
- a data update unit that receives the requested true data and associates the physical property based on the true data with the extracted peptide sequence; and
- a data control unit that accumulates in the database new data including the peptide sequence obtained by the data update unit and the physical property based on the true data.
- A fourth predetermined number of data taken from the database is randomly resampled by the random resampling unit into the second predetermined number of data, which is smaller than the fourth predetermined number, and sent to each learning unit.
- different data is supplied for each learning unit.
- Each learning unit analyzes the supplied data, that is, peptide sequences consisting of a predetermined number of amino acids and their predetermined physical properties, and derives from them a certain hypothesis, namely a data set giving the predetermined physical property for each of the third predetermined number of peptide sequences.
- the focused sequence setting unit sets a predetermined peptide sequence for comparing hypotheses derived by each learning unit, and the focused physical property extraction unit sets the physical property specified by the set predetermined peptide sequence.
- The variance evaluation unit evaluates the variance of the physical properties extracted from the learning units, and, based on the evaluated variance, the question point extraction unit extracts the peptide sequences for which true data on the hypothesized physical properties is to be requested, so that the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit. Furthermore, the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data, thereby constructing a database that supports sequence prediction.
- In one aspect of the sequence prediction program, a computer device is caused to function as a sequence prediction system including: a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequences; a plurality of learning units that, based on a second predetermined number of data, derive from the peptide sequences and physical properties a hypothesis for a third predetermined number of peptide sequences;
- a random resampling unit that takes data out of the database and randomly supplies the second predetermined number of data to each learning unit; a focused sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
- a focused physical property extraction unit that extracts, from the hypothesis of each learning unit, the physical property specified by the set peptide sequence;
- a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units; a question point extraction unit that, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical property is to be requested;
- a data update unit that receives the requested true data and associates the physical property based on the true data with the extracted peptide sequence;
- a data control unit that stores in the database new data including the peptide sequence obtained by the data update unit and the physical property based on the true data; a sequence input reception unit that accepts the entire amino acid sequence of a predetermined protein;
- a sequence candidate extraction unit that extracts, from the entire amino acid sequence received by the sequence input reception unit, peptide sequence candidates to be predicted and sends the peptide sequence candidates to the learning units; and
- a physical property estimation unit that estimates the physical properties of the extracted peptide sequence candidates from the results obtained by each learning unit.
- The random resampling unit extracts a fourth predetermined number of data items from the database, randomly resamples from them a second predetermined number of data items (smaller than the fourth predetermined number), and sends these to each learning unit.
- As a result, different data are supplied to each learning unit.
- Each learning unit analyzes the supplied data and derives a hypothesis, that is, a data set in which a predetermined physical property is assigned to each of a third predetermined number of peptide sequences, based on peptide sequences consisting of a predetermined number of amino acids and their physical properties.
- The focused sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by the learning units, and the focused physical property extraction unit extracts, from each hypothesis, the physical property specified for the set peptide sequence.
- The variance evaluation unit evaluates the variance of the physical properties extracted from the learning units, and the question point extraction unit, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested. In this way the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit.
- The data control unit updates the contents of the database by adding data including the peptide sequence and the physical properties based on the true data.
- the sequence input accepting unit accepts the entire amino acid sequence of a given protein, extracts peptide sequence candidates to be predicted from the entire amino acid sequence, and sends the peptide sequence candidates to the learning unit.
- the physical property estimation unit estimates the physical properties of the extracted peptide sequence candidates from the results obtained in each learning unit.
- In this way, the general-purpose computer device functions as a sequence prediction system.
- The computer device is provided with:
  - a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequences;
  - a plurality of learning units, each of which derives a hypothesis from the peptide sequences and physical properties in a second predetermined number of the data;
  - a random resampling unit that extracts a predetermined number of data items from the database and randomly supplies a second predetermined number of data items to each learning unit;
  - a focused sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
  - a focused physical property extraction unit that extracts, from the hypothesis of each learning unit, the physical property specified for the set peptide sequence;
  - a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units;
  - a question point extraction unit that, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested; and
  - a data update unit that accepts the requested true data.
- The random resampling unit extracts a fourth predetermined number of data items from the database, randomly resamples from them a second predetermined number of data items (smaller than the fourth predetermined number), and sends these to each learning unit.
- As a result, different data are supplied to each learning unit.
- Each learning unit analyzes the supplied data and derives a hypothesis, that is, a data set in which a predetermined physical property is assigned to each of a third predetermined number of peptide sequences, based on peptide sequences consisting of a predetermined number of amino acids and their physical properties.
- The focused sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by the learning units, and the focused physical property extraction unit extracts, from each hypothesis, the physical property specified for the set peptide sequence.
- The variance evaluation unit evaluates the variance of the physical properties extracted from the learning units, and the question point extraction unit, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested. In this way the hypotheses are compared.
- The data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends the result to the data control unit. The data control unit then updates the contents of the database by adding data including the peptide sequence and the physical properties based on the true data, thereby constructing a database that supports sequence prediction.
- In this way, the general-purpose computer device functions as a sequence prediction support system.
- The sequence prediction system includes:
  - a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties serving as an index of a predetermined physiological activity of the peptide sequences;
  - a plurality of hypothesis deriving units, each of which derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in a second predetermined number of data items, these being randomly supplied from among a fourth predetermined number of data items randomly extracted from the database;
  - a query point sequence extraction unit that sets a predetermined peptide sequence included in the hypotheses derived by the hypothesis deriving units, extracts from each hypothesis the physical property specified for the set peptide sequence, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested;
  - a data update unit that receives the requested true data and associates the physical properties based on the true data with the extracted peptide sequences;
  - a data control unit that accumulates in the database new data including the peptide sequences obtained by the data update unit and the physical properties based on the true data; and
  - a physical property estimation output unit that receives the entire amino acid sequence of a given protein, extracts from it the peptide sequence candidates to be predicted, sends the extracted candidates to the hypothesis deriving units, and estimates the physical properties of the candidates from the output results.
- a sequence extraction unit may be further provided that extracts peptide sequence candidates having physical properties satisfying a predetermined condition among the physical properties of each peptide sequence candidate estimated by the physical property estimation output unit.
- The sequence prediction support system includes:
  - a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties serving as an index of a predetermined physiological activity of the peptide sequences;
  - a plurality of hypothesis deriving units, each of which derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in a second predetermined number of data items, these being randomly supplied from among a fourth predetermined number of data items randomly extracted from the database;
  - a query point sequence extraction unit that sets a predetermined peptide sequence included in the hypotheses derived by the hypothesis deriving units, extracts from each hypothesis the physical property specified for the set peptide sequence, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested;
  - a data update unit that accepts the requested true data and associates the physical properties based on the true data with the extracted peptide sequences; and
  - a data control unit that stores in the database new data including the peptide sequences obtained by the data update unit and the physical properties based on the true data.
- A computer device is made to function as a sequence prediction system including:
  - a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequences;
  - a plurality of hypothesis deriving units, each of which derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in data drawn from a fourth predetermined number of data items randomly extracted from the database;
  - a query point sequence extraction unit that sets a predetermined peptide sequence included in the hypotheses derived by the hypothesis deriving units, extracts from each hypothesis the physical property specified for the set peptide sequence, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested;
  - a data update unit that receives the requested true data and associates the physical properties based on the true data with the extracted peptide sequences;
  - a data control unit that accumulates in the database new data including the peptide sequences obtained by the data update unit and the physical properties based on the true data; and
  - a physical property estimation output unit that receives the entire amino acid sequence of a given protein, extracts from it the peptide sequence candidates to be predicted, sends the extracted candidates to the hypothesis deriving units, and estimates the physical properties of the candidates from the output results.
- The computer device is made to function as a sequence prediction support system including:
  - a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequences;
  - a plurality of hypothesis deriving units, each of which derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in a second predetermined number of data items, these being randomly supplied from among a fourth predetermined number of data items randomly extracted from the database;
  - a question point sequence extraction unit that sets a predetermined peptide sequence included in the hypotheses derived by the hypothesis deriving units, extracts from each hypothesis the physical property specified for that peptide sequence, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical properties are to be requested;
  - a data update unit that accepts the requested true data and associates the physical properties based on the true data with the extracted peptide sequences; and
  - a data control unit that stores in the database new data including the peptide sequences obtained by the data update unit and the physical properties based on the true data.
- The sequence prediction method includes:
  - a random resampling stage in which the random resampling unit extracts a fourth predetermined number of data items from a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequences, and randomly supplies a second predetermined number of data items from among them to each of a plurality of learning units;
  - a hypothesis derivation stage in which each learning unit derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in the second predetermined number of data items;
  - a target sequence setting stage of setting a predetermined peptide sequence included in the hypotheses derived by the learning units;
  - a target physical property extraction stage of extracting the physical properties specified for the set peptide sequence;
  - a variance evaluation stage of evaluating the variance of the physical properties extracted from the learning units;
  - a question point extraction stage of extracting, based on the evaluated variance, the peptide sequences for which true data on the hypothesized physical properties are to be requested;
  - a data update stage of accumulating in the database new additional data including the physical properties based on the true data;
  - a sequence candidate extraction stage of accepting the entire amino acid sequence of a given protein, extracting from it the peptide sequence candidates to be predicted, and sending the extracted candidates to the learning units; and
  - a physical property estimation stage of estimating the physical properties of the extracted peptide sequence candidates from the results obtained in the learning units.
- A sequence prediction support method is also included in the embodiments of the present invention. This sequence prediction support method includes:
  - a random resampling stage in which a random resampling unit extracts a fourth predetermined number of data items from a database storing data including peptide sequences consisting of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequences, and randomly supplies a second predetermined number of data items from among them to each of a plurality of learning units;
  - a hypothesis derivation stage in which each learning unit derives a hypothesis for a third predetermined number of peptide sequences from the peptide sequences and physical properties in the second predetermined number of data items;
  - a target sequence setting stage of setting a predetermined peptide sequence included in the hypotheses derived by the learning units;
  - a physical property extraction stage of extracting, from the hypothesis of each learning unit, the physical properties specified for the set peptide sequence;
  - a variance evaluation stage of evaluating the variance of the physical properties extracted from the learning units;
  - a question point extraction stage of extracting, based on the evaluated variance, the peptide sequences for which true data on the hypothesized physical properties are to be requested; and
  - a data update stage in which the physical properties based on the true data are associated with the extracted peptide sequences, and new additional data including the peptide sequences and the physical properties based on the true data are stored in the database.
- FIG. 1 is a block diagram showing an overview of a sequence prediction system according to a first embodiment of the present invention.
- FIG. 2 is a diagram showing an example of a data set stored in a storage device.
- FIG. 3 is a diagram showing an example of the existence probability of each amino acid at each aligned position of virtual peptide sequences tabulated based on probability parameters calculated by a learning unit.
- FIG. 4 is a diagram illustrating an example of a hypothesis output by a learning unit.
- FIG. 5 is a diagram schematically showing an example of data for question point extraction.
- FIG. 6 shows an example in which the sequence candidate extraction unit is configured to exclude unnecessary peptide sequences.
- FIG. 7 is a block diagram showing an overview of a sequence prediction system according to a second embodiment of the present invention.
- FIG. 8 is a functional block diagram illustrating the function of the hypothesis comparison unit in FIG.
- FIG. 9 is a diagram showing a case where the request for true data is made to an external database rather than to a user.
- FIG. 10 is a flowchart explaining the operation of the sequence prediction support method according to the first embodiment.
- FIG. 11 is a flowchart showing the operation of a sequence prediction system using a database constructed by a sequence prediction support system or an existing database.
- FIG. 12 is a flowchart explaining the operation of the sequence prediction support method according to the second embodiment.
- FIG. 13 is a flowchart showing the operation of the sequence prediction system using the database constructed by the sequence prediction support system according to the second embodiment.
- FIG. 1 is a block diagram showing an overview of the sequence prediction system according to the first embodiment of the present invention.
- This sequence prediction system includes:
  - a storage device 126, which is a database holding biopolymer attributes, each consisting of a biopolymer sequence and the attribute value of the biopolymer with that sequence;
  - a data control unit 128 serving as a selection unit that selects N data sets from the storage device 126;
  - a generation unit 102 that generates a plurality of mutually different data subsets from the data sets;
  - a learning unit 104 that derives a hypothesis for each data subset, applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data sets, and derives the attribute values of the biopolymer sequences in the second data set; and
  - a question point extraction unit 118 that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as question points, biopolymer sequences whose variance exceeds a certain standard.
- When the attribute value for a question point is received, the data control unit 128 associates the received attribute value with the biopolymer sequence of that question point.
- the storage device 126 is a database that accumulates a data set including peptide sequences as biopolymer sequences and attribute values of the peptide sequences.
- This data set is composed of known data (referred to as “known data”) that has been clarified by documents or the like, or data sent from the data receiving unit 122 through the data control unit 128 described later.
- FIG. 2 is a diagram showing an example of a data set stored in the storage device 126.
- As shown in FIG. 2, this data set consists of a peptide sequence consisting of a predetermined number of amino acids and an attribute value of this peptide sequence, for example a physical property indicating a predetermined physiological activity, such as the binding constant (-logKd) to a human leukocyte antigen (HLA) complex, an antigen-presenting molecule closely involved in immune induction.
- The number of amino acids in the peptide sequence can be a fixed value of 8 to 11, for example 9, when HLA class I molecules are targeted, and a fixed value of 20 or less when HLA class II molecules are targeted.
- examples of biological macromolecules having a predetermined physiological activity include DNAs and RNAs composed of a predetermined number of nucleotides and having a predetermined base sequence.
- the attribute value of the biopolymer sequence includes a physical property that serves as an index of the binding ability to a predetermined substance.
- Besides the binding constant to the binding target, this physical property may be, for example, hydrophobicity (or hydrophilicity).
- The data control unit 128 functions as a selection unit that selects N data sets, and the selected N data sets are sent to the generation unit 102. The data control unit 128 also updates the data content of the storage device 126 by sending to it the additional data set received from the data reception unit 122, as described later. In addition, when the entire sequence of a predetermined biopolymer is input from the sequence input receiving unit 130 described later, the data control unit 128 takes out all the data sets stored in the storage device 126 and sends them to the learning unit 104, which then acts as an attribute value estimation unit.
- The generation unit 102 randomly samples from the N data sets sent from the data control unit 128 to generate data subsets, each containing an arbitrary number m (N > m) of data items.
- Each data subset is sent to the learning unit 104.
- each data subset may be the same number of data sets or may be a different number of data sets.
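- The random subset generation described above can be sketched as follows. This is a minimal illustration, not the generation unit 102 itself: the function name `make_subsets`, the fixed seed, the example records, and the choice to sample without replacement within each subset are all assumptions made for the example.

```python
import random

def make_subsets(dataset, m, n_subsets, seed=0):
    """Draw n_subsets random subsets of m records each from N records (m < N).

    Sampling is without replacement inside a subset (an assumption);
    different subsets may still share records.
    """
    if not 0 < m < len(dataset):
        raise ValueError("m must satisfy 0 < m < N")
    rng = random.Random(seed)
    return [rng.sample(dataset, m) for _ in range(n_subsets)]

# Hypothetical (peptide sequence, -logKd) records; the values are illustrative only.
dataset = [
    ("ACDEFGHIK", 6.1),
    ("LMNPQRSTV", 7.3),
    ("WYACDEFGH", 5.2),
    ("IKLMNPQRS", 8.0),
    ("TVWYACDEF", 6.7),
]
subsets = make_subsets(dataset, m=3, n_subsets=4)
for i, subset in enumerate(subsets):
    print(i, [peptide for peptide, _ in subset])
```

Each subset would then be handed to one hypothesis-deriving learner, so that the learners see different data and their predictions can disagree.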
- When a data subset is sent from the generation unit 102, the learning unit 104 generates a hypothesis, described later, for each data subset; when a data set is sent from the data control unit 128, it generates a rule for estimating the attribute values, such as the binding constants of FIG. 2, of candidate peptide sequences.
- The learning unit 104 may include a plurality of calculation units, each configured to process the data subsets in parallel, or may include a single calculation unit configured to process the data subsets serially.
- arithmetic processing is performed according to the procedure of the hidden Markov model learning system described in Japanese Patent No. 3094860, for example.
- For a hypothesis concerning peptide sequences consisting of a predetermined number of amino acids, for example nine, the probability parameters stored in the parameter storage device 140 consist of the existence probability of each amino acid at each alignment position and the transition probabilities before and after each alignment position.
- From the existence probability of each amino acid at each alignment position and the transition probabilities before and after each alignment position, the existence probability of each amino acid at each alignment position of a virtual peptide sequence, as shown for example in Fig. 3, is calculated as a hypothesis.
- In Fig. 3, the top row shows that the first or ninth amino acid is methionine (M) with a probability of 29%, isoleucine (I) with a probability of 16%, and valine (V) with a probability of 12%. The remaining 43% is the total probability of the other amino acids.
- The lower part of Fig. 3 shows the alignment positions of 8 amino acids from left to right. According to this, the probability that the leftmost threonine (T) is first is 1%, and the probability that it is second is 22%. The existence probabilities are shown in this way toward the right, and the top one to three amino acids are shown above each alignment position. The parameter storage device 140 is configured to store each probability parameter used for tabulating hypotheses composed of such parameters.
- LKa L-(L — LKa,)
- L is the peptide sequence O in a given HMM (Hidden Markov Model)
- LKa ′ represents the average value of logKa of all peptides used in the calculation.
- H ′ represents a reference HMM when the existence probability is uniform.
- The learning unit 104 applies each hypothesis to a second data set composed of biopolymer sequences independent of the data set taken out by the data control unit 128, derives the attribute values of the biopolymer sequences in this second data set, and sends them to the question point extraction unit 118.
- This second data set contains, for example, 100,000 peptide sequences, and the hypothesis from each of the data subsets is applied to it.
- As a result, as many copies of the second data set, each carrying an attribute value for every sequence, are generated as there are data subsets.
- The peptide sequences of the second data set may be a variable set that is set each time a data subset is sent from the generation unit 102, or may be a set arbitrarily input or selected by a person using this system. They may also be included in a predetermined data table.
- When a data set is sent from the data control unit 128, the learning unit 104 acts as an attribute value estimation unit. That is, a rule is generated from the probability parameters obtained by the same operation as described above. Unlike hypothesis generation, one set of rules is generated. For each candidate peptide sequence sent from the sequence candidate extraction unit 131 described later, an estimated value is obtained by applying the rule, and this estimated value is associated with the corresponding candidate peptide sequence as its attribute value and sent to the peptide database 138.
- a calculation process is performed to obtain the variance of attribute values for each peptide sequence in the second data set.
- FIG. 4 shows an example of the result of this calculation process.
- ori indicates a binding constant as a temporary score of an attribute value that is a starting point of calculation in the learning unit 104.
- 0.0000 is assigned as an initial value for all peptide sequences.
- The mean column gives the average of the prediction scores derived for each specific peptide sequence in the second data set, the max column gives the maximum of those prediction scores, and the min column gives their minimum.
- The sd column gives the standard deviation of the prediction scores.
- The var column gives the variance of the prediction scores.
- The question point extraction unit 118 takes out the peptide sequences in descending order of variance.
- Figure 5 schematically shows the ranking in the dataset.
- Peptide sequences, as biopolymer sequences, in a certain range, for example the top 50 in descending order of variance, are extracted as question points, and the extracted peptide sequences are sent to the data request unit 120. Alternatively, peptide sequences whose variance is greater than a predetermined value may be extracted as question points.
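- The per-sequence statistics (mean, max, min, sd, var) and the variance-based selection of question points can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, the example scores, and the use of the population variance (rather than the sample variance) are assumptions, since the text does not specify them.

```python
from statistics import mean, pvariance

def score_statistics(scores_by_peptide):
    """Summarise, per peptide, the scores predicted by the different hypotheses."""
    stats = {}
    for peptide, scores in scores_by_peptide.items():
        var = pvariance(scores)  # population variance (assumption)
        stats[peptide] = {
            "mean": mean(scores),
            "max": max(scores),
            "min": min(scores),
            "sd": var ** 0.5,
            "var": var,
        }
    return stats

def extract_question_points(stats, top_k=None, threshold=None):
    """Return peptides in descending order of variance.

    top_k keeps only the k most disputed peptides; threshold keeps only
    peptides whose variance exceeds the given value. Either may be omitted.
    """
    ranked = sorted(stats, key=lambda p: stats[p]["var"], reverse=True)
    if threshold is not None:
        ranked = [p for p in ranked if stats[p]["var"] > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked

# Hypothetical prediction scores from three hypotheses for three peptides.
predictions = {
    "AAAAAAAAA": [6.0, 6.1, 5.9],
    "CCCCCCCCC": [4.0, 8.0, 6.0],
    "DDDDDDDDD": [7.0, 7.0, 7.0],
}
stats = score_statistics(predictions)
question_points = extract_question_points(stats, top_k=1)
print(question_points)
```

The peptides on which the hypotheses disagree most are the ones whose true attribute values are requested, because a measurement there is expected to be the most informative.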
- For the peptide sequences of the question points extracted by the question point extraction unit 118, the data request unit 120 requests data indicating the true attribute values, for example measurement data obtained by experiment or data such as literature stored in an external database.
- The data reception unit 122 accepts, as data indicating true attribute values, measurement data input by the user in response to a request from the data request unit 120, or data such as literature obtained from a predetermined database as described later.
- The data control unit 128 associates the data sent from the data receiving unit 122 with the peptide sequence obtained as the question point, generates an additional data set including the peptide sequence and the attribute value given by those data, and sends it to the storage device 126. As described above, this additional data set is accumulated in the storage device 126 and becomes a candidate for data in subsequent hypothesis derivation.
- The sequence input receiving unit 130 receives, as information for specifying the candidate peptide sequences to be predicted, the entire amino acid sequence of a predetermined protein, for example a target protein whose epitopes are to be identified, such as a viral antigen, and sends the received data to the sequence candidate extraction unit 131.
- This input may be performed through a user interface by a predetermined input device, or may be performed through a network connected to the user interface.
- Target proteins other than viral antigens include proteins of bacteria and other pathogens involved in infectious diseases, such as Mycobacterium tuberculosis, O-157, Salmonella, Pseudomonas aeruginosa, Helicobacter pylori, Staphylococcus aureus, and malaria.
- The system can also be applied to proteins involved in type I diabetes, Sjögren's syndrome, allergic diseases such as hay fever, atopy, and asthma, rheumatism, collagen disease, autoimmune diseases, suppression of organ transplant rejection, and cancer immunity, for example cancer antigens, as well as to proteins related to Alzheimer's disease, such as its causative protein beta amyloid.
- In the sequence candidate extraction unit 131, peptide sequence candidates to be predicted are extracted from the entire amino acid sequence of the predetermined protein received by the sequence input receiving unit 130, and the extracted peptide sequence candidates are sent to the learning unit 104.
- The peptide sequences extracted by the sequence candidate extraction unit 131 may include sequences that cannot actually be used. Such unnecessary peptide sequences may be eliminated automatically, without human intervention.
- FIG. 6 shows an example in which the sequence candidate extraction unit 131 is configured to eliminate unnecessary peptide sequences.
- In this example, the sequence candidate extraction unit 131 includes a candidate extraction unit 150, which extracts peptide sequence candidates from the entire amino acid sequence of the predetermined protein sent from the sequence input receiving unit 130 using a peptide extraction unit of P monomer units, for example 8 to 11 and in particular 9 amino acids, and an unnecessary sequence exclusion unit 152, which excludes from the extracted peptide sequence candidates those peptide sequences that satisfy a predetermined condition and need no prediction.
- In the candidate extraction unit 150, a peptide sequence of one peptide extraction unit is extracted from the beginning of the entire amino acid sequence received by the sequence input receiving unit 130, and subsequent peptide sequence candidates are extracted one peptide extraction unit at a time while shifting downstream by q monomer units, for example by one amino acid.
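- The window-by-window extraction performed by the candidate extraction unit 150 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `extract_candidates`, the parameter names `p` (window length) and `q` (shift), and the example protein string are hypothetical.

```python
def extract_candidates(protein, p=9, q=1):
    """Slide a window of p residues along the protein, shifting q residues each step."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    # The last window starts at len(protein) - p; shorter proteins yield no candidates.
    return [protein[i:i + p] for i in range(0, len(protein) - p + 1, q)]

# Hypothetical 11-residue sequence, used only to show the window positions.
candidates = extract_candidates("ABCDEFGHIJK", p=9, q=1)
print(candidates)  # ['ABCDEFGHI', 'BCDEFGHIJ', 'CDEFGHIJK']
```

With `q=1`, a protein of length n yields n - p + 1 overlapping candidates, so every possible peptide of the chosen length is considered.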
- The unnecessary sequence exclusion unit 152 refers to an unnecessary sequence database 154, which accumulates data on peptide sequences that satisfy a predetermined condition and do not need to be predicted, that is, data on unnecessary peptide sequences. Peptide sequences identified by this reference are treated as unnecessary and are excluded from the prediction candidates before they are sent to the learning unit 104; the remaining peptide sequence candidates are sent to the learning unit 104.
- Unnecessary peptide sequences include, for example, peptide sequences with low water solubility.
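The exclusion step can be sketched as a simple filter. Everything specific here is an assumption made for illustration: the unnecessary sequence database 154 is modeled as a plain set of sequences, and low water solubility is approximated by a crude hydrophobic-residue fraction with a hypothetical cutoff; the source does not state how solubility is judged.

```python
# Hypothetical set of residues treated as hydrophobic for this sketch only.
HYDROPHOBIC = set("AVILMFWC")


def exclude_unnecessary(candidates, unnecessary_db, max_hydrophobic_fraction=0.8):
    """Drop candidates listed in the exclusion set or judged poorly soluble."""
    kept = []
    for peptide in candidates:
        if not peptide or peptide in unnecessary_db:
            continue  # empty, or explicitly registered as unnecessary
        fraction = sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)
        if fraction > max_hydrophobic_fraction:
            continue  # assumed stand-in for "low water solubility"
        kept.append(peptide)
    return kept


print(exclude_unnecessary(["AAAAAAAAA", "ACDEFGHIK", "KKDEEGHST"], {"KKDEEGHST"}))
```

Only the candidates that pass both checks would be forwarded to the learning unit.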
- For example, when the epitope to be identified is a CTL epitope of hepatitis C virus, the sequence input reception unit 130 receives the entire amino acid sequence of an antigen protein of hepatitis C virus, and peptide sequence candidates that may act as epitopes are extracted from it.
- The antigen of hepatitis C virus is presented, as a peptide of 8 to 11 amino acids, on human leukocyte antigen (HLA) class I molecules, which are the specific proteins that induce immunity, and it is known that CTLs recognize this portion and impair hepatitis C virus.
- In this way, candidate peptide sequences are extracted from the entire amino acid sequence of the received protein, and unnecessary peptide sequences are excluded from the extracted candidates before physical properties are predicted. This spares the learning unit 104 unnecessary estimation operations.
- The unnecessary sequence database 154 may be a part of the storage device 126. In that case, data on physical properties such as hydrophobicity are added to part of the data shown in FIG. 2.
- If the data accumulated in the unnecessary sequence database 154 include information on peptide sequences that require licenses from other companies, and the system is configured to exclude such peptide sequences, the present embodiment can be used to extract the peptide sequence candidates needed, for example, for the development of new drugs.
- Data sets are accumulated, each combining an attribute value estimated by the learning unit 104, for example a binding constant to an HLA class I molecule, with the peptide sequence having that attribute value.
- The condition input reception unit 134 receives an input of an attribute value, for example a binding constant, that serves as a keyword for extracting peptide sequences having a predetermined physical property from the peptide database 138. As with the sequence input reception unit 130, this input may be made through a user interface using a predetermined input device, or through a network connected to the user interface.
- The condition (attribute value) to be input depends on the intended use of the peptide sequence to be extracted. For example, a binding constant higher than 6 for an HLA class I molecule, which is the predetermined protein, is accepted as the keyword.
- The sequence extraction unit 136 extracts from the peptide database 138 the peptide sequences that satisfy the condition received by the condition input reception unit 134, and outputs them as the prediction result.
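The condition-based extraction can be sketched as a threshold query. In this hedged illustration the peptide database 138 is modeled as a dictionary from sequence to estimated binding constant, and `extract_by_condition` is a hypothetical name; sorting the hits by value is a design choice made here, not something the source specifies.

```python
def extract_by_condition(peptide_db: dict[str, float], min_binding: float = 6.0) -> list[str]:
    """Return the sequences whose estimated binding constant exceeds
    `min_binding`, strongest binder first."""
    hits = [(seq, value) for seq, value in peptide_db.items() if value > min_binding]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in hits]


toy_db = {"ACDEFGHIK": 6.5, "CDEFGHIKL": 5.9, "DEFGHIKLM": 7.2, "EFGHIKLMN": 6.0}
print(extract_by_condition(toy_db))  # the threshold itself (6.0) is not included
```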
- The learning unit 104 receives an input to that effect, for example a peptide sequence whose binding constant has been estimated together with the number of substitutions, which indicates how many amino acids are to be substituted in that peptide sequence.
- The estimation-stage calculation can then be performed, and the attribute value of the new peptide sequence can be estimated from the calculation result.
- FIG. 7 is a block diagram showing an overview of the sequence prediction system according to the second embodiment of the present invention.
- This sequence prediction system includes the following components:
- a storage device 126, which is a database storing data that comprise a peptide sequence consisting of a first predetermined number of amino acids and a physical property serving as an index of a predetermined physiological activity of the peptide having that sequence;
- a hypothesis deriving unit composed of a random resampling unit 110, which takes a fourth predetermined number of data extracted from the storage device 126 and supplies a second predetermined number of them at random to each learning unit 112, and the learning units 112;
- a hypothesis comparison unit composed of the target sequence setting unit 160 (FIG. 8), the target physical property extraction unit 162 (FIG. 8), which extracts physical properties from each of the hypotheses, and the variance evaluation unit 164 (FIG. 8), which evaluates the variance of the physical properties extracted for each learning unit 112;
- a question point extraction unit 118, which, based on the evaluated variance, extracts the peptide sequences for which true data on the hypothesized physical property are requested;
- a data update unit, which obtains the requested true data, and a data control unit 128, which accumulates in the storage device 126 new data comprising the peptide sequence and the physical property based on the true data;
- a sequence input reception unit 130, which accepts the entire amino acid sequence of a predetermined protein;
- a sequence candidate extraction unit 131, which extracts the peptide sequence candidates to be predicted from the entire amino acid sequence accepted by the sequence input reception unit 130 and sends them to the learning units 112; and
- a physical property estimation output unit composed of a physical property estimation unit 132, which estimates the physical properties of the extracted peptide sequence candidates from the results obtained by the learning units 112.
- FIG. 2 is a diagram showing an example of a data set stored in the storage device 126.
- This data set consists of known data and additional data, both of which are true data. Each entry comprises a peptide sequence consisting of a first predetermined number of amino acids and a physical property that serves as an index of a predetermined physiological activity of that peptide sequence, for example the binding constant (-log Kd) to a human leukocyte antigen (HLA) complex, an antigen-presenting molecule closely involved in immune induction.
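As a small worked example of the binding constant written as -log Kd, the sketch below converts a dissociation constant into that scale. The base-10 logarithm and molar units are assumptions made here (the source writes only "-logKd"), and `binding_constant` is a hypothetical helper name.

```python
import math


def binding_constant(kd_molar: float) -> float:
    """Return -log10(Kd) for a dissociation constant given in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


# Under these assumptions, a tighter binder (smaller Kd) gets a larger value.
print(round(binding_constant(1e-6), 3), round(binding_constant(1e-9), 3))
```

Under the same assumptions, a Kd of one micromolar corresponds to a value of about 6 and a Kd of one nanomolar to about 9.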
- The first predetermined number of amino acids is a fixed value of 8 to 11, for example 9, when HLA class I molecules are targeted, and a fixed value of 20 or less when HLA class II molecules are targeted.
- Peptide sequences whose binding target is HLA, an antigen-presenting molecule, are only one example. The sequence may instead be a peptide sequence that targets a physiologically active substance such as a G protein-coupled receptor having a peptide as its ligand, or the base sequence of a nucleic acid (such as DNA) encoding a predetermined peptide sequence as described above.
- The physical properties that serve as an index of the binding ability to a predetermined substance may also include physical properties related to binding, such as hydrophobicity (or hydrophilicity).
- the learning unit 112 derives the data based on the data resampled by the random resampling unit 110 described later, and the data adding unit 124 described later if necessary.
- the additional data including the true data added in step S3 is sent to the storage device 126, and the data set stored in the storage device 126 is updated.
- The random resampling unit 110 randomly resamples a second predetermined number of data from the fourth predetermined number of data sent from the data control unit 128 and supplies them to each learning unit 112.
- The data control unit 128 and the random resampling unit 110 work together to supply each learning unit 112 with the same number of data (samples), drawn at random so that the sets differ. For example, suppose that 100 data (the fourth predetermined number) are extracted from the storage device 126 and 50 data (the second predetermined number) are supplied to each learning unit 112. Instead of supplying the same data to all learning units 112, 50 of the 100 data are randomly resampled and sent to one learning unit 112, another 50 are randomly resampled and sent to another learning unit 112, and so on, until every learning unit has received a different set of 50 data. This prevents the learning units 112 from all deriving the same hypothesis. In this way, the system can make predictions when at most a few hundred measured values (literature values) are available.
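The resampling scheme in the example above (100 data in the pool, 50 per learning unit) can be sketched as follows. The function name `resample_for_units` and the `seed` parameter are hypothetical; drawing each subset independently and without replacement is one reading of the text, assumed here.

```python
import random


def resample_for_units(pool, n_units, subset_size, seed=None):
    """Draw one random subset of `subset_size` items per learning unit.
    Each subset is sampled without replacement from the same pool, so the
    subsets overlap but are very unlikely to be identical."""
    if subset_size > len(pool):
        raise ValueError("subset_size cannot exceed the pool size")
    rng = random.Random(seed)
    return [rng.sample(pool, subset_size) for _ in range(n_units)]


pool = list(range(100))          # stand-in for 100 (sequence, value) records
subsets = resample_for_units(pool, n_units=50, subset_size=50, seed=0)
print(len(subsets), len(subsets[0]))
```

Because every learning unit sees a different half of the pool, the hypotheses they derive disagree most where the data are least informative, which is what the later variance evaluation exploits.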
- The learning unit 112 performs processing in a learning stage and in an estimation stage, according to the purpose.
- When the learning-stage calculation is to be performed, the data control unit 128 sends the control signal cont to each learning unit 112, and a learning unit 112 that receives the control signal cont performs the learning-stage calculation.
- When the control signal cont is not input, an estimation-stage calculation is performed.
- In the learning stage, a plurality of learning units, for example 50 learning units, each perform a probability calculation on the input data in accordance with, for example, the procedure of the hidden Markov model learning system described in Japanese Patent No. 3094860, and the calculation results are stored in the parameter storage device 140.
- The probability parameters accumulated in the parameter storage device 140 consist of the existence probability of each amino acid at each alignment position of a peptide sequence consisting of the first predetermined number of amino acids, for example nine, and the transition probabilities before and after each alignment position.
- The probabilities are aggregated according to the probability parameters accumulated in the parameter storage device 140, which yields the existence probability of each amino acid at each alignment position of a virtual peptide sequence, as shown in FIG. 3.
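To make "existence probability of each amino acid at each alignment position" concrete, the sketch below computes plain per-position frequencies from equal-length peptides. This is a deliberately simplified stand-in: it is not the hidden Markov model procedure of Japanese Patent No. 3094860, it ignores transition probabilities, and `position_probabilities` is a hypothetical name.

```python
from collections import Counter


def position_probabilities(peptides):
    """Return one dict per alignment position mapping amino acid -> relative
    frequency among the given equal-length peptides."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()})
    return table


for pos, probs in enumerate(position_probabilities(["AC", "AD", "GC", "AC"])):
    print(pos, probs)
```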
- Based on the aggregation results (as shown in FIG.), a prediction score corresponding to the binding constant is calculated for each of a third predetermined number of peptide sequences, for example 100,000, and the result is the hypothesis data.
- This hypothesis data is sent to the hypothesis comparison unit 114. When the data set in the storage device 126 is to be updated using the hypothesis data, the hypothesis data may also be sent to the data control unit 128.
- The set of the third predetermined number of peptide sequences may be a variable set that is determined anew each time the learning-stage calculation starts, or a set arbitrarily input or selected by the person using the system.
- The estimation-stage calculation is performed in substantially the same way as the learning-stage calculation, except that the binding-constant score obtained for each peptide sequence in each learning unit 112 is sent to the physical property estimation unit 132 described later, not to the hypothesis comparison unit 114.
- FIG. 8 shows a functional block diagram for explaining the function of the hypothesis comparison unit 114.
- The hypothesis comparison unit 114 includes a target sequence setting unit 160, a target physical property extraction unit 162, and a variance evaluation unit 164.
- The target sequence setting unit 160 sets the peptide sequence on which the hypotheses are compared, in order to determine how far the hypotheses derived by the learning units 112 have converged.
- The peptide sequence set here is one of the peptide sequences in the data constituting each hypothesis.
- The target physical property extraction unit 162 extracts, from the hypothesis data, the physical property associated with the peptide sequence set by the target sequence setting unit 160.
- The variance evaluation unit 164 calculates the variance of the physical properties extracted by the target physical property extraction unit 162 and obtains, for example, a data set as shown in FIG. 4 described above. The obtained variance is sent to the question point extraction unit 118.
- The question point extraction unit 118 extracts peptide sequences in descending order of the variance obtained by the hypothesis comparison unit 114.
- FIG. 5 schematically shows the resulting ranking in the data set.
- The peptide sequences with the largest variance, up to the top 50 (the range of the seventh predetermined number), are extracted as question points, and the extracted peptide sequences are sent to the data request unit 120.
- Alternatively, peptide sequences whose variance is greater than a predetermined value may be extracted as question points, that is, as the peptide sequences for which true data are requested.
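The variance-based selection of question points can be sketched as below. Each hypothesis is modeled as a dictionary from peptide sequence to predicted score; `question_points`, `top_k`, and `min_variance` are hypothetical names, and the use of population variance is an assumption, since the source does not say which variance estimator is used.

```python
from statistics import pvariance


def question_points(hypotheses, top_k=50, min_variance=None):
    """Rank peptides by the variance of their predicted scores across
    hypotheses and return the most disputed ones.

    top_k        -- keep at most this many peptides (the "top 50" rule)
    min_variance -- if given, keep only peptides whose variance exceeds it
    """
    peptides = hypotheses[0].keys()
    variances = {p: pvariance([h[p] for h in hypotheses]) for p in peptides}
    ranked = sorted(variances.items(), key=lambda kv: kv[1], reverse=True)
    if min_variance is not None:
        ranked = [kv for kv in ranked if kv[1] > min_variance]
    return [p for p, _ in ranked[:top_k]]


toy_hypotheses = [
    {"AAA": 5.0, "CCC": 6.0, "DDD": 1.0},
    {"AAA": 5.0, "CCC": 8.0, "DDD": 4.0},
    {"AAA": 5.0, "CCC": 7.0, "DDD": 7.0},
]
print(question_points(toy_hypotheses, top_k=2))
```

Peptides on which all hypotheses agree have zero variance and are never queried; the measurement effort goes to the sequences the learners disagree on.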
- The data request unit 120 requests true data for the peptide sequences of the question points extracted by the question point extraction unit 118, for example measurement data obtained by experiment or literature data stored in an external database.
- The data receiving unit 122 receives, in response to the request from the data request unit 120, measurement data entered by the user or literature data obtained from a predetermined database as described later, and sends these data as true data to the data adding unit 124.
- The data adding unit 124 takes in the true data sent from the data receiving unit 122, associates them with the peptide sequence that was the question point, generates additional data comprising this peptide sequence and its physical property, and sends the additional data to the data control unit 128.
- To specify the candidate peptide sequences to be predicted, the sequence input reception unit 130 receives the entire amino acid sequence of a predetermined protein, for example the target protein in which an epitope is to be identified, such as a protein forming a virus antigen, and sends the received data to the sequence candidate extraction unit 131.
- This input may be performed through a user interface by a predetermined input device, or may be performed through a network connected to the user interface.
- Target proteins other than the viral antigens described above may also be accepted as sequence input.
- The sequence candidate extraction unit 131 extracts the peptide sequence candidates to be predicted from the entire amino acid sequence of a predetermined protein, which is the information received by the sequence input reception unit 130, and sends the extracted candidates to each learning unit 112.
- The peptide sequences extracted by the sequence candidate extraction unit 131 may include sequences that cannot actually be used.
- The sequence candidate extraction unit 131 may be configured to eliminate such unnecessary peptide sequences automatically, without human assistance.
- For the peptide sequence candidates extracted by the sequence candidate extraction unit 131, with unnecessary peptide sequences excluded as necessary, each learning unit 112 performs the estimation-stage calculation, and the physical properties of each peptide sequence are estimated from the results. The calculation results take the form of, for example, a data set as shown in FIG. 5 described above.
- The physical property estimation unit 132 obtains, for example, the average value for each peptide sequence and takes it as the estimated physical property of that peptide sequence with respect to the predetermined protein, for example the target protein. This estimation is performed for all peptide sequence candidates, and each combination of a peptide sequence and its estimated physical property is sent to the peptide database 138.
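The averaging step can be sketched in a few lines. Here each learning unit's output is modeled as a dictionary from candidate sequence to score, and `estimate_properties` is a hypothetical name; the source gives the mean only as an example, so other aggregations would fit the same interface.

```python
from statistics import mean


def estimate_properties(unit_scores):
    """Average the per-unit scores of every candidate peptide."""
    candidates = unit_scores[0].keys()
    return {pep: mean(scores[pep] for scores in unit_scores) for pep in candidates}


scores_from_units = [
    {"ACDEFGHIK": 6.0, "CDEFGHIKL": 4.0},
    {"ACDEFGHIK": 7.0, "CDEFGHIKL": 5.0},
]
print(estimate_properties(scores_from_units))
```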
- As a result, a data set is obtained that combines the physical properties estimated by the physical property estimation unit 132, for example binding constants to HLA class I molecules, with the peptide sequences having those physical properties.
- The condition input reception unit 134 accepts an input of physical properties, for example binding constants, that serve as keywords for extracting peptide sequences having predetermined physical properties from the peptide database 138.
- As with the sequence input reception unit 130, this input may be made through a user interface using a predetermined input device, or through a network connected to the user interface.
- The conditions (physical properties) to be input depend on the intended use of the peptide sequence to be extracted.
- For example, when a peptide sequence is to be used as a therapeutic agent for hepatitis C, the binding constant for an HLA class I molecule, which is the predetermined protein, is accepted as the keyword.
- The sequence extraction unit 136 extracts from the peptide database 138 the peptide sequences satisfying the condition received by the condition input reception unit 134, and outputs them as the prediction result.
- It may also be desired to examine the physical properties of a new peptide sequence obtained by substituting one to several amino acids in a peptide sequence.
- In that case, an input to that effect is made, for example a peptide sequence whose binding constant has been estimated together with an eighth predetermined number indicating how many amino acids are to be substituted in that peptide sequence.
- The learning units 112 can then perform the estimation-stage calculation, and the physical property estimation unit 132 can estimate the physical property of the new peptide sequence from the calculation result.
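Generating the substituted sequences that would then be scored can be sketched as follows. The 20-letter standard amino acid alphabet, the function name `substitution_variants`, and the choice to substitute exactly `n_sub` positions are assumptions made for this illustration.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def substitution_variants(peptide: str, n_sub: int):
    """Yield every sequence that differs from `peptide` at exactly `n_sub`
    positions, each replaced by a different standard amino acid."""
    for positions in combinations(range(len(peptide)), n_sub):
        # for each chosen position, every residue except the current one
        alternatives = [
            [aa for aa in AMINO_ACIDS if aa != peptide[pos]] for pos in positions
        ]
        for replacement in product(*alternatives):
            chars = list(peptide)
            for pos, aa in zip(positions, replacement):
                chars[pos] = aa
            yield "".join(chars)


variants = list(substitution_variants("ACDEFGHIK", 1))
print(len(variants))  # 9 positions x 19 alternatives
```

The number of variants grows quickly with the number of substitutions (9 x 19 for one substitution in a 9-residue peptide, 36 x 19 x 19 for two), which is one reason to score them by estimation instead of by experiment.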
- FIG. 9 shows a case in which the request for true data is directed to an external database and not to the user.
- an example applied to the sequence prediction system shown in FIG. 7 is shown, but the present invention can also be applied to the sequence prediction system shown in FIG.
- In response to a request from the data request unit 120, the peptide sequence is sent via the network 160 to the database control unit 162, and the database control unit 162 looks up the measured value of this peptide sequence in the measured value database 164.
- When this measured value is obtained, it is sent as literature data to the data receiving unit 122 through the network 160. In this way, true data can be obtained automatically, without human assistance.
- FIG. 10 is a flowchart for explaining the operation of the sequence prediction support system according to the embodiment of the sequence prediction support method of the present invention.
- The sequence prediction support system of this embodiment is included in the sequence prediction system according to the first embodiment shown in FIG. 1, and the reference numerals in FIG.
- The method comprises the following steps:
- Step S1, a data supply stage in which N data sets are selected from a database holding biopolymer sequences and the attribute values of the biopolymers having those sequences, and a plurality of mutually different data subsets are generated from these data sets and supplied to the learning unit.
- Step S2, a hypothesis derivation stage in which the learning unit generates a hypothesis for each data subset and applies each hypothesis to a second data set of biopolymer sequences, independent of the above data sets, to derive the attribute values of the biopolymer sequences in the second data set.
- Step S3, a variance calculation stage in which the variance of the attribute values is calculated for each biopolymer sequence in the second data set.
- Step S4, a question point extraction stage in which biopolymer sequences whose variance is larger than a certain standard are extracted as question points.
- Step S5, a data update stage in which an attribute value for each question point is received, associated with the biopolymer sequence of that question point, and stored in the database.
- In step S1, the data control unit 128 selects, from the storage device serving as the database, N data sets each comprising a biopolymer sequence and the attribute value of the biopolymer having that sequence. A plurality of different data subsets are then generated from these N data sets and supplied to the learning unit 104.
- In step S2, as described above, the learning unit 104 applies the hypothesis generated for each data subset to the biopolymer sequences (peptide sequences) of the second data set, and the attribute value of each peptide sequence is derived.
- In step S3, the question point extraction unit 118 calculates the variance of the attribute values of each biopolymer sequence.
- In step S4, the question point extraction unit 118 then extracts, as question points, the biopolymer sequences whose variance is larger than a certain standard.
- In step S5, the data receiving unit 122 receives the attribute value for each extracted question point, and the data control unit 128 associates the received attribute value with the biopolymer sequence of that question point. The data are sent to and stored in the storage device 126, and the contents of the storage device 126 are updated.
- In this way, a database that supports sequence prediction is constructed.
- By repeating steps S1 to S5 until, for example, the maximum variance obtained in step S3 becomes smaller than a predetermined value, the reliability of the contents of the sequence prediction support database can be further improved.
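Steps S1 to S5 and their repetition can be sketched as one loop. This is a toy illustration under stated assumptions: `train` is a hypothetical stand-in learner (it predicts the mean attribute value of training sequences sharing the first residue), `oracle` is a hypothetical callback that plays the role of the experiment or literature lookup, and all names and default parameters are invented for this sketch.

```python
import random
from statistics import mean, pvariance


def train(subset):
    """Toy learner: predict the mean value of training sequences that share
    the first residue, falling back to the subset mean."""
    overall = mean(value for _, value in subset)
    by_first = {}
    for seq, value in subset:
        by_first.setdefault(seq[0], []).append(value)
    table = {aa: mean(values) for aa, values in by_first.items()}
    return lambda seq: table.get(seq[0], overall)


def build_support_database(database, pool, oracle, n_units=5, subset_size=4,
                           threshold=0.05, max_rounds=20, seed=0):
    """Repeat S1-S5 until no unlabeled sequence has variance above `threshold`.
    `database` must contain at least one (sequence, value) entry."""
    rng = random.Random(seed)
    database = dict(database)
    for _ in range(max_rounds):
        items = list(database.items())
        k = min(subset_size, len(items))
        # S1: supply a different random subset to each learner; S2: derive hypotheses
        hypotheses = [train(rng.sample(items, k)) for _ in range(n_units)]
        unlabeled = [seq for seq in pool if seq not in database]
        # S2 (apply) and S3: variance of the predictions for each sequence
        variances = {s: pvariance([h(s) for h in hypotheses]) for s in unlabeled}
        # S4: question points are the sequences the learners disagree on
        questions = [s for s, v in variances.items() if v > threshold]
        if not questions:
            break
        # S5: obtain true values and store them in the database
        for seq in questions:
            database[seq] = oracle(seq)
    return database


start = {"AAA": 1.0, "CCC": 3.0, "ADA": 2.0, "CDC": 2.0}
result = build_support_database(start, ["AAC", "CAA", "DDD", "ACD"],
                                oracle=lambda s: float(len(set(s))))
print(sorted(result))
```

The stopping rule mirrors the text: the loop ends once the largest variance falls to or below the chosen value, or after a fixed number of rounds.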
- FIG. 11 is a flowchart showing the operation of the sequence prediction system using either the database constructed by the sequence prediction support system according to the first embodiment shown in FIG. 1 or an existing database.
- In step S110, the sequence input reception unit 130 accepts the entire sequence of a predetermined biopolymer, for example a protein, and the sequence candidate extraction unit 131 extracts from this entire sequence the biopolymer sequence candidates to be predicted, in this case peptide sequence candidates, and sends them to the learning unit 104.
- In step S111, after the sequence input has been received, the data control unit 128 extracts all data sets in the storage device 126 and sends them to the learning unit 104.
- The learning unit 104 generates a rule from the entire data set and applies the rule to each biopolymer sequence candidate to estimate its attribute value.
- In step S112, the attribute values estimated by the learning unit 104 are sent to the peptide database 138 and accumulated in association with the corresponding peptide sequences, which creates a database of data sets each consisting of a peptide sequence and an attribute value. Such data sets are not limited to peptide sequences; sequences of any biopolymer, such as DNA or RNA, can be stored in a database together with their attribute values.
- Steps S113 to S114 are then performed.
- In step S113, the condition input reception unit 134 accepts a keyword for extracting peptide sequences having a predetermined attribute value from the peptide database 138, for example a condition such as the attribute value, here the binding constant for a specific protein, being greater than a given value.
- In step S114, the sequence extraction unit 136 extracts from the peptide database 138 the peptide sequences satisfying the condition received by the condition input reception unit 134, and outputs them as the prediction result.
- In this way, peptide sequences having a predetermined attribute value can be extracted as sequences expected to act as epitopes.
- FIG. 12 is a flowchart for explaining the operation of the sequence prediction support system included in the sequence prediction system according to the second embodiment shown in FIG. In the following description, the reference numerals in FIG.
- In step S10, the data control unit 128 extracts data from the storage device 126, and mutually different data are randomly resampled and supplied to each learning unit 112 through the random resampling unit 110.
- Each learning unit 112 then analyzes the supplied data and derives a hypothesis, that is, a data set comprising a third predetermined number of peptide sequences, for example 100,000, and the score obtained for the predetermined physical property of each.
- In step S30, the target sequence setting unit 160 sets a predetermined peptide sequence for comparing the hypotheses derived by the learning units 112.
- In step S40, the target physical property extraction unit 162 extracts, from the hypothesis of each learning unit 112, the physical property for the set peptide sequence.
- In step S50, the variance evaluation unit 164 evaluates the variance of the physical properties extracted for each learning unit 112.
- In step S60, the question point extraction unit 118 sorts the peptide sequences in descending order of the variance evaluated by the variance evaluation unit 164 of the hypothesis comparison unit 114.
- The data set obtained in this way is shown schematically in FIG. 5.
- In step S70, the top 50 entries of the data set obtained in step S60 are extracted as question points as described above, that is, as the peptide sequences for which true data on the hypothesized physical property are to be requested.
- In step S80, the data request unit 120 requests true data, the data receiving unit 122 receives the requested true data, and the data adding unit 124 obtains additional data by assigning the true data as the physical property of each sequence extracted in step S70.
- In step S90, the additional data obtained by the data adding unit 124 are sent to the storage device 126 through the data control unit 128, and the data in the storage device 126 are updated.
- In step S100, it is determined whether or not to perform the next round of learning.
- If the result is YES, that is, if the next round of learning is to be performed, the process returns to step S10, and learning data are randomly supplied to each learning unit 112 by the random resampling unit 110.
- If the result is NO, that is, if no further learning is to be performed, the sequence prediction support operation ends.
- The number of learning rounds may be fixed in advance, or whether to perform the next round may be decided at the end of each round.
- In steps S60 and S70, the peptide sequences in the hypothesis data are sorted in descending order and a predetermined number, for example up to 50 from the top, are extracted as question points. It suffices that the peptide sequences are extracted as question points.
- FIG. 13 is a flowchart showing the operation of the sequence prediction system using the database constructed by the sequence prediction support system according to the second embodiment.
- In step S200, the sequence input reception unit 130 accepts the entire amino acid sequence of a virus antigen, which is a target protein for a predetermined substance, for example an antigen-presenting molecule.
- Peptide sequence candidates are then extracted, the learning units 112 perform the estimation-stage calculation, and the physical property estimation unit 132 estimates from the calculation result the binding constant of each peptide sequence candidate of the virus antigen.
- A data set of all the peptide sequence candidates and their predetermined physical properties is generated and stored in the peptide database 138.
- In step S230, the condition input reception unit 134 receives an input of a physical property, for example a binding constant for a predetermined protein, as a keyword for extracting peptide sequences having a predetermined physical property from the peptide database 138.
- In step S240, the sequence extraction unit 136 extracts from the peptide database 138 the peptide sequences satisfying the condition received by the condition input reception unit 134, and outputs them as the prediction result.
- In this way, a peptide sequence having a predetermined physical property can be extracted as one expected to act as an epitope that binds to a predetermined substance.
- As the hypothesis output by the plurality of learning units 112, a list of 9-amino-acid peptides derived from the amino acid sequence of another predetermined protein, for example a target protein such as a viral antigen, may be used in place of the third predetermined number of peptide sequences and their binding-constant values, so that an epitope prediction calculation can be made. Nor is the third predetermined number limited to 100,000: when the peptide length is 9, output may be produced for all 20^9 (20 to the ninth power) peptide sequences, so that predictions can be made for every possible peptide sequence.
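For a sense of scale, the short calculation below compares the size of the full sequence space for 9-residue peptides with the example set of 100,000 sequences. It assumes the 20 standard amino acids; `n_possible_peptides` is a hypothetical helper name.

```python
ALPHABET_SIZE = 20  # assumed: the 20 standard amino acids


def n_possible_peptides(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return ALPHABET_SIZE ** length


total = n_possible_peptides(9)
print(total)             # 512000000000
print(total // 100_000)  # how many times larger than the 100,000-sequence example
```

Covering all 9-residue peptides therefore means scoring about 5.12 x 10^11 sequences, roughly five million times the 100,000-sequence example.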
- A peptide sequence having such immunity-inducing ability can also be predicted using the number of T cells whose proliferation it induces as the index of physiological activity.
- The system can also be applied to optimizing ligands for orphan G protein-coupled receptors (orphan GPCRs), that is, receptors that use peptides as ligands but whose specific peptide ligands have not been identified. Using numerical values such as the rise in calcium concentration or in intracellular cAMP (an intracellular biomolecule) in cultured cells after peptide administration as indices of physiological activity, the optimal peptide sequence for this activity system can be predicted.
- The peptide sequence can also be predicted using, as the index of physiological activity, the increase in the blood concentration of a physiologically active peptide or of a physiologically active hormone consisting of a peptide.
- the present embodiment can also be applied to DNA sequence prediction.
- A transcription factor that controls gene expression must bind upstream of the gene sequence on the DNA, and the DNA base sequence of the transcription factor binding site is known to follow a certain motif or rule. Therefore, by predicting candidate transcription factor sequences that bind to promoters involved in specific gene expression, a rule relating gene expression to the DNA sequence pattern of the transcription factor binding site in a specific gene expression system can be found, which also makes it possible to control gene expression and transcription factor binding.
- The present embodiment can also be applied to RNAi sequence prediction, for example to a specific small double-stranded RNA base sequence (siRNA) of about 10 to 20 bases.
- auxiliary factor 'Downstream
- By predicting siRNA sequence candidates that bind to mRNA involved in specific gene expression, it becomes possible to predict the relationship between specific physiological activities and RNAi sequences.
- This enables RNAi sequence design, an area of active research and development.
- An RNA aptamer is usually an RNA strand of 20 bases or more and has a specific, stable three-dimensional structure formed by pairing between complementary bases within the sequence.
- By virtue of this structural property, it binds to a specific protein such as a target protein.
- the present invention also provides a program that causes a general-purpose computer device to function as the above-described sequence prediction system or sequence prediction support system.
- biopolymer sequence such as a peptide sequence having a certain predetermined physical property or a nucleic acid base sequence by experiments. Become.
- Each component of the above sequence prediction system or sequence prediction support system can also be implemented as a program, and by using such a program a general-purpose computer apparatus can be made to operate as the sequence prediction system or the sequence prediction support system.
- an unnecessary sequence removing unit as shown in FIG. You may provide a structure like an unnecessary arrangement
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006528959A JPWO2006004182A1 (ja) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
US11/571,822 US20090144209A1 (en) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-201116 | 2004-07-07 | ||
JP2004201116 | 2004-07-07 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006004182A1 (ja) | 2006-01-12 |
WO2006004182A9 (ja) | 2006-03-09 |
Family
ID=35782982
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2005/012542 WO2006004182A1 (ja) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
Country Status (3)
Country | Link |
---|---|
US (1) | US20090144209A1 (ja) |
JP (1) | JPWO2006004182A1 (ja) |
WO (1) | WO2006004182A1 (ja) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007094137A1 (ja) | 2006-02-17 | 2007-08-23 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
WO2008047876A1 (fr) * | 2006-10-18 | 2008-04-24 | Nec Soft, Ltd. | Procédé permettant d'identifier une séquence nucléotidique et procédé permettant d'obtenir une structure secondaire de molécule d'acide nucléique, appareil permettant d'identifier une séquence nucléotidique et appareil permettant d'obtenir une structure secondaire de molécule d'acide nucléique, et programme permettant d'ide |
WO2009066462A1 (ja) | 2007-11-20 | 2009-05-28 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
JP2010115177A (ja) * | 2008-11-14 | 2010-05-27 | Nec Soft Ltd | Method for selecting a modified nucleotide sequence of an RNA aptamer molecule having degradation resistance |
JP2010519904A (ja) * | 2007-02-28 | 2010-06-10 | アメリカ合衆国 | Brachyury polypeptides and methods of use |
JP2012515402A (ja) * | 2009-01-14 | 2012-07-05 | ガタカ,エルエルシー | Integrated desktop software for managing virus data |
JP5262709B2 (ja) * | 2006-03-15 | 2013-08-14 | 日本電気株式会社 | Molecular structure prediction system, method, and program |
EP3925968A2 (en) | 2014-10-07 | 2021-12-22 | Cytlimic Inc. | Hsp70-derived peptide, pharmaceutical composition for treating or preventing cancer using same, immunity inducer, and method for producing antigen-presenting cell |
US11618770B2 (en) | 2015-03-09 | 2023-04-04 | Nec Corporation | MUC1-derived peptide, and pharmaceutical composition for treatment or prevention of cancer, immunity-inducing agent and method for manufacturing antigen presenting cell using same |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012005898A2 (en) * | 2010-06-15 | 2012-01-12 | Alnylam Pharmaceuticals, Inc. | Chinese hamster ovary (cho) cell transcriptome, corresponding sirnas and uses thereof |
US9609074B2 (en) * | 2014-06-18 | 2017-03-28 | Adobe Systems Incorporated | Performing predictive analysis on usage analytics |
JP7259596B2 (ja) * | 2019-07-01 | 2023-04-18 | 富士通株式会社 | Prediction program, prediction method, and prediction device |
2005
- 2005-07-07: US application US11/571,822 (US20090144209A1), not active, Abandoned
- 2005-07-07: JP application JP2006528959A (JPWO2006004182A1), active, Pending
- 2005-07-07: WO application PCT/JP2005/012542 (WO2006004182A1), active, Application Filing
Non-Patent Citations (2)
Title |
---|
ASOGAWA M. ET AL: "Nodo Gakushuho o Riyo shita Soyaku Screening. (Drug Screening Using Active Learning)", NEC TECHNICAL JOURNAL., vol. 56, no. 10, 25 November 2003 (2003-11-25), pages 28 - 32, XP002998982 * |
MIYAGAWA T. ET AL: "Nodo Gakushuho o Riyo shita Peptide Vaccine Kaihatsu. (Peptide Vaccine Development with Application of "Active Learning Methods")", NEC TECHNICAL JOURNAL., vol. 56, no. 10, 25 November 2003 (2003-11-25), pages 33 - 37, XP002998981 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2491940A3 (en) * | 2006-02-17 | 2012-11-28 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
WO2007094137A1 (ja) | 2006-02-17 | 2007-08-23 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
EP2491940A2 (en) | 2006-02-17 | 2012-08-29 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
JP5262709B2 (ja) * | 2006-03-15 | 2013-08-14 | 日本電気株式会社 | Molecular structure prediction system, method, and program |
WO2008047876A1 (fr) * | 2006-10-18 | 2008-04-24 | Nec Soft, Ltd. | Procédé permettant d'identifier une séquence nucléotidique et procédé permettant d'obtenir une structure secondaire de molécule d'acide nucléique, appareil permettant d'identifier une séquence nucléotidique et appareil permettant d'obtenir une structure secondaire de molécule d'acide nucléique, et programme permettant d'ide |
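JP2008102675A (ja) * | 2006-10-18 | 2008-05-01 | Nec Soft Ltd | Method for identifying a base sequence, method for acquiring the secondary structure of a nucleic acid molecule, and apparatus and program for executing them |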
US9311447B2 (en) | 2006-10-18 | 2016-04-12 | Nec Solution Innovators, Ltd. | Method for identifying nucleotide sequence, method for acquiring secondary structure of nucleic acid molecule, apparatus for identifying nucleotide sequence, apparatus for acquiring secondary structure of nucleic acid molecule, program for identifying nucleotide sequence, and program for acquiring secondary structure of nucleic acid molecule |
US8200441B2 (en) | 2006-10-18 | 2012-06-12 | Nec Soft, Ltd. | Method for identifying nucleotide sequence, method for acquiring secondary structure of nucleic acid molecule, apparatus for identifying nucleotide sequence, apparatus for acquiring secondary structure of nucleic acid molecule, program for identifying nucleotide sequence, and program for acquiring secondary structure of nucleic acid molecule |
JP2010519904A (ja) * | 2007-02-28 | 2010-06-10 | アメリカ合衆国 | Brachyury polypeptides and methods of use |
WO2009066462A1 (ja) | 2007-11-20 | 2009-05-28 | Nec Corporation | Method for inducing cytotoxic T-cells, cytotoxic T-cell inducer, and pharmaceutical composition and vaccine employing same |
EP2216041A4 (en) * | 2007-11-20 | 2012-10-24 | Nec Corp | METHOD FOR INDUCING CYTOTOXIC LYMPHOCYTE T, CYTOTOXIC LYMPHOCYTE T INDUCER, AND PHARMACEUTICAL COMPOSITION AND VACCINE COMPRISING EACH INDUCER |
JP2010115177A (ja) * | 2008-11-14 | 2010-05-27 | Nec Soft Ltd | Method for selecting a modified nucleotide sequence of an RNA aptamer molecule having degradation resistance |
JP2012515402A (ja) * | 2009-01-14 | 2012-07-05 | ガタカ,エルエルシー | Integrated desktop software for managing virus data |
EP3925968A2 (en) | 2014-10-07 | 2021-12-22 | Cytlimic Inc. | Hsp70-derived peptide, pharmaceutical composition for treating or preventing cancer using same, immunity inducer, and method for producing antigen-presenting cell |
US11618770B2 (en) | 2015-03-09 | 2023-04-04 | Nec Corporation | MUC1-derived peptide, and pharmaceutical composition for treatment or prevention of cancer, immunity-inducing agent and method for manufacturing antigen presenting cell using same |
Also Published As
Publication number | Publication date |
---|---|
JPWO2006004182A1 (ja) | 2008-04-24 |
WO2006004182A9 (ja) | 2006-03-09 |
US20090144209A1 (en) | 2009-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006004182A1 (ja) | Sequence prediction system | |
Jain et al. | Prediction modelling of COVID using machine learning methods from B-cell dataset | |
Li et al. | DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity | |
RU2015110326A (ru) | System and method for clinical support | |
US20220130541A1 (en) | Disease-gene prioritization method and system | |
Yang et al. | Prediction of aptamer–protein interacting pairs based on sparse autoencoder feature extraction and an ensemble classifier | |
Vanunu et al. | A propagation-based algorithm for inferring gene-disease associations | |
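CN110738650B (zh) | Infectious disease infection identification method, terminal device, and storage medium | |
KR20220099504A (ko) | Affinity prediction method and model training method, apparatus, electronic device, and recording medium | |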
US20240170097A1 (en) | Method and system for optimal vaccine design | |
CN103473416A (zh) | Method and device for building a protein interaction model | |
Xu et al. | NetBCE: an interpretable deep neural network for accurate prediction of linear B-cell epitopes | |
Oladipo et al. | Immunoinformatics design of multi-epitope peptide for the diagnosis of Schistosoma haematobium infection | |
CN113345581B (zh) | Method for predicting the probability of bleeding after stroke thrombolysis based on ensemble learning | |
Yerneni et al. | IAS: Interaction specific GO term associations for predicting Protein-Protein Interaction Networks | |
Barrio et al. | EVALLER: a web server for in silico assessment of potential protein allergenicity | |
Li et al. | ACNNT3: attention-CNN framework for prediction of sequence-based bacterial type III secreted effectors | |
JP2019101654A (ja) | Health management support device, method, and program | |
JP5773406B2 (ja) | Determination device, determination method, and determination program for GPI-anchored proteins | |
Zhang et al. | Optimally-connected hidden markov models for predicting MHC-binding peptides | |
CN100428254C (zh) | Method for computer-aided screening of cross-reactive antigens | |
Ullah et al. | Estimating a ranked list of human hereditary diseases for clinical phenotypes by using weighted bipartite network | |
Singh et al. | Prediction and analysis of paralogous proteins in Trichomonas vaginalis genome | |
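CN114388123A (zh) | Intelligent auxiliary diagnosis method, apparatus, device, and storage medium | |
CN109256215B (zh) | Disease-associated miRNA prediction method and system based on self-avoiding random walk |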
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | COP | Corrected version of pamphlet | Free format text: PAGES 1/10 AND 3/10, DRAWINGS, REPLACED BY NEW PAGES 1/10 AND 3/10; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 2006528959. Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWW | Wipo information: withdrawn in national office | Country of ref document: DE |
| | 122 | Ep: pct application non-entry in european phase | |
| | WWE | Wipo information: entry into national phase | Ref document number: 11571822. Country of ref document: US |