WO2006004182A9 - Sequence prediction system - Google Patents
Sequence prediction system
- Publication number
- WO2006004182A9 (PCT/JP2005/012542)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- data
- biopolymer
- unit
- database
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Definitions
- the present invention relates to a sequence prediction system, and more particularly to a sequence prediction system and a sequence prediction database for predicting a sequence of a peptide having specific physical properties.
- the present invention also relates to a sequence prediction support system that supports this sequence prediction.
- the present invention relates to a sequence prediction program and method for operating a sequence prediction system.
- the present invention also relates to a sequence prediction support program and method for operating a sequence prediction support system.
- HCV hepatitis C virus
- CTL cytotoxic T cells
- CTL epitopes: to identify such CTL epitopes, database-based predictions are first performed with tools such as BIMAS and SYFPEITHI, and experiments are then conducted to determine whether the predicted peptides actually bind to HLA molecules. Those that bound were identified as CTL epitopes.
- Non-Patent Document 1 describes a method for identifying peptides that bind to HLA molecules more accurately and with fewer experiments.
- Non-Patent Document 1: Udaka, K., et al., 'Empirical Evaluation of a Dynamic Experiment Design Method for Prediction of MHC Class I-Binding Peptides', The Journal of Immunology, 169, p5744-5753, 2002
- in Non-Patent Document 1, a computer determines whether or not an arbitrarily selected peptide sequence has a predetermined physical property, for example the binding ability to an HLA molecule described above. Whether the selected peptide sequence actually has the predetermined physical property was then confirmed by conducting an experiment. Non-Patent Document 1 reports that the selected peptide sequences were confirmed to have the predetermined physical property with a high probability (2nd paragraph, page 5749, right column).
- however, the technique described in Non-Patent Document 1 is limited to a specific target, for example a virus antigen, and an experiment is still needed to confirm that a predicted peptide sequence functions as a virus antigen. When the aim is to determine quantitatively, without experimentation, whether a sequence has a specific physical property and to select only the sequences determined to have it, the technique cannot be applied as it is and remains insufficient.
- RNAi RNA interference sequence prediction
- RNA aptamer sequence prediction
- the present invention has been made in view of the above-described circumstances, and its object is to provide a sequence prediction system, a sequence prediction database, a sequence prediction support system, a sequence prediction program, a sequence prediction support program, a sequence prediction method, and a sequence prediction support method capable of selecting only biopolymer sequences having a certain predetermined physical property without performing an experiment.
- the sequence prediction system includes a database having biopolymer attributes including biopolymer sequences and the attribute values possessed by the biopolymers of those sequences;
- a generation unit for generating a plurality of different data subsets from the data set;
- a learning unit that generates a hypothesis for each data subset, applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data set, and derives the attribute values of the biopolymer sequences of the second data set;
- a question point extraction unit that obtains the variance of the attribute values for each biopolymer sequence in the second data set and extracts, as a question point, a biopolymer sequence having a variance larger than a certain standard;
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence related to the question point, and stores it in the database;
- a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit that extracts biopolymer sequence candidates to be predicted from the entire sequence received by the sequence input receiving unit; and
- an attribute value estimation unit that generates a rule from all the data sets of the database after the sequence input is received, and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, and updates the contents of the database.
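The selection, learning, and question point steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the learner `fit_hypothesis` (a per-position average table), the function names, and the use of random sampling to form the data subsets are all assumptions made for the example.

```python
import random
import statistics


def fit_hypothesis(subset):
    """Toy learner (assumption): mean attribute value per (position, monomer)."""
    sums, counts = {}, {}
    for seq, value in subset:
        for pos, monomer in enumerate(seq):
            key = (pos, monomer)
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    table = {key: sums[key] / counts[key] for key in sums}
    default = statistics.fmean([value for _, value in subset])

    def predict(seq):
        # Unseen (position, monomer) pairs fall back to the subset mean.
        return statistics.fmean(
            [table.get((i, m), default) for i, m in enumerate(seq)]
        )

    return predict


def extract_question_points(dataset, second_set, n_subsets, subset_size,
                            threshold, rng):
    """Return sequences whose predicted values disagree across hypotheses."""
    # One hypothesis per data subset drawn from the N selected data items.
    hypotheses = [
        fit_hypothesis(rng.sample(dataset, subset_size))
        for _ in range(n_subsets)
    ]
    # Variance of the attribute values derived for the same sequence.
    variances = {
        seq: statistics.pvariance([h(seq) for h in hypotheses])
        for seq in second_set
    }
    points = [seq for seq, var in variances.items() if var > threshold]
    return points, variances
```

Sequences returned as question points are the ones for which a measured attribute value would be requested and then stored back in the database.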
- the sequence input accepting unit accepts the entire sequence of the predetermined biopolymer, and the sequence candidate extracting unit extracts the biopolymer sequence candidate that is the target of attribute value prediction from the entire sequence.
- the attribute value estimation unit generates a rule from the data set of the updated database and applies this rule to the biopolymer sequence candidates to estimate the attribute value of each biopolymer sequence.
- the learning unit may be configured to function as the attribute value estimation unit after the sequence input is received.
- in that case, during learning the learning unit applies the hypotheses generated for each of the plurality of data subsets produced by the generation unit to derive attribute values for each biopolymer sequence in the arbitrarily created second data set, while at the time of attribute value prediction it applies a rule generated from the data set contained in the updated database to each biopolymer sequence candidate, so that the attribute value can be calculated as an estimated value for the candidate.
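As a rough illustration of the estimation step, the sketch below builds one rule from the whole database and applies it to every candidate. The source does not specify the form of the rule; the nearest-neighbour rule based on Hamming distance used here, and the names `make_rule` and `estimate_attribute_values`, are assumptions chosen only to keep the example short.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def make_rule(database):
    """Build a rule from all (sequence, attribute value) pairs (assumed form)."""
    def rule(candidate):
        best = min(hamming(candidate, seq) for seq, _ in database)
        nearest = [value for seq, value in database
                   if hamming(candidate, seq) == best]
        # Average the attribute values of the closest stored sequences.
        return sum(nearest) / len(nearest)
    return rule


def estimate_attribute_values(database, candidates):
    """Apply the single rule to each biopolymer sequence candidate."""
    rule = make_rule(database)
    return {candidate: rule(candidate) for candidate in candidates}
```

Unlike the learning phase, which compares several hypotheses, this phase uses one rule derived from the full updated data set.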
- the biopolymer sequence candidates may be extracted in extraction units of p monomers, starting from the beginning of the entire sequence received by the sequence input reception unit, with each subsequent extraction unit of p monomers taken while shifting downstream by one monomer unit at a time.
- the sequence candidate extraction unit may exclude, from the extracted biopolymer sequence candidates, any biopolymer sequence that satisfies a predetermined condition and therefore does not require prediction, before sending the candidates to the attribute value estimation unit.
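The candidate extraction described above is a sliding window over the input sequence. The following is a minimal sketch; the function name, the `step` parameter, and the `exclude` callback are illustrative assumptions rather than terms from the source.

```python
def extract_candidates(full_sequence, p, step=1, exclude=None):
    """Slide a window of p monomers along the sequence, `step` monomers at a time.

    `exclude` is an optional predicate; windows for which it returns True
    are dropped before being passed on for attribute value estimation.
    """
    candidates = []
    for start in range(0, len(full_sequence) - p + 1, step):
        window = full_sequence[start:start + p]
        if exclude is not None and exclude(window):
            continue
        candidates.append((start, window))
    return candidates
```

For example, `extract_candidates("ABCDEF", 3)` yields the windows starting at positions 0 through 3, and a sequence shorter than p yields no candidates.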
- the question point extraction unit may extract, as question points, the biopolymer sequences that fall within a certain range counted from the largest variance.
- alternatively, a biopolymer sequence having a variance larger than a predetermined value may be extracted as a question point.
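The two selection criteria for question points can be written side by side. This is a small sketch under the assumption that the variances are already available as a mapping from sequence to variance; the function names are hypothetical.

```python
def question_points_top_k(variances, k):
    """Take the k sequences with the largest variance."""
    ranked = sorted(variances.items(), key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked[:k]]


def question_points_above(variances, min_variance):
    """Take every sequence whose variance exceeds a fixed value."""
    return [seq for seq, var in variances.items() if var > min_variance]
```

The first form bounds the number of experiments requested per round; the second lets that number vary with how much the hypotheses disagree.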
- a sequence extraction unit may further be provided for extracting, from the biopolymer sequence candidates, those whose attribute values estimated by the attribute value estimation unit satisfy a predetermined condition.
- in this way, a biopolymer sequence whose estimated attribute value satisfies a predetermined condition can be extracted as a predicted sequence.
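A sequence extraction step of this kind amounts to filtering the estimates by a condition. In the sketch below the condition is passed in as a function, and the results are additionally sorted by estimated value; both the function name and the sorting are assumptions added for convenience, not part of the source.

```python
def extract_predicted_sequences(estimates, condition):
    """Keep candidates whose estimated attribute value satisfies `condition`.

    `estimates` maps sequence -> estimated attribute value. The result is
    sorted from the highest to the lowest estimate (an added convenience).
    """
    selected = [(seq, value) for seq, value in estimates.items()
                if condition(value)]
    return sorted(selected, key=lambda item: item[1], reverse=True)
```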
- the sequence prediction system according to the present invention includes a database having a biopolymer attribute including a biopolymer sequence and an attribute value included in the biopolymer of this sequence;
- a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer
- a sequence candidate extraction unit that extracts a biopolymer sequence candidate to be predicted from all the sequences received by the sequence input reception unit;
- the sequence input receiving unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extraction unit extracts, from the entire sequence, the biopolymer sequence candidates that are the target of attribute value prediction.
- the attribute value estimation unit generates a rule from the data set of the database, applies this rule to each biopolymer sequence candidate, and estimates an attribute value for each biopolymer sequence.
- the sequence prediction database according to the present invention includes attribute values obtained by the sequence prediction system described above and a biopolymer sequence.
- the sequence prediction support system includes a database having a biopolymer attribute including a biopolymer sequence and an attribute value included in the biopolymer of this sequence;
- a generation unit for generating a plurality of different data subsets from the data set;
- a learning unit that generates a hypothesis for each data subset and derives attribute values of the biopolymer sequences of a second data set, consisting of biopolymer sequences independent of the data set, by applying each hypothesis to that second data set;
- a question point extraction unit that obtains a variance of attribute values for each biopolymer sequence in the second data set and extracts a biopolymer sequence having a variance larger than a certain reference as a question point;
- a data control unit that receives an attribute value for the question point, associates the received attribute value with a biopolymer sequence related to the question point, and stores the attribute value in the database.
- the selection unit extracts N data sets from the database, and the generation unit generates a plurality of different data subsets from the N data sets.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, updates the contents of the database, and constructs a database that supports sequence prediction.
- a sequence prediction program according to the present invention causes a computer device to function as a sequence prediction system including:
- a database having biopolymer attributes including biopolymer sequences and the attribute values of the biopolymers of those sequences;
- a generation unit that generates a plurality of different data subsets from the data set;
- a learning unit that generates a hypothesis for each data subset and derives attribute values of the biopolymer sequences of a second data set, consisting of biopolymer sequences independent of the data set, by applying each hypothesis to that second data set;
- a question point extraction unit that obtains a variance of attribute values for each biopolymer sequence in the second data set and extracts a biopolymer sequence having a variance larger than a certain reference as a question point;
- a data control unit that receives an attribute value for the question point, associates the received attribute value with the biopolymer sequence related to the question point, and accumulates it in the database;
- a sequence input receiving unit that accepts the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit that extracts the biopolymer sequence candidates to be predicted from the entire sequence received by the sequence input receiving unit; and
- an attribute value estimation unit that generates a rule from all the data sets of the database after the sequence input is received, and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each of the data subsets independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequence of the second data set to derive attribute values.
- the second data set having the biopolymer sequence and the derived attribute value is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains the variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, and updates the contents of the database.
- the sequence input accepting unit accepts the entire sequence of a predetermined biopolymer, and the sequence candidate extracting unit extracts biopolymer sequence candidates for attribute value prediction from the entire sequence.
- the attribute value estimation unit generates a rule from the updated database data set, applies this rule to the biopolymer sequence candidates, and estimates the attribute value for each biopolymer sequence.
- the general-purpose computer device functions as a sequence prediction system.
- a sequence prediction program according to the present invention causes a computer device to function as a sequence prediction system including:
- a database having biopolymer attributes including biopolymer sequences and the attribute values possessed by the biopolymers of those sequences;
- a sequence input receiving unit that receives the entire sequence of a predetermined biopolymer;
- a sequence candidate extraction unit for extracting biopolymer sequence candidates to be predicted from the entire sequence received by the sequence input receiving unit; and
- an attribute value estimation unit that generates a rule from all the data sets of the database after the sequence input is received, and applies the rule to each of the biopolymer sequence candidates to estimate the attribute value of that candidate.
- the sequence input receiving unit accepts the entire sequence of a predetermined biopolymer.
- the sequence candidate extraction unit extracts, from the entire sequence, the biopolymer sequence candidates whose attribute values are to be predicted.
- the attribute value estimation unit generates a rule from the data set of the database, applies the rule to the biopolymer sequence candidate, and estimates the attribute value for each biopolymer sequence.
- the general-purpose computer apparatus functions as a sequence prediction system.
- a sequence prediction support program causes a computer device to function as a sequence prediction support system including:
- a database having biopolymer attributes including biopolymer sequences and the attribute values of the biopolymers of those sequences;
- a generating unit that generates a plurality of different data subsets from the data set;
- a learning unit that generates a hypothesis for each data subset and derives attribute values of the biopolymer sequences of a second data set, consisting of biopolymer sequences independent of the data set, by applying each hypothesis to that second data set;
- a question point extraction unit that obtains a variance of attribute values for each biopolymer sequence in the second data set and extracts a biopolymer sequence having a variance larger than a certain reference as a question point; and
- a data control unit that receives an attribute value for the question point, associates the received attribute value with a biopolymer sequence related to the question point, and stores the attribute value in the database.
- N data sets are extracted from the database by the selection unit, and a plurality of different data subsets are generated from the N data sets by the generation unit.
- the learning unit analyzes each data subset independently to generate a certain hypothesis, and applies the hypothesis to the biopolymer sequences of the second data set to derive attribute values.
- the second data set having the derived values for the biopolymer sequence is generated as many as the number of data subsets. That is, attribute values are derived for the same biopolymer sequence based on hypotheses derived from each data subset.
- the question point extraction unit obtains a variance of a plurality of attribute values derived for the same biopolymer sequence, and extracts a biopolymer sequence having a variance larger than a certain standard as a question point.
- the data control unit receives the attribute value for the question point, associates it with the biopolymer sequence related to the question point, accumulates it in the database, updates the contents of the database, and constructs a database that supports sequence prediction.
- the general-purpose computer device functions as a sequence prediction support system.
- a data supply stage in which N data sets are selected from a database having biopolymer sequences and the attribute values possessed by the biopolymers of those sequences, and a plurality of different data subsets are generated from the data set;
- a learning stage in which a hypothesis is generated for each data subset and each hypothesis is applied to a second data set consisting of biopolymer sequences independent of the data set;
- a question point extraction stage in which biopolymer sequences having a variance larger than a certain standard among the calculated variances are extracted as question points;
- a data update stage in which the attribute value for the question point is received, associated with the biopolymer sequence related to the question point, and stored in the database;
- a sequence candidate extraction stage in which the entire sequence of a predetermined biopolymer is received and biopolymer sequence candidates to be predicted are extracted from the received entire sequence.
- An attribute value estimation stage for estimating the attribute value of
- the sequence prediction support method includes a data supply stage in which N data sets are selected from a database having biopolymer sequences and the attribute values possessed by the biopolymers of those sequences, and a plurality of different data subsets are generated from the data set;
- a learning stage in which a hypothesis is generated for each data subset and each hypothesis is applied to a second data set consisting of biopolymer sequences independent of the data set;
- a variance calculating stage in which the variance of the attribute value is calculated for each biopolymer sequence in the second data set;
- a question point extraction stage in which biopolymer sequences having a variance larger than a certain standard among the calculated variances are extracted as question points;
- the sequence prediction system, sequence prediction support system, sequence prediction program, sequence prediction support program, and sequence prediction method include the following modes.
- One aspect of the sequence prediction system includes a database that stores data including peptide sequences composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of those peptide sequences;
- a plurality of learning units, each deriving, based on a second predetermined number of data, a hypothesis obtained for a third predetermined number of peptide sequences from the peptide sequences and physical properties;
- a random resampling unit that extracts a fourth predetermined number of data from the database and randomly supplies a second predetermined number of data to each learning unit;
- a target sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
- a target physical property extraction unit that extracts, from the hypothesis data of each learning unit, the physical property specified by the set peptide sequence;
- a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units;
- a question point extraction unit that, based on the evaluated variance, extracts the target peptide sequence for which true data on the hypothetical physical property is requested;
- a data control unit that accumulates in the database new data including the obtained peptide sequence and the physical property based on the true data;
- a sequence input accepting unit that accepts the entire amino acid sequence of a given protein;
- a sequence candidate extraction unit that extracts peptide sequence candidates to be predicted from the entire amino acid sequence accepted by the sequence input accepting unit and sends the extracted candidates to the learning units; and
- a physical property estimation unit that estimates the physical properties of the extracted peptide sequence candidates from the results obtained in each learning unit.
- the random resampling unit takes a fourth predetermined number of data from the database, randomly resamples from them a second predetermined number of data that is smaller than the fourth predetermined number, and sends the resampled data to each learning unit.
- different data is supplied for each learning unit.
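The resampling step can be sketched as two rounds of sampling without replacement. The parameter names below (`n_take` for the fourth predetermined number, `n_per_learner` for the second) are illustrative assumptions. Note that independent random draws make the subsets differ with high probability but do not strictly guarantee it.

```python
import random


def resample_for_learners(database, n_take, n_per_learner, n_learners, rng):
    """Draw a pool from the database, then one random subset per learning unit.

    n_take        : size of the pool taken from the database
    n_per_learner : size of each subset; must be smaller than n_take
    """
    if n_per_learner >= n_take:
        raise ValueError("n_per_learner must be smaller than n_take")
    pool = rng.sample(database, n_take)
    # Each learning unit receives its own independently drawn subset.
    return [rng.sample(pool, n_per_learner) for _ in range(n_learners)]
```

Passing an explicit `random.Random` instance keeps the resampling reproducible, which helps when comparing hypotheses across runs.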
- Each learning unit analyzes the supplied data, that is, peptide sequences consisting of the first predetermined number of amino acids and their predetermined physical properties, and derives a certain hypothesis, namely a data set giving the predetermined physical property for each of a third predetermined number of peptide sequences.
- the target sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by each learning unit, and the target physical property extraction unit extracts the physical property specified by the set peptide sequence from the hypothesis of each learning unit.
- the variance evaluation unit evaluates the variance of the physical properties extracted from each learning unit, and the question point extraction unit uses the evaluated variance to extract the peptide sequences for which true data on the hypothetical physical property is requested. In this way the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit. In addition, the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data.
- the sequence input accepting unit accepts the entire amino acid sequence of a given protein, and the sequence candidate extraction unit extracts the peptide sequence candidates to be predicted from the entire amino acid sequence and sends them to the learning unit.
- the physical property estimation unit estimates the physical properties of the extracted peptide sequence from the results obtained in each learning unit.
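The source does not state how the results of the individual learning units are combined into one estimate. One plausible reading, shown below purely as an assumption, is to average the predictions and report their spread; `estimate_property` and the callable learners are hypothetical names.

```python
import statistics


def estimate_property(learners, peptide):
    """Combine per-learner predictions into (mean, standard deviation).

    `learners` is a list of callables mapping a peptide sequence to a
    predicted physical property value. Averaging is an assumed choice.
    """
    predictions = [learner(peptide) for learner in learners]
    return statistics.fmean(predictions), statistics.pstdev(predictions)
```

Returning the spread alongside the mean makes it visible when the learning units still disagree about a candidate.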
- the sequence candidate extraction unit may extract a peptide extraction unit consisting of a fifth predetermined number of amino acids from the beginning of the entire amino acid sequence accepted by the sequence input reception unit, and may extract the subsequent peptide sequence candidates by shifting downstream by a sixth predetermined number of amino acids for each peptide extraction unit. Furthermore, peptide sequences that satisfy a predetermined condition and therefore do not need to be predicted can be eliminated from the extracted candidates before they are sent to the learning unit.
- the question point extraction unit may extract, as question points, the peptide sequences within a seventh predetermined number counted from the largest variance, or it may extract peptide sequences whose variance is larger than a predetermined value.
- the hypothesis correction unit may include a data request unit that requests true data on the physical properties of the peptide sequence extracted by the question point extraction unit, a data reception unit that receives the requested true data, and a data adding unit that associates the received true data with the extracted peptide sequence and sends it to the data control unit.
- the data request unit may, for example, request an experiment from an outside party or obtain information from an external database for the peptide sequence that is the question point.
- the data accepting unit accepts data corresponding to the true data.
- the data adding unit sends the received true data to the data control unit so that it is added to the database in association with the peptide sequence for which the data was requested.
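The request, reception, and addition of true data can be pictured as a small update loop. In this sketch the database is a plain dictionary and `request_true_data` stands in for whatever external experiment or lookup supplies the measurement; both are assumptions made for illustration.

```python
def update_database(database, question_points, request_true_data):
    """Add measured values for question points to the database.

    database          : dict mapping peptide sequence -> physical property
    request_true_data : callable returning the measured value for a peptide,
                        or None when no true data is available yet
    """
    for peptide in question_points:
        true_value = request_true_data(peptide)
        if true_value is None:
            continue  # nothing received for this question point
        # Associate the true data with the peptide and store it.
        database[peptide] = true_value
    return database
```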
- a sequence extraction unit may further be provided for extracting, from the peptide sequence candidates, those whose physical properties estimated by the physical property estimation unit satisfy a predetermined condition.
- in this way, a peptide sequence candidate estimated to have a predetermined physical property can be extracted as a peptide sequence having that physical property with respect to the given protein.
- this peptide is characterized by predicting the base sequence of a nucleic acid encoding the sequence.
- One aspect of the sequence prediction support system includes a database storing data including peptide sequences composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of those peptide sequences;
- a plurality of learning units, each deriving, based on a second predetermined number of data, a hypothesis obtained for a third predetermined number of peptide sequences from the peptide sequences and physical properties;
- a random resampling unit that extracts data from the database and randomly supplies a second predetermined number of data to each learning unit;
- a target sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
- a target physical property extraction unit that extracts, from the hypothesis data of each learning unit, the physical property specified by the set peptide sequence;
- a variance evaluation unit that evaluates the variance of the physical properties extracted from the learning units;
- a question point extraction unit that, based on the evaluated variance, extracts the peptide sequence for which true data on the hypothetical physical property is requested;
- a data update unit that receives the true data and associates the physical property based on the true data with the extracted peptide sequence; and
- a data control unit that stores in the database new data including the peptide sequence obtained by the data update unit and the physical property based on the true data.
- the random resampling unit takes a fourth predetermined number of data from the database and randomly resamples from them a second predetermined number of data that is smaller than the fourth predetermined number.
- different data is supplied for each learning unit.
- Each learning unit analyzes the supplied data, that is, peptide sequences consisting of the first predetermined number of amino acids and their predetermined physical properties, and derives a certain hypothesis, namely a data set giving the predetermined physical property for each of a third predetermined number of peptide sequences.
- the target sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by each learning unit, and the target physical property extraction unit extracts the physical property specified by the set peptide sequence from the hypothesis of each learning unit.
- the variance evaluation unit evaluates the variance of the physical properties extracted from each learning unit, and the question point extraction unit, based on the evaluated variance, extracts the target peptide sequences for which true data on the hypothetical physical property is requested. In this way the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit. Furthermore, the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data, thereby constructing a database that supports sequence prediction.
- a sequence prediction program causes a computer device to function as a sequence prediction system including a database that stores data including peptide sequences consisting of a first predetermined number of amino acids and physical properties serving as an index of a predetermined physiological activity of those peptide sequences;
- a plurality of learning units for deriving a hypothesis obtained for a third predetermined number of peptide sequences from the peptide sequences and physical properties;
- a random resampling unit that takes a fourth predetermined number of data from the database and randomly supplies each learning unit with a second predetermined number of data;
- a target sequence setting unit that sets a predetermined peptide sequence included in the hypotheses derived by the learning units;
- a target physical property extraction unit that extracts, from the hypothesis of each learning unit, the physical property specified by the set peptide sequence;
- a variance evaluation unit that evaluates the variance of the extracted physical properties;
- a question point extraction unit that, based on the evaluated variance, extracts a peptide sequence for which true data on the hypothetical physical property is requested;
- a data update unit that accepts the requested true data and associates the physical property based on the true data with the extracted peptide sequence;
- a data control unit that stores in the database new data including the peptide sequence obtained by the data update unit and the physical property based on the true data;
- a sequence input accepting unit that accepts the entire amino acid sequence of a predetermined protein;
- a sequence candidate extraction unit that extracts peptide sequence candidates from the entire amino acid sequence received by the sequence input accepting unit and sends the extracted candidates to the learning units; and
- a physical property estimation unit that estimates the physical properties of the extracted peptide sequence candidates from the results obtained in each learning unit.
- A fourth predetermined number of data is taken from the database, randomly resampled by the random resampling unit into a second predetermined number of data, smaller than the fourth predetermined number, and sent to each learning unit.
- Different data is supplied to each learning unit.
- Each learning unit analyzes the supplied data and derives a hypothesis, that is, a data set of predetermined physical properties for a third predetermined number of peptide sequences, from the peptide sequences consisting of the first predetermined number of amino acids and their predetermined physical properties.
- The target sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by each learning unit, and the target physical property extraction unit extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis of each learning unit.
- The variance evaluation unit evaluates the variance of the physical properties extracted from each learning unit, and the question point extraction unit extracts the peptide sequence for which true data for the hypothesized physical properties is requested. In this way the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit.
- the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data.
- the sequence input accepting unit accepts the entire amino acid sequence of a given protein, extracts peptide sequence candidates to be predicted from the entire amino acid sequence, and sends the peptide sequence candidate to the learning unit.
- the physical property estimation unit estimates the physical properties of the extracted peptide sequence candidates from the results obtained in each learning unit.
- In this way, the general-purpose computer apparatus functions as a sequence prediction system.
- A computer device is made to function as a sequence prediction support system including: a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequence;
- a plurality of learning units that derive, from the peptide sequences and physical properties and based on a second predetermined number of the data, a hypothesis obtained for a third predetermined number of peptide sequences;
- a random resampling unit that randomly supplies each learning unit with a second predetermined number of data from the database; a target sequence setting unit that sets a predetermined peptide sequence included in the hypothesis derived by each learning unit;
- a data update unit that receives the true data and associates the physical properties based on the true data with the extracted peptide sequence;
- and a data control unit that accumulates in the database new data including the peptide sequence obtained by the data update unit and the physical properties based on the true data.
- A fourth predetermined number of data is taken from the database, randomly resampled by the random resampling unit into a second predetermined number of data, smaller than the fourth predetermined number, and sent to each learning unit.
- different data is supplied for each learning unit.
- Each learning unit analyzes the supplied data and derives a hypothesis, that is, a data set of predetermined physical properties for a third predetermined number of peptide sequences, from the peptide sequences consisting of the first predetermined number of amino acids and their predetermined physical properties.
- The focused sequence setting unit sets a predetermined peptide sequence for comparing the hypotheses derived by each learning unit, and the focused physical property extraction unit extracts the physical property specified by the set predetermined peptide sequence from the hypothesis of each learning unit.
- The variance evaluation unit evaluates the variance of the physical properties extracted from each learning unit, and the question point extraction unit, based on the evaluated variance, extracts the peptide sequence for which true data for the hypothesized physical properties is requested. In this way the hypotheses are compared.
- the data update unit receives the true data, associates the true data with the extracted peptide sequence, and sends it to the data control unit. Furthermore, the data controller updates the contents of the database by adding data including the peptide sequence and physical properties based on the true data, thereby constructing a database that supports sequence prediction.
- In this way, the general-purpose computer device functions as a sequence prediction support system.
- The sequence prediction system includes: a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequence;
- a plurality of hypothesis deriving units that randomly extract a fourth predetermined number of data from the database and, based on a second predetermined number of data randomly sent from among the fourth predetermined number of data, derive from the peptide sequences and physical properties a hypothesis obtained for a third predetermined number of peptide sequences;
- a question point sequence extraction unit that sets a predetermined peptide sequence included in the hypothesis derived by each hypothesis deriving unit, extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis data of each hypothesis deriving unit, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, determines the peptide sequence for which true data for the hypothesized physical properties is requested;
- a data control unit that accumulates in the database new data including the peptide sequence and the physical properties based on the true data;
- and a physical property estimation output unit that receives the entire amino acid sequence of a predetermined protein, extracts peptide sequence candidates to be predicted from the received entire amino acid sequence, sends the extracted peptide sequence candidates to the hypothesis deriving units, and estimates the physical properties of the candidates from the output results.
- A sequence extraction unit may be further provided that extracts, from among the peptide sequence candidates whose physical properties were estimated by the physical property estimation output unit, the candidates whose physical properties satisfy a predetermined condition.
- The sequence prediction support system includes: a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties serving as an index of a predetermined physiological activity of the peptide sequence;
- a plurality of hypothesis deriving units that randomly extract a fourth predetermined number of data from the database and, based on a second predetermined number of data randomly sent from among the fourth predetermined number of data, derive from the peptide sequences and physical properties a hypothesis obtained for a third predetermined number of peptide sequences;
- a question point sequence extraction unit that sets a predetermined peptide sequence included in the hypothesis derived by each hypothesis deriving unit, extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis of each hypothesis deriving unit, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequence for which true data for the hypothesized physical properties is requested; a data update unit that accepts the requested true data and associates the physical properties based on the true data with the extracted peptide sequence; and a data control unit that stores in the database new data including the peptide sequence obtained by the data update unit and the physical properties based on the true data.
- A computer device is made to function as a sequence prediction system including: a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequence;
- a plurality of hypothesis deriving units that randomly extract a fourth predetermined number of data from the database and, based on a second predetermined number of data randomly sent from among the fourth predetermined number of data, derive from the peptide sequences and physical properties a hypothesis obtained for a third predetermined number of peptide sequences;
- a question point sequence extraction unit that sets a predetermined peptide sequence included in the hypothesis derived by each hypothesis deriving unit, extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis of each hypothesis deriving unit, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts a peptide sequence for which true data for the hypothesized physical properties is requested;
- a data update unit that receives the requested true data and associates the physical properties based on the true data with the extracted peptide sequence; a data control unit that accumulates in the database new data including the peptide sequence obtained by the data update unit and the physical properties based on the true data;
- and a physical property estimation output unit that receives the entire amino acid sequence of a given protein, extracts peptide sequence candidates to be predicted from the received entire amino acid sequence, sends the extracted peptide sequence candidates to the hypothesis deriving units, and estimates the physical properties of the extracted peptide sequence candidates from the output results.
- The computer device is made to function as a sequence prediction support system including: a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequence;
- a plurality of hypothesis deriving units that randomly extract a fourth predetermined number of data from the database and, based on a second predetermined number of data randomly sent from among the fourth predetermined number of data, derive from the peptide sequences and physical properties a hypothesis obtained for a third predetermined number of peptide sequences;
- a question point sequence extraction unit that sets a predetermined peptide sequence included in the hypothesis derived by each hypothesis deriving unit, extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis data of each hypothesis deriving unit, evaluates the variance of the extracted physical properties, and, based on the evaluated variance, extracts the peptide sequence for which true data for the hypothesized physical properties is requested;
- a data update unit that receives the requested true data and associates the physical properties based on the true data with the extracted peptide sequence; and a data control unit that stores in the database new data including the peptide sequence obtained by the data update unit and the physical properties based on the true data.
- The sequence prediction method includes: a random resampling stage in which, from a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that serve as an index of a predetermined physiological activity of the peptide sequence,
- the random resampling unit extracts a fourth predetermined number of data and randomly supplies a second predetermined number of data from the fourth predetermined number of data to each of the plurality of learning units;
- a hypothesis derivation stage in which each learning unit derives, from the peptide sequences and physical properties and based on the second predetermined number of data, a hypothesis obtained for a third predetermined number of peptide sequences;
- a target sequence setting stage for setting a predetermined peptide sequence included in the hypothesis derived in each learning unit; a stage in which the physical properties specified by the set predetermined peptide sequence are extracted from the hypothesis data of each learning unit;
- a question point extraction stage for extracting a peptide sequence for which true data is to be requested;
- a data update stage in which the requested true data is received, the physical properties based on the true data are associated with the extracted peptide sequence, and new additional data including the extracted peptide sequence and the physical properties based on the true data is accumulated in the database; a stage of accepting the entire amino acid sequence of a given protein;
- and a physical property estimation stage in which the peptide sequence candidates extracted from it are sent to the learning units and the physical properties of the candidates are estimated from the results obtained in each learning unit.
- A sequence prediction support method is also included in the embodiments of the present invention. That is, the method includes: a random resampling stage in which, from a database that stores data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequence, a fourth predetermined number of data is extracted by a random resampling unit
- and a second predetermined number of data is randomly supplied from the fourth predetermined number of data to each of the plurality of learning units; a hypothesis derivation stage in which each learning unit derives, from the peptide sequences and physical properties and based on the second predetermined number of data, a hypothesis obtained for a third predetermined number of peptide sequences; a target sequence setting stage for setting a predetermined peptide sequence included in the hypothesis derived by each learning unit;
- a physical property extraction stage that extracts the physical properties specified by the set predetermined peptide sequence from the hypothesis of each learning unit; a variance evaluation stage that evaluates the variance of the physical properties extracted from each learning unit;
- a question point extraction stage that, based on the evaluated variance, extracts the peptide sequence for which true data for the hypothesized physical properties is requested;
- and a data update stage in which the requested true data is accepted, the physical properties based on the true data are associated with the extracted peptide sequence, and new additional data including the obtained peptide sequence and the physical properties based on the true data is stored in the database.
- FIG. 1 is a block diagram showing an overview of a sequence prediction system according to the first embodiment of the present invention.
- FIG. 2 is a diagram showing an example of a data set stored in a storage device.
- FIG. 3 is a diagram showing an example of the existence probability of each amino acid at each aligned position of virtual peptide sequences tabulated based on probability parameters calculated by a learning unit.
- FIG. 4 is a diagram illustrating an example of a hypothesis output by a learning unit.
- FIG. 5 is a diagram schematically showing an example of data for question point extraction.
- FIG. 6 shows an example in which the sequence candidate extraction unit is configured to exclude unnecessary peptide sequences.
- FIG. 7 is a block diagram showing an overview of a sequence prediction system according to a second embodiment of the present invention.
- FIG. 9 is a diagram showing a case where a request for true data is made to an external database, not to a user.
- FIG. 10 is a flowchart explaining the operation of the sequence prediction support method according to the first embodiment.
- FIG. 11 is a flowchart showing the operation of a sequence prediction system using a database constructed by a sequence prediction support system or an existing database.
- FIG. 12 is a flow chart illustrating the operation of the sequence prediction support method according to the second embodiment.
- FIG. 13 is a flowchart showing the operation of the sequence prediction system using the database constructed by the sequence prediction support system according to the second embodiment.
- FIG. 1 is a block diagram showing an overview of the sequence prediction system according to the first embodiment of the present invention.
- This sequence prediction system includes: a storage device 126, which is a database holding data sets each including a biopolymer sequence and an attribute value of the biopolymer of this sequence;
- a data control unit 128 as a selection unit that selects N data sets from the storage device 126; a generation unit 102 that generates a plurality of different data subsets from the data sets; a learning unit 104 that generates a hypothesis for each data subset, applies each hypothesis to a second data set consisting of biopolymer sequences independent of the data set, and derives the attribute values of the biopolymer sequences of the second data set;
- a question point extraction unit 118 that obtains the variance of the attribute values for each biopolymer sequence of the second data set and extracts a biopolymer sequence having a variance larger than a certain standard as a question point;
- the data control unit 128, by which the attribute value received for this question point is associated with the biopolymer sequence related to the question point and stored in the storage device 126;
- a sequence input receiving unit 130; a sequence candidate extracting unit 131 that extracts the biopolymer sequence candidates to be predicted from the entire sequence received by the sequence input receiving unit 130;
- and the learning unit 104 acting as an attribute value estimation unit that, after the sequence input is received, generates rules from all the data sets of the storage device 126, applies each rule to each candidate biopolymer sequence, and estimates the attribute value.
- the storage device 126 is a database that accumulates a data set including peptide sequences as biopolymer sequences and attribute values of the peptide sequences.
- This data set is composed of known data (referred to as “known data”) that has been clarified by documents or the like, or data sent from the data receiving unit 122 through the data control unit 128 described later.
- FIG. 2 is a diagram showing an example of a data set stored in the storage device 126.
- As shown in FIG. 2, this data set consists of a peptide sequence consisting of a predetermined number of amino acids and an attribute value of this peptide sequence, for example a physical property that is an index of a predetermined physiological activity, such as the binding constant (-log Kd) to a human leukocyte antigen (HLA) complex, an antigen-presenting molecule closely related to immune induction.
- The number of amino acids in the peptide sequence can be a fixed value of 8 to 11, for example 9, when targeting HLA class I molecules, and a fixed value of 20 or less when targeting HLA class II molecules.
- A peptide sequence whose binding target is HLA, an antigen-presenting molecule, is described here as an example of a biopolymer sequence, but the biopolymer sequence may have another physiological activity.
- For example, it may be a peptide sequence that targets a G protein-coupled receptor having a peptide as a ligand, or it may be the base sequence of a nucleic acid (such as DNA) encoding a predetermined peptide sequence as described above.
- Examples of biological macromolecules having a predetermined physiological activity also include DNAs and RNAs composed of a predetermined number of nucleotides and having a predetermined base sequence.
- The attribute value of the biopolymer sequence includes a physical property that serves as an index of the binding ability to a predetermined substance.
- In addition to the binding constant to the binding target, this physical property may be, for example, hydrophobicity (or hydrophilicity).
- The data control unit 128 functions as a selection unit that selects N data sets, and the selected N data sets are sent to the generation unit 102. Further, the data control unit 128 updates the data content of the storage device 126 by sending an additional data set sent from the data reception unit 122 to the storage device 126, as will be described later. In addition, when the entire sequence of a predetermined biopolymer is input from the sequence input receiving unit 130 described later, the data control unit 128 takes out all the data sets stored in the storage device 126 and sends them to the learning unit 104 acting as an attribute value estimation unit.
- The generation unit 102 randomly samples from the N data sets sent from the data control unit 128, generates data subsets each consisting of an arbitrary number m (N > m) of data, and sends the subsets to the learning unit 104.
- each data subset may be the same number of data sets or may be a different number of data sets.
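The subset generation performed by the generation unit 102 can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `make_subsets`, the `seed` parameter, the choice of sampling without replacement inside each subset, and the placeholder sequences and values are all assumptions made for the example.

```python
import random

def make_subsets(data, n_subsets, m, seed=0):
    """Draw n_subsets random subsets of m records each from N records (m < N)."""
    if not 0 < m < len(data):
        raise ValueError("m must satisfy 0 < m < N")
    rng = random.Random(seed)
    # Assumption: no record repeats inside one subset; subsets may overlap.
    return [rng.sample(data, m) for _ in range(n_subsets)]

# Placeholder (peptide sequence, attribute value) records, not real measurements.
data = [
    ("AAAAAAAAA", 5.0), ("CCCCCCCCC", 6.5), ("DDDDDDDDD", 4.2),
    ("EEEEEEEEE", 7.1), ("FFFFFFFFF", 3.3), ("GGGGGGGGG", 5.8),
]
subsets = make_subsets(data, n_subsets=4, m=3)
```

Each subset would then be handed to the learning unit, which derives one hypothesis per subset. Subsets of different sizes, which the text also allows, could be obtained by passing a different `m` per call.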
- When a data subset is sent from the generation unit 102, the learning unit 104 generates a hypothesis, described later, for each data subset; when a data set is sent from the data control unit 128, it generates a rule for estimating an attribute value of a candidate peptide sequence described later, for example the binding constant in FIG. 2.
- the learning unit 104 may include a plurality of calculation units, and each calculation unit may be configured to perform processing related to a plurality of data subsets in parallel, or may include a single calculation unit. The processing may be performed serially for each data subset.
- arithmetic processing is performed according to the procedure of the hidden Markov model learning system described in Japanese Patent No. 3094860, for example.
- The top row shows that, for the first or ninth amino acid, methionine (M) has a probability of 29%, isoleucine (I) has a probability of 16%, and valine (V) has a probability of 12%. The remaining 43% is the total probability of the other amino acids.
- The aligned positions of the 8 amino acids are shown in order from left to right. According to this, the probability that threonine (T), at the far left, is in the first position is 1%, and the probability that it is in the second position is 22%. The existence probabilities continue in this way to the right, and the top three amino acids are shown above each aligned position. The parameter storage device 140 is configured to store each probability parameter used for tabulating hypotheses composed of such parameters.
- Non-Patent Document 1 The outline is as follows.
- L is the peptide sequence O in a given HMM (Hidden Markov Model)
- LKa ′ represents the average value of gKa of all peptides used in the calculation.
- H indicates a reference HMM when the existence probability is uniform.
- The learning unit 104 applies the hypothesis to a second data set consisting of biopolymer sequences independent of the data set extracted by the data control unit 128, derives the attribute values of the biopolymer sequences of the second data set, and sends them to the question point extraction unit 118.
- This second data set contains, for example, 100,000 peptide sequences, and hypotheses from multiple data subsets are applied to this second data set, respectively.
- As a result, as many sets of attribute values for the sequences of the second data set are generated as there are data subsets.
- The peptide sequences of the second data set may be a variable set that is defined each time a data subset is sent from the generation unit 102, or a set that is arbitrarily entered or selected by the person using this system. They may also be included in a predetermined data table.
- When a data set is sent from the data control unit 128, the learning unit 104 acts as an attribute value estimation unit. That is, a rule is generated based on probability parameters obtained by performing the same operation as described above. Unlike the generation of hypotheses, a set of rules is generated. For each candidate peptide sequence sent from the sequence candidate extraction unit 131 described later, an estimated value is obtained by applying the rule, and this estimated value is associated with the corresponding candidate peptide sequence as its attribute value and sent to the peptide database 138.
- a calculation process is performed to obtain the variance of attribute values for each peptide sequence in the second data set.
- FIG. 4 shows an example of the result of this calculation process.
- ori indicates a binding constant as a temporary score of an attribute value that is a starting point of calculation in the learning unit 104.
- 0.0000 is assigned as an initial value for all peptide sequences.
- Mean means the average value of the prediction scores derived for each specific peptide sequence in the second data set, max in the same row is the maximum of the same prediction score, and min in the same row is the minimum of the same prediction score.
- In the same table, sd represents the standard deviation of the prediction scores, and var represents the variance of the prediction scores.
- the question point extraction unit 118 extracts in order from the one with the largest variance.
- Figure 5 schematically shows the ranking in the dataset.
- Peptide sequences, as biopolymer sequences, within a certain range, for example the top 50 in order of decreasing variance, are extracted as the question points of this data set, and the extracted peptide sequences are sent to the data request unit 120.
- Alternatively, peptide sequences with a variance greater than a predetermined value may be extracted as question points.
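The statistics of FIG. 4 and the two selection rules described above (top-ranked by variance, or variance above a predetermined value) can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, and the use of population variance (`statistics.pvariance`) are assumptions, since the text does not say which variance estimator is used, and the scores are made-up placeholders.

```python
from statistics import mean, pstdev, pvariance

def prediction_stats(scores):
    """Summarize the scores that the hypotheses predict for one peptide."""
    return {"mean": mean(scores), "max": max(scores), "min": min(scores),
            "sd": pstdev(scores), "var": pvariance(scores)}

def extract_question_points(predictions, top_k=None, threshold=None):
    """predictions maps each peptide to a list of scores, one per hypothesis."""
    stats = {pep: prediction_stats(s) for pep, s in predictions.items()}
    # Rank peptides from the largest variance to the smallest.
    ranked = sorted(stats, key=lambda pep: stats[pep]["var"], reverse=True)
    if threshold is not None:
        ranked = [pep for pep in ranked if stats[pep]["var"] > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked, stats

# Placeholder scores from three hypotheses for three peptides.
predictions = {
    "AAAAAAAAA": [5.0, 5.0, 5.0],
    "CCCCCCCCC": [2.0, 6.0, 10.0],
    "DDDDDDDDD": [4.0, 5.0, 6.0],
}
points, stats = extract_question_points(predictions, top_k=2)
by_threshold, _ = extract_question_points(predictions, threshold=1.0)
```

Peptides on which the hypotheses disagree most are the ones for which a true measurement is most informative, which is why they are selected as question points.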
- For the peptide sequence related to the question point extracted by the question point extraction unit 118, the data request unit 120 requests data indicating the true attribute value, for example measurement data obtained by an experiment or data such as literature stored in an external database.
- The data reception unit 122 accepts measurement data input by the user in response to a request from the data request unit 120, or data such as literature obtained from a predetermined database as described later, and sends these data to the data control unit 128 as data indicating true attribute values.
- The data control unit 128 associates the data sent from the data receiving unit 122 with the peptide sequence selected as the question point, generates an additional data set including this peptide sequence and the attribute value related to this data, and sends it to the storage device 126.
- As described above, this additional data set is accumulated in the storage device 126 and becomes a candidate for data in the subsequent hypothesis derivation.
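The update step can be sketched as follows, assuming the stored data is a simple list of (peptide sequence, attribute value) records. The function name `add_true_data` and the data layout are hypothetical, and the values are placeholders.

```python
def add_true_data(store, question_points, true_values):
    """Pair each question-point peptide with its true value and store the pair."""
    added = []
    for peptide in question_points:
        # A true value may not be available yet for every question point.
        if peptide in true_values:
            record = (peptide, true_values[peptide])
            store.append(record)
            added.append(record)
    return added

store = [("AAAAAAAAA", 5.0)]
added = add_true_data(store, ["CCCCCCCCC", "DDDDDDDDD"], {"CCCCCCCCC": 6.5})
```

After the update, the enlarged store is what the next round of random resampling would draw from.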
- The sequence input receiving unit 130 receives, as information for specifying the candidate peptide sequences to be predicted, the entire amino acid sequence of a predetermined protein, for example a target protein for which identification of an epitope is desired, such as a virus antigen.
- The received data is sent to the sequence candidate extraction unit 131.
- This input may be made through a user interface by a predetermined input device, or via a network connected to the user interface.
- Target proteins other than virus antigens include proteins of bacteria and other pathogens involved in infectious diseases, such as Mycobacterium tuberculosis, O-157, Salmonella, Pseudomonas aeruginosa, Helicobacter pylori, Staphylococcus aureus, and malaria.
- The system can also be applied to proteins involved in diseases such as type I diabetes, Sjögren's syndrome, hay fever, atopy, asthma, rheumatism, collagen disease, autoimmune diseases, allergic diseases, and rejection of organ transplants; proteins such as cancer antigens involved in cancer immunity; and proteins such as beta amyloid, the causative protein of Alzheimer's disease.
- The sequence candidate extraction unit 131 extracts peptide sequence candidates to be predicted based on the entire amino acid sequence of a predetermined protein, which is the information received by the sequence input reception unit 130, and sends the extracted peptide sequence candidates to the learning unit 104.
- The peptide sequences extracted by the sequence candidate extraction unit 131 may include sequences that cannot be used in practice. Such unnecessary peptide sequences may be excluded automatically, without human assistance.
- FIG. 6 shows an example in which the sequence candidate extraction unit 131 is configured to exclude unnecessary peptide sequences.
- The sequence candidate extraction unit 131 includes a candidate extraction unit 150 that extracts candidates from the entire amino acid sequence of the predetermined protein sent from the sequence input reception unit 130, in peptide extraction units of p monomer units, for example 8 to 11, particularly 9, amino acids, and an unnecessary sequence exclusion unit 152 that removes, from the extracted peptide sequence candidates, peptide sequences that satisfy a predetermined condition and do not require prediction.
- A peptide sequence of one peptide extraction unit is extracted from the beginning of the entire amino acid sequence received by the sequence input reception unit 130, and subsequent peptide sequence candidates are extracted at offsets of q monomer units. For example, each peptide extraction unit is extracted while shifting downstream one amino acid at a time.
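The windowed extraction described above can be sketched as follows, with p the window length and q the shift, as in the text. The function name `extract_candidates` and the placeholder input string are assumptions for illustration.

```python
def extract_candidates(sequence, p=9, q=1):
    """Cut windows of p residues from the start, shifting q residues each time."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    # The last window starts at len(sequence) - p; shorter inputs yield nothing.
    return [sequence[i:i + p] for i in range(0, len(sequence) - p + 1, q)]

# Placeholder 11-residue string, not a real protein sequence.
candidates = extract_candidates("ABCDEFGHIJK", p=9, q=1)
```

With p = 9 and q = 1, a protein of length n gives n - 8 overlapping candidates, so every possible 9-residue window is considered.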
- The unnecessary sequence exclusion unit 152 refers to an unnecessary sequence database that accumulates data relating to peptide sequences that satisfy a predetermined condition and do not require prediction, that is, data relating to unnecessary peptide sequences.
- Peptide sequences identified by this reference are excluded from the prediction candidates before being sent to the learning unit 104, and the remaining peptide sequence candidates are sent to the learning unit 104.
- unnecessary peptide sequences have low water solubility, for example! / ⁇ Peptide sequences are fisted.
- for example, when an epitope of a virus antigen, such as a CTL epitope of hepatitis C virus, is to be identified, candidate peptide sequences that may act as the epitope are extracted from the entire amino acid sequence of the hepatitis C virus antigen protein received by the sequence input accepting unit 130.
- the epitope of the hepatitis C virus antigen consists of 8 to 11 amino acids presented on human leukocyte antigen (HLA) class I molecules, the specific proteins that induce immunity, and CTLs are known to recognize this part and damage the hepatitis C virus.
- a peptide of 8 to 11 amino acids is extracted as the p-monomer extraction unit from the beginning of the entire amino acid sequence of the hepatitis C virus antigen, and the next peptide is extracted from a position shifted by q monomer units from the beginning, for example from the second amino acid when the shift is one amino acid.
- the starting amino acid is shifted downstream one amino acid at a time in this way, and each extracted peptide sequence becomes a candidate whose attribute value is to be estimated.
- candidate peptide sequences are extracted from the entire amino acid sequence of the received protein, and unnecessary peptide sequences are excluded from the extracted peptide sequences before the physical properties are predicted. This eliminates unnecessary estimation operations in the learning unit 104.
- the unnecessary sequence database 154 may be a part of the storage device 126.
- data related to physical properties such as hydrophobicity may be added to a part of the data as shown in FIG.
- the data accumulated in the unnecessary sequence database 154 includes information on peptide sequences that require licenses from other companies, and is configured to exclude such peptide sequences, for example, for the development of new drugs.
- the present embodiment can be used for the purpose of extracting necessary peptide sequence candidates.
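The window extraction and exclusion described for the sequence candidate extraction unit 131 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the function names `extract_candidates` and `exclude_unnecessary`, the example sequence, and the contents of the unnecessary-sequence set are hypothetical. Only the window length p (8 to 11, here 9) and the shift q (here 1) come from the text.

```python
from typing import Iterable, List, Set


def extract_candidates(protein: str, p: int = 9, q: int = 1) -> List[str]:
    """Slide a window of p residues along the sequence, moving q residues per step."""
    return [protein[i:i + p] for i in range(0, len(protein) - p + 1, q)]


def exclude_unnecessary(candidates: Iterable[str], unnecessary: Set[str]) -> List[str]:
    """Drop candidates listed in the (hypothetical) unnecessary-sequence database."""
    return [c for c in candidates if c not in unnecessary]


# Illustrative 15-residue sequence; not taken from the patent.
protein = "MSTNPKPQRKTKRNT"
candidates = extract_candidates(protein, p=9, q=1)

# Stand-in for the unnecessary sequence database 154 (e.g. poorly soluble peptides).
unnecessary_db = {"STNPKPQRK"}
remaining = exclude_unnecessary(candidates, unnecessary_db)
```

Only `remaining` would be passed on for attribute estimation, which is how the exclusion step saves estimation work.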
- the peptide database 138 accumulates a data set composed of combinations of an attribute value estimated by the learning unit 104, for example a binding constant to an HLA class I molecule, and the peptide sequence having this attribute value.
- the condition input receiving unit 134 receives an input of an attribute value, for example a binding constant, that serves as a keyword for extracting a peptide sequence having a predetermined physical property from the peptide database 138. As with the sequence input receiving unit 130, this input may be made through a user interface using a predetermined input device, or through a network connected to the user interface.
- an input of a condition (attribute value) required according to the use of the peptide sequence to be extracted is accepted.
- for example, a condition that the binding constant to an HLA class I molecule, the predetermined protein, is higher than 6 is accepted as the keyword.
- the sequence extraction unit 136 extracts a peptide sequence that satisfies the conditions received by the condition input reception unit 134 from the peptide database 138, and outputs the extracted peptide sequence as a prediction result.
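The condition-based extraction from the peptide database 138 can be illustrated with a short sketch. The dictionary layout, the function name `extract_by_condition`, the example peptides, and their values are assumptions made for illustration; sorting the hits by value is an added convenience that the text does not describe. The threshold of 6 follows the example in the text.

```python
from typing import Dict, List


def extract_by_condition(peptide_db: Dict[str, float], threshold: float) -> List[str]:
    """Return peptide sequences whose attribute value exceeds the threshold, best first."""
    hits = [(seq, value) for seq, value in peptide_db.items() if value > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)  # ordering is an assumption
    return [seq for seq, _ in hits]


# Hypothetical database: peptide sequence -> estimated binding constant (-log Kd).
peptide_db = {
    "AAAAAAAAA": 5.2,
    "CCCCCCCCC": 6.8,
    "DDDDDDDDD": 6.0,
    "EEEEEEEEE": 7.4,
}
result = extract_by_condition(peptide_db, threshold=6.0)
```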
- the sequence input was accepted.
- the learning unit 104 receives an input to that effect, for example a peptide sequence whose binding constant has been estimated and information on the number of substitutions indicating how many amino acids are to be substituted in the peptide sequence.
- the estimation stage calculation can then be performed, and the attribute value of the new peptide sequence can be estimated based on the calculation result.
- FIG. 7 is a block diagram showing an overview of the sequence prediction system according to the second embodiment of the present invention.
- This sequence prediction system includes a storage device 126 which is a database for storing data including a peptide sequence composed of a first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide of this peptide sequence.
- a hypothesis deriving unit comprising a plurality of learning units 112, each of which derives, based on a second predetermined number of the data, a hypothesis that gives the physical properties for a third predetermined number of peptide sequences, and a random resampling unit 110 that extracts a fourth predetermined number of data from the storage device 126 and supplies the second predetermined number of data to each learning unit 112 at random;
- a question point sequence extraction unit comprising a hypothesis comparison unit 114, which has a target sequence setting unit 160 (FIG. 8), a target physical property extraction unit 162 (FIG. 8) that extracts the physical properties from each of the hypotheses, and a variance evaluation unit 164 (FIG. 8) that evaluates the variance of the physical properties extracted from each learning unit 112, and a question point extraction unit 118 that extracts, based on the evaluated variance, a peptide sequence for which true data for the physical property of the hypothesis is requested;
- a data control unit 128 that accumulates, in the storage device 126, new data including the peptide sequence obtained by the data update unit and physical properties based on the true data;
- a sequence input receiving unit 130 that accepts the entire amino acid sequence of a predetermined protein; and
- a physical property estimation output unit comprising a sequence candidate extraction unit 131 that extracts the peptide sequence candidates to be predicted from the entire amino acid sequence received by the sequence input receiving unit 130 and sends the extracted candidates to each learning unit 112, and a physical property estimation unit 132 that estimates the physical properties of the extracted peptide sequence candidates from the results obtained by the learning units 112.
- the storage device 126 is a database that accumulates a data set of data already described in the literature ("known data"), each item including a peptide sequence composed of the first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of the peptide sequence. As will be described later, it can also be updated with additional data sent through the data control unit 128.
- FIG. 2 is a diagram showing an example of a data set stored in the storage device 126.
- this data set combines known data and true data. Each item includes a peptide sequence consisting of the first predetermined number of amino acids and physical properties that are indicative of a predetermined physiological activity of this peptide sequence, for example a binding constant (-log Kd) to a human leukocyte antigen (HLA) complex, an antigen-presenting molecule closely related to immune induction.
- the number of amino acids serving as the first predetermined number is a fixed value of 8 to 11, for example 9, when targeting HLA class I molecules, and a fixed value of 20 or less when targeting HLA class II molecules.
- here, a peptide sequence whose binding target is HLA, an antigen-presenting molecule, is shown as an example of the peptide sequence to be sought. It may instead be a peptide sequence targeting a conjugated receptor, or a base sequence of a nucleic acid (such as DNA) encoding a predetermined peptide sequence as described above.
- the physical properties that serve as an index of the binding ability to a predetermined substance may include physical properties related to binding, such as hydrophobicity (or hydrophilicity).
- each learning unit 112 derives a hypothesis based on the data resampled by the random resampling unit 110 described later.
- if necessary, additional data including the true data added by the data adding unit 124 described later is sent to the storage device 126, and the data set stored in the storage device 126 is updated.
- the random resampling unit 110 randomly resamples a second predetermined number of data from the fourth predetermined number of data sent from the data control unit 128 and supplies the data to each learning unit 112.
- the data control unit 128 and the random resampling unit 110 are interlocked so that the same number of different data (samples) are randomly supplied to each learning unit 112. For example, when 100 data are extracted from the storage device 126 as the fourth predetermined number and 50 data are supplied to each learning unit 112 as the second predetermined number, the data are handled so that the same data are not supplied to all the learning units 112: 50 data are randomly resampled from the 100 and sent to one learning unit 112, another 50 data are randomly resampled and sent to another learning unit 112, and so on, until a different set of 50 data has been supplied to every learning unit. This prevents the same hypothesis from being derived by each learning unit 112. In this way, prediction by this system can be performed even when only about several hundred measured values (reference values) at most are available.
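The random resampling step can be sketched as below, using the figures from the text (100 data drawn from storage, 50 data per learning unit, 50 learning units). The function name `resample_for_learners`, the record layout, the placeholder identifiers, and the fixed seed are assumptions for illustration. Drawing each subset independently without replacement is one plausible reading of the description, not a confirmed detail of the patented system.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, float]  # (peptide identifier, measured attribute value)


def resample_for_learners(
    pool: Sequence[Record], n_learners: int, subset_size: int, seed: int = 0
) -> List[List[Record]]:
    """Draw one random subset per learning unit, without replacement within a subset."""
    rng = random.Random(seed)
    return [rng.sample(list(pool), subset_size) for _ in range(n_learners)]


# 100 placeholder records standing in for data extracted from storage device 126.
pool = [(f"PEP{i:06d}", 5.0 + i * 0.01) for i in range(100)]

# 50 learning units, each receiving its own 50-record subset.
subsets = resample_for_learners(pool, n_learners=50, subset_size=50)
```

Because every subset is drawn separately, the learning units see overlapping but different data, which is what allows their hypotheses to disagree.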
- the learning unit 112 performs processing according to the purpose at the learning stage and the estimation stage.
- when the learning stage calculation is to be performed, the data control unit 128 sends the control signal cont to each learning unit 112, and a learning unit 112 that receives the control signal cont performs the learning stage calculation.
- when the control signal cont is not input, the estimation stage calculation is performed.
- in the learning stage, a plurality of learning units 112, for example 50 learning units, each perform a probability calculation on the input data in accordance with, for example, the procedure of the hidden Markov model learning system described in Japanese Patent No. 3094860, and the calculation results are stored in the parameter storage device 140.
- the probability parameters accumulated in the parameter storage device 140 consist of the existence probability of each amino acid at each sequence position of a peptide sequence consisting of the first predetermined number of amino acids, for example 9, and the transition probabilities between adjacent sequence positions.
- the probabilities are aggregated according to the probability parameters accumulated in the parameter storage device 140, so that the existence probability of each amino acid at each sequence position of a virtual peptide sequence, as shown in FIG. 3, is obtained.
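As a rough picture of what "the existence probability of each amino acid at each sequence position" means, the sketch below counts residue frequencies per position over a set of equal-length peptides. It is a deliberately simplified illustration: it does not implement the hidden Markov model procedure of Japanese Patent No. 3094860 and ignores transition probabilities. The function name and the example peptides are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def position_probabilities(peptides: List[str]) -> List[Dict[str, float]]:
    """For each position, return the relative frequency of every residue seen there."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()})
    return table


# Four illustrative 9-residue peptides; not data from the patent.
peptides = ["ALAAAAAAV", "ALCAAAAAV", "GLAAAAAAL", "ALAAAAAAV"]
table = position_probabilities(peptides)
```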
- a third predetermined number of peptide sequences, for example 100,000, are assembled based on the aggregation results as shown in FIG.
- hypothesis data is obtained by calculating, for each of these peptide sequences, a prediction score corresponding to the binding constant.
- This hypothesis data is sent to the hypothesis comparison unit 114.
- the hypothesis data may be sent to the data control unit 128.
- this third predetermined number of peptide sequence sets may be a variable set that is set each time the learning phase calculation starts, and can be arbitrarily entered or selected by the person using this system. It may be a set.
- the calculation in the estimation stage is performed in substantially the same way as the calculation in the learning stage, but the score of the binding constant corresponding to each peptide sequence obtained in each learning unit 112 is sent not to the hypothesis comparison unit 114 but to the physical property estimation unit 132 described later.
- the probability parameters stored in the parameter storage device 140 are overwritten every time random resampling of the data is performed in the learning stage, and in the estimation stage the score is calculated using the probability parameters stored last.
- FIG. 8 shows a functional block diagram for explaining the function of the hypothesis comparison unit 114.
- the hypothesis comparison unit 114 includes a target sequence setting unit 160, a target physical property extraction unit 162, and a variance evaluation unit 164.
- the target sequence setting unit 160 sets a peptide sequence to be compared in order to determine how far the hypotheses derived by the learning units 112 have converged.
- the peptide sequence set here is one of those listed as the peptide sequences of the data that make up each hypothesis.
- the focused physical property extracting unit 162 extracts the physical properties specified by the peptide sequence set by the focused sequence setting unit 160 from the hypothesis data.
- the variance evaluation unit 164 calculates the variance of the physical properties extracted by the focused physical property extraction unit 162 to obtain, for example, a data set as shown in FIG. 4 described above. The obtained variance is sent to the question point extraction unit 118.
- the question point extraction unit 118 sorts the peptide sequences in descending order of the variance obtained by the hypothesis comparison unit 114.
- FIG. 5 schematically shows the ranking in the data set. From this data set, the top 50, a seventh predetermined number counted from the one with the largest variance, are extracted as question points, and the extracted peptide sequences are sent to the data request unit 120. Alternatively, peptide sequences having a variance greater than a predetermined value may be extracted as the peptide sequences for which true data is requested, that is, as question points.
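The variance evaluation and question point extraction can be sketched as follows. Each hypothesis is represented here as a dictionary from peptide sequence to predicted score, which is an assumed data layout; the function names and the example values are hypothetical. Both selection rules mentioned in the text are shown: the top-ranked sequences by variance (the text uses 50; the example uses 1 for brevity) and a variance threshold. Population variance is used; the source does not say which variance estimator is meant.

```python
from statistics import pvariance
from typing import Dict, List, Optional, Tuple


def rank_by_variance(hypotheses: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Compute the variance of each sequence's score across hypotheses, largest first."""
    sequences = hypotheses[0].keys()
    scored = [(seq, pvariance([h[seq] for h in hypotheses])) for seq in sequences]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def question_points(
    ranked: List[Tuple[str, float]],
    top_n: Optional[int] = None,
    min_variance: Optional[float] = None,
) -> List[str]:
    """Select question points by rank, by variance threshold, or by both."""
    if min_variance is not None:
        ranked = [r for r in ranked if r[1] > min_variance]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [seq for seq, _ in ranked]


# Scores predicted by three hypothetical learning units for the same sequences.
hypotheses = [
    {"AAAAAAAAA": 6.0, "CCCCCCCCC": 5.0, "DDDDDDDDD": 7.0},
    {"AAAAAAAAA": 6.1, "CCCCCCCCC": 7.0, "DDDDDDDDD": 7.0},
    {"AAAAAAAAA": 5.9, "CCCCCCCCC": 3.0, "DDDDDDDDD": 7.0},
]
ranked = rank_by_variance(hypotheses)
```

Sequences on which the learning units disagree most are the ones for which a real measurement is most informative, which is why they are the ones sent to the data request unit.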
- the data request unit 120 requests true data, for example, measurement data obtained by experiments or data such as documents stored in an external database, with respect to the peptide sequences related to the question points extracted by the question point extraction unit 118.
- the data accepting unit 122 accepts the true data requested by the data requesting unit 120, such as measurement data input by the user or literature data obtained from a predetermined database as described later.
- the accepted data is sent to the data adding unit 124 as true data.
- the data adding unit 124 once captures the true data sent from the data receiving unit 122, associates it with the peptide sequence that was the question point, and generates additional data including this peptide sequence and this physical property. This additional data is sent to the data control unit 128.
- in order to specify the candidate peptide sequences to be predicted, the sequence input receiving unit 130 receives an input of the entire amino acid sequence of a predetermined protein, for example a target protein in which an epitope is to be identified, such as a protein forming a virus antigen, and sends the received data to the sequence candidate extraction unit 131.
- This input may be made through a user interface by a predetermined input device or via a network connected to the user interface.
- a target protein other than the virus antigen as described above may be the target of sequence input acceptance.
- the sequence candidate extraction unit 131 extracts the candidate peptide sequences to be predicted from the entire amino acid sequence of the predetermined protein received by the sequence input reception unit 130, and sends the extracted peptide sequence candidates to each learning unit 112.
- the peptide sequences extracted by the sequence candidate extraction unit 131 may include sequences that cannot actually be used.
- the sequence candidate extraction unit 131 may be configured to automatically exclude such unnecessary peptide sequences without human assistance.
- the physical property estimation unit 132 estimates the physical properties of each peptide sequence from the results of the estimation stage calculation that each learning unit 112 performs on the peptide sequence candidates extracted by the sequence candidate extraction unit 131, after unnecessary peptide sequences have been excluded as necessary. The calculation results are obtained, for example, as a data set as shown in FIG. 5 described above. The physical property estimation unit 132 obtains, for example, an average value for each peptide sequence and uses it as the estimated physical property of that peptide sequence with respect to the predetermined protein, for example the target protein. This estimation is performed for all peptide sequence candidates, and each combination of peptide sequence and estimated physical property is sent to the peptide database 138.
- a data set consisting of a combination of the physical property estimated by the physical property estimation unit 132, for example, a binding constant to the HLA class I molecule and a peptide sequence having this physical property is obtained.
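The averaging performed by the physical property estimation unit 132 can be illustrated with a small sketch. The per-learner score dictionaries, the function name `estimate_properties`, and the example values are assumptions; the text only states that an average value is obtained for each peptide sequence.

```python
from statistics import mean
from typing import Dict, List


def estimate_properties(scores_per_learner: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each peptide's score over all learning units."""
    sequences = scores_per_learner[0].keys()
    return {seq: mean([s[seq] for s in scores_per_learner]) for seq in sequences}


# Estimation-stage scores from two hypothetical learning units.
scores_per_learner = [
    {"AAAAAAAAA": 6.0, "CCCCCCCCC": 4.0},
    {"AAAAAAAAA": 7.0, "CCCCCCCCC": 4.0},
]
estimated = estimate_properties(scores_per_learner)
```

The resulting mapping from peptide sequence to estimated value has the same shape as the entries accumulated in the peptide database 138.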
- the condition input accepting unit 134 accepts input of physical properties, for example binding constants, which serve as keywords for extracting peptide sequences having predetermined physical properties from the peptide database 138.
- as with the sequence input receiving unit 130, this input may be made through a user interface using a predetermined input device, or through a network connected to the user interface.
- an input of conditions (physical properties) required according to the use of the peptide sequence to be extracted is accepted.
- for example, when a peptide sequence is to be used as a therapeutic agent for hepatitis C, the binding constant for an HLA class I molecule, which is the predetermined protein, is accepted as a keyword.
- the sequence extraction unit 136 extracts a peptide sequence satisfying the condition received by the condition input reception unit 134 from the peptide database 138, and outputs the extracted peptide sequence as a prediction result.
- when the physical properties of a new peptide sequence obtained by substituting one to several amino acids in a peptide sequence are to be examined,
- an input to that effect is made, for example a peptide sequence whose binding constant has been estimated and information on an eighth predetermined number indicating how many amino acids in the peptide sequence are to be replaced.
- each learning unit 112 then performs an estimation stage calculation, and based on the calculation result, the physical property estimation unit 132 can estimate the physical properties of the new peptide sequence.
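Generating the substituted peptide sequences whose properties are then estimated can be sketched as below. The function name `substitution_variants` and the exhaustive enumeration over the 20 standard amino acids are assumptions for illustration; the text only states that a peptide sequence and the number of amino acids to replace are given as input.

```python
from itertools import combinations, product
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def substitution_variants(peptide: str, n_sub: int) -> Iterator[str]:
    """Yield every peptide that differs from the input at exactly n_sub positions."""
    for positions in combinations(range(len(peptide)), n_sub):
        # At each chosen position, allow any residue other than the original one.
        choices = [[aa for aa in AMINO_ACIDS if aa != peptide[pos]] for pos in positions]
        for replacement in product(*choices):
            variant = list(peptide)
            for pos, aa in zip(positions, replacement):
                variant[pos] = aa
            yield "".join(variant)


single = list(substitution_variants("AAAAAAAAA", 1))
double = list(substitution_variants("AAAAAAAAA", 2))
```

For a 9-residue peptide this gives 9 × 19 single-substitution variants and 36 × 19² double-substitution variants, so the candidate set grows quickly with the number of substitutions.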
- FIG. 9 is a diagram showing a case where the request for true data is made to an external database rather than to the user.
- an example applied to the sequence prediction system shown in FIG. 7 is shown, but the present invention can also be applied to the sequence prediction system shown in FIG.
- in response to a request from the data request unit 120, the peptide sequence is sent to the database control unit 162 via the network 160, and the database control unit 162 searches the measured value database 164 for the measured value of this peptide sequence.
- when this measured value is obtained, it is sent as literature data to the data reception unit 122 through the network 160. In this way, true data can be obtained automatically without human assistance.
- FIG. 10 is a flowchart for explaining the operation of the sequence prediction support system according to the embodiment of the sequence prediction support method of the present invention.
- the sequence prediction support system of this embodiment is included in the sequence prediction system according to the first embodiment shown in FIG. 1, and the reference numerals in FIG. .
- step S1, a data supply stage in which N data sets are selected from a database having biopolymer sequences and attribute values of the biopolymers of these sequences, and a plurality of different data subsets are generated from the data sets and supplied to the learning unit;
- step S2, a hypothesis derivation stage in which the learning unit generates a hypothesis for each data subset, and each hypothesis is applied to a second data set, consisting of biopolymer sequences independent of the data sets, to derive the attribute values of the biopolymer sequences of the second data set;
- step S3, a variance calculation stage in which the variance of the attribute values is calculated for each biopolymer sequence in the second data set;
- step S4, a question point extraction stage in which biopolymer sequences whose variance is larger than a certain standard among the calculated variances are extracted as question points; and
- step S5, a data update stage in which an attribute value for each question point is received, and the received attribute value is associated with the biopolymer sequence of the question point and stored in the database.
- in step S1, the data control unit 128 selects, from the storage device serving as the database, N data sets each including a biopolymer sequence and the attribute value of the biopolymer of this sequence. A plurality of different data subsets are then generated from these N data sets and supplied to the learning unit 104.
- in step S2, as described above, the hypothesis generated for each data subset by the learning unit 104 is applied to the biopolymer sequences (peptide sequences) of the second data set, and the attribute value of each peptide sequence is derived.
- in step S3, the question point extraction unit 118 calculates the variance of the attribute value of each biopolymer sequence.
- in step S4, the question point extraction unit 118 then extracts, as question points, the biopolymer sequences whose variance is larger than a certain standard among the calculated variances.
- in step S5, the attribute value for each extracted question point is received by the data receiving unit 122, and the data control unit 128 associates the received attribute value with the biopolymer sequence of the question point.
- the data is sent to and stored in the storage device 126, and the contents of the storage device 126 are updated.
- in this way, a database that supports sequence prediction is constructed.
- steps S1 to S5 may be repeated as appropriate, for example until the maximum variance obtained in step S3 becomes smaller than a predetermined value. In this case, the reliability of the contents of the sequence prediction support database is further improved.
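Steps S1 to S5 and their repetition can be put together in a compact sketch. Everything concrete here is an assumption made so that the loop runs: the toy nearest-neighbour learner `fit_hypothesis` stands in for the learning unit (the text refers to a hidden Markov model procedure instead), `oracle` stands in for an experiment or literature lookup that returns true data, and the sequences, subset sizes, and stopping value of 0.5 are illustrative. For brevity the whole database is used where the text selects N data sets.

```python
import random
from statistics import pvariance
from typing import Callable, Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def fit_hypothesis(subset: Dict[str, float]) -> Callable[[str], float]:
    """Toy learner: predict the value of the nearest training peptide."""
    items = sorted(subset.items())

    def predict(seq: str) -> float:
        return min(items, key=lambda kv: hamming(seq, kv[0]))[1]

    return predict


def support_round(database: Dict[str, float], pool: List[str],
                  oracle: Callable[[str], float], rng: random.Random,
                  n_learners: int = 5, subset_size: int = 4,
                  threshold: float = 0.0) -> Tuple[float, List[str]]:
    keys = sorted(database)
    size = min(subset_size, len(keys))
    # S1: generate different data subsets from the database.
    subsets = [{k: database[k] for k in rng.sample(keys, size)} for _ in range(n_learners)]
    # S2: derive one hypothesis per subset and apply it to the second data set (pool).
    hypotheses = [fit_hypothesis(s) for s in subsets]
    predictions = {seq: [h(seq) for h in hypotheses] for seq in pool}
    # S3: variance of the predicted attribute value for each sequence.
    variances = {seq: pvariance(vals) for seq, vals in predictions.items()}
    # S4: sequences whose variance exceeds the standard become question points.
    questions = [seq for seq, v in variances.items() if v > threshold]
    # S5: obtain true data for the question points and store it in the database.
    for seq in questions:
        database[seq] = oracle(seq)
    return max(variances.values(), default=0.0), questions


def oracle(seq: str) -> float:
    """Hypothetical stand-in for a measurement: number of leucine residues."""
    return float(seq.count("L"))


database = {"LLLLLLLLL": 9.0, "AAAAAAAAA": 0.0, "LLLLAAAAA": 4.0,
            "AALLAALLA": 4.0, "LALALALAL": 5.0, "AAAALLLLL": 5.0}
initial_size = len(database)
pool = ["LLLLLLLLA", "ALAAAAAAA", "LLALLALLA"]
rng = random.Random(0)

added: List[str] = []
for _ in range(10):  # repeat S1 to S5 until the maximum variance is small enough
    max_var, questions = support_round(database, pool, oracle, rng)
    added.extend(questions)
    if max_var < 0.5:
        break
```

The loop concentrates requests for true data on the sequences where the hypotheses disagree, so the database grows where it is least reliable.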
- FIG. 11 is a flowchart showing the operation of a sequence prediction system using a database constructed by the sequence prediction support system according to the first embodiment shown in FIG.
- in step S110, the sequence input receiving unit 130 receives the entire sequence of a predetermined biopolymer, for example a protein, and the sequence candidate extracting unit 131 extracts, from the received entire sequence data, the biopolymer sequence candidates to be predicted, in this case peptide sequence candidates, and sends them to the learning unit 104.
- in step S111, after the sequence input is received, the data control unit 128 extracts all data sets in the storage device 126 and sends them to the learning unit 104.
- the learning unit 104 generates a rule from the entire data set and applies the rule to each of the above-described biopolymer sequence candidates to estimate the attribute value of the biopolymer sequence candidate.
- the attribute value estimated by the learning unit 104 is sent to the peptide database 138 and stored in association with the corresponding peptide sequence, whereby a database of data sets each comprising a peptide sequence and its attribute value can be created. This data set is not limited to peptide sequences; sequences of any biopolymer, such as DNA and RNA, can be stored in a database together with their attribute values.
- steps S113 and S114 are further provided. In step S113, the condition input receiving unit 134 accepts input of an attribute value serving as a keyword for extracting a peptide sequence having a predetermined attribute value from the peptide database 138, for example a condition such as a binding constant for a specific protein greater than a given value.
- in step S114, the sequence extraction unit 136 extracts a peptide sequence satisfying the condition received by the condition input reception unit 134 from the peptide database 138, and outputs the extracted peptide sequence as a prediction result.
- FIG. 12 is a flowchart for explaining the operation of the sequence prediction support system included in the sequence prediction system according to the second embodiment shown in FIG. In the following description, the reference numerals in FIG.
- in step S10, data is extracted from the storage device 126 by the data control unit 128, and different data are randomly supplied to each learning unit 112 through the random resampling unit 110.
- each learning unit 112 analyzes the supplied data and derives a hypothesis, that is, a data set containing a third predetermined number of peptide sequences, for example 100,000, together with the scores obtained for the predetermined physical properties.
- in step S30, the target sequence setting unit 160 sets a predetermined peptide sequence for comparing the hypotheses derived by the learning units 112.
- in step S40, the target physical property extraction unit 162 extracts the set peptide sequence and its physical properties from the hypothesis data of each learning unit 112.
- in step S50, the variance evaluation unit 164 evaluates the variance of the physical properties extracted from each learning unit 112.
- in step S60, the question point extraction unit 118 sorts the data in descending order of the variance evaluated by the variance evaluation unit 164 of the hypothesis comparison unit 114.
- the data set obtained in this way is shown schematically in Fig. 5.
- in step S70, the top 50 entries of the data set obtained in step S60 are extracted as question points as described above, that is, as peptide sequences for which true data for the hypothesized physical properties is to be requested.
- in step S80, the data requesting unit 120 requests true data, the data receiving unit 122 receives the requested true data, and the data adding unit 124 obtains additional data by associating the sequences extracted in step S70 with the true data in place of the hypothesized physical properties.
- in step S90, the additional data obtained by the data adding unit 124 is sent to the storage device 126 through the data control unit 128, and the data in the storage device 126 is updated.
- in step S100, it is determined whether or not to perform the next learning. If the determination result is YES, that is, if the next learning is performed, the process returns to step S10, and the learning data is randomly supplied to each learning unit 112 by the random resampling unit 110. If the determination result is NO, that is, if the next learning is not performed, the sequence prediction support operation ends.
- the number of times of learning may be determined in advance as a predetermined number, or it may be determined whether or not the next learning is performed at each end.
- the peptide sequences are rearranged in descending order of the variance of the hypothesis data, and a predetermined number, for example up to 50, are extracted from the top as question points.
- alternatively, peptide sequences whose estimated variance is equal to or greater than a predetermined value may be extracted as question points.
- FIG. 13 is a flowchart showing the operation of the sequence prediction system using the database constructed by the sequence prediction support system according to the second embodiment.
- in step S200, the sequence input accepting unit 130 accepts the entire amino acid sequence of a virus antigen that is a target protein for a predetermined substance, for example an antigen-presenting molecule. In step S210, the peptide sequence candidates to be predicted are extracted from the received entire amino acid sequence, the learning units 112 perform the estimation stage calculation, and from the calculation results the physical property estimation unit 132 estimates the binding constant of each peptide sequence candidate to the predetermined substance. In step S220, a data set of all the peptide sequence candidates and their predetermined physical properties is generated and accumulated in the peptide database 138.
- in step S230, the condition input receiving unit 134 receives an input of a physical property, for example a binding constant for a predetermined protein, as a keyword for extracting a peptide sequence having a predetermined physical property from the peptide database 138.
- in step S240, the sequence extraction unit 136 extracts a peptide sequence satisfying the condition received by the condition input reception unit 134 from the peptide database 138, and outputs the extracted peptide sequence as a prediction result.
- a peptide sequence having a predetermined physical property can thus be extracted as one expected to act as an epitope that binds to a predetermined substance.
- a target protein for example, 9 amino acids derived from the amino acid sequence of a viral antigen
- a peptide sequence having such immunity-inducing ability can be predicted using the number of T cell proliferation induced by this as an indicator of physiological activity.
- Orphan G-protein coupled receptor (orphan-GPCR) ligand optimization: for an orphan GPCR, for which a peptide is assumed to be the ligand but no specific peptide ligand has been identified, the optimum peptide sequence for the assay can be predicted using numerical values such as the increase in calcium concentration or in intracellular cAMP (an intracellular biomolecule) in cultured cells upon peptide administration as indicators of physiological activity.
- the peptide sequence can also be predicted using the increase in the blood concentration of a physiologically active peptide or a physiologically active hormone composed of the peptide as an index of physiological activity.
- the present embodiment can also be applied to DNA sequence prediction.
- it is known that a transcription factor that controls gene expression must bind upstream of the gene sequence on the DNA, and that the DNA base sequence of the transcription factor binding site has a certain motif or regularity. Therefore, by predicting the DNA sequences of the binding sites of transcription factors that bind to the promoter involved in a specific gene expression, rules relating gene expression to the DNA sequence pattern of the transcription factor binding site in a specific gene expression system can be found, and gene expression and transcription factor binding can be controlled.
- The present embodiment can also be applied to RNAi sequence prediction.
- RNAi sequence prediction: for example, it is known that a specific small double-stranded RNA base sequence (siRNA) of about 10 to 20 bases, in the presence of auxiliary factors, binds to and cleaves mRNA with sequence homology, thereby inhibiting production of the downstream gene product. Therefore, by predicting siRNA sequence candidates that bind to the mRNA involved in the expression of a specific gene, it becomes possible to predict the relationship between a specific physiological activity and RNAi sequences.
- This makes it possible to design siRNA sequences, which are the subject of active research and development.
- RNA aptamer sequence prediction: an RNA aptamer is usually an RNA strand of 20 bases or more that adopts a specific, stable three-dimensional structure through base pairing within the sequence. Using this structural property, it binds to the functional site of a target protein or the like and controls its function. Therefore, by predicting RNA base sequence candidates whose structure binds to the functional site of the target protein, it becomes possible to predict the relationship between a specific physiological activity and an RNA aptamer sequence. This makes it possible to design RNA aptamers, which are the subject of active research and development as substances.
- the present invention also provides a program that causes a general-purpose computer device to function as the above-described sequence prediction system or sequence prediction support system.
- biopolymer sequence such as a peptide sequence having a certain predetermined physical property or a nucleotide sequence of a nucleic acid by experiments. Become.
- Each component of the sequence prediction system or the sequence prediction support system described above can be expressed as a program.
- A general-purpose computer device can thereby be operated as the sequence prediction system or as the sequence prediction support system.
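As a rough illustration of how one such component might be expressed in a program, the sketch below enumerates siRNA guide-strand candidates for an mRNA, in the spirit of the RNAi passage above. It is not the method of the invention: the function names, the 19-base window (chosen from the "about 10 to 20 bases" range), and the example sequence are assumptions made for illustration only, and no physiological activity is predicted.

```python
# Hypothetical sketch: list every window of an mRNA together with the
# antisense RNA strand that would base-pair with it. All names here are
# illustrative assumptions, not part of the described system.

# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, U, G, C only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def sirna_candidates(mrna: str, length: int = 19):
    """Return (start, target_site, antisense_strand) for each window.

    `length` is an assumed default; the text only says "about 10 to 20 bases".
    """
    mrna = mrna.upper()
    candidates = []
    for start in range(len(mrna) - length + 1):
        target = mrna[start:start + length]
        candidates.append((start, target, reverse_complement(target)))
    return candidates


if __name__ == "__main__":
    # Made-up example sequence, not taken from the patent.
    example_mrna = "AUGGCUAGCUAGGAUCCGAUCGUAGCUAGC"
    for start, target, antisense in sirna_candidates(example_mrna)[:3]:
        print(start, target, antisense)
```

In a full system, each candidate would then be scored against measured activity data; that scoring step is deliberately left out here.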
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006528959A JPWO2006004182A1 (ja) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
US11/571,822 US20090144209A1 (en) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004201116 | 2004-07-07 | ||
JP2004-201116 | 2004-07-07 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006004182A1 (ja) | 2006-01-12 |
WO2006004182A9 (ja) | 2006-03-09 |
Family
ID=35782982
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2005/012542 WO2006004182A1 (ja) | 2004-07-07 | 2005-07-07 | Sequence prediction system |
Country Status (3)
Country | Link |
---|---|
US (1) | US20090144209A1 (ja) |
JP (1) | JPWO2006004182A1 (ja) |
WO (1) | WO2006004182A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
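JP7516368B2 (ja) | 2019-06-07 | 2024-07-16 | Chugai Pharmaceutical Co., Ltd. | Information processing system, information processing method, program, and method for producing antigen-binding molecule or protein |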
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
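WO2007094137A1 (ja) | 2006-02-17 | 2007-08-23 | Nec Corporation | Method for inducing cytotoxic T cells, cytotoxic T cell inducer, and pharmaceutical composition and vaccine using same |
JP5262709B2 (ja) * | 2006-03-15 | 2013-08-14 | NEC Corporation | Molecular structure prediction system, method, and program |
JP4841396B2 (ja) * | 2006-10-18 | 2011-12-21 | NEC Soft, Ltd. | Base sequence identification device, nucleic acid molecule secondary structure acquisition device, base sequence identification method, nucleic acid molecule secondary structure acquisition method, program, and recording medium |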
DK2918598T3 (en) * | 2007-02-28 | 2019-04-29 | The Govt Of U S A As Represented By The Secretary Of The Dept Of Health And Human Services | Brachyury polypeptides and methods of use |
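WO2009066462A1 (ja) | 2007-11-20 | 2009-05-28 | Nec Corporation | Method for inducing cytotoxic T cells, cytotoxic T cell inducer, and pharmaceutical composition and vaccine using same |
JP2010115177A (ja) * | 2008-11-14 | 2010-05-27 | Nec Soft Ltd | Method for selecting modified nucleotide sequence of RNA aptamer molecule having degradation resistance |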
EP2387780A4 (en) * | 2009-01-14 | 2015-03-04 | Johanna Craig | INTEGRATED OFFICE SOFTWARE FOR VIRUS DATA MANAGEMENT |
WO2012005898A2 (en) * | 2010-06-15 | 2012-01-12 | Alnylam Pharmaceuticals, Inc. | Chinese hamster ovary (cho) cell transcriptome, corresponding sirnas and uses thereof |
US9609074B2 (en) * | 2014-06-18 | 2017-03-28 | Adobe Systems Incorporated | Performing predictive analysis on usage analytics |
CA3116265A1 (en) | 2014-10-07 | 2016-04-14 | Cytlimic Inc. | Hsp70-derived peptide, pharmaceutical composition for treating or preventing cancer using same, immunity inducer, and method for producing antigen-presenting cell |
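TW201639868A (zh) | 2015-03-09 | 2016-11-16 | Nec Corp | MUC1-derived peptide, pharmaceutical composition for treating or preventing cancer using same, immunity inducer, and method for producing antigen-presenting cells |
JP7259596B2 (ja) * | 2019-07-01 | 2023-04-18 | Fujitsu Limited | Prediction program, prediction method, and prediction device |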
-
2005
- 2005-07-07 WO PCT/JP2005/012542 patent/WO2006004182A1/ja active Application Filing
- 2005-07-07 JP JP2006528959A patent/JPWO2006004182A1/ja active Pending
- 2005-07-07 US US11/571,822 patent/US20090144209A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
JPWO2006004182A1 (ja) | 2008-04-24 |
WO2006004182A1 (ja) | 2006-01-12 |
US20090144209A1 (en) | 2009-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006004182A9 (ja) | Sequence prediction system | |
DK3144672T3 (en) | GENOME IDENTIFICATION SYSTEM | |
CN111180081B (zh) | Intelligent medical inquiry method and device | |
Zou et al. | Approaches for recognizing disease genes based on network | |
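CN108108592B (zh) | Method for constructing a machine learning model for scoring the pathogenicity of genetic variants | |
CN108122611B (zh) | Information recommendation method, device, storage medium, and program product | |
JP2005512015A (ja) | System and method for validating, aligning, and reordering one or more gene sequence maps using at least one ordered restriction enzyme map | |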
Vanunu et al. | A propagation-based algorithm for inferring gene-disease associations | |
EP2919137A1 (en) | Related data generating apparatus, related data generating method, and program | |
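JP2007102709A (ja) | Marker selection program for genetic diagnosis, device and system for executing the program, and genetic diagnosis system | |
KR20220099504A (ko) | Affinity prediction method and model training method, device, electronic apparatus, and recording medium | |
CN112837747A (zh) | Protein binding site prediction method based on an attention Siamese network | |
CN103473416A (zh) | Method and device for building a protein interaction model | |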
US20150356238A1 (en) | Scoring the Deviation of an Individual with High Dimensionality from a First Population | |
CN109409522B (zh) | Biological network inference algorithm based on ensemble learning | |
Guo et al. | An encoding-decoding framework based on CNN for CircRNA-RBP binding sites prediction | |
KR102000832B1 (ko) | miRNA-mRNA association analysis method and miRNA-mRNA network generation device | |
EP4233057A1 (en) | Drug optimisation by active learning | |
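KR102187594B1 (ko) | Multi-omics data processing device and method for discovering new drug candidates | |
CN109256215B (zh) | Disease-associated miRNA prediction method and system based on self-avoiding random walk | |
CN114388123A (zh) | Intelligent auxiliary diagnosis method, device, equipment, and storage medium | |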
Gupta et al. | DAVI: Deep learning-based tool for alignment and single nucleotide variant identification | |
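CN110739028B (zh) | Cell line drug response prediction method based on k-nearest-neighbor constrained matrix factorization | |
CN112133367A (zh) | Method and device for predicting interaction relationships between drugs and targets | |
JP2014112307A (ja) | Motif search program, information processing device, and motif search method |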
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
COP | Corrected version of pamphlet |
Free format text: PAGES 1/10 AND 3/10, DRAWINGS, REPLACED BY NEW PAGES 1/10 AND 3/10; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006528959 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase | ||
WWE | Wipo information: entry into national phase |
Ref document number: 11571822 Country of ref document: US |