CN105590038A - Method and system for deducing bonding site of oligonucleotide on genome - Google Patents

Publication number
CN105590038A
Authority
CN
China
Prior art keywords
thermodynamics
sequence
oligonucleotides
information
enthalpy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410568387.0A
Other languages
Chinese (zh)
Inventor
张成岗
屈武斌
刘哲言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunyi International Technology Co ltd
Institute of Radiation Medicine of CAMMS
Original Assignee
Beijing Yunyi International Technology Co ltd
Institute of Radiation Medicine of CAMMS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunyi International Technology Co ltd, Institute of Radiation Medicine of CAMMS filed Critical Beijing Yunyi International Technology Co ltd
Priority to CN201410568387.0A priority Critical patent/CN105590038A/en
Publication of CN105590038A publication Critical patent/CN105590038A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and system for deducing a bonding site of oligonucleotide on a genome. According to the method, an index table of thermodynamics of arbitrary 7-mer oligonucleotide is built, a stable bonding sequence of the oligonucleotide to be deduced on the thermodynamics is acquired by using the index table, the position of the bonding sequence is positioned on the genome, thus, the bonding site of the oligonucleotide to be deduced can be efficiently deduced, and a basis is further provided for judging whether the oligonucleotide is high-quality oligonucleotide with regard to a target sequence in the genome. The invention also provides a system for the bonding site of the oligonucleotide on the genome. With the adoption of the above method by the system, the real bonding condition of the oligonucleotide and the target sequence in the genome can be efficiently reflected from the thermodynamics property, and an accurate basis can be provided for judging the quality of the oligonucleotide to be deduced.

Description

Method and system for inferring the binding sites of an oligonucleotide on a genome
Technical Field
The present invention relates to a method and system for inferring oligonucleotide binding sites, and in particular to a method and system for inferring the binding sites of an oligonucleotide on a genome.
Background Art
Oligonucleotide (oligo) is the general term for short-chain nucleotides. By pairing with its complementary sequence, an oligonucleotide is often used as a probe to determine the structure of a target gene, or as a primer to amplify a template sequence efficiently. Molecular biology techniques based on oligonucleotide hybridization (such as chip probes and PCR primers) now play an important role and are widely used in fields such as second-generation sequencing, detection of pathogenic microorganisms, chip hybridization, and clinical diagnosis. Their main principle is nucleic acid hybridization: the process in which complementary nucleotide sequences, under certain conditions (suitable temperature, ionic strength, etc.), form non-covalent bonds through Watson-Crick base pairing and thereby form a stable heteroduplex molecule.
Because the stability and sensitivity of these molecular biology techniques depend strongly on the quality of the designed oligonucleotides (probes in chips, primers in PCR), a designed oligonucleotide must itself be highly stable and must bind the target region of the target sequence with high specificity without binding other regions. If the design is poor, taking PCR as an example, a non-specific primer can trigger non-specific amplification and thus cause false-positive results. Designing suitable oligonucleotide probes and PCR primers is therefore critical, and predicting the binding sites of a designed oligonucleotide on the target sequence is an effective way to judge whether the oligonucleotide is suitable for that target sequence.
Current oligonucleotide design software widely uses BLAST sequence-similarity analysis for quality control of designed oligonucleotides. BLAST is a set of local alignment tools for sequence-similarity comparison against protein or nucleic acid databases; it expresses the degree of similarity between sequences as a similarity score. Sequence similarity, also called sequence identity, describes how alike two sequences are. Its value depends on the number of identical characters at corresponding positions between the query sequence and the target sequence in the alignment, and a larger value indicates that the two sequences are more similar. Because BLAST is comparatively fast and accurate, it is the most widely used tool in routine sequence alignment analysis. For example, the popular PCR primer design software PerlPrimer, the chip probe design software Mprobe and oligoarray, and the siRNA design software siRNATargetFinder all use BLAST-style sequence alignment.
However, BLAST cannot accurately reflect how oligonucleotides actually bind. The BLAST scoring system assigns different scores according to whether the bases at corresponding positions are identical (e.g. AA, TT) or not (e.g. AT, GC), or further assigns non-identical bases different scores according to the type of substitution (transitions between purine and purine or between pyrimidine and pyrimidine, such as AG or CT; transversions between purine and pyrimidine, such as AC, AT, GC, GT). The sum of the per-position scores determines the overall similarity, which is then used to gauge binding. Oligonucleotide hybridization, however, is a biochemical reaction. The interaction between molecules does not depend on how many bases match, but on whether a stable duplex can form in a complex environment that includes temperature, salt ion concentration, and pH. The essence of oligonucleotide binding is therefore thermodynamic stability rather than sequence alignment. Giving AT and GC pairings the same score in BLAST cannot reflect the fact that a GC pair, with three hydrogen bonds, binds more stably than an AT pair, with two. Moreover, some mismatch structures are thermodynamically stable, and BLAST's uniform mismatch penalty easily causes these binding sites to be missed.
Oligonucleotide hybridization can also take many forms. Besides the most stable state, perfect matching, duplexes usually also contain complex structures such as mismatches, hairpins, internal loops, and bulge loops. The limitations of the BLAST algorithm mean that it can predict only some of these structures: it tolerates only a few mismatched bases in the middle of a sequence, so a considerable share of structures is lost, and it cannot handle complex structures such as terminal mismatches, loops, and hairpins.
The nearest-neighbor model (NN model) is the most widely used and most reliable method for thermodynamic calculation. The model states that the stability of a given base pair depends on its neighboring base pairs. Its main idea is to convert the calculation of the standard enthalpy change and entropy change of a DNA hybridization reaction into the cumulative sum of the standard enthalpy and entropy changes of the 10 dimers (duplexes) formed by the four bases A, T, G, and C of DNA. In the prior art, however, the thermodynamic stability of a designed oligonucleotide hybridized to its binding sequence is usually calculated base by base from front to back. This traversal approach is computationally complex, and the search is slow and inefficient.
Therefore, there is a need for an efficient method and system that faithfully reflects the thermodynamic properties of oligonucleotide binding, determines the binding sites and binding stability of a designed oligonucleotide on a target sequence, and thereby judges whether the oligonucleotide is a high-quality oligonucleotide for that target sequence.
Summary of the Invention
The invention provides a method for inferring the binding sites of an oligonucleotide on a genome. An index table of the thermodynamic information of every 7-mer oligonucleotide is built; the index table is used to obtain the thermodynamically stable binding sequences of the oligonucleotide to be inferred; and the positions of these binding sequences are located on the genome. The binding sites of the oligonucleotide can thus be inferred efficiently, which provides a basis for judging whether the oligonucleotide is a high-quality oligonucleotide for a target sequence in that genome.
The invention also provides a system for inferring the binding sites of an oligonucleotide on a genome. Using the above method, the system efficiently reflects, in thermodynamic terms, how the oligonucleotide actually binds target sequences in the genome, and provides an accurate basis for judging the quality of the oligonucleotide to be inferred.
The method for inferring the binding sites of an oligonucleotide on a genome provided by the invention comprises:
building an index table of the thermodynamic information of every 7-mer oligonucleotide, the thermodynamic information being the information on the pairwise hybridization of each oligonucleotide with all of its binding sequences, including hybrid structure, hybridization sequence, enthalpy, entropy, and free energy;
using the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and determining the thermodynamically stable binding sequences on the basis of that information;
searching for the binding sequences on the genome and locating their positions on the genome.
In the scheme of the invention, the genome may be one extracted from an organism. Further, it may be a genome whose sequence has been determined by methods known in the art, such as sequencing. The method of the invention may therefore also include the steps of extracting a genome from an organism and obtaining its sequence. The organism may be, for example, an animal, bacterium, plant, or fungus. The oligonucleotide to be inferred may be, for example, a probe or primer designed for the genome, and the design method may be any probe or primer design method conventional in the art.
In the scheme of the invention, the index table of thermodynamic information of every 7-mer oligonucleotide is built so that the thermodynamic information of the oligonucleotide to be inferred can be obtained quickly. The oligonucleotide length of 7 in the index table was chosen by comparing oligonucleotides of different lengths (from 3 to 12) and weighing storage space against computational efficiency. There are 4^7 distinct 7-mer oligonucleotide sequences. When a 7-mer oligonucleotide hybridizes with a binding sequence, each of its bases (e.g. A) can meet five situations in the binding sequence (T, A, G, C, -; here T represents a match, the other three bases represent mismatches, and "-" represents a gap that is left unpaired and therefore forms a loop structure), so in theory each 7-mer has 5^7 binding sequences. Because computable thermodynamic parameters are limited, the number of binding sequences actually evaluated thermodynamically is smaller than this value. In building the index table, no limit is placed on the maximum number of mismatches or matches in the heteroduplex. The nearest-neighbor method and all reliable, experimentally verified thermodynamic parameters are used to calculate the thermodynamic data for the pairwise hybridization of every 7-mer sequence (composed purely of bases, without gap structures) with all of its binding sequences (which may contain gap structures), including hybrid structure, hybridization sequence, enthalpy, entropy, and free energy. The hybrid structures include perfectly matched dimer structures and non-perfectly matched structures.
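The sizes quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the names `BASES`, `PAIRING_STATES`, and `theoretical_binding_sequences` are hypothetical and do not come from the patent.

```python
from itertools import product

BASES = "ATGC"            # the four DNA bases
PAIRING_STATES = "ATGC-"  # four bases plus a gap, the five situations in the text
K = 7                     # fragment length used by the index table

n_oligos = len(BASES) ** K                 # number of distinct 7-mers, 4^7
n_theoretical = len(PAIRING_STATES) ** K   # theoretical binding sequences per 7-mer, 5^7


def theoretical_binding_sequences(k=K):
    """Lazily yield every length-k string over {A, T, G, C, -}."""
    for combo in product(PAIRING_STATES, repeat=k):
        yield "".join(combo)


print(n_oligos, n_theoretical)
```

As the text notes, the number of binding sequences that can actually be evaluated is smaller than the theoretical count, because thermodynamic parameters are not available for every structure.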
In another specific embodiment of the invention, using the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and determining the thermodynamically stable binding sequences on the basis of that information, comprises:
splitting the oligonucleotide to be inferred into segments of 7-mer length in the 5' to 3' direction, obtaining oligonucleotide fragments of length 7-mer and/or of length less than 7-mer;
for a fragment of length 7-mer, obtaining its thermodynamic information by looking it up in the index table, and for a fragment shorter than 7-mer, obtaining its thermodynamic information by building it anew;
combining the thermodynamic information of the fragments, and summing each item of thermodynamic information within each combination, to obtain the thermodynamic information of the oligonucleotide to be inferred;
determining the thermodynamically stable binding sequences of the oligonucleotide to be inferred according to the free energy in its thermodynamic information.
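The splitting step can be sketched as follows. The function name `split_into_fragments` is hypothetical; the sketch only assumes that the oligonucleotide is given 5' to 3' as a plain string.

```python
def split_into_fragments(oligo, size=7):
    """Cut an oligonucleotide (written 5' to 3') into consecutive fragments.

    Every fragment has length `size` except possibly the last one,
    which holds whatever bases remain at the 3' end.
    """
    return [oligo[i:i + size] for i in range(0, len(oligo), size)]


fragments = split_into_fragments("TCAGCTGCCACGTCGACAACA")
print(fragments)
```

Fragments of full length would be looked up in the index table, while a shorter trailing fragment would have its thermodynamic information built separately, as the steps above describe.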
In the scheme of the invention, -11 kcal/mol is used as the default threshold at which a heteroduplex is considered stable (this threshold is based on previous research experience and the literature, and users can adjust it as needed). A binding sequence whose free energy after hybridization with the oligonucleotide is -11 kcal/mol or lower (i.e. whose absolute value is 11 kcal/mol or greater) is considered a thermodynamically stable binding sequence of the oligonucleotide.
In another specific embodiment of the invention, searching for the binding sequences on the genome and locating their positions on the genome comprises:
building, according to an existing 9-mer index algorithm, the position information of every 9-mer sequence in the genome, in the 5' to 3' direction, on the sense strand and the antisense strand;
locating on the genome the thermodynamically stable binding sequences obtained for the oligonucleotide to be inferred.
In the scheme of the invention, a k-mer index algorithm, which is known in the prior art, is used to locate on the genome the thermodynamically stable binding sequences obtained for the oligonucleotide to be inferred. By comparing different values of k (from 5 to 12) and weighing factors such as search efficiency and index storage, k = 9 was determined to be a suitable value for the invention. This k-mer index is independent of the thermodynamic index of 7-mer oligonucleotides described above. Its purpose is to record in advance the positions of k-mer fragments in the genome, which makes it easier to locate the obtained binding sequences on the genome and speeds up the search for binding sites.
The system for inferring the binding sites of an oligonucleotide on a genome provided by the invention comprises:
an index table building module, configured to build an index table of the thermodynamic information of every 7-mer oligonucleotide, the thermodynamic information being the information on the pairwise hybridization of each oligonucleotide with all of its binding sequences, including hybrid structure, hybridization sequence, enthalpy, entropy, and free energy;
a binding sequence determination module, configured to use the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and to determine the thermodynamically stable binding sequences on the basis of that information;
a locating module, configured to search for the binding sequences on the genome and locate their positions on the genome.
In another specific embodiment of the invention, the binding sequence determination module comprises a splitting module, a thermodynamic information acquisition module, a thermodynamic information combination module, and a judgment module;
the splitting module is configured to split the oligonucleotide to be inferred into segments of 7-mer length in the 5' to 3' direction, obtaining oligonucleotide fragments of length 7-mer and/or of length less than 7-mer;
the thermodynamic information acquisition module is configured to obtain the thermodynamic information of fragments of length 7-mer by looking them up in the index table, and to obtain the thermodynamic information of fragments shorter than 7-mer by building it anew;
the thermodynamic information combination module is configured to combine the thermodynamic information of the fragments and to sum each item of thermodynamic information within each combination, obtaining the thermodynamic information of the oligonucleotide to be inferred;
the judgment module is configured to determine the thermodynamically stable binding sequences of the oligonucleotide to be inferred according to the free energy in its thermodynamic information.
In another specific embodiment of the invention, the locating module searches for the binding sequences on the genome and locates their positions on the genome by: building, according to a known k-mer index algorithm, the position information of every k-mer sequence in the genome, in the 5' to 3' direction, on the sense strand and the antisense strand; and locating on the genome the thermodynamically stable binding sequences obtained for the oligonucleotide to be inferred. In the scheme of the invention, k is preferably 9.
In the scheme of the invention, the enthalpy ΔH° in the thermodynamic information of the index table is obtained by summing the enthalpies of the perfectly matched dimer structures and the non-perfectly matched structures contained in the sequence between the first and the last perfectly matched base pair of the hybrid structure. The entropy ΔS° is obtained by summing the entropies of the perfectly matched dimer structures and the non-perfectly matched structures contained in the same span.
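The enthalpy ΔH° is calculated, for example, as:
ΔH° = Σ ΔH°(perfectly matched dimer structures) + Σ ΔH°(non-perfectly matched structures)
The entropy ΔS° is calculated, for example, as:
ΔS° = Σ ΔS°(perfectly matched dimer structures) + Σ ΔS°(non-perfectly matched structures)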
The free energy ΔG° is calculated from the total enthalpy ΔH°(total) and the total entropy ΔS°(total) according to the salt-concentration correction formulas.
The total enthalpy is obtained by summing the enthalpies of all perfectly matched dimer structures and non-perfectly matched structures in the hybrid structure, the enthalpies of the terminal base pairs (GC or AT) at the start and end, and the sequence-symmetry enthalpy ΔH°(Symmetry). The total entropy is obtained by summing the entropies of all perfectly matched dimer structures and non-perfectly matched structures in the hybrid structure, the entropies of the terminal base pairs (GC or AT) at the start and end, and the sequence-symmetry entropy ΔS°(Symmetry).
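The total enthalpy is calculated, for example, as:
ΔH°(total) = Σ ΔH°(perfectly matched dimer structures) + Σ ΔH°(non-perfectly matched structures) + ΔH°(terminal GC or AT base pairs) + ΔH°(Symmetry)
The total entropy is calculated, for example, as:
ΔS°(total) = Σ ΔS°(perfectly matched dimer structures) + Σ ΔS°(non-perfectly matched structures) + ΔS°(terminal GC or AT base pairs) + ΔS°(Symmetry)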
For the enthalpy and entropy changes of perfectly matched dimer structures, the basic thermodynamic parameter table proposed by SantaLucia in 1998 (perfect matches between A and T or between G and C) can be used, which gives the standard enthalpy and entropy changes of the 10 dimers formed by A, T, G, and C, as shown in Table 1.
Table 1
Nearest-neighbor thermodynamic parameters for oligonucleotides proposed by SantaLucia (1 mol/L NaCl, 37 °C)
For non-perfectly matched structures, such as single mismatches, tandem mismatches, dangling ends, and bulge loops, the enthalpy and entropy changes can be obtained from the literature (Allawi and SantaLucia, 1997, 1998a, 1998b, 1998c; Bommarito et al., 2000; Ohmichi et al., 2002; Peyret et al., 1999; SantaLucia and Hicks, 2004; Tanaka et al., 2004).
The entropy and enthalpy of the terminal base pairs (GC or AT) at the start and end can be obtained from Table 1.
The sequence-symmetry enthalpy ΔH°(Symmetry) and the sequence-symmetry entropy ΔS°(Symmetry) can be obtained from Table 1.
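The summation for a perfectly matched duplex can be sketched in Python as follows. The numeric values are the commonly cited unified nearest-neighbor parameters of SantaLucia (1998); they are included here as an assumption and should be checked against Table 1. Non-perfectly matched structures are not handled, and the function names are hypothetical.

```python
# Assumed parameters: unified nearest-neighbor values (SantaLucia 1998, 1 M NaCl).
# Each entry is (dH in kcal/mol, dS in cal/(mol*K)); verify against Table 1.
NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
# Terminal base-pair (initiation) terms for a GC or an AT end pair.
INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
SYMMETRY = (0.0, -1.4)  # applied only to self-complementary sequences
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def duplex_enthalpy_entropy(seq):
    """Total dH and dS for seq (5' to 3') paired with its perfect complement."""
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        dimer = seq[i:i + 2]
        # Each of the 16 dimers equals one of the 10 table entries
        # either directly or through its reverse complement.
        h, s = NN.get(dimer) or NN[reverse_complement(dimer)]
        dh += h
        ds += s
    for end_base in (seq[0], seq[-1]):
        h, s = INIT[end_base]
        dh += h
        ds += s
    if seq == reverse_complement(seq):
        dh += SYMMETRY[0]
        ds += SYMMETRY[1]
    return dh, ds


def free_energy(dh, ds, temp_c=37.0):
    """dG in kcal/mol from dH (kcal/mol) and dS (cal/(mol*K))."""
    return dh - (temp_c + 273.15) * ds / 1000.0


dh, ds = duplex_enthalpy_entropy("CGTTGA")
print(round(dh, 1), round(ds, 1), round(free_energy(dh, ds), 2))
```

For the hexamer CGTTGA the five dimers are CG, GT, TT, TG, and GA, with one terminal GC pair and one terminal AT pair, and no symmetry term because the sequence is not self-complementary.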
In the scheme of the invention, the hybrid structures in the index table can be represented with specific symbols. For example, the matched, mismatched, and unpaired states in a hybrid structure can be represented by "|", "x", and "-" respectively. When building the index table, those skilled in the art can choose, as needed, the combinations of one or more binding sequences with the oligonucleotide. Each base (e.g. A) of any oligonucleotide can meet five situations in a binding sequence (T, A, G, C, -; here T represents a match "|", the other three bases represent mismatches "x", and "-" represents an unpaired gap). To save further storage space, improve running efficiency, and make it easier to obtain the thermodynamic information of the oligonucleotide to be inferred later, the index table of the invention can let the quinary (base-5) digits 0, 1, 2, and 3 each correspond to one of the four DNA bases and the digit 4 correspond to a gap, convert the DNA sequence code of each 7-mer oligonucleotide and of all its binding sequences into a quinary number, and then convert this quinary number into a decimal number.
The DNA sequence code is converted to a quinary number and then to a decimal number as follows, for example: every 7-mer oligonucleotide and all of its binding sequences are first converted to quinary code according to the mapping A=0, G=1, C=2, T=3, -=4. Then, following the conversion rules used for binary, octal, and hexadecimal numbers, a conversion between quinary and decimal was designed, and the 7-digit quinary code is finally converted to decimal code, for example:
"ATACGAA" → (0302100)5 → 9650; "CTTCG-C" → (2332142)5 → 42797.
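The conversion can be sketched in Python using the mapping given in the text (A=0, G=1, C=2, T=3, -=4). The function names `encode` and `decode` are hypothetical.

```python
DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "-": 4}   # mapping stated in the text
SYMBOL = {value: key for key, value in DIGIT.items()}


def encode(seq):
    """Read seq as a base-5 number, most significant digit first."""
    value = 0
    for ch in seq:
        value = value * 5 + DIGIT[ch]
    return value


def decode(value, length=7):
    """Inverse of encode; length restores leading zeros (leading A's)."""
    chars = []
    for _ in range(length):
        value, digit = divmod(value, 5)
        chars.append(SYMBOL[digit])
    return "".join(reversed(chars))


print(encode("ATACGAA"), encode("CTTCG-C"))
```

Because a leading A maps to the digit 0, the fragment length has to be known when decoding, which is why `decode` takes it as a parameter.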
Further, in the scheme of the invention, obtaining the thermodynamic information of a 7-mer fragment of the oligonucleotide to be inferred by looking it up in the index table comprises: letting the quinary digits 0, 1, 2, and 3 each correspond to one of the four DNA bases and the digit 4 correspond to a gap; converting the DNA sequence code of the 7-mer fragment into a quinary number and then into a decimal number; and looking up the thermodynamic information corresponding to this decimal number in the index table. The thermodynamic information of fragments shorter than 7-mer is obtained by building it anew, using the same construction method as for the index table.
Further, the thermodynamic information of the fragments is combined as follows. Let the number of fragments be n, with the 1st fragment corresponding to a groups of thermodynamic information, the 2nd fragment to b groups, ..., and the n-th fragment to f groups; the combined oligonucleotide to be inferred then has a × b × ... × f groups of thermodynamic information. For example, the 21-mer oligonucleotide sequence "TCAGCTGCCACGTCGACAACA" is split into three fragments, "TCAGCTG", "CCACGTC", and "GACAACA", which have 15536, 15740, and 15536 hybridization sequences respectively. Combining them therefore yields 15536 × 15740 × 15536 = 3799121239040 binding sequences, corresponding to 3799121239040 groups of thermodynamic information.
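A minimal Python sketch of the combination step follows. The record layout (hybridization sequence, enthalpy, entropy) and the function names are assumptions made for illustration, and the numbers in the toy records are made up. The sketch sums only the per-fragment values; it leaves out the junction, terminal, and symmetry terms that the full calculation also adds.

```python
from itertools import product
from math import prod


def count_combinations(groups_per_fragment):
    """Number of combined records: a x b x ... x f."""
    return prod(groups_per_fragment)


def combine(records_per_fragment):
    """Lazily yield (spliced hybridization sequence, summed dH, summed dS).

    records_per_fragment holds one list per fragment, each list containing
    hypothetical (hybridization_sequence, dH, dS) tuples.
    """
    for combo in product(*records_per_fragment):
        spliced = "".join(record[0] for record in combo)
        dh = sum(record[1] for record in combo)
        ds = sum(record[2] for record in combo)
        yield spliced, dh, ds


print(count_combinations([15536, 15740, 15536]))

# Toy data only, to show the shape of the output.
toy = [
    [("TCAGCTG", -50.0, -140.0), ("TCAGCTA", -40.0, -120.0)],
    [("CCACGTC", -55.0, -150.0)],
]
for row in combine(toy):
    print(row)
```

A generator is used because the example in the text already produces several trillion combinations, far too many to hold in memory at once.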
Summing each item of thermodynamic information within each combination comprises: splicing the n hybridization sequences in the combination; calculating the total enthalpy of the n fragments by summing the enthalpies of the n fragments, the enthalpies of the hybrid structures at the n-1 junctions, the enthalpies of the non-perfectly matched structures at both ends of the sequence to be inferred, the enthalpies of the terminal base pairs (GC or AT) at the start and end of the sequence, and the sequence-symmetry enthalpy; calculating the total entropy of the n fragments by summing the entropies of the n fragments, the entropies of the hybrid structures at the n-1 junctions, the entropies of the non-perfectly matched structures at both ends of the sequence to be inferred, the entropies of the terminal base pairs (GC or AT) at the start and end of the sequence, and the sequence-symmetry entropy; and calculating the free energy from the total enthalpy and total entropy according to the salt-concentration correction formulas. The salt-concentration correction formula used in the invention was proposed by SantaLucia in 1998.
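The final free-energy step might look like the sketch below. The correction formulas themselves are not reproduced in this text, so the sketch assumes the entropy correction published by SantaLucia (1998), ΔS°([Na+]) = ΔS°(1 M NaCl) + 0.368 × N × ln[Na+], where N is half the number of phosphates in the duplex, followed by ΔG° = ΔH° - TΔS°. All function and parameter names are hypothetical, and the stability check uses the default -11 kcal/mol threshold stated earlier.

```python
import math


def salt_corrected_entropy(ds_1m, n_half_phosphates, na_molar):
    """Assumed SantaLucia (1998) entropy correction; dS in cal/(mol*K).

    n_half_phosphates is the total number of phosphates in the duplex
    divided by 2; na_molar is the sodium concentration in mol/L.
    """
    return ds_1m + 0.368 * n_half_phosphates * math.log(na_molar)


def gibbs_free_energy(dh, ds, temp_c=37.0):
    """dG in kcal/mol from dH (kcal/mol) and dS (cal/(mol*K))."""
    return dh - (temp_c + 273.15) * ds / 1000.0


def is_stable(dg, threshold=-11.0):
    """True when the duplex free energy is at or below the threshold."""
    return dg <= threshold


ds_corrected = salt_corrected_entropy(-115.4, 5, 0.05)
dg = gibbs_free_energy(-41.2, ds_corrected)
print(round(ds_corrected, 2), round(dg, 2), is_stable(dg))
```

At 1 mol/L sodium the logarithm is zero and the correction vanishes, which matches the reference condition of the parameter table.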
The splicing of fragments and the calculation of total enthalpy and total entropy are described in detail using an example of three fragments. The three fragments are spliced in order in the 5' to 3' direction. To calculate the total enthalpy, the enthalpies of the three fragments are added to the enthalpies of the two junctions formed by splicing, the enthalpy of the non-perfectly matched structure at the 5' end of the complete sequence, the enthalpies of the two terminal perfectly matched AT base pairs, and the sequence-symmetry enthalpy, giving the total enthalpy of the restored oligonucleotide sequence to be inferred. The total entropy of the whole oligonucleotide sequence is calculated in the same way. In the scheme of the invention, if the spliced complete sequence of the oligonucleotide to be inferred has no non-perfectly matched structures at its ends, their enthalpy and entropy are omitted from the calculation of total enthalpy and total entropy.
In the scheme of the invention, the known k-mer index algorithm is carried out mainly through the following steps:
(1) using a sliding-window method with a window size of k-mer, determine the fragments to be searched, 4^k sequences in total, where k = 9;
(2) scan each 9-mer fragment to be searched along the specified genome from front to back with a step size of 1;
(3) record the positions at which the 9-mer fragment occurs on the sense and antisense strands of the different genes;
(4) repeat steps (2) and (3) for all fragments to be searched;
(5) represent every 9-mer fragment that occurs in the genome as a decimal number, and store its positions on the genome.
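The indexing steps can be sketched in Python as below. This sketch swaps in a single sliding pass over the genome instead of scanning once per candidate 9-mer, which yields the same position lists; it keys the index by the k-mer string rather than by a decimal code; and it keeps everything in memory rather than in a database. Positions are 0-based sense-strand coordinates of each k-mer's 3'-terminal base, which is one reading of the position convention described in the text. All names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ATGC", "TACG")
VALID = frozenset("ATGC")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(chromosomes, k=9):
    """Map each k-mer to a list of (chromosome, strand, position).

    chromosomes: dict of name -> sense-strand sequence (5' to 3').
    position: 0-based sense-strand coordinate of the k-mer's 3'-terminal base.
    """
    index = defaultdict(list)
    for name, seq in chromosomes.items():
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if not VALID.issuperset(window):
                continue  # skip windows containing N or other symbols
            # Sense strand: the 3' end is the last base of the window.
            index[window].append((name, "+", i + k - 1))
            # Antisense strand: read 5' to 3', its 3' end sits at coordinate i.
            index[reverse_complement(window)].append((name, "-", i))
    return index


index = build_kmer_index({"chr1": "AAACCCGGG"}, k=3)
print(index["GGG"])
```

In the toy call, GGG occurs once on the sense strand and once on the antisense strand (opposite CCC), so both entries appear in the list.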
In the scheme of the invention, the positions of all 9-mer sequences on the genome are obtained with the 9-mer index algorithm and stored in a database. The database comprises three files: genomic.2bit (stores the genome data in binary form), genomic.sqlite3.db (stores the position on the genome of the first base at the 3' end of each 9-mer sequence), and genomic.uni (stores the annotation information of the genome sequences). Locating the obtained thermodynamically stable binding sequences on the genome is completed through the following steps:
(1) take one 9-mer fragment from the 5' end and one from the 3' end of the binding sequence to be located, convert each to a decimal number via the quinary mapping A=0, G=1, C=2, T=3, -=4, and look it up in the genomic.sqlite3.db of the target genome database to obtain its genome positions;
(2) judge whether the two 9-mer fragments occur on the same strand of the same gene, using the criterion:
sense strand: right_pos - left_pos + 9 = seq_size, or
antisense strand: left_pos - right_pos + 9 = seq_size,
where right_pos is the genome position of the last base of the 9-mer fragment taken from the 3' end of the target sequence, left_pos is the genome position of the last base of the 9-mer fragment taken from the 5' end of the target sequence, and seq_size is the length of the target sequence;
(3) if (2) holds, use the genomic.2bit file and the twoBitToFa program to extract the bases at the corresponding genome positions and check them for consistency against the whole binding sequence:
extraction range on the genome sense strand: hit_id[left_pos-8..right_pos]
extraction range on the genome antisense strand: hit_id[right_pos..left_pos+8]
where hit_id denotes the chromosome ID;
(4) if the check in (3) passes, the target sequence has been located on the genome; otherwise it does not occur on the genome;
(5) repeat steps (1) to (4) for all obtained binding sequences.
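The location steps above can be sketched in Python. The sketch keeps the genome and the 9-mer index in memory instead of using genomic.sqlite3.db, genomic.2bit, and twoBitToFa, and it keys the index by k-mer string rather than decimal code. Coordinates are 0-based on the sense strand, and each 9-mer is recorded at its 3'-terminal base, which is one reading of the position convention in the text. All function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ATGC", "TACG")
K = 9


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_index(genome, k=K):
    """k-mer -> list of (chromosome, strand, sense coordinate of 3' base)."""
    index = defaultdict(list)
    for chrom, seq in genome.items():
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            index[window].append((chrom, "+", i + k - 1))
            index[reverse_complement(window)].append((chrom, "-", i))
    return index


def locate(target, genome, index, k=K):
    """Return (chromosome, strand, start, end) for each verified hit."""
    seq_size = len(target)
    if seq_size < k:
        return []
    left_hits = index.get(target[:k], [])    # 9-mer at the 5' end
    right_hits = index.get(target[-k:], [])  # 9-mer at the 3' end
    found = []
    for chrom_l, strand_l, left_pos in left_hits:
        for chrom_r, strand_r, right_pos in right_hits:
            if (chrom_l, strand_l) != (chrom_r, strand_r):
                continue  # step (2): same chromosome and same strand
            if strand_l == "+" and right_pos - left_pos + k == seq_size:
                start, end = left_pos - (k - 1), right_pos
                candidate = genome[chrom_l][start:end + 1]
            elif strand_l == "-" and left_pos - right_pos + k == seq_size:
                start, end = right_pos, left_pos + (k - 1)
                candidate = reverse_complement(genome[chrom_l][start:end + 1])
            else:
                continue
            if candidate == target:  # step (3): whole-sequence check
                found.append((chrom_l, strand_l, start, end))
    return found


genome = {"chr1": "TTGACCATGCAGGCTAACGTTAGCCATGGA"}
index = build_index(genome)
print(locate(genome["chr1"][5:25], genome, index))
```

Anchoring on the two terminal 9-mers means only the distance test and one substring comparison are needed per candidate, instead of scanning the whole genome for every binding sequence.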
The result obtaining according to above step, can realize and treat the deduction of inferring oligonucleotides binding site, and then for judging that whether this oligonucleotides is for providing foundation for the high-quality oligonucleotides of target sequence in this genome.
The solution of the present invention has the following advantages:
1) the thermodynamic parameter table that the solution of the present invention employing arest neighbors model, and combination is proposed in 1998 by SantaLucia and the macroscopic property that can be characterized well oligonucleotide hybridization by the salinity updating formula of proposition in 1998.
2) By building a thermodynamic index in advance, namely an index table of the thermodynamic information for the pairwise hybridization of every 7-mer oligonucleotide with all of its binding sequences, the scheme of the invention improves the efficiency of thermodynamic calculation. When many oligonucleotide sequences are involved, the thermodynamics need not be calculated base by base from front to back each time, which reduces computational complexity. By splitting sequences and then quickly extracting and merging thermodynamic information, problems that were previously incomputable because of thermodynamic complexity become tractable, and users are given faster and more convenient means of calculating and obtaining thermodynamic information.
3) in the solution of the present invention, the 7-mer oligonucleotides binding sequence all with it can represent and be stored in concordance list by decimal number arbitrarily, thereby can save the memory space of concordance list and improve operational efficiency, facilitating follow-up treating to infer obtaining of oligonucleotides thermodynamics information.
4) the solution of the present invention is from the essential thermodynamics of bioconjugation and be no longer the crossover process that sequence alignment is simulated oligonucleotide molecules and genes of interest, the diversity of hybridization state is embodied, thereby reflect more real regulation relationship in organism, predict oligonucleotides binding site from full genomic level, to realize the most comprehensively analysis and prediction, this is the first scheme based on thermodynamic (al) full genomic level prediction oligonucleotides binding site, for high-quality chip probe in future, many application such as the design of PCR primer and siRNA provide important analysis means.
5) the 9-mer Index Algorithm that the solution of the present invention adopts in full genome is found the process of binding site, can significantly accelerate the searching process of binding site, directly from database extract location information, save each locate to genome loaded down with trivial details, make to come true based on thermodynamic (al) full genome prediction, genomic positive-sense strand and antisense strand are brought in hybridization object simultaneously simultaneously, ensured the comprehensive of oligonucleotides binding site.
Brief description of the drawings
Fig. 1 is a structural diagram of a system provided by the present invention for inferring the binding sites of an oligonucleotide on a genome.
Detailed description of the invention
Embodiment 1: Using the binding-sequence determination module 12 of the system 10 shown in Fig. 1, which infers oligonucleotide binding sites at the genome level, to obtain the binding sequences of an oligonucleotide sequence to be inferred according to the method of the present invention
Step 1: The index table building module 11 of the system 10 is used to build the index table of the thermodynamic information of any 7-mer oligonucleotide. The thermodynamic information is the information on the pairwise hybridization of the oligonucleotide with all of its binding sequences, and comprises hybrid structure, hybridization sequence, enthalpy, entropy and free energy;
1) Each base (e.g., A) of any oligonucleotide can face five cases (T, A, G, C, -) in a binding sequence. The digits 0, 1, 2 and 3 of the base-5 (quinary) system are each made to correspond to one of the four DNA bases, and 4 corresponds to a gap. Every 7-mer oligonucleotide and all of its binding sequences are converted to quinary numbers, and each quinary number is then converted to a decimal number. For example, with A=0, G=1, C=2, T=3, -=4, the sequence code of each 7-mer oligonucleotide and of each of its binding sequences is converted to a quinary number, which is then converted to a decimal number. Table 2 shows the mapping between the 5^7 nucleotide sequence codes and numbers (partial data provided).
Table 2
2) In the thermodynamic information, the 7-mer oligonucleotide sequences are ordered by decimal number from smallest to largest, and the hybrid structures of each 7-mer oligonucleotide sequence with all of its binding sequences are ordered from most to least stable (the lower the free energy, the more stable the binding). Table 3 shows the completed index table (in part), which comprises hybrid structure, hybridization sequence, enthalpy, entropy and free energy. The first number in each row is the decimal number representing the 7-mer oligonucleotide sequence. In a hybrid structure, "|", "x" and " " denote a match, a mismatch and an unpaired state between two bases, respectively. The number after each hybrid structure is the decimal number representing the hybridization sequence of that 7-mer oligonucleotide sequence. All sequences in the index table are written from the 5' end to the 3' end. Within each row, ";" separates the thermodynamic data of the different hybrid structures of the oligonucleotide sequence.
Table 3
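The base-5 encoding used for the index keys in Step 1 can be sketched as follows. This is a minimal illustration in Python, not the implementation of the invention: the function names `encode` and `decode` are hypothetical, and reading the 5' base as the most significant digit is an assumption, chosen because it reproduces the codes quoted later in this embodiment for "GAGTTTT" (16718), "AGAGGCT" (3288) and "GTTAATT" (26893).

```python
# Mapping stated in the text: A=0, G=1, C=2, T=3, gap "-"=4.
DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "-": 4}
SYMBOL = {value: key for key, value in DIGIT.items()}


def encode(seq: str) -> int:
    """Read the sequence as a base-5 number (5' base most significant, an assumption)."""
    value = 0
    for base in seq:
        value = value * 5 + DIGIT[base]
    return value


def decode(value: int, length: int = 7) -> str:
    """Inverse of encode; short results are left-padded with 'A' (digit 0)."""
    symbols = []
    for _ in range(length):
        value, digit = divmod(value, 5)
        symbols.append(SYMBOL[digit])
    return "".join(reversed(symbols))


codes = {seq: encode(seq) for seq in ("GAGTTTT", "AGAGGCT", "GTTAATT")}
```

Because "A" maps to 0, leading "A" bases vanish in the decimal form, which is why decoding must pad back to a fixed length of 7.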
Step 2: The binding-sequence determination module 12 uses the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and determines thermodynamically stable binding sequences on the basis of the thermodynamic information obtained:
1) The process of determining thermodynamically stable binding sequences in this embodiment is described in detail below, taking a 24-mer primer as an example:
24-mer primer Query = GAGTTTTAGAGGCTGTTAATTTGC (5' -> 3')
2) The segmentation module 101 in the binding-sequence determination module 12 splits the 24-mer primer into segments of length 7 from the 5' end toward the 3' end, giving four fragments: "GAGTTTT", "AGAGGCT", "GTTAATT" and "TGC".
3) Using the thermodynamic information acquisition module 102 in the binding-sequence determination module 12, the thermodynamic information of each 7-mer oligonucleotide fragment is obtained by looking up the index table: the DNA sequence code of the 7-mer oligonucleotide is converted to a quinary number, the quinary number is converted to a decimal number, and the thermodynamic information corresponding to that decimal number in the index table is retrieved as the thermodynamic information of the fragment. The results are shown in Tables 4, 5 and 6.
Table 4. Thermodynamic information of all hybridization sequences of "GAGTTTT" ("16718"). Within the computable range of the thermodynamic parameters, there are 16147 hybridization sequences in total (partial data provided). The computable range refers to the entropy and enthalpy changes of the perfectly matched and imperfectly matched structures mentioned on page 7 above (complex structures such as single mismatches, consecutive mismatches, unpaired ends and bulge loops) for which experimentally confirmed thermodynamic data are currently available in the literature. Structures for which no thermodynamic data exist at present, or whose thermodynamic data have been shown by experiment to be unreliable, are not used here.
Table 5. Thermodynamic information of all hybridization sequences of "AGAGGCT" ("3288"). Within the computable range of the thermodynamic parameters, there are 15631 hybridization sequences in total (partial data provided).
Table 6. Thermodynamic information of all hybridization sequences of "GTTAATT" ("26893"). Within the computable range of the thermodynamic parameters, there are 15767 hybridization sequences in total (partial data provided).
4) Using the thermodynamic information acquisition module 102 in the binding-sequence determination module 12, the thermodynamic information of oligonucleotide fragments shorter than 7-mer is obtained by building it anew. The construction method is the same as in Step 1, except that the oligonucleotide fragment and its binding sequences are not converted to decimal numbers but are shown directly as DNA sequence codes, and the content of each pair of quotation marks gives the thermodynamic data of one hybrid structure of the fragment: oligonucleotide sequence, hybrid structure, hybridization sequence, enthalpy and entropy. The result is shown in Table 7.
Table 7. Thermodynamic information of all hybridization sequences of "TGC". Within the computable range of the thermodynamic parameters, there are 77 hybridization sequences in total.
5) The thermodynamic information combination module 103 in the binding-sequence determination module 12 combines the thermodynamic information of the fragments and sums each item of thermodynamic information within each combination, giving the thermodynamic information of the oligonucleotide to be inferred.
The combined thermodynamic information stores the hybrid structure, hybridization sequence, enthalpy and entropy of each of the fragments "GAGTTTT", "AGAGGCT", "GTTAATT" and "TGC". First, the binding sequences in the thermodynamic information of Tables 4, 5 and 6 are converted from decimal to quinary numbers according to the relation A=0, G=1, C=2, T=3, -=4, and each quinary number is then converted to a DNA sequence code of fixed length 7, padding the front with "A" where there are fewer than seven characters. Tables 4, 5 and 6 are then combined with the thermodynamic information of Table 7. Table 8 shows the 306420912229663 (16147*15631*15767*77) combinations in total (partial data provided).
Table 8
Next, the entropy and enthalpy values are summed for each combination to obtain the thermodynamic information for hybridization of the 24-mer primer (the Query sequence) as a whole. The summation of each item of thermodynamic information within a combination is illustrated with one case:
First, the combination
["GAGTTTT,|||||||,AAAACTC,-48.1,-132.2","AGAGGCT,|||||||,AGCCTCT,-49.4,-129.5","GTTAATT,|||||||,AATTAAC,-46.5,-130.7","TGC,||,-CA,-8.5,-22.7"]
is written out as its four hybridized fragments:
Then the enthalpies and the entropies of the four fragments are each added up:
In addition, when the four fragments are summed, the thermodynamic values of the duplexes formed at the three junctions, TA/AT, TG/AC and TT/AA, must also be taken into account:
Furthermore, for the complete Query sequence after joining, the enthalpy and entropy of the initial and terminal single base pairs (GC or AT), the thermodynamic values of the imperfectly matched structures at the 5' and 3' ends, and the sequence symmetry factor are also taken into account:
Finally, the free energy is calculated:
6) The judging module 104 in the binding-sequence determination module 12 determines the thermodynamically stable binding sequences of the oligonucleotide to be inferred according to the free energy in the thermodynamic information of the 24-mer oligonucleotide.
Based on a large number of experiments, -11 kcal/mol is taken as the threshold at which a heteroduplex is in a stable state and is used to filter binding sites. For combinations whose free energy is below -11 kcal/mol, the fragment hybrid structures and hybridization sequences are spliced together and sorted from most to least stable by binding free energy, which gives the list of stable binding sequences. The list of thermodynamically stable binding sequences of the Query sequence is shown in Table 9: the entry for each binding sequence contains the binding free energy, the hybrid structure, the hybridization sequence, and the hybridization sequence with gaps removed. All sequences are written from the 5' end to the 3' end.
Table 9
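The filtering and ranking in step 6) can be sketched as below. This is an illustrative Python sketch under stated assumptions: the `Candidate` type, the function name `stable_list` and the demo values are hypothetical and are not taken from Table 9, and "below -11 kcal/mol" is read as a strict inequality. Only the -11 kcal/mol threshold, the ordering by stability, and the four reported fields come from the text.

```python
from typing import List, NamedTuple, Tuple

THRESHOLD = -11.0  # kcal/mol, the stability cutoff named in the text


class Candidate(NamedTuple):
    free_energy: float  # binding free energy of the spliced duplex, kcal/mol
    structure: str      # spliced hybrid structure ("|", "x", " ")
    hybrid_seq: str     # spliced hybridization sequence, "-" marks a gap


def stable_list(candidates: List[Candidate]) -> List[Tuple[float, str, str, str]]:
    """Keep duplexes below the threshold and rank them most stable first."""
    kept = [c for c in candidates if c.free_energy < THRESHOLD]
    kept.sort(key=lambda c: c.free_energy)  # lower free energy = more stable
    return [
        (c.free_energy, c.structure, c.hybrid_seq, c.hybrid_seq.replace("-", ""))
        for c in kept
    ]


# Hypothetical candidates, invented purely to exercise the function.
demo = [
    Candidate(-9.8, "||x||", "ACAGT"),
    Candidate(-25.3, "|||||", "ACGGT"),
    Candidate(-12.4, "|| ||", "AC-GT"),
]
result = stable_list(demo)
```

The fourth field of each entry is the gap-free sequence, which is the form Embodiment 2 takes as the target sequence for genome localization.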
Compared with computing the thermodynamics base by base from front to back, the method and strategy provided by this embodiment reduce computational complexity. By combining segmentation with fast extraction and merging, problems that were incomputable because of thermodynamic complexity are resolved, and users are given a faster and more convenient means of computing and obtaining thermodynamic information, which provides a basis for inferring the binding sites of the oligonucleotide to be inferred.
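The segmentation and the per-fragment summation of this embodiment can be reproduced with a short Python sketch. The function names `split_query` and `sum_fragments` are hypothetical. The sketch adds only the fragment enthalpies and entropies of the single combination quoted in step 5); the junction terms (TA/AT, TG/AC, TT/AA), the terminal base-pair terms, the symmetry factor and the salt-concentration correction are deliberately left out, because their numerical values are not given in this text.

```python
import math
from typing import List, Tuple


def split_query(query: str, size: int = 7) -> List[str]:
    """Cut the query into consecutive fragments of `size` bases, 5' to 3'."""
    return [query[i:i + size] for i in range(0, len(query), size)]


def sum_fragments(records: List[str]) -> Tuple[float, float]:
    """Add up the enthalpy and entropy fields of comma-separated records."""
    total_dh = 0.0
    total_ds = 0.0
    for record in records:
        # Fields: fragment, hybrid structure, hybridization sequence, enthalpy, entropy.
        _fragment, _structure, _hybrid, dh, ds = record.split(",")
        total_dh += float(dh)
        total_ds += float(ds)
    return total_dh, total_ds


QUERY = "GAGTTTTAGAGGCTGTTAATTTGC"
RECORDS = [
    "GAGTTTT,|||||||,AAAACTC,-48.1,-132.2",
    "AGAGGCT,|||||||,AGCCTCT,-49.4,-129.5",
    "GTTAATT,|||||||,AATTAAC,-46.5,-130.7",
    "TGC,||,-CA,-8.5,-22.7",
]

fragments = split_query(QUERY)
# Number of combinations: product of the per-fragment counts from Tables 4-7.
combinations = math.prod([16147, 15631, 15767, 77])
fragment_dh, fragment_ds = sum_fragments(RECORDS)
```

For this combination the fragment-only sums are -152.5 for enthalpy and -415.1 for entropy, and the product of the four table sizes equals the 306420912229663 combinations stated for Table 8, which shows why the per-fragment lookup matters: each of those combinations costs only a handful of additions.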
Embodiment 2: Using the locating module 13 of the system 10 of the present application to locate, on the whole human genome, the thermodynamically stable binding sequences obtained in Embodiment 1
The known k-mer algorithm proceeds as follows:
(1) a sliding-window method is used with a window size of k; the fragments to be searched comprise 4^k sequences in total, where k=9 here,
(2) with a step size of 1, each 9-mer fragment to be searched is scanned from front to back along the specified genome,
(3) the positions at which the 9-mer fragment occurs on the sense and antisense strands of the different genes are recorded,
(4) steps (2) and (3) are repeated for all fragments to be searched,
(5) every 9-mer fragment occurring in the genome is represented as a decimal number and stored in the database together with its positions on the genome.
The k-mer preprocessing algorithm builds, for every 9-mer sequence (5' end to 3' end) in the whole human genome, its positional information on the sense (plus) and antisense (minus) strands and stores it in a database. The usable content of the database comprises three files: genomic.2bit (genome data stored in binary form), genomic.sqlite3.db (the position on the genome of the first base at the 3' end of each 9-mer sequence) and genomic.uni (annotation information of the sequences). The localization of the binding sequences on the whole human genome in this embodiment is now described in detail, based on the 9-mer fast search algorithm and the list of thermodynamically stable binding sequences of the primer obtained in the embodiment above.
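A toy version of the 9-mer position index can be sketched in Python as follows. Several points are assumptions made for illustration: the index is an in-memory dictionary rather than the genomic.sqlite3.db file, coordinates are 1-based on the sense strand, and the genome is scanned in a single pass that records each window and its reverse complement, instead of enumerating all 4^k fragments and scanning for each one. The stored coordinate follows the convention described in the text, the position of the 3'-end base of the 9-mer, which for an antisense hit is the first sense-strand coordinate of the window.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3}        # mapping used in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Hit = Tuple[str, str, int]                       # (chromosome ID, strand, position)


def encode(seq: str) -> int:
    """Base-5 code of a sequence, 5' base most significant (an assumption)."""
    value = 0
    for base in seq:
        value = value * 5 + DIGIT[base]
    return value


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_index(genome: Dict[str, str], k: int = 9) -> Dict[int, List[Hit]]:
    """Slide a window of size k with step 1 and record both strands."""
    index: Dict[int, List[Hit]] = defaultdict(list)
    for chrom, seq in genome.items():
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            if any(base not in DIGIT for base in window):
                continue  # skip windows containing N or other symbols
            # Sense strand: 3'-end base is the last base of the window (1-based).
            index[encode(window)].append((chrom, "+", start + k))
            # Antisense strand: 3'-end base sits at the window's first coordinate.
            index[encode(reverse_complement(window))].append((chrom, "-", start + 1))
    return index


toy_index = build_index({"chrT": "ACGTACGTACGT"})
```

Keying the index on the decimal code rather than on the string mirrors the storage scheme described above; a real genome would need the on-disk database rather than a dictionary.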
Because the database stores the positions of 9-mer sequences, with the sequences converted to numbers, the gap-free binding sequence is first taken as the target sequence, and one 9-mer fragment is cut from each of its 5' and 3' ends. According to the mapping DI_CONVERT={A=0, G=1, C=2, T=3, -=4}, each fragment is first converted to a 9-digit quinary number and then, by the conversion rule between quinary and decimal, to a numeric code, which is used to look up in genomic.sqlite3.db the positions of each fragment on the sense and antisense strands of the different genes. All positions stored in the database for both strands are given as the genomic coordinate of the first base at the 3' end of the 9-mer sequence. On the sense strand, positions are marked in the 5' to 3' direction; on the antisense strand, positions are given with reference to the sense strand, in the opposite direction.
Whether the two 9-mer fragments occur on the same strand of the same gene is verified by one of two conditions: right_pos-left_pos+9=seq_size on the sense strand, or left_pos-right_pos+9=seq_size on the antisense strand. Here right_pos is the genomic position of the last base of the 9-mer fragment at the 3' end of the target sequence, left_pos is the genomic position of the last base of the 9-mer fragment at the 5' end of the target sequence, and seq_size is the length of the target sequence.
If the verification passes, the genomic.2bit file and the twoBitToFa program are used to verify further that the corresponding genomic position is consistent with the target sequence as a whole. The extraction range on the genome sense strand is hit_id[left_pos-8..right_pos], and the extraction range on the genome antisense strand is hit_id[right_pos..left_pos+8], where hit_id denotes the chromosome ID. If the two are fully identical, the target sequence has been successfully located on the genome; otherwise it does not occur on the genome.
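The two-anchor check and the extraction ranges described above amount to a few lines of arithmetic, sketched here in Python. The function names `anchors_consistent` and `extraction_range` are hypothetical, and the strand labels "+" and "-" stand for the sense and antisense strands. The example coordinates assume a 24-mer on the sense strand of chromosome 17 ending at 51667866; the strand is not stated in this text, so that choice is for illustration only.

```python
from typing import Tuple


def anchors_consistent(left_pos: int, right_pos: int, seq_size: int,
                       strand: str, k: int = 9) -> bool:
    """Check that the 5' and 3' k-mer hits are spaced as the target length requires."""
    if strand == "+":
        return right_pos - left_pos + k == seq_size
    return left_pos - right_pos + k == seq_size


def extraction_range(left_pos: int, right_pos: int,
                     strand: str, k: int = 9) -> Tuple[int, int]:
    """Sense-strand interval to extract for the full-length comparison."""
    if strand == "+":
        return left_pos - (k - 1), right_pos
    return right_pos, left_pos + (k - 1)


# Illustrative sense-strand hit: the 5' 9-mer ends at 51667851, the 3' 9-mer at 51667866.
ok = anchors_consistent(51667851, 51667866, 24, "+")
interval = extraction_range(51667851, 51667866, "+")
```

With k=9 the offset k-1 equals the 8 that appears in the ranges hit_id[left_pos-8..right_pos] and hit_id[right_pos..left_pos+8], and in both cases the extracted interval has exactly seq_size bases.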
The above process is repeated for all thermodynamically stable binding sequences obtained, which gives the binding sequences that are successfully located on the whole human genome together with their positions. Using the genome annotation information provided by the genomic.uni file, all binding sites of the primer sequence on the whole human genome are sorted from most to least stable and presented in a straightforward manner modeled on the BLAST output format. The prediction results are shown in Table 10, which contains the free energy of each region where the primer hybridizes with a target gene, the hybridization sites of the primer on the human genome, and the detailed gene annotation; "|", "x" and " " denote the different hybridization patterns between the primer and the human genome. The prediction results show that the primer matches perfectly with the nucleotide sequence at positions 51667843 to 51667866 of human chromosome 17, which is the most stable binding state.
Table 10
By extracting positional information directly from the database, the method of this embodiment avoids locating on the genome repeatedly and significantly accelerates the search for binding sites, making thermodynamics-based whole-genome scanning practical. The method can also be applied to the whole genomes of different species, such as human, nematode, fruit fly and mouse, so that thermodynamics-based prediction of oligonucleotide binding sites against a whole-genome background becomes more specific and comprehensive.
Although the present invention is disclosed above by way of embodiments, they are not intended to limit the present invention. Any person skilled in the art may make changes or equivalent substitutions without departing from the spirit and scope of the present invention, and the scope of protection of the present invention is therefore as defined by the claims of the present application.

Claims (12)

1. A method for inferring the binding sites of an oligonucleotide on a genome, characterized by comprising:
building an index table of the thermodynamic information of any 7-mer oligonucleotide, the thermodynamic information being the information on the pairwise hybridization of the oligonucleotide with all of its binding sequences and comprising hybrid structure, hybridization sequence, enthalpy, entropy and free energy;
using the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and determining thermodynamically stable binding sequences on the basis of the thermodynamic information obtained;
searching for the binding sequences on the genome and locating their positions on the genome.
2. The method according to claim 1, characterized in that the enthalpy ΔH° in the thermodynamic information of the index table is obtained by accumulating the enthalpies of the perfectly matched dimer structures and the imperfectly matched structures contained in the sequence between the first and the last perfectly matched base pair of the hybrid structure;
the entropy ΔS° is obtained by accumulating the entropies of the perfectly matched dimer structures and the imperfectly matched structures contained in the sequence between the first and the last perfectly matched base pair of the hybrid structure;
the free energy is calculated from the total enthalpy and the total entropy according to the salt-concentration correction formulas, wherein the total enthalpy is obtained by accumulating the enthalpies of all perfectly matched dimer structures and imperfectly matched structures in the hybrid structure, the enthalpy of the initial and terminal single base pairs GC or AT, and the enthalpy of sequence symmetry, and the total entropy is obtained by accumulating the entropies of all perfectly matched dimer structures and imperfectly matched structures in the hybrid structure, the entropy of the initial and terminal single base pairs GC or AT, and the entropy of sequence symmetry;
the digits 0, 1, 2 and 3 of the quinary system are each made to correspond to one of the four DNA bases and 4 to a gap, the DNA sequence codes of the 7-mer oligonucleotide and all of its binding sequences are converted to quinary numbers, and the quinary numbers are then converted to decimal numbers.
3. method according to claim 1 and 2, is characterized in that, utilizes described concordance list to obtainThe thermodynamics information of oligonucleotides to be inferred, and determine at thermodynamics on the thermodynamics information basis obtainingUpper stable binding sequence comprises:
Oligonucleotides to be inferred is cut apart from 5' extreme direction to 3' end with the length of 7-mer, obtained longThe oligonucleotide fragment that degree is 7-mer and/or length are less than the oligonucleotide fragment of 7-mer;
The oligonucleotide fragment that is 7-mer for length, its thermodynamics information is by searching above-mentioned concordance listObtain, be less than the oligonucleotide fragment of 7-mer for length, its thermodynamics information obtains by rebuilding;
The thermodynamics information of respectively cutting apart the oligonucleotide fragment obtaining is combined, and by each combinationEvery thermodynamics information sum up, obtain the thermodynamics information of oligonucleotides to be inferred;
According to the size of the free energy in the thermodynamics information of oligonucleotides to be inferred, described in determining, wait to inferOligonucleotides stable binding sequence on thermodynamics.
4. method according to claim 1, is characterized in that, finds described combination on genomeSequence, and locate their positions on genome and comprise:
Build any 9-mer sequence in genome according to 9-mer Index Algorithm, from 5 ' end to 3 ' end sideTo, in the positional information of positive-sense strand and antisense strand;
On genome, locate oligonucleotides acquired to be inferred stable binding sequence on thermodynamics.
5. method according to claim 3, is characterized in that, length is 7-mer oligonucleotides sheetThe thermodynamics information of section comprises by searching above-mentioned concordance list acquisition:
Make the one in 0 in quinary digit, 1,2,3 respectively corresponding four kinds of DNAs, 4 pairsAnswer room gap, will cut apart the DNA sequence dna of the 7-mer oligonucleotide fragment that oligonucleotides to be inferred obtainsCode conversion quinary digit, is then converted to decimal number by this quinary digit,
Search the thermodynamics of decimal number correspondence in described concordance list of described 7-mer oligonucleotide fragmentInformation obtain the thermodynamics information of this 7-mer oligonucleotide fragment.
6. method according to claim 3, is characterized in that, will respectively cut apart the oligonucleotides obtainingThe thermodynamics information of fragment combines: establishing segmentation number is n, the 1st the corresponding a group of fragment heating powerInformation, the 2nd the corresponding b group of fragment thermodynamics information ..., n the corresponding f group of fragment thermodynamics letterBreath, the thermodynamics information of the oligonucleotides to be inferred after combination be a × b × ... × f group;
Every thermodynamics information in each combination is summed up and comprised: be assorted by the n in each combinationHand over sequence to splice; And calculate the total enthalpy of n segmentation, comprise by the enthalpy of n segmentation,The enthalpy of n-1 joining place hybrid structure, the enthalpy of the non-perfect matching structure in sequence to be inferred two ends, order to be inferredThe enthalpy of the independent base-pair GC of row starting and ending or AT and the symmetric enthalpy of sequence, sum up and obtainThe total enthalpy of n segmentation; The total entropy that calculates n segmentation, comprises the entropy of n fragment, n-1The entropy of individual joining place hybrid structure, the entropy of the non-perfect matching structure in sequence to be inferred two ends, sequence to be inferredThe entropy of the independent base-pair GC of starting and ending or AT and the symmetric entropy of sequence, sum up and obtain nThe total entropy of individual segmentation; By the total enthalpy obtaining and total entropy, according to salinity updating formulaWithCalculate free energy.
7. A system for inferring the binding sites of an oligonucleotide on a genome, characterized by comprising:
an index table building module, configured to build an index table of the thermodynamic information of any 7-mer oligonucleotide, the thermodynamic information being the information on the pairwise hybridization of the oligonucleotide with all of its binding sequences and comprising hybrid structure, hybridization sequence, enthalpy, entropy and free energy;
a binding-sequence determination module, configured to use the index table to obtain the thermodynamic information of the oligonucleotide to be inferred, and to determine thermodynamically stable binding sequences on the basis of the thermodynamic information obtained;
a locating module, configured to search for the binding sequences on the genome and to locate their positions on the genome.
8. The system according to claim 7, characterized in that the enthalpy ΔH° in the thermodynamic information obtained by the index table building module is obtained by accumulating the enthalpies of the perfectly matched dimer structures and the imperfectly matched structures contained in the sequence between the first and the last perfectly matched base pair of the hybrid structure;
the entropy ΔS° is obtained by accumulating the entropies of the perfectly matched dimer structures and the imperfectly matched structures contained in the sequence between the first and the last perfectly matched base pair of the hybrid structure;
the free energy is calculated from the total enthalpy and the total entropy according to the salt-concentration correction formulas, wherein the total enthalpy is obtained by accumulating the enthalpies of all perfectly matched dimer structures and imperfectly matched structures in the hybrid structure, the enthalpy of the initial and terminal single base pairs GC or AT, and the enthalpy of sequence symmetry, and the total entropy is obtained by accumulating the entropies of all perfectly matched dimer structures and imperfectly matched structures in the hybrid structure, the entropy of the initial and terminal single base pairs GC or AT, and the entropy of sequence symmetry;
the digits 0, 1, 2 and 3 of the quinary system are each made to correspond to one of the four DNA bases and 4 to a gap, the DNA sequence codes of the 7-mer oligonucleotide and all of its binding sequences are converted to quinary numbers, and the quinary numbers are then converted to decimal numbers.
9. The system according to claim 7 or 8, characterized in that the binding-sequence determination module comprises a segmentation module, a thermodynamic information acquisition module, a thermodynamic information combination module and a judging module;
the segmentation module is configured to split the oligonucleotide to be inferred into segments of length 7 from the 5' end toward the 3' end, to obtain oligonucleotide fragments of length 7-mer and/or oligonucleotide fragments shorter than 7-mer;
the thermodynamic information acquisition module is configured to obtain the thermodynamic information of oligonucleotide fragments of length 7-mer by looking up the index table, and to obtain the thermodynamic information of oligonucleotide fragments shorter than 7-mer by building it anew;
the thermodynamic information combination module is configured to combine the thermodynamic information of the fragments and to sum each item of thermodynamic information within each combination, to obtain the thermodynamic information of the oligonucleotide to be inferred;
the judging module is configured to determine the thermodynamically stable binding sequences of the oligonucleotide to be inferred according to the free energy in its thermodynamic information.
10. The system according to claim 7, characterized in that the locating module searching for the binding sequences on the genome and locating their positions on the genome comprises:
building, according to the 9-mer index algorithm, the positional information on the sense and antisense strands of any 9-mer sequence in the genome, written from the 5' end to the 3' end,
locating on the genome the thermodynamically stable binding sequences obtained for the oligonucleotide to be inferred.
11. The system according to claim 9, characterized in that obtaining the thermodynamic information of the 7-mer oligonucleotide fragments by looking up the index table comprises:
making the digits 0, 1, 2 and 3 of the quinary system each correspond to one of the four DNA bases and 4 to a gap, converting the DNA sequence code of each 7-mer oligonucleotide fragment obtained by splitting the oligonucleotide to be inferred to a quinary number, and then converting the quinary number to a decimal number,
looking up the thermodynamic information corresponding to the decimal number of the 7-mer oligonucleotide fragment in the index table, to obtain the thermodynamic information of the 7-mer oligonucleotide fragment.
12. The system according to claim 9, characterized in that combining the thermodynamic information of the fragments comprises: letting the number of fragments be n, the 1st fragment corresponding to a sets of thermodynamic information, the 2nd fragment to b sets of thermodynamic information, ..., and the n-th fragment to f sets of thermodynamic information, the combined thermodynamic information of the oligonucleotide to be inferred comprises a × b × ... × f sets;
summing each item of thermodynamic information within each combination comprises: splicing the n hybridization sequences in each combination; calculating the total enthalpy of the n fragments by summing the enthalpies of the n fragments, the enthalpies of the n-1 junction hybrid structures, the enthalpies of the imperfectly matched structures at the two ends of the sequence to be inferred, the enthalpy of the initial and terminal single base pairs GC or AT of the sequence to be inferred, and the enthalpy of sequence symmetry; calculating the total entropy of the n fragments by summing the entropies of the n fragments, the entropies of the n-1 junction hybrid structures, the entropies of the imperfectly matched structures at the two ends of the sequence to be inferred, the entropy of the initial and terminal single base pairs GC or AT of the sequence to be inferred, and the entropy of sequence symmetry; and calculating the free energy according to the salt-concentration correction formulas.
CN201410568387.0A 2014-10-22 2014-10-22 Method and system for deducing bonding site of oligonucleotide on genome Pending CN105590038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410568387.0A CN105590038A (en) 2014-10-22 2014-10-22 Method and system for deducing bonding site of oligonucleotide on genome

Publications (1)

Publication Number Publication Date
CN105590038A true CN105590038A (en) 2016-05-18

Family

ID=55929614

Country Status (1)

Country Link
CN (1) CN105590038A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050033A1 (en) * 2003-08-29 2005-03-03 Shiby Thomas System and method for sequence matching and alignment in a relational database management system
EP1681641A3 (en) * 2005-01-13 2007-04-04 International Business Machines Corporation Incremental indexing
CN102243697A (en) * 2010-05-11 2011-11-16 解放军第三○二医院 Primer library and screening system for rapid PCR (Polymerase Chain Reaction) detection for population sudden viral epidemics
JP5183155B2 (en) * 2007-11-06 2013-04-17 株式会社日立製作所 Batch search method and search system for a large number of sequences

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
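Liu Zheyan (刘哲言): "Differences in microRNA regulation of human housekeeping and non-housekeeping genes, and a thermodynamic study of oligonucleotide binding sites" (microRNA对人类管家基因和非管家基因的调控差异及寡核苷酸结合位点的热力学研究), China Master's Theses Full-text Database, Basic Sciences (《中国优秀硕士学位论文全文数据库 基础科学辑》) *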

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709274A (en) * 2017-02-27 2017-05-24 新疆大学 Method for analyzing coordinated regulation physical mechanism of biological function gene in growth and development by entropy change
CN109903812A (en) * 2019-02-22 2019-06-18 哈尔滨工业大学(深圳) A kind of gene order digital implementation and system based on comentropy
CN111681711A (en) * 2020-06-28 2020-09-18 江苏先声医学诊断有限公司 Design and screening method of degenerate primer
CN111681711B (en) * 2020-06-28 2021-03-16 江苏先声医学诊断有限公司 Design and screening method of degenerate primer
CN115960993A (en) * 2023-01-16 2023-04-14 中国科学院苏州生物医学工程技术研究所 Method for designing LAMP primer group and application thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160518
