CN107577918A - CpG island recognition method and device based on a genetic algorithm and a hidden Markov model - Google Patents
- Publication number
- CN107577918A (application CN201710725585.7A)
- Authority
- CN
- China
- Prior art keywords
- chromosome
- fitness value
- markov model
- sequence
- hidden markov
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Investigating Or Analysing Biological Materials (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a CpG island recognition method based on a genetic algorithm and a hidden Markov model, comprising the following steps: 1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters; 2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome; 3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search; 4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters; 5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
Description
Technical field
The invention belongs to the field of bioinformatics, and in particular relates to a CpG island recognition method and device based on a genetic algorithm and a hidden Markov model.
Background art
With the completion of genome sequencing, gene sequence identification faces many problems and challenges. In many genomes the rarest dinucleotide is CG, because the C in a CG pair is easily methylated, which can cause the C to mutate into T. Methylation, however, is usually suppressed in certain regions around genes, and such a region is a CpG island. A CpG island is a special DNA sequence several hundred bp long in which the CG dinucleotide occurs at a very high frequency. Finding a CpG island often means that the sequence may contain the promoter and the first exon of a transcribed gene, so identifying CpG islands helps locate the regions of interest in a genome sequence. CpG islands are therefore of great importance for gene sequence identification.
CpG island identification mainly faces two problems: 1. Given a short genome sequence, how to decide whether it comes from a CpG island. 2. Given a long sequence, how to find the CpG islands it contains, if any.
Current research concentrates on the second problem. Researchers generally regard a region as a CpG island if it is longer than 200 bp, has a C+G content above 50%, and has a ratio of observed to expected CpG content greater than 0.6. The traditional recognition algorithm defines a sliding window and computes, for the sequence inside the window, the C+G content and the ratio of observed to expected CpG content. The window size strongly affects the recognition result, and the computational cost is high. Moreover, the criteria are all artificially defined, so the CpG islands identified in this way have limited biological significance. To find criteria with more biological significance, researchers have proposed identifying the positions of CpG islands with a hidden Markov model (HMM). An HMM is a probabilistic model consisting of a sequence of hidden state changes and a sequence of observable symbols generated by those hidden states.
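The window criterion described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the observed/expected ratio is computed with the commonly used formula (number of CpG × window length) / (number of C × number of G), and the thresholds are the ones quoted in the text.

```python
def cpg_window_stats(window):
    """Return (C+G content, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c = w.count("C")
    g = w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count divided by the count expected
    # from the C and G frequencies, i.e. cpg * n / (c * g).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def island_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start positions of windows that satisfy both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_window_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append(start)
    return hits
```

Every window is rescanned from scratch here, which makes the dependence of both the result and the running time on the window size easy to see.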
A hidden Markov model is defined by an alphabet Σ, a state set Q, a state transition probability matrix A and an emission probability matrix B, where:
● Σ is the alphabet of observable symbols;
● Q is the set of states, each of which emits symbols from the alphabet;
● A gives the probability that the HMM moves from the state at time t to the state at time t+1;
● B gives the probability that the HMM emits symbol s while in the state at time t;
Once a system can be described as an HMM, it can be used to solve three basic problems.
Decoding problem: given a model and a character string, find the optimal path through the model. The path starts from the initial state, and each state on the path emits one character, which realizes the decoding operation.
Evaluation problem: for a given model, find the probability of producing the character string. The forward algorithm is usually used to compute the probability of an observation sequence given an HMM, and hence to choose the most suitable HMM.
Learning problem: generate an HMM from an observation sequence.
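The evaluation problem can be illustrated with a short forward-algorithm sketch. The dictionary-based model layout (`start_p`, `trans_p`, `emit_p`) is an assumption made for this example and is not taken from the patent.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) by summing over all hidden paths."""
    # alpha[s]: probability of the observed prefix with the path ending in s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][symbol]
            for s in states
        }
    return sum(alpha.values())
```

For long sequences the products underflow, so a practical version would rescale `alpha` at every step or work with log probabilities.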
The first two are pattern recognition problems: given an HMM, compute the probability of an observation sequence (evaluation), and search for the hidden state sequence most likely to have generated an observation sequence (decoding). The third problem is to generate an HMM from a given observation sequence (learning). It is the hardest of the HMM-related problems: from an observation sequence (drawn from a known set) and the associated set of hidden states, estimate the most suitable HMM. The HMM used here has eight states in total: {A+, G+, C+, T+, A-, G-, C-, T-}, where A+ means that the base A lies inside a CpG island and A- means that it lies outside a CpG island. Each base thus corresponds to two states in the model. Given only a base sequence, the state that each base corresponds to cannot be determined directly. Transitions between all states are allowed in the model. The hidden Markov model is used as follows:
First, a number of DNA sequences of already confirmed CpG islands are collected, and these real data are used to train the parameters of the model, which is the learning problem of the hidden Markov model. The model parameters are obtained from the training data by building the hidden Markov model, and the trained model is then used to identify CpG islands.
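When the training sequences are labelled, a simple way to obtain initial transition probabilities for the eight-state model is to count state-to-state transitions. The sketch below assumes a boolean island mask per base and add-one smoothing; both are illustrative choices, not details given in the patent.

```python
from collections import Counter

# The eight states named in the text: each base inside (+) or outside (-) an island.
STATES = [base + sign for sign in "+-" for base in "ACGT"]


def label_states(seq, island_mask):
    """Turn a base sequence and a per-base island flag into a state path."""
    return [b + ("+" if inside else "-") for b, inside in zip(seq, island_mask)]


def estimate_transitions(paths, pseudocount=1.0):
    """Estimate the transition matrix A from labelled state paths by counting."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    A = {}
    for s in STATES:
        row_total = sum(counts[(s, t)] + pseudocount for t in STATES)
        A[s] = {t: (counts[(s, t)] + pseudocount) / row_total for t in STATES}
    return A
```

Because each state emits exactly one base, the emission matrix of this model is fixed and only the transition and initial probabilities need to be learned.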
Given an HMM and an observation sequence, we want to find the hidden state sequence most likely to have generated that sequence. One could list every possible hidden state sequence and compute the probability of the observation sequence for each of them in order to find the most probable one, but the computational complexity of this approach is very high.
A hidden Markov model is a probabilistic model of sequential data that relies on an initial state probability vector, a transition probability matrix and an observation probability matrix. Research has found that, although the hidden Markov model handles overfitting comparatively well, many problems remain. It depends on the strong assumption that the next state is influenced only by the previous state. This assumption is an oversimplification, so the hidden Markov model can identify effectively and accurately by maximum likelihood estimation only when the assumption is consistent with the real data. Real data, however, are usually not influenced by the previous state alone. As a result the HMM is easily trapped in a local optimum, and its computational complexity is high. To improve the HMM's ability to recognize CpG islands, the HMM parameters need to be optimized.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a CpG island recognition method based on a genetic algorithm and a hidden Markov model. The method considers solutions across the whole HMM parameter space in order to reach a globally optimal solution, optimizes the HMM parameters better, and thereby improves the ability to recognize CpG islands.
The technical solution of the invention is as follows:
A CpG island recognition method based on a genetic algorithm and a hidden Markov model comprises the following steps:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
The Viterbi algorithm is used to determine the hidden state sequence most likely to have generated the observation sequence, as follows:
Each base state in the observation sequence has a local probability and a local best path. Using the product of the hidden-state probability and the corresponding observation probability, the maximum local probability at the current time and its local best path are selected; backtracking along the local best path from the current time yields the position recognition result for the CpG islands.
In detail, for each state i, i = 1, …, n, the Viterbi algorithm considers the path
$X_i = (X_{i1}, X_{i2}, \dots, X_{iT})$
The local probability at time t = 1 is the product of the hidden-state probability and the corresponding observation probability. For every later time, the Viterbi algorithm keeps a back pointer for each state and stores a local probability δ in each state. With observation $k_t$, observation probability b and state transition probability a, the probability of the best local path that reaches state i at time t is $\delta_t(i)$.
This recursion determines the best path into the next state. The most probable hidden state at time t = T is
$i_T = \arg\max_i \delta_T(i)$
and the states $i_t$ at the other times are recovered by following the back pointers.
By working recursively, the Viterbi algorithm reduces the computational complexity and gives the best explanation of the observation sequence in its full context.
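In the notation of the passage above, the standard Viterbi recursion reads as follows. This is the textbook formulation, offered as an assumption about what the formulas of the original filing express; the initial distribution $\pi$ and the back-pointer symbol $\varphi$ are names introduced here for illustration.

```latex
\begin{aligned}
\delta_1(i) &= \pi_i \, b_i(k_1), \\
\delta_t(i) &= \max_{j} \bigl[ \delta_{t-1}(j)\, a_{ji} \bigr] \, b_i(k_t),
  & t &= 2, \dots, T, \\
\varphi_t(i) &= \arg\max_{j} \bigl[ \delta_{t-1}(j)\, a_{ji} \bigr],
  & t &= 2, \dots, T, \\
i_T &= \arg\max_{i} \delta_T(i), \\
i_t &= \varphi_{t+1}(i_{t+1}),
  & t &= T-1, \dots, 1.
\end{aligned}
```

For n states and a sequence of length T this costs on the order of $n^2 T$ operations, compared with $n^T$ candidate paths for exhaustive enumeration.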
Further, the genetic algorithm comprises a selection operation, a crossover operation and a mutation operation, which are applied in turn to optimize the chromosomes.
The selection operation comprises: according to the fitness value of each chromosome, selecting the chromosomes whose fitness values meet the inheritance requirement to be passed on, and deleting the chromosomes that are not selected.
Let the population size be N and let chromosome $x_i$ have fitness $f(x_i)$. The probability that $x_i$ is selected is
$P(x_i) = f(x_i) / \sum_{j=1}^{N} f(x_j)$
and the cumulative probability $q_i$ of chromosome $x_i$ (i = 1, 2, 3, …, N) is
$q_i = \sum_{j=1}^{i} P(x_j)$
Based on these cumulative probabilities, chromosomes are selected by the roulette-wheel method, which yields the samples that meet the inheritance requirement.
The crossover operation comprises: among the chromosomes whose fitness values meet the inheritance requirement, selecting some chromosomes with better fitness values as parents, and performing crossover between two adjacent parent chromosomes to produce child chromosomes. The roulette-wheel method is likewise used when selecting the parent chromosomes.
The mutation operation comprises: in a child chromosome, first determining the gene mutation site, and then changing the gene value at that site according to the set mutation rate.
Specifically, let p be the gene value at the chosen mutation site, r a random number in [0, 1], ct the current generation, mt the total number of generations, b = 2, and C the gene value after mutation. The formula that gives C from these quantities is not reproduced in the source text.
The mutation operation changes chromosome values with only a very small probability, so the resulting HMM parameters are very close to the HMM parameters before mutation.
Further, the fitness function is defined by a formula that is not reproduced in the source text.
Further, step 3) is iterated using the Baum-Welch algorithm, as follows:
We define the re-estimated HMM as $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$; the left-hand sides of the re-estimation formulas are the three parameters of the re-estimated HMM. In these formulas, $\gamma_t(j)$ is the probability of being in hidden state $S_j$ at time t, $\xi_t(i, j)$ is the probability of being in hidden state $S_i$ at time t and in hidden state $S_j$ at time t+1, and O is the observation sequence. After a number of iterations, the maximum likelihood estimate of the HMM is obtained. The output solution is the optimal set of hidden Markov model parameters.
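For reference, the standard Baum-Welch re-estimation formulas in terms of $\gamma$ and $\xi$ are given below. They are the textbook form and are assumed, not confirmed, to correspond to the formulas of the original filing; the hidden state at time t is written $q_t$, the observation at time t is $O_t$ and the current model is $\lambda$, all introduced here for illustration.

```latex
\begin{aligned}
\gamma_t(j) &= P(q_t = S_j \mid O, \lambda), \qquad
\xi_t(i,j) = P(q_t = S_i,\; q_{t+1} = S_j \mid O, \lambda), \\[4pt]
\bar{\pi}_i &= \gamma_1(i), \\[4pt]
\bar{a}_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \\[4pt]
\bar{b}_j(k) &= \frac{\sum_{t \,:\, O_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.
\end{aligned}
```

Each re-estimation step does not decrease the likelihood $P(O \mid \lambda)$, which is why the procedure converges to a (possibly local) maximum likelihood estimate.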
The invention also provides a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the following processing:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
The invention further provides a CpG island recognition device based on a genetic algorithm and a hidden Markov model, comprising a processor for executing instructions, and a computer storage medium for storing a plurality of instructions, the instructions being adapted to be loaded by the processor to perform the following processing:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
Beneficial effects of the invention:
First, the invention optimizes the parameters of the HMM in order to improve its ability to identify CpG islands. Second, the HMM parameters are estimated with a genetic algorithm. Finally, the method obtains the maximum likelihood estimate of the HMM, and the resulting model is very useful for locating CpG islands (CGIs). Experiments show that the CpG island recognition method combining the genetic algorithm and the hidden Markov model improves both accuracy and recall.
Brief description of the drawings
Fig. 1 is a flow chart of the CpG island recognition method combining the genetic algorithm and the hidden Markov model;
Fig. 2 shows the crossover operation in the genetic algorithm;
Fig. 3 shows the mutation operation in the genetic algorithm;
Fig. 4 shows the control parameters of the genetic algorithm;
Fig. 5 shows the fitness value as the number of iterations increases;
Fig. 6 shows the results of the HMM combined with the genetic algorithm;
Fig. 7 compares the CpG island recognition ability of the HMM optimized by the genetic algorithm with that of the plain HMM;
Detailed description:
The invention is further described below with reference to the accompanying drawings and embodiments:
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which this application belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. Furthermore, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
As noted in the background section, although the hidden Markov model handles overfitting comparatively well, it is easily trapped in a local optimum and its computational complexity is high. To improve the HMM's ability to recognize CpG islands, the invention proposes a CpG island recognition method based on a genetic algorithm and a hidden Markov model which, as shown in Fig. 1, comprises the following steps:
1) Model parameter initialization: obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of chromosomes forming a set of hidden Markov model parameters;
A chromosome is usually represented as a string of elements, each of which is called a gene. Because the HMM parameters A, B and π are three real-valued matrices of high dimension, binary coding is difficult to apply; in order to reflect changes in the model parameters directly and faithfully, a string of real-valued genes is used.
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) applying the output optimal hidden Markov model parameters to the test set and performing the decoding operation on a given observation sequence, that is, determining the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
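The real-valued encoding of step 1) can be sketched as flattening A, B and π into one list of genes and restoring them afterwards. The layout (A first, then B, then π) and the row renormalization on decoding are assumptions made for this example; the patent does not specify them.

```python
def encode(A, B, pi):
    """Flatten the HMM parameters (A, B, pi) into one real-valued chromosome."""
    return ([x for row in A for x in row]
            + [x for row in B for x in row]
            + list(pi))


def _normalize(row):
    # Assumes non-negative genes with a positive sum, so the row stays a
    # valid probability distribution after crossover or mutation.
    total = sum(row)
    return [x / total for x in row]


def decode(chrom, n_states, n_symbols):
    """Rebuild (A, B, pi) from a chromosome, renormalizing every row."""
    a_end = n_states * n_states
    b_end = a_end + n_states * n_symbols
    A = [_normalize(chrom[i:i + n_states]) for i in range(0, a_end, n_states)]
    B = [_normalize(chrom[i:i + n_symbols]) for i in range(a_end, b_end, n_symbols)]
    pi = _normalize(chrom[b_end:b_end + n_states])
    return A, B, pi
```

Renormalizing on decode is one simple way to keep every candidate a valid HMM; another would be to repair the chromosome itself after each genetic operator.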
The genetic algorithm in the invention uses three genetic operators: selection, crossover and mutation.
The selection operation picks outstanding individuals from the chromosomes; the selection mechanism imitates survival of the fittest in natural selection. Chromosomes with high fitness values survive better than chromosomes with low fitness values, and chromosomes that are not selected are deleted in the subsequent evolution. Roulette-wheel selection decides whether a chromosome is passed on to the next generation: each chromosome is assigned a region whose area is proportional to its own fitness value. Let the population size be N and let chromosome $x_i$ have fitness $f(x_i)$. The probability that $x_i$ is selected is
$P(x_i) = f(x_i) / \sum_{j=1}^{N} f(x_j)$
and the cumulative probability $q_i$ of chromosome $x_i$ (i = 1, 2, 3, …, N) is
$q_i = \sum_{j=1}^{i} P(x_j)$
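Roulette-wheel selection with these cumulative probabilities can be sketched as follows. The function names and the optional `rng` argument are illustrative, and the sketch assumes non-negative fitness values with a positive sum.

```python
import random
from itertools import accumulate


def selection_probabilities(fitness):
    """P(x_i) = f(x_i) / sum_j f(x_j); assumes non-negative fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population, fitness, k, rng=random):
    """Draw k chromosomes with probability proportional to fitness."""
    cumulative = list(accumulate(selection_probabilities(fitness)))  # q_i
    chosen = []
    for _ in range(k):
        r = rng.random()
        # Pick the first chromosome whose cumulative probability reaches r;
        # fall back to the last one to absorb floating-point rounding.
        idx = next((i for i, q in enumerate(cumulative) if r <= q),
                   len(population) - 1)
        chosen.append(population[idx])
    return chosen
```

If the fitness is a log-likelihood, which is negative, it has to be shifted or transformed before this scheme can be applied.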
The crossover operation recombines parent chromosomes. The parents selected are the chromosomes with the largest fitness values, as shown in Fig. 2. The operation therefore crosses the best parents in order to produce better offspring. The parents are chosen by the roulette-wheel mechanism.
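A minimal crossover sketch is given below. The text does not say which crossover scheme Fig. 2 depicts, so one-point crossover on real-valued chromosomes is assumed here purely for illustration.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    # Cut strictly inside the chromosome so both children mix both parents;
    # this requires at least two genes.
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Since the genes encode rows of probability matrices, children generally need their rows renormalized before they are evaluated.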
Chromosome mutation increases the variation of the model parameters. By changing the values of genes in a chromosome, it gives the genetic algorithm the ability to search globally. It lets the genetic algorithm recover information lost during initialization and during the evolution of the model parameters, so that the algorithm can find the optimal model parameters. Before a model parameter is changed, the mutation rate is compared with a randomly generated probability: if the mutation rate is greater than or equal to the random probability, the model parameter is modified by single-point random mutation, as shown in Fig. 3. Here p is the original value of the gene chosen for mutation, r is a random number in [0, 1], ct is the current generation, mt is the total number of generations, b = 2, and C is the value after mutation. The formula that gives C from these quantities is not reproduced in the source text.
Note that, before the mutation operation, the gene mutation site is determined first, and the original gene at that site is then changed with a certain probability. The mutation operation changes chromosome values with only a very small probability, so the resulting HMM parameters should be very close to the HMM parameters before mutation.
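The symbols p, r, ct, mt and b match those of the widely used non-uniform mutation operator, so the sketch below assumes that operator. This is a guess at the intended formula, not the patent's own; the gene bounds `low` and `high` and the fair coin deciding the direction of the change are likewise assumptions.

```python
import random


def nonuniform_mutate(p, ct, mt, b=2.0, low=0.0, high=1.0, rng=random):
    """Assumed non-uniform mutation: the step shrinks as ct approaches mt."""
    r = rng.random()
    # shrink is in (0, 1] early on and tends to 0 in the last generations
    shrink = 1.0 - r ** ((1.0 - ct / mt) ** b)
    if rng.random() < 0.5:
        return p + (high - p) * shrink  # move toward the upper bound
    return p - (p - low) * shrink       # move toward the lower bound


def mutate_chromosome(chrom, mutation_rate, ct, mt, rng=random):
    """Single-point mutation, applied when mutation_rate >= a random draw."""
    child = list(chrom)
    if rng.random() <= mutation_rate:
        site = rng.randrange(len(child))  # gene mutation site
        child[site] = nonuniform_mutate(child[site], ct, mt, rng=rng)
    return child
```

With this operator the mutated value C always stays inside [low, high], and changes become smaller in later generations, which fits the remark that the mutated parameters stay close to the original ones.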
In this embodiment, the fitness value reflects the performance of the genetic algorithm and is used to assess the adaptability of a chromosome. Every individual has a fitness score, which determines whether it is selected.
In general, the fitness function is defined in one way when the objective function is a maximization problem and in another way when it is a minimization problem; both definitions involve a coefficient, $c_{min}$ for maximization and $c_{max}$ for minimization. The two formulas are not reproduced in the source text.
To reduce the complexity of the fitness function, the method uses its own fitness function, whose formula is likewise not reproduced in the source text.
By adjusting the number of CpG islands in the training data set, the complexity of the fitness function can be adjusted. The method therefore significantly reduces the complexity of the fitness function.
To find the optimal HMM parameters, the iteration uses the Baum-Welch algorithm, implemented as follows:
We define the re-estimated HMM as $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$; the left-hand sides of the re-estimation formulas are the three parameters of the re-estimated HMM. In these formulas, $\gamma_t(j)$ is the probability of being in hidden state $S_j$ at time t, $\xi_t(i, j)$ is the probability of being in hidden state $S_i$ at time t and in hidden state $S_j$ at time t+1, and O is the observation sequence. After a number of iterations, the maximum likelihood estimate of the HMM is obtained. The output solution is the optimal set of hidden Markov model parameters.
Once the optimal model parameters have been obtained, they are verified by applying them to the test set. Given the HMM and an observation sequence, we want to find the hidden state sequence most likely to have generated that sequence, which is the decoding operation. Each base state in the DNA sequence has a local probability and a local best path, and the globally optimal path can be determined by selecting, at each time, the state with the maximum local probability together with its local best path. The maximum of the products of state transition probability and corresponding observation probability is selected, and the path that attains it is the most probable hidden state sequence. Decoding in this method uses the Viterbi algorithm. For each state i the Viterbi algorithm considers the path
$X_i = (X_{i1}, X_{i2}, \dots, X_{iT})$
The local probability at time t = 1 is the product of the hidden-state probability and the corresponding observation probability. For the other times, the local probability of each state is computed recursively from the local probabilities at the previous time.
This recursion determines the most probable path into the next state. The most probable hidden state at time t = T is
$i_T = \arg\max_i \delta_T(i)$
and the states $i_t$ at the other times are recovered by backtracking along the most probable path. After backtracking, each run of "+" states on the path corresponds to a CpG island. The final CpG island recognition results are shown in Fig. 6 and Fig. 7.
By working recursively, the Viterbi algorithm reduces the computational complexity and gives the best explanation of the observation sequence in its full context. Suppose the test sequence is T = ATTAGCGAT and the optimal path found by the Viterbi algorithm is the state sequence {A-, T-, T+, A+, G+, C+, G+, A-, T-}; then TAGCG is judged to be a CpG island. The HMM is therefore of great value for CpG island recognition.
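The decoding step and the read-out of islands from "+" states can be sketched as follows. The dictionary-based model layout and the use of log probabilities are choices made for this example rather than details from the patent.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state path for obs, computed in log space."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # delta[s]: log probability of the best path that ends in state s
    delta = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []  # one back-pointer table per time step after the first
    for symbol in obs[1:]:
        prev, delta, pointers = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers[s] = best
            delta[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][symbol])
        back.append(pointers)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: delta[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def island_spans(path):
    """Half-open (start, end) index ranges of consecutive '+' states."""
    spans, start = [], None
    for i, state in enumerate(path):
        inside = state.endswith("+")
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(path)))
    return spans
```

Applied to the state sequence of the example above, `island_spans` returns the single span (2, 7), and slicing ATTAGCGAT with it gives TAGCG.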
The invention also provides a computer storage medium storing a plurality of instructions, the instructions being loaded by a processor to perform the following processing:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
Further, the invention provides a CpG island recognition device based on a genetic algorithm and a hidden Markov model, comprising a processor for executing instructions, and a computer storage medium for storing a plurality of instructions, the instructions being loaded by the processor to perform the following processing:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
First, the invention optimizes the parameters of the HMM in order to improve its ability to identify CpG islands. Second, the HMM parameters are estimated with a genetic algorithm. Finally, the method obtains the maximum likelihood estimate of the HMM, and the resulting model is very useful for locating CpG islands (CGIs). Experiments show that the CpG island recognition method combining the genetic algorithm and the hidden Markov model improves both accuracy and recall.
The foregoing describes only preferred embodiments of the application and is not intended to limit the application; for those skilled in the art, the application may be modified and varied in many ways. Any modification, equivalent substitution or improvement made within the spirit and principles of the application shall fall within the scope of protection of the application.
Claims (10)
1. A CpG island recognition method based on a genetic algorithm and a hidden Markov model, characterized by comprising the following steps:
1) obtaining a plurality of chromosomes each containing gene elements, each gene element being represented by a real number, the plurality of chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value indicating the quality of the chromosome;
3) using a genetic algorithm to perform an optimization search on the chromosomes according to the fitness values, and then re-determining the fitness values of the chromosomes after the search;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters to determine, for a given observation sequence, the hidden state sequence most likely to have generated the observation sequence, this sequence indicating the positions of CpG islands.
2. The method according to claim 1, characterized in that the Viterbi algorithm is used to determine the hidden state sequence most likely to have generated the observation sequence.
3. The method according to claim 2, characterized in that determining, by the Viterbi algorithm, the hidden state sequence most likely to have generated the observation sequence comprises:
according to the local probability and the local best path corresponding to each base state in the observation sequence, selecting, by means of the product of the hidden-state probability and the corresponding observation probability, the maximum local probability at the current time and its corresponding local best path, and backtracking along the local best path from the current time to obtain the position recognition result for the CpG islands.
4. The method according to claim 1, characterized in that the genetic algorithm comprises a selection operation, a crossover operation and a mutation operation, and the chromosomes are optimized by applying the selection operation, the crossover operation and the mutation operation in turn.
5. The method according to claim 4, characterized in that the selection operation comprises: according to the fitness value of each chromosome, selecting the chromosomes whose fitness values meet the inheritance requirement to be passed on, and deleting the chromosomes that are not selected.
6. The method according to claim 5, characterized in that the crossover operation comprises: among the chromosomes whose fitness values meet the inheritance requirement, selecting some chromosomes with better fitness values as parents, and performing crossover between two adjacent parent chromosomes to produce child chromosomes.
7. The method according to claim 6, characterized in that the mutation operation comprises: in a child chromosome, first determining the gene mutation site, and then changing the gene value at the gene mutation site according to the set mutation rate.
8. The method according to claim 1, characterized in that the fitness function is defined by a formula that is not reproduced in the source text, and the complexity of the fitness function can be adjusted by adjusting the number of CpG islands in the training data set.
9. A computer storage medium storing a plurality of instructions, characterized in that the instructions
are adapted to be loaded by a processor to perform the following processing:
1) obtaining a plurality of chromosomes comprising gene elements, each gene element being represented by
a real number, the plurality of encoded chromosomes forming one set of hidden Markov model parameters;
2) determining the fitness value of each chromosome using a fitness function, the fitness value
representing the quality of the chromosome;
3) using a genetic algorithm, performing an optimization process on the chromosomes according to the
fitness values, and then re-determining the fitness values of the chromosomes after optimization;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden
Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining
the maximum-probability hidden state sequence that generates the observation sequence, which indicates
the positions of the CpG islands.
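Step 1) encodes hidden Markov model parameters as real-valued genes. One way such an encoding could be decoded is sketched below in Python. The layout (initial distribution, then transition rows, then emission rows) and the row normalisation are assumptions made for illustration; the claims do not specify how genes map to parameters, and the function name `decode` is hypothetical.

```python
def decode(chromosome, n_states, n_symbols):
    """Split a flat list of positive real genes into (start, trans, emit).

    Each row is normalised to sum to 1 so that it is a valid probability
    distribution. Assumed layout: n_states initial-state genes, then
    n_states * n_states transition genes, then n_states * n_symbols
    emission genes.
    """
    genes = list(chromosome)
    expected = n_states + n_states * n_states + n_states * n_symbols
    if len(genes) != expected:
        raise ValueError(f"expected {expected} genes, got {len(genes)}")

    def normalise(row):
        total = sum(row)
        return [x / total for x in row]

    start = normalise(genes[:n_states])
    offset = n_states
    trans = [normalise(genes[offset + i * n_states: offset + (i + 1) * n_states])
             for i in range(n_states)]
    offset += n_states * n_states
    emit = [normalise(genes[offset + i * n_symbols: offset + (i + 1) * n_symbols])
            for i in range(n_states)]
    return start, trans, emit

# Two hidden states, four observable bases (A, C, G, T): 2 + 4 + 8 = 14 genes
start, trans, emit = decode([1, 1, 9, 1, 1, 9, 1, 4, 4, 1, 4, 1, 1, 4], 2, 4)
```

Normalising after decoding means crossover and mutation can act on unconstrained positive genes while every decoded chromosome still yields a valid parameter set.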
10. A CpG island identification device based on a genetic algorithm and a hidden Markov model, comprising
a processor for executing instructions, and a computer storage medium for storing a plurality of
instructions, characterized in that the instructions are adapted to be loaded by the processor to perform
the following processing:
1) obtaining a plurality of chromosomes comprising gene elements, each gene element being represented by
a real number, the plurality of encoded chromosomes forming one set of hidden Markov model parameters;
2) determining the fitness value of each chromosome using a fitness function, the fitness value
representing the quality of the chromosome;
3) using a genetic algorithm, performing an optimization process on the chromosomes according to the
fitness values, and then re-determining the fitness values of the chromosomes after optimization;
4) iterating step 3) until a set termination condition is met, and then outputting the optimal hidden
Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining
the maximum-probability hidden state sequence that generates the observation sequence, which indicates
the positions of the CpG islands.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710725585.7A CN107577918A (en) | 2017-08-22 | 2017-08-22 | CpG island identification method and device based on genetic algorithm and hidden Markov model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107577918A (en) | 2018-01-12 |
Family
ID=61035069
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201710725585.7A | Pending | CN107577918A (en) | 2017-08-22 | 2017-08-22 | CpG island identification method and device based on genetic algorithm and hidden Markov model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107577918A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106682503A (en) * | 2017-01-06 | 2017-05-17 | 浙江中都信息技术有限公司 | Application of genetic algorithm based hidden Markov model to mainframe risk assessment |
Non-Patent Citations (3)
Title |
---|
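张倩倩: "Research on intrusion detection methods based on hidden Markov models", 《万方数据》 * |
石欧燕 et al.: "Identification of CpG islands using a MATLAB-based hidden Markov model", 《计算机应用与软件》 * |
蒋红敬 et al.: "HMM-based recognition of CpG island positions", 《数学理论与应用》 * |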
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063417A (en) * | 2018-07-09 | 2018-12-21 | 福建国脉生物科技有限公司 | A kind of genotype complementing method constructing hidden Markov chain |
CN109063417B (en) * | 2018-07-09 | 2022-03-15 | 福建国脉生物科技有限公司 | Genotype filling method for constructing hidden Markov chain |
CN111276239A (en) * | 2019-11-29 | 2020-06-12 | 上海正雅齿科科技股份有限公司 | Method and device for determining tooth position of tooth model |
CN111276239B (en) * | 2019-11-29 | 2023-06-27 | 正雅齿科科技(上海)有限公司 | Method and device for determining tooth position of tooth model |
CN114300038A (en) * | 2021-12-27 | 2022-04-08 | 山东师范大学 | Multi-sequence comparison method and system based on improved biophysical optimization algorithm |
CN114300038B (en) * | 2021-12-27 | 2023-09-29 | 山东师范大学 | Multi-sequence comparison method and system based on improved biological geography optimization algorithm |
CN114550827A (en) * | 2022-01-14 | 2022-05-27 | 山东师范大学 | Gene sequence comparison method and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180112 |