CN107577918A - CpG island identification method and device based on a genetic algorithm and a hidden Markov model - Google Patents

CpG island identification method and device based on a genetic algorithm and a hidden Markov model


Publication number
CN107577918A
Authority
CN
China
Prior art keywords
chromosome
fitness value
markov model
sequence
hidden markov
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710725585.7A
Other languages
Chinese (zh)
Inventor
刘弘
何演林
郑元杰
赵丹丹
陆佃杰
吕晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN201710725585.7A
Publication of CN107577918A
Legal status: Pending


Landscapes

  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a CpG island identification method based on a genetic algorithm and a hidden Markov model, comprising the following steps: 1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the encoded chromosomes constitute a set of hidden Markov model parameters; 2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome; 3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes; 4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters; 5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.

Description

CpG island identification method and device based on a genetic algorithm and a hidden Markov model
Technical field
The invention belongs to the field of bioinformatics, and specifically relates to a CpG island identification method and device based on a genetic algorithm and a hidden Markov model.
Background
With the completion of genome sequencing, gene sequence identification faces many problems and challenges. In many genomes the rarest dinucleotide is CG, because the C in CG is easily methylated and a methylated C tends to mutate into T. However, methylation is usually suppressed around the genes in certain regions, and such a region is a CpG island: a stretch of DNA a few hundred bp long in which the CG dinucleotide occurs at high frequency. Finding a CpG island often means that the sequence may contain the promoter and first exon of a gene, so identifying CpG islands helps locate the regions of interest in a genome sequence. CpG islands are therefore of vital significance for gene sequence identification.
CpG island identification mainly faces two problems: 1. Given a short genome sequence, how do we decide whether it comes from a CpG island? 2. Given a long sequence, how do we find the CpG islands it contains, if any?
Current research focuses on the second problem. Researchers regard a region as a CpG island if it is longer than 200 bp, has a GC content of 50% or more, and has a ratio of observed to expected CpG content greater than 0.6. The traditional identification algorithm defines a sliding window and computes, for the sequence inside the window, the GC content and the ratio of observed to expected CpG content. The choice of window size strongly affects the recognition result, and the computational complexity is high. Moreover, the proposed criteria are all defined by hand, so the CpG islands identified this way have little biological significance. To find criteria with more biological meaning, researchers have proposed methods based on the hidden Markov model (HMM) to locate CpG islands. An HMM is a probabilistic model consisting of a sequence of hidden state changes and the sequence of observable symbols generated by those hidden states.
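The window criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the invention: the function names are hypothetical, and the observed/expected ratio is assumed to follow the commonly used definition (number of CG dinucleotides × window length) / (number of C × number of G), which the text itself does not spell out.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cg = window.count("CG")  # "CG" cannot overlap itself, so this count is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0  # assumed definition of expected CpG count
    ratio = cg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def is_cpg_island_window(window: str, min_len: int = 200,
                         min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three thresholds quoted in the text to a single window."""
    gc_fraction, ratio = cpg_stats(window)
    return len(window) > min_len and gc_fraction >= min_gc and ratio > min_ratio
```

Sliding this test along a long sequence shows the drawback named above: the outcome depends on the window size and on three hand-picked thresholds.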
A hidden Markov model is defined by an alphabet Σ, a state set Q, a state transition probability matrix A and an emission probability matrix B, where:
● Σ is the alphabet of symbols that can be emitted;
● Q is the set of hidden states, each of which emits symbols from the alphabet;
● A gives the probability that the HMM moves from its state at time t to a given state at time t+1;
● B gives the probability that the HMM emits symbol s while in its state at time t.
Once a system can be described as an HMM, three basic problems can be solved.
Decoding problem: given a model and a string, find the optimal path through the model. The path sets out from the start state, and each state on the path emits one character, which realizes the decoding operation.
Evaluation problem: for a given model, compute the probability of producing a string. The forward algorithm is usually used to calculate the probability of an observation sequence under a given HMM, and thereby to select the most suitable HMM.
Learning problem: build an HMM from observation sequences.
The first two are pattern recognition problems: given an HMM, compute the probability of an observation sequence (evaluation), or search for the hidden state sequence most likely to have generated an observation sequence (decoding). The third problem, generating an HMM from observation sequences (learning), is the hardest of the HMM-related problems: from an observation sequence (drawn from a known set) and the associated set of hidden states, estimate the most suitable HMM. The HMM has eight states in total: {A+, G+, C+, T+, A-, G-, C-, T-}. A+ means the state lies inside a CpG island, and A- means it lies outside. Each base corresponds to two states in the model, so for a given base sequence the state behind each base cannot be read off directly. Transitions are allowed between the states of the model. The hidden Markov model is used as follows:
First, a number of DNA sequences with confirmed CpG islands are collected, and the model parameters are trained on these real data; this is the learning problem of the hidden Markov model. The model parameters are obtained from the training data by building a hidden Markov model, and the trained model is then used to identify CpG islands.
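The eight-state model described above can be written down directly. The sketch below is illustrative only: it assumes, as in the usual textbook formulation, that each state emits exactly its own base with probability 1, and the helper names are hypothetical.

```python
BASES = "AGCT"

# Eight hidden states: each base inside (+) or outside (-) a CpG island.
STATES = [base + sign for sign in "+-" for base in BASES]


def emission_prob(state: str, base: str) -> float:
    """Assumed deterministic emission: state 'C+' or 'C-' can only emit 'C'."""
    return 1.0 if state[0] == base else 0.0


def candidate_states(base: str) -> list[str]:
    """The two hidden states that could lie behind an observed base."""
    return [s for s in STATES if s[0] == base]
```

Because every observed base is compatible with both a "+" and a "-" state, the ambiguity is resolved only by the transition probabilities, which is why the parameters have to be learned from sequences with known CpG islands.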
For an HMM and a corresponding observation sequence, we want to find the hidden state sequence most likely to have generated the sequence. We could list all possible hidden state sequences, compute the probability of the observation sequence under each of them, and pick the most probable one, but the computational complexity of this approach is very high.
A hidden Markov model is a probabilistic model of sequential data that relies on an initial state probability vector, a transition probability matrix and an observation probability matrix. Research has found that, although the hidden Markov model handles the overfitting problem reasonably well, many problems remain. It relies on a strong assumption: the next state is influenced only by the previous state. This assumption is an oversimplification, so the HMM can make effective and accurate identifications by maximum likelihood estimation only when the assumption agrees with the real data. In general, however, real data are not influenced by the previous state alone. As a result, the HMM easily falls into local optima, and its computational complexity is high. To improve the HMM's ability to identify CpG islands, its parameters need to be optimized.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a CpG island identification method based on a genetic algorithm and a hidden Markov model. The method considers solutions across the HMM parameter space in order to find a globally optimal solution, which optimizes the HMM parameters better and thus improves the ability to identify CpG islands.
The technical solution of the invention is as follows:
A CpG island identification method based on a genetic algorithm and a hidden Markov model, comprising the following steps:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
The most probable hidden state sequence that generates the observation sequence is determined with the Viterbi algorithm, as follows:
Each base state in the observation sequence has a local probability and a local best path. Using the product of the hidden state probability and the corresponding observation probability, the maximum local probability at the current time and its local best path are selected; backtracking along the local best path from the current time yields the positions of the CpG islands.
The specific formulas of the Viterbi algorithm are as follows.
For each state $i$, $i = 1, \ldots, n$, the Viterbi algorithm defines:
$$X_i = (X_{i1}, X_{i2}, \ldots, X_{iT})$$
The local probability at time $t = 1$ is computed as the product of the hidden state probability and the corresponding observation probability. For the other times, the Viterbi algorithm keeps a back pointer and a local probability $\delta$ for each state. With observation $k_t$, observation probability $b$ and state transition probability $a$, the probability of the best local path reaching state $i$ at time $t$ is $\delta_t(i)$:
$$\delta_t(i) = \max_j \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(k_t)$$
This formula determines the best path to the next state. To determine the most probable hidden state at time $t = T$, let
$$i_T = \arg\max_i \delta_T(i)$$
For the other times $t < T$, the back pointer gives
$$i_t = \arg\max_j \left[ \delta_t(j)\, a_{j\, i_{t+1}} \right]$$
Through recursion the Viterbi algorithm reduces the computational complexity and gives the best explanation of the observation sequence in its full context.
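The recursion and backtracking just described can be sketched as follows. This is a generic, minimal Viterbi implementation under stated assumptions rather than the exact procedure of the invention: the dictionary-based parameters `start_p`, `trans_p` and `emit_p` are hypothetical names, and plain probabilities are used, so it is only suitable for short sequences (a practical version would work with log probabilities).

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable hidden-state path, its probability) for `obs`."""
    # delta[t][s]: probability of the best partial path that ends in state s at time t
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at time t
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            # Best previous state for reaching s (the max over j in the recursion).
            prev = max(states, key=lambda r: delta[t - 1][r] * trans_p[r][s])
            delta[t][s] = delta[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # Termination: most probable final state, then follow the back pointers.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]
```

For n states and a sequence of length T this costs on the order of n²·T operations, compared with the nᵀ candidate paths of exhaustive enumeration.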
Further, the genetic algorithm comprises a selection operation, a crossover operation and a mutation operation, which are applied in turn to optimize the chromosomes.
The selection operation comprises: according to the fitness value of each chromosome, selecting the chromosomes whose fitness values meet the inheritance requirement to be passed on, and deleting the chromosomes that are not selected.
Let the population size be $N$ and let chromosome $x_i$ have fitness $f(x_i)$. The probability that $x_i$ is selected is:
$$P(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)}$$
$q_i$ is the cumulative probability of chromosome $x_i$ ($i = 1, 2, 3, \ldots, N$):
$$q_i = \sum_{j=1}^{i} P(x_j)$$
Based on this cumulative probability, chromosomes are chosen by roulette-wheel selection, which yields the samples that meet the inheritance requirement.
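Roulette-wheel selection over cumulative probabilities can be sketched as below. The function names are hypothetical, and the sketch assumes non-negative fitness values with a positive sum.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def selection_probabilities(fitness):
    """Fitness-proportional selection probability of each chromosome."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population, fitness, k, rng=random):
    """Draw k chromosomes (with replacement) by spinning the wheel k times."""
    cumulative = list(accumulate(selection_probabilities(fitness)))  # q_1 .. q_N
    cumulative[-1] = 1.0  # guard against floating-point round-off
    picks = []
    for _ in range(k):
        r = rng.random()  # uniform in [0, 1)
        # Chromosome i is chosen when q_(i-1) <= r < q_i.
        picks.append(population[bisect_right(cumulative, r)])
    return picks
```

Chromosomes with larger fitness own a wider slice of the interval [0, 1), so they are drawn more often, while weak chromosomes still keep a small chance of surviving.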
The crossover operation comprises: among the chromosomes whose fitness values meet the inheritance requirement, selecting some chromosomes with better fitness values as parents, and performing crossover between two adjacent parent chromosomes to produce child chromosomes.
Roulette-wheel selection is likewise used when choosing the parent chromosomes.
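The text does not say which crossover variant is used, so the following sketch assumes plain single-point crossover on real-valued chromosomes stored as Python lists; the parents are taken as already chosen, for example by roulette-wheel selection. The function name is hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length (at least 2)")
    cut = rng.randrange(1, len(parent_a))  # cut strictly inside the chromosome
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

If the genes encode probability rows of an HMM, a row that straddles the cut point may no longer sum to 1, so the children would typically need to be renormalised before their fitness is evaluated.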
The mutation operation comprises: in a child chromosome, first determining the mutation site, then changing the gene value at that site according to the set mutation rate.
Specifically, let $p$ be the gene value at the chosen mutation site, $r$ a random number in $[0, 1]$, $ct$ the current generation, $mt$ the total number of generations, $b = 2$, and $C$ the gene value after mutation. The gene at the mutation site becomes:
The mutation operation changes chromosome values with an extremely small probability, so the resulting HMM parameters are extremely close to the HMM parameters before mutation.
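The symbols listed for the mutation (p, r, ct, mt, b = 2, C) match those of the well-known non-uniform mutation operator, so the sketch below assumes that operator. This is an assumption, not a reconstruction of the formula of the invention; the gene bounds `lower` and `upper` and both function names are hypothetical.

```python
import random


def non_uniform_mutate(p, ct, mt, lower=0.0, upper=1.0, b=2.0, rng=random):
    """Assumed non-uniform mutation: the step size shrinks as ct approaches mt."""
    r = rng.random()
    shrink = 1.0 - r ** ((1.0 - ct / mt) ** b)  # in [0, 1], tends to 0 when ct -> mt
    if rng.random() < 0.5:
        return p + (upper - p) * shrink  # move towards the upper bound
    return p - (p - lower) * shrink      # move towards the lower bound


def mutate_chromosome(chrom, rate, ct, mt, rng=random):
    """With probability `rate`, mutate one randomly chosen gene of a copy of `chrom`."""
    child = list(chrom)
    if rng.random() < rate:
        site = rng.randrange(len(child))
        child[site] = non_uniform_mutate(child[site], ct, mt, rng=rng)
    return child
```

Under this operator early generations can take large steps, while late generations only make small adjustments, which is consistent with the statement that the mutated HMM parameters stay close to the originals.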
Further, the fitness function is:
Further, the iteration of step 3) uses the Baum-Welch algorithm, as follows:
The re-estimated HMM is defined by re-estimation formulas whose left-hand sides are the three parameters of the re-estimated model. In these formulas, $\gamma_t(j)$ is the probability of being in hidden state $S_j$ at time $t$, $\xi_t(i, j)$ is the probability of being in hidden state $S_i$ at time $t$ and in hidden state $S_j$ at time $t+1$, and $O$ is the observation sequence. After many iterations, a maximum likelihood estimate of the HMM is obtained. The output solution is the optimal hidden Markov model parameters.
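For reference, the textbook Baum-Welch re-estimation formulas expressed in terms of $\gamma_t$ and $\xi_t$ are given below. They are offered as the standard form, on the assumption that the invention uses it; the bar notation and the symbol $v_k$ for the $k$-th alphabet symbol are introduced here for illustration.

```latex
\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t=1,\; O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```

Each update replaces a parameter by an expected count ratio computed under the current model, so the likelihood of the observation sequence $O$ does not decrease from one iteration to the next.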
The invention also provides a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the following processing:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the encoded chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
The invention further provides a CpG island identification device based on a genetic algorithm and a hidden Markov model, comprising a processor for executing instructions, and a computer storage medium for storing a plurality of instructions, the instructions being adapted to be loaded by the processor to perform the following processing:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the encoded chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
Beneficial effects of the invention:
First, the invention optimizes the parameters of the HMM to improve its ability to identify CpG islands. Second, the HMM parameters are estimated with a genetic algorithm. Finally, the method obtains a maximum likelihood estimate of the HMM, and the resulting model is very useful for locating CpG islands (CGIs). Experiments show that the CpG island identification method combining a genetic algorithm with a hidden Markov model improves accuracy and recall.
Brief description of the drawings
Fig. 1: flow chart of the CpG island identification method combining a genetic algorithm and a hidden Markov model;
Fig. 2: crossover operation in the genetic algorithm;
Fig. 3: mutation operation in the genetic algorithm;
Fig. 4: control parameters of the genetic algorithm;
Fig. 5: fitness value as the number of iterations increases;
Fig. 6: results of the HMM combined with the genetic algorithm;
Fig. 7: comparison of the CpG island identification ability of the genetic-algorithm-optimized HMM and the plain HMM.
Detailed description:
The invention is further described below with reference to the drawings and embodiments:
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the application belongs.
It should also be noted that the terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, the singular forms are intended to include the plural forms unless the context clearly indicates otherwise. Furthermore, the terms "comprise" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
As noted in the background, the hidden Markov model easily falls into local optima and has high computational complexity. To improve the HMM's ability to identify CpG islands, the invention proposes a CpG island identification method based on a genetic algorithm and a hidden Markov model, comprising the following steps, as shown in Fig. 1:
1) model parameter initialization: obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the chromosomes constitute a set of hidden Markov model parameters;
A chromosome is generally represented as a string of elements, each of which is called a gene. Because the HMM parameters A, B and Pi are three real-valued matrices of high dimension, binary coding is impractical; to reflect changes in the model parameters directly, a string of real-valued genes is used.
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) applying the output optimal hidden Markov model parameters to the test set and decoding on the basis of a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
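The real-valued encoding described for step 1) can be sketched as follows. The sketch assumes that one chromosome holds one candidate parameter set (A, B, Pi) flattened row by row, and that rows are renormalised on decoding so that they remain probability distributions; both choices, and the function names, are illustrative assumptions.

```python
def encode(A, B, pi):
    """Flatten A (n x n), B (n x m) and pi (length n) into one list of real genes."""
    return [x for row in A for x in row] + [x for row in B for x in row] + list(pi)


def decode(chrom, n, m):
    """Rebuild row-normalised A, B and pi from a chromosome of length n*n + n*m + n."""
    def normalise(row):
        total = sum(row)  # assumed positive for every row
        return [x / total for x in row]

    genes = list(chrom)
    A = [normalise(genes[i * n:(i + 1) * n]) for i in range(n)]
    offset = n * n
    B = [normalise(genes[offset + i * m:offset + (i + 1) * m]) for i in range(n)]
    pi = normalise(genes[offset + n * m:])
    return A, B, pi
```

With this layout, crossover and mutation operate directly on the list of genes, and `decode` restores valid probability rows before the fitness of a chromosome is evaluated.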
The genetic algorithm in the invention comprises three genetic operators: selection, crossover and mutation.
The selection operation picks outstanding individuals from the chromosomes; the selection mechanism simulates survival of the fittest in natural selection. Chromosomes with high fitness values have a stronger ability to survive than those with low fitness values, and chromosomes that are not selected are deleted during subsequent evolution. Roulette-wheel selection decides whether a chromosome is passed on to the next generation: each chromosome is assigned a sector of the wheel whose area corresponds to its fitness value. Let the population size be $N$ and let chromosome $x_i$ have fitness $f(x_i)$. The probability that $x_i$ is selected is:
$$P(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)}$$
$q_i$ is the cumulative probability of chromosome $x_i$ ($i = 1, 2, 3, \ldots, N$):
$$q_i = \sum_{j=1}^{i} P(x_j)$$
The crossover operation recombines parent chromosomes. The selected parents are the chromosomes with the largest fitness values, as shown in Fig. 2, so the operation crosses the best parents in order to produce better offspring. Parents are chosen by the roulette-wheel mechanism.
Chromosome mutation adds variation to the model parameters. By changing the values of genes in a chromosome, it gives the genetic algorithm the ability to search globally. It lets the genetic algorithm recover information lost during initialization and during the evolution of the model parameters, so that the algorithm can find the optimal model parameters. Before a model parameter is changed, the mutation rate is tested against a randomly generated probability: if the mutation rate is greater than or equal to the random probability, the parameter is modified using single-point random mutation, as shown in Fig. 3. Here $p$ is the original value of the gene chosen for mutation, $r$ is a random number in $[0, 1]$, $ct$ is the current generation, $mt$ is the total number of generations, $b = 2$, and $C$ is the value after mutation. The mutated gene becomes:
Note that before the mutation operation the mutation site is determined first, and the original gene at that site is then changed with a certain probability. The mutation operation changes chromosome values with an extremely small probability, so the resulting HMM parameters should be extremely close to the HMM parameters before mutation.
In this embodiment, the fitness value reflects the performance of the genetic algorithm and is used to assess how well a chromosome is adapted. Each individual has a fitness score, which determines whether it is selected.
Generally, if the objective function is a maximization problem, the fitness function is defined as follows:
If the objective function is a minimization problem, the fitness function is defined as follows:
where $c_{\min}$ and $c_{\max}$ are the associated coefficients.
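A common textbook way of turning an objective function into a non-negative fitness, which uses coefficients named $c_{\min}$ and $c_{\max}$ in exactly this role, is shown below for the maximization and minimization cases respectively. It is given as an assumed illustration rather than as the definition used by the invention; $g(x)$ is introduced here as a name for the objective function.

```latex
\mathrm{Fit}(x) =
\begin{cases}
g(x) + c_{\min}, & g(x) + c_{\min} > 0 \\
0, & \text{otherwise}
\end{cases}
\qquad
\mathrm{Fit}(x) =
\begin{cases}
c_{\max} - g(x), & g(x) < c_{\max} \\
0, & \text{otherwise}
\end{cases}
```

Both forms keep the fitness non-negative, which roulette-wheel selection requires because fitness values are used as slice widths.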
To reduce the complexity of the fitness function, this method uses the following fitness function:
By adjusting the number of CpG islands in the training data set, the complexity of the fitness function can be adjusted. The method therefore significantly reduces the complexity of the fitness function.
To find the optimal hidden Markov model parameters, we iterate with the Baum-Welch algorithm, implemented as follows:
The re-estimated HMM is defined by re-estimation formulas whose left-hand sides are the three parameters of the re-estimated model. In these formulas, $\gamma_t(j)$ is the probability of being in hidden state $S_j$ at time $t$, $\xi_t(i, j)$ is the probability of being in hidden state $S_i$ at time $t$ and in hidden state $S_j$ at time $t+1$, and $O$ is the observation sequence. After many iterations, a maximum likelihood estimate of the HMM is obtained. The output solution is the optimal hidden Markov model parameters.
After the optimal model parameters are obtained, we verify them by applying them to the test set: for the HMM and a corresponding observation sequence, we want the hidden state sequence most likely to have generated the sequence, that is, the decoding operation. Each base state in the DNA sequence has a local probability and a local best path, and the global best path is determined by selecting the state with the maximum local probability at the current time together with its local best path. Using the product of the state transition probability and the corresponding observation probability, the maximum probability is selected, and the path that achieves it is the most likely hidden state sequence. This method uses the Viterbi algorithm for decoding, which can be defined as:
$$X_i = (X_{i1}, X_{i2}, \ldots, X_{iT})$$
The local probability at time $t = 1$ is computed as the product of the hidden state probability and the corresponding observation probability. For the other times:
$$\delta_t(i) = \max_j \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(k_t)$$
This formula determines the most probable path to the next state. To determine the most probable hidden state at time $t = T$, let
$$i_T = \arg\max_i \delta_T(i)$$
For the other times $t < T$, the back pointer gives
$$i_t = \arg\max_j \left[ \delta_t(j)\, a_{j\, i_{t+1}} \right]$$
Backtracking along the most probable path, each run of "+" states on the completed path corresponds to a CpG island. The final CpG island identification results are shown in Fig. 6 and Fig. 7.
Through recursion the Viterbi algorithm reduces the computational complexity and gives the best explanation of the observation sequence in its full context. Each run of "+" states on the path corresponds to a CpG island. Suppose the test sequence is T = ATTAGCGAT and the best path found by the Viterbi algorithm is the state sequence {A-, T-, T+, A+, G+, C+, G+, A-, T-}; then TAGCG is judged to be a CpG island. HMMs are thus of great value in the field of CpG island identification.
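Reading CpG islands off a decoded path amounts to collecting the maximal runs of "+" states. A minimal sketch, with a hypothetical function name and assuming the sequence and the path have the same length:

```python
def islands_from_path(sequence, path):
    """Return (start, end, subsequence) for each maximal run of '+' states.

    Positions are 0-based and `end` is exclusive.
    """
    islands = []
    start = None
    for i, state in enumerate(path):
        inside = state.endswith("+")
        if inside and start is None:
            start = i                      # a new island begins here
        elif not inside and start is not None:
            islands.append((start, i, sequence[start:i]))
            start = None                   # the island ended just before i
    if start is not None:                  # island runs to the end of the path
        islands.append((start, len(path), sequence[start:]))
    return islands


example = islands_from_path(
    "ATTAGCGAT",
    ["A-", "T-", "T+", "A+", "G+", "C+", "G+", "A-", "T-"],
)
# example == [(2, 7, "TAGCG")]
```

Applied to the example above, the single run of "+" states covers positions 2 to 6 and yields the island TAGCG.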
The invention also provides a computer storage medium storing a plurality of instructions, the instructions being loaded by a processor to perform the following processing:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the encoded chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
Further, the invention provides a CpG island identification device based on a genetic algorithm and a hidden Markov model, comprising a processor for executing instructions, and a computer storage medium for storing a plurality of instructions, the instructions being loaded by the processor to perform the following processing:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the encoded chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
First, the invention optimizes the parameters of the HMM to improve its ability to identify CpG islands. Second, the HMM parameters are estimated with a genetic algorithm. Finally, the method obtains a maximum likelihood estimate of the HMM, and the resulting model is very useful for locating CpG islands (CGIs). Experiments show that the CpG island identification method combining a genetic algorithm with a hidden Markov model improves accuracy and recall.
The above are only preferred embodiments of the application and are not intended to limit it. For those skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principle of the application shall fall within its scope of protection.

Claims (10)

1. A CpG island identification method based on a genetic algorithm and a hidden Markov model, characterized by comprising the following steps:
1) obtaining multiple chromosomes containing gene elements, where each gene element is represented by a real number and the chromosomes constitute a set of hidden Markov model parameters;
2) determining the fitness value of each chromosome with a fitness function, the fitness value indicating the quality of the chromosome;
3) using the genetic algorithm to run an optimization pass over the chromosomes according to the fitness values, then re-determining the fitness values of the optimized chromosomes;
4) iterating step 3) until the set termination condition is met, then outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, which indicates the positions of the CpG islands.
2. The method according to claim 1, characterized in that the most probable hidden state sequence that generates the observation sequence is determined with the Viterbi algorithm.
3. The method according to claim 2, characterized in that determining the most probable hidden state sequence that generates the observation sequence with the Viterbi algorithm comprises:
according to the local probability and the local best path corresponding to each base state in the observation sequence, selecting, by means of the product of the hidden state probability and the corresponding observation probability, the maximum local probability at the current time and its local best path, and backtracking along the local best path from the current time to obtain the positions of the CpG islands.
4. The method according to claim 1, characterized in that the genetic algorithm comprises a selection operation, a crossover operation and a mutation operation, which are applied in turn to optimize the chromosomes.
5. The method according to claim 4, characterized in that the selection operation comprises: according to the fitness value of each chromosome, selecting the chromosomes whose fitness values meet the inheritance requirement to be passed on, and deleting the chromosomes that are not selected.
6. The method according to claim 5, characterized in that the crossover operation comprises: among the chromosomes whose fitness values meet the inheritance requirement, selecting some chromosomes with better fitness values as parents, and performing crossover between two adjacent parent chromosomes to produce child chromosomes.
7. The method according to claim 6, characterized in that the mutation operation comprises: in a child chromosome, first determining the mutation site, then changing the gene value at that site according to the set mutation rate.
8. The method according to claim 1, characterized in that the fitness function is:
By adjusting the number of CpG islands in the training data set, the complexity of the fitness function can be adjusted.
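The selection, crossover and mutation operations of claims 4 to 7 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names (`select`, `crossover`, `mutate`, `evolve`), the keep fraction, the single-point crossover and the `toy_fitness` function are all assumptions made for the example. In particular, the fitness function of claim 8 is not reproduced in this text, so a placeholder that rewards closeness to a fixed target vector stands in for it, and the step that would normalize genes into valid hidden Markov model probabilities is omitted.

```python
import random


def select(population, fitness, keep_fraction=0.5):
    """Selection: keep the fitter chromosomes, delete the rest (assumed rule)."""
    ranked = sorted(population, key=fitness, reverse=True)
    keep = max(2, int(len(ranked) * keep_fraction))
    return ranked[:keep]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two parents (an assumed crossover scheme)."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(child, mutation_rate, rng):
    """Pick a mutation site first, then change its gene with the set probability."""
    child = list(child)
    site = rng.randrange(len(child))
    if rng.random() < mutation_rate:
        child[site] = rng.random()  # new real-valued gene in [0, 1)
    return child


def evolve(population, fitness, generations=50, mutation_rate=0.05, seed=0):
    """Repeat selection, crossover and mutation for a fixed number of generations."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        survivors = select(population, fitness)
        children = []
        i = 0
        # Cross adjacent survivors until the population is back to full size.
        while len(survivors) + len(children) < size:
            a = survivors[i % len(survivors)]
            b = survivors[(i + 1) % len(survivors)]
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
            i += 1
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical stand-in for the fitness function, used only to run the sketch.
TARGET = [0.2, 0.8, 0.5, 0.1]


def toy_fitness(chromosome):
    return -sum((g - t) ** 2 for g, t in zip(chromosome, TARGET))


init_rng = random.Random(1)
initial = [[init_rng.random() for _ in range(4)] for _ in range(10)]
best = evolve(initial, toy_fitness)
```

Because the survivors are carried over unchanged into the next generation, the best fitness in the population never decreases from one generation to the next. A fixed generation count is used here as the termination condition; the claims leave the condition open.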
9. A computer storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor to perform the following processing:
1) obtaining a plurality of chromosomes comprising gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value representing the quality of the chromosome;
3) applying a genetic algorithm to optimize the chromosomes according to the fitness values, and then re-determining the fitness value of each chromosome after optimization;
4) repeating step 3) and, once a set termination condition is met, outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, the hidden state sequence indicating the positions of the CpG islands.
10. A CpG island recognition device based on a genetic algorithm and a hidden Markov model, comprising a processor for executing instructions, and a computer storage medium for storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by the processor to perform the following processing:
1) obtaining a plurality of chromosomes comprising gene elements, each gene element being represented by a real number, the plurality of encoded chromosomes forming a set of hidden Markov model parameters;
2) determining a fitness value for each chromosome using a fitness function, the fitness value representing the quality of the chromosome;
3) applying a genetic algorithm to optimize the chromosomes according to the fitness values, and then re-determining the fitness value of each chromosome after optimization;
4) repeating step 3) and, once a set termination condition is met, outputting the optimal hidden Markov model parameters;
5) using the output optimal hidden Markov model parameters and a given observation sequence, determining the most probable hidden state sequence that generates the observation sequence, the hidden state sequence indicating the positions of the CpG islands.
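The Viterbi decoding described in step 5) and in claims 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two-state model (`"I"` for island, `"B"` for background), the function name `viterbi` and all probability values are hypothetical and chosen only to make the example run. In the claimed method, the parameters would instead be the optimal hidden Markov model parameters output by the genetic algorithm. The sketch works in log space, so every probability must be strictly positive, and the observation sequence must be non-empty.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for a non-empty sequence."""
    # local[t][s]: log-probability of the best path that ends in state s at time t.
    local = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    # back[t][s]: predecessor of s on that best path (the local best path).
    back = [{}]
    for t in range(1, len(observations)):
        local.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, local[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            local[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: local[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical parameters: islands favour C and G, background favours A and T.
STATES = ["I", "B"]
START = {"I": 0.5, "B": 0.5}
TRANS = {"I": {"I": 0.9, "B": 0.1}, "B": {"I": 0.1, "B": 0.9}}
EMIT = {
    "I": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
    "B": {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45},
}

labels = "".join(viterbi("AAAACGCGCGAAAA", STATES, START, TRANS, EMIT))
```

Runs of `"I"` in `labels` mark the positions that this toy model assigns to a CpG island. Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long sequences.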
CN201710725585.7A 2017-08-22 2017-08-22 The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model Pending CN107577918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710725585.7A CN107577918A (en) 2017-08-22 2017-08-22 The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model

Publications (1)

Publication Number Publication Date
CN107577918A (en) 2018-01-12

Family

ID=61035069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710725585.7A Pending CN107577918A (en) 2017-08-22 2017-08-22 The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model

Country Status (1)

Country Link
CN (1) CN107577918A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682503A (en) * 2017-01-06 2017-05-17 浙江中都信息技术有限公司 Application of genetic algorithm based hidden Markov model to mainframe risk assessment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
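Zhang Qianqian (张倩倩): "Research on intrusion detection methods based on hidden Markov models", Wanfang Data (《万方数据》) *
Shi Ouyan (石欧燕) et al.: "Identifying CpG islands with a MATLAB-based hidden Markov model", Computer Applications and Software (《计算机应用与软件》) *
Jiang Hongjing (蒋红敬) et al.: "HMM-based recognition of CpG island positions", Mathematical Theory and Applications (《数学理论与应用》) *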

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063417A (en) * 2018-07-09 2018-12-21 福建国脉生物科技有限公司 A kind of genotype complementing method constructing hidden Markov chain
CN109063417B (en) * 2018-07-09 2022-03-15 福建国脉生物科技有限公司 Genotype filling method for constructing hidden Markov chain
CN111276239A (en) * 2019-11-29 2020-06-12 上海正雅齿科科技股份有限公司 Method and device for determining tooth position of tooth model
CN111276239B (en) * 2019-11-29 2023-06-27 正雅齿科科技(上海)有限公司 Method and device for determining tooth position of tooth model
CN114300038A (en) * 2021-12-27 2022-04-08 山东师范大学 Multi-sequence comparison method and system based on improved biophysical optimization algorithm
CN114300038B (en) * 2021-12-27 2023-09-29 山东师范大学 Multi-sequence comparison method and system based on improved biological geography optimization algorithm
CN114550827A (en) * 2022-01-14 2022-05-27 山东师范大学 Gene sequence comparison method and system

Similar Documents

Publication Publication Date Title
CN107577918A (en) The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model
EP2430441B1 (en) Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence
CN105974799A (en) Fuzzy control system optimization method based on differential evolution-local unimodal sampling algorithm
CN111898689A (en) Image classification method based on neural network architecture search
CN108470237B (en) Multi-preference high-dimensional target optimization method based on co-evolution
CN106991295B (en) Protein network module mining method based on multi-objective optimization
CN106934722A (en) Multi-objective community detection method based on k node updates Yu similarity matrix
CN108038538A (en) Multi-objective Evolutionary Algorithm based on intensified learning
CN112084877A (en) NSGA-NET-based remote sensing image identification method
CN102521654B (en) Supercritical water oxidation reaction kinetic model parameter estimation method employing RNA (Ribonucleic Acid) genetic algorithm
CN114266509A (en) Flexible job shop scheduling method for solving by random greedy initial population genetic algorithm
CN113407185A (en) Compiler optimization option recommendation method based on Bayesian optimization
CN108960486A (en) Interactive set evolvement method based on grey support vector regression prediction adaptive value
CN117611974B (en) Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
Roeva et al. Description of simple genetic algorithm modifications using generalized nets
CN111126560A (en) Method for optimizing BP neural network based on cloud genetic algorithm
Oluoch et al. A review on RNA secondary structure prediction algorithms
CN105740952A (en) Multi-objective rapid genetic method for community network detection
CN116306919A (en) Large-scale multi-objective combination optimization method based on problem recombination and application
CN111859807A (en) Initial pressure optimizing method, device, equipment and storage medium for steam turbine
CN110705704A (en) Neural network self-organizing genetic evolution algorithm based on correlation analysis
CN102799940A (en) Online community partitioning method based on genetic algorithm and priori knowledge
Saraçoglu et al. Developing an adaptation process for real-coded genetic algorithms
CN113077849B (en) Escherichia coli beta-lactam acquired drug resistance phenotype prediction composite method
CN113141272A (en) Network security situation analysis method based on iteration optimization RBF neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180112
