CN114550827B - Gene sequence comparison method and system - Google Patents


Info

Publication number
CN114550827B
Authority
CN
China
Prior art keywords
honey
honey source
solution
gene sequence
source
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210044384.1A
Other languages
Chinese (zh)
Other versions
CN114550827A (en)
Inventor
张庆科
李天奇
汪玉成
高昊
卜降龙
来明旭
张化祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202210044384.1A
Publication of CN114550827A
Application granted
Publication of CN114550827B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a gene sequence alignment method and system. The method encodes the parameters of a hidden Markov model as honey sources. The hidden Markov model corresponding to each honey source is then used to obtain a hidden state sequence for every gene sequence. Next, the method judges whether a termination condition is met. If it is, the method takes the hidden Markov model of the honey source with the maximum fitness value and compares, pair by pair, the hidden state sequences it produced for all gene sequences, which identifies the gene sequence most similar to each gene sequence. If it is not, the method divides all honey sources into several populations according to their fitness values and performs differential learning among the populations to optimize the hidden Markov model parameters, repeating until the termination condition is met. This increases the randomness of the parameter search, prevents the search from becoming trapped in local optima, and improves solution accuracy in multiple sequence alignment.

Description

Gene sequence comparison method and system
Technical Field
The invention belongs to the technical field of sequence comparison, and particularly relates to a gene sequence comparison method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years biological science and technology have developed rapidly, and analyzing the information implicit in biological databases has become a serious challenge. Sequence alignment reveals the information carried by biological sequences and is widely used to identify related DNA and protein sequences. Sequence alignment has been studied for decades and many methods have been proposed. Alignment algorithms based on dynamic programming, for example, consume a great deal of time and memory and cannot handle practical problem sizes. Progressive alignment algorithms, on the other hand, tend to fall into local optima, and errors made early in the process cannot be corrected later.
To overcome the drawbacks of these two classes of algorithms, iterative multiple sequence alignment algorithms have emerged. Iterative alignment algorithms are mainly swarm intelligence algorithms modeled on the collective behavior of organisms, such as particle swarm optimization, genetic algorithms and the artificial bee colony algorithm. The Artificial Bee Colony (ABC) algorithm is a swarm intelligence algorithm based on the foraging behavior of honey bees. It has few control parameters and is easy to implement. In recent years it has attracted growing attention and many improvements, and it has been applied successfully to optimization problems in many fields. However, further research on ABC has found a weakness in its follower (onlooker) bee stage: the probability selection mechanism used there loses effectiveness in the late iterations. As a result, the algorithm converges slowly in that stage and its solution accuracy is low.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a gene sequence alignment method and system. They increase the randomness of the hidden Markov model parameter search, prevent the solution from becoming trapped in local optima, and improve solution accuracy in multiple sequence alignment.
To achieve this purpose, the invention adopts the following technical solution:
In a first aspect, the present invention provides a gene sequence alignment method, comprising:
obtaining a plurality of gene sequences;
encoding the parameters of a hidden Markov model as honey sources and, for each gene sequence, using the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences of that gene sequence; judging whether a termination condition is met; if so, comparing, pair by pair, the hidden state sequences of all gene sequences obtained by the hidden Markov model of the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing differential learning among the populations to optimize the parameters of the hidden Markov model until the termination condition is met.
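The text does not say how a honey source vector becomes a hidden Markov model, or how the hidden state sequences are obtained. A common choice is Viterbi decoding, and the sketch below assumes it. It also assumes a hypothetical parameter layout: initial probabilities, then transition rows, then emission rows, each normalized to sum to 1. The names `decode_honey_source` and `viterbi` are illustrative and do not come from the patent. The honey source entries are assumed to be positive (for example, $LB_j > 0$).

```python
import math


def decode_honey_source(x, n_states, n_symbols):
    """Hypothetical layout: x = [initial (n) | transition (n*n) | emission (n*m)]."""
    expected = n_states + n_states * n_states + n_states * n_symbols
    if len(x) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(x)}")

    def normalize(row):
        # Rescale a block of positive entries into a probability distribution.
        total = sum(row)
        return [v / total for v in row]

    pi = normalize(x[:n_states])
    off = n_states
    trans = [normalize(x[off + k * n_states: off + (k + 1) * n_states])
             for k in range(n_states)]
    off += n_states * n_states
    emit = [normalize(x[off + k * n_symbols: off + (k + 1) * n_symbols])
            for k in range(n_states)]
    return pi, trans, emit


def viterbi(obs, pi, trans, emit):
    """Most likely hidden-state path for a list of observation indices (log space)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    score = [log(pi[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][cur] = best previous state for state `cur` at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for cur in range(n):
            prev = max(range(n), key=lambda p: score[p] + log(trans[p][cur]))
            ptr.append(prev)
            new.append(score[prev] + log(trans[prev][cur]) + log(emit[cur][o]))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

As an example, take two states, map the nucleotides A, C, G, T to indices 0 to 3, and use the vector `[1, 1, 9, 1, 1, 9, 9, 9, 1, 1, 1, 1, 9, 9]`. The observation `[0, 0, 1, 1, 2, 2, 3, 3]` (AACCGGTT) then decodes to `[0, 0, 0, 0, 1, 1, 1, 1]`.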
Further, performing differential learning among different populations includes:
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations;
calculating several difference values from the global optimal solution and the randomly selected solutions;
computing a weighted sum of the difference values and adding it to the honey source to obtain a leading bee;
and if the fitness value of the leading bee is larger than that of the honey source, replacing the honey source with the leading bee.
Further, performing differential learning among different populations includes:
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations;
calculating several difference values from the global optimal solution and the randomly selected solutions;
computing a weighted sum of the difference values and the honey source to obtain a follower bee;
and if the fitness value of the follower bee is larger than that of the honey source, replacing the honey source with the follower bee.
Further, performing differential learning among different populations includes:
for a given honey source, if its number of failed iterations reaches a set limit, generating several new solutions in several different ways, and selecting the best of them by a greedy selection strategy to replace the honey source.
Further, one way of generating a new solution is to generate it at random.
Further, another way of generating a new solution is as follows:
selecting the population with the largest fitness values among the populations as the base population, and taking the remaining populations as auxiliary populations;
randomly selecting a honey source from the base population as the base honey source;
randomly selecting a honey source from each auxiliary population as an auxiliary honey source;
and calculating the difference between the auxiliary honey sources, multiplying it by a random number, and adding the product to the base honey source to obtain a new solution.
Further, a third way of generating a new solution is as follows:
selecting the minimum value, the maximum value and the global optimal solution from all honey sources;
and subtracting the global optimal solution from the sum of the minimum and maximum values to obtain a new solution.
In a second aspect of the present invention, there is provided a gene sequence alignment system comprising:
a gene sequence acquisition module configured to obtain a plurality of gene sequences;
a gene sequence alignment module configured to encode the parameters of a hidden Markov model as honey sources and, for each gene sequence, use the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences of that gene sequence; to judge whether a termination condition is met; if so, to compare, pair by pair, the hidden state sequences of all gene sequences obtained by the hidden Markov model of the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence; otherwise, to divide all the honey sources into a plurality of populations based on the fitness value of each honey source, and perform differential learning among the populations to optimize the parameters of the hidden Markov model until the termination condition is met.
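The final step compares every pair of hidden state sequences and reports the most similar gene sequence for each one. It could look like the sketch below. The patent does not name the pairwise similarity measure used here, so `difflib.SequenceMatcher.ratio()` from the Python standard library serves only as a stand-in. `most_similar` is a hypothetical name.

```python
from difflib import SequenceMatcher


def most_similar(state_seqs):
    """Map each sequence index to the index of its most similar other sequence.

    Similarity is SequenceMatcher.ratio() (a stand-in measure, not from the patent);
    ties are broken in favour of the larger index.
    """
    best = {}
    for i, a in enumerate(state_seqs):
        scores = [(SequenceMatcher(None, a, b).ratio(), j)
                  for j, b in enumerate(state_seqs) if j != i]
        best[i] = max(scores)[1]
    return best
```

With `[[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]` it returns `{0: 1, 1: 0, 2: 0}`.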
A third aspect of the present invention provides a computer readable storage medium, on which a computer program is stored, which program, when executed by a processor, implements the steps in a method for gene sequence alignment as described above.
A fourth aspect of the present invention provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the gene sequence alignment method.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a gene sequence alignment method that optimizes the parameters of a hidden Markov model with a hierarchical-learning artificial bee colony algorithm. This helps the algorithm avoid becoming trapped in local optima, accelerates convergence and improves solution accuracy in multiple sequence alignment.
The method builds a new hierarchical ring topology on top of the hierarchical-learning artificial bee colony algorithm, so that populations at different levels can learn from one another's differences. This improves the search strategies of the two bee stages (leading bee and follower bee) and strengthens the algorithm's global optimization and search capability. It also removes the tendency of ABC to converge prematurely and stagnate in late iterations because of its probability selection mechanism, so convergence is faster and solution accuracy is higher. In the scout bee stage, solutions in three different directions are generated, which increases search randomness and keeps the solution from becoming trapped in local optima.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2 (a) is a convergence graph for the 1ad2_ref1 gene sequences according to the first embodiment of the present invention;
FIG. 2 (b) is a convergence graph for the 1ivy_ref5 gene sequences according to the first embodiment of the present invention;
FIG. 2 (c) is a convergence graph for the 451c_ref1 gene sequences according to the first embodiment of the present invention;
FIG. 2 (d) is a convergence graph for the kinase_ref1 gene sequences according to the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
This example provides a gene sequence alignment method, as shown in fig. 1, which specifically includes the following steps:
step 1, obtaining a plurality of gene sequences;
step 2, initialization: this covers initializing the parameters and generating the initial honey sources. The initialized parameters comprise the population size SN, the number of honey sources SN, the individual dimension D, the threshold limit, the maximum number of iterations MCN, the maximum number of evaluations MFE, and the maximum value $UB_j$ and minimum value $LB_j$ of each dimension. The SN initial honey sources are generated at random by formula (1):
$$x_{i,j} = LB_j + \mathrm{rand}(0,1)\cdot(UB_j - LB_j) \qquad (1)$$
where $x_{i,j}$ is the j-th dimension of the i-th honey source (individual), $i = 1, 2, \ldots, SN$, $j = 1, 2, \ldots, D$; $[LB_j, UB_j]$ is the value range of the j-th variable; and $\mathrm{rand}(0,1)$ denotes a random number between 0 and 1. Each honey source $X_i$ represents a set of hidden Markov model parameters, and D is the number of hidden Markov model parameters.
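Formula (1) maps directly onto a nested list comprehension. The sketch below is a minimal illustration. `init_honey_sources` is a hypothetical name, and the bounds are plain lists of length D.

```python
import random


def init_honey_sources(sn, lb, ub, rng=random):
    """Generate SN honey sources with formula (1): x_ij = LB_j + rand(0,1) * (UB_j - LB_j)."""
    if len(lb) != len(ub):
        raise ValueError("lb and ub must have the same dimension D")
    # rng.random() draws from [0, 1), so every x_ij lies within [LB_j, UB_j].
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(len(lb))]
            for _ in range(sn)]
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which is convenient when repeating an experiment several times.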
Step 3, encode the parameters of the hidden Markov model as honey sources and, for each gene sequence, use the hidden Markov models corresponding to all honey sources to obtain the hidden state sequences of that gene sequence. Then judge whether the termination condition is met; the termination condition is that the number of iterations reaches the maximum iteration number MCN. If it is met, compare, pair by pair, the hidden state sequences of all gene sequences obtained by the hidden Markov model of the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence (i.e. the gene sequence whose hidden state sequence has the greatest similarity). Otherwise, divide all honey sources into several populations based on their fitness values and perform differential learning among the populations to optimize the hidden Markov model parameters until the termination condition is met, i.e. execute steps 301 to 304:
Step 301: obtain the hidden state sequences of all gene sequences based on each honey source (a set of hidden Markov model parameters), and calculate the fitness value $\mathrm{fit}(X_i)$ of each individual by formula (2). $\mathrm{fit}(X_i)$ is the sum-of-pairs score (SPS) of the hidden state sequences produced by the hidden Markov model defined by honey source $X_i$:
$$\mathrm{SPS} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} D(l_i, l_j) \qquad (2)$$
where N is the number of sequences, $l_i$ is the i-th aligned hidden state sequence, $l_j$ is the j-th hidden state sequence it is compared with, and D is a function measuring the similarity of two sequences. In practice D is computed with a similarity score matrix, so $D(l_i, l_j)$ is usually the sum of the substitution scores of the residues of $l_i$ and $l_j$ at corresponding positions. The higher the SPS, the more accurate the alignment of the gene sequences.
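Below is a minimal sum-of-pairs sketch over rows that are already aligned. It assumes a hypothetical match/mismatch/gap scheme (+1, -1, -2) in place of the unspecified similarity score matrix. It also skips gap-gap columns, which is a common convention but is not stated in the patent.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """D(l_i, l_j): column-wise score of two aligned rows (hypothetical scoring scheme)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap-gap columns contribute nothing
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def sps(rows, **scores):
    """Sum-of-pairs score, formula (2): sum of D over all unordered pairs of rows."""
    return sum(pair_score(a, b, **scores) for a, b in combinations(rows, 2))
```

For the rows `"AC-T"`, `"ACGT"` and `"A-GT"`, the pair scores are 1, -2 and 1, so the SPS is 0.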
Following the constructed hierarchical ring topology, hierarchical learning divides the population by fitness into a first population S1, a second population S2 and a third population S3. Their sizes sum to SN in the ratio 1 : 7 : 2. The fitness value of every individual in S1 is greater than that of every individual in S2, and the fitness value of every individual in S2 is greater than that of every individual in S3.
The idea of hierarchical learning is as follows. Within the whole population, the outer-layer populations learn from the differences of the inner-layer populations, so that the whole population keeps moving toward better solutions. Meanwhile, the innermost population learns from the global optimum to find still better solutions.
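Sorting the honey sources by fitness and cutting the ranking in the ratio 1 : 7 : 2 gives the three levels. The patent does not say how sizes are rounded when SN is not a multiple of 10. This sketch rounds S1 and S3 and gives the remainder to S2, which is an assumption. `split_populations` is a hypothetical name, and it returns indices rather than copies.

```python
def split_populations(fitness, ratio=(1, 7, 2)):
    """Return index lists (S1, S2, S3), ordered by descending fitness, sized by `ratio`."""
    sn = len(fitness)
    order = sorted(range(sn), key=lambda i: fitness[i], reverse=True)
    total = sum(ratio)
    # Assumed rounding rule: S1 and S3 are rounded, S2 takes whatever remains.
    n1 = max(1, round(sn * ratio[0] / total))
    n3 = max(1, round(sn * ratio[2] / total))
    return order[:n1], order[n1:sn - n3], order[sn - n3:]
```

With SN = 10 and fitness values `[5, 3, 9, 1, 7, 2, 8, 4, 6, 0]`, the levels are `[2]`, `[6, 4, 8, 0, 7, 1, 5]` and `[3, 9]`.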
Step 302, leading bee stage. For a given honey source, obtain the global optimal solution among all honey sources and randomly select one solution from each population. Calculate several difference values from the global optimal solution and the randomly selected solutions, then add a weighted sum of these differences to the honey source to obtain a leading bee. If the fitness value of the leading bee is greater than that of the honey source, replace the honey source with the leading bee. Specifically, the search equation of the leading bee stage is improved using the idea of hierarchical learning. The leading bees no longer search only the neighborhood of a single honey source; instead they perform differential learning between population levels according to formula (3) to obtain high-quality solutions.
Formula (3): [image not available]
where $v_{i,j}$ is the newly generated solution, $x_{i,j}$ is the j-th dimension of the i-th honey source, and $\phi_{i,j}$ is a random number in [-1, 1]. A second weighting coefficient, whose symbol appears only as an image, is a random number in [0, 1.5]. $x_{gbest,j}$ is the j-th dimension of the global optimal solution, and $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three levels S1, S2 and S3, respectively.
Calculate the fitness value (SPS value) $\mathrm{fit}(V_i)$ (new_fit) of the newly generated solution by formula (2). If it is greater than the SPS value of the current individual, i.e. $\mathrm{fit}(X_i) < \mathrm{fit}(V_i)$, replace the current leading bee individual with the new individual and set $trail_i = 0$. Otherwise, keep $X_i$ and set $trail_i = trail_i + 1$.
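Formula (3) itself is not reproduced in this text. The sketch below therefore covers only the greedy replacement and the failure counter described around it, and the candidate solution is passed in. `greedy_update` and its arguments are hypothetical names.

```python
def greedy_update(i, candidate, sources, fits, trail, fitness_fn):
    """Greedy selection for honey source i, updating the failure counter trail[i].

    The candidate replaces sources[i] only if its fitness (SPS) is strictly higher;
    otherwise trail[i] is incremented. Returns True when the candidate was accepted.
    """
    new_fit = fitness_fn(candidate)
    if new_fit > fits[i]:
        sources[i], fits[i], trail[i] = candidate, new_fit, 0
        return True
    trail[i] += 1
    return False
```

When `trail[i]` reaches `limit`, the honey source is handed over to the scout bee stage (step 304).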
Step 303, follower bee stage. For a given honey source, obtain the global optimal solution among all honey sources and randomly select one solution from each population. Calculate several difference values from the global optimal solution and the randomly selected solutions, then form a weighted sum of these differences and the honey source to obtain a follower bee. If the fitness value of the follower bee is greater than that of the honey source, replace the honey source with the follower bee. Specifically, in the follower bee stage three elite improvement operators $r_1$, $r_2$, $r_3$ are introduced across the three layers, and hierarchical learning replaces the original probability selection mechanism:
$$v_{i,j} = r_1 \cdot x_{i,j} + r_2 \cdot (x_{S2,j} - x_{S3,j}) + r_3 \cdot (x_{gbest,j} - x_{S1,j}) \qquad (4)$$
where $r_1$, $r_2$, $r_3$ are three random numbers in [0, 1] with $r_1 + r_2 + r_3 = 1$, $x_{gbest,j}$ is the j-th dimension of the global optimal solution, and $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three levels S1, S2 and S3, respectively.
Calculate the SPS value of the newly generated solution by formula (2). If it is greater than the SPS value of the current individual, replace the current individual with the new individual and set $trail_i = 0$. Otherwise, keep $X_i$ and set $trail_i = trail_i + 1$.
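Here is formula (4) as a sketch. The patent only states that $r_1$, $r_2$, $r_3$ lie in [0, 1] and sum to 1. Drawing three uniform numbers and normalizing them, and drawing them once per candidate rather than per dimension, are both assumptions. `s1`, `s2`, `s3` are lists of the honey sources in each level, and `follower_candidate` is a hypothetical name.

```python
import random


def follower_candidate(x_i, gbest, s1, s2, s3, rng=random):
    """Formula (4): v_ij = r1*x_ij + r2*(x_S2,j - x_S3,j) + r3*(x_gbest,j - x_S1,j)."""
    raw = [rng.random() for _ in range(3)]
    total = sum(raw) or 1.0  # guard against division by zero
    r1, r2, r3 = (w / total for w in raw)  # normalized so r1 + r2 + r3 = 1
    # One randomly chosen honey source from each level.
    xs1, xs2, xs3 = rng.choice(s1), rng.choice(s2), rng.choice(s3)
    return [r1 * x_i[j] + r2 * (xs2[j] - xs3[j]) + r3 * (gbest[j] - xs1[j])
            for j in range(len(x_i))]
```

The candidate is then accepted or rejected greedily, exactly as in the leading bee stage.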
Step 304, scout bee stage. For a given honey source, if its number of failed iterations reaches the set limit, generate several new solutions in several different ways and use a greedy selection strategy to pick the best of them to replace the honey source. Specifically, this stage introduces a method based on hierarchical learning and opposition-based learning. When the failure counter $trail_i$ reaches the threshold limit, three different solutions are generated instead of the original single new solution.
Randomly generated new solution: the first solution $m_1$ is still generated at random according to formula (1).
Select the population with the largest fitness values among the populations as the base population, and take the remaining populations as auxiliary populations. Randomly select a honey source from the base population as the base honey source, and randomly select one honey source from each auxiliary population as an auxiliary honey source. Then calculate the difference between the auxiliary honey sources, multiply it by a random number, and add the product to the base honey source to obtain a new solution. That is, the second solution $m_2$ is generated by hierarchical learning according to the following formula:
$$m_{2,j} = x_{S1,j} + \phi_{i,j} \cdot (x_{S2,j} - x_{S3,j}) \qquad (5)$$
where $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three layers S1, S2 and S3, respectively, and $\phi_{i,j}$ is a random number in [-1, 1].
Select the minimum value, the maximum value and the global optimal solution from all honey sources, and subtract the global optimal solution from the sum of the minimum and maximum values to obtain a new solution. That is, the third solution $m_3$ follows the idea of opposition-based learning. It searches on the opposite side of the global optimal solution so that the search does not become trapped in a local optimum, and it is generated as:
$$m_3 = LB + UB - x_{gbest} \qquad (6)$$
where LB and UB are the minimum and maximum values of the solution, respectively, and $x_{gbest}$ is the global optimal solution.
The fitness values of the three newly generated solutions are calculated by formula (2), and greedy selection chooses the best one as the new solution.
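The sketch below generates the three scout-stage candidates and makes the greedy choice among them. It rests on three assumptions. LB and UB are the per-dimension search bounds. $\phi_{i,j}$ is drawn independently for each dimension. The candidates are not clipped back into the bounds, because the text does not say whether they should be. Function names are hypothetical.

```python
import random


def scout_candidates(lb, ub, gbest, s1, s2, s3, rng=random):
    """Three restart candidates for an exhausted honey source (step 304)."""
    d = len(lb)
    # m1: random re-initialization, formula (1).
    m1 = [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(d)]
    # m2: hierarchical learning, formula (5): S1 base plus a scaled S2 - S3 difference.
    xs1, xs2, xs3 = rng.choice(s1), rng.choice(s2), rng.choice(s3)
    m2 = [xs1[j] + rng.uniform(-1, 1) * (xs2[j] - xs3[j]) for j in range(d)]
    # m3: opposition-based learning, formula (6).
    m3 = [lb[j] + ub[j] - gbest[j] for j in range(d)]
    return [m1, m2, m3]


def scout_replace(lb, ub, gbest, s1, s2, s3, fitness_fn, rng=random):
    """Greedy selection: return the candidate with the highest fitness (SPS)."""
    return max(scout_candidates(lb, ub, gbest, s1, s2, s3, rng), key=fitness_fn)
```

For instance, with bounds `[0, 0]` and `[10, 10]` and `gbest = [2, 3]`, the opposition-based candidate is `[8, 7]`.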
In this example, four groups of tests were performed on the gene sequence sets 1ad2_ref1, 1ivy_ref5, 451c_ref1 and kinase_ref1. Both ABC and the present invention aligned the same sets of sequences under the same experimental conditions. Each test was run independently 10 times for 1000 generations, and the maximum, minimum and average values were recorded.
Table 1 Accuracy of multiple sequence alignment test results
The results of the algorithm of the invention are significantly higher than those of the ABC algorithm in the mean, best and worst values, which shows its advantage. Besides the accuracy results in Table 1, convergence curves of the algorithm of the invention (HLABC) and of ABC are plotted to show the performance more fully. In FIG. 2 (a), 2 (b), 2 (c) and 2 (d), the horizontal axis is the number of iterations (Iteration) and the vertical axis is the average SPS (Score). These results indicate that the invention helps the algorithm avoid becoming trapped in local optima, accelerates convergence and improves solution accuracy in multiple sequence alignment.
Based on the idea of hierarchical learning, the invention proposes a new ring topology and improves the original search strategy to increase search randomness. It also replaces the original probability selection mechanism with hierarchical learning, which improves the optimization capability and convergence speed of the algorithm. Together these changes overcome the shortcomings of the original ABC algorithm and improve the solution accuracy of multiple sequence alignment.
Example two
The embodiment provides a gene sequence comparison system, which specifically comprises the following modules:
a gene sequence acquisition module configured to obtain a plurality of gene sequences;
a gene sequence alignment module configured to encode the parameters of a hidden Markov model as honey sources and use the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences of each gene sequence; to judge whether a termination condition is met; if so, to compare, pair by pair, the hidden state sequences of all gene sequences obtained by the hidden Markov model of the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence; otherwise, to divide all the honey sources into a plurality of populations based on the fitness value of each honey source, and perform differential learning among the populations to optimize the parameters of the hidden Markov model until the termination condition is met.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
Example three
This embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in a method for gene sequence alignment as described in the first embodiment above.
Example four
This embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the gene sequence alignment method according to the above embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A method of gene sequence alignment comprising:
obtaining a plurality of gene sequences;
encoding the parameters of a hidden Markov model into honey sources, and using the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences corresponding to each gene sequence; determining whether a termination condition is met; if so, comparing pairwise the hidden state sequences of all the gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain, for each gene sequence, the gene sequence most similar to it; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing differential learning among the different populations to optimize the parameters of the hidden Markov model until the termination condition is met;
wherein the fitness value is the sum-of-pairs score (SPS) of the hidden state sequences obtained by the hidden Markov model corresponding to honey source X_i:
$$\mathrm{SPS}=\sum_{i<j} D(l_i, l_j)$$
wherein l_i denotes the i-th aligned hidden state sequence, l_j denotes the j-th hidden state sequence to be compared, and D is a function representing the similarity of the two sequences;
the differential learning between different populations comprises:
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the different difference values and adding it to the honey source to obtain a leading bee; if the fitness value of the leading bee is greater than that of the honey source, replacing the honey source with the leading bee;
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the different difference values and the honey source to obtain a follower bee; if the fitness value of the follower bee is greater than that of the honey source, replacing the honey source with the follower bee;
and for a given honey source, if the number of failed iterations of the honey source reaches a set number, generating a plurality of new solutions using a plurality of new-solution generation modes, and selecting the optimal solution among the new solutions according to a greedy selection strategy to replace the honey source.
2. The method of claim 1, wherein a new solution is generated by randomly generating a solution.
3. The method of claim 1, wherein the new solution is generated by:
selecting the population with the maximum fitness value from the plurality of populations as a base population, and taking the remaining populations as auxiliary populations;
randomly selecting a honey source from the base population as a base honey source;
randomly selecting a honey source from each auxiliary population as an auxiliary honey source;
and calculating the difference between the auxiliary honey sources, multiplying the difference by a random number, and adding the result to the base honey source to obtain a new solution.
4. The method of claim 1, wherein the new solution is generated by:
determining the minimum value, the maximum value, and the global optimal solution among all the honey sources;
and subtracting the global optimal solution from the sum of the minimum value and the maximum value to obtain a new solution.
5. A system for aligning gene sequences, comprising:
a gene sequence acquisition module configured to obtain a plurality of gene sequences;
a gene sequence alignment module configured to: encode the parameters of a hidden Markov model into honey sources, and use the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences corresponding to each gene sequence; determine whether a termination condition is met; if so, compare pairwise the hidden state sequences of all the gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain, for each gene sequence, the gene sequence most similar to it; otherwise, divide all the honey sources into a plurality of populations based on the fitness value of each honey source, and perform differential learning among the different populations to optimize the parameters of the hidden Markov model until the termination condition is met;
wherein the fitness value is the sum-of-pairs score (SPS) of the hidden state sequences obtained by the hidden Markov model corresponding to honey source X_i:
$$\mathrm{SPS}=\sum_{i<j} D(l_i, l_j)$$
wherein l_i denotes the i-th aligned hidden state sequence, l_j denotes the j-th hidden state sequence to be compared, and D is a function representing the similarity of the two sequences;
the differential learning between different populations comprises:
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the different difference values and adding it to the honey source to obtain a leading bee; if the fitness value of the leading bee is greater than that of the honey source, replacing the honey source with the leading bee;
for a given honey source, obtaining the global optimal solution among all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the different difference values and the honey source to obtain a follower bee; if the fitness value of the follower bee is greater than that of the honey source, replacing the honey source with the follower bee;
and for a given honey source, if the number of failed iterations of the honey source reaches a set number, generating a plurality of new solutions using a plurality of new-solution generation modes, and selecting the optimal solution among the new solutions according to a greedy selection strategy to replace the honey source.
6. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the steps of a method of gene sequence alignment according to any one of claims 1 to 4.
7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the method of gene sequence alignment according to any one of claims 1 to 4.
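The fitness defined in claims 1 and 5 can be illustrated with a short Python sketch: a honey source is decoded into HMM parameters, each gene sequence is decoded into a hidden state path, and the sum-of-pairs score adds D over every pair of paths. The parameter layout of the honey source, the row normalization, the use of Viterbi decoding, and the choice of D as the number of agreeing positions are all assumptions made for illustration; the claims do not specify them, and every function name below is hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACGT"  # assumed: unambiguous nucleotides only


def decode_honey_source(x, n_states, n_symbols=4):
    """Hypothetical layout: [start (n) | transitions (n*n) | emissions (n*m)]."""

    def row(vals):
        # Map raw values to a strictly positive probability row that sums to 1.
        vals = [abs(v) + 1e-9 for v in vals]
        total = sum(vals)
        return [v / total for v in vals]

    n, m = n_states, n_symbols
    start = row(x[:n])
    trans = [row(x[n + i * n:n + (i + 1) * n]) for i in range(n)]
    off = n + n * n
    emit = [row(x[off + i * m:off + (i + 1) * m]) for i in range(n)]
    return start, trans, emit


def viterbi(seq, start, trans, emit):
    """Most likely hidden-state path for a non-empty nucleotide string (log-space Viterbi)."""
    obs = [ALPHABET.index(c) for c in seq]
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor for each state, then the updated path scores.
        prev = [max(range(n), key=lambda p: score[p] + math.log(trans[p][s]))
                for s in range(n)]
        score = [score[prev[s]] + math.log(trans[prev[s]][s]) + math.log(emit[s][o])
                 for s in range(n)]
        back.append(prev)
    path = [max(range(n), key=lambda s: score[s])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]


def similarity(a, b):
    """Assumed D: number of aligned positions at which two state paths agree."""
    return sum(1 for u, v in zip(a, b) if u == v)


def sps_fitness(x, sequences, n_states):
    """Sum-of-pairs score: D summed over every unordered pair of decoded paths."""
    start, trans, emit = decode_honey_source(x, n_states)
    paths = [viterbi(seq, start, trans, emit) for seq in sequences]
    return sum(similarity(a, b) for a, b in combinations(paths, 2))
```

Because each decoded path has the same length as its input sequence, this D simply compares positions up to the shorter path; a gap-aware D would be needed for sequences of very different lengths, and the claims leave that choice open.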
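Claims 1 and 5 describe the leading-bee and follower-bee updates only in words. The Python sketch below shows one possible reading: honey sources are sorted by fitness and cut into contiguous populations, and each candidate is the honey source plus a weighted sum of difference vectors toward the global best and toward one random member of every population, kept only if it is strictly fitter. The contiguous population split, the exact difference terms, the uniform weights, and the follower-bee weight on the source itself are assumptions, not details fixed by the patent text; the helper names are hypothetical.

```python
import random


def split_into_populations(fitness, k):
    """Sort honey-source indices by fitness (best first) and cut them into k contiguous groups."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    size = -(-len(order) // k)  # ceiling division
    groups = [order[g * size:(g + 1) * size] for g in range(k)]
    return [grp for grp in groups if grp]


def differential_candidate(x, gbest, sources, populations, rng, follower=False):
    """Build a leading-bee (or follower-bee) candidate from weighted difference vectors."""
    # Assumed difference terms: (gbest - x) plus (r_p - x) for one random member r_p of each population.
    picks = [sources[rng.choice(pop)] for pop in populations]
    diffs = [[g - xi for g, xi in zip(gbest, x)]]
    diffs += [[r - xi for r, xi in zip(p, x)] for p in picks]
    weights = [rng.random() for _ in diffs]  # assumed: independent U(0, 1) weights
    step = [sum(w * d[j] for w, d in zip(weights, diffs)) for j in range(len(x))]
    # Follower variant: the honey source itself is also weighted (assumed weight in [0.5, 1.5)).
    w0 = rng.uniform(0.5, 1.5) if follower else 1.0
    return [w0 * xi + s for xi, s in zip(x, step)]


def bee_phase(sources, fitness, fit_fn, k, rng, follower=False):
    """One leading- or follower-bee pass with greedy replacement; returns per-source success flags."""
    populations = split_into_populations(fitness, k)
    gbest = list(sources[max(range(len(sources)), key=lambda i: fitness[i])])
    improved = [False] * len(sources)
    for i, x in enumerate(sources):
        cand = differential_candidate(x, gbest, sources, populations, rng, follower)
        f = fit_fn(cand)
        if f > fitness[i]:  # replace only when the candidate is strictly fitter
            sources[i], fitness[i], improved[i] = cand, f, True
    return improved
```

The returned flags could drive the failure counter that triggers the scout step, for example by resetting the counter of each improved source and incrementing it otherwise.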
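Claims 2 to 4 list three ways of producing replacement solutions for an exhausted honey source, and claim 1 picks among them greedily. The Python sketch below implements the three generators and the greedy scout step under stated assumptions: box bounds for the random solution, a mean-fitness rule for choosing the base population, consecutive-pair differencing when there are more than two auxiliary populations, and a per-dimension minimum and maximum for claim 4. None of these details, nor the function names, come from the patent.

```python
import random
from statistics import mean


def random_solution(lower, upper, rng):
    """Claim 2: a uniformly random solution inside assumed box bounds."""
    return [rng.uniform(a, b) for a, b in zip(lower, upper)]


def base_population_solution(sources, fitness, populations, rng):
    """Claim 3: base honey source plus random multiples of auxiliary-source differences."""
    # Assumption: the population "with the maximum fitness value" has the highest mean fitness.
    base_idx = max(range(len(populations)),
                   key=lambda p: mean(fitness[i] for i in populations[p]))
    new = list(sources[rng.choice(populations[base_idx])])
    aux = [sources[rng.choice(pop)] for p, pop in enumerate(populations) if p != base_idx]
    # Assumption: consecutive auxiliary sources are differenced, each with its own random factor.
    for a, b in zip(aux, aux[1:]):
        r = rng.random()
        new = [n + r * (ai - bi) for n, ai, bi in zip(new, a, b)]
    return new


def opposition_solution(sources, gbest):
    """Claim 4: (minimum + maximum) - global best, taken dimension by dimension (assumed)."""
    dims = range(len(gbest))
    lo = [min(s[j] for s in sources) for j in dims]
    hi = [max(s[j] for s in sources) for j in dims]
    return [lo[j] + hi[j] - gbest[j] for j in dims]


def scout(i, sources, fitness, populations, trials, limit, fit_fn, lower, upper, rng):
    """If source i has failed `limit` times, replace it with the best of the three candidates."""
    if trials[i] < limit:
        return False
    gbest = sources[max(range(len(sources)), key=lambda k: fitness[k])]
    candidates = [
        random_solution(lower, upper, rng),
        base_population_solution(sources, fitness, populations, rng),
        opposition_solution(sources, gbest),
    ]
    best_f, best_c = max(((fit_fn(c), c) for c in candidates), key=lambda t: t[0])
    sources[i], fitness[i], trials[i] = best_c, best_f, 0  # greedy pick replaces the source
    return True
```

With exactly two auxiliary populations, the claim 3 generator reduces to the base source plus r times the difference of the two auxiliary sources, which is the case the claim wording most directly describes.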

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044384.1A CN114550827B (en) 2022-01-14 2022-01-14 Gene sequence comparison method and system


Publications (2)

Publication Number Publication Date
CN114550827A 2022-05-27
CN114550827B 2022-11-22

Family

ID=81671250


Country Status (1)

Country Link
CN (1) CN114550827B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110066380A * 2009-12-11 2011-06-17 Korea Research Institute of Bioscience and Biotechnology System and method for identifying and classifying the resistance gene in plant using the hidden markov model
CN106202998A * 2016-07-05 2016-12-07 Jimei University A method for structural analysis of transcriptome gene sequences of non-model organisms
CN108629400A * 2018-05-15 2018-10-09 Fuzhou University A chaotic artificial bee colony algorithm based on Levy search
CN110570909A * 2019-09-11 2019-12-13 Huazhong Agricultural University Method for mining epistatic sites using an artificial-bee-colony-optimized Bayesian network
CN113539378A * 2021-07-16 2021-10-22 Mingke Biotechnology (Hangzhou) Co., Ltd. Data analysis method, system, device and storage medium for a virus database

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO2007134162A2 (en) * 2006-05-10 2007-11-22 Washington State University Research Foundation Involvement of a novel nuclear-encoded mitochondrial poly(a) polymerase papd1 in extreme obesity-related phenotypes in mammals
CN107577918A * 2017-08-22 2018-01-12 Shandong Normal University CpG island recognition method and device based on genetic algorithm and hidden Markov model
CN110456815A * 2019-07-04 2019-11-15 Beihang University Army-ant-heuristic cooperative localization method for intelligent UAV swarms
CN113257337A * 2021-04-20 2021-08-13 Zhejiang University of Technology Metagenome-based protein multiple sequence alignment method


Non-Patent Citations (3)

Title
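Hidden Markov Model: Application towards genomic analysis; Shary Jose et al.; 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT); 2016-08-04; full text *
Artificial bee colony algorithm based on Markov chain; Guo Jia et al.; Journal of Beijing University of Posts and Telecommunications; 2020-02-29; Vol. 43, No. 1; full text *
Application of swarm intelligence optimization algorithms in multiple sequence alignment; Xu Xiaojun; China Master's Theses Full-text Database (Information Science and Technology); 2011-11-30 (No. 11); full text *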



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant