CN114550827A - Gene sequence comparison method and system - Google Patents

Gene sequence comparison method and system

Info

Publication number
CN114550827A
Authority
CN
China
Prior art keywords
honey
gene sequence
honey source
solution
hidden markov
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210044384.1A
Other languages
Chinese (zh)
Other versions
CN114550827B (en)
Inventor
张庆科
李天奇
汪玉成
高昊
卜降龙
来明旭
张化祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202210044384.1A priority Critical patent/CN114550827B/en
Publication of CN114550827A publication Critical patent/CN114550827A/en
Application granted granted Critical
Publication of CN114550827B publication Critical patent/CN114550827B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a gene sequence comparison method and system. The method comprises: encoding the parameters of a hidden Markov model as honey sources, and using the hidden Markov models corresponding to all the honey sources to obtain the hidden state sequences corresponding to each gene sequence; judging whether a termination condition is met; if so, comparing pairwise the hidden state sequences of all gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning among different populations to optimize the parameters of the hidden Markov model until the termination condition is met. The method enhances the randomness of the parameter search of the hidden Markov model, prevents the solution from being trapped in a local optimum, and improves the accuracy of the solution in multiple sequence alignment.

Description

Gene sequence comparison method and system
Technical Field
The invention belongs to the technical field of sequence comparison, and particularly relates to a gene sequence comparison method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, biological science and technology have developed rapidly, and how to analyze and process the meaning implicit in the data of biological databases has become a serious challenge. Sequence alignment reveals the information carried by biological sequences and has been widely used to identify related DNA and protein sequences. Sequence alignment has been studied for decades, and a large number of methods have been proposed. One class is alignment algorithms based on dynamic programming, which consume so much time and space that they cannot solve practical problems; another is progressive alignment algorithms, which tend to fall into local optima that cannot be corrected afterwards.
To overcome the drawbacks of these two classes of algorithms, iterative multiple sequence alignment algorithms have emerged. Iterative alignment algorithms mainly refer to swarm intelligence algorithms built on the collective behavior of organisms, such as the particle swarm algorithm, the genetic algorithm and the artificial bee colony algorithm. The Artificial Bee Colony (ABC) algorithm is a swarm intelligence algorithm based on the honey-collecting behavior of bee colonies. It has few control parameters and is easy to implement; in recent years it has attracted the attention of more and more scholars, has been improved, and has been successfully applied to optimization problems in many fields. However, as research on the ABC algorithm has deepened, it has been found that the probability selection mechanism of the follower bee stage fails in the later iterations of the population, which leads to slow convergence and low solution accuracy in the later iterations.
Disclosure of Invention
In order to solve the technical problems described in the background, the invention provides a gene sequence comparison method and system, which enhance the randomness of the parameter search of a hidden Markov model, prevent the solution from being trapped in a local optimum, and improve the accuracy of the solution in multiple sequence alignment.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for gene sequence alignment, comprising:
obtaining a plurality of gene sequences;
coding parameters of the hidden Markov model into honey sources, and adopting the hidden Markov models corresponding to all the honey sources to obtain various hidden state sequences corresponding to each gene sequence; judging whether a termination condition is met, if so, comparing every two hidden state sequences of all gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value to obtain a gene sequence most similar to each gene sequence; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning among different populations to optimize the parameters of the hidden Markov model until a termination condition is met.
Further, the performing difference learning between different populations includes:
for a certain honey source, acquiring global optimal solutions of all the honey sources, and randomly selecting one solution in different populations respectively;
calculating different difference values based on the global optimal solution and the random selected solution;
weighting and summing the different difference values, and then summing the weighted sum with the honey source to obtain a leading bee;
and if the fitness value of the leading bee is larger than that of the honey source, replacing the honey source with the leading bee.
Further, the performing difference learning between different populations includes:
for a certain honey source, acquiring global optimal solutions of all the honey sources, and randomly selecting one solution in different populations respectively;
calculating different difference values based on the global optimal solution and the random selected solution;
weighting and summing the different difference values and the honey source to obtain follower bees;
and if the fitness value of the follower bee is larger than that of the honey source, replacing the honey source with the follower bee.
Further, the performing difference learning between different populations includes:
and for a certain honey source, if the iteration failure times of the honey source reach the set times, generating a plurality of new solutions by adopting a plurality of new solution generation modes, and selecting the optimal solution from the plurality of new solutions according to a greedy selection strategy to replace the honey source.
Further, the new solution generation method is as follows: a new solution is randomly generated.
Further, the new solution generation method is as follows:
selecting a population with the maximum fitness value from a plurality of populations as a base population, and taking the rest populations as auxiliary populations;
randomly selecting a honey source in the basic population as a basic honey source;
randomly selecting a honey source in each auxiliary population as an auxiliary honey source;
and calculating the difference between the auxiliary honey sources, multiplying the difference by a random number, and summing the sum with the basic honey source to obtain a new solution.
Further, the new solution generation method is as follows:
selecting a minimum value, a maximum value and a global optimal solution from all honey sources;
and calculating the difference between the minimum value plus the maximum value and the global optimal solution to obtain a new solution.
In a second aspect, the present invention provides a gene sequence alignment system, comprising:
a gene sequence acquisition module configured to: obtaining a plurality of gene sequences;
a gene sequence alignment module configured to: coding parameters of the hidden Markov model into honey sources, and adopting the hidden Markov models corresponding to all the honey sources to obtain various hidden state sequences corresponding to each gene sequence; judging whether a termination condition is met, if so, comparing every two hidden state sequences of all gene sequences obtained by a hidden Markov model corresponding to a honey source with the maximum fitness value to obtain a gene sequence most similar to each gene sequence; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning among different populations to optimize the parameters of the hidden Markov model until a termination condition is met.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a method of gene sequence alignment as described above.
A fourth aspect of the present invention provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the gene sequence alignment method.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a gene sequence comparison method that optimizes the parameters of a hidden Markov model with an artificial bee colony algorithm based on hierarchical learning, which avoids the risk of the algorithm being trapped in a local optimum, accelerates convergence, and improves the accuracy of the solution in multiple sequence alignment.
The invention provides a gene sequence comparison method that constructs a new hierarchical ring topology based on an artificial bee colony algorithm with hierarchical learning, in which populations at different levels can learn from the differences between one another. The search strategies of the leading bee and follower bee stages are thereby improved, and the global optimization capability and search capability of the algorithm are enhanced. The tendency of the ABC algorithm to stagnate in the later iterations because of its probability selection mechanism is avoided, convergence is accelerated, and the accuracy of the solution is improved. In the scout bee stage, solutions in three different directions are generated, which enhances the randomness of the search and prevents the solution from being trapped in a local optimum.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2(a) is a graph showing the convergence for the 1ad2_ref1 gene sequences in the first embodiment of the present invention;
FIG. 2(b) is a graph showing the convergence for the 1ivy_ref5 gene sequences in the first embodiment of the present invention;
FIG. 2(c) is a graph showing the convergence for the 451c_ref1 gene sequences in the first embodiment of the present invention;
FIG. 2(d) is a graph showing the convergence for the kinase_ref1 gene sequences in the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
This example provides a gene sequence alignment method, as shown in fig. 1, which specifically includes the following steps:
Step 1, obtaining a plurality of gene sequences;
Step 2, initialization: this includes initializing the parameters and generating the initial honey sources. The initialized parameters comprise the population size (the number of honey sources) SN, the individual dimension D, the threshold limit, the maximum number of iterations MCN, the maximum number of evaluations MFE, and the maximum value $UB_j$ and minimum value $LB_j$ of each dimension. The SN initial honey sources are generated randomly by equation (1):
$$x_{i,j} = LB_j + \mathrm{rand}(0,1)\cdot(UB_j - LB_j) \tag{1}$$
where $x_{i,j}$ denotes the j-th dimension of the i-th honey source (individual), $i = 1, 2, \ldots, SN$, $j = 1, 2, \ldots, D$; $[LB_j, UB_j]$ denotes the value range of the j-th dimension variable; and rand(0, 1) denotes a random number between 0 and 1. Each honey source $X_i$ represents one set of parameters of the hidden Markov model, and D is the number of parameters of the hidden Markov model.
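As an illustration of equation (1), the following minimal Python sketch draws SN honey sources uniformly inside per-dimension bounds. The function name `init_honey_sources` and the example bounds are hypothetical and chosen only for illustration; how a honey source vector is mapped back onto hidden Markov model parameters is not specified here and is left out.

```python
import random

def init_honey_sources(sn, lb, ub, rng=None):
    """Return sn honey sources; entry [i][j] follows equation (1):
    x_ij = LB_j + rand(0, 1) * (UB_j - LB_j)."""
    rng = rng or random.Random()
    d = len(lb)  # individual dimension D
    return [
        [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(d)]
        for _ in range(sn)
    ]

# Example with hypothetical bounds: 10 honey sources of dimension 3.
sources = init_honey_sources(sn=10, lb=[0.0, 0.0, 0.0], ub=[1.0, 1.0, 1.0],
                             rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.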
Step 3, encoding the parameters of the hidden Markov model as honey sources and, for each gene sequence, using the hidden Markov models corresponding to all the honey sources to obtain a plurality of hidden state sequences corresponding to that gene sequence; judging whether a termination condition is met (the termination condition being that the number of iterations reaches the maximum number of iterations MCN); if so, comparing pairwise the hidden state sequences of all gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain the gene sequence most similar to each gene sequence (namely, the gene sequence whose hidden state sequence has the greatest similarity); otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning between different populations to optimize the parameters of the hidden Markov model until the termination condition is met, i.e., executing step 301-:
Step 301, obtaining the hidden state sequences of all gene sequences based on each honey source (one set of hidden Markov model parameters), and calculating the fitness value $fit(X_i)$ of each individual according to formula (2); the value of $fit(X_i)$ is the SPS (sum-of-pairs score) of the hidden state sequences produced by the hidden Markov model under honey source $X_i$:
(Formula (2) is given as image BDA0003471546690000061 in the original publication and is not reproduced in this text.)
where $l_i$ denotes the i-th aligned hidden state sequence, $l_j$ denotes the j-th hidden state sequence to be compared, and D is a function measuring the similarity of two sequences. In practice, a similarity score matrix is used to compute D, so $D(l_i, l_j)$ is generally the substitution score of the corresponding residues in $l_i$ and $l_j$. The higher the SPS, the more accurate the alignment of the gene sequences.
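Because formula (2) is not legible in this text, the sketch below assumes the common sum-of-pairs form, in which the score D is summed over every unordered pair of aligned sequences. The function names and the simple match/mismatch scoring are hypothetical stand-ins; the description itself says a similarity score matrix is used for D in practice.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=0):
    """Hypothetical stand-in for D: per-position score summed over two
    equal-length aligned sequences (a substitution matrix would replace this)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))

def sum_of_pairs(seqs, score=pair_score):
    """Assumed form of the SPS: sum of D(l_i, l_j) over all pairs i < j."""
    return sum(score(a, b) for a, b in combinations(seqs, 2))

# Toy aligned state sequences (illustrative symbols only).
aligned = ["MMIM", "MMDM", "MMIM"]
sps = sum_of_pairs(aligned)  # pair scores are 3, 4 and 3
```

Any scoring function with the same two-argument signature can be passed as `score`, which is where a substitution-matrix lookup would plug in.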
According to the constructed hierarchical ring topology, the population is divided by hierarchical learning into a first population S1, a second population S2 and a third population S3 according to fitness value. The sizes of S1, S2 and S3 sum to SN and are in the ratio 1 : 7 : 2; the fitness value of every individual in S1 is greater than that of every individual in S2, and the fitness value of every individual in S2 is greater than that of every individual in S3.
The idea of hierarchical learning is as follows: across the whole population, each outer-layer population learns from the differences of the inner-layer populations, so that the whole population keeps moving toward better solutions. Meanwhile, the innermost population learns toward the global optimum in order to find a better solution.
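The 1 : 7 : 2 partition described above can be sketched as a sort by fitness followed by two cuts. The function name `split_populations` is hypothetical, and the rounding rule used when SN is not a multiple of 10 (floor division, with the remainder going to S3 and at least one individual in S1) is an assumption that the description does not address.

```python
def split_populations(fitness, ratio=(1, 7, 2)):
    """Return index lists (s1, s2, s3), with S1 holding the fittest sources."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    total = sum(ratio)
    n1 = max(1, len(order) * ratio[0] // total)  # assumed: S1 is never empty
    n2 = len(order) * ratio[1] // total
    return order[:n1], order[n1:n1 + n2], order[n1 + n2:]

# Toy fitness values for SN = 10 honey sources.
fitness = [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]
s1, s2, s3 = split_populations(fitness)
```

Returning indices rather than copies lets the later stages pick a random member of each level and still update the shared list of honey sources in place.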
Step 302, leading bee stage: for a given honey source, acquiring the global optimal solution of all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the difference values and adding it to the honey source to obtain a leading bee; and if the fitness value of the leading bee is greater than that of the honey source, replacing the honey source with the leading bee. Specifically, based on the idea of hierarchical learning, the search equation of the leading bee stage is improved: a leading bee no longer performs a neighborhood search around a single honey source only, but performs difference learning between the different population levels according to formula (3) to obtain a high-quality solution.
(Formula (3) is given as image BDA0003471546690000071 in the original publication and is not reproduced in this text.)
where $v_{i,j}$ is the newly generated solution, $x_{i,j}$ is the j-th dimension of the i-th honey source, $\phi_{i,j}$ is a random number in [-1, 1], a second coefficient (given as image BDA0003471546690000072 in the original publication) is a random number in [0, 1.5], $x_{gbest,j}$ is the j-th dimension of the global optimal solution, and $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three levels S1, S2 and S3, respectively.
The fitness value (SPS) $fit(V_i)$ (or new_fit) of the newly generated solution is calculated according to formula (2). If it is greater than the SPS of the current individual, i.e., $fit(X_i) < fit(V_i)$, the current leading bee individual is replaced with the new individual and $trail_i = 0$; otherwise, $X_i$ is retained and $trail_i = trail_i + 1$.
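The replacement rule just described (accept the new solution only if its fitness is higher, and otherwise increase the failure counter) can be sketched as follows. The function name `greedy_update` and the toy fitness function are hypothetical; in the method itself the fitness would be the SPS of formula (2).

```python
def greedy_update(sources, fits, trail, i, candidate, fitness_fn):
    """Replace honey source i with candidate if it is strictly fitter.

    On success the failure counter trail[i] is reset to 0; otherwise the
    honey source is kept and trail[i] is incremented by 1.
    """
    new_fit = fitness_fn(candidate)
    if new_fit > fits[i]:
        sources[i] = candidate
        fits[i] = new_fit
        trail[i] = 0
        return True
    trail[i] += 1
    return False

# Toy fitness (higher is better), standing in for the SPS of formula (2).
toy_fitness = lambda x: -sum(v * v for v in x)

sources = [[1.0, 1.0]]
fits = [toy_fitness(sources[0])]
trail = [3]
accepted = greedy_update(sources, fits, trail, 0, [0.5, 0.5], toy_fitness)
rejected = greedy_update(sources, fits, trail, 0, [2.0, 2.0], toy_fitness)
```

The counter kept in `trail` is what later decides whether a honey source has failed often enough to be handed to the scout stage.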
Step 303, follower bee stage: for a given honey source, acquiring the global optimal solution of all the honey sources, and randomly selecting one solution from each of the different populations; calculating different difference values based on the global optimal solution and the randomly selected solutions; computing a weighted sum of the difference values and the honey source to obtain a follower bee; and if the fitness value of the follower bee is greater than that of the honey source, replacing the honey source with the follower bee. Specifically, in the follower bee stage, three elite improvement operators $r_1$, $r_2$, $r_3$ are introduced among the three different levels, and the original probability selection mechanism is replaced by hierarchical learning.
$$v_{i,j} = r_1\cdot x_{i,j} + r_2\cdot(x_{S2,j} - x_{S3,j}) + r_3\cdot(x_{gbest,j} - x_{S1,j}) \tag{4}$$
where $r_1$, $r_2$, $r_3$ are three random numbers in [0, 1] with $r_1 + r_2 + r_3 = 1$, $x_{gbest,j}$ is the j-th dimension of the global optimal solution, and $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three levels S1, S2 and S3, respectively.
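A minimal sketch of the follower bee search follows, reading equation (4) as v = r1·x_i + r2·(x_S2 − x_S3) + r3·(x_gbest − x_S1) in each dimension. Several details are assumptions, since the description does not fix them: the three weights are obtained by normalizing three uniform draws so that they sum to 1, they are drawn once per candidate rather than once per dimension, and the result is clamped to the bounds. The function name `follower_candidate` is hypothetical.

```python
import random

def follower_candidate(x_i, x_s1, x_s2, x_s3, x_gbest, lb, ub, rng=None):
    """Build one follower bee candidate from equation (4)."""
    rng = rng or random.Random()
    raw = [rng.random() for _ in range(3)]
    total = sum(raw) or 1.0                 # guard against an all-zero draw
    r1, r2, r3 = (r / total for r in raw)   # assumed: normalize so r1+r2+r3 = 1
    v = []
    for j in range(len(x_i)):
        value = (r1 * x_i[j]
                 + r2 * (x_s2[j] - x_s3[j])
                 + r3 * (x_gbest[j] - x_s1[j]))
        v.append(min(max(value, lb[j]), ub[j]))  # assumed: clamp to [LB_j, UB_j]
    return v

# Toy vectors of dimension 2 with hypothetical bounds [0, 1].
v = follower_candidate(
    x_i=[0.2, 0.8], x_s1=[0.6, 0.4], x_s2=[0.5, 0.5], x_s3=[0.1, 0.9],
    x_gbest=[0.7, 0.3], lb=[0.0, 0.0], ub=[1.0, 1.0], rng=random.Random(0),
)
```

The candidate would then go through the same greedy comparison against the current honey source as in the leading bee stage.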
The SPS of the newly generated solution is calculated according to formula (2). If it is greater than the SPS of the current individual, the current individual is replaced with the new individual and $trail_i = 0$; otherwise, $X_i$ is retained and $trail_i = trail_i + 1$.
Step 304, scout bee stage: for a given honey source, if the number of failed iterations of the honey source reaches the set number, generating a plurality of new solutions using a plurality of new solution generation modes, and selecting the best of the new solutions according to a greedy selection strategy to replace the honey source. Specifically, in the scout bee stage, methods based on hierarchical learning and opposition-based learning are introduced: when the failure count $trail_i$ reaches the set limit, three different solutions are generated instead of using the original single new solution generation mode.
Randomly generating a new solution: the first solution $m_1$ is still generated according to equation (1).
Selecting the population with the maximum fitness value from the plurality of populations as the base population, and taking the remaining populations as auxiliary populations; randomly selecting a honey source in the base population as the base honey source; randomly selecting a honey source in each auxiliary population as an auxiliary honey source; and calculating the difference between the auxiliary honey sources, multiplying it by a random number, and adding the result to the base honey source to obtain a new solution. That is, the second solution $m_2$ is generated with the idea of hierarchical learning according to the following formula:
$$m_{2,j} = x_{S1,j} + \phi_{i,j}\cdot(x_{S2,j} - x_{S3,j}) \tag{5}$$
where $x_{S1,j}$, $x_{S2,j}$, $x_{S3,j}$ are three solutions randomly selected from the three levels S1, S2 and S3, respectively, and $\phi_{i,j}$ is a random number in [-1, 1].
Selecting the minimum value, the maximum value and the global optimal solution from all the honey sources; and subtracting the global optimal solution from the sum of the minimum value and the maximum value to obtain a new solution. That is, the third solution $m_3$ follows the idea of opposition-based learning: a solution is searched for on the opposite side of the global optimal solution so that the search is not trapped in a local optimum. The generation formula is:
$$m_3 = LB + UB - x_{gbest} \tag{6}$$
where LB and UB are respectively the minimum value and the maximum value of the solution, and $x_{gbest}$ is the global optimal solution.
The fitness values of the three newly generated solutions are calculated by formula (2), and the best of them is selected as the new solution according to a greedy selection strategy.
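The three scout candidates and the greedy choice among them can be sketched as below. The second candidate is read from the prose description (base honey source plus a random number in [-1, 1] times the difference of the two auxiliary honey sources); drawing that random number independently per dimension, and leaving the second candidate unclamped, are assumptions. The function name `scout_replacement` and the toy fitness are hypothetical.

```python
import random

def scout_replacement(lb, ub, x_s1, x_s2, x_s3, x_gbest, fitness_fn, rng=None):
    """Generate three candidate solutions and return the fittest one."""
    rng = rng or random.Random()
    d = len(lb)
    # m1: fresh random solution, as in equation (1).
    m1 = [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(d)]
    # m2: base honey source from S1 plus a scaled difference of S2 and S3.
    m2 = [x_s1[j] + rng.uniform(-1.0, 1.0) * (x_s2[j] - x_s3[j]) for j in range(d)]
    # m3: opposition of the global best, as in equation (6).
    m3 = [lb[j] + ub[j] - x_gbest[j] for j in range(d)]
    # Greedy selection: keep the candidate with the highest fitness.
    return max((m1, m2, m3), key=fitness_fn)

# Toy fitness that peaks at the opposite point of x_gbest, so m3 should win.
target = [0.75, 0.25]
toy_fitness = lambda x: -sum((v - t) ** 2 for v, t in zip(x, target))
best = scout_replacement(
    lb=[0.0, 0.0], ub=[1.0, 1.0],
    x_s1=[0.1, 0.1], x_s2=[0.4, 0.6], x_s3=[0.2, 0.9],
    x_gbest=[0.25, 0.75], fitness_fn=toy_fitness, rng=random.Random(1),
)
```

In a full loop this function would be called only for honey sources whose failure counter has reached the limit, and the returned solution would overwrite that honey source with its counter reset.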
In this example, four groups of test experiments were performed on the gene sequence sets 1ad2_ref1, 1ivy_ref5, 451c_ref1 and kinase_ref1, and the alignments obtained by ABC and by the present invention on the same sequence sets were compared. In the experiments, the ABC algorithm and the algorithm of the invention were run under the same experimental conditions; each test case was run independently 10 times with 1000 iterations, and the maximum, minimum and average values were recorded.
Table 1. Accuracy of the multiple sequence alignment test results
(Table 1 is given as image BDA0003471546690000091 in the original publication and is not reproduced in this text.)
The results of the algorithm of the present invention are significantly higher than those of the ABC algorithm in terms of the average, best and worst values, which shows the superiority of the algorithm of the invention. To present its performance more fully, in addition to the accuracy results in Table 1, the convergence curves of the algorithm of the invention (HLABC) and ABC are plotted in FIG. 2(a), FIG. 2(b), FIG. 2(c) and FIG. 2(d), where the horizontal axis is the iteration number (Iteration) and the vertical axis is the average SPS (Score). It can be concluded that the invention avoids the risk of the algorithm being trapped in a local optimum, accelerates convergence, and improves the accuracy of the solution in multiple sequence alignment.
The invention proposes a novel ring topology based on the idea of hierarchical learning, improves the original search strategy to increase the randomness of the search, and replaces the original probability selection mechanism with the hierarchical learning method to improve the optimization capability and convergence speed of the algorithm. It thereby overcomes the defects of the original ABC algorithm, optimizes the ABC algorithm, and improves the accuracy of the solution in multiple sequence alignment.
Example two
The embodiment provides a gene sequence alignment system, which specifically comprises the following modules:
a gene sequence acquisition module configured to: obtaining a plurality of gene sequences;
a gene sequence alignment module configured to: coding parameters of the hidden Markov model into honey sources, and adopting the hidden Markov models corresponding to all the honey sources to obtain various hidden state sequences corresponding to each gene sequence; judging whether a termination condition is met, if so, comparing every two hidden state sequences of all gene sequences obtained by a hidden Markov model corresponding to a honey source with the maximum fitness value to obtain a gene sequence most similar to each gene sequence; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning among different populations to optimize the parameters of the hidden Markov model until a termination condition is met.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
Example three
This embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in a method for gene sequence alignment as described in the first embodiment above.
Example four
This embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the gene sequence alignment method according to the above embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of gene sequence alignment comprising:
obtaining a plurality of gene sequences;
encoding the parameters of a hidden Markov model as honey sources, and using the hidden Markov model corresponding to each honey source to obtain the hidden state sequences corresponding to each gene sequence; determining whether a termination condition is met; if so, comparing, pairwise, the hidden state sequences of all gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain, for each gene sequence, the gene sequence most similar to it; otherwise, dividing all the honey sources into a plurality of populations based on the fitness value of each honey source, and performing difference learning among the different populations to optimize the parameters of the hidden Markov model until the termination condition is met.
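As a rough illustration of the population-division step in claim 1, the following Python sketch ranks honey sources by fitness and cuts the ranking into contiguous groups. The claim does not state how many populations are used or how the split is made; the function name `split_into_populations`, the default of three populations, and the near-equal group sizes are assumptions made only for this example.

```python
from typing import List, Sequence


def split_into_populations(fitness: Sequence[float], n_pops: int = 3) -> List[List[int]]:
    """Group honey-source indices into n_pops populations, fittest group first."""
    if not 1 <= n_pops <= len(fitness):
        raise ValueError("n_pops must be between 1 and the number of honey sources")
    # Rank honey-source indices from highest to lowest fitness.
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    base, extra = divmod(len(ranked), n_pops)
    groups: List[List[int]] = []
    start = 0
    for g in range(n_pops):
        # Assumed rule: near-equal sizes, any remainder goes to the first groups.
        size = base + (1 if g < extra else 0)
        groups.append(ranked[start:start + size])
        start += size
    return groups


populations = split_into_populations([0.2, 0.9, 0.5, 0.1, 0.7, 0.3])
print(populations)  # [[1, 4], [2, 5], [0, 3]]
```

Returning indices rather than copies of the honey sources keeps the grouping cheap to recompute after every iteration, since fitness values change as the sources are updated.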
2. The method of claim 1, wherein performing difference learning among different populations comprises:
for a given honey source, acquiring the global optimal solution of all the honey sources, and randomly selecting one solution from each of the different populations;
calculating a plurality of difference values based on the global optimal solution and the randomly selected solutions;
computing a weighted sum of the difference values, and adding the weighted sum to the honey source to obtain a leading bee;
and if the fitness value of the leading bee is larger than that of the honey source, replacing the honey source with the leading bee.
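The leading-bee update of claim 2 might be sketched as follows. The claim does not specify which difference values are formed or how the weights are chosen, so this example assumes one difference per population (the global optimal solution minus the randomly selected solution) and weights drawn uniformly from [-1, 1]. The names `leading_bee_step` and `neg_sphere` are hypothetical, and the toy fitness function merely stands in for the model-based fitness.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def leading_bee_step(
    source: Vector,
    global_best: Vector,
    populations: Sequence[Sequence[Vector]],
    fitness: Callable[[Vector], float],
    rng: random.Random,
) -> Vector:
    """Return the leading bee if it is fitter than the source, else the source."""
    # One randomly selected solution from each population.
    picks = [rng.choice(pop) for pop in populations]
    # Assumed difference values: global best minus each randomly selected solution.
    diffs = [[g - p for g, p in zip(global_best, pick)] for pick in picks]
    # Assumed weights: uniform in [-1, 1], one per difference value.
    weights = [rng.uniform(-1.0, 1.0) for _ in diffs]
    # Weighted sum of the differences, added to the honey source.
    candidate = [
        x + sum(w * d[k] for w, d in zip(weights, diffs))
        for k, x in enumerate(source)
    ]
    # Greedy replacement: keep the candidate only if its fitness is larger.
    return candidate if fitness(candidate) > fitness(source) else list(source)


def neg_sphere(v: Vector) -> float:
    """Toy fitness (higher is better), used only for this example."""
    return -sum(x * x for x in v)


pops = [
    [[0.1, 0.2], [0.0, 0.3]],
    [[1.0, 1.0], [0.8, 1.2]],
    [[2.0, 2.0], [2.5, 1.5]],
]
old_source = [1.5, 1.5]
new_source = leading_bee_step(old_source, [0.1, 0.2], pops, neg_sphere, random.Random(0))
```

Because of the greedy comparison, the returned vector is never less fit than the original honey source.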
3. The method of claim 1, wherein performing difference learning among different populations comprises:
for a given honey source, acquiring the global optimal solution of all the honey sources, and randomly selecting one solution from each of the different populations;
calculating a plurality of difference values based on the global optimal solution and the randomly selected solutions;
computing a weighted sum of the difference values and the honey source to obtain a follower bee;
and if the fitness value of the follower bee is larger than that of the honey source, replacing the honey source with the follower bee.
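Claim 3 differs from claim 2 in that the honey source itself takes part in the weighted sum. A minimal sketch of that reading is shown below. The difference values (global optimal solution minus each selected solution), the explicit `weights` argument, and the names `follower_bee_step` and `neg_sphere` are assumptions for illustration. The random selection is left to the caller so that the example is deterministic.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def follower_bee_step(
    source: Vector,
    global_best: Vector,
    picks: Sequence[Vector],
    weights: Sequence[float],
    fitness: Callable[[Vector], float],
) -> Vector:
    """Return the follower bee if it is fitter than the source, else the source.

    picks holds one already-selected solution per population; weights[0]
    applies to the honey source and weights[1:] to the difference values.
    """
    if len(weights) != len(picks) + 1:
        raise ValueError("need one weight for the source plus one per difference value")
    # Assumed difference values: global best minus each selected solution.
    diffs = [[g - p for g, p in zip(global_best, pick)] for pick in picks]
    # Weighted sum over the honey source and all difference values.
    candidate = [
        weights[0] * x + sum(w * d[k] for w, d in zip(weights[1:], diffs))
        for k, x in enumerate(source)
    ]
    # Greedy replacement: keep the candidate only if its fitness is larger.
    return candidate if fitness(candidate) > fitness(source) else list(source)


def neg_sphere(v: Vector) -> float:
    """Toy fitness (higher is better), used only for this example."""
    return -sum(x * x for x in v)


follower = follower_bee_step(
    [1.0, 2.0], [2.0, 2.0], [[1.0, 2.0], [2.0, 1.0]], [0.5, 0.5, -0.5], neg_sphere
)
print(follower)  # [1.0, 0.5]
```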
4. The method of claim 1, wherein performing difference learning among different populations comprises:
for a given honey source, if the number of failed iterations of the honey source reaches a set number, generating a plurality of new solutions using a plurality of new solution generation modes, and selecting the optimal solution from the plurality of new solutions according to a greedy selection strategy to replace the honey source.
5. The method of claim 4, wherein one of the new solution generation modes comprises: randomly generating a new solution.
6. The method of claim 4, wherein one of the new solution generation modes comprises:
selecting the population with the maximum fitness value from the plurality of populations as a base population, and taking the remaining populations as auxiliary populations;
randomly selecting a honey source in the base population as a base honey source;
randomly selecting a honey source in each auxiliary population as an auxiliary honey source;
and calculating the difference between the auxiliary honey sources, multiplying the difference by a random number, and adding the product to the base honey source to obtain a new solution.
7. The method of claim 4, wherein one of the new solution generation modes comprises:
selecting a minimum value, a maximum value and a global optimal solution from all honey sources;
and subtracting the global optimal solution from the sum of the minimum value and the maximum value to obtain a new solution.
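The three generation modes of claims 5 to 7 and the greedy selection of claim 4 can be combined in a short sketch. Several details are assumptions not stated in the claims: there are exactly three populations, passed with the base population first; the random solution is drawn uniformly between the hypothetical bounds `lower` and `upper`; the minimum and maximum values are taken per dimension over all honey sources. The caller is expected to invoke the function only once the failure count of a honey source has reached the set number.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def scout_replacement(
    sources: Sequence[Vector],
    populations: Sequence[Sequence[Vector]],
    global_best: Vector,
    fitness: Callable[[Vector], float],
    lower: float,
    upper: float,
    rng: random.Random,
) -> Vector:
    """Generate three candidate solutions and return the fittest one."""
    if len(populations) != 3:
        raise ValueError("this sketch assumes one base and two auxiliary populations")
    dim = len(global_best)
    # Mode of claim 5: a randomly generated solution (bounds are assumed).
    random_sol = [rng.uniform(lower, upper) for _ in range(dim)]
    # Mode of claim 6: base honey source plus a scaled difference of auxiliary sources.
    base = rng.choice(populations[0])
    aux_a = rng.choice(populations[1])
    aux_b = rng.choice(populations[2])
    r = rng.random()
    diff_sol = [b + r * (a1 - a2) for b, a1, a2 in zip(base, aux_a, aux_b)]
    # Mode of claim 7: minimum + maximum - global best (per dimension, assumed).
    lo = [min(s[k] for s in sources) for k in range(dim)]
    hi = [max(s[k] for s in sources) for k in range(dim)]
    opp_sol = [l + h - g for l, h, g in zip(lo, hi, global_best)]
    # Greedy selection among the new solutions.
    return max((random_sol, diff_sol, opp_sol), key=fitness)


def neg_sphere(v: Vector) -> float:
    """Toy fitness (higher is better), used only for this example."""
    return -sum(x * x for x in v)


sources = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
pops = [sources[0:2], sources[2:4], sources[4:6]]
replacement = scout_replacement(
    sources, pops, [0.0, 0.0], neg_sphere, -5.0, 5.0, random.Random(1)
)
```

Generating several candidates and keeping only the best gives a stalled honey source more than one way to escape, at the cost of extra fitness evaluations per scout step.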
8. A system for aligning gene sequences, comprising:
a gene sequence acquisition module configured to: obtaining a plurality of gene sequences;
a gene sequence alignment module configured to: encode the parameters of a hidden Markov model as honey sources, and use the hidden Markov model corresponding to each honey source to obtain the hidden state sequences corresponding to each gene sequence; determine whether a termination condition is met; if so, compare, pairwise, the hidden state sequences of all gene sequences obtained by the hidden Markov model corresponding to the honey source with the maximum fitness value, to obtain, for each gene sequence, the gene sequence most similar to it; otherwise, divide all the honey sources into a plurality of populations based on the fitness value of each honey source, and perform difference learning among the different populations to optimize the parameters of the hidden Markov model until the termination condition is met.
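The final comparison performed by the alignment module, in which hidden state sequences are compared pairwise to find the most similar gene sequence for each input, could look like the sketch below. The claims do not define the similarity measure; the fraction of positions with identical hidden states is an assumption chosen for simplicity, and the names `state_similarity` and `most_similar` and the state letters in the example are hypothetical.

```python
from typing import Dict, Sequence


def state_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed measure: matching positions divided by the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def most_similar(hidden: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Map each sequence name to the name of its most similar other sequence."""
    if len(hidden) < 2:
        raise ValueError("at least two gene sequences are required")
    names = list(hidden)
    result: Dict[str, str] = {}
    for name in names:
        others = [o for o in names if o != name]
        # Compare this sequence's hidden states with those of every other sequence.
        result[name] = max(
            others, key=lambda o: state_similarity(hidden[name], hidden[o])
        )
    return result


hidden_states = {"s1": "MMID", "s2": "MMIM", "s3": "DDDD"}
print(most_similar(hidden_states))  # {'s1': 's2', 's2': 's1', 's3': 's1'}
```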
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the method of gene sequence alignment according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the method of gene sequence alignment according to any one of claims 1 to 7.
CN202210044384.1A 2022-01-14 2022-01-14 Gene sequence comparison method and system Active CN114550827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044384.1A CN114550827B (en) 2022-01-14 2022-01-14 Gene sequence comparison method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210044384.1A CN114550827B (en) 2022-01-14 2022-01-14 Gene sequence comparison method and system

Publications (2)

Publication Number Publication Date
CN114550827A (en) 2022-05-27
CN114550827B (en) 2022-11-22

Family

ID=81671250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210044384.1A Active CN114550827B (en) 2022-01-14 2022-01-14 Gene sequence comparison method and system

Country Status (1)

Country Link
CN (1) CN114550827B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024401A1 (en) * 2006-05-10 2009-01-22 Zhihua Jiang Involvement of a Novel Nuclear-Encoded Mitochondrial Poly(A) Polymerase PAPD1 in Extreme Obesity-Related Phenotypes in Mammals
KR20110066380A (en) * 2009-12-11 2011-06-17 한국생명공학연구원 System and method for identifying and classifying the resistance gene in plant using the hidden markov model
CN106202998A (en) * 2016-07-05 2016-12-07 集美大学 A kind of method of non-mode biology transcript profile gene order structural analysis
CN107577918A (en) * 2017-08-22 2018-01-12 山东师范大学 The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model
CN108629400A (en) * 2018-05-15 2018-10-09 福州大学 A kind of chaos artificial bee colony algorithm based on Levy search
CN110456815A (en) * 2019-07-04 2019-11-15 北京航空航天大学 It is a kind of based on the heuristic intelligent unmanned plane cluster co-located method of army antenna
CN110570909A (en) * 2019-09-11 2019-12-13 华中农业大学 Method for mining epistatic sites of artificial bee colony optimized Bayesian network
CN113257337A (en) * 2021-04-20 2021-08-13 浙江工业大学 Protein multi-sequence comparison method based on metagenome
CN113539378A (en) * 2021-07-16 2021-10-22 明科生物技术(杭州)有限公司 Data analysis method, system, equipment and storage medium of virus database

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
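SHARY JOSE et al.: "Hidden Markov Model: Application towards genomic analysis", 2016 INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT) *
徐小俊 (Xu Xiaojun): "Application of swarm intelligence optimization algorithms in multiple sequence alignment", China Master's Theses Full-text Database (Information Science and Technology) *
郭佳 (Guo Jia) et al.: "Artificial bee colony algorithm based on Markov chain", Journal of Beijing University of Posts and Telecommunications *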

Also Published As

Publication number Publication date
CN114550827B (en) 2022-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant