CN111339635B - DNA storage coding optimization method of multi-element universe algorithm based on damping factors - Google Patents

Info

Publication number
CN111339635B
CN111339635B CN202010051588.9A CN202010051588A CN111339635B CN 111339635 B CN111339635 B CN 111339635B CN 202010051588 A CN202010051588 A CN 202010051588A CN 111339635 B CN111339635 B CN 111339635B
Authority
CN
China
Prior art keywords
universe
dna
fitness
damping
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010051588.9A
Other languages
Chinese (zh)
Other versions
CN111339635A (en)
Inventor
王宾
曹犇
吕卉
张强
魏小鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN202010051588.9A
Publication of CN111339635A
Application granted
Publication of CN111339635B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DNA storage coding optimization method based on a multi-element universe algorithm with damping factors, which constructs an optimal set of DNA coding sequences satisfying combinatorial constraints. First, a certain number of DNA sequences are constructed as an initial population, and the fitness of the population is evaluated and sorted. Second, the resulting DNA coding sequences are optimized using an update formula containing a damping factor together with Lévy flight search, yielding DNA storage coding sequences with higher fitness. Then, each sequence is compared against the constraints to decide whether it is added to the candidate solution set. Finally, the optimal set of DNA storage coding sequences is output. The method can find a larger number of DNA storage coding sequences that satisfy the constraints.

Description

DNA storage coding optimization method of multi-element universe algorithm based on damping factors
Technical Field
The invention relates to swarm intelligence optimization algorithms and DNA storage coding, in particular to a method for optimizing DNA coding sequences using a multi-element universe algorithm, damping factors, and Lévy flight, and belongs to the field of coding design in DNA storage.
Background
The massive growth of data has created demand for new storage media that are more efficient and offer higher capacity. DNA, as a high-density storage medium with long-term stability, is a viable solution. DNA consists of four nucleotides (A, T, C, G), so each base can theoretically carry twice the information of a binary digit in a conventional electronic system. Studies have shown that DNA can be stored for over a thousand years under suitable conditions. Early pioneering work on DNA-based data storage includes Joe Davis's "Microvenus" project, which used bacteria as a storage medium for non-biological information. At the beginning of this century, Bancroft et al. proposed a simple coding method using codon triplets, showing the great potential of DNA as a storage medium. The cost of reading and writing DNA data remains high. Recently, however, DNA synthesis and sequencing methods have developed rapidly, and DNA storage is expected to become a very competitive storage solution.
Disclosure of Invention
The application provides a DNA storage coding optimization method based on a multi-element universe algorithm with damping factors. First, an initial solution set is searched from the initial population using the wormhole strategy of the multi-element universe algorithm. Second, the universe set is updated using an update formula with a damping disturbance factor. Then, a Lévy flight operation is performed on the best universe in the updated universe set. Finally, each resulting universe is checked against the constraint conditions, and those satisfying the constraints are added to the candidate solution set. The method can construct a larger number of DNA storage coding sequences.
To achieve the above purpose, the technical scheme of the application is as follows. The DNA storage coding optimization method based on a multi-element universe algorithm with damping factors constructs an optimal set of DNA coding sequences satisfying combinatorial constraints. First, a certain number of DNA sequences are constructed as an initial population, and the fitness of the population is evaluated and sorted. Second, the resulting DNA coding sequences are optimized using the damping factor and Lévy flight to obtain DNA coding sequences with higher fitness. Then, each sequence is compared against the constraints to decide whether it is added to the candidate solution set. Finally, the optimal set of DNA coding sequences is output.
By adopting the technical scheme, the invention can obtain the following technical effects:
1. The multi-universe algorithm is used to calculate the fitness of the initial population; introducing black/white hole tunnels not only transfers substances randomly toward the best universe but also raises the average fitness of the initial population;
2. A Lévy flight update strategy based on the current global optimum replaces random updating, which reduces the influence of individuals with extreme (maximum and minimum) values on the update mechanism and speeds up convergence. When the search reaches an unfavorable state, the damping factor lets the algorithm jump out of the current state and search over a wider range, which helps prevent it from becoming trapped in a local optimum in later iterations;
3. The damping-factor-based multi-element universe algorithm for DNA sequence optimization provided by the invention can find a larger number of DNA storage coding sequences.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The described examples are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of the present invention.
Four constraint conditions are used: the full discontinuity (no-runlength) constraint, the edit distance, the GC content constraint, and the address-uncorrelated constraint. The sum of the edit distances is taken as the objective function, and the other three are taken as constraint conditions. The fitness value of each individual is calculated for use in step 2 of the claims. The full discontinuity constraint requires that adjacent bases in a DNA sequence are never identical. The edit distance between two DNA sequences is the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one into the other; the objective sums this distance over pairs of sequences. The GC content constraint fixes the number of guanine (G) and cytosine (C) bases in every sequence of the set as a percentage of the total number of bases in that sequence; it is set to 50% in this example. The address-uncorrelated constraint applies to each pair of DNA sequences L and H: the prefix of L cannot appear as a suffix of H, and vice versa. The prefix length is set to 3 in the present application.
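The four conditions above can be sketched in Python as small predicates. This is an illustrative reading only: the function names are hypothetical, and the address-uncorrelated check assumes that "prefix" and "suffix" mean the first and last k bases (k = 3), which is one interpretation of the text.

```python
from itertools import combinations


def no_runlength(seq: str) -> bool:
    """Full discontinuity: no two adjacent bases are identical."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def gc_content_ok(seq: str, target: float = 0.5) -> bool:
    """GC content: the fraction of G and C bases equals the target (50% here)."""
    gc = sum(base in "GC" for base in seq)
    return gc == target * len(seq)


def address_uncorrelated(a: str, b: str, k: int = 3) -> bool:
    """Assumed reading: the length-k prefix of one sequence must differ
    from the length-k suffix of the other, in both directions."""
    return a[:k] != b[-k:] and b[:k] != a[-k:]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def total_edit_distance(seqs) -> int:
    """Objective: sum of edit distances over all unordered pairs."""
    return sum(edit_distance(a, b) for a, b in combinations(seqs, 2))
```

For example, `no_runlength("ACGTACGT")` holds, while `"AACGTACG"` fails because of the leading `AA`.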
The detailed steps are as follows:
Step 1: Generate an initial universe population and initialize the parameters required by the algorithm: TDR, WEP, MAXIter, and C1, where WEP is the wormhole existence probability, TDR is the travel distance rate, MAXIter is the maximum number of iterations, and C1 is the damping disturbance factor;
Step 2: Calculate the fitness (expansion rate) of each universe and update the parameter Best_universe, i.e., the current best universe; sort the initial universe population using the wormhole strategy of the multi-universe algorithm, select the universes with the best fitness, and take them as the initial universe set;
Step 3: Generate a random number r1, select universes in turn by roulette to create white holes, and let the best universe exchange substances with other universes through the white holes;
Step 4: For each universe, generate a random number r2 and compare it with the wormhole existence probability WEP; if r2 is smaller than WEP, execute step 5, otherwise execute step 7;
Step 5: Generate two random numbers r3 and r4, and update the substances of the universe according to r4, the damping disturbance factor C1, and the travel distance rate TDR; if r3 < 0.5, apply update formula 2, otherwise apply update formula 3;
Step 6: Take the updated result as the input of the Lévy flight search, and perform a Lévy flight search centered on the universe with the best fitness;
Step 7: Check whether each remaining universe satisfies the constraint conditions with respect to the initial universe set, and add it to the initial universe set if it does;
Step 8: Check whether the maximum number of iterations has been reached; if so, go to step 9, otherwise return to step 2;
Step 9: Collect the results and output the maximum number of sequences.
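Step 7 amounts to a greedy admission test: a candidate joins the set only if it passes the single-sequence constraints and is compatible with every sequence already admitted. The sketch below is one way to express that, not the patent's implementation; `try_add`, `d_min`, and `k` are hypothetical names, and decoding a universe into a base string is assumed to have happened already.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def try_add(candidate: str, solution_set: list, d_min: int = 5, k: int = 3) -> bool:
    """Append candidate to solution_set if it meets all constraints.

    Returns True when the candidate was added, False otherwise.
    """
    # Full discontinuity: adjacent bases must differ.
    if any(x == y for x, y in zip(candidate, candidate[1:])):
        return False
    # GC content fixed at 50%.
    if 2 * sum(c in "GC" for c in candidate) != len(candidate):
        return False
    for other in solution_set:
        # Address-uncorrelated (assumed reading: length-k prefix vs. suffix).
        if candidate[:k] == other[-k:] or other[:k] == candidate[-k:]:
            return False
        # Pairwise edit distance must reach the threshold d_min.
        if edit_distance(candidate, other) < d_min:
            return False
    solution_set.append(candidate)
    return True


codes = []
try_add("ACACACAC", codes)  # admitted: set was empty
try_add("TGTGTGTG", codes)  # admitted: shares no base with the first code
try_add("ACACACAG", codes)  # rejected: one substitution from the first code
```

Because admission is order-dependent, the size of the final set depends on the order in which candidates arrive, which is why the search quality of the preceding steps matters.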
Example 1
This embodiment is implemented on the basis of the technical scheme of the invention, and a detailed implementation and specific operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment. In this example, the DNA coding length n is 8, the edit distance constraint is d ≥ 5, and the full discontinuity, GC content, and address-uncorrelated constraints are as described above.
Step 1: Initialize the population by generating 1000 DNA coding sequences of length 8. Initialize the parameters required by the algorithm: for the wormhole existence probability WEP, min is 0.2 and max is 1; for the travel distance rate TDR, p is 6;
Step 2: Search the initial population using the wormhole strategy of the multi-element universe algorithm: first initialize the fitness of the universe population, sort the universes by fitness, select those with the best fitness, and take them as the initial universe set. The embodiment was simulated in MATLAB, and an initial set of 29 sequences was obtained through the GC content, full discontinuity, and address-uncorrelated constraints;
Step 3: Continue optimizing the 29 DNA sequences of length 8 obtained under the constraints in step 2, using the multi-universe algorithm. In this embodiment, the universes are sorted by fitness using the sort() function in MATLAB. Using a random number r1, universes are selected in turn by roulette to create white holes and exchange substances with other universes. The substances of a universe are updated as follows:
[Equation (1): provided as image GDA0002468784490000051 in the original publication]
where X_ij denotes the j-th substance in the i-th universe, and X_vj is defined likewise for the v-th universe. U_i denotes the i-th universe, so NI(U_i) is the normalized expansion rate of the i-th universe. The j-th substance of the universe selected by the roulette mechanism is denoted X_wj, and r1 is a random number in [0, 1];
Step 4: For each universe, generate a random number r2 and compare it with the wormhole existence probability WEP; if r2 is smaller than WEP, execute step 5, otherwise execute step 7;
Step 5: Generate two random numbers r3 and r4 in the interval [0, 1], and update the substances of the universe according to r4 and the travel distance rate TDR; if r3 < 0.5, apply update formula 2, otherwise apply update formula 3;
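$$x_{ij} = X_j + \mathrm{C1} \cdot \mathrm{TDR} \times \left((ub_j - lb_j) \times r_4 + lb_j\right) \tag{2}$$
$$x_{ij} = X_j - \mathrm{C1} \cdot \mathrm{TDR} \times \left((ub_j - lb_j) \times r_4 + lb_j\right) \tag{3}$$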
[Equation (4), defining the damping disturbance factor C1: provided as image GDA0002468784490000061 in the original publication]
where x_ij denotes the j-th substance in the i-th universe, X_j denotes the j-th substance of the best universe found so far, ub_j and lb_j are the upper and lower bounds, C1 is the damping disturbance factor calculated by equation 4, TDR is the adaptive travel distance rate, and r4 is a random number in the interval [0, 1].
Step 6: Take the updated result as the input of the Lévy flight search, and perform a Lévy flight search on the universe with the best fitness;
Step 7: Check whether each remaining universe satisfies the constraint conditions with respect to the initial universe set, and add it to the initial universe set newDNA if it does;
Step 8: Check whether the maximum of 1000 iterations has been reached; if so, go to step 9, otherwise return to step 2;
Step 9: Collect the results and output the maximum number of sequences.
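Steps 3 to 5 of the example can be sketched as follows. Several points are assumptions rather than statements of the patent: the WEP and TDR schedules are the usual ones from the multi-verse optimizer literature, chosen because the example's parameters (min 0.2, max 1, p = 6) match them; r3 and r4 are redrawn for every substance; and C1 is passed in as a plain number because equation 4 is not available in the text. All function names are hypothetical.

```python
import random


def wep(t: int, max_iter: int, wep_min: float = 0.2, wep_max: float = 1.0) -> float:
    """Wormhole existence probability, rising linearly with iteration t (assumed schedule)."""
    return wep_min + t * (wep_max - wep_min) / max_iter


def tdr(t: int, max_iter: int, p: float = 6.0) -> float:
    """Travel distance rate, shrinking from 1 to 0 as t approaches max_iter (assumed schedule)."""
    return 1.0 - (t ** (1.0 / p)) / (max_iter ** (1.0 / p))


def roulette_select(weights, rng=random) -> int:
    """Pick an index with probability proportional to its non-negative weight."""
    pick = rng.random() * sum(weights)
    acc = 0.0
    for idx, w in enumerate(weights):
        acc += w
        if pick < acc:
            return idx
    return len(weights) - 1  # guard against floating-point round-off


def wormhole_update(x, best, lb, ub, c1, tdr_value, wep_value, rng=random):
    """Steps 4 and 5: with probability WEP, move x around the best universe
    using formula (2) when r3 < 0.5 and formula (3) otherwise."""
    r2 = rng.random()
    if r2 >= wep_value:
        return list(x)  # no wormhole: the universe is left unchanged
    new_x = []
    for j in range(len(x)):
        r3, r4 = rng.random(), rng.random()
        step = c1 * tdr_value * ((ub[j] - lb[j]) * r4 + lb[j])
        new_x.append(best[j] + step if r3 < 0.5 else best[j] - step)
    return new_x
```

Under these assumed schedules, WEP starts at 0.2 and TDR at 1, so early iterations make large moves around the best universe; as TDR shrinks toward 0 the moves become local refinements, and C1 scales the step on top of that.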
The invention provides a DNA storage coding optimization method that searches the initial population using a multi-element universe algorithm with damping factors. DNA sequences meeting the requirements are screened out through the GC content, address-uncorrelated, and full discontinuity constraints. Starting from these sequences, the universes are updated repeatedly according to the update formula of the multi-element universe algorithm with damping factors; after each update, a Lévy flight search is performed on the best universe, the best fitness is evaluated, and the next iteration begins. The largest DNA sequence coding set obtained is taken as the output. The algorithm was simulated in MATLAB on an Intel(R) CPU at 3.6 GHz with 4.0 GB of memory running Windows 10, and the experimental results show that the method of this example outperforms the other algorithms compared.
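The text does not say how the Lévy flight steps are generated. A common choice, used here purely as an assumption, is Mantegna's algorithm with stability index beta = 1.5. The sketch perturbs the best universe a few times and keeps a move only when fitness improves; `levy_search`, `scale`, and `trials` are hypothetical names and parameters.

```python
import math
import random


def levy_step(beta: float = 1.5, rng=random) -> float:
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the rare degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_search(best, fitness, lb, ub, scale=0.01, trials=10, beta=1.5, rng=random):
    """Greedy Lévy flight around the best universe (higher fitness is better)."""
    best_x, best_f = list(best), fitness(best)
    for _ in range(trials):
        cand = []
        for j, xj in enumerate(best_x):
            moved = xj + scale * (ub[j] - lb[j]) * levy_step(beta, rng)
            cand.append(min(max(moved, lb[j]), ub[j]))  # clamp to the bounds
        f = fitness(cand)
        if f > best_f:  # accept only improving moves
            best_x, best_f = cand, f
    return best_x, best_f
```

Most Lévy steps are small, with occasional long jumps; that mix is what lets the search refine the current best while still escaping its neighborhood from time to time.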
Table 1 shows the initial DNA sequences
[Table 1: provided as image GDA0002468784490000071 in the original publication]
Table 2 shows the optimal DNA sequence set for n = 8 and d ≥ 5
[Table 2: provided as image GDA0002468784490000072 in the original publication]
The above describes only the preferred embodiments of the present invention and is not intended to limit it. Those skilled in the art may make improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications also fall within the scope of protection of the present invention.

Claims (2)

1. A DNA storage coding optimization method of the multi-element universe algorithm based on damping factors, characterized by comprising the following specific steps:
Step 1: generating an initial universe population and initializing the required parameters;
Step 2: calculating the fitness of each universe and updating the parameter Best_universe, i.e., the current best universe; sorting the initial universe population using the wormhole strategy of the multi-universe algorithm, selecting the universes with the best fitness, and taking them as the initial universe set;
Step 3: generating a random number r1, selecting universes in turn by roulette to create white holes, the best universe exchanging substances with other universes through the white holes;
Step 4: for each universe, generating a random number r2 and comparing it with the wormhole existence probability WEP; if r2 is smaller than WEP, executing step 5, otherwise executing step 7;
Step 5: generating two random numbers r3 and r4, and updating the substances of the universe according to r4, the damping disturbance factor C1, and the travel distance rate TDR; if r3 < 0.5, applying update formula 2, otherwise applying update formula 3;
Step 6: taking the updated result as the input of the Lévy flight search, and performing a Lévy flight search on the universe with the best fitness;
Step 7: checking whether each remaining universe satisfies the constraint conditions with respect to the initial universe set, and adding it to the initial universe set if it does;
Step 8: checking whether the maximum number of iterations has been reached; if so, going to step 9, otherwise returning to step 2;
Step 9: collecting the results and outputting a sequence set and the maximum number of sequences;
the update mode of the cosmic substance in the step 3 is as follows:
[Equation (1): provided as image FDA0004213470710000021 in the original publication]
wherein x_ij denotes the j-th substance in the i-th universe, U_i denotes the i-th universe, and NI(U_i) is the normalized fitness value of the i-th universe; the j-th substance of the universe selected by the roulette mechanism is denoted x_vj; and r1 is a random number in [0, 1];
update formula 2 and update formula 3 are respectively:
$$x_{ij} = X_j + \mathrm{C1} \cdot \mathrm{TDR} \times \left((ub_j - lb_j) \times r_4 + lb_j\right) \tag{2}$$
$$x_{ij} = X_j - \mathrm{C1} \cdot \mathrm{TDR} \times \left((ub_j - lb_j) \times r_4 + lb_j\right) \tag{3}$$
[Equation (4), defining the damping disturbance factor C1: provided as image FDA0004213470710000022 in the original publication]
wherein X_j denotes the j-th substance of the best universe found so far, ub_j and lb_j are the upper and lower bounds, C1 is the damping disturbance factor, TDR is the adaptive travel distance rate, r4 is a random number in the interval [0, 1], time is the current generation number, and max_time is the maximum number of generations.
2. The DNA storage coding optimization method of the multi-element universe algorithm based on damping factors according to claim 1, wherein the parameters required in step 1 comprise TDR, WEP, MAXIter, and C1, where WEP is the wormhole existence probability, MAXIter is the maximum number of iterations, and C1 is the damping disturbance factor.
CN202010051588.9A 2020-01-17 2020-01-17 DNA storage coding optimization method of multi-element universe algorithm based on damping factors Active CN111339635B (en)


Publications (2)

Publication Number Publication Date
CN111339635A (en) 2020-06-26
CN111339635B (en) 2023-06-30

Family

ID=71185184


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113858200B (en) * 2021-09-29 2022-10-28 长春师范大学 Group robot control method for improving multi-universe inspired by foraging behavior of slime mold
CN113904347B (en) * 2021-09-30 2024-04-23 广东电网有限责任公司 Parameter optimization method and device for controllable phase shifter additional damping controller

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300507A (en) * 2018-09-04 2019-02-01 大连大学 The DNA encoding sequence optimisation method of chaos invasive weed algorithm based on population
CN109389206A (en) * 2018-09-26 2019-02-26 大连大学 The DNA encoding sequence optimisation method of mixing bat algorithm based on non-dominated ranking
CN110533096A (en) * 2019-08-27 2019-12-03 大连大学 The DNA of multiverse algorithm based on K-means cluster stores Encoding Optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
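G. M. Viswanathan et al. Optimizing the success of random searches. Nature, 1999, vol. 401, full text. *
张强 et al. DNA sequence set design based on a dynamic genetic algorithm. Chinese Journal of Computers (《计算机学报》), 2008, vol. 31, no. 12, full text. *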

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant