CN116842730A - DNA storage coding set construction method based on improved firefly search algorithm - Google Patents


Info

Publication number
CN116842730A
Authority
CN
China
Prior art keywords
firefly
individual
population
individuals
dna
Legal status
Granted
Application number
CN202310811122.8A
Other languages
Chinese (zh)
Other versions
CN116842730B (en)
Inventor
张勋才
申超楠
牛莹
张强
郭丹蕾
王时达
刘冠鹤
王延峰
何艳
张凯
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Application filed by Zhengzhou University of Light Industry
Priority to CN202310811122.8A
Publication of CN116842730A
Application granted
Publication of CN116842730B
Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Abstract

The invention provides a DNA storage coding set construction method based on an improved firefly search algorithm, which addresses problems of existing DNA storage coding set construction methods such as low base utilization and low coding quality. The method comprises: establishing a mathematical model of the DNA coding set that takes the sum of Hamming distances between sequences as the objective function; constructing an improved firefly search algorithm that combines a pyramid structure with a direction adjustment mechanism; and using the improved firefly search algorithm to construct a DNA storage coding set that satisfies combinatorial constraints, thereby ensuring coding efficiency and coding quality. The improved algorithm reduces the tendency of the original algorithm to fall into local optima and speeds up convergence, so that more DNA storage codes satisfying the constraints can be constructed for a given base length. The coding set constructed by the invention satisfies the Hamming distance constraint, the GC content constraint and the no-run-length constraint, has a certain error-correction capability, and offers several coding advantages.

Description

DNA storage coding set construction method based on improved firefly search algorithm
Technical Field
The invention relates to the technical field of DNA coding set construction, in particular to a DNA storage coding set construction method based on an improved firefly search algorithm.
Background
The construction of DNA coding sets is a key step in DNA storage. As a new storage paradigm, DNA data storage plays an important role in saving energy and storage resources and in supporting the development of big-data storage. Encoding is a key technology in DNA storage: its result directly affects storage performance and the integrity of data reading and writing, so reasonable and efficient encoding is important for the whole DNA storage system. Current research on DNA coding focuses on two aspects, coding quality and coding quantity, and the research direction must be chosen according to the focus of the problem at hand: 1) when the number of codes is sufficient, the aim is to improve coding quality; 2) when coding quality meets the requirements, the aim is to increase the number of codes. In general, quality optimization and the number of DNA codes cannot both be maximized, because the number of DNA codes always decreases as constraints are added, and there is no good method for solving this NP-complete problem.
In 2001, Marathe et al. proposed using coding theory to construct lower bounds for DNA coding sets that satisfy given constraints. In 2002, Tulpan et al. used a stochastic local search algorithm to generate reliable DNA code sets under constraints such as the Hamming distance constraint, the reverse-complement Hamming distance constraint and the GC content constraint. Kobayashi et al. proposed a template-mapping strategy that makes it easy to construct DNA code sets of specific sequences, although the code length and distance constraints cannot be too large. In 2003, Tulpan et al. replaced the single-neighborhood search in the stochastic local search algorithm with a multi-neighborhood search and produced high-quality DNA code sets; experimental results demonstrated the effectiveness of the improved algorithm. In 2005, Gaborit et al. studied bounds on DNA code sets by combining linear constructions with random search. In 2006, Kawashimo et al. proposed a Dynamic Neighborhood Search (DNS) algorithm for designing DNA code sets, which readily constructs code sets satisfying Hamming-distance-like constraints. In 2020, Zhang et al. screened DNA storage codes using combinatorial constraints and constructed DNA storage code sets with heuristic algorithms such as CLGBO and NOL-HHO; Lenz et al. proposed a storage model based on unordered sequences, derived a Gilbert-Varshamov lower bound from the achievability of error-correcting codes, and derived a sphere-packing upper bound. The DNA storage coding problem can be treated as a DNA code screening problem under combinatorial constraints. However, conventional algorithms are too inefficient because of the high computational complexity of the constraints.
A conventional electronic memory block consists of a boot sector (address bits), data bits and other check bits. DNA storage systems are divided in a similar way: a single-stranded DNA storage structure typically contains payload bits and non-payload bits. The non-payload bits, which include address bits, check bits and primers, are critical for correctly addressing and reading the complete data. However, previous studies have not paid sufficient attention to the encoding of non-payload bits, particularly address bits. This work therefore focuses on constructing DNA storage coding sets for non-payload bits.
Heuristic algorithms can provide a feasible solution for each instance of a combinatorial optimization problem, and screening DNA codes that satisfy combinatorial constraints is such a problem.
Disclosure of Invention
Aiming at the technical problems of low base utilization rate and low coding quality of the existing DNA storage coding set construction method, the invention provides a DNA storage coding set construction method based on an improved firefly search algorithm, which is used for solving the construction of the DNA coding set and constructing the DNA storage coding set meeting the combination constraint, and the coding efficiency and the coding quality are ensured.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows: a DNA storage coding set construction method based on an improved firefly search algorithm comprises the following steps:
step one: modeling the DNA coding combination constraint as a target constraint optimization problem according to the characteristics of the DNA storage coding set;
step two: coding the DNA sequence, and solving the target constraint optimization problem in the first step by applying an improved firefly search algorithm based on a pyramid structure and a direction adjustment mechanism to obtain an optimal firefly individual; and converting the optimal firefly individual into a DNA sequence to obtain a DNA storage encoding set meeting the constraint condition.
Preferably, the objective function of the objective-constrained optimization problem in step one is the sum of Hamming distances between sequences:
objective function: $\text{fitness}(S) = \sum H(X, Y)$, where X and Y are any two DNA sequences in the coding set S;
the constraints on DNA sequences X and Y are: a GC content of 50%, the no-run-length constraint and the address-independent constraint;
where H(X, Y) is the Hamming distance between DNA sequences X and Y.
Preferably, the Hamming distance is:
$$H(X, Y) = \sum_{i=1}^{n} h(x_i, y_i), \qquad h(x_i, y_i) = \begin{cases} 1, & x_i \neq y_i \\ 0, & x_i = y_i \end{cases}$$
where H(X, Y) is the Hamming distance between DNA sequence X = (x_1, x_2, x_3, …, x_n) and DNA sequence Y = (y_1, y_2, y_3, …, y_n); h(x_i, y_i) indicates whether two bases are identical, taking the value 1 if they differ and 0 if they are the same; n is the length of the DNA sequences; and x_i, y_i are the i-th bases of X and Y, respectively;
the GC content is:
$$GC(X) = \frac{|G| + |C|}{n} \times 100\%$$
where |G| and |C| are the numbers of G and C bases, respectively, in the DNA sequence X of length n;
the no-run-length constraint requires that the DNA sequence contain no consecutive repeated bases: for a DNA sequence X = (x_1, x_2, x_3, …, x_n) of length n,
$$x_i \neq x_{i+1}, \quad i \in [1, n-1];$$
The address-independent constraint requires that no DNA sequence has a suffix long enough to be the prefix of another DNA sequence, and vice versa: for a pair of DNA sequences X = (x_1, x_2, x_3, …, x_n) and Y = (y_1, y_2, y_3, …, y_n), the suffix of X cannot serve as a prefix of Y, and vice versa, i.e. $(x_1, x_2, \ldots, x_s) \neq (y_{n-s+1}, y_{n-s+2}, \ldots, y_n)$ and $(y_1, y_2, \ldots, y_s) \neq (x_{n-s+1}, x_{n-s+2}, \ldots, x_n)$, where s is the prefix and suffix length.
Preferably, the steps of the improved firefly search algorithm based on the pyramid structure and the direction adjustment mechanism are as follows:
(1) Initializing required parameters and randomly generating an initial population, wherein each individual represents a coding set, and the size of the coding set is the dimension of the problem; the coding mode of the bases in the DNA sequence is as follows: 0-A,1-T,2-C,3-G;
(2) Calculating fitness value fitness of each individual, and according to the fitness value, arranging the population in ascending order to establish a pyramid population topological structure;
(3) Competition and collaboration strategy: in the pyramid population topological structure, firefly individuals in each layer can cooperate, and information transmission exists between the layers; firefly individuals learn toward individuals who are better than themselves when attempting to move toward a better direction; the firefly with the highest light intensity is positioned on the top layer;
(4) Direction adjustment mechanism: judge whether the population has fallen into a local optimum; when the adjustment probability is smaller than the generated random value, the population continues to learn toward the best individual in each layer; when the adjustment probability is greater than the generated random value, candidate individuals are generated and the population switches to learning toward them in an attempt to escape the local optimum;
(5) Obtaining a new firefly population, after finishing one iteration, recalculating the fitness value of each individual, updating the maximum fitness value, the minimum fitness value and the relevant positions, and updating the pyramid population topology structure;
(6) When the optimal result reaches the maximum number of iterations or the accuracy condition, output the optimal firefly individual; otherwise, return to step (3).
Preferably, the pyramid population topology is established as follows: initialize the required parameters, namely the population size N, the maximum number of iterations T and the dimension d of the search space; randomly generate N firefly individuals with initial positions $X = [x_1, x_2, x_3, \ldots, x_N]$ and calculate the fitness value of each individual $x_i$;
determine the number of individuals in each layer, $n_u$, $u = 1, 2, 3, \ldots, L$, according to the proportional relationship between the layers, where L is the number of layers of the pyramid;
sort the individuals in ascending order of fitness to obtain $X' = [x_1', x_2', x_3', \ldots, x_N']$; the first $n_1$ individuals of X' are assigned to the top of the pyramid, the next $n_2$ individuals to the second layer, and so on, until the last $n_L$ individuals of X' are placed at the bottom of the pyramid;
for each layer u:
$$u_{dx2} = u_{dx1} + n_u - 1$$
$$P_{u,\,1:n_u} = X'_{u_{dx1}:u_{dx2}}$$
$$u_{dx1} = u_{dx2} + 1$$
where $P_u$ is the u-th layer of the pyramid, $X'_{u_{dx1}:u_{dx2}}$ denotes the $u_{dx1}$-th to $u_{dx2}$-th individuals of the population sorted in ascending order, and $u_{dx1}$, $u_{dx2}$ are the start and end index values of the individuals assigned to each layer.
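The layering rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the per-layer counts $n_u$ are passed in as `layer_sizes` because the proportional relationship between layers is not given here, the 1:2:3:4 split in the usage line is hypothetical, and indices are 0-based. The sort is ascending, as the text specifies, so the top layer receives the individuals with the smallest fitness values.

```python
def build_pyramid(population, fitness, layer_sizes):
    """Split a population into pyramid layers after an ascending sort by fitness.

    population  -- list of individuals (each a list of numbers)
    fitness     -- list of fitness values, one per individual
    layer_sizes -- [n_1, n_2, ..., n_L]; layer 1 (index 0) is the top of the pyramid
    """
    if sum(layer_sizes) != len(population):
        raise ValueError("layer sizes must add up to the population size N")
    # X': individuals sorted in ascending order of fitness
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    ranked = [population[i] for i in order]

    layers = []
    u_dx1 = 0                                   # start index of the current layer (0-based)
    for n_u in layer_sizes:
        u_dx2 = u_dx1 + n_u - 1                 # u_dx2 = u_dx1 + n_u - 1
        layers.append(ranked[u_dx1:u_dx2 + 1])  # P_u = X'[u_dx1 .. u_dx2], inclusive
        u_dx1 = u_dx2 + 1                       # u_dx1 = u_dx2 + 1
    return layers


# Ten one-dimensional individuals split with a hypothetical 1:2:3:4 ratio
pop = [[i] for i in range(10)]
fit = [5, 3, 9, 1, 7, 0, 2, 8, 6, 4]
print(build_pyramid(pop, fit, [1, 2, 3, 4]))
# [[[5]], [[3], [6]], [[1], [9], [0]], [[8], [4], [7], [2]]]
```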
Preferably, the implementation method of the competition and collaboration strategy comprises the following steps:
the fireflies in layer l are updated through intra-layer cooperation as follows:
where the decision parameter is p = q + h; $x_{l,i}(t)$ is the i-th firefly in layer l at iteration t; $x_{l,\text{best}}(t)$ is the best individual in layer l at iteration t; $x_{l,j}(t)$ is the j-th firefly in layer l at iteration t; β is the attractiveness between the two individuals; $\alpha_B$ is the random step size of the Brownian motion and $\alpha_L$ is the random step size of the Lévy flight; the sign function provides a random direction; the Lévy-flight and Brownian-motion terms are as defined below; the learning factor $c_2$ is updated over the iterations, with $c_1 = 1 - c_2$ and T the maximum number of iterations; $rand_1$, $rand_2$ and $rand_3$ are random numbers uniformly distributed on [0, 1], [−1, 1] and [−2, 2], respectively; vector terms are multiplied element-wise; and rand is a random number uniformly distributed on [0, 1];
the brightest firefly in each layer is updated as follows:
$$x_{l,\text{best}} = r_1 \cdot x_{l,\text{best}} + r_2 \cdot x_{l-1,\text{best}} + r_3 \cdot x_{1,\text{best}}$$
where $r_1$, $r_2$, $r_3$ are constants in [0, 1].
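A sketch of this update under stated assumptions: positions are lists of floats, layers are indexed from 0 (top) in the code, and for the top layer, which has no layer above it, the layer's own best is reused in place of $x_{l-1,\text{best}}$ (the text does not cover that case). The helper `to_bases` is a hypothetical discretisation step that rounds and clips a continuous position back to the base codes 0 to 3; the text does not say how continuous positions are mapped back to bases.

```python
def update_layer_best(layer_bests, l, r1, r2, r3):
    """x_{l,best} = r1*x_{l,best} + r2*x_{l-1,best} + r3*x_{1,best}, element-wise."""
    current = layer_bests[l]
    # Assumption: the top layer (l == 0) has no layer above it, so reuse its own best.
    above = layer_bests[l - 1] if l > 0 else layer_bests[0]
    top = layer_bests[0]  # x_{1,best}: best individual of the top layer
    return [r1 * a + r2 * b + r3 * c for a, b, c in zip(current, above, top)]


def to_bases(position):
    """Hypothetical discretisation: round each value and clip it to the codes 0..3."""
    return [min(3, max(0, round(v))) for v in position]


bests = [[1.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
new_best = update_layer_best(bests, 2, r1=0.5, r2=0.25, r3=0.25)
print(new_best)            # [1.25, 1.5]
print(to_bases(new_best))  # [1, 2]
```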
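Preferably, the stochastic model of Brownian motion at point x is:
$$f_B(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where the step size of the particle motion is computed from the probability density function of the Gaussian distribution with zero mean μ = 0 and unit variance σ² = 1;
random numbers following the Lévy distribution are generated with Mantegna's method:
$$\text{Levy}(\alpha) = \frac{x}{|y|^{1/\alpha}}$$
where x and y are two normally distributed variables, $x = \text{Normal}(0, \sigma_x^2)$, $y = \text{Normal}(0, 1)$,
$$\sigma_x = \left[\frac{\Gamma(1+\alpha)\,\sin(\pi\alpha/2)}{\Gamma\left(\frac{1+\alpha}{2}\right)\,\alpha\,2^{(\alpha-1)/2}}\right]^{1/\alpha}, \qquad \alpha = 1.5.$$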
Preferably, based on the position update, firefly individual $x_j$ adjusts its position in the solution space:
$$x_j(t+1) = x_j(t) + \beta\,\big(x_i(t) - x_j(t)\big) + \alpha\left(\text{rand} - \tfrac{1}{2}\right)$$
where $x_j(t+1)$ is the position of firefly $x_j$ at iteration t+1; t is the current iteration number; the step factor α is a constant in [0, 1]; and rand is a random number uniformly distributed on [0, 1];
the attractiveness of firefly $x_i$ to firefly $x_j$ is $\beta = \beta_0 e^{-\gamma r_{ij}^2}$, where $\beta_0$ is the attractiveness of the light source at r = 0;
the brightness of firefly $x_i$ relative to firefly $x_j$ is:
$$I_{ij} = I_i\, e^{-\gamma r_{ij}}$$
where $I_i$ is the absolute brightness of firefly $x_i$, γ is the light absorption coefficient, and $r_{ij}$ is the Euclidean distance from firefly $x_i$ to firefly $x_j$.
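A minimal Python sketch of this movement rule, assuming positions are lists of floats. The defaults β0 = 1, γ = 1 and α = 0.2 are chosen only for illustration (the text fixes α only to [0, 1]), and the relative brightness is included for completeness even though the move itself needs only β.

```python
import math
import random


def euclidean(a, b):
    """r_ij: Euclidean distance between two positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """beta = beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def relative_brightness(I_i, r, gamma=1.0):
    """I_ij = I_i * exp(-gamma * r): brightness of x_i seen from distance r."""
    return I_i * math.exp(-gamma * r)


def move_towards(x_j, x_i, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """x_j(t+1) = x_j(t) + beta * (x_i(t) - x_j(t)) + alpha * (rand - 1/2), per dimension."""
    beta = attractiveness(euclidean(x_i, x_j), beta0, gamma)
    return [xj + beta * (xi - xj) + alpha * (rng.random() - 0.5)
            for xj, xi in zip(x_j, x_i)]


# With gamma = 0 and alpha = 0 the move is deterministic: x_j lands on x_i.
print(move_towards([0.0, 0.0], [1.0, 0.0], gamma=0.0, alpha=0.0))  # [1.0, 0.0]
```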
Preferably, the direction adjustment mechanism is implemented as follows:
a counter Num records the number of times the population falls into a local optimum and is initialized to 0;
when the optimal position of all firefly individuals is not improved within one iteration, i.e. $f(x_{l,\text{best}})_t - f(x_{l,\text{best}})_{t-1} = 0$, the population is considered to have fallen into a local optimum and the counter is updated as Num = Num + 1, where $f(x_{l,\text{best}})_t$ is the fitness value of the optimal position of the best firefly in layer l at iteration t;
the probability that the population adjusts its search direction is called the adjustment probability;
when the adjustment probability is smaller than the generated random number, the population continues to learn from the current best individual $x_{l,\text{best}}$;
when the adjustment probability is greater than the generated random number, the population adjusts its search direction by learning from the candidate individual.
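The counter-and-probability logic can be sketched as follows. The adjustment-probability formula is not reproduced in this text, so `adjustment_probability` uses a hypothetical placeholder, $1 - e^{-Num/k}$, that merely rises with Num; `make_candidate` is any callable returning a candidate individual. The comparison follows the rule stated above: learn from the candidate when the adjustment probability exceeds the random number.

```python
import math
import random


def adjustment_probability(num, k=10.0):
    """HYPOTHETICAL placeholder; the source formula is not reproduced here."""
    return 1.0 - math.exp(-num / k)


def choose_guide(best_now, best_prev, num, layer_best, make_candidate, rng=random):
    """Return (individual to learn from, updated counter Num)."""
    if best_now - best_prev == 0:      # no improvement in this iteration
        num += 1                       # Num = Num + 1
    if adjustment_probability(num) > rng.random():
        return make_candidate(), num   # adjust the search direction
    return layer_best, num             # keep learning from x_{l,best}


print(choose_guide(3.0, 2.0, 0, "x_best", lambda: "candidate"))    # ('x_best', 0)
print(choose_guide(5.0, 5.0, 999, "x_best", lambda: "candidate"))  # ('candidate', 1000)
```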
Preferably, the candidate individual is generated not only from the current global best individual $x_{l,\text{best}}$ of the population but also from the good structures of other fireflies, as follows:
where $\text{Candidate}_d$ is the value of the generated candidate individual in dimension d; $x_k^d$ and $x_m^d$ are the values in dimension d of two randomly selected fireflies k and m; $\text{Gaussian}(\sigma_d)$ is a Gaussian offset computed from the standard deviation; $f(x_{l,k})$ and $f(x_{l,m})$ are the fitness values of individuals k and m; $x_i^d$ is the value of the i-th individual in dimension d; $\text{Average}_d$ is the mean of the population in dimension d; N is the number of individuals in the population; and $\sigma_d$ is a standard-deviation variable reflecting the distribution of individuals in the firefly subgroup.
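A hedged sketch of candidate generation. It uses only the ingredients named above, but the way they are combined is an assumption, since the candidate formula is not reproduced in this text: the fitter of two randomly chosen fireflies k and m (higher fitness taken as better, matching a sum of Hamming distances to be maximized) is perturbed in each dimension d by a Gaussian offset with standard deviation $\sigma_d$, the population standard deviation around $\text{Average}_d$. At least two individuals are required.

```python
import math
import random


def dim_stats(pop):
    """Per-dimension mean Average_d and standard deviation sigma_d over N individuals."""
    n, dims = len(pop), len(pop[0])
    means = [sum(ind[d] for ind in pop) / n for d in range(dims)]
    sigmas = [math.sqrt(sum((ind[d] - means[d]) ** 2 for ind in pop) / n)
              for d in range(dims)]
    return means, sigmas


def make_candidate(pop, fit, rng=random):
    """Assumed rule: fitter of two random individuals plus Gaussian(sigma_d) noise."""
    k, m = rng.sample(range(len(pop)), 2)
    base = pop[k] if fit[k] >= fit[m] else pop[m]
    _, sigmas = dim_stats(pop)
    return [v + rng.gauss(0.0, s) for v, s in zip(base, sigmas)]


print(dim_stats([[0.0, 2.0], [2.0, 2.0]]))  # ([1.0, 2.0], [1.0, 0.0])
```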
Compared with the prior art, the invention has the following beneficial effects: the improved firefly search algorithm (FAPDA) is used to construct DNA storage coding sets that satisfy combinatorial constraints, which ensures coding efficiency and coding quality, reduces the tendency of the original algorithm to fall into local optima, improves convergence speed, and allows more DNA storage codes satisfying the constraints to be constructed within a given base length. The coding sets constructed by the invention satisfy the Hamming distance constraint, the GC content constraint and the no-run-length constraint, have a certain error-correction capability, and offer several coding advantages such as high robustness, low coding complexity and short coding time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of a pyramid structure.
FIG. 3 is a schematic representation of population evolution in a complex search space, where (a) shows individual $x_{l,\text{best}}$ near a local optimum and (b) shows individual $x_{l,\text{best}}$ near the global optimum.
Fig. 4 is a graph of the adjustment probability of the present invention.
FIG. 5 is a schematic diagram of the evolution state in two randomly selected dimensions, where (a) is the first dimension and (b) is the second dimension.
FIG. 6 compares the fitness convergence curves of the algorithms on 12 functions.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, the method for constructing a DNA storage coding set based on an improved firefly search algorithm builds a pyramid model from multiple subgroups of the firefly search algorithm. Competition and cooperation within and between the layers of the pyramid improve the quality of the individuals in each subgroup. A direction adjustment mechanism is added so that the algorithm can escape local optima: when the population is judged to have fallen into a local optimum, candidate individuals are generated and the population learns toward them. The resulting algorithm is called the firefly search algorithm based on a pyramid structure and direction adjustment mechanism (FAPDA). Constructing DNA storage coding sets that satisfy combinatorial constraints with FAPDA ensures coding efficiency and coding quality, because FAPDA reduces the tendency of the original algorithm to fall into local optima and improves its convergence speed. By converting the DNA coding problem into an objective-constrained optimization problem, the invention can construct more DNA storage codes satisfying the constraints within a given base length; the constructed coding sets satisfy the Hamming distance constraint, the GC content constraint and the no-run-length constraint, have a certain error-correction capability, and offer several coding advantages such as high robustness, low coding complexity and short coding time. The specific implementation of the invention comprises the following steps:
Step one: according to the characteristics of the DNA storage code set, the DNA coding combination constraint is modeled as a target constraint optimization problem, and for the convenience of calculation, 0, 1, 2 and 3 are used for representing bases A, T, C, G in a DNA sequence.
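The base mapping stated above can be sketched as a pair of conversion helpers between DNA strings and the integer codes 0-A, 1-T, 2-C, 3-G used by the search. The function names are illustrative.

```python
BASES = "ATCG"  # position = code: 0-A, 1-T, 2-C, 3-G
CODE = {base: i for i, base in enumerate(BASES)}


def encode(seq):
    """DNA string -> list of integer codes."""
    return [CODE[b] for b in seq.upper()]


def decode(codes):
    """List of integer codes -> DNA string."""
    return "".join(BASES[c] for c in codes)


print(encode("GATC"))        # [3, 0, 1, 2]
print(decode([0, 1, 2, 3]))  # ATCG
```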
Encoding DNA sequences effectively can improve throughput, increase storage capacity and reduce the probability of errors during storage. Reasonable coding provides hard-to-replace support for data integrity and for the robustness of a DNA storage system. Combinatorial constraints in DNA coding were first addressed by Garzon et al.; although researchers have since used many different combinations of DNA coding constraints, the combination most commonly chosen in recent literature comprises four indicators: base continuity, the Hamming distance constraint, the GC content constraint and the no-run-length constraint.
For any pair of DNA sequences of length n, X = (x_1, x_2, x_3, …, x_n) and Y = (y_1, y_2, y_3, …, y_n), the Hamming distance constraint is $H(X, Y) \geq d$, where H(X, Y) is the number of positions at which X and Y differ. The Hamming distance is calculated as:
$$H(X, Y) = \sum_{i=1}^{n} h(x_i, y_i), \qquad h(x_i, y_i) = \begin{cases} 1, & x_i \neq y_i \\ 0, & x_i = y_i \end{cases}$$
where H(X, Y) is the Hamming distance between DNA sequences X and Y; h(x_i, y_i) indicates whether two bases are identical, taking the value 1 if they differ and 0 if they are the same; n is the length of the sequences; and x_i, y_i are the i-th bases of X and Y, respectively. The Hamming distance describes how similar two sequences are: the smaller its value, the higher the similarity. A smaller distance means fewer differing bases and more identical bases between two DNA codes, and therefore a greater likelihood of non-specific hybridization between the sequences.
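The Hamming distance and the constraint $H(X, Y) \geq d$ can be sketched directly from the definition, here for DNA strings of equal length. The function names are illustrative.

```python
def hamming(x: str, y: str) -> int:
    """H(X, Y) = sum_i h(x_i, y_i), where h is 1 for differing bases and 0 otherwise."""
    if len(x) != len(y):
        raise ValueError("both sequences must have the same length n")
    return sum(1 for a, b in zip(x, y) if a != b)


def meets_hamming_constraint(x: str, y: str, d: int) -> bool:
    """Hamming distance constraint H(X, Y) >= d."""
    return hamming(x, y) >= d


print(hamming("ACGTAC", "ACCTAG"))                      # 2
print(meets_hamming_constraint("ACGTAC", "ACCTAG", 3))  # False
```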
GC content is the proportion of G and C bases among all bases in a DNA strand. Sequences with a GC content of about 50% are generally more stable and less prone to errors. The GC content of a DNA sequence X of length n is denoted GC(X) and is calculated as:
$$GC(X) = \frac{|G| + |C|}{n} \times 100\%$$
where |G| and |C| are the numbers of G and C bases in the DNA sequence X, respectively, and n is the length of X.
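A sketch of the GC content and of the 50% requirement, assuming upper-case DNA strings. The exact-50% check compares integers rather than floating-point fractions.

```python
def gc_content(x: str) -> float:
    """GC(X) = (|G| + |C|) / n, returned as a fraction (0.5 means 50%)."""
    return (x.count("G") + x.count("C")) / len(x)


def meets_gc_constraint(x: str) -> bool:
    """Exactly 50% GC, checked with integers to avoid floating-point comparison."""
    return 2 * (x.count("G") + x.count("C")) == len(x)


print(gc_content("ACGTAC"))           # 0.5
print(meets_gc_constraint("AATTAC"))  # False
```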
The no-run-length constraint requires that a DNA sequence contain no consecutive repeated bases; long runs of the same nucleotide can cause undesirable secondary structures in the DNA code and reduce the reliability of DNA storage. For example, in ATTTAC the base T is repeated, so the run of T is easily read as fewer T during synthesis and sequencing, which increases the loss rate of DNA information and reduces read-write coverage. For a DNA sequence X = (x_1, x_2, x_3, …, x_n) of length n:
$$x_i \neq x_{i+1}, \quad i \in [1, n-1] \qquad (4)$$
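Constraint (4) can be checked with a single pass over adjacent base pairs. The function name is illustrative, and indices are 0-based in the code.

```python
def meets_no_run_constraint(x: str) -> bool:
    """x_i != x_{i+1} for every i in [1, n-1] (0-based here: i in [0, n-2])."""
    return all(x[i] != x[i + 1] for i in range(len(x) - 1))


print(meets_no_run_constraint("ATTTAC"))  # False: T is repeated
print(meets_no_run_constraint("ACGTAC"))  # True
```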
The address-independent constraint arises because, in DNA storage, the molecules in the reaction pool must carry a specific address prefix so that the stored data can be recovered. Address sequences must not be similar to one another, otherwise data blocks may fail to be retrieved; address bits therefore require special encoding, which can be enforced by the address-independent constraint. Under this constraint, no sequence has a suffix long enough to be the prefix of another sequence, and vice versa. A coding set obtained under this constraint not only eliminates cross-hybridization between addresses but also avoids sequence-selection errors during sequencing. For a pair of DNA sequences X = (x_1, x_2, x_3, …, x_n) and Y = (y_1, y_2, y_3, …, y_n), the suffix of X cannot serve as a prefix of Y, and vice versa. With the prefix and suffix length denoted s, this requires $(x_1, x_2, \ldots, x_s) \neq (y_{n-s+1}, y_{n-s+2}, \ldots, y_n)$ and $(y_1, y_2, \ldots, y_s) \neq (x_{n-s+1}, x_{n-s+2}, \ldots, x_n)$. Here s = 3.
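A sketch of the pairwise address-independent check with s = 3. It tests exactly the two conditions written above (prefix and suffix of length s only); longer overlaps are not examined, in line with the formula as stated.

```python
def meets_address_constraint(x: str, y: str, s: int = 3) -> bool:
    """Prefix of X differs from suffix of Y, and prefix of Y differs from suffix of X."""
    return x[:s] != y[-s:] and y[:s] != x[-s:]


print(meets_address_constraint("ACGTCA", "TCAGTG"))  # False: suffix TCA of X is the prefix of Y
print(meets_address_constraint("ACGTCA", "GATCGT"))  # True
```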
In the invention, the Hamming distance constraint is turned into the objective: the sum of Hamming distances is used as the fitness function of the DNA constrained coding process, and the other constraints are treated as constraints on the objective function. That is, in the DNA coding optimization model the fitness function is the sum of Hamming distances between the sequences in the DNA coding set, subject to a GC content of 50%, the no-run-length constraint and the address-independent constraint:
Objective function: $\text{fitness}(S) = \sum H(X, Y)$, where X and Y are any two DNA sequences in the coding set S
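The objective and the constraint set can be combined into a small evaluator, shown here as a sketch under two assumptions: each unordered pair of distinct sequences is counted once (counting ordered pairs would only double the value), and `min_hd` is a hypothetical parameter standing for the minimum Hamming distance d, defaulting to no requirement.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def fitness(code_set):
    """fitness(S): sum of H(X, Y) over unordered pairs of distinct sequences in S."""
    return sum(hamming(x, y) for x, y in combinations(code_set, 2))


def is_feasible(code_set, s=3, min_hd=0):
    """Check 50% GC, no runs, address independence (length s) and H(X, Y) >= min_hd."""
    for x in code_set:
        if 2 * (x.count("G") + x.count("C")) != len(x):
            return False
        if any(x[i] == x[i + 1] for i in range(len(x) - 1)):
            return False
    for x, y in combinations(code_set, 2):
        if x[:s] == y[-s:] or y[:s] == x[-s:]:
            return False
        if hamming(x, y) < min_hd:
            return False
    return True


S = ["ACGTAC", "TGCATG", "CATGCA"]
print(fitness(S), is_feasible(S, min_hd=6))  # 18 True
```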
The purpose of coding set design in DNA storage is to construct a set of DNA sequences of a given length n; here the DNA coding set is constructed with a given optimization algorithm and used to encode addresses. To use DNA sequences more economically and efficiently, the set for a given length should be as large as possible, and the encoded DNA should be more stable and produce fewer errors in reactions. For the construction of DNA storage codes, the aim is therefore to further increase the coding set size and the code rate under the same constraints.
Step two: each individual is expressed as a coding set, and an improved firefly search algorithm (FAPDA) based on a pyramid structure and a direction adjustment mechanism is applied to solve a target constraint optimization problem, so that an optimal firefly individual is obtained; and converting the optimal firefly individual into a DNA sequence, and screening a DNA storage encoding set meeting the constraint condition.
The method comprises the following steps:
(1) Initializing required parameters and randomly generating an initial population, wherein each individual represents a coding set, and the size of the coding set is the dimension of the problem. The coding mode of the bases in the DNA sequence is as follows: 0-A,1-T,2-C,3-G.
(2) Calculate the fitness value of each individual and sort the population in ascending order of fitness to establish the pyramid population topology.
(3) Competition and collaboration strategy: in the pyramid population topology structure, firefly individuals in each layer can cooperate, and information transmission exists between the layers. In particular, firefly individuals may also learn toward individuals who are better than themselves when attempting to move toward a better direction. The firefly with the highest light intensity is positioned on the top layer and affects the firefly individuals on each layer below.
(4) Direction adjustment mechanism: judge whether the algorithm has fallen into a local optimum and record the number of such occurrences with a counter; the larger the count, the more likely it is that the population is trapped in a local optimum, and the more the population is driven to adjust its search direction. When the adjustment probability is smaller than the generated random value, the population continues to learn toward the best individual in each layer; when the adjustment probability is greater than the generated random value, candidate individuals are generated and the population switches to learning toward them in an attempt to escape the local optimum.
(5) Once an iteration has produced a new firefly population, recalculate the fitness value of each individual, update the maximum and minimum fitness values and the corresponding positions, and update the pyramid population topology.
(6) When the optimization reaches the maximum number of iterations or the accuracy condition, output the optimal DNA storage coding set; otherwise, return to step (3).
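As an illustration of the base encoding in step (1), the sketch below converts between integer codewords and DNA strings. It assumes, hypothetically, that an individual is stored as a list of codewords, each a list of integers from 0 to 3; the helper names `encode_sequence` and `decode_individual` are illustrative only.

```python
# Base mapping from step (1): 0-A, 1-T, 2-C, 3-G.
BASES = "ATCG"


def encode_sequence(seq):
    """Map a DNA string to its integer codeword, e.g. "ATCG" -> [0, 1, 2, 3]."""
    return [BASES.index(base) for base in seq]


def decode_individual(individual):
    """Map an individual (a list of integer codewords) to a list of DNA strings."""
    return ["".join(BASES[b] for b in codeword) for codeword in individual]


if __name__ == "__main__":
    print(decode_individual([[0, 1, 2, 3], [3, 2, 1, 0]]))  # ['ATCG', 'GCTA']
```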
Yang of the University of Cambridge, UK, inspired by the way fireflies communicate and aggregate through their flashing light, proposed a new swarm intelligence optimization algorithm, the firefly search algorithm. It has a distinctive evolution mechanism and strong local search capability, and is competitive with other swarm intelligence algorithms across a variety of optimization problems and application areas. To keep the algorithm's flow clear and its mathematical model simple, the following rules are adopted:
(1) Fireflies are unisex: every firefly moves toward the brightest and nearest firefly, and if no such firefly exists nearby, it moves randomly around its current location.
(2) Light intensity and distance determine how strongly fireflies attract one another. The brighter a firefly's light, the more likely other fireflies are to move toward it: attraction increases with light intensity and decreases with distance.
(3) The brightness of a firefly is given by the objective function: the better the fitness value, the brighter the firefly, and its position is a candidate solution of the optimization problem. The population evolves through mutual attraction, with fireflies moving toward the brightest individuals.
In the firefly search algorithm, brightness and mutual attraction characterize each individual. An individual judges positions by brightness to choose its moving direction, while attraction controls how far it moves. Individual brightness changes continuously as the population evolves, until the termination condition is met and the optimal solution of the objective function is obtained. The distance between individuals i and j is denoted $r_{ij}$ and is usually computed as the Euclidean distance:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} \left(x_{i,k} - x_{j,k}\right)^2}$$

where d is the dimension of the search space, $x_i$ and $x_j$ are the i-th and j-th individuals in the population, and $x_{i,k}$ and $x_{j,k}$ are the values of individuals i and j in the k-th dimension.
The mutual attraction between fireflies arises from their difference in luminous intensity. The attraction of firefly i on firefly j can be described by $\beta_{ij} = \beta_0 e^{-\gamma r_{ij}^2}$, which is proportional to the relative brightness of firefly j. Here $\beta_0$ is the attraction at the light source ($r_{ij} = 0$), where attraction is greatest, and is generally set to $\beta_0 = 1$; $\gamma$ is the light intensity absorption coefficient.
The brightness of firefly i relative to firefly j is:
where $I_i$ is the absolute brightness of firefly i, i.e. the objective function value of the potential solution it represents, and $\gamma$ is the light intensity absorption coefficient, generally a constant. It models the refraction and scattering of light by the medium, which make the light intensity decay gradually as it propagates.
Firefly j then updates its position in the solution space according to the brightness, distance and attraction between fireflies i and j:

$$x_j(t+1) = x_j(t) + \beta_{ij}\left(x_i(t) - x_j(t)\right) + \alpha\left(rand - \tfrac{1}{2}\right)$$

where $x_j(t+1)$ is the position of firefly j at iteration t+1; $\beta_{ij}$ is the attraction of firefly i on firefly j; t is the current iteration number; the step factor $\alpha$ is a constant in [0, 1]; and rand is a random number uniformly distributed on [0, 1].
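The distance, attraction and movement rules above combine into one update step of the standard firefly search algorithm. This is a minimal sketch, assuming real-valued positions, $\beta_0 = 1$ as suggested in the text, and illustrative values $\gamma = 1$ and $\alpha = 0.2$ that are not taken from the source.

```python
import math
import random


def distance(xi, xj):
    """Euclidean distance r_ij between two positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attraction(r, beta0=1.0, gamma=1.0):
    """Attraction beta = beta0 * exp(-gamma * r^2); maximal (beta0) at r = 0."""
    return beta0 * math.exp(-gamma * r * r)


def move_toward(xj, xi, alpha=0.2, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly j toward the brighter firefly i (one standard FA step)."""
    beta = attraction(distance(xi, xj), beta0, gamma)
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)
        for a, b in zip(xj, xi)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    print(move_toward([0.0, 0.0], [1.0, 1.0], rng=rng))
```

With $\alpha = 0$ the step is purely deterministic: firefly j moves a fraction $\beta_{ij}$ of the way toward firefly i.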
In the firefly search algorithm, individuals move only in response to the light intensity they perceive, which lacks diversity and tends to cause premature convergence or trapping in local optima. Inspired by multi-level hierarchies in the real world, a pyramid topology is used to organize the individuals of the population. As shown in FIG. 2, the pyramid works as its name suggests, with one or a few leaders at the top and many individuals at the bottom. Specifically, the firefly individuals on the top layer can guide all firefly individuals in the pyramid, while firefly individuals on other layers can guide those on every layer below them. Compared with conventional topologies, this structure distributes responsibility more evenly among individuals. The quality of an individual determines its position: the better a firefly individual is, the higher its layer in the pyramid.
Assume the optimization problem is a minimization problem and that layers are assigned according to the fitness of the individuals in the population. Let the population contain N individuals $X = [x_1, x_2, x_3, \dots, x_N]$, and let the number of individuals on each layer, determined by the size ratio between layers, be $n_u$, $u = 1, 2, 3, \dots, L$, where L is the number of pyramid layers. Sorting the fitness values $F = [f(x_1), f(x_2), f(x_3), \dots, f(x_N)]$ in ascending order gives the correspondingly ordered individuals $X' = [x_1', x_2', x_3', \dots, x_N']$. The first $n_1$ individuals of $X'$ are then assigned to the top of the pyramid, the next $n_2$ to the second layer, and so on, until the last $n_L$ individuals of $X'$ are placed at the bottom. The complete construction process is described in Algorithm 1.
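The construction just described can be sketched as follows. This is not Algorithm 1 itself but a minimal sketch, assuming a minimization problem and that the layer sizes $n_1, \dots, n_L$ are supplied directly; the function name `build_pyramid` and its `layer_sizes` argument are hypothetical.

```python
def build_pyramid(population, fitness, layer_sizes):
    """Split a population into pyramid layers (top layer first).

    population  -- list of individuals x_1..x_N
    fitness     -- f(x_i) for each individual (minimization: lower is better)
    layer_sizes -- hypothetical explicit sizes n_1..n_L, summing to N
    """
    if sum(layer_sizes) != len(population):
        raise ValueError("layer sizes must sum to the population size")
    # X': individuals sorted by ascending fitness
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    ranked = [population[i] for i in order]
    pyramid, start = [], 0
    for n_u in layer_sizes:
        pyramid.append(ranked[start:start + n_u])  # next n_u individuals form layer u
        start += n_u
    return pyramid


if __name__ == "__main__":
    layers = build_pyramid(list("abcdef"), [5, 1, 3, 6, 2, 4], [1, 2, 3])
    print(layers)  # [['b'], ['e', 'c'], ['f', 'a', 'd']]
```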
In line 9 of Algorithm 1, $P_{u,j}$ denotes the j-th firefly individual in the u-th layer of the pyramid. $P(X', L, n_u)$ denotes the pyramid-building function in the pseudocode; $u_{dx1}$ and $u_{dx2}$ are the index bounds of the individuals allocated to each layer; $P_{floor,1:n_u}$ denotes the generated layer floor of the pyramid; and $X'_{u_{dx1}:u_{dx2}}$ denotes the $u_{dx1}$-th through $u_{dx2}$-th individuals of the ascending-sorted population. This pyramid has the following properties:
(1) Each firefly individual is assigned to a particular layer;
(2) The higher the layer of a firefly individual, the better its fitness value; the globally best firefly individual is on the top layer and the worst on the bottom layer;
(3) Firefly individuals on the same floor have close fitness values.
In the firefly search algorithm, individuals interact through mutual attraction rather than through direct competition. Each firefly attracts fireflies at various distances according to its brightness, and fireflies tend to move in the direction from which they feel the strongest attraction. The direction and distance of each move are adjusted according to the brightness and distance of the surrounding fireflies, so position updates are driven by attraction rather than by competition.
While the firefly search algorithm runs, the individuals may be either relatively dispersed or relatively concentrated, and these two situations lead to different outcomes. When individuals are relatively dispersed, the limitations of the attraction calculation and movement rules may leave the population trapped in a local optimum from which it cannot escape. When individuals are relatively concentrated, nearby solutions attract one another, reducing the diversity of the population. In addition, because the algorithm computes the distances and attractions between all pairs of fireflies, its computational cost can be high, especially in large search spaces.
To address these problems, a more competitive strategy is introduced into the firefly search algorithm. Besides adding randomness to preserve population diversity, positions can be updated by selecting suitable firefly individuals to act as attractors, which greatly reduces computational cost and increases speed.
First, when the pyramid is built, all firefly individuals take part in the ranking that determines their layer; because the ranking involves every individual, this is called the global competition strategy. Second, since each firefly individual is placed in a specific layer, individuals in the same layer can be treated differently, which improves efficiency. Following this idea, a learning strategy for the individuals of a layer is introduced: within each layer, all firefly individuals are attracted by the light intensity of the other individuals in that layer and, at the same time, learn toward the brightest individual in the layer. Apart from the brightest individual of each layer, no individual is affected by individuals from other layers; this is called the local collaboration strategy. The brightest individual of each layer is, in turn, influenced by the brightest individual of the layer above.
Based on this analysis, a new collaboration strategy is proposed: within the constructed pyramid, fireflies in the same layer cooperate and information is passed between layers. In particular, when trying to move in a better direction, a firefly can also learn from individuals better than itself, and the brightest firefly, located on the top layer, affects the individuals on every layer below. Positions are therefore not updated merely through attraction between individuals. Mathematically, the update of a firefly individual in layer l is:
where $p = q + h$; $x_{l,i}(t)$ is the i-th firefly individual in layer l at iteration t; $x_{l,best}(t)$ is the best individual in layer l at iteration t; $x_{l,j}(t)$ is the j-th firefly individual in layer l at iteration t; $\beta$ is the attraction between the two individuals; $\alpha_B$ is the random step size of Brownian motion and $\alpha_L$ that of the Lévy flight; the sign function provides a random direction; the remaining two random terms represent Lévy flight and Brownian motion, respectively; p is the decision parameter; the learning factor $c_2$ is updated as a function of the current iteration t and the maximum number of iterations T, and the learning factor $c_1 = 1 - c_2$; $rand_1$, $rand_2$ and $rand_3$ are random numbers uniformly distributed on [0, 1], [-1, 1] and [-2, 2], respectively; t is the current iteration number; $\otimes$ denotes element-wise multiplication; and T is the set maximum number of iterations.
Brownian motion is a stochastic model of the incessant random motion of particles suspended in a liquid or gas. The direction and length of each step are random and are determined by normally distributed random numbers. In continuous-time Brownian motion, a particle's position changes over time along an irregular, fluctuating trajectory. At each time step the movement depends only on the current position, not on earlier positions, so Brownian motion is a Markov process. Moreover, the displacement over any time interval is normally distributed, and Brownian motion is scale invariant. The stochastic model of Brownian motion at point x is:

$$f_B(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where the step size of the particle motion is drawn from this Gaussian probability density with zero mean $\mu = 0$ and unit variance $\sigma^2 = 1$.
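A Brownian-motion step as described can be drawn from the standard normal density. The sketch below evaluates that density and samples a step, assuming an independent N(0, 1) draw per dimension; `gaussian_pdf` and `brownian_step` are illustrative names.

```python
import math
import random


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density f_B used for Brownian steps (mu = 0, sigma^2 = 1 by default)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def brownian_step(dim, rng=random):
    """One Brownian step: an independent N(0, 1) displacement in each dimension."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    print(brownian_step(3, random.Random(0)))
```

Because each step depends only on a fresh draw, not on earlier steps, repeated calls reproduce the Markov property noted above.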
A Lévy flight is a random walk whose step lengths follow a heavy-tailed distribution with infinite variance, so relatively long strides occur with comparatively high probability. To simulate Lévy-flight trajectories, Mantegna proposed an accurate and fast algorithm for generating stable Lévy processes with index in the range 0.3 to 1.99. Random numbers following the Lévy distribution are generated with Mantegna's method as:

$$Levy(\beta) = \frac{x}{|y|^{1/\beta}}$$

where x and y are two normally distributed variables, $x \sim Normal(0, \sigma_x^2)$ and $y \sim Normal(0, 1)$, with

$$\sigma_x = \left[\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right)\,\beta\,2^{(\beta-1)/2}}\right]^{1/\beta}$$

and $\beta$ is the Lévy index.
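Mantegna's method can be sketched as follows. This is an illustrative implementation of the formulas above, assuming the commonly used Lévy index $\beta = 1.5$ as a default (the source does not state a value); `mantegna_sigma` and `levy_step` are hypothetical names.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation sigma_x of the numerator x in Mantegna's method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(dim, beta=1.5, rng=random):
    """One Levy-flight step per dimension: x / |y|^(1/beta)."""
    if not 0.3 <= beta <= 1.99:
        raise ValueError("Mantegna's method is intended for 0.3 <= beta <= 1.99")
    sigma_x = mantegna_sigma(beta)
    steps = []
    for _ in range(dim):
        x = rng.gauss(0.0, sigma_x)
        y = rng.gauss(0.0, 1.0)
        while y == 0.0:  # avoid division by zero (vanishingly rare)
            y = rng.gauss(0.0, 1.0)
        steps.append(x / abs(y) ** (1 / beta))
    return steps


if __name__ == "__main__":
    print(levy_step(5, rng=random.Random(1)))
```

Most sampled steps are small, but the division by $|y|^{1/\beta}$ occasionally produces a very large one, which is the long-jump behaviour exploited later in the search.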
In the early stage of the algorithm, the randomness of Brownian motion replaces the uniform distribution of the conventional firefly search algorithm to improve performance. Likewise, in the later stage, Lévy-flight randomness replaces the uniform distribution, exploiting the Lévy flight's mostly small steps with occasional long jumps to balance the global exploration and local exploitation of the algorithm.
The best firefly individual in each layer is updated as follows:

$$x_{l,best} = r_1 \cdot x_{l,best} + r_2 \cdot x_{l-1,best} + r_3 \cdot x_{1,best} \qquad (11)$$

where $r_1$, $r_2$ and $r_3$ are constants in [0, 1].
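Equation (11) is applied dimension by dimension. A minimal sketch follows, assuming a layer index l of at least 2 so that layer l-1 exists; the coefficients are passed in because the source does not give their values, and `update_layer_best` is a hypothetical name.

```python
def update_layer_best(x_l_best, x_prev_best, x_top_best, r1, r2, r3):
    """Eq. (11): x_l.best = r1*x_l.best + r2*x_(l-1).best + r3*x_1.best, per dimension."""
    for r in (r1, r2, r3):
        if not 0.0 <= r <= 1.0:
            raise ValueError("r1, r2, r3 must lie in [0, 1]")
    return [
        r1 * a + r2 * b + r3 * c
        for a, b, c in zip(x_l_best, x_prev_best, x_top_best)
    ]


if __name__ == "__main__":
    print(update_layer_best([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], 0.5, 0.25, 0.25))  # [2.5, 3.5]
```

The update pulls a layer's leader toward both the leader of the layer directly above and the global leader at the top, which is how information flows down the pyramid.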
The competition and collaboration strategy is summarized in Algorithm 2.
When searching a complex space, the firefly population may become trapped in a local optimum. For the fireflies in one layer of the pyramid, the evolution state can be described by FIG. 3, where $x_{l,best}$ is the brightest individual in layer l and the curve shows how the fitness value changes along one dimension. In FIG. 3(a), every firefly, e.g. $x_{l,1}$ and $x_{l,2}$, searches in the direction of $x_{l,best}$ and becomes trapped in a local optimum after several iterations. Let $f(x_{l,best})^t$ denote the fitness value of the best position of the best firefly in layer l at iteration t. After each iteration, each $x_i$ is compared with $x_{l,best}$, and $x_{l,best}$ and its fitness value $f(x_{l,best})^t$ are updated. The case in which the best position of all firefly individuals does not improve within one iteration can be written as:
$$f(x_{l,best})^t - f(x_{l,best})^{t-1} = 0 \qquad (12)$$
where t is the iteration number. Clearly, the fact that the best position of all fireflies does not improve after an iteration can indicate that the population has fallen into a local optimum. To keep the population out of such a state and preserve the algorithm's efficiency, the search direction can be adjusted whenever equation (12) holds. However, this condition alone is not a sufficient basis for adjusting the search direction, especially when the population searches a complex solution space. As shown in FIG. 3(b), the layer's best firefly $x_{l,best}$ may be close to the global optimum. In iteration t, fireflies such as $x_1(t)$ and $x_2(t)$ search along $x_{l,best}$, and their best positions may not improve within a single iteration, so equation (12) is also satisfied. Exchanging information too frequently can spread misleading information faster, whereas exchanging it too rarely weakens cooperation among groups. Therefore, to make full use of $x_{l,best}$, a delay-based search direction adjustment mechanism is proposed: it checks whether a subgroup has stagnated, reactivates the stagnated subgroup, and adjusts its search direction at an appropriate time to prevent it from being trapped in a local optimum.
In fact, as the number of iterations in which the best position of all fireflies fails to improve grows, so does the probability that the population is trapped in a local optimum, and individuals then need to adjust their search direction promptly. The number of times the population stagnates is recorded by a counter Num, initialized to 0. If no firefly improves after an iteration, Num is updated as:

$$Num = Num + 1 \qquad (13)$$
Clearly, as Num increases, individuals are more likely to be trapped in a local optimum, which means the population should adjust its search direction. The probability that the population adjusts its search direction is called the adjustment probability, $Prob_{adjust}$, and is set empirically as:
The adjustment probability is updated after each iteration. FIG. 4 shows how it varies with the number of iterations t. If the adjustment probability is greater than the generated random number, the population stops learning from the current best individual $x_{l,best}$ and redirects its search toward the position of a new firefly individual. The detailed procedure of the delay-based search direction adjustment mechanism is given in Algorithm 3.
As shown in FIG. 4, when the stagnation count Num is small, the adjustment probability, which is determined by Num, is likely to be smaller than the generated random number, so the population keeps learning from the current best individual $x_{l,best}$. The best individual $x_{l,best}$ is thus fully exploited over the next few iterations, which matters especially when it is close to the global optimum, as in FIG. 3(b). If no firefly improves over those iterations, Num keeps increasing and the population becomes more likely to be trapped; the adjustment probability then exceeds the generated random number, and the population adjusts its search direction by learning from a new firefly. This new firefly is called the candidate individual, and after learning from it for a while the population may escape the current local optimum. In summary, the strategy makes full use of $x_{l,best}$ while helping the population escape local optima.
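The counter of equations (12) and (13) and the probabilistic switch can be sketched as below. Because the empirical formula for $Prob_{adjust}$ is not reproduced in this text, the sketch takes it as a function argument; the default `placeholder_prob` is a hypothetical stand-in that merely increases with Num. Following the description of FIG. 4, the sketch switches to a candidate when the adjustment probability exceeds a uniform random number. The class and method names are illustrative.

```python
import math
import random


def placeholder_prob(num):
    """HYPOTHETICAL stand-in for the empirical Prob_adjust; grows with Num toward 1."""
    return 1.0 - math.exp(-num / 5.0)


class DirectionAdjuster:
    """Delay-based switch between learning from x_l.best and from a candidate."""

    def __init__(self, prob_fn=placeholder_prob, rng=None):
        self.num = 0              # Num: stagnation counter, initialized to 0
        self.prev_best = None     # f(x_l.best) from the previous iteration
        self.prob_fn = prob_fn
        self.rng = rng if rng is not None else random.Random()

    def should_switch(self, best_fitness):
        """Call once per iteration with f(x_l.best)^t; True means learn from a candidate."""
        if self.prev_best is not None and best_fitness - self.prev_best == 0:  # eq. (12)
            self.num += 1                                                      # eq. (13)
        self.prev_best = best_fitness
        return self.prob_fn(self.num) > self.rng.random()
```

Passing the probability function in keeps the delay logic separate from the empirical schedule, so the schedule can be swapped without touching the counter.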
In the direction adjustment strategy, all fireflies learn from the best individual $x_{l,best}$, and the population uses the counter to decide whether to adjust its search direction. When the population may be trapped in a local optimum, the search direction is adjusted by learning from a new candidate individual. A candidate could be generated at random in the search space so that the population learns from it and escapes the current local optimum. However, the ability of a randomly generated candidate to guide the population is hard to guarantee, especially in a complex space, where it is more likely to lead the population into another local optimum. To generate candidates effectively, a self-learning-based candidate generation strategy is proposed that makes full use of the historical best solution structures of the excellent fireflies in each layer.
Clearly, the current $x_{l,best}$ is still the best individual and retains a good solution structure in most dimensions, so its structure is worth learning when generating candidates. Furthermore, the fitness value of a firefly is determined by its solution structure across all dimensions, expressed as:
Thus, firefly individuals with slightly worse fitness values may have better solution structures in certain dimensions, and these are also worth learning.
FIG. 5 shows the evolution of a population in two random dimensions, with the fireflies searching for the maximum in each dimension; the curves show the trend of the fitness value in each dimension. $x_{l,best}$ is the global best individual of the current population, and $x_{l,1}$ and $x_{l,2}$ are two random individuals from the same layer. $x_{l,best}$ has the best solution structure in the first dimension, but individual $x_{l,1}$ has a better structure than $x_{l,best}$ in the second dimension. Candidates are therefore generated not only from $x_{l,best}$ but also from the good structures of other fireflies.
Two firefly individuals $x_{l,k}$ and $x_{l,m}$ are randomly selected from the same layer, and $\sigma_d$ is used to select the better one. Because the selection is random, in theory every firefly in the population can contribute its search information to the candidate. The candidate may therefore outperform the current $x_{l,best}$ and lie closer to the global optimum, so by redirecting its search toward the candidate the population may escape the local optimum and, after further iterations, find the global optimum. The candidate individual is generated by:
where $Candidate_d$ is the value of the generated candidate in dimension d; $x_k^d$ and $x_m^d$ are the values of the two randomly selected individuals k and m in dimension d; $Gaussian(\sigma_d)$ is a Gaussian offset computed from the standard deviation; $f(x_{l,k})$ and $f(x_{l,m})$ are the fitness values of individuals k and m; $x_i^d$ is the value of the i-th individual in dimension d; $average_d$ is the mean of the population in dimension d; N is the number of individuals in the population; and $\sigma_d$ is a standard deviation reflecting how the individuals in the firefly subgroup are distributed. As the probability increases, the generated candidates become more similar to $x_{l,best}$ and may even equal it; candidates can therefore take many forms and are able to guide the population out of the current local optimum. The procedure of the self-learning-based candidate generation strategy is given in Algorithm 4.
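Equation (16) itself is not reproduced in this text, so the following is only one plausible reading of the description: for each dimension d, two individuals of the layer are drawn at random, the one with the better (lower) fitness is kept, and a Gaussian offset with the standard deviation $\sigma_d$ of that dimension is added. Drawing a fresh pair per dimension, using the layer as the group for $\sigma_d$, and the name `generate_candidate` are all assumptions.

```python
import random
import statistics


def generate_candidate(layer, fitness, rng=random):
    """Hypothetical self-learning candidate (minimization).

    layer   -- list of positions (lists of floats) in one pyramid layer, size >= 2
    fitness -- fitness value of each position in `layer`
    """
    if len(layer) < 2:
        raise ValueError("need at least two individuals to compare")
    candidate = []
    for d in range(len(layer[0])):
        column = [ind[d] for ind in layer]
        # sigma_d = sqrt(mean((x_i^d - average_d)^2)), the population std of dimension d
        sigma_d = statistics.pstdev(column)
        k, m = rng.sample(range(len(layer)), 2)
        winner = k if fitness[k] <= fitness[m] else m
        candidate.append(layer[winner][d] + rng.gauss(0.0, sigma_d))
    return candidate
```

When the layer has collapsed onto one point, $\sigma_d = 0$ and the candidate simply copies that point; a more spread-out layer produces larger offsets and hence more diverse candidates.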
FAPDA builds on a parameter-adaptive strategy: it re-partitions the firefly population into a pyramid model and adds collaboration strategies within and between pyramid layers, together with a delay-based direction adjustment mechanism that keeps the algorithm from being trapped in local optima. The steps of FAPDA are as follows:
Step one: initialize the relevant parameters, such as the population size N, the maximum number of iterations T and the individual dimension d.
Step two: randomly generate N firefly individuals with initial positions $X = [x_1, x_2, x_3, \dots, x_N]$, compute the fitness value $f(x_i)$ of each individual $x_i$, sort the population in ascending order of fitness, and build the pyramid population topology.
Step three: competition and collaboration strategy. In the pyramid topology, firefly individuals within each layer cooperate and information is passed between layers. In particular, when trying to move in a better direction, a firefly individual can also learn from individuals better than itself. The fireflies in each layer update their positions according to equation (8). The brightest firefly, located on the top layer, affects the firefly individuals on every layer below, and the best firefly of each layer updates its position according to equation (11).
Step four: direction adjustment. Use equation (12) to judge whether the algorithm has fallen into a local optimum and record the number of such stagnations with a counter. As the count grows, the population is more likely to be trapped, so it should adjust its search direction. When the adjustment probability is smaller than the generated random value, the population keeps learning from the best individual of each layer; when it is greater, a candidate individual is generated according to equation (16) and the population turns to learn from it in an attempt to escape the local optimum.
Step five: once a new firefly population is obtained at the end of an iteration, recompute the fitness value of each individual, update the maximum and minimum fitness values and the corresponding positions, and rebuild the pyramid structure as in Step two.
Step six: output the optimization result when the maximum number of iterations or the accuracy condition is reached; otherwise, return to Step three.
The pseudocode of the algorithm is given in Algorithm 5.
To verify the optimization capability and robustness of FAPDA, it is compared comprehensively with an improved firefly search algorithm (FABMLA), the firefly search algorithm (FA), an adaptive firefly search algorithm (GDFA), the Harris hawks optimization algorithm (HHO) and particle swarm optimization (PSO). For fairness, the population size (N) and the maximum number of iterations (T) of all algorithms are set to 50 and 1000, respectively. The remaining parameter settings are listed in Table 1, and each algorithm is run independently 30 times on each benchmark function. The experiments were run in MATLAB R2020b on Windows 10 with 4 GB of RAM.
To evaluate the proposed improved firefly search algorithm, the simulation experiments use 12 benchmark functions, comprising unimodal functions (F1 to F6) and multimodal functions (F7 to F12). The unimodal functions test an algorithm's convergence speed and local exploitation capability. The multimodal functions contain many local optima, whose number grows with the dimension, so they further test the exploration ability and global optimization performance of an algorithm. Table 2 lists the benchmark functions used in the experiments. To probe the search capability of the algorithms further, the dimension of every test function is raised to 30.
Table 1 parameter settings for each algorithm
TABLE 2 benchmark test function
Table 3 records the mean, standard deviation, best and worst fitness values of each algorithm over 30 runs (best results in bold). As Table 3 shows, FAPDA finds the optimal solution on all unimodal functions and achieves better results than its competitors on most of them. For example, on F1, F2, F3 and F4, FAPDA reaches the theoretical optimal fitness value and improves substantially on FABMLA, GDFA, PSO, FA and HHO in both convergence speed and final convergence accuracy. On F5, FAPDA performs about the same as FABMLA and far better than the other algorithms. On F6, the final convergence accuracy of FAPDA far exceeds that of the comparison algorithms.
Among the multimodal functions, FABMLA far outperforms the other algorithms on F7. Apart from F7, the best and mean values of FAPDA on F8, F9, F10, F11 and F12 are clearly better than those of the other algorithms. Notably, FAPDA reaches the global optimum on F8, which the other algorithms do not. On F11 and F12, FABMLA is second only to FAPDA, ranking second among all algorithms.
FIG. 6 compares the convergence of the 6 algorithms on the 12 benchmark functions. FAPDA converges faster than the other algorithms on most unimodal functions. Although FABMLA adapts its parameters and HHO combines four predation modes, both are outperformed by FAPDA. Given the nature of unimodal functions, this indicates that FAPDA has good exploitation capability. On most multimodal functions, FAPDA also converges faster with sufficient accuracy; FABMLA outperforms it only on F7. These experiments show that FAPDA also explores well, benefiting from the high population diversity brought by cooperation within the pyramid framework and from the delay-based search direction adjustment mechanism, which helps it avoid being trapped in local optima. In conclusion, the pyramid structure and the direction adjustment mechanism effectively improve the search performance of the algorithm, and the direction adjustment mechanism in particular enables it to escape local optima.
Table 3 statistics of algorithms on 12 reference test functions of 30 dimensions
Beyond the advantages of FAPDA in mean, standard deviation, convergence and stability already shown, a significance test is needed to determine whether the differences between FAPDA and the other algorithms are significant. The Wilcoxon rank-sum test checks whether two samples differ. It does not assume that the data follow any particular distribution, only that the two samples are independent and have similar spread. The two samples are pooled, every value is ranked by size, and the rank sum of each sample is then computed. If the two samples are similar, their rank sums should be similar; if they differ significantly, their rank sums will differ greatly. As Table 4 shows, FAPDA is significantly better than FABMLA on the 11 test functions other than F7, and FAPDA differs significantly from GDFA, PSO, FA and HHO on all 12 benchmark functions. In summary, FAPDA is significantly better than the other five algorithms on most test functions, which indicates that it performs better.
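The rank-sum procedure just described can be sketched in a few lines. This version, written for illustration, assigns average ranks to ties and uses the large-sample normal approximation without a tie correction; in practice a library routine such as SciPy's `scipy.stats.ranksums` performs the same test. The function name `rank_sum_test` is illustrative.

```python
import math


def rank_sum_test(a, b):
    """Wilcoxon rank-sum test (two-sided, normal approximation).

    Returns (W, z, p) where W is the rank sum of sample `a`.
    Ties receive average ranks; no tie correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p
```

In the experiments above, `a` and `b` would be the 30 final fitness values of two algorithms on one benchmark function; a p-value below the chosen significance level (commonly 0.05) marks the difference as significant.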
Table 4 comparison of FAPDA with five other algorithms on 12 benchmark functions
Table 5 Average MAE ranking over the 12 benchmark functions
Finally, to quantify the performance of the algorithms, the mean absolute error (MAE) of each algorithm is calculated and ranked. MAE is an effective statistic for showing the difference between the obtained results and the actual values. It is computed as:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|m_i - k_i\right|$$
where $m_i$ is the optimal value obtained by the algorithm, $k_i$ is the corresponding actual value of the benchmark function, and $N$ is the number of benchmark functions. Table 5 shows the MAE values and rankings of the tested algorithms; FAPDA clearly performs better than the other algorithms.
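The MAE formula and the ranking step can be sketched as follows. The helper names `mae` and `rank_by_mae` are hypothetical, and ties in MAE are simply broken by input order, which is an assumption rather than something the text specifies.

```python
def mae(found, actual):
    """MAE = (1/N) * sum_i |m_i - k_i| over N benchmark functions."""
    if len(found) != len(actual) or not found:
        raise ValueError("expected two non-empty lists of equal length")
    return sum(abs(m - k) for m, k in zip(found, actual)) / len(found)


def rank_by_mae(results, actual):
    """Return (name, MAE, rank) tuples; rank 1 is the lowest MAE.

    `results` maps an algorithm name to its best values m_1..m_N.
    Ties are broken by input order (a simplifying assumption).
    """
    scored = sorted(((name, mae(vals, actual)) for name, vals in results.items()),
                    key=lambda item: item[1])
    return [(name, err, rank) for rank, (name, err) in enumerate(scored, start=1)]
```

A call such as `rank_by_mae({"FAPDA": m_fapda, "PSO": m_pso}, k_values)` (hypothetical variable names) would yield rows of the kind summarized in Table 5.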
Lower bounds of DNA storage codes: a set of DNA codewords of length $n$ with Hamming distance $d$ that satisfies the Hamming distance constraint, the GC content constraint, the uncorrelated-address constraint and the no-runlength (no repeated adjacent base) constraint is denoted $A_{GC\text{-}NL\text{-}UL}(n, d, w)$. Table 6 lists the lower bounds that satisfy these constraints for $4 \le n \le 10$ and $3 \le d \le n$. Compared with previously reported results, larger code sets are constructed under the same constraints and sequence length. A larger code set encodes a larger address space, so more information can be addressed with shorter sequences, which reduces cost.
Table 6 Lower bounds of DNA storage code sets
The invention provides a firefly search algorithm based on a pyramid structure and a direction adjustment mechanism (FAPDA) to address the insufficient search capability of the firefly search algorithm. First, a pyramid-shaped topology is built that distributes responsibility more evenly among individuals and increases the diversity of the algorithm. Second, a more competitive strategy is introduced during the iterations: each firefly updates its position under the attraction of suitably chosen individuals, which improves competition and cooperation within the population and speeds up computation. Finally, to keep the population from being trapped in local optima, a direction adjustment mechanism based on a delay strategy generates a candidate solution when the population stagnates, giving it the ability to escape. Experiments were performed on 12 unimodal, multimodal and composite benchmark functions with different characteristics and compared with several other optimization algorithms. The results show that FAPDA performs better in exploration, exploitation and the ability to escape local optima. FAPDA was also applied to the pressure vessel design problem with satisfactory results, which shows that it is an effective optimization technique. Applying this improved firefly algorithm to the coding problem in DNA storage likewise achieves satisfactory results.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A method for constructing a DNA storage code set based on an improved firefly search algorithm, characterized by comprising the following steps:
step one: according to the characteristics of the DNA storage code set, modeling the combinatorial constraints of DNA coding as a constrained optimization problem;
step two: encoding the DNA sequences and solving the constrained optimization problem of step one with an improved firefly search algorithm based on a pyramid structure and a direction adjustment mechanism to obtain the optimal firefly individual; and converting the optimal firefly individual into DNA sequences to obtain a DNA storage code set that satisfies the constraints.
2. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 1, wherein the objective function of the constrained optimization problem in step one is the sum of the Hamming distances between the sequences:
objective function: $\mathrm{fitness}(S) = \sum_{X, Y \in S,\ X \neq Y} H(X, Y)$, where X and Y are any two DNA sequences in the code set S;
the constraints on DNA sequences X and Y are: a GC content of 50%, the no-runlength constraint and the uncorrelated-address constraint;
where $H(X, Y)$ is the Hamming distance between DNA sequences X and Y.
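The objective of claim 2 can be sketched in a few lines. This assumes each unordered pair of sequences is counted once (counting ordered pairs would simply double the value); the names `hamming` and `fitness` are illustrative.

```python
from itertools import combinations


def hamming(x, y):
    """H(X, Y): number of positions i at which x_i != y_i."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)


def fitness(code_set):
    """fitness(S): sum of H(X, Y) over all unordered pairs of sequences in S."""
    return sum(hamming(x, y) for x, y in combinations(code_set, 2))
```

For example, the set `["AAAA", "TTTT", "ATAT"]` has pairwise distances 4, 2 and 2.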
3. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 2, wherein the Hamming distance is
$$H(X, Y) = \sum_{i=1}^{n} h(x_i, y_i), \qquad h(x_i, y_i) = \begin{cases} 1, & x_i \neq y_i \\ 0, & x_i = y_i \end{cases}$$
where $H(X, Y)$ is the Hamming distance between DNA sequence $X = (x_1, x_2, x_3, \dots, x_n)$ and DNA sequence $Y = (y_1, y_2, y_3, \dots, y_n)$; $h(x_i, y_i)$ indicates whether two bases are identical, taking the value 1 if they differ and 0 if they are the same; $n$ is the length of the DNA sequences, and $x_i$ and $y_i$ are the $i$-th bases of X and Y, respectively;
the GC content is:
$$GC(X) = \frac{|G| + |C|}{n} \times 100\%$$
where $|G|$ and $|C|$ are the numbers of G and C bases, respectively, in DNA sequence X;
the no-runlength constraint requires that the DNA sequence contain no identical adjacent bases: for a DNA sequence $X = (x_1, x_2, x_3, \dots, x_n)$ of length $n$,
$x_i \neq x_{i+1}, \quad i \in [1, n-1]$;
the uncorrelated-address constraint requires that no DNA sequence have a sufficiently long suffix that is a prefix of another DNA sequence, and vice versa: for a pair of DNA sequences $X = (x_1, x_2, x_3, \dots, x_n)$ and $Y = (y_1, y_2, y_3, \dots, y_n)$, a suffix of X cannot be a prefix of Y and vice versa, i.e., $(x_1, x_2, \dots, x_s) \neq (y_{n-s+1}, y_{n-s+2}, \dots, y_n)$ and $(y_1, y_2, \dots, y_s) \neq (x_{n-s+1}, x_{n-s+2}, \dots, x_n)$, where $s$ is the prefix and suffix length.
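A checker for the constraints of claims 2 and 3 might look like the sketch below. The minimum overlap length `min_len` is a hypothetical parameter standing in for "sufficiently long", since the claim does not fix it; GC content is fixed at 50% as in claim 2, and `d` is the minimum pairwise Hamming distance from the $A_{GC\text{-}NL\text{-}UL}(n, d, w)$ definition.

```python
from itertools import combinations


def gc_ok(seq):
    """GC-content constraint: exactly half of the bases are G or C."""
    return 2 * sum(1 for b in seq if b in "GC") == len(seq)


def no_runlength_ok(seq):
    """No-runlength constraint: x_i != x_{i+1} for i in [1, n-1]."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def uncorrelated_ok(x, y, min_len=2):
    """Uncorrelated-address constraint for one pair: for every length s with
    min_len <= s <= n-1, the length-s prefix of either sequence must differ
    from the length-s suffix of the other.  `min_len` is hypothetical."""
    n = len(x)
    for s in range(min_len, n):
        if x[:s] == y[n - s:] or y[:s] == x[n - s:]:
            return False
    return True


def satisfies_all(code_set, d, min_len=2):
    """Check GC, no-runlength, minimum Hamming distance d and uncorrelated addresses."""
    if not all(gc_ok(s) and no_runlength_ok(s) for s in code_set):
        return False
    for x, y in combinations(code_set, 2):
        hd = sum(1 for a, b in zip(x, y) if a != b)   # Hamming distance H(X, Y)
        if hd < d or not uncorrelated_ok(x, y, min_len):
            return False
    return True
```

For instance, "ACGT" and "GTCA" violate the address constraint because the suffix "GT" of the first is the prefix of the second.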
4. The method for constructing a set of DNA storage codes based on an improved firefly search algorithm according to any one of claims 1 to 3, wherein the improved firefly search algorithm based on a pyramid structure and a direction adjustment mechanism comprises the steps of:
(1) Initialize the required parameters and randomly generate an initial population, in which each individual represents a code set and the size of the code set is the dimension of the problem; the bases of a DNA sequence are encoded as 0-A, 1-T, 2-C, 3-G;
(2) Calculate the fitness value of each individual and sort the population in ascending order of fitness to build the pyramid population topology;
(3) Competition and collaboration strategy: in the pyramid population topology, firefly individuals within each layer cooperate and information is passed between layers; when trying to move in a better direction, a firefly learns from individuals that are better than itself; the firefly with the highest light intensity is placed on the top layer;
(4) Direction adjustment mechanism: determine whether the population has become trapped in a local optimum; when the adjustment probability is greater than the generated random value, the population continues to learn from the best individual in each layer; when the adjustment probability is smaller than the generated random value, a candidate individual is generated and the population learns from it instead, in an attempt to escape the local optimum;
(5) Obtain the new firefly population; after each iteration, recalculate the fitness value of each individual, update the maximum and minimum fitness values and their positions, and update the pyramid population topology;
(6) If the maximum number of iterations is reached or the accuracy condition is met, output the optimal firefly individual; otherwise, return to step (3).
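Step (1) fixes the base mapping 0-A, 1-T, 2-C, 3-G but not how a continuous firefly position becomes sequences. The sketch below assumes one hypothetical scheme: each coordinate stands for one codeword, is clipped and rounded to an integer in $[0, 4^n - 1]$, and is written in base 4 using that mapping. The names `coord_to_seq` and `decode` are illustrative only.

```python
BASES = "ATCG"  # digit -> base, following the mapping 0-A, 1-T, 2-C, 3-G


def coord_to_seq(value, n):
    """Map one real coordinate to a length-n DNA sequence (hypothetical scheme):
    clip and round it to an integer in [0, 4**n - 1], then write that integer
    in base 4, most significant digit first, translating each digit to a base."""
    k = min(max(int(round(value)), 0), 4 ** n - 1)
    digits = []
    for _ in range(n):
        digits.append(BASES[k % 4])
        k //= 4
    return "".join(reversed(digits))


def decode(position, n):
    """Convert a firefly position (one coordinate per codeword) into a code set."""
    return [coord_to_seq(v, n) for v in position]
```

Under this scheme the dimension of the search space equals the size of the code set, matching the statement in step (1).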
5. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 4, wherein the pyramid population topology is constructed as follows: initialize the required parameters, namely the population size $N$, the maximum number of iterations $T$ and the dimension $d$ of the search space; randomly generate $N$ firefly individuals with initial positions $X = [x_1, x_2, x_3, \dots, x_N]$, and calculate the fitness value of each individual $x_i$;
determine the number of individuals $n_u$ in each layer, $u = 1, 2, 3, \dots, L$, according to the proportions between the layers, where $L$ is the number of layers of the pyramid;
sort the individuals in ascending order of fitness to obtain $X' = [x'_1, x'_2, x'_3, \dots, x'_N]$; the first $n_1$ individuals of $X'$ are assigned to the top of the pyramid, the next $n_2$ individuals to the second layer, and so on, until the last $n_L$ individuals of $X'$ are placed at the bottom of the pyramid;
for each layer $u$ (starting from $u_{dx1} = 1$):
$u_{dx2} = u_{dx1} + n_u - 1$;
the $u$-th layer of the pyramid structure is generated as $P_{u, 1:n_u} = X'_{u_{dx1}:u_{dx2}}$;
$u_{dx1} = u_{dx2} + 1$;
where $X'_{u_{dx1}:u_{dx2}}$ denotes the $u_{dx1}$-th to $u_{dx2}$-th individuals of the sorted population, and $u_{dx1}$ and $u_{dx2}$ are the index values of the individuals assigned to each layer.
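The layer-slicing procedure of claim 5 can be sketched as below, with 0-based Python indices in place of the 1-based $u_{dx1}$, $u_{dx2}$. How the layer proportions are chosen is not specified, so `layer_sizes` is supplied by the caller; the function name is hypothetical.

```python
def build_pyramid(population, fitness, layer_sizes):
    """Split the population into pyramid layers P_1 (top) .. P_L (bottom).

    Individuals are sorted by fitness in ascending order (X'); layer u receives
    X'[u_dx1 .. u_dx2] with u_dx2 = u_dx1 + n_u - 1.  `layer_sizes` holds
    n_1..n_L and must sum to N (how the ratios are chosen is left to the caller).
    """
    if sum(layer_sizes) != len(population):
        raise ValueError("layer sizes must sum to the population size N")
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    x_sorted = [population[i] for i in order]      # X' in ascending fitness order
    pyramid = []
    u_dx1 = 0                                      # 0-based counterpart of u_dx1 = 1
    for n_u in layer_sizes:
        u_dx2 = u_dx1 + n_u - 1                    # index of the layer's last member
        pyramid.append(x_sorted[u_dx1:u_dx2 + 1])  # inclusive slice X'[u_dx1 : u_dx2]
        u_dx1 = u_dx2 + 1                          # next layer starts right after
    return pyramid
```

Because the claim sorts in ascending order and puts the first individuals on top, this matches a minimization setting; if the objective is maximized (as with the Hamming-distance sum of claim 2), one option is to pass negated fitness values.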
6. The method for constructing a DNA storage encoding set based on an improved firefly search algorithm according to claim 5, wherein the method for implementing the competition and collaboration strategy is as follows:
firefly individuals in layer $l$ are updated through intra-layer cooperation as follows:
where, among the decision parameters, $x_{l,i}(t)$ denotes the $i$-th firefly individual in layer $l$ at iteration $t$, $x_{l,best}(t)$ denotes the best individual in layer $l$ at iteration $t$, $x_{l,j}(t)$ denotes the $j$-th firefly individual in layer $l$ at iteration $t$, and $x_{l,i}(t+1)$ denotes the updated $i$-th firefly individual in layer $l$; $\beta$ is the attractiveness between the two individuals; $\alpha_B$ is the random step size of Brownian motion and $\alpha_L$ is the random step size of the Lévy flight; the sign function provides a random direction; the remaining two random terms denote the Lévy flight and Brownian motion, respectively; the learning factor $c_2$ is updated over the iterations, the learning factor $c_1 = 1 - c_2$, and $T$ is the maximum number of iterations; $rand_1$ is a random number uniformly distributed on $[0, 1]$, $rand_2$ is a random number uniformly distributed on $[-1, 1]$, and $rand_3$ is a random number uniformly distributed on $[-2, 2]$; the product symbol denotes element-wise multiplication; $rand$ is a random number uniformly distributed on $[0, 1]$;
the brightest firefly individual in each layer is updated as:
$x_{l,best} = r_1 \cdot x_{l,best} + r_2 \cdot x_{l-1,best} + r_3 \cdot x_{1,best}$
where $r_1$, $r_2$ and $r_3$ are constants in $[0, 1]$.
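The update of each layer's best individual can be sketched as an element-wise weighted sum. The rule needs a layer $l-1$, so it is applied to layers 2 to $L$ only; whether layers are updated from old or freshly updated bests is not stated, and the sketch assumes old values. Function names are hypothetical.

```python
def update_layer_best(x_l_best, x_prev_best, x_top_best, r1, r2, r3):
    """x_{l,best} <- r1*x_{l,best} + r2*x_{l-1,best} + r3*x_{1,best} (element-wise).

    Requires layer index l >= 2 so that layer l-1 exists; r1, r2, r3 in [0, 1].
    """
    if not all(0.0 <= r <= 1.0 for r in (r1, r2, r3)):
        raise ValueError("r1, r2 and r3 must lie in [0, 1]")
    return [r1 * a + r2 * b + r3 * c
            for a, b, c in zip(x_l_best, x_prev_best, x_top_best)]


def update_all_layer_bests(bests, r1, r2, r3):
    """Apply the rule to layers 2..L; bests[0] is the top layer (l = 1).

    Every layer uses the pre-update bests of layers l-1 and 1 (an assumption:
    the claim does not state whether layers are updated in place).
    """
    old = [list(b) for b in bests]
    return [old[0]] + [update_layer_best(old[l], old[l - 1], old[0], r1, r2, r3)
                       for l in range(1, len(old))]
```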
7. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 6, wherein the random model of Brownian motion at point $x$ is:
$$f_B(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$
where the step size of the particle motion is computed from the probability density of the Gaussian distribution with zero mean $\mu = 0$ and unit variance $\sigma^2 = 1$;
random numbers following the Lévy distribution are generated by the Mantegna method, where $x$ and $y$ are two normally distributed variables and $y = \mathrm{Normal}(0, 1)$.
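The two random step generators of claim 7 can be sketched as below. The Lévy part follows the standard Mantegna recipe, step $= x / |y|^{1/\beta}$ with $x \sim N(0, \sigma_x^2)$ and $y \sim N(0, 1)$; the exponent $\beta = 1.5$ is an assumed default because the claim's parameter values are not reproduced here.

```python
import math
import random


def brownian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density f_B(x; mu, sigma) used as the Brownian-motion model."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def brownian_steps(dim, rng=random):
    """One Brownian step per dimension, sampled from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def levy_steps(dim, beta=1.5, rng=random):
    """Levy-flight steps via Mantegna's method: x / |y|**(1/beta),
    with x ~ N(0, sigma_x**2) and y ~ N(0, 1).  beta = 1.5 is an assumed default."""
    sigma_x = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [rng.gauss(0.0, sigma_x) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)
            for _ in range(dim)]
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.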
8. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 7, wherein firefly individual $x_j$ adjusts its position in the solution space according to the following position update:
where $x_j(t+1)$ is the position of firefly individual $x_j$ at iteration $t+1$; $t$ is the current iteration number; the step factor $\alpha$ is a constant taken from $[0, 1]$; $rand$ is a random number uniformly distributed on $[0, 1]$;
in the attractiveness of firefly individual $x_i$ to firefly individual $x_j$, $\beta_0$ denotes the attractiveness of the light source to fireflies;
the brightness of firefly individual $x_i$ relative to firefly individual $x_j$ is:
where $I_i$ is the absolute brightness of firefly individual $x_i$; $\gamma$ is the light absorption coefficient, and $r_{ij}$ is the Euclidean distance from firefly individual $x_i$ to firefly individual $x_j$.
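The exact update formulas of claim 8 are not reproduced above, so the sketch below assumes the standard firefly-algorithm form that matches the listed symbols: attractiveness $\beta(r) = \beta_0 e^{-\gamma r^2}$ and movement $x_j(t+1) = x_j(t) + \beta(r_{ij})(x_i - x_j) + \alpha(rand - 0.5)$. Treat it as an illustration of the classic update, not necessarily the claimed one; it uses `math.dist` (Python 3.8+).

```python
import math
import random


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Standard firefly attractiveness: beta(r) = beta0 * exp(-gamma * r**2)."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(x_j, x_i, alpha=0.2, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly x_j toward a brighter firefly x_i (standard FA form):
    x_j(t+1) = x_j(t) + beta(r_ij) * (x_i - x_j) + alpha * (rand - 0.5)."""
    r_ij = math.dist(x_i, x_j)                   # Euclidean distance r_ij
    beta = attractiveness(r_ij, beta0, gamma)
    return [xj + beta * (xi - xj) + alpha * (rng.random() - 0.5)
            for xj, xi in zip(x_j, x_i)]
```

With `gamma=0` the attractiveness stays at `beta0` regardless of distance, which is a quick way to see the deterministic part of the move.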
9. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to any one of claims 1 to 3 and 5 to 8, wherein the direction adjustment mechanism is implemented as follows:
the number of times Num that the population has fallen into a local optimum is used as a counter and initialized to 0;
the population is regarded as trapped in a local optimum when the best position of the firefly individuals does not improve within one iteration, i.e., when $f(x_{l,best})^t - f(x_{l,best})^{t-1} = 0$; the counter is then updated as Num = Num + 1, where $f(x_{l,best})^t$ is the fitness value of the best position of the best firefly individual $x_{l,best}$ at iteration $t$;
the probability that the population adjusts its search direction is called the adjustment probability;
when the adjustment probability is smaller than the generated random number, the population continues to learn from the current best individual $x_{l,best}$;
when the adjustment probability is greater than the generated random number, the population adjusts its search direction by learning from the candidate individual.
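The stagnation counter and the adjust-or-keep decision of claim 9 can be sketched as a small class. The adjustment-probability formula is not reproduced in the text, so `prob_fn` is a hypothetical caller-supplied function of Num; the comparison follows claim 9 (adjust when the probability exceeds the random number).

```python
import random


class DirectionAdjuster:
    """Stagnation counter Num plus the adjust-or-keep decision of claim 9.

    `prob_fn` maps Num to the adjustment probability; its actual formula is not
    reproduced in the text, so it is a caller-supplied (hypothetical) function.
    """

    def __init__(self, prob_fn, rng=random):
        self.num = 0               # Num: iterations in which the best did not improve
        self.prev_best = None      # f(x_best) from the previous iteration
        self.prob_fn = prob_fn
        self.rng = rng

    def step(self, best_fitness):
        """Record this iteration's best fitness and return True when the
        population should learn from a candidate individual (p_adjust > rand)."""
        if self.prev_best is not None and best_fitness - self.prev_best == 0:
            self.num += 1          # f^t - f^{t-1} == 0: treat as trapped
        self.prev_best = best_fitness
        return self.prob_fn(self.num) > self.rng.random()
```

A plausible but purely illustrative choice is a probability that grows with Num, for example `lambda num: 1 - math.exp(-num / 10)`.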
10. The method for constructing a DNA storage code set based on an improved firefly search algorithm according to claim 9, wherein the candidate individual is generated not only from the global best individual $x_{l,best}$ of the previous population but also from the good structure of other fireflies; the candidate individual is generated as follows:
where $Candidate_d$ is the value of the generated candidate individual in dimension $d$; $x_k^d$ and $x_m^d$ are the values in dimension $d$ of two randomly selected firefly individuals; $Gaussian(\sigma_d)$ is a Gaussian offset computed from the standard deviation; $f(x_{l,k})$ and $f(x_{l,m})$ are the fitness values of individuals $k$ and $m$, respectively; $x_i^d$ is the value of the $i$-th individual in dimension $d$; $Average_d$ is the population mean in dimension $d$; $N$ is the number of individuals in the population; and $\sigma_d$ is a standard-deviation variable reflecting the distribution of individuals in the firefly subpopulation.
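The per-dimension statistics named in claim 10 ($Average_d$, $\sigma_d$ over $N$ individuals) can be computed directly. The claimed combination rule for $Candidate_d$ is not reproduced, so `make_candidate` below is a hypothetical stand-in: it picks two random fireflies $k$ and $m$, starts from the fitter one, and adds a $Gaussian(\sigma_d)$ offset. Dividing by $N$ (not $N-1$) in $\sigma_d$ is also an assumption.

```python
import math
import random


def dimension_stats(population):
    """Average_d and sigma_d per dimension over the N individuals.
    sigma_d uses the population form (divide by N), an assumption."""
    n = len(population)
    dims = len(population[0])
    avg = [sum(ind[d] for ind in population) / n for d in range(dims)]
    sigma = [math.sqrt(sum((ind[d] - avg[d]) ** 2 for ind in population) / n)
             for d in range(dims)]
    return avg, sigma


def make_candidate(population, fitness, minimize=True, rng=random):
    """Hypothetical candidate generator: choose two random fireflies k and m,
    start from the fitter one, and add a Gaussian offset N(0, sigma_d) in each
    dimension d.  This combination rule is illustrative, not the claimed formula."""
    _, sigma = dimension_stats(population)
    k, m = rng.sample(range(len(population)), 2)
    if minimize:
        better = k if fitness[k] <= fitness[m] else m
    else:
        better = k if fitness[k] >= fitness[m] else m
    return [population[better][d] + rng.gauss(0.0, sigma[d])
            for d in range(len(sigma))]
```

When the population has collapsed onto one point, every $\sigma_d$ is zero and the candidate coincides with it, which is exactly the situation the direction adjustment mechanism is meant to detect beforehand.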
CN202310811122.8A 2023-07-04 2023-07-04 DNA storage coding set construction method based on improved firefly search algorithm Active CN116842730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310811122.8A CN116842730B (en) 2023-07-04 2023-07-04 DNA storage coding set construction method based on improved firefly search algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310811122.8A CN116842730B (en) 2023-07-04 2023-07-04 DNA storage coding set construction method based on improved firefly search algorithm

Publications (2)

Publication Number Publication Date
CN116842730A true CN116842730A (en) 2023-10-03
CN116842730B CN116842730B (en) 2024-04-26

Family

ID=88173848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310811122.8A Active CN116842730B (en) 2023-07-04 2023-07-04 DNA storage coding set construction method based on improved firefly search algorithm

Country Status (1)

Country Link
CN (1) CN116842730B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105635006A (en) * 2016-01-12 2016-06-01 南京信息工程大学 Wavelet blind equalization method based on DNA glowworm swarm optimization algorithm
CN113225449A (en) * 2021-05-27 2021-08-06 郑州轻工业大学 Image encryption method based on chaos sequence and DNA coding

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
NIU YING, SHEN CHAONAN, AND ZHANG XUNCAI: "Design of logic circuits based on metallo-toehold strand displacement", 《JOURNAL OF NANOELECTRONICS AND OPTOELECTRONICS》, 31 December 2019 (2019-12-31), pages 232 - 237 *
NIU, YING,ET AL.: "Improved Multi-Objective Particle Swarm Optimization Algorithm for DNA Sequence Design", 《JOURNAL OF NANOELECTRONICS AND OPTOELECTRONICS》, 31 December 2020 (2020-12-31), pages 1450 - 1459 *
XUNCAI ZHANG ET AL.: "An Image Encryption Method Based on the Feistel Network and Dynamic DNA Encoding", 《IEEE PHOTONICS JOURNAL》, 31 August 2018 (2018-08-31), pages 1 - 15 *
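姜婷: "Research on distribution center location based on an improved firefly algorithm", 《Journal of Yichun University》, 30 September 2016 (2016-09-30), pages 36 - 39 *
张强 et al.: "DNA sequence set design based on a dynamic genetic algorithm", 《Chinese Journal of Computers》, 31 December 2008 (2008-12-31), pages 2193 - 2199 *
牛莹, 张勋才: "Image encryption algorithm based on variable-step Josephus traversal and DNA dynamic encoding", 《Journal of Electronics & Information Technology》, no. 06, 15 June 2020 (2020-06-15), pages 84 - 92 *
王晓静, 彭虎, 邓长寿, 黄海燕, 张艳, 谭旭杰: "Firefly algorithm based on uniform local search and variable step size", 《Journal of Computer Applications》, no. 03, 10 March 2018 (2018-03-10), pages 107 - 113 *
郭业才, 陆璐, 李晨: "Research on a two-dimensional image blind restoration algorithm based on novel DNA genetic firefly optimization", 《Journal of Electronic Measurement and Instrumentation》, no. 11, 15 November 2017 (2017-11-15), pages 111 - 116 *
龙文, 蔡绍洪, 焦建军, 陈义雄, 黄亚飞: "Firefly algorithm for solving constrained optimization problems and its engineering applications", 《Journal of Central South University (Science and Technology)》, no. 04, 26 April 2015 (2015-04-26), pages 92 - 99 *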

Also Published As

Publication number Publication date
CN116842730B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Lanctot et al. A unified game-theoretic approach to multiagent reinforcement learning
CN109639760B (en) It is a kind of based on deeply study D2D network in cache policy method
Meng et al. HARD-DE: Hierarchical archive based mutation strategy with depth information of evolution for the enhancement of differential evolution on numerical optimization
Wang et al. Stud krill herd algorithm
CN108465244B (en) AI method for parameter configuration, device, equipment and storage medium for racing class AI model
Li et al. A survey on firefly algorithms
WO2019154215A1 (en) Robot running path generation method, computing device and storage medium
CN104166630A (en) Method oriented to prediction-based optimal cache placement in content central network
CN103116693A (en) Hardware and software partitioning method based on artificial bee colony
CN113204417A (en) Multi-satellite multi-point target observation task planning method based on improved genetic and firefly combined algorithm
CN111008685A (en) Improved artificial ecosystem optimization algorithm based on producer probability-dependent reverse regeneration mechanism
CN113722980A (en) Ocean wave height prediction method, system, computer equipment, storage medium and terminal
CN110442758A (en) A kind of figure alignment schemes, device and storage medium
CN113255873A (en) Clustering longicorn herd optimization method, system, computer equipment and storage medium
CN116842730B (en) DNA storage coding set construction method based on improved firefly search algorithm
CN102254225B (en) Evolvable hardware implementation method based on trend-type compact genetic algorithm
Fan et al. Gdi: Rethinking what makes reinforcement learning different from supervised learning
Li et al. An improved binary quantum-behaved particle swarm optimization algorithm for knapsack problems
Chen et al. A novel marine predators algorithm with adaptive update strategy
Kihel et al. Firefly Optimization Using Artificial Immune System for Feature Subset Selection.
Rasekh et al. EDNC: Evolving differentiable neural computers
CN102323949A (en) Keyword optimization classification method based on fuzzy genetic algorithm
CN112131089B (en) Software defect prediction method, classifier, computer device and storage medium
CN114064235A (en) Multitask teaching and learning optimization method, system and equipment
CN107480768A (en) Bayesian network structure adaptive learning method and device, storage device and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant