AU2020103826A4

AU2020103826A4 - Whale DNA sequence optimization method based on harmony search (HS)

Info

Publication number: AU2020103826A4
Application number: AU2020103826A
Authority: AU
Inventors: Xue LI; Bin Wang; Qiang Zhang
Original assignee: Dalian University
Current assignee: Dalian University
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2021-02-11
Anticipated expiration: 2028-12-01

Abstract

The disclosure provides a whale DNA sequence optimization method based on harmony search (HS), including the following steps: randomly generating an initial population; updating the population using the whale optimization algorithm (WOA); expanding the search range using HS to give a new population; adding a plurality of constraints to reduce the solution space; and sorting the fitness values of the results and outputting an optimal sequence. HS is highly discrete and exhibits a strong ability to search for local optimal solutions. Continuous HS increases the disturbance around an individual, thereby preventing the algorithm from falling into a local optimum and increasing the richness of the population at the same time.

Description

FIG. 1 (flowchart): initialize a population; calculate fitness values; if p < 0.5 and |A| < 1, update using the individual with the optimal fitness; if p < 0.5 and |A| ≥ 1, update using a randomly selected individual; if p ≥ 0.5, update by the spiral method; apply harmony search; use constraints to reduce the solution space; add 1 to the number of iterations; repeat from the fitness calculation until the maximum number of iterations is reached; end.

WHALE DNA SEQUENCE OPTIMIZATION METHOD BASED ON HARMONY SEARCH (HS)

TECHNICAL FIELD

The disclosure belongs to the field of coding design in DNA computing and, in particular, relates to a whale DNA sequence optimization method based on harmony search (HS).

BACKGROUND

In 1994, Professor Adleman of the University of Southern California solved a seven-vertex Hamiltonian path problem by using DNA molecules as the computing medium. This work showed how biotechnology can be combined with computing and opened up the field of DNA computing. Because DNA molecules can be uniquely identified during encoding and obey the principle of complementary base pairing, a problem to be solved can be mapped onto a collection of DNA molecules. A biochemical reaction is conducted on the generated DNA molecules to produce the full solution space of the problem, and the solution is then separated and extracted from that space. Under the principle of complementary base pairing, base A pairs with base T and base C pairs with base G. The biochemical reaction usually relies on hybridization of DNA molecules, and complete hybridization of the encoded DNA molecules is the key to solving the problem accurately and reliably. Designing high-quality DNA molecular sequences is therefore an important task.

SUMMARY

By combining a whale algorithm with the harmony search algorithm (HSA), a whale DNA sequence optimization method based on HS is provided. Compared with existing methods, this method adds a new constraint: pairing. Simulation of this method yields DNA sequences of higher quality.

In order to achieve the above object, the disclosure adopts the following technical solution: a whale DNA sequence optimization method based on HS, which obtains an optimized DNA sequence that meets a plurality of constraints.
The method includes the following steps: randomly generating an initial population; updating the population using the whale optimization algorithm (WOA); expanding the search range using HS to give a new population; adding a plurality of constraints to reduce the solution space; and sorting the fitness values of the results and outputting an optimal sequence. The specific steps are as follows:

step 1: randomly generating an initial population, and initializing parameters, where Maxiter is the maximum number of iterations and t is the current number of iterations;
step 2: calculating fitness values for the current population, taking the individual with the minimum sum of fitness values as the current optimal whale, and recording position information thereof;
step 3: randomly generating a variable that changes in each iteration;
step 4: judging the variable p (p ∈ [0,1]), and if p < 0.5, proceeding to step 5, otherwise proceeding to step 6;
step 5: determining whether |A| of this iteration is less than 1, and if it is less than 1, using the current optimal whale to update positions of the remaining whales, otherwise randomly selecting a whale to update positions of other whales;
step 6: using the current optimal whale to update positions of other whales by a spiral formula;
step 7: subjecting all existing populations to HS, and expanding the search range to give a new outstanding population;
step 8: evaluating all populations using constraints, deleting whales that do not meet the constraints, and using a fast non-dominated sorting approach to select the same number of whales as in the initial population;
step 9: adding 1 to the number of iterations and determining whether the maximum number of iterations is reached, and if it is not reached, proceeding to step 2, otherwise proceeding to step 10; and
step 10: sorting the fitness values of the results and outputting the results to give an optimized population.

Through the above method, the disclosure can achieve the following effects: 1.
When fitness values are calculated for the initial population, the minimum of the sum of individual fitness values is taken as the current optimum and recorded, so that individuals can better adapt to the environment as the whale algorithm population evolves, thereby enabling comprehensive optimization.
2. The whale algorithm has the disadvantage of easily falling into a local optimum. HS, however, is highly discrete and exhibits a strong ability to search for local optimal solutions. Continuous HS increases the disturbance around an individual, thereby preventing the algorithm from falling into a local optimum and increasing the richness of the population at the same time.
3. The whale DNA sequence optimization algorithm based on HS provided by the disclosure can be used to acquire high-quality DNA coding sequences.

BRIEF DESCRIPTION OF DRAWING

FIG. 1 is a flowchart for implementing the disclosure.

DETAILED DESCRIPTION

The disclosure will be described in detail below by way of non-limiting example only, with reference to the accompanying drawing.

The disclosure adopts the following 8 constraints: hairpin structure, H-measure, continuity, similarity, Hamming distance, melting temperature, GC content, and a newly proposed constraint: pairing. The first four are adopted as the objective function, and the rest are adopted as constraints. They are used in step 2 of the claims to calculate the fitness of each individual.

Hairpin structure indicates that a DNA strand folds onto itself due to self-complementarity. H-measure represents the number of complementary base pairs in two complementary sequences, and is used to limit unwanted hybridization among sequences. Continuity means that the number of consecutive identical bases in a specified interval of a DNA sequence should be limited to a specific threshold.
Similarity refers to the probability that two DNA sequences in the same direction have the same base at corresponding positions. Hamming distance refers to the number of positions at which two different DNA sequences have different bases. Melting temperature indicates the temperature at which half of the DNA molecules change from a double-stranded state to a single-stranded state. GC content indicates the percentage of bases G (guanine) and C (cytosine) among all bases in a DNA strand, which is set to 50% in the disclosure. Pairing means that any 3 consecutive bases are regarded as one unit, and the unit is compared with all the remaining units to see whether they are paired; if they are completely paired, the positions of two different bases in the unit are exchanged.

Example 1

The example of the disclosure is implemented on the premise of the technical solution of the disclosure, and the detailed implementations and specific operation processes are given, but the protection scope of the disclosure is not limited to the following examples.

In the example, the length and dimension for DNA coding were both 20, and the constraints (hairpin structure, H-measure, continuity, similarity, Hamming distance, melting temperature, GC content, and pairing) were as described above.

Step 1: An initial set of DNA coding sequences, with a dimension and a length both of 20, was initialized. Relevant parameters were initialized: the maximum number of iterations Maxiter: 300; the number of iterations t: starting from 0; and the Hamming distance L: 11.

Step 2: Fitness values were calculated for the existing population. The individual with the minimum fitness value was adopted as the current optimal solution, and position information thereof was recorded.

Step 3: The parameter values that change in each iteration were defined.
a was reduced linearly from 2 to 0; r was any random number in [0,1]; l was any random number in [-1,1]; p was any random number in [0,1]; and A was defined according to formula (6):

$$A = 2a \cdot r - a \tag{6}$$

Step 4: The randomly selected p was judged; if p < 0.5, the method proceeded to step 5, otherwise it proceeded to step 6.

Step 5: It was determined whether |A| of this iteration was less than 1. If it was less than 1, the current optimal whale was used to update the positions of the remaining whales according to formula (7); otherwise, a whale was randomly selected to update the other whales according to formula (8):

$$Z(t+1) = X^{*}(t) - A \cdot D \tag{7}$$

where $D = |C \cdot X^{*}(t) - X(t)|$, $C = 2r$, $X^{*}(t)$ is the position of the current optimal whale, and $X(t)$ is the current position of the whale being updated;

$$Z(t+1) = X_{rand} - A \cdot D \tag{8}$$

where $D = |C \cdot X_{rand} - X(t)|$, and $X_{rand}$ represents a randomly selected whale position.

Step 6: The current optimal whale was used to update the positions of other whales by the spiral formula:

$$Z(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t) \tag{9}$$

where $D' = |X^{*}(t) - X(t)|$ and b is the constant that defines the shape of the spiral.

Step 7: All populations were subjected to HS. In the HS, a voice was transformed according to the current timbre, the transformed voice was compared with the original voice, and the sound with the better timbre was retained, thereby acquiring a new outstanding population.

Step 8: All populations were evaluated using the constraints; whales that did not meet the constraints were deleted, and whales that met the constraints were retained. A fast non-dominated sorting approach was adopted to pick the top 20 whales for the next iteration. If fewer than 20 whales met the constraints, all of them entered the next iteration.

Step 9: 1 was added to the number of iterations and it was determined whether the maximum number of iterations was reached; if not, the method proceeded to step 2, otherwise it proceeded to step 10.
Step 10: Fitness values of results were sorted and the results were output to give an optimized population.

Table 1 Initial DNA sequences

AATATCCGAAATCCCCTGCT
GCTTAAGATTTGTGAATGAT
CGAAATCTTAATAGACACGA
GGAAAAAATTATCATCAAAT
AATGGCATCATGGCGACTTA
CATGTAGCCTGGATACTTTG
TATAATCTGTGTTTAAAATA
ATACAATCGAACAAGATGGT
GAACTATTAAATTCAAATCA
GTATAAATCCATTAACAGGC
CATGCTTTTGTAGCTATTAG
GCATCCGCTATTAATATATG
GTAAATTGATTAATGGCATA
TCCTAGTGCTTATACAGATT
ACCGGCCTCTGTTATATATA
CATTCACTTTTCAGTTCACC
TAGTAAGTAATTAGGTTTAT
GTTTTGAGAATTTTAATCTT
AGAACCATAATTTAACTGTC
GCTTTTCCTATAGTTCTTAT
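Step 1 and Table 1 describe a randomly generated initial population of 20 sequences, each 20 bases long. A minimal Python sketch of such an initialization follows; the function name `random_population`, the uniform choice of bases, and the `seed` parameter are assumptions made for illustration, since the disclosure does not state how the random sequences are drawn.

```python
import random

BASES = "ACGT"


def random_population(size=20, length=20, seed=None):
    """Return `size` random DNA strings, each `length` bases long.

    Bases are drawn uniformly from A, C, G and T (an assumption).
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(size)]


if __name__ == "__main__":
    for sequence in random_population(seed=0):
        print(sequence)
```

Passing a fixed `seed` makes a run repeatable, which is convenient when comparing optimization results.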

Table 2 A set of the optimal DNA sequences

CTCGTCTAACCTTCTTCAGC
CTGTGTGGAATGCAAGGATG
CGAGCGTAGTGTAGTCATCA
AATTACAGGCCACCACCGAT
CAGTAGCAGTCATAACGAGC
GCATAGCACATCGTAGCGTA
TGGACCTTGAGAGTGGAGAT

The disclosure provides a whale DNA sequence optimization method based on HS, where an initial population is searched with a whale algorithm. The populations are screened based on constraints such as hairpin structure, H-measure, continuity, similarity, Hamming distance, melting temperature, GC content, and pairing, and finally the top 7 of the selected sequences are output as the final outstanding set. The disclosure runs under the environment of Win10, Intel(R) CPU 2.70 GHz, RAM 8.00 GB, and simulation is conducted with MATLAB 2018a. An experiment shows that the optimized sequence obtained in this example is better than sequences obtained from other algorithms.

In this specification, any reference to the prior art is not intended to be, and is not to be construed as, any admission, implication or suggestion that the prior art forms part of the common general knowledge in Australia or anywhere else.

In this specification, the word "comprising" has the non-limiting meaning of "including at least" and not a limiting meaning such as "including only". The same applies, with necessary changes made, to other grammatical forms of the word such as "comprise", "comprised", and "comprises".

Although the invention is described above by way of specific disclosure and example, it is not limited thereto but may include other embodiments and examples limited only by the claims.
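Three of the constraints defined in the detailed description (GC content, Hamming distance, and continuity) can be checked with a few lines of code. The Python sketch below is an illustration only: the function names are hypothetical, the Hamming distance follows the plain position-by-position definition given in the description, and continuity is reduced to the length of the longest run of identical bases, which is an assumption about how the threshold would be applied. Sequences are assumed to be non-empty.

```python
TABLE_2 = [
    "CTCGTCTAACCTTCTTCAGC",
    "CTGTGTGGAATGCAAGGATG",
    "CGAGCGTAGTGTAGTCATCA",
    "AATTACAGGCCACCACCGAT",
    "CAGTAGCAGTCATAACGAGC",
    "GCATAGCACATCGTAGCGTA",
    "TGGACCTTGAGAGTGGAGAT",
]


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def longest_run(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1
    for previous, current in zip(seq, seq[1:]):
        run = run + 1 if previous == current else 1
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    for seq in TABLE_2:
        print(seq, gc_content(seq), longest_run(seq))
```

Applied to the seven sequences of Table 2, `gc_content` returns 0.5 for each, which agrees with the 50% GC content stated in the disclosure.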

Claims

What is claimed is:

1. A whale DNA sequence optimization method based on harmony search (HS), comprising the following specific steps:
step 1: randomly generating an initial population;
step 2: calculating fitness values for the current population, taking the individual with the minimum sum of fitness values as the current optimal whale, and recording position information thereof;
step 3: randomly generating a variable that changes in each iteration;
step 4: determining whether the variable is less than a set value, if so, proceeding to step 5, and if not, proceeding to step 6;
step 5: determining whether the modulus |A| of a coefficient vector A of the iteration is less than 1, and if it is less than 1, using the current optimal whale to update positions of the remaining whales, otherwise randomly selecting a whale to update positions of other whales;
step 6: using the current optimal whale to update positions of other whales by a spiral formula;
step 7: adding HS, and expanding the search range to give a new outstanding population;
step 8: evaluating all populations using constraints, deleting whales that do not meet the constraints, and using a fast non-dominated sorting approach to select the same number of whales as in the initial population;
step 9: adding 1 to the number of iterations and determining whether the maximum number of iterations is reached, and if it is not reached, proceeding to step 2, otherwise proceeding to step 10; and
step 10: sorting the fitness values of the results and outputting the results to give an optimized population.
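Step 7 names harmony search without giving its parameters. As a hedged illustration, the Python sketch below shows one improvisation step of a textbook harmony search applied to DNA strings: each base of a new sequence is either taken from the existing population (memory consideration), optionally changed to a different base (pitch adjustment), or drawn at random, and the new sequence replaces the worst member only if its fitness is lower. The parameter names `hmcr` and `par`, their values, and the toy fitness function are assumptions and are not taken from the disclosure.

```python
import random

BASES = "ACGT"


def improvise(memory, rng, hmcr=0.9, par=0.3):
    """Build one new sequence from the harmony memory (a list of DNA strings)."""
    new_bases = []
    for k in range(len(memory[0])):
        if rng.random() < hmcr:
            base = rng.choice(memory)[k]  # memory consideration
            if rng.random() < par:  # pitch adjustment: switch to another base
                base = rng.choice([b for b in BASES if b != base])
        else:
            base = rng.choice(BASES)  # random selection
        new_bases.append(base)
    return "".join(new_bases)


def harmony_search_step(memory, fitness, rng, hmcr=0.9, par=0.3):
    """Return a copy of `memory` in which the worst member is replaced if the new harmony is better."""
    candidate = improvise(memory, rng, hmcr, par)
    worst_index = max(range(len(memory)), key=lambda i: fitness(memory[i]))
    updated = list(memory)
    if fitness(candidate) < fitness(memory[worst_index]):
        updated[worst_index] = candidate
    return updated


def toy_fitness(seq):
    """Toy objective (lower is better): number of adjacent identical bases."""
    return sum(a == b for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    memory = ["AAAAAAAA", "ACGTACGT", "CCGGTTAA"]
    for _ in range(50):
        memory = harmony_search_step(memory, toy_fitness, rng)
    print(memory)
```

Because a member is replaced only by a strictly better candidate, the total fitness of the memory never increases from one step to the next.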
2. The whale DNA sequence optimization method based on HS according to claim 1, wherein the calculation formula for updating whale positions is as follows:

$$Z(t+1) = \begin{cases} X^{\#} - A \cdot D, & \text{if } p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & \text{if } p \ge 0.5 \end{cases} \tag{1}$$

wherein $A = 2a \cdot r - a$, a is reduced linearly from 2 to 0, r is any random number in [0,1], l is any random number in [-1,1], and p is any random number in [0,1]; $X^{*}(t)$ represents the position of the current optimal whale; $X^{\#}$ is defined in formula (2):

$$X^{\#} = \begin{cases} X^{*}(t), & |A| < 1 \\ X_{rand}, & |A| \ge 1 \end{cases} \tag{2}$$

wherein $X_{rand}$ represents a randomly selected whale position; D is defined in formula (3):

$$D = \begin{cases} |C \cdot X^{*}(t) - X(t)|, & |A| < 1 \\ |C \cdot X_{rand} - X(t)|, & |A| \ge 1 \end{cases} \tag{3}$$

D' is defined in formula (4):

$$D' = |X^{*}(t) - X(t)| \tag{4}$$
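The position update of formulas (1) to (4) can be sketched in Python for real-valued position vectors as follows. Several points are assumptions made for illustration: C = 2r as in the standard WOA, the spiral constant b = 1, scalar values of r and l shared across all dimensions, and the function name `woa_update`. How a real-valued position is mapped to a DNA base is not shown here.

```python
import math


def woa_update(x, x_best, x_rand, a, r, l, p, b=1.0):
    """Return the new position Z(t+1) of one whale.

    x, x_best and x_rand are equal-length lists of floats: the current whale,
    the current optimal whale, and a randomly selected whale.
    """
    A = 2 * a * r - a
    C = 2 * r  # assumption: standard WOA coefficient
    if p < 0.5:
        # Formula (2): reference is the best whale if |A| < 1, else a random whale.
        reference = x_best if abs(A) < 1 else x_rand
        # Formula (3) for D, then the first branch of formula (1).
        return [ref_k - A * abs(C * ref_k - x_k) for ref_k, x_k in zip(reference, x)]
    # Formula (4) for D', then the spiral branch of formula (1).
    spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
    return [abs(best_k - x_k) * spiral + best_k for best_k, x_k in zip(x_best, x)]


if __name__ == "__main__":
    print(woa_update([1.0, 2.0], [3.0, 4.0], [9.0, 9.0], a=1.0, r=0.25, l=0.5, p=0.7))
```

With a = 0 the coefficient A vanishes and the first branch returns the position of the current optimal whale, which reflects the shrinking behaviour as a decreases from 2 to 0.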
3. The whale DNA sequence optimization method based on HS according to claim 1, wherein a new population constraint is added, and specifically: for a sequence x and a reverse sequence y thereof, 3 consecutive bases are regarded as a unit, namely, $x' = (x_i, x_{i+1}, x_{i+2})$ and $y' = (y_j, y_{j+1}, y_{j+2})$, for $\forall\, i, j \in [1, n-2]$, which is subject to function (5):

$$f_{pair}(x) = \begin{cases} pair(x'), & subcb(x', y', k) = 3 \\ x', & subcb(x', y', k) \ne 3 \end{cases} \tag{5}$$

wherein the function subcb() counts the complementary base pairs between x' and y'; when subcb() = 3, the bases in the two units are completely paired, and any two different bases in x' then exchange positions.
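The pairing constraint of claim 3 can be illustrated with a short Python sketch. It rests on several assumptions: the "reverse sequence" is taken to be the string read backwards, subcb() is taken to count position-wise complementary bases between two 3-base units (the parameter k is omitted because the claim does not define it), and the exchange always swaps the first two differing bases in the unit. The function names other than `subcb` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def subcb(unit_x, unit_y):
    """Count position-wise complementary bases between two units."""
    return sum(COMPLEMENT[a] == b for a, b in zip(unit_x, unit_y))


def swap_two_different_bases(unit):
    """Exchange the first two positions of `unit` that hold different bases."""
    bases = list(unit)
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            if bases[i] != bases[j]:
                bases[i], bases[j] = bases[j], bases[i]
                return "".join(bases)
    return unit  # all bases identical: nothing to exchange


def pairing_pass(seq):
    """One left-to-right pass over all 3-base units of `seq`.

    A unit that is completely paired with some unit of the reversed
    sequence has two of its bases exchanged.
    """
    bases = list(seq)
    n = len(bases)
    for i in range(n - 2):
        reverse = bases[::-1]  # recomputed because earlier swaps change the sequence
        unit_x = "".join(bases[i:i + 3])
        for j in range(n - 2):
            if subcb(unit_x, "".join(reverse[j:j + 3])) == 3:
                bases[i:i + 3] = swap_two_different_bases(unit_x)
                break
    return "".join(bases)


if __name__ == "__main__":
    print(pairing_pass("ACGT"))
```

A single pass changes only the order of bases, so the base composition (and therefore the GC content) of the sequence is preserved; it does not guarantee that every fully paired unit is removed, so repeated passes may be needed.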