CN113010986B - Antenna array design method based on reinforcement learning and random optimization algorithm

Info

Publication number: CN113010986B
Application number: CN202110284035.2A
Authority: CN
Inventors: 陈晓江; 赵宇航; 王夫蔚; 王基; 房鼎益
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2023-02-14
Anticipated expiration: 2041-03-17
Also published as: CN113010986A

Abstract

The invention discloses an antenna array design method based on reinforcement learning and a random optimization algorithm, comprising the following steps: step 1, selecting a random optimization algorithm and establishing an initial antenna array model to be optimized; step 2, designing a fitness function; step 3, writing VB scripts in MATLAB; step 4, creating a plurality of HFSS threads to process the VB scripts in parallel and obtain the results corresponding to the random particles; step 5, generating multiple groups of xt by iteration; step 6, using the multiple groups of xt to obtain the optimal solutions of all fitness functions, which serve as a training set; step 7, designing a Q-Learning algorithm and training it on the training set to obtain a Q table; and step 8, inputting the new cache reward and the Q table into the Q-Learning algorithm to obtain the optimal result for the antenna array to be designed. The method requires no additional human intervention and relies entirely on the reinforcement learning agent to judge the antenna optimization result, so that its overall efficiency is improved by more than 50% compared with the traditional antenna optimization design approach.

Description

Antenna array design method based on reinforcement learning and random optimization algorithm

Technical Field

The invention belongs to the field of antenna array optimization application, and relates to an antenna array design method based on reinforcement learning and random optimization algorithm.

Background

With the development of wireless communication technology, communication devices have entered countless households and become part of people's daily life. The antenna is an important medium in a wireless communication system, and its performance directly affects the transmission quality of the whole system. A single antenna can radiate electromagnetic waves and transmit signals wirelessly, but it falls short when specific functions or a large improvement in performance are required. An array antenna can make up for these shortcomings: it greatly increases the radiation gain and supports functions, such as beam forming, that are needed in specific scenarios.

The most obvious advantages of an antenna array are its high gain and directivity. A point-to-point communication system often requires strong directivity, that is, the antenna must concentrate most of its energy toward one point, so that directivity and high gain are achieved together. Directivity is adjusted through the phase of the antenna. For an array, phase adjustment involves many parameters, and manual adjustment becomes too slow when there are many array elements or the optimization target is complex. Random optimization algorithms are therefore commonly used to optimize part of the antenna parameters; common choices include the particle swarm algorithm, the simulated annealing algorithm and the differential evolution algorithm.

Optimizing parameters with an algorithm is much faster than doing so by hand. However, whichever random optimization algorithm is used, it is difficult to express the optimization target of the antenna as an optimal fitness function in a single attempt. During antenna array optimization the designer has to inspect the result of each optimization run and modify the function parameters accordingly, and every run with a modified fitness function takes several hours or even dozens of hours. For multi-parameter, multi-objective optimization in particular, the large number of variables means that many manual attempts may be needed before a good result is reached. Because human intervention is always required to keep adjusting the fitness function, the optimization time is long and unpredictable, and the optimization efficiency is low.

Disclosure of Invention

Aiming at the problem of low efficiency of array optimization design of the traditional optimization algorithm, the invention aims to provide an antenna array design method based on reinforcement learning and random optimization algorithm.

In order to solve the technical problems, the invention adopts the following technical scheme:

an antenna array design method based on reinforcement learning and random optimization algorithm specifically comprises the following steps:

step 1, selecting a random optimization algorithm, establishing an initial antenna array model to be optimized by using HFSS (High Frequency Structure Simulator), and exporting the corresponding radiation and scattering result x from the simulation;

step 2, designing a fitness function $f(x) = a_1 f_1 + a_2 f_2 + \dots + a_n f_n$, and, according to the requirement of matching the orders of magnitude of $f_1$ to $f_n$, determining the candidate values of $a_1$ to $a_n$ as $[a_{1,1}, \dots, a_{1,k}]$, $[a_{2,1}, \dots, a_{2,k}]$, …, $[a_{n,1}, \dots, a_{n,k}]$ respectively;

where x comprises the radiation pattern $x_1$ and the scattering pattern $x_2$ of the array; $f_1$ to $f_n$ are different evaluation functions of x; $a_1$ to $a_n$ are the coefficients of the respective evaluation functions, giving $k^n$ fitness functions in total;

step 3, writing the selected random optimization algorithm in MATLAB, and writing a plurality of VB scripts according to the initial random particles of the random optimization algorithm; then proceeding to step 4.2;

step 4, creating a plurality of HFSS threads to process VB scripts in parallel to obtain a group of radiation and scattering results xt corresponding to random particles, and specifically comprising the following substeps:

step 4.1, writing a plurality of VB scripts according to the random particles of the random optimization algorithm;

step 4.2, creating a plurality of HFSS threads by using the parallel computing function of MATLAB, and processing the VB scripts with the HFSS threads to obtain the radiation and scattering result xt exported by each HFSS thread, wherein each xt corresponds to one particle of the random particles, so that a group of xt in one-to-one correspondence with the current random particles is obtained;

step 5, randomly selecting one fitness function from the set of fitness functions f(x), substituting each xt obtained in step 4 into the selected f(x) for calculation to obtain a new group of xt and hence the group of random particles corresponding to it, taking the new random particles as the random particles of the random optimization algorithm, and returning to and iteratively executing step 4 until the current generation of random particles meets the convergence condition, finally obtaining the multiple groups of xt generated during the iteration;

step 6, substituting the multiple groups of new xt, as x, into each of the fitness functions obtained in step 2, and taking the minimum value of each fitness function as the optimal solution of that fitness function, thereby obtaining $k^n$ optimal solutions in total, one per fitness function, and taking these $k^n$ optimal solutions as a training set;

step 7, designing a Q-Learning algorithm and training it on the training set to obtain the agent optimization strategy Q table;

step 8, using, in the random optimization algorithm, the fitness function corresponding to a state s at a random position of the Q table, optimizing the array to be designed with the random optimization algorithm to obtain the optimized radiation and scattering result x' exported from HFSS, and obtaining a new cache reward tr' at state s from the difference between x' and the target beam; and inputting the new cache reward tr' and the Q table into the Q-Learning algorithm to obtain the optimal result for the antenna array to be designed.

Further, in the step 1, the random optimization algorithm is a particle swarm algorithm, a simulated annealing algorithm or a differential evolution algorithm.

Further, the step 7 specifically includes the following sub-steps:

step 7.1, determining a state s, an action a and a reward r in the Q-Learning algorithm; wherein the states s are the $k^n$ optimal solutions obtained in step 6, an action a is a transition between states s, and there are 2(k-1)n different actions;

generating a cache reward tr: computing the difference between each datum in the state set s and the target beam to obtain all cache reward values tr, and normalizing all the cache reward values tr;

setting a reward r: taking the minimum of the normalized tr and setting the reward r of every action a that transfers to the state with the minimum tr to 100; comparing, for all actions a, the tr of the states before and after the action, and setting the reward r to +1 if the tr after the transition is smaller and to -1 otherwise; and setting the reward r of every action a that points to a state differing greatly from the target beam to -50;

step 7.2, starting from the initial position, applying the state s, the action a and the reward r determined in step 7.1 to search iteratively, step by step, for an optimal solution path, thereby obtaining the agent optimization strategy Q table from the Q-Learning algorithm.

Further, in step 7.1, the generating function of the target beam is as follows:

in the formula, target is the pointing angle required by the design, and any angle between 45 degrees and 135 degrees may be substituted into the function; deg is the angle in degrees, taking values from 0 to 180 in steps of 1, and substituting all of these values into the function yields the target beam.

Further, in step 8, the antenna array to be designed has a similar arrangement to the initial antenna array in step 1.

Further, the similar arrangement means that the antenna array to be designed is arranged similarly to the initial antenna array, including: (1) the number of antennas in the transverse or longitudinal direction is unchanged and only the number in the other dimension is modified; (2) different array elements are used while the arrangement of the antenna array is unchanged.

Compared with the prior art, the method uses reinforcement learning to replace human decision-making in the antenna array optimization process, so that the fitness function parameters that previously required human intervention are set to relatively optimal values automatically through reinforcement learning. The whole antenna design process needs no additional human intervention, and the antenna optimization result is judged entirely by the reinforcement learning agent. Experiments show that, compared with the traditional antenna optimization design approach, the method provided by the invention improves the overall efficiency by more than 50%.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 shows the original 2 × 4 antenna array model and its parameters in the embodiment.

FIG. 3 is a schematic diagram of the 2 × 6 array model to be optimized, shown after optimization.

Fig. 4 is a schematic diagram illustrating the consistency between the optimized result of the array model to be optimized and the designed target beam in the embodiment.

FIG. 5 is a comparison of the time consumption of the conventional optimization method and the reinforcement learning method of the present invention.

The present invention will be explained in further detail with reference to examples.

Detailed Description

The invention relates to an antenna array design method based on reinforcement learning and random optimization algorithm, which comprises the following steps:

The random optimization algorithm can be any conventional random optimization algorithm, such as a particle swarm algorithm, a simulated annealing algorithm or a differential evolution algorithm.

Here x comprises the radiation pattern $x_1$ and the scattering pattern $x_2$ of the array; $f_1$ to $f_n$ are different evaluation functions of x, such as the maximum peak value of the main lobe, the mean value of the side lobes and the width of the main lobe; $a_1$ to $a_n$ are the coefficients of the respective evaluation functions. The coefficients balance the orders of magnitude of the evaluation functions and change the weight each one carries in the overall fitness function. The different combinations of coefficient values therefore give $k^n$ fitness functions in total, and each fitness function leads the random optimization algorithm to a different optimal result.
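The count of $k^n$ fitness functions follows from taking every combination of one candidate value per coefficient. The following Python sketch illustrates that enumeration only; the helper name `build_fitness_family`, the two toy evaluation functions and the candidate coefficient values are assumptions made for the illustration and do not come from the patent.

```python
from itertools import product


def build_fitness_family(eval_funcs, coeff_choices):
    """Return one weighted-sum fitness function per coefficient combination.

    eval_funcs    : list of n callables f_i(x) -> float
    coeff_choices : list of n lists, each holding the k candidate values of a_i
    """
    family = {}
    for coeffs in product(*coeff_choices):
        # Bind coeffs as a default argument so each closure keeps its own weights.
        def fitness(x, coeffs=coeffs):
            return sum(a * f(x) for a, f in zip(coeffs, eval_funcs))
        family[coeffs] = fitness
    return family


# Hypothetical toy setup: n = 2 evaluation functions, k = 3 values per coefficient.
eval_funcs = [max, min]
coeff_choices = [[-0.1, -0.3, -0.5], [1.0, 2.0, 3.0]]
family = build_fitness_family(eval_funcs, coeff_choices)
print(len(family))  # k**n = 3**2 = 9
```

Keying the dictionary by the coefficient tuple makes it easy to treat each tuple as one state later on, which is one possible way to organize the data rather than a requirement of the method.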

Step 3, writing the selected random optimization algorithm in MATLAB, and writing a plurality of VB scripts according to the initial random particles of the random optimization algorithm (the number of VB scripts equals the number of initial random particles); the VB scripts control HFSS to adjust the parameters of the initial antenna array model and export the results. Then proceeding to step 4.2;

Step 4, creating a plurality of HFSS threads to process the VB scripts in parallel to obtain a group of radiation and scattering results xt corresponding to the random particles. The specific sub-steps are as follows:

Step 4.1, writing a plurality of VB scripts according to the random particles of the random optimization algorithm. The VB scripts control HFSS to adjust the parameters of the initial antenna array model and export the results (the number of VB scripts equals the number of random particles);

Step 4.2, creating a plurality of HFSS threads by using the parallel computing function of MATLAB (the number of HFSS threads is determined by the processing capacity of the machine's CPU), and processing the VB scripts with the HFSS threads to obtain the radiation and scattering result xt exported by each HFSS thread, wherein each xt corresponds to one particle of the random particles, so that a group of xt in one-to-one correspondence with the current random particles is obtained.

This step invokes several solver processes in parallel, which speeds up the overall optimization because each generation of the algorithm is solved more quickly.
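The parallel evaluation of one generation can be sketched in Python as follows. The patent itself uses MATLAB's parallel computing function to drive HFSS; here `simulate_particle` is a hypothetical stand-in that returns synthetic numbers so that the sketch runs without HFSS, and the worker count defaulting to the CPU count is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import os


def simulate_particle(particle):
    """Hypothetical stand-in for one solver run driven by a generated script.

    Returns a fake (radiation, scattering) pair so the sketch runs without HFSS.
    """
    radiation = [math.cos(p) for p in particle]
    scattering = [math.sin(p) for p in particle]
    return radiation, scattering


def evaluate_generation(particles, max_workers=None):
    """Evaluate all particles concurrently; results keep the order of `particles`."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_particle, particles))


particles = [[0.1 * i, 0.2 * i] for i in range(8)]
results = evaluate_generation(particles, max_workers=4)
```

Because `pool.map` returns results in input order, the i-th result still belongs to the i-th particle, which preserves the one-to-one correspondence between xt and the random particles described in step 4.2.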

Step 5, randomly selecting one fitness function from the set of fitness functions f(x), substituting each xt obtained in step 4 into the selected f(x) for calculation, and applying the usual screening and transformation to the calculation results (the screening and transformation process is determined by the selected random optimization algorithm) to obtain a new group of xt and hence the group of random particles corresponding to it; taking the new random particles as the random particles of the random optimization algorithm, and returning to and iteratively executing step 4 until the current random particles meet the convergence condition, finally obtaining the multiple groups of xt generated during the iteration;

Step 6, substituting the multiple groups of new xt, as x, into each of the fitness functions obtained in step 2, and taking the minimum value of each fitness function as the optimal solution of that fitness function, thereby obtaining $k^n$ optimal solutions in total, one per fitness function; these $k^n$ optimal solutions are used as the training set for reinforcement learning training;

Step 7, designing a Q-Learning algorithm and training it on the training set to obtain the agent optimization strategy Q table. This comprises the following sub-steps:

Step 7.1, determining a state s, an action a and a reward r in the Q-Learning algorithm; wherein the states s are the $k^n$ optimal solutions obtained in step 6, an action a is a transition between states s, namely a type of change in the coefficients of the fitness function, and there are 2(k-1)n different actions;

the target beam is generated by the function g(deg):

in the formula, target is the pointing angle required by the design, and any angle between 45 degrees and 135 degrees may be substituted into the function; deg is the angle in degrees, taking values from 0 to 180 in steps of 1, and substituting all of these values into the function yields the target beam trans;

setting a reward r: taking the minimum of the normalized tr and setting the reward r of the action a that transfers to the state with the minimum tr to 100; comparing, for all actions a, the tr of the states before and after the action, and setting the reward r to +1 if the tr after the transition is smaller and to -1 otherwise; and setting the reward r of every action a that points to a state differing greatly from the target beam (tr greater than 0.7 in the example) to -50, so that the algorithm avoids changing the fitness function to such a state.
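The reward rule can be illustrated with a short Python sketch. Several details are assumptions: the text does not say which normalization is used (min-max is assumed here), nor which rule takes precedence when more than one applies (the order 100, then -50, then +1/-1 is assumed), and the names `normalize` and `transition_reward` are hypothetical.

```python
import numpy as np


def normalize(tr):
    """Min-max normalize cache rewards to [0, 1] (the normalization type is an assumption)."""
    tr = np.asarray(tr, dtype=float)
    span = tr.max() - tr.min()
    return (tr - tr.min()) / span if span > 0 else np.zeros_like(tr)


def transition_reward(tr_norm, s, s_next, bad_threshold=0.7):
    """Reward for the action that moves the agent from state s to state s_next."""
    if tr_norm[s_next] == tr_norm.min():
        return 100.0   # action reaches the state closest to the target beam
    if tr_norm[s_next] > bad_threshold:
        return -50.0   # action points at a state far from the target beam
    return 1.0 if tr_norm[s_next] < tr_norm[s] else -1.0


# Hypothetical raw cache rewards for four states.
tr_norm = normalize([4.0, 2.0, 10.0, 3.0])
rewards = [transition_reward(tr_norm, 0, j) for j in (1, 2, 3)]
print(rewards)  # [100.0, -50.0, 1.0]
```

In this toy case the normalized values are 0.25, 0, 1 and 0.125, so moving from state 0 to state 1 reaches the minimum, moving to state 2 exceeds the 0.7 threshold, and moving to state 3 is a small improvement.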

Step 7.2, starting from the initial position, the agent applies the state s, the action a and the reward r determined in step 7.1 to search iteratively, step by step, for an optimal solution path, thereby obtaining the agent optimization strategy Q table from the Q-Learning algorithm;
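A minimal tabular Q-Learning sketch of this kind of training, in Python, is given below. It is a deliberately simplified stand-in: the states form a short chain with only two actions (left and right) instead of the 2(k-1)n coefficient changes of the method, and the learning rate, discount factor, exploration rate, episode count and step limit are all assumed values that the patent does not specify.

```python
import numpy as np


def step(s, a, n_states):
    """Move left (a = 0) or right (a = 1) along the chain, clipped at the ends."""
    return min(max(s + (1 if a == 1 else -1), 0), n_states - 1)


def train_q_table(tr_norm, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-Learning on a toy chain of states; returns the Q table and the goal state."""
    rng = np.random.default_rng(seed)
    n_states = len(tr_norm)
    goal = int(np.argmin(tr_norm))            # state closest to the target beam
    q = np.zeros((n_states, 2))
    starts = [s for s in range(n_states) if s != goal]
    for _ in range(episodes):
        s = int(rng.choice(starts))
        for _ in range(50):
            # Epsilon-greedy action selection.
            a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q[s]))
            s_next = step(s, a, n_states)
            if s_next == goal:
                r = 100.0
            elif tr_norm[s_next] > 0.7:
                r = -50.0
            else:
                r = 1.0 if tr_norm[s_next] < tr_norm[s] else -1.0
            # Standard Q-Learning update; no bootstrap from the terminal goal state.
            target = r if s_next == goal else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
            if s == goal:
                break
    return q, goal


def greedy_path(q, start, goal, max_steps=10):
    """Follow the highest-valued action from `start` until the goal or the step limit."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(step(path[-1], int(np.argmax(q[path[-1]])), len(q)))
    return path


tr_norm = [0.6, 0.4, 0.2, 0.0, 0.5]           # hypothetical normalized cache rewards
q_table, goal = train_q_table(tr_norm)
print(greedy_path(q_table, 0, goal))
```

After training, following the largest Q value from any state traces a path toward the state with the smallest tr, which is the role the Q table plays when it guides changes to the fitness function coefficients.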

Having a similar arrangement means that the antenna array to be designed is arranged similarly to the original antenna array, including:

(1) The number of antennas in the transverse or longitudinal direction is unchanged and only the number in the other dimension is modified. For example, for a 2 × 4 original array, the method can be deployed and used for optimization on a 2 × 6 array, a 2 × 8 array and the like;

(2) Different array elements are used while the arrangement of the antenna array is unchanged. For example, the array element of the original array is a microstrip antenna while that of the antenna to be designed is a slot antenna.

Example 1:

step 1, according to the specific design requirements of a communication system (such as the limitation of factors such as the type, number and integral size of an antenna element), using HFSS to establish an initial antenna array model to be optimized.

Taking a 2 × 4 microstrip patch antenna array as an example, the parameters of each array element include the lengths of the impedance line and the delay line. In this embodiment, the antenna spacings $d_1$ to $d_{10}$ (6 transverse and 4 longitudinal element spacings), the impedance-line lengths $il_1$ to $il_8$ of the eight antenna elements, and the delay-line lengths $dl_1$ to $dl_8$ are set as variables in HFSS so that MATLAB-generated scripts can modify them during the subsequent optimization.

Step 2, designing the fitness function of the optimization algorithm, $f(x) = a_1 f_1 + a_2 f_2 + \dots + a_n f_n$, together with the variation ranges of $a_1$ to $a_n$: $[a_{1,1}, \dots, a_{1,5}]$, $[a_{2,1}, \dots, a_{2,5}]$, …, $[a_{n,1}, \dots, a_{n,5}]$.

For a 2 × 4 microstrip patch array antenna, the optimization objective is to increase the scattering intensity and shift the radiation pattern. The fitness function is designed as follows:

$f(x) = -0.3\,\max(x_2) + 1\cdot\mathrm{len}(x_1) + 50\,\mathrm{std}(\mathrm{crest}(x_1)) - 0.3\,\max(x_1) + 3\,\mathrm{diff}(x_1 - \mathrm{target})$, where max() takes the maximum value, len() takes the width of the main lobe, std(crest()) takes the variance of the side-lobe peaks and valleys, and diff(x_1 - target) is the difference between the main-lobe direction and the design direction.
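As an illustration of how such a fitness function might be evaluated on sampled patterns, the Python sketch below implements the five terms with NumPy. The patent does not define len(), crest() or diff() precisely, so the sketch makes several assumptions: the main-lobe width is the number of contiguous samples within 3 units of the peak, crest() returns only local maxima outside the main lobe, std is read as the standard deviation, and the pointing error is the absolute index difference between the peak and the target angle on a 0 to 180 degree grid with 1 degree steps.

```python
import numpy as np


def main_lobe_bounds(x1, drop=3.0):
    """Indices [lo, hi] of the contiguous region around the peak within `drop` of it."""
    peak = int(np.argmax(x1))
    lo = hi = peak
    while lo > 0 and x1[lo - 1] >= x1[peak] - drop:
        lo -= 1
    while hi < len(x1) - 1 and x1[hi + 1] >= x1[peak] - drop:
        hi += 1
    return lo, hi


def side_lobe_crests(x1, lo, hi):
    """Values of the local maxima that lie outside the main lobe."""
    idx = [i for i in range(1, len(x1) - 1)
           if (i < lo or i > hi) and x1[i] > x1[i - 1] and x1[i] > x1[i + 1]]
    return x1[np.array(idx, dtype=int)]


def fitness(x1, x2, target):
    """Weighted sum with the coefficients quoted in the embodiment."""
    lo, hi = main_lobe_bounds(x1)
    crests = side_lobe_crests(x1, lo, hi)
    spread = float(np.std(crests)) if crests.size else 0.0
    pointing_error = abs(int(np.argmax(x1)) - target)
    return float(-0.3 * np.max(x2) + 1.0 * (hi - lo + 1) + 50.0 * spread
                 - 0.3 * np.max(x1) + 3.0 * pointing_error)


# Synthetic patterns: one smooth lobe centred at 60 degrees, flat scattering.
deg = np.arange(181)
x1 = 10.0 * np.exp(-((deg - 60) / 10.0) ** 2)
x2 = np.full(181, 2.0)
print(fitness(x1, x2, target=60), fitness(x1, x2, target=90))
```

With these synthetic patterns the only term that changes between the two calls is the pointing error, so the second value is larger by 3 × 30 = 90, showing how the last term penalizes a main lobe that points away from the design direction.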

Step 3, selecting a random optimization algorithm as the learning basis. Taking the differential evolution algorithm as an example, the algorithm is written in MATLAB with the number of random particles set to 50, the number of iterations set to 200, and the convergence condition set to the difference between the optimal values of two successive generations being less than 0.002.
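For reference, a compact differential evolution loop with these settings might look like the Python sketch below. The patent implements the algorithm in MATLAB and evaluates particles through HFSS; here the objective is a simple stand-in function, the DE/rand/1/bin variant is chosen for illustration, and the scale factor `f_scale` and crossover rate `cr` are assumed values that the text does not give.

```python
import numpy as np


def differential_evolution(objective, bounds, n_particles=50, max_iter=200,
                           tol=0.002, f_scale=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin minimizer; stops when the best value changes by less than tol."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(n_particles, len(low)))
    cost = np.array([objective(p) for p in pop])
    history = [cost.min()]
    for _ in range(max_iter):
        for i in range(n_particles):
            others = [j for j in range(n_particles) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f_scale * (b - c), low, high)
            cross = rng.random(len(low)) < cr
            cross[rng.integers(len(low))] = True   # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:              # greedy selection
                pop[i], cost[i] = trial, trial_cost
        history.append(cost.min())
        if abs(history[-2] - history[-1]) < tol:   # convergence condition
            break
    return pop[np.argmin(cost)], history


# Stand-in objective (sum of squares) over four hypothetical design variables.
best, history = differential_evolution(lambda p: float(np.sum(p ** 2)), [(-5, 5)] * 4)
print(len(history) - 1, history[-1])
```

Note that a stopping rule based only on the change between two successive generations also triggers when a generation brings no improvement at all, so in practice it may end the search early; the sketch follows the rule as stated.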

Step 4, creating a plurality of HFSS threads to process the VB scripts in parallel to obtain a group of radiation and scattering results xt corresponding to the random particles.

In steps 5 and 6, because f(x) has so many variants, traversing and solving every generated fitness function would consume a great deal of time. A modified solution process is therefore used here to generate the whole training set: one fitness function is selected at random and solved, and all intermediate results xt are stored; the optimal solutions corresponding to all fitness functions are then generated from the stored results. Finally, the $k^n$ optimal solutions are taken as the training set for the subsequent reinforcement learning training.
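The reuse of stored intermediate results can be sketched in Python as follows: every coefficient combination is scored against the same pool of stored results, and the lowest-scoring result becomes that combination's training sample. The function name `build_training_set`, the two-component toy results and the coefficient values are assumptions for illustration only.

```python
from itertools import product


def build_training_set(stored_results, eval_funcs, coeff_choices):
    """For every coefficient combination, pick the stored result with the lowest fitness."""
    training_set = {}
    for coeffs in product(*coeff_choices):
        def fitness(x, coeffs=coeffs):
            return sum(a * f(x) for a, f in zip(coeffs, eval_funcs))
        training_set[coeffs] = min(stored_results, key=fitness)
    return training_set


# Hypothetical stored results (each tuple stands in for one exported xt).
stored_results = [(1.0, 5.0), (3.0, 1.0), (2.0, 2.5)]
eval_funcs = [lambda x: x[0], lambda x: x[1]]
coeff_choices = [[1.0, 10.0], [1.0, 10.0]]
training_set = build_training_set(stored_results, eval_funcs, coeff_choices)
print(training_set[(10.0, 1.0)])  # (1.0, 5.0)
```

Only one expensive optimization run is needed to fill `stored_results`; the remaining work is cheap re-scoring, which is the time saving the modified process aims for. The quality of each selected sample then depends on how well the stored pool covers the design space.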

Step 7, designing a Q-Learning algorithm and training it on the training set to obtain the agent optimization strategy Q table. Specifically, a state s, an action a and a reward r are determined; when setting the reward r, a function is used to generate the target beam, shown by the dotted line in FIG. 4, and the cache reward tr is obtained by subtracting the actual result from the optimal value.

Step 8, applying the Q table to the simulation-based design of the antenna array, so that the agent automatically optimizes the antenna according to the Q table values and the rewards actually obtained. In particular, the Q table obtained by training on the 2 × 4 array can be deployed and used for optimization on arrays such as 2 × 6 and 2 × 8.

FIG. 4 shows the simulation result (solid line) obtained by applying the Q table trained on the 2 × 4 array to the optimization of the 2 × 6 microstrip array shown in FIG. 3; the dotted line is the target beam of the design. As FIG. 4 shows, the solid and dotted lines coincide to a high degree, so the design requirements are substantially met. Meanwhile, according to the design times shown in FIG. 5, optimizing the antenna array with the reinforcement learning method of the invention improves the overall optimization speed by 50%. In addition, when the Q table obtained by training on the 2 × 4 array is applied to the 2 × 8 array, the overall optimization speed is improved by 70%.

Claims

1. An antenna array design method based on reinforcement learning and random optimization algorithm is characterized by comprising the following steps:

step 2, designing a fitness function $f(x) = a_1 f_1 + a_2 f_2 + \dots + a_n f_n$, and, according to the requirement of matching the orders of magnitude of $f_1$ to $f_n$, determining the candidate values of $a_1$ to $a_n$ as $[a_{1,1}, \dots, a_{1,k}]$, $[a_{2,1}, \dots, a_{2,k}]$, …, $[a_{n,1}, \dots, a_{n,k}]$ respectively;

step 4, creating a plurality of HFSS threads to process the VB script in parallel to obtain a group of radiation and scattering results xt corresponding to random particles, and specifically comprising the following sub-steps:

step 6, substituting the multiple groups of new xt, as x, into each of the fitness functions obtained in step 2, and taking the minimum value of each fitness function as the optimal solution of that fitness function, thereby obtaining $k^n$ optimal solutions in total, one per fitness function, and taking these $k^n$ optimal solutions as a training set;

step 7, designing a Q-Learning algorithm to train the training set to obtain an intelligent agent optimizing strategy Q table;

the method specifically comprises the following substeps:

step 7.1, determining a state s, an action a and a reward r in the Q-Learning algorithm; wherein the states s are the $k^n$ optimal solutions obtained in step 6, an action a is a transition between states s, and there are 2(k-1)n different actions;

the state differing greatly from the target beam means a state whose generated cache reward tr is greater than 0.7;

the generation function of the target beam is as follows:

in the formula, target is the pointing angle required by the design, and any angle between 45 degrees and 135 degrees may be substituted into the function; deg is the angle in degrees, taking values from 0 to 180 in steps of 1, which are substituted into the function to obtain the target beam;

2. The method for designing an antenna array based on reinforcement learning and random optimization algorithm according to claim 1, wherein in the step 1, the random optimization algorithm is a particle swarm algorithm, a simulated annealing algorithm or a differential evolution algorithm.

3. The method according to claim 1, wherein in step 8, the antenna array to be designed has a similar arrangement to the initial antenna array in step 1;

the similar arrangement means that the antenna array to be designed is arranged similarly to the initial antenna array, including: (1) the number of antennas in the transverse or longitudinal direction is unchanged and only the number in the other dimension is modified; (2) different array elements are used while the arrangement of the antenna array is unchanged.