CN114218849A

CN114218849A - Intelligent design method of complex array antenna based on deep reinforcement learning

Info

Publication number: CN114218849A
Application number: CN202111113588.8A
Authority: CN
Inventors: 陈方园; 陈小忠; 刘宇峰; 李文博
Original assignee: Zhejiang Jinyichang Technology Co ltd
Current assignee: Zhejiang Jinyichang Technology Co ltd
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2022-03-22

Abstract

The intelligent design method of the complex array antenna based on the deep reinforcement learning converts the design process of the radiation performance parameter target to be achieved by the antenna expected to be designed into a target extreme value optimizing process of an optimized function. Adjusting various physical parameters of the antenna by using an optimization strategy and a method for adjusting a plurality of variables in an optimized function; thereby achieving an optimum value for the optimized function, i.e. the desired radiation performance of the antenna. By using the method, when facing a complex electromagnetic environment and a complex antenna design scene, dependence of antenna designers on electromagnetic theory knowledge and experience can be reduced, design efficiency of the complex array antenna can be greatly improved, and antenna design time can be reduced.

Description

Intelligent design method of complex array antenna based on deep reinforcement learning

Technical Field

The invention relates to the field of deep learning, in particular to a complex array antenna intelligent design method based on deep reinforcement learning.

Background

An antenna in the prior art is a device capable of receiving or emitting electromagnetic waves, which is widely used in wireless communication, detection, navigation, and other systems, and is closely related to human life. The performance of the antenna is related to the performance of the whole electronic system. The performance of a single-element antenna is limited in some aspects, and in some important occasions, the performance of the single-element antenna cannot meet some special performance index requirements required by a system in the occasions, such as high gain, low sidelobe, controllable directional diagram and the like. In order to obtain parameters such as high antenna gain, directivity, standing-wave ratio and the like, a group array form can be utilized to realize the required antenna performance.

For the design of a complex array antenna, not only a single oscillator needs to be designed independently and optimally, but also the array units are flexibly arranged on the radiation surface, and meanwhile, the distribution problem of a power distribution network is also faced, the radiation field of the antenna is faced with the problems of large calculation amount, difficulty in convergence of calculation results and the like. The optimization problem of the array antenna is often complicated due to the uniqueness of the surface of the array antenna, the optimization objective function of the array antenna is often multi-optimization parameter, non-linear and multi-extremum, and the characteristics improve the optimization difficulty, so that higher requirements are put forward on the performance of the optimization algorithm.

The optimization of the array antenna faces the problem of excessive parameters, and hundreds of times of adjustment of each parameter calculation are often needed to obtain the required performance of the array antenna. And evaluating the relative quality of the antenna performance corresponding to the current parameter by using the designed objective function, and manually adjusting the antenna performance is extremely difficult. The efficient intelligent optimization algorithm can well solve the problems of large calculation amount of electromagnetic field solving, complex parameter adjustment and the like.

Disclosure of Invention

In order to solve the above problems, the present invention provides an automatic and fast intelligent design method for a complex array antenna based on deep reinforcement learning,

in order to achieve the purpose, the invention discloses a complex array antenna intelligent design method based on deep reinforcement learning, which comprises the following steps:

the method comprises the following steps: determining the geometric size range of the antenna according to the required antenna radiation frequency, the antenna radiation directional diagram and the requirements of the space environment, initializing the antenna model design, and establishing an initial antenna radiation metal patch structure, wherein the radiation metal patch is a rectangle with a fixed size;

step two: setting a corresponding solving space, solving dimensions, variable constraint conditions, optimization targets, the number of the optimization targets, a normalized target factor, a state space and convergence conditions;

step three: discretizing and modeling an antenna structure according to a designed initial antenna model, generating a structure sequence of a corresponding position by using a program, controlling the existence of the structure of the corresponding position of the antenna by using a program code, forming an antenna shape of a random structure by generating the position of the random structure, calculating and solving the current distribution of the antenna and the electromagnetic wave radiation of the antenna to obtain the electromagnetic field distribution of the antenna, and simulating an electromagnetic field radiation field by using an electromagnetic field calculation program or software;

step four: performing deep algorithm learning by using Markov Decision Process (MDP), forming a corresponding database according to the obtained antenna electromagnetic field distribution and radiation pattern, determining a normalized expected target, performing a learning Process by using a Bellman optimization mode, realizing automatic target approximation by using a random gradient strategy, determining a backspacing strategy, and in the optimization of each sequence, if the optimization result is not converged, backspacing to perform continuous optimization until the optimization is converged, and stopping learning; or optimizing to the optimal result position, and stopping learning;

step five: determining the result convergence condition, wherein the calculation result adopts a maximum iteration number method and a target realization method to carry out convergence judgment, and judging whether the result convergence condition realizes target approximation or meets the maximum iteration number; if yes, then executing step six, otherwise, repeating step three and step four;

step six: calculating and outputting antenna parameter results, carrying out parametric modeling based on the optimized parameter results, determining the calculation results and deriving a 3D model;

step seven: and finishing the learning.

In a further proposal, in the step two, the solving dimension corresponds to the geometric dimension and physical property parameters of the optimized antenna, the physical parameter variation range of the antenna corresponding to the solving space, the corresponding relation of the antenna parameter and the structural parameter requirement in the optimization process corresponding to the variable constraint condition, the optimization target corresponds to the optimization target requirements of various radiation performance parameters of the antenna design, the number of the optimization targets corresponds to the number of the radiation performance parameter optimization targets of the antenna design, the normalized target factor represents the normalized expected value of the multiple optimization targets after weight processing, the state space corresponds to the result aiming at the target to be optimized in the corresponding iteration depth reinforced learning process, namely the optimized antenna parameter condition, the convergence condition corresponds to the condition that the calculated target function reaches the expected condition in the learning process of the deep learning algorithm. The physical parameter variation range of the antenna comprises: the number of the antenna radiation sources, the size of the radiation units, the distance between the radiation units, the overall size, the dielectric constant, the loss tangent, the frequency and the overall size of the antenna are determined by the number, the size and the distance between the rectangular radiation metal patches, and the current distribution of different forms on the antenna is realized by changing the positions and the sizes of the rectangular radiation metal patches, so that the performance parameters of the electromagnetic wave radiation of the antenna are optimized.

In a further scheme, in the third step, the radiation source of the antenna is composed of the rectangular radiation metal patches with fixed size, and the electromagnetic field of each radiation source is equivalent to that of

Wherein

Q is equivalent electric charge and electric quantity for field intensity distribution,

a vector position of an observation point in space from a radiation source, k is a state parameter of the corresponding space, and a series of equivalent radiation sources are combined to form random structure irregularAnd the law radiation unit is used for obtaining the electromagnetic field distribution of the antenna after calculation.

The radiation pattern required in step four is derived from a theoretical formula,

wherein z represents the radiation direction, l is the length of the radiation source, theta represents the angle between the line from the observation point to the central point of the array antenna and the normal line passing through the central point of the antenna, and I_n(z') and phi_n(z') represents the amplitude and phase distribution of the radiation source along the antenna, respectively.

In the fourth step, the deep reinforcement learning by using the Markov Decision Process is specifically as follows:

P(S_t+1，R_t+1|S₀，A₀，R₁....S_t，A_t)＝P(S_t+1，R_t+1|S_t，A_t)；

i.e. in state S_tWhen taking action A_tLast state S_t+1And earnings R_t+1Only with respect to current actions and states, and not with respect to historical states.

In the fourth step, a weighting mode is adopted to determine a normalized expected target, each radiation performance parameter target is assigned as an expected target value with a certain weight for optimization, and for N targets, an overall target G is optimized,

G＝＝w₁G₁+w₂G₂+....+w_iG_i....+w_NG_N

in the formula w_iIs the weight coefficient of each object.

In step four, the Bellman Optimality equation is adopted to execute the MDP reinforcement learning process,

V(s)＝E[R_t+1+γmaxV(s_t+1)|S_t＝S]；

Q(s，a)＝E[R_t+1+γmaxQ(s_t+1，a′)|S_t＝S，A_t＝A]；

in the formula, gamma is a discount factor of the long-term income, E represents an expected value in a corresponding space and state, and a is an action taken in the corresponding state.

In step four, for any state space S_NThe Stochastic strategy Gradient Method is adopted to carry out multi-target approximation calculation,

delta represents a value of the amount of change of the gradient,

the expression of the gradient operator is used to indicate,

representing a derivative operator, α representing an iteration step correlation factor, P representing a given probability function, i.e. the output action obeys a probability distribution in the corresponding state space S; and gradually adjusting the result gradient to approach to an optimal value for the antenna field distribution result optimized each time.

In the fourth step, the backspacing strategy adopts the following formula to carry out income reward and punishment, namely a reward and punishment strategy:

where v is the result of the optimization of the response step,

is the average value from the ith result to the nth result, max (v) is the maximum value from the ith result to the nth result, min (v) is the minimum value from the ith result to the nth result, v_c(x_i) Is an optimized value, v, of the step in the rollback strategy_c+1(x_i) In the next step after adopting the backspacing strategy, considering the reward and punishment strategyThe value obtained.

In the backspacing strategy, according to the Bellman Optimality equation, a formula is adopted

V′(s)＝V(s)[v_c+1(x_i)·R⁽ⁿ⁾]

Q′(s，a)＝Q(s，a)[v_c+1(x_i)·R⁽ⁿ⁾]

And carrying out long-term benefit and expectation solution, wherein the symbol is matrix inner product, and V '(s) and Q' (s, a) are in s state, and obtaining the comprehensive expectation and benefit value of the new radiation performance parameter optimization target after adopting the backspacing strategy and the reward and punishment strategy.

Drawings

Fig. 1 shows a side view of the material distribution required for designing an antenna.

Fig. 2 is an optimized antenna initial model setup, where dark rectangular metal patches are the location distribution of metal radiating metal patches, and the size.

Fig. 3 is a schematic diagram of a time difference regression method adopted in the deep learning strategy Markov Decision Process.

FIG. 4 is a graphical presentation of a random gradient strategy optimization two-dimensional space.

FIG. 5A random gradient strategy optimizes a three-dimensional spatial graphical presentation.

Fig. 6 is a schematic diagram of time differential back-off calculation.

Fig. 7 is a flow chart of an antenna optimization strategy using the present algorithm.

The embodiment of fig. 8 is a convergence situation diagram obtained by optimizing a two-dimensional schafer function based on a deep reinforcement learning algorithm.

The antenna pattern of the embodiment of fig. 9.

Fig. 10 illustrates the return loss of the antenna according to the embodiment.

Detailed Description

To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects according to the present invention will be made with reference to the accompanying drawings and preferred embodiments.

Example 1.

In the intelligent design method of the complex array antenna based on deep reinforcement learning described in this embodiment, an antenna base model is first designed. And determining the geometric size range of the antenna according to the required antenna radiation frequency, the directional diagram and the requirements of the space environment, and carrying out model design planning.

As shown in fig. 1, taking the PCB microstrip antenna in this embodiment as an example, a dielectric board is processed, and copper is plated. The main structure of the antenna in the embodiment is divided into three layers, wherein the lowest layer is a copper-clad layer with the thickness of 0.035 mm; the middle layer is a dielectric plate with the thickness of 2mm, and the dielectric constant of the dielectric plate is 2.65; the top of the film was coated with copper with a thickness of 0.035mm, and the metal coating method was adjusted according to the designed shape.

As shown in fig. 1, the antenna is a side view, and the lowest part of the antenna is a copper film which is a metal bottom plate and is used for reflecting electromagnetic waves and blocking the electromagnetic waves from radiating backwards, so that the rear lobe of the antenna is reduced. The middle layer is a medium layer which is used as a medium for bearing electromagnetic wave radiation; the propagation of electromagnetic waves in the medium and the distribution of an electromagnetic field are adjusted by controlling the dielectric constant of the medium, so that the electromagnetic wave radiation of the antenna is adjusted; the top is a metal copper-clad radiation metal patch, and the control of the electromagnetic wave of the antenna is realized by controlling the size, position and spacing of the metal patch and realizing the current distribution control of the antenna, thereby realizing the antenna index requirements of the required antenna radiation directional diagram, gain, sidelobe suppression and the like.

And determining the radiation direction of the antenna and the placement position of the antenna in a radiation space, and carrying out initialized three-dimensional numerical modeling, wherein the model is designed and has the same size as a real object. As shown in fig. 1, which is a hierarchical representation of an antenna. The first layer is that irregular metal patches 1 are distributed above the antenna, and the metal patches 1 are coated with copper in a metal coating mode; the second layer is a Teflon dielectric plate 2 which is designed for supporting an antenna structure below the metal patch 1 and is used for transmitting electromagnetic waves between the plates; the third layer is a micro-strip power distribution structure layer 3 for conducting current, which is realized by adopting a metal coating mode, the fourth layer is a Teflon dielectric plate 4 for supporting the metal coating layer and used for transmission of electromagnetic waves between plates, the fifth layer is a metal bottom plate 5 below an antenna, the lower part of the micro-strip power distribution structure layer 3 is connected with a coaxial line inner conductor 6 for electromagnetic wave feed, the outer part of the conductor 6 is a peripheral medium 7 for electromagnetic wave feed transmission, and the conductor 6 and the peripheral medium 7 jointly determine the working frequency of a feed port.

A Teflon dielectric plate 2 with a rectangular supporting antenna structure design and a Teflon dielectric plate 4 with a supporting metal coating layer are manufactured, an irregular metal patch 1 is placed on the Teflon dielectric plate 2 with the rectangular supporting antenna structure design, a copper coating film is coated on metal below the Teflon dielectric plate 2 with the rectangular supporting antenna structure design, etching is carried out, and a microstrip common-minute structure feed circuit is formed. Meanwhile, a Teflon dielectric plate 4 for supporting the metal coating layer is arranged between the micro-strip power distribution structural layer 3 and the metal bottom plate 5, and the Teflon dielectric plate 2 and the micro-strip power distribution structural layer 3 which are designed for supporting the antenna structure jointly form a double-layer micro-strip metal structure radiation antenna.

The hatched part of fig. 2 is the initial antenna model metal microstrip metal patch distribution. L is₁Is the length of the dielectric substrate, L₂Is the width of the dielectric substrate, the dotted line 8 is the metal patch omitted from the illustration, and a is the metal patch specific to each position.

d₁And d₂Respectively, the geometric dimensions of the optimized rectangular metal copper-clad metal patches are the same as the geometric dimensions of the rectangular metal patches at all the positions in the embodiment, that is, the dimension d is adopted in the embodiment₁×d₂The rectangular metal patch is excellentAnd realizing the radiation of the electromagnetic wave of the antenna.

The position center points of the rectangular metal patches are optimized by adopting a method of unequal intervals. Take the metal patch of the ith row and the jth column as an example, the interval is p_ij+d₁(ii) a And for the m-th row and n-th column of metal patches with a spacing s_mm+d₂. In the calculation, by adjusting p_ij+d₁And s_mn+d₂And the central position of the radiating metal patch is adjusted.

The metal patch has a transverse spacing of p_ijLongitudinal spacing of s_mn。p_ijAnd s_m，nThe value of (c) changes with the change in i, j, m, n. In the real optimization process, the distances corresponding to the i, j, m and n serial numbers are optimized, and deep learning is achieved.

In an embodiment, all metal patches are located in a total size L₁×L₂On a rectangular plane. For each metal patch c_ij(the ith row and the jth column of metal patches are shown), and are further divided into M × N small cells. By controlling the generation or disappearance of each small unit, different plane radiation structures are realized.

Fig. 3 is a metal patch denoted by a in fig. 2; extracting an optimized metal patch from the metal patch above the dielectric substrate; the line shape and the size are obtained by using algorithm optimization. Specifically, each metal patch has a different shape and size.

And determining the parameter variable of the antenna to be optimized, namely the parameter variation range. The method comprises the following steps: the number of antenna radiation sources, the size of the radiation units, the distance between the radiation units, the overall size, the dielectric constant of the antenna, the loss tangent and the frequency. The series of parameters correspond to variables to be optimized in the deep reinforcement learning algorithm.

For the antenna, the overall size of the antenna depends on the number, the size and the spacing of the radiating metal patches; the antenna has the total size L of the radiating metal patches, the number N of the radiating metal patches and the distance D between the radiating sources of the antenna; and the real part of the antenna dielectric constant epsilon', the loss tangent epsilon "(the dielectric constant and the loss tangent of the medium used are chosen only within given values, depending on the materials available).

The material property is determined, the boundary space of the embodiment is set as air, and is set as an open radiation space boundary condition, and the antenna current distribution solution and the antenna electromagnetic wave radiation solution are performed by using a Finite-Difference Time Domain Method (FDTD). In the specific implementation process, the finite difference time domain method in the present embodiment is not limited, and any electromagnetic field numerical calculation method may be adopted.

The antenna current distribution is determined by the layers in fig. 1 together. In this embodiment, the input of electromagnetic waves is realized by feeding the coaxial conductor 6 and the peripheral medium 7; electromagnetic waves propagate in a two-layer teflon medium on the antenna. Meanwhile, the induced current is generated in the metal patch 1 and the microstrip power division structure layer 3, and the spatial radiation of electromagnetic waves is realized. In the process of deep learning algorithm reinforcement learning, different forms of current distribution on the antenna are realized by changing the position and the size of the irregular metal patch 1 in the figure 1, so that the performance parameters of the electromagnetic wave radiation of the antenna are optimized.

As mentioned above, the antenna is optimized and target optimized in four optimization dimensions, namely the size L of the radiating metal patches, the number N of the radiating metal patches, the distance D between radiating sources of the antenna, the dielectric constant epsilon of the adopted medium of the antenna and the like. The specific parameter corresponding to the embodiment is a distance parameter p between the radiation metal patches_ij， s_ij(ii) a Size parameter d of antenna radiation metal patch_i,d_j(ii) a And the number M multiplied by N of the small metal patches divided on each metal patch, and the dielectric constant ε ═ ε '+ ε' of the medium.

And forming a corresponding database according to the calculated antenna current distribution and the radiation pattern. And adjusting the parameters to be optimized each time by using a deep learning algorithm, and updating and iterating the database. The adjustment parameters comprise the size of the antenna radiation unit, the distance between the radiation units, the shape of the radiation unit and the geometric parameter size of the antenna.

And normalizing and integrating the objective function value of the antenna, and performing objective optimization approximation by adopting a gradient strategy. As shown in fig. 4, in the two-dimensional situation, the normalized composite objective parameter optimization result is projected to obtain a path curve of the function value on the plane. As shown in fig. 5, a path curve of the normalized integrated objective function value in a three-dimensional case is shown.

As shown in fig. 6, the optimization result may not be the optimal solution (the minimum value in this embodiment) or may be the suboptimal solution by using the deep learning method in this embodiment. And adopting a backspacing strategy to carry out continuous learning or continuously optimizing along the route until a set convergence condition is reached. Optimizing unconverged data for each path according to the database records, and returning; if successive rollback re-optimizations do not converge, the deep learning strategy path is removed from the database.

In the optimization process by adopting the deep learning algorithm, for each group of prediction results which do not reach the preset expected normalized antenna target expectation, time difference backspacing calculation is carried out, and as shown in fig. 4, the optimization result prediction is carried out again. In order to ensure better performance, but the optimization result which does not reach the target can be better continued; and accelerating the elimination rate of the result with poor performance, wherein the backspacing strategy adopts the following formula to carry out income reward and punishment:

where v is the result of the optimization of the response step,

is the average value from the ith result to the nth result, max (v) is the maximum value from the ith result to the nth result, min (v) is the minimum value from the ith result to the nth result, v_c(x_i) Is an optimized value, v, of the step in the rollback strategy_c+1(x_i) The method is a value obtained by considering a penalty factor in the next step after a rollback strategy is adopted.

Performing MDP reinforcement learning process by combining Bellman optimization equation and adopting formula

V′(s)＝V(s)[v_c+1(x_i)·R⁽ⁿ⁾]

Q′(s，a)＝Q(s，a)[v_c+1(x_i)·R⁽ⁿ⁾]

Long-term revenue and expectation solution is performed. Where the sign · is the matrix inner product. And V '(s) and Q' (s, a) are new comprehensive expected values and profit values after a backspacing strategy and a reward and punishment strategy are adopted in an s state.

According to the method of the fourth step, the learning parameters of the algorithm are set as an antenna directional diagram R, a gain Ga, a bandwidth W and a return loss S₁₁(ii) a The normalization function weight coefficient is set to w₁＝0.4，w₁＝0.1， w₃＝0.2，w₄0.3; namely:

G＝w₁Ga+w₂R+w₃W+w₄S₁₁

＝0.4Ga+0.1R+0.2W+0.3S₁₁

fig. 7 shows a flowchart for invoking electromagnetic field numerical calculation by using a deep learning algorithm in the present embodiment. In the optimization of each sequence, if the optimization result is not converged, returning to perform continuous optimization until the optimization result is converged, and stopping learning; or optimizing to the optimal result position and stopping learning.

In the deep learning process, firstly, setting an optimization space to obtain an electromagnetic field distribution database of a corresponding data set in the space; the optimization space in the embodiment corresponds to the optimization range of the parameter. Example parameter p_ijAnd s_ijThe optimizing space is set to be 15mm-20mm]，d_iAnd d_jSet as [8mm-10mm ]]M.times.N is set to [ 5X 5-10X 10 ]]The value of [2.65| | |4.4 ] is set as ∈ ═ epsilon' + epsilon ″]. Here the symbol | | is the or operator.

Performing deep algorithm learning through an MDP (Markov Decision Process) setting; and (4) establishing a normalized expected target, establishing a dynamic discount factor for calculation, and adopting a punishment mechanism strategy for the weight of the discount factor. In the optimization process, a target gradient approximation method is adopted to calculate a target optimal value.

In gradient calculation, thisThe embodiment adopts a normalized calculation result G as an optimization target, an antenna directional diagram R, a gain Ga, a bandwidth W and a return loss S₁₁As an optimization variable; the implementation of the gradient strategy optimization is realized by adopting the following formula

Where Max is a function of taking the maximum value, the optimization results are gradually adjusted for each optimized normalized antenna performance to ensure that the maximum gradient value is found and approximated to the optimum value at the fastest rate until the calculation results as described in 000050 are reached.

In the embodiment, the calculation result is subjected to convergence judgment by adopting a maximum iteration number method and a target realization method; when the calculation algebra is more than or equal to 500 generations and the record of each deep learning path database is more than or equal to 500, performing time difference backspacing calculation; and when the learning result meets the target requirement, recording the database.

To prove the learning ability of the deep learning algorithm of the embodiment, the algorithm is tested by using a two-dimensional Schaffer function benchmark function. The function is a typical multi-objective, multi-optimization parameter basis function. The performance of the algorithm on the optimizing speed and the optimizing precision of the function can be reflected. As shown in fig. 8, it shows the convergence obtained by the two-dimensional schafer function optimization based on the deep reinforcement learning algorithm of the present embodiment, and the convergence obtained by using the conventional genetic algorithm. Compared with the traditional genetic algorithm, the deep learning algorithm in the embodiment only needs 9 × 10⁴Instead, it is possible to achieve a convergence degree of < 10^-13Orders of magnitude, thereby demonstrating the superiority of the performance of the algorithm.

And based on a deep learning algorithm in the embodiment, antenna optimization is carried out, and an optimal result obtained in a database is extracted. And performing reverse numerical modeling on the optimization result described in the embodiment, and calculating by using the finite difference time domain method adopted in the embodiment to obtain the final radiation effect of the antenna. FIG. 9 shows the antenna radiation calculated in this exampleA directional pattern, the radiation of which achieves a high-gain antenna optimization goal; FIG. 10 shows the calculated return loss (S) based on this example₁₁). The results show that the performance of the antenna obtained by the method of the embodiment meets the final design purpose.

Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A complex array antenna intelligent design method based on deep reinforcement learning is characterized in that the antenna design steps are as follows:

step four: performing deep algorithm learning by using a Markov Decision Process, forming a corresponding database according to the obtained antenna electromagnetic field distribution and radiation pattern, determining a normalized expected target, performing a learning Process by using a Bellman Optimality mode, realizing automatic target approximation by using a random gradient strategy, determining a backspacing strategy, and in the optimization of each sequence, if the optimization result is not converged, backspacing to perform continuous optimization until the optimization is converged, and stopping learning; or optimizing to the optimal result position, and stopping learning;

step five: determining the result convergence condition, judging whether the target approximation is realized or the maximum iteration number is met, if so, executing the step six, otherwise, repeating the step three and the step four;

step seven: and finishing the learning.

2. The intelligent design method for a complex array antenna based on deep reinforcement learning of claim 1, wherein in step two, the solving dimension corresponds to the geometric dimension and physical attribute parameters of the optimized antenna, the solving space corresponds to the physical parameter variation range of the antenna, the variable constraint condition corresponds to the antenna parameter correspondence and structural parameter requirements in the optimization process, the optimization target corresponds to the various radiation performance parameter optimization target requirements of the antenna design, the number of the optimization targets corresponds to the number of the radiation performance parameter optimization targets of the antenna design, the normalized target factor represents the normalized expected value of multiple optimization targets subjected to weight processing, and the state space corresponds to the result of the corresponding iterative deep reinforcement learning process for the target to be optimized, i.e. the optimized antenna parameter condition, in the process of learning the convergence condition corresponding to the deep learning algorithm, the calculated target function reaches an expected condition;

wherein the physical parameter variation range of the antenna comprises: the number of antenna radiation sources, the size of the radiation units, the distance between the radiation units, the overall size, the dielectric constant of the antenna, the loss tangent and the frequency;

the overall size depends on the number, size and spacing of the rectangular radiating metal patches. .

3. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 1, wherein in step three, the radiation source of the antenna is composed of the rectangular radiation metal patch with fixed size, and the electromagnetic field of each radiation source is equivalent to

Wherein

the vector position of an observation point in a space from a radiation source is calculated, k is a state parameter of the corresponding space, a series of equivalent radiation sources are combined into an irregular radiation unit with a random structure, and the distribution of an antenna electromagnetic field is obtained after calculation.

4. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 1, wherein the radiation pattern required in step four is derived from a theoretical formula,

5. The intelligent design method of a complex array antenna based on deep reinforcement learning of claim 1, wherein in step four, the deep reinforcement learning by the Markov Decision Process is specifically as follows:

6. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 1, wherein in step four, a weighting method is used to determine a normalized expected target, each radiation performance parameter target is assigned to a weighted expected target value for optimization, and for N targets, an overall target G is optimized,

G＝＝w₁G₁+w₂G₂+....+w_iG_i....+w_NG_N

in the formula w_iIs the weight coefficient of each object.

7. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 1, wherein in step four, the Bellman optimization equation is used to execute the MDP reinforcement learning process,

V(s)＝E[R_t+1+γmax V(s_t+1)|S_t＝S]；

Q(s，a)＝E[R_t+1+γmax Q(s_t+1，a′)|S_t＝S，A_t＝A]；

8. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 1, wherein in step four, for any state space S_NThe Stochastic strategy Gradient Method is adopted to carry out multi-target approximation calculation,

delta represents a value of the amount of change of the gradient,

the expression of the gradient operator is used to indicate,

9. The intelligent design method for the complex array antenna based on the deep reinforcement learning of claim 1, wherein in step four, the backoff strategy uses the following formula to perform revenue reward and punishment, namely a reward and punishment strategy:

where v is the result of the optimization of the response step,

is the average value from the ith result to the nth result, max (v) is the maximum value from the ith result to the nth result, min (v) is the minimum value from the ith result to the nth result, v_c(x_i) Is an optimized value, v, of the step in the rollback strategy_c+1(x_i) The method is a value obtained by considering a reward punishment strategy in the next step after a backspacing strategy is adopted.

10. The intelligent design method for complex array antenna based on deep reinforcement learning as claimed in claim 10, wherein in the backspacing strategy, according to Bellman Optimality equation, formula is adopted

V′(s)＝V(s)[v_c+1(x_i)·R⁽ⁿ⁾]

Q′(s，a)＝Q(s，a)[v_c+1(x_i)·R⁽ⁿ⁾]